
FrozenLake Q-Learning update problem

I am learning Q-learning and trying to build a Q-learner for OpenAI Gym's FrozenLake-v0 problem. Since the problem has only 16 states and 4 possible actions it should be fairly easy, but it looks like my algorithm is not updating the Q-table correctly.
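
For reference, the standard tabular Q-learning update I am trying to implement is (with alpha the learning rate and gamma the discount factor):

    Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a'))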

Here is my Q-learning algorithm:

import gym
import numpy as np
import random as rand  # rand is used below but was never imported
from gym import wrappers


def run(
    env,
    Qtable,
    N_STEPS=10000,
    alpha=0.2,  # learning rate (1 - alpha weights the old Q-value)
    rar=0.4,    # random exploration rate (epsilon)
    radr=0.97   # exploration decay rate
):

    # Initialize parameters:
    TOTAL_REWARD = 0
    done = False
    action = env.action_space.sample()
    state = env.reset()

    for _ in range(N_STEPS):
        if done:
            print('TW', TOTAL_REWARD)
            break

        s_prime, reward, done, info = env.step(action)
        # Update the Q table (note: no discount factor, i.e. gamma is implicitly 1):
        Qtable[state, action] = (1 - alpha) * Qtable[state, action] + alpha * (reward + np.max(Qtable[s_prime, :]))

        # Choose the next action (epsilon-greedy):
        if rand.uniform(0, 1) < rar:
            action = env.action_space.sample()
        else:
            action = np.argmax(Qtable[s_prime, :])

        # Update the state:
        state = s_prime
        # Decay the exploration rate:
        rar *= radr
        # Update stats:
        TOTAL_REWARD += reward
        if reward > 0:
            print(reward)

    return Qtable, TOTAL_REWARD
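
To sanity-check the table I also run the greedy policy with exploration and learning switched off (a minimal sketch; evaluate and n_episodes are just names I made up here):

    def evaluate(env, Qtable, n_episodes=100):
        # Follow the greedy policy only; no exploration, no table updates.
        total = 0.0
        for _ in range(n_episodes):
            state = env.reset()
            done = False
            while not done:
                action = np.argmax(Qtable[state, :])
                state, reward, done, info = env.step(action)
                total += reward
        return total / n_episodes  # success rate, since reward is 1 only on reaching the goal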

Then I run the Q-learning for 1000 episodes:

if __name__ == "__main__":
    # Required parameters:
    N_ITER = 1000
    REWARDS = []
    # Set up the maze:
    env = gym.make('FrozenLake-v0')

    # Initialize the Q table:
    num_actions = env.unwrapped.nA
    num_states = env.unwrapped.nS
    # Qtable = np.random.uniform(0, 1, size=num_states * num_actions).reshape((num_states, num_actions))
    Qtable = np.zeros((env.observation_space.n, env.action_space.n))

    for _ in range(N_ITER):
        res = run(env, Qtable)
        Qtable = res[0]
        REWARDS.append(res[1])
    print(np.mean(REWARDS))
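
To see what the table has learned, I print the greedy action for each state on the 4x4 grid (in FrozenLake-v0 the action codes are 0=Left, 1=Down, 2=Right, 3=Up):

    policy = np.argmax(Qtable, axis=1).reshape(4, 4)
    print(policy)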

Any advice would be greatly appreciated!
