Q-table issues in the example "Q-learning in the basic grid world"

I trained a Q-learning agent in the MATLAB predefined environment "BasicGridWorld", and I have a question about how the Q-table is updated. When I set the number of episodes to 1 and the number of steps per episode to 1, I expect the updated Q-value to equal alpha * R according to the Q-learning (Bellman) update, where alpha is the learning rate and R is the instant reward: since the Q-table is initialized to all zeros, both the old Q(s,a) and the maximum Q-value of the next state are zero, so only alpha * R should remain after one step. However, the code produces a Q-value different from this expectation. Can anyone help? The code is attached below:
rng(0)   % fix the random seed for reproducibility

% Create the predefined grid world and a tabular Q-value critic
env = rlPredefinedEnv("BasicGridWorld");
qTable = rlTable(getObservationInfo(env),getActionInfo(env));
critic = rlQValueRepresentation(qTable,getObservationInfo(env),getActionInfo(env));
critic.Options.LearnRate = 0.1;
critic.Options.L2RegularizationFactor = 0;
critic.Options.Optimizer = "sgdm";
critic.Options.OptimizerParameters.Momentum = 0;   % no momentum, i.e. plain SGD

% Q-learning agent with epsilon-greedy exploration and discount factor 0.5
opts = rlQAgentOptions;
opts.EpsilonGreedyExploration.Epsilon = 0.8;
opts.EpsilonGreedyExploration.EpsilonMin = 0.01;
opts.EpsilonGreedyExploration.EpsilonDecay = 0.01;
opts.DiscountFactor = 0.5;
agent = rlQAgent(critic,opts);

% Train for a single episode consisting of a single step
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',1,...
    'MaxStepsPerEpisode',1,...
    'StopTrainingCriteria',"AverageReward",...
    'StopTrainingValue',30,...
    'Verbose',true,...
    'Plots','none');
trainOpts.ScoreAveragingWindowLength = 50;
trainingStats = train(agent,env,trainOpts);

% Extract the learned Q-table from the trained agent
trained_critic = getCritic(agent);
trained_table = getLearnableParameters(trained_critic);
trained_qtable = trained_table{1};

% Check the updated Q-value (only one entry should be nonzero after one step)
[r,c] = find(trained_qtable ~= 0);
Q_value = trained_qtable(r,c)
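
For reference, this is a minimal sketch of the single tabular update I am expecting (my own hand calculation, not the toolbox internals); the state, action, next state, and reward values below are placeholders for whatever the one training step actually produced:

% Hand calculation of the Q-learning update I expect (my assumption,
% not the toolbox implementation). State/action/reward values are
% placeholders for whatever the single training step produced.
alpha    = 0.1;      % critic LearnRate
discount = 0.5;      % agent DiscountFactor (gamma)
Q  = zeros(25,4);    % BasicGridWorld table starts at all zeros (25 states, 4 actions)
s  = 1;              % observed state index (placeholder)
a  = 1;              % chosen action index (placeholder)
sp = 2;              % next state index (placeholder)
R  = -1;             % instant reward from the environment (placeholder)

tdTarget = R + discount * max(Q(sp,:));            % Bellman target
Q(s,a)   = Q(s,a) + alpha * (tdTarget - Q(s,a));   % Q-learning update
% With the table initialized to zero this reduces to Q(s,a) = alpha * R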
Can anyone help point out my error?
Thank you very much.

Answers (0)

Release

R2020b
