Reinforcement Leaning DQN Training Convergence Problem

Question

Gülin Sayal 2021-6-6

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/849045-reinforcement-leaning-dqn-training-convergence-problem

training.PNG

Hi everyone,

I am designing an energy management system for a vehicle, and using DQN for optimizing fuel consumption. Here are some related lines from my code.

env = rlSimulinkEnv(mdl,agentblk,obsInfo,actInfo);
nI = obsInfo.Dimension(1);  
nL = 24;   
nO = numel(actInfo.Elements);
dnn = [
    featureInputLayer(nI,'Name','state','Normalization','none')
    fullyConnectedLayer(nL,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(nL,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(nO,'Name','output')];
criticOpts = rlRepresentationOptions('LearnRate',0.00025,'GradientThreshold',1);
critic = rlQValueRepresentation(dnn,obsInfo,actInfo,'Observation',{'state'},criticOpts);
agentOpts = rlDQNAgentOptions(...
    'UseDoubleDQN',false, ...    
    'TargetUpdateMethod',"periodic", ...
    'TargetUpdateFrequency',4, ...   
    'ExperienceBufferLength',1000, ...
    'DiscountFactor',0.99, ...
    'MiniBatchSize',32);
agentOptions.EpsilonGreedyExploration.Epsilon=1;
agentOptions.EpsilonGreedyExploration.EpsilonMin=0.2;
agentOptions.EpsilonGreedyExploration.EpsilonDecay=0.0050;
agentObj = rlDQNAgent(critic,agentOpts)
maxepisodes = 10000;
maxsteps = ceil(T/Ts);
trainingOpts = rlTrainingOptions('MaxEpisodes',10000,...
    'MaxStepsPerEpisode',maxsteps,...
    'Verbose',false,...
    'Plots','training-progress',...
    'StopTrainingCriteria','EpisodeReward',...
    'StopTrainingValue', 0);
 trainingStats = train(agentObj,env,trainingOpts)

The problem is that after training, rewards do not converge. Moreover, long-term estimated cumulative reward Q0 diverges. I already read some posts regarding the topic here, then I normalized my action and observation space which did not help. In addition to that, I also tried adding scaling layer right before the last fullyConnectedLayer which also did not help. You can find my training progress curves in attachment.

So, what can I try further so that Q0 does not diverge and episode rewards converge.

Also, I would really like to know how the Q0 is calculated. It is not possible for my model to have such big long-term estimated rewards.

Best Regards,

Gülin

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

请先登录，再回答此问题。

Reinforcement Leaning DQN Training Convergence Problem

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

回答（0 个）

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

Reinforcement Leaning DQN Training Convergence Problem

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

回答（0 个）

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

0 个评论
显示 -2更早的评论隐藏 -2更早的评论