Saved agent always gives constant output no matter how or how much I train it

Question

Abdul Basith Ashraf 2021-4-5


https://ww2.mathworks.cn/matlabcentral/answers/792657-saved-agent-always-gives-constant-output-no-matter-how-or-how-much-i-train-it

Edited: Abdul Basith Ashraf 2021-4-8

I trained a DDPG RL agent in a Simulink environment. The training looked fine to me, and I saved agents during the process.

I have trained the agent with several different network architectures, and the saved agents always give a constant output (namely, the LowerLimit of the action).

Please help me. I have been looking for help for the past week.

INPUTMAX = 1E-4;
actionInfo = rlNumericSpec([2 1],'LowerLimit',-INPUTMAX,'UpperLimit', INPUTMAX);
actionInfo.Name = 'Inlet flow rate change';
observationInfo = rlNumericSpec([5 1],'LowerLimit',[300;300;1.64e5;0;0],'UpperLimit',[393;373;6e5;0.01;0.01]);
observationInfo.Name = 'Temperatures, Pressure and flow rates';
env = rlSimulinkEnv(mdl,[mdl '/RL Agent'],observationInfo,actionInfo); % mdl: name of the Simulink model, defined elsewhere (not shown in the post)
L = 25; % number of neurons
%% CRITIC NETWORK
statePath = [
    featureInputLayer(5,'Normalization','none','Name','observation')
    fullyConnectedLayer(L,'Name','fc1')
    reluLayer('Name','relu1')
    concatenationLayer(1,2,"Name",'concat')
    fullyConnectedLayer(29,'Name', 'fc2')
    reluLayer("Name",'relu3')
    fullyConnectedLayer(29,'Name', 'fc3')
    reluLayer('Name','relu2')
    fullyConnectedLayer(1,'Name','fc4')
    ];
actionPath = [
    featureInputLayer(2,'Normalization','none','Name','action')
    fullyConnectedLayer(4,'Name','fcaction')
    reluLayer("Name",'actionrelu')
    ];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
    
criticNetwork = connectLayers(criticNetwork,'actionrelu','concat/in2');
criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1,'L2RegularizationFactor',1e-4,"UseDevice","gpu");
critic = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOptions);
%  plot(criticNetwork)
%% ACTOR NETWORK
actorNetwork = [
    featureInputLayer(5,'Normalization','none','Name','observation')
    fullyConnectedLayer(L,'Name','fc1')
    sigmoidLayer('Name','sig1')
    fullyConnectedLayer(L,'Name','fc4')
    reluLayer('Name','relu4')
    fullyConnectedLayer(2,'Name','fc5')
    tanhLayer('Name','tanh1')
    scalingLayer("Name","scale","Scale",INPUTMAX*ones(2,1))
    ];
actorNetwork = layerGraph(actorNetwork);
% plot(actorNetwork)
actorOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1,'L2RegularizationFactor',1e-5,"UseDevice","gpu");
actor = rlDeterministicActorRepresentation(actorNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'scale'},actorOptions);
agentOptions = rlDDPGAgentOptions(...
    'TargetSmoothFactor',1e-3,...
    'ExperienceBufferLength',1e4,...
    'SampleTime',1,...
    'DiscountFactor',0.99,...
    'MiniBatchSize',64,...
    "NumStepsToLookAhead",1,...
    "SaveExperienceBufferWithAgent",true, ...
    "ResetExperienceBufferBeforeTraining",false);
agentOptions.NoiseOptions.Variance = 0.4;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-5;
agent = rlDDPGAgent(actor,critic,agentOptions);
maxepisodes = 1000;
maxsteps = 500;
trainingOpts = rlTrainingOptions(...
    'MaxEpisodes',maxepisodes,...
    'MaxStepsPerEpisode',maxsteps,...
    'Verbose',false,...
    'Plots','training-progress',...
    "ScoreAveragingWindowLength",50,...
    "StopTrainingCriteria","AverageSteps",...
    'StopTrainingValue',501,...
    'SaveAgentCriteria',"EpisodeReward", ...
    "SaveAgentValue",0);
trainingOpts.UseParallel = true;
trainingOpts.ParallelizationOptions.Mode = 'async';
trainingStats = train(agent,env,trainingOpts);


Accepted Answer

Emmanouil Tzorakoleftherakis 2021-4-5


https://ww2.mathworks.cn/matlabcentral/answers/792657-saved-agent-always-gives-constant-output-no-matter-how-or-how-much-i-train-it#answer_667382

The problem formulation is not correct. I suspect that even during training you are seeing a lot of bang-bang actions. The biggest issue is that the noise variance is very large compared to your action range (NoiseOptions.Variance = 0.4, while the actions are limited to ±1e-4), and this needs to be fixed. Take a look at this note: "It is common to set StandardDeviation*sqrt(Ts) to a value between 1% and 10% of your action range."
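To put numbers on this, here is a quick check written in Python purely for the arithmetic (it is not Reinforcement Learning Toolbox code). It is a sketch under stated assumptions: it treats the configured NoiseOptions.Variance = 0.4 as the noise scale σ in the quoted σ·sqrt(Ts) rule (if 0.4 were a true variance, σ would be about 0.63 and the mismatch would be even larger), and it replaces the Ornstein-Uhlenbeck process with plain Gaussian noise added to a zero actor output, assuming noisy actions are saturated at the action limits. The helper function names are hypothetical.

```python
import math
import random

# Values taken from the question's script
INPUTMAX = 1e-4     # actions limited to [-INPUTMAX, INPUTMAX]
TS = 1.0            # agentOptions SampleTime
NOISE_SCALE = 0.4   # NoiseOptions.Variance, treated here as sigma (assumption)


def noise_to_range_ratio(sigma, ts, lower, upper):
    """Return sigma*sqrt(Ts) as a multiple of the action range."""
    return sigma * math.sqrt(ts) / (upper - lower)


def recommended_sigma(ts, lower, upper, lo_frac=0.01, hi_frac=0.10):
    """Sigma interval that puts sigma*sqrt(Ts) at 1%..10% of the action range."""
    span = upper - lower
    return lo_frac * span / math.sqrt(ts), hi_frac * span / math.sqrt(ts)


def clipped_fraction(sigma, ts, limit, n=10_000, seed=0):
    """Fraction of noisy actions that reach a limit when the actor outputs 0.

    Simplification (assumption): i.i.d. Gaussian noise sigma*sqrt(Ts)*N(0, 1)
    instead of the Ornstein-Uhlenbeck process used by the DDPG agent.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        action = 0.0 + sigma * math.sqrt(ts) * rng.gauss(0.0, 1.0)
        if abs(action) >= limit:
            hits += 1
    return hits / n


ratio = noise_to_range_ratio(NOISE_SCALE, TS, -INPUTMAX, INPUTMAX)
lo, hi = recommended_sigma(TS, -INPUTMAX, INPUTMAX)
print(f"sigma*sqrt(Ts) is {ratio:.0f}x the whole action range")
print(f"rule-of-thumb sigma: {lo:.1e} to {hi:.1e}")
print(f"fraction of actions at a limit: {clipped_fraction(NOISE_SCALE, TS, INPUTMAX):.4f}")
```

With σ·sqrt(Ts) about 2000 times the whole action range, almost every exploratory action lands on a limit, which fits the bang-bang behavior suspected in the answer. For Ts = 1 the 1%..10% rule of thumb corresponds to σ between 2e-6 and 2e-5.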

4 comments (2 earlier comments not shown)

Emmanouil Tzorakoleftherakis 2021-4-8

It decays over the global step count, so it carries over from episode to episode. Reducing the decay rate would make the agent keep exploring for longer; that may be something to try.
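A minimal sketch of what decaying over the global step count means, again in Python for the arithmetic only. It assumes one multiplicative update scale ← scale·(1 − VarianceDecayRate) per agent step, with the step counter carried across episodes as described in the reply; the function names are hypothetical and this is not the toolbox's implementation. The numbers are the question's settings (0.4, 1e-5, 500 steps per episode).

```python
import math


def decayed_scale(initial, decay_rate, global_steps):
    """Noise scale after `global_steps` updates of scale *= (1 - decay_rate).

    Assumption: one update per agent step, with the counter NOT reset
    at episode boundaries.
    """
    return initial * (1.0 - decay_rate) ** global_steps


def steps_to_reach(initial, decay_rate, target):
    """Smallest number of global steps after which the scale is <= target."""
    return math.ceil(math.log(target / initial) / math.log(1.0 - decay_rate))


MAX_STEPS = 500  # maxsteps in the question
for episode in (1, 10, 100, 1000):
    scale = decayed_scale(0.4, 1e-5, episode * MAX_STEPS)
    print(f"after episode {episode:4d}: scale = {scale:.4g}")

# How many global steps until the scale reaches 10% of the action range (2e-5)?
print(steps_to_reach(0.4, 1e-5, 2e-5), "global steps")
```

Under this assumption, the full 1000 episodes of at most 500 steps (500,000 global steps) only bring the scale down to about 0.0027, still more than ten times the action range; reaching 2e-5 takes roughly a million global steps. Changing the decay rate alters how fast exploration fades, but the starting scale still has to be chosen relative to the action range.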

Abdul Basith Ashraf 2021-4-8

Edited: Abdul Basith Ashraf 2021-4-8

Also, what is the effect of parallel workers in async mode?
