Difference RL Agent training plot and result plot

Question

sungho park 2022-1-17

Direct link to this question

https://ww2.mathworks.cn/matlabcentral/answers/1630320-difference-rl-agent-training-plot-and-result-plot

Commented: Emmanouil Tzorakoleftherakis 2023-1-24

Hi, the first graph below shows the action during training, and the second one shows the action after training, which is different: it is just constant.

Can you please help me?

%% Create observation specification
obsInfo = rlNumericSpec([3 1]);
obsInfo.Name = 'observations';
numObs = obsInfo.Dimension(1);

%% Create action specification
actInfo = rlNumericSpec([1 1],'LowerLimit',-50,'UpperLimit',50);
%actInfo = rlNumericSpec([1 1]);
actInfo.Name = 'current';
numActions = actInfo.Dimension(1);

%% Create the environment
blk = [mdl '/RL Agent'];
env = rlSimulinkEnv(mdl,blk,obsInfo,actInfo);
env.ResetFcn = @(in)setVariable(in,'current',5,'Workspace',mdl);
env.UseFastRestart = 'off';
Ts = param.dt;
Tf = param.end_time;
rng(0)

%% Create DDPG Agent
statePath = [
    featureInputLayer(numObs,'Normalization','none','Name','observations')
    fullyConnectedLayer(200,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(200,'Name','CriticStateFC2')];
actionPath = [
    featureInputLayer(1,'Normalization','none','Name','action')
    fullyConnectedLayer(200,'Name','CriticActionFC1','BiasLearnRateFactor',0)];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
figure
plot(criticNetwork)

criticOpts = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);

%% Create the critic representation using the specified deep neural
% network and options
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo, ...
    'Observation',{'observations'},'Action',{'action'},criticOpts);

%% Create the actor
actorNetwork = [
    featureInputLayer(numObs,'Normalization','none','Name','observations')
    fullyConnectedLayer(400,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(300,'Name','ActorFC2')
    reluLayer('Name','ActorRelu2')
    fullyConnectedLayer(1,'Name','ActorFC3')
    tanhLayer('Name','ActorTanh')   % bounds the output to [-1, 1]
    scalingLayer('Name','ActorScaling','Scale',max(actInfo.UpperLimit))   % rescales to the action range
    ];
actorOpts = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo, ...
    'Observation',{'observations'},'Action',{'ActorScaling'},actorOpts);

%% Create the DDPG agent options
agentOpts = rlDDPGAgentOptions(...
    'SampleTime',Ts,...
    'TargetSmoothFactor',1e-3,...
    'ExperienceBufferLength',1e6,...
    'SaveExperienceBufferWithAgent',true,...
    'DiscountFactor',0.99,...
    'ResetExperienceBufferBeforeTraining',false,...
    'MiniBatchSize',256);
agentOpts.NoiseOptions.Variance = 0.2;
agentOpts.NoiseOptions.VarianceDecayRate = 0;
agent = rlDDPGAgent(actor,critic,agentOpts);

%% Train Agent
maxepisodes = 10;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',maxepisodes,...
    'MaxStepsPerEpisode',maxsteps,...
    'ScoreAveragingWindowLength',5,...
    'Verbose',true,...
    'Plots','training-progress',...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',5000,...
    'SaveAgentCriteria','EpisodeReward',...
    'SaveAgentValue',5000);
doTraining = true;
%end
trainingStats = train(agent,env,trainOpts);

1 Comment

Emmanouil Tzorakoleftherakis 2023-1-24

How long did you train for? If you only trained for 10 episodes, you should give it more time.
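
A rough sketch of that suggestion, not a tuned setup: the training options below are the ones from the question with only the episode limit raised, followed by a simulation of the trained agent. The value 2000 is an illustrative assumption, and `agent`, `env`, `Ts` and `Tf` are assumed to exist exactly as in the question's script. One thing that may account for part of the difference between the two plots: a DDPG agent adds exploration noise to its actions during training, while `sim` by default evaluates the policy without that noise, so an actor that has barely learned anything can produce a nearly constant output.

```matlab
% Sketch only: requires Reinforcement Learning Toolbox and the variables
% agent, env, Ts and Tf created by the script in the question.
maxepisodes = 2000;        % assumption: illustrative value, not a recommendation
maxsteps = ceil(Tf/Ts);

trainOpts = rlTrainingOptions(...
    'MaxEpisodes',maxepisodes,...
    'MaxStepsPerEpisode',maxsteps,...
    'ScoreAveragingWindowLength',5,...
    'Verbose',true,...
    'Plots','training-progress',...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',5000,...
    'SaveAgentCriteria','EpisodeReward',...
    'SaveAgentValue',5000);

% Train for longer than the original 10 episodes.
trainingStats = train(agent,env,trainOpts);

% Simulate the trained agent for one episode and sum the reward.
simOpts = rlSimulationOptions('MaxSteps',maxsteps);
experience = sim(env,agent,simOpts);
totalReward = sum(experience.Reward.Data);
```

If the episode reward in the training-progress plot is still not improving after many more episodes, the reward signal and the noise settings are probably worth a look before training even longer.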
