Difference between RL agent training plot and result plot

Hi, the first graph below shows the action during training, and the second one shows the action after training, which is different: it is just constant.
Can you please help me?
%% Create observation specification
obsInfo = rlNumericSpec([3 1]);
obsInfo.Name = 'observations';
numObs = obsInfo.Dimension(1);
%% Create action specification
actInfo = rlNumericSpec([1 1],'LowerLimit',-50,'UpperLimit',50);
%actInfo = rlNumericSpec([1 1]);
actInfo.Name = 'current';
numActions = actInfo.Dimension(1);
%% Create the environment
blk= [mdl '/RL Agent'];
env = rlSimulinkEnv(mdl,blk,obsInfo,actInfo);
env.ResetFcn= @(in)setVariable(in,'current',5,'Workspace',mdl);
env.UseFastRestart = 'off';
Ts= param.dt;
Tf= param.end_time;
rng(0)
%% Create DDPG Agent
statePath = [
    featureInputLayer(numObs,'Normalization','none','Name','observations')
    fullyConnectedLayer(200,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(200,'Name','CriticStateFC2')];
actionPath = [
    featureInputLayer(1,'Normalization','none','Name','action')
    fullyConnectedLayer(200,'Name','CriticActionFC1','BiasLearnRateFactor',0)];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
figure
plot(criticNetwork)
criticOpts = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
%% Create the critic representation using the specified deep neural
% network and options
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation', ...
    {'observations'},'Action',{'action'},criticOpts);
%% Create the actor
actorNetwork = [
    featureInputLayer(numObs,'Normalization','none','Name','observations')
    fullyConnectedLayer(400,'Name','ActorFC1')
    reluLayer('Name','ActorRelu1')
    fullyConnectedLayer(300,'Name','ActorFC2')
    reluLayer('Name','ActorRelu2')
    fullyConnectedLayer(1,'Name','ActorFC3')
    tanhLayer('Name','ActorTanh')
    scalingLayer('Name','ActorScaling','Scale',max(actInfo.UpperLimit))
    ];
actorOpts = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo, ...
    'Observation',{'observations'},'Action',{'ActorScaling'},actorOpts);
%% Create the DDPG agent options
agentOpts = rlDDPGAgentOptions(...
    'SampleTime',Ts,...
    'TargetSmoothFactor',1e-3,...
    'ExperienceBufferLength',1e6,...
    'SaveExperienceBufferWithAgent',true,...
    'DiscountFactor',0.99,...
    'ResetExperienceBufferBeforeTraining',false,...
    'MiniBatchSize',256);
agentOpts.NoiseOptions.Variance = 0.2;
agentOpts.NoiseOptions.VarianceDecayRate = 0;
agent = rlDDPGAgent(actor,critic,agentOpts);
%% Train Agent
maxepisodes = 10;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',maxepisodes,...
    'MaxStepsPerEpisode',maxsteps,...
    'ScoreAveragingWindowLength',5,...
    'Verbose',true,...
    'Plots','training-progress',...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',5000,...
    'SaveAgentCriteria','EpisodeReward',...
    'SaveAgentValue',5000);
doTraining = true;
%end
trainingStats = train(agent,env,trainOpts);
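
One thing that may be worth checking, offered only as a possibility and not as a diagnosis: during training the actor weights are updated at every step and exploration noise is added to the action, whereas after training the agent replays a fixed deterministic policy, so the two plots are not expected to match. The sketch below is a standalone estimate of how large the exploration noise is relative to the ±50 action range. It does not call the Reinforcement Learning Toolbox. The sample time, episode length, and mean attraction constant are assumed values (the post does not show param.dt), and the update rule is my understanding of the noise model, not code taken from the toolbox.

```matlab
% Hypothetical check, not the poster's model: exploration noise vs. action range
Ts = 0.1;                    % ASSUMED sample time; param.dt is not shown in the post
noiseVariance = 0.2;         % agentOpts.NoiseOptions.Variance from the post
actionRange = 50 - (-50);    % UpperLimit - LowerLimit from the post

% Per-step noise scale as a fraction of the action range
noiseRatio = noiseVariance*sqrt(Ts)/actionRange;
fprintf('Noise scale is %.3f%% of the action range\n',100*noiseRatio);

% Simulate a mean-reverting noise process with ASSUMED settings
rng(0)
numSteps = 1000;             % ASSUMED number of steps
theta = 0.15;                % ASSUMED mean attraction constant, noise mean taken as 0
x = zeros(numSteps,1);
for k = 2:numSteps
    % Pull toward the mean, then add a scaled Gaussian sample
    x(k) = x(k-1) + theta*(0 - x(k-1))*Ts + noiseVariance*randn*sqrt(Ts);
end
fprintf('Largest noise sample: %.2f\n',max(abs(x)));
```

If the noise turns out to be a very small fraction of the action range, the agent explores little, and with only 10 episodes the actor may settle on a saturated tanh output that looks constant in simulation. To compare the trained policy directly, one option is to run sim(env,agent,rlSimulationOptions('MaxSteps',maxsteps)) and plot the returned action signal.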

Answers (0)
