DDPG agent low performance

1 view (last 30 days)
Armin Norouzi on 3 Jun 2021
Hello everyone,
I am trying to train a DDPG agent for my system, and the goal is to generate actions (mf) that track a desired torque. The attached figure shows the episodic reward versus the number of episodes, along with plots of the system (output, error, reward, and action). I set the output range from 5 to 30, but the agent is still oscillating around these values. Although the training performance in terms of episodic reward seems to converge, I am still experiencing an oscillatory response.
I would appreciate it if someone could help me with this matter.
Here is my reward block in Simulink:
It is worth mentioning that I am using standard parameters for the noise model:
% DDPG agent options
agentOpts = rlDDPGAgentOptions(...
    'SampleTime',Ts,...
    'TargetSmoothFactor',1e-3,...
    'DiscountFactor',0.99, ...
    'MiniBatchSize',64, ...
    'ExperienceBufferLength',1e6);
% Ornstein-Uhlenbeck exploration noise
agentOpts.NoiseOptions.Variance = 0.05*(25/sqrt(Ts));
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
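For context on those noise settings: with the documented decay model, the variance shrinks geometrically each agent step, so a decay rate of 1e-5 keeps the exploration noise large for a very long time. A quick back-of-the-envelope check, assuming the per-step update Variance <- Variance*(1 - VarianceDecayRate):

% Rough estimate of how slowly the exploration noise decays,
% assuming the per-step update Variance <- Variance*(1 - VarianceDecayRate).
decayRate = 1e-5;
halfLife  = log(2)/decayRate;   % about 6.9e4 agent steps to halve the variance
fprintf('Noise variance halves after about %.0f steps\n', halfLife);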
Here are my actor and critic structures:
L = 500; % number of neurons per hidden layer
% Critic: observation (state) path; the action path joins at the addition layer
statePath = [
    featureInputLayer(numObservations, 'Normalization', 'none', 'Name', 'observation')
    fullyConnectedLayer(L, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(L, 'Name', 'fc2')
    additionLayer(2,'Name','add')
    reluLayer('Name','relu2')
    fullyConnectedLayer(L, 'Name', 'fc3')
    reluLayer('Name','relu3')
    fullyConnectedLayer(1, 'Name', 'fc4')];
% Critic: action path
actionPath = [
    featureInputLayer(numActions, 'Normalization', 'none', 'Name', 'action')
    fullyConnectedLayer(L, 'Name', 'fc5')];
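The snippet above defines the two critic paths but not how they are joined into a critic representation. A minimal sketch of the missing glue, following the usual layerGraph/connectLayers pattern (the critic learning rate here is an assumption, not taken from the original post):

% Assemble the critic: 'fc5' (action path) feeds the second input
% of the addition layer defined in statePath.
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = connectLayers(criticNetwork, 'fc5', 'add/in2');
% Hypothetical options; LearnRate is assumed, not from the post.
criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork, obsInfo, actInfo, ...
    'Observation', {'observation'}, 'Action', {'action'}, criticOptions);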
% Actor network: observation in, scaled continuous action out
actorNetwork = [
    featureInputLayer(numObservations, 'Normalization', 'none', 'Name', 'observation')
    fullyConnectedLayer(L, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(L, 'Name', 'fc2')
    reluLayer('Name', 'relu2')
    fullyConnectedLayer(L, 'Name', 'fc3')
    reluLayer('Name', 'relu3')
    fullyConnectedLayer(numActions, 'Name', 'fc4')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','ActorScaling1','Scale',max(actInfo.UpperLimit))];
actorOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1,'L2RegularizationFactor',1e-4);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation',{'observation'},'Action',{'ActorScaling1'},actorOptions);
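The final step, building the agent itself, is not shown in the post; it would be a one-liner combining the representations above (a sketch, assuming the critic assembled earlier):

% Combine actor and critic into the DDPG agent using agentOpts from above.
agent = rlDDPGAgent(actor, critic, agentOpts);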

Answers (0)

Products

Release: R2020b
