DDPG with LSTM layer fails?

18 views (last 30 days)
Hi, I am trying to train a DDPG agent. My environment is based in Simulink and works fine when I have only feedforward layers in my network. But as soon as I add an LSTM layer, I get an error about not enough input arguments. I am using MATLAB R2023a and assumed that it supports LSTM layers in a DDPG network.
Could someone tell me what is going on?
Thanks :)
Code:
%% H2DF DDPG Trainer
%
%
% clc
% clear all
% close all
ObsInfo.Name = "Engine Outputs";
ObsInfo.Description = ' IMEP, NOX, SOOT, MPRR ';
%% Create environment
obsInfo = rlNumericSpec([8 1],...
'LowerLimit',[-inf -inf -inf -inf -inf -inf -inf -inf ]',...
'UpperLimit',[inf inf inf inf inf inf inf inf]');
obsInfo.Name = "Engine Outputs";
obsInfo.Description = ' IMEP, NOX, SOOT, MPRR ';
numObservations = obsInfo.Dimension(1);
version = '1123_002_GRU';
Data = struct2table(load(['VSR', version, '_post.mat']));
Data.label = string(Data.label);
ind = boolean(sum(Data.label == C2C_NMPC.Labels.outputs.', 2));
outputs_mean = [Data.mean{boolean(ind)}].';
outputs_std = [Data.std{ind}].';
ind = boolean(sum(Data.label == C2C_NMPC.Labels.controls.', 2));
controls_mean = [Data.mean{ind}].';
controls_std = [Data.std{ind}].';
lower_limit_controls = [0.17e-3;350;-2;1e-3];
upper_limit_controls = [0.5e-3;900;3;5.5e-3];
lower_limit_controls_norm = (lower_limit_controls - controls_mean)./controls_std;
upper_limit_controls_norm = (upper_limit_controls - controls_mean)./controls_std;
Ts = 0.01;
Tf = 10;
variance_normalised = ([1e-5*0.0025*(80/sqrt(Ts));1e2*0.003*(12/sqrt(Ts));30*0.003*(0.4/sqrt(Ts));1e-1*0.00025*(0.4/sqrt(Ts))] - controls_mean)./controls_std;
actInfo = rlNumericSpec([4 1],'LowerLimit',lower_limit_controls_norm,'UpperLimit',upper_limit_controls_norm);
actInfo.Name = "Engine Inputs";
actInfo.Description = 'DOI, P2M, SOI, DOI_H2';
numActions = actInfo.Dimension(1);
env = rlSimulinkEnv('MPC_RL_H2DF','MPC_RL_H2DF/RL Agent',...
obsInfo,actInfo);
env.ResetFcn = @(in)localResetFcn(in);
% 375 engine cycle results
rng(0)
% 1200 - 0.1| 1900: 0.06
%% Create agent
L = 60; % number of neurons
statePath = [
featureInputLayer(numObservations, 'Normalization', 'none', 'Name', 'observation')
fullyConnectedLayer(L, 'Name', 'fc1')
reluLayer('Name', 'relu1')
fullyConnectedLayer(L, 'Name', 'fc11')
reluLayer('Name', 'relu11')
% fullyConnectedLayer(L, 'Name', 'fc12')
% reluLayer('Name', 'relu12')
lstmLayer(2,"OutputMode","sequence")
fullyConnectedLayer(L, 'Name', 'fc15')
reluLayer('Name', 'relu15')
fullyConnectedLayer(L, 'Name', 'fc2')
additionLayer(2,'Name','add')
reluLayer('Name','relu2')
% fullyConnectedLayer(L, 'Name', 'fc3')
% reluLayer('Name','relu3')
% fullyConnectedLayer(L, 'Name', 'fc7')
% reluLayer('Name','relu7')
fullyConnectedLayer(1, 'Name', 'fc4','BiasInitializer','ones','WeightsInitializer','he')];
actionPath = [
featureInputLayer(numActions, 'Normalization', 'none', 'Name', 'action')
fullyConnectedLayer(L, 'Name', 'fc6')
reluLayer('Name','relu6')
fullyConnectedLayer(L, 'Name', 'fc13')
reluLayer('Name','relu13')
fullyConnectedLayer(L, 'Name', 'fc14')
reluLayer('Name','relu14')
fullyConnectedLayer(L, 'Name', 'fc5','BiasInitializer','ones','WeightsInitializer','he')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = connectLayers(criticNetwork,'fc5','add/in2');
figure
plot(criticNetwork)
criticOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
'Observation',{'observation'},'Action',{'action'},criticOptions);
%%
actorNetwork = [
featureInputLayer(numObservations, 'Normalization', 'none', 'Name', 'observation')
fullyConnectedLayer(L, 'Name', 'fc1')
reluLayer('Name', 'relu1')
fullyConnectedLayer(L, 'Name', 'fc2')
reluLayer('Name', 'relu2')
fullyConnectedLayer(L, 'Name', 'fc3')
reluLayer('Name', 'relu3')
fullyConnectedLayer(L, 'Name', 'fc8')
reluLayer('Name', 'relu8')
fullyConnectedLayer(L, 'Name', 'fc9')
reluLayer('Name', 'relu9')
fullyConnectedLayer(L, 'Name', 'fc10')
reluLayer('Name', 'relu10')
fullyConnectedLayer(numActions, 'Name', 'fc4')
tanhLayer('Name','tanh1')
scalingLayer('Name','ActorScaling1','Scale',-(actInfo.UpperLimit-actInfo.LowerLimit)/2,'Bias',(actInfo.UpperLimit+actInfo.LowerLimit)/2)];
actorOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
'Observation',{'observation'},'Action',{'ActorScaling1'},actorOptions);
%% Deep Deterministic Policy Gradient (DDPG) agent
agentOpts = rlDDPGAgentOptions(...
'SampleTime',Ts,...
'TargetSmoothFactor',1e-3,...
'DiscountFactor',0.99, ...
'MiniBatchSize',1024, ...
'ExperienceBufferLength',1e7);
% agentOpts.NoiseOptions.Variance =
% [0.005*(70/sqrt(Ts));0.005*(12/sqrt(Ts));0.005*(0.4/sqrt(Ts))] v01
agentOpts.NoiseOptions.Variance = [1e-5*0.0025*(80/sqrt(Ts));1e2*0.003*(12/sqrt(Ts));30*0.003*(0.4/sqrt(Ts));1e-1*0.00025*(0.4/sqrt(Ts))];
agentOpts.NoiseOptions.Variance = variance_normalised;
agentOpts.NoiseOptions.VarianceDecayRate = [1e-6;1e-6;1e-6;1e-6];
% agent = rlDDPGAgent(actor,critic,agentOpts);
% variance*ts^2 = (0.01 - 0.1)*(action range)
% At each sample time step, the noise model is updated using the following formula, where Ts is the agent sample time.
%
% x(k) = x(k-1) + MeanAttractionConstant.*(Mean - x(k-1)).*Ts
% + Variance.*randn(size(Mean)).*sqrt(Ts)
% At each sample time step, the variance decays as shown in the following code.
%
% decayedVariance = Variance.*(1 - VarianceDecayRate);
% Variance = max(decayedVariance,VarianceMin);
% For continuous action signals, it is important to set the noise variance appropriately to encourage exploration. It is common to have Variance*sqrt(Ts) be between 1% and 10% of your action range.
%
% If your agent converges on local optima too quickly, promote agent exploration by increasing the amount of noise; that is, by increasing the variance. Also, to increase exploration, you can reduce the VarianceDecayRate.
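% Illustrative check (not part of the original script): the note above says
% Variance*sqrt(Ts) should be roughly 1% to 10% of the action range. Since both
% the noise variance and the action limits are normalised here, the ratio can be
% inspected directly using the normalised limits computed earlier.
actionRangeNorm = upper_limit_controls_norm - lower_limit_controls_norm;
noiseFraction = abs(variance_normalised)*sqrt(Ts)./actionRangeNorm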
%% Training agent
maxepisodes = 500;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'ScoreAveragingWindowLength',100, ...
'Verbose',false, ...
'UseParallel',false,...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',0,...
'SaveAgentCriteria','EpisodeReward','SaveAgentValue',-50);
%%
% % Set to true, to resume training from a saved agent
resumeTraining = false;
% % Set ResetExperienceBufferBeforeTraining to false to keep experience from the previous session
agentOpts.ResetExperienceBufferBeforeTraining = ~(resumeTraining);
if resumeTraining
% Load the agent from the previous session
fprintf('- Resume training of: %s\n', 'agentV04.mat');
trainedfile = load('D:\Masters\HiWi\h2dfannbasedmpc\acados_implementation\rl\savedAgents\Agent1620.mat','saved_agent');
agent =trainedfile.saved_agent;
else
% Create a fresh new agent
agent = rlDDPGAgent(actor, critic, agentOpts);
end
%% Train the agent
trainingStats = train(agent, env, trainOpts);
% get the agent's actor, which predicts next action given the current observation
actor = getActor(agent);
% get the actor's parameters (neural network weights)
%actorParams = getLearnableParameterValues(actor);
And the error:
Error using rl.train.SeriesTrainer/run
There was an error executing the ProcessExperienceFcn for block "MPC_RL_H2DF/RL Agent".
Caused by:
Error using rl.function.AbstractFunction/evaluate
Unable to evaluate function model.
Error in rl.function.rlQValueFunction/getValue (line 74)
[qValue, state, batchSize, sequenceLength] = evaluate(this, [observation; action]);
Error in rl.agent.rlDDPGAgent/criticLearn_ (line 359)
targetQ = getValue(this.TargetCritic_,miniBatch.NextObservation,nextActions);
Error in rl.agent.rlDDPGAgent/learnFromBatchData_ (line 325)
[criticGradient, criticLoss] = criticLearn_(this, minibatch, maskIdx,sampleIdx,weights);
Error in rl.agent.AbstractOffPolicyAgent/learnFromBatchData (line 76)
[this, learnData] = learnFromBatchData_(this,batchData,maskIdx, Idx, Weights);
Error in rl.agent.rlDDPGAgent/learnFromExperiencesInMemory_ (line 307)
[~, learnData] = learnFromBatchData(this,minibatch,maskIdx, sampleIdx, weights);
Error in rl.agent.mixin.InternalMemoryTrainable/learnFromExperiencesInMemory (line 32)
learnFromExperiencesInMemory_(this);
Error in rl.agent.AbstractOffPolicyAgent/learn_ (line 104)
learnFromExperiencesInMemory(this);
Error in rl.agent.AbstractAgent/learn (line 29)
this = learn_(this,experience);
Error in rl.util.agentProcessStepExperience (line 6)
learn(Agent,Exp);
Error in rl.env.internal.FunctionHandlePolicyExperienceProcessor/processExperience_ (line 31)
[this.Policy_,this.Data_] = feval(this.Fcn_,...
Error in rl.env.internal.ExperienceProcessorInterface/processExperienceInternal_ (line 139)
processExperience_(this,experience,infoData);
Error in rl.env.internal.ExperienceProcessorInterface/processExperience (line 78)
stopsim = processExperienceInternal_(this,experience,simTime);
Error in rl.simulink.blocks.PolicyProcessExperience/stepImpl (line 45)
stopsim = processExperience(this.ExperienceProcessor_,experience,simTime);
Error in Simulink.Simulation.internal.DesktopSimHelper
Error in Simulink.Simulation.internal.DesktopSimHelper.sim
Error in Simulink.SimulationInput/sim
Error in rl.env.internal.SimulinkSimulator>localSim (line 259)
simout = sim(in);
Error in rl.env.internal.SimulinkSimulator>@(in)localSim(in,simPkg) (line 171)
simfcn = @(in) localSim(in,simPkg);
Error in MultiSim.internal.runSingleSim
Error in MultiSim.internal.SimulationRunnerSerial/executeImplSingle
Error in MultiSim.internal.SimulationRunnerSerial/executeImpl
Error in Simulink.SimulationManager/executeSims
Error in Simulink.SimulationManagerEngine/executeSims
Error in rl.env.internal.SimulinkSimulator/simInternal_ (line 172)
simInfo = executeSims(engine,simfcn,getSimulationInput(this));
Error in rl.env.internal.SimulinkSimulator/sim_ (line 78)
out = simInternal_(this,simPkg);
Error in rl.env.internal.AbstractSimulator/sim (line 30)
out = sim_(this,simData,policy,processExpFcn,processExpData);
Error in rl.env.AbstractEnv/runEpisode (line 144)
out = sim(simulator,simData,policy,processExpFcn,processExpData);
Error in rl.train.SeriesTrainer/run (line 59)
out = runEpisode(...
Error in rl.train.TrainingManager/train (line 479)
run(trainer);
Error in rl.train.TrainingManager/run (line 233)
train(this);
Error in rl.agent.AbstractAgent/train (line 136)
trainingResult = run(trainMgr,checkpoint);
Error in MPC_RL_lstm_run_pureR_H2DF (line 195)
trainingStats = train(agent, env, trainOpts);
Caused by:
Not enough input arguments.
Error in rl.train.TrainingManager/train (line 479)
run(trainer);
Error in rl.train.TrainingManager/run (line 233)
train(this);
Error in rl.agent.AbstractAgent/train (line 136)
trainingResult = run(trainMgr,checkpoint);
Error in MPC_RL_lstm_run_pureR_H2DF (line 195)
trainingStats = train(agent, env, trainOpts);

Accepted Answer

Emmanouil Tzorakoleftherakis
Hello,
I see a couple of things wrong with the current architecture (could be more):
1) When you use an LSTM layer, the input layer should be a sequence input layer, not a feature input layer (see the sketch after this list).
2) An LSTM layer should be used in both the actor and the critic.
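By way of illustration, a minimal recurrent skeleton along those lines could look as follows. This is only a sketch, not the poster's exact network: layer names and sizes are placeholders, and it keeps the same input-layer names ('observation', 'action') so the representation calls from the question still apply.
% Critic: observation path (with LSTM) plus action path, joined by an addition layer
obsPath = [
sequenceInputLayer(numObservations,'Normalization','none','Name','observation')
fullyConnectedLayer(64,'Name','fc_obs')
reluLayer('Name','relu_obs')
lstmLayer(32,'OutputMode','sequence','Name','lstm_critic')
fullyConnectedLayer(64,'Name','fc_obs2')
additionLayer(2,'Name','add')
reluLayer('Name','relu_add')
fullyConnectedLayer(1,'Name','qValue')];
actPath = [
sequenceInputLayer(numActions,'Normalization','none','Name','action')
fullyConnectedLayer(64,'Name','fc_act')];
criticNetwork = layerGraph(obsPath);
criticNetwork = addLayers(criticNetwork,actPath);
criticNetwork = connectLayers(criticNetwork,'fc_act','add/in2');
% Actor: also starts with a sequence input layer and contains an LSTM
actorNetwork = [
sequenceInputLayer(numObservations,'Normalization','none','Name','observation')
fullyConnectedLayer(64,'Name','fc1')
reluLayer('Name','relu1')
lstmLayer(32,'OutputMode','sequence','Name','lstm_actor')
fullyConnectedLayer(numActions,'Name','fc_out')
tanhLayer('Name','tanh1')];
The rest of the setup from the question (scaling layer, representation options, agent options) can stay as it is.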
I think the easiest way for you to figure out a correct architecture is to use the default agent feature initially. You can then take the generated architecture and fine-tune it for your specific application. See for example here. Make sure to specify that you want an RNN network in the agent initialization options.
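As a rough sketch of that default-agent route (option and function names as documented for recent releases; worth double-checking against your own release):
initOpts = rlAgentInitializationOptions('NumHiddenUnit',64,'UseRNN',true);
defaultAgent = rlDDPGAgent(obsInfo,actInfo,initOpts);   % generates a recurrent actor and critic
criticNet = getModel(getCritic(defaultAgent));          % inspect the generated critic network
actorNet = getModel(getActor(defaultAgent));            % inspect the generated actor network
analyzeNetwork(actorNet)                                % requires Deep Learning Toolbox
You can then edit the generated networks, or copy their structure, and rebuild the agent with your own options; for a recurrent DDPG agent the SequenceLength agent option should typically be greater than 1 before training.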
Hope that helps
  5 comments
Emmanouil Tzorakoleftherakis
Can you please post this as a separate question? Nested questions are not easy to discover.
Thanks


More Answers (1)

Gagan Agarwal 2023-12-21
Hi Vasu,
I understand that you are encountering this error while running the provided code in MATLAB. LSTM layers are supported in DDPG agents in MATLAB.
To address the error, consider the following suggestions:
  1. Confirm that the Reinforcement Learning Toolbox is installed in your MATLAB environment (a quick check is sketched after this list).
  2. Review the training options and agent options to ensure they are configured correctly.
  3. Verify that the observation and action specifications align with the input requirements of your network architectures.
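For example, two purely illustrative checks for points 1 and 3, using the variable names from the question's script:
products = ver;                                                    % list installed products
disp(any(strcmp({products.Name},'Reinforcement Learning Toolbox')))
disp(obsInfo.Dimension)   % should match the observation input layer size ([8 1] here)
disp(actInfo.Dimension)   % should match the action input layer and actor output size ([4 1] here)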
I hope it helps!
