DDPG with LSTM layer fails?

18 views (last 30 days)
Hi, I am trying to train a DDPG agent. My environment is based in Simulink and works fine when I have only feedforward layers in my network. But as soon as I add an LSTM layer, I get an error about not enough input arguments. I am using MATLAB R2023a and assumed that it supports LSTM layers in a DDPG network.
Could someone tell me what is going on?
Thanks :)
Code:
%% H2DF DDPG Trainer
%
%
% clc
% clear all
% close all
ObsInfo.Name = "Engine Outputs";
ObsInfo.Description = ' IMEP, NOX, SOOT, MPRR ';
%% Create environment
obsInfo = rlNumericSpec([8 1],...
'LowerLimit',[-inf -inf -inf -inf -inf -inf -inf -inf ]',...
'UpperLimit',[inf inf inf inf inf inf inf inf]');
obsInfo.Name = "Engine Outputs";
obsInfo.Description = ' IMEP, NOX, SOOT, MPRR ';
numObservations = obsInfo.Dimension(1);
version = '1123_002_GRU';
Data = struct2table(load(['VSR', version, '_post.mat']));
Data.label = string(Data.label);
ind = boolean(sum(Data.label == C2C_NMPC.Labels.outputs.', 2));
outputs_mean = [Data.mean{boolean(ind)}].';
outputs_std = [Data.std{ind}].';
ind = boolean(sum(Data.label == C2C_NMPC.Labels.controls.', 2));
controls_mean = [Data.mean{ind}].';
controls_std = [Data.std{ind}].';
lower_limit_controls = [0.17e-3;350;-2;1e-3];
upper_limit_controls = [0.5e-3;900;3;5.5e-3];
lower_limit_controls_norm = (lower_limit_controls - controls_mean)./controls_std;
upper_limit_controls_norm = (upper_limit_controls - controls_mean)./controls_std;
Ts = 0.01;
Tf = 10;
variance_normalised = ([1e-5*0.0025*(80/sqrt(Ts));1e2*0.003*(12/sqrt(Ts));30*0.003*(0.4/sqrt(Ts));1e-1*0.00025*(0.4/sqrt(Ts))] - controls_mean)./controls_std;
actInfo = rlNumericSpec([4 1],'LowerLimit',lower_limit_controls_norm,'UpperLimit',upper_limit_controls_norm);
actInfo.Name = "Engine Inputs";
actInfo.Description = 'DOI, P2M, SOI, DOI_H2';
numActions = actInfo.Dimension(1);
env = rlSimulinkEnv('MPC_RL_H2DF','MPC_RL_H2DF/RL Agent',...
obsInfo,actInfo);
env.ResetFcn = @(in)localResetFcn(in);
% 375 engine cycle results
rng(0)
% 1200 - 0.1| 1900: 0.06
%% Create agent
L = 60; % number of neurons
statePath = [
featureInputLayer(numObservations, 'Normalization', 'none', 'Name', 'observation')
fullyConnectedLayer(L, 'Name', 'fc1')
reluLayer('Name', 'relu1')
fullyConnectedLayer(L, 'Name', 'fc11')
reluLayer('Name', 'relu11')
% fullyConnectedLayer(L, 'Name', 'fc12')
% reluLayer('Name', 'relu12')
lstmLayer(2,"OutputMode","sequence")
fullyConnectedLayer(L, 'Name', 'fc15')
reluLayer('Name', 'relu15')
fullyConnectedLayer(L, 'Name', 'fc2')
additionLayer(2,'Name','add')
reluLayer('Name','relu2')
% fullyConnectedLayer(L, 'Name', 'fc3')
% reluLayer('Name','relu3')
% fullyConnectedLayer(L, 'Name', 'fc7')
% reluLayer('Name','relu7')
fullyConnectedLayer(1, 'Name', 'fc4','BiasInitializer','ones','WeightsInitializer','he')];
actionPath = [
featureInputLayer(numActions, 'Normalization', 'none', 'Name', 'action')
fullyConnectedLayer(L, 'Name', 'fc6')
reluLayer('Name','relu6')
fullyConnectedLayer(L, 'Name', 'fc13')
reluLayer('Name','relu13')
fullyConnectedLayer(L, 'Name', 'fc14')
reluLayer('Name','relu14')
fullyConnectedLayer(L, 'Name', 'fc5','BiasInitializer','ones','WeightsInitializer','he')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = connectLayers(criticNetwork,'fc5','add/in2');
figure
plot(criticNetwork)
criticOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
'Observation',{'observation'},'Action',{'action'},criticOptions);
%%
actorNetwork = [
featureInputLayer(numObservations, 'Normalization', 'none', 'Name', 'observation')
fullyConnectedLayer(L, 'Name', 'fc1')
reluLayer('Name', 'relu1')
fullyConnectedLayer(L, 'Name', 'fc2')
reluLayer('Name', 'relu2')
fullyConnectedLayer(L, 'Name', 'fc3')
reluLayer('Name', 'relu3')
fullyConnectedLayer(L, 'Name', 'fc8')
reluLayer('Name', 'relu8')
fullyConnectedLayer(L, 'Name', 'fc9')
reluLayer('Name', 'relu9')
fullyConnectedLayer(L, 'Name', 'fc10')
reluLayer('Name', 'relu10')
fullyConnectedLayer(numActions, 'Name', 'fc4')
tanhLayer('Name','tanh1')
scalingLayer('Name','ActorScaling1','Scale',-(actInfo.UpperLimit-actInfo.LowerLimit)/2,'Bias',(actInfo.UpperLimit+actInfo.LowerLimit)/2)];
actorOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
'Observation',{'observation'},'Action',{'ActorScaling1'},actorOptions);
%% Deep Deterministic Policy Gradient (DDPG) agent
agentOpts = rlDDPGAgentOptions(...
'SampleTime',Ts,...
'TargetSmoothFactor',1e-3,...
'DiscountFactor',0.99, ...
'MiniBatchSize',1024, ...
'ExperienceBufferLength',1e7);
% agentOpts.NoiseOptions.Variance =
% [0.005*(70/sqrt(Ts));0.005*(12/sqrt(Ts));0.005*(0.4/sqrt(Ts))] v01
agentOpts.NoiseOptions.Variance = [1e-5*0.0025*(80/sqrt(Ts));1e2*0.003*(12/sqrt(Ts));30*0.003*(0.4/sqrt(Ts));1e-1*0.00025*(0.4/sqrt(Ts))];
agentOpts.NoiseOptions.Variance = variance_normalised;
agentOpts.NoiseOptions.VarianceDecayRate = [1e-6;1e-6;1e-6;1e-6];
% agent = rlDDPGAgent(actor,critic,agentOpts);
% variance*ts^2 = (0.01 - 0.1)*(action range)
% At each sample time step, the noise model is updated using the following formula, where Ts is the agent sample time.
%
% x(k) = x(k-1) + MeanAttractionConstant.*(Mean - x(k-1)).*Ts
% + Variance.*randn(size(Mean)).*sqrt(Ts)
% At each sample time step, the variance decays as shown in the following code.
%
% decayedVariance = Variance.*(1 - VarianceDecayRate);
% Variance = max(decayedVariance,VarianceMin);
% For continuous action signals, it is important to set the noise variance appropriately to encourage exploration. It is common to have Variance*sqrt(Ts) be between 1% and 10% of your action range.
%
% If your agent converges on local optima too quickly, promote agent exploration by increasing the amount of noise; that is, by increasing the variance. Also, to increase exploration, you can reduce the VarianceDecayRate.
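% Illustrative check (not part of the original script): the note above says
% Variance*sqrt(Ts) should be roughly 1% to 10% of the action range. Since both
% the noise variance and the action limits are normalised here, the ratio can be
% inspected directly using the normalised limits computed earlier.
actionRangeNorm = upper_limit_controls_norm - lower_limit_controls_norm;
noiseFraction = abs(variance_normalised)*sqrt(Ts)./actionRangeNorm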
%% Training agent
maxepisodes = 500;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'ScoreAveragingWindowLength',100, ...
'Verbose',false, ...
'UseParallel',false,...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',0,...
'SaveAgentCriteria','EpisodeReward','SaveAgentValue',-50);
%%
% % Set to true, to resume training from a saved agent
resumeTraining = false;
% % Set ResetExperienceBufferBeforeTraining to false to keep experience from the previous session
agentOpts.ResetExperienceBufferBeforeTraining = ~(resumeTraining);
if resumeTraining
% Load the agent from the previous session
fprintf('- Resume training of: %s\n', 'agentV04.mat');
trainedfile = load('D:\Masters\HiWi\h2dfannbasedmpc\acados_implementation\rl\savedAgents\Agent1620.mat','saved_agent');
agent =trainedfile.saved_agent;
else
% Create a fresh new agent
agent = rlDDPGAgent(actor, critic, agentOpts);
end
%% Train the agent
trainingStats = train(agent, env, trainOpts);
% get the agent's actor, which predicts next action given the current observation
actor = getActor(agent);
% get the actor's parameters (neural network weights)
%actorParams = getLearnableParameterValues(actor);
And the error:
Error using rl.train.SeriesTrainer/run
There was an error executing the ProcessExperienceFcn for block "MPC_RL_H2DF/RL Agent".
Caused by:
Error using rl.function.AbstractFunction/evaluate
Unable to evaluate function model.
Error in rl.function.rlQValueFunction/getValue (line 74)
[qValue, state, batchSize, sequenceLength] = evaluate(this, [observation; action]);
Error in rl.agent.rlDDPGAgent/criticLearn_ (line 359)
targetQ = getValue(this.TargetCritic_,miniBatch.NextObservation,nextActions);
Error in rl.agent.rlDDPGAgent/learnFromBatchData_ (line 325)
[criticGradient, criticLoss] = criticLearn_(this, minibatch, maskIdx,sampleIdx,weights);
Error in rl.agent.AbstractOffPolicyAgent/learnFromBatchData (line 76)
[this, learnData] = learnFromBatchData_(this,batchData,maskIdx, Idx, Weights);
Error in rl.agent.rlDDPGAgent/learnFromExperiencesInMemory_ (line 307)
[~, learnData] = learnFromBatchData(this,minibatch,maskIdx, sampleIdx, weights);
Error in rl.agent.mixin.InternalMemoryTrainable/learnFromExperiencesInMemory (line 32)
learnFromExperiencesInMemory_(this);
Error in rl.agent.AbstractOffPolicyAgent/learn_ (line 104)
learnFromExperiencesInMemory(this);
Error in rl.agent.AbstractAgent/learn (line 29)
this = learn_(this,experience);
Error in rl.util.agentProcessStepExperience (line 6)
learn(Agent,Exp);
Error in rl.env.internal.FunctionHandlePolicyExperienceProcessor/processExperience_ (line 31)
[this.Policy_,this.Data_] = feval(this.Fcn_,...
Error in rl.env.internal.ExperienceProcessorInterface/processExperienceInternal_ (line 139)
processExperience_(this,experience,infoData);
Error in rl.env.internal.ExperienceProcessorInterface/processExperience (line 78)
stopsim = processExperienceInternal_(this,experience,simTime);
Error in rl.simulink.blocks.PolicyProcessExperience/stepImpl (line 45)
stopsim = processExperience(this.ExperienceProcessor_,experience,simTime);
Error in Simulink.Simulation.internal.DesktopSimHelper
Error in Simulink.Simulation.internal.DesktopSimHelper.sim
Error in Simulink.SimulationInput/sim
Error in rl.env.internal.SimulinkSimulator>localSim (line 259)
simout = sim(in);
Error in rl.env.internal.SimulinkSimulator>@(in)localSim(in,simPkg) (line 171)
simfcn = @(in) localSim(in,simPkg);
Error in MultiSim.internal.runSingleSim
Error in MultiSim.internal.SimulationRunnerSerial/executeImplSingle
Error in MultiSim.internal.SimulationRunnerSerial/executeImpl
Error in Simulink.SimulationManager/executeSims
Error in Simulink.SimulationManagerEngine/executeSims
Error in rl.env.internal.SimulinkSimulator/simInternal_ (line 172)
simInfo = executeSims(engine,simfcn,getSimulationInput(this));
Error in rl.env.internal.SimulinkSimulator/sim_ (line 78)
out = simInternal_(this,simPkg);
Error in rl.env.internal.AbstractSimulator/sim (line 30)
out = sim_(this,simData,policy,processExpFcn,processExpData);
Error in rl.env.AbstractEnv/runEpisode (line 144)
out = sim(simulator,simData,policy,processExpFcn,processExpData);
Error in rl.train.SeriesTrainer/run (line 59)
out = runEpisode(...
Error in rl.train.TrainingManager/train (line 479)
run(trainer);
Error in rl.train.TrainingManager/run (line 233)
train(this);
Error in rl.agent.AbstractAgent/train (line 136)
trainingResult = run(trainMgr,checkpoint);
Error in MPC_RL_lstm_run_pureR_H2DF (line 195)
trainingStats = train(agent, env, trainOpts);
Caused by:
Not enough input arguments.
Error in rl.train.TrainingManager/train (line 479)
run(trainer);
Error in rl.train.TrainingManager/run (line 233)
train(this);
Error in rl.agent.AbstractAgent/train (line 136)
trainingResult = run(trainMgr,checkpoint);
Error in MPC_RL_lstm_run_pureR_H2DF (line 195)
trainingStats = train(agent, env, trainOpts);

Accepted Answer

Emmanouil Tzorakoleftherakis
Hello,
I see a couple of things wrong with the current architecture (could be more):
1) When you use an LSTM layer, the input layer should be a sequence input layer, not a feature input layer (see the sketch after this list).
2) An LSTM layer should be used in both the actor and the critic.
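By way of illustration, a minimal recurrent skeleton along those lines could look as follows. This is only a sketch, not the poster's exact network: layer names and sizes are placeholders, and it keeps the same input-layer names ('observation', 'action') so the representation calls from the question still apply.
% Critic: observation path (with LSTM) plus action path, joined by an addition layer
obsPath = [
sequenceInputLayer(numObservations,'Normalization','none','Name','observation')
fullyConnectedLayer(64,'Name','fc_obs')
reluLayer('Name','relu_obs')
lstmLayer(32,'OutputMode','sequence','Name','lstm_critic')
fullyConnectedLayer(64,'Name','fc_obs2')
additionLayer(2,'Name','add')
reluLayer('Name','relu_add')
fullyConnectedLayer(1,'Name','qValue')];
actPath = [
sequenceInputLayer(numActions,'Normalization','none','Name','action')
fullyConnectedLayer(64,'Name','fc_act')];
criticNetwork = layerGraph(obsPath);
criticNetwork = addLayers(criticNetwork,actPath);
criticNetwork = connectLayers(criticNetwork,'fc_act','add/in2');
% Actor: also starts with a sequence input layer and contains an LSTM
actorNetwork = [
sequenceInputLayer(numObservations,'Normalization','none','Name','observation')
fullyConnectedLayer(64,'Name','fc1')
reluLayer('Name','relu1')
lstmLayer(32,'OutputMode','sequence','Name','lstm_actor')
fullyConnectedLayer(numActions,'Name','fc_out')
tanhLayer('Name','tanh1')];
The rest of the setup from the question (scaling layer, representation options, agent options) can stay as it is.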
I think the easiest way for you to figure out a correct architecture is to use the default agent feature initially. You can then take the generated architecture and fine-tune it for your specific application. See for example here. Make sure to specify that you want an RNN network in the agent initialization options.
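As a rough sketch of that default-agent route (option and function names as documented for recent releases; worth double-checking against your own release):
initOpts = rlAgentInitializationOptions('NumHiddenUnit',64,'UseRNN',true);
defaultAgent = rlDDPGAgent(obsInfo,actInfo,initOpts);   % generates a recurrent actor and critic
criticNet = getModel(getCritic(defaultAgent));          % inspect the generated critic network
actorNet = getModel(getActor(defaultAgent));            % inspect the generated actor network
analyzeNetwork(actorNet)                                % requires Deep Learning Toolbox
You can then edit the generated networks, or copy their structure, and rebuild the agent with your own options; for a recurrent DDPG agent the SequenceLength agent option should typically be greater than 1 before training.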
Hope that helps
  5 comments
Emmanouil Tzorakoleftherakis
Can you please post this as a separate question? Nested questions are not easy to discover.
Thanks


More Answers (1)

Gagan Agarwal 2023-12-21
Hi Vasu,
I understand that you are encountering this error while running the provided code in MATLAB. LSTM layers are supported in DDPG agents in MATLAB.
To address the error, consider the following suggestions:
  1. Confirm that the Reinforcement Learning Toolbox is installed in your MATLAB environment (a quick check is sketched after this list).
  2. Review the training options and agent options to ensure they are configured correctly.
  3. Verify that the observation and action specifications align with the input requirements of your network architectures.
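For example, two purely illustrative checks for points 1 and 3, using the variable names from the question's script:
products = ver;                                                    % list installed products
disp(any(strcmp({products.Name},'Reinforcement Learning Toolbox')))
disp(obsInfo.Dimension)   % should match the observation input layer size ([8 1] here)
disp(actInfo.Dimension)   % should match the action input layer and actor output size ([4 1] here)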
I hope it helps!
