Problems using LSTM with PPO Agent - Error: Invalid input argument type or size such as observation, reward, isdone or loggedSignals.

Hi,
I implemented some RL agents (DQN, AC, PPO, ...) successfully with my custom environment function, using a feedforward network as shown in the documentation here. Everything worked properly, but the model did not converge. So I tried an LSTM network to see if it would do better in this case, and made some adjustments to my code following this part of the documentation. The functions run without any problems and the Episode Manager also starts properly. If I call the reset and step functions manually, everything looks like it should. But when I run the script, after a short moment I get the error message
>> RL_PPO_LSTM
Error using rl.agent.AbstractPolicy/step (line 116)
Invalid input argument type or size such as observation, reward, isdone or
loggedSignals.
Error in rl.env.MATLABEnvironment/simLoop (line 241)
action = step(policy,observation,reward,isdone);
Error in rl.env.MATLABEnvironment/simWithPolicyImpl (line 106)
[expcell{simCount},epinfo,siminfos{simCount}] =
simLoop(env,policy,opts,simCount,usePCT);
Error in rl.env.AbstractEnv/simWithPolicy (line 70)
[experiences,varargout{1:(nargout-1)}] =
simWithPolicyImpl(this,policy,opts,varargin{:});
Error in rl.task.SeriesTrainTask/runImpl (line 33)
[varargout{1},varargout{2}] =
simWithPolicy(this.Env,this.Agent,simOpts);
Error in rl.task.Task/run (line 21)
[varargout{1:nargout}] = runImpl(this);
Error in rl.task.TaskSpec/internal_run (line 159)
[varargout{1:nargout}] = run(task);
Error in rl.task.TaskSpec/runDirect (line 163)
[this.Outputs{1:getNumOutputs(this)}] = internal_run(this);
Error in rl.task.TaskSpec/runScalarTask (line 187)
runDirect(this);
Error in rl.task.TaskSpec/run (line 69)
runScalarTask(task);
Error in rl.train.SeriesTrainer/run (line 24)
run(seriestaskspec);
Error in rl.train.TrainingManager/train (line 291)
run(trainer);
Error in rl.train.TrainingManager/run (line 160)
train(this);
Error in rl.agent.AbstractAgent/train (line 54)
TrainingStatistics = run(trainMgr);
Error in RL_PPO_LSTM (line 83)
trainingStats = train(agent,env,trainOpts);
Caused by:
Expected one output from a curly brace or dot indexing expression, but
there were 2 results.
I saw a similar question here on Answers:
and I changed my functions to output row vectors as loggedSignals, but that did not change anything. I tried to debug this by setting "Pause on Errors", but I'm really lost here.
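For reference, this is the general shape of my step function after that change, using the step-function signature from the custom function environment documentation. The state update, reward, and termination condition here are placeholders rather than my actual dynamics:
% Minimal sketch of the step function (placeholder dynamics)
function [NextObs,Reward,IsDone,LoggedSignals] = myStepFunction(Action,LoggedSignals)
    LoggedSignals.State = LoggedSignals.State + Action;  % placeholder state update
    NextObs = reshape(LoggedSignals.State,1,[]);         % observation as a row vector
    Reward  = -sum(abs(NextObs));                        % placeholder reward
    IsDone  = any(abs(NextObs) > 10);                    % placeholder termination
end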
Thanks for your help!
Stephan

Accepted Answer

Stephan on 25 Jul 2020
I finally managed to solve the issue. The problem was that there were two LSTM layers in the network, which led to the error:
Caused by:
Expected one output from a curly brace or dot indexing expression, but
there were 2 results.
Removing the second LSTM layer solved the problem.
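For anyone running into the same thing, this is roughly how the critic looks with a single lstmLayer. The layer sizes and names below are placeholders rather than my exact network; obsInfo is the environment's observation specification:
% Critic network with exactly one LSTM layer (sizes/names are placeholders)
numObs = obsInfo.Dimension(1);
criticNetwork = [
    sequenceInputLayer(numObs,'Normalization','none','Name','state')
    fullyConnectedLayer(64,'Name','fc1')
    reluLayer('Name','relu1')
    lstmLayer(32,'OutputMode','sequence','Name','lstm')  % only one lstmLayer
    fullyConnectedLayer(1,'Name','value')];
criticOptions = rlRepresentationOptions('LearnRate',1e-3);
critic = rlValueRepresentation(criticNetwork,obsInfo, ...
    'Observation',{'state'},criticOptions);
The actor network follows the same pattern: a single lstmLayer between the fully connected layers.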

More Answers (0)

Release

R2020a
