Problems using LSTM with PPO Agent - Error: Invalid input argument type or size such as observation, reward, isdone or loggedSignals.

Hi,
I implemented some RL agents (DQN, AC, PPO, ...) successfully with my custom environment function, using a feedforward network as shown in the documentation here. Everything worked properly, but the model did not converge. So I tried an LSTM network to see if it would do better in this case, and made some adjustments to my code following this part of the documentation. The functions run without any problems and the Episode Manager also starts properly. If I call the reset and step functions manually, everything looks like it should. But when I run the script, after a short moment I get the error message
>> RL_PPO_LSTM
Error using rl.agent.AbstractPolicy/step (line 116)
Invalid input argument type or size such as observation, reward, isdone or
loggedSignals.
Error in rl.env.MATLABEnvironment/simLoop (line 241)
action = step(policy,observation,reward,isdone);
Error in rl.env.MATLABEnvironment/simWithPolicyImpl (line 106)
[expcell{simCount},epinfo,siminfos{simCount}] =
simLoop(env,policy,opts,simCount,usePCT);
Error in rl.env.AbstractEnv/simWithPolicy (line 70)
[experiences,varargout{1:(nargout-1)}] =
simWithPolicyImpl(this,policy,opts,varargin{:});
Error in rl.task.SeriesTrainTask/runImpl (line 33)
[varargout{1},varargout{2}] =
simWithPolicy(this.Env,this.Agent,simOpts);
Error in rl.task.Task/run (line 21)
[varargout{1:nargout}] = runImpl(this);
Error in rl.task.TaskSpec/internal_run (line 159)
[varargout{1:nargout}] = run(task);
Error in rl.task.TaskSpec/runDirect (line 163)
[this.Outputs{1:getNumOutputs(this)}] = internal_run(this);
Error in rl.task.TaskSpec/runScalarTask (line 187)
runDirect(this);
Error in rl.task.TaskSpec/run (line 69)
runScalarTask(task);
Error in rl.train.SeriesTrainer/run (line 24)
run(seriestaskspec);
Error in rl.train.TrainingManager/train (line 291)
run(trainer);
Error in rl.train.TrainingManager/run (line 160)
train(this);
Error in rl.agent.AbstractAgent/train (line 54)
TrainingStatistics = run(trainMgr);
Error in RL_PPO_LSTM (line 83)
trainingStats = train(agent,env,trainOpts);
Caused by:
Expected one output from a curly brace or dot indexing expression, but
there were 2 results.
I saw a similar question here on Answers:
and I changed my functions to output row vectors as loggedSignals, but that did not change anything. I tried to debug this by setting "Pause on Errors", but I'm really lost here.
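For reference, this is the general shape of my step function after that change, using the step-function signature from the custom function environment documentation. The state update, reward, and termination condition here are placeholders rather than my actual dynamics:
% Minimal sketch of the step function (placeholder dynamics)
function [NextObs,Reward,IsDone,LoggedSignals] = myStepFunction(Action,LoggedSignals)
    LoggedSignals.State = LoggedSignals.State + Action;  % placeholder state update
    NextObs = reshape(LoggedSignals.State,1,[]);         % observation as a row vector
    Reward  = -sum(abs(NextObs));                        % placeholder reward
    IsDone  = any(abs(NextObs) > 10);                    % placeholder termination
end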
Thanks for your help!
Stephan

Accepted Answer

Stephan on 25 Jul 2020
I finally managed to solve the issue. The problem was that there were two LSTM layers in the network, which led to the error:
Caused by:
Expected one output from a curly brace or dot indexing expression, but
there were 2 results.
Removing the second LSTM layer solved the problem.
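For anyone running into the same thing, this is roughly how the critic looks with a single lstmLayer. The layer sizes and names below are placeholders rather than my exact network; obsInfo is the environment's observation specification:
% Critic network with exactly one LSTM layer (sizes/names are placeholders)
numObs = obsInfo.Dimension(1);
criticNetwork = [
    sequenceInputLayer(numObs,'Normalization','none','Name','state')
    fullyConnectedLayer(64,'Name','fc1')
    reluLayer('Name','relu1')
    lstmLayer(32,'OutputMode','sequence','Name','lstm')  % only one lstmLayer
    fullyConnectedLayer(1,'Name','value')];
criticOptions = rlRepresentationOptions('LearnRate',1e-3);
critic = rlValueRepresentation(criticNetwork,obsInfo, ...
    'Observation',{'state'},criticOptions);
The actor network follows the same pattern: a single lstmLayer between the fully connected layers.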

More Answers (0)

Release

R2020a
