MBPO silently converts actions from cell to double, then creates errors when actions aren't given as cell

Question

Alex B 2025-2-5

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/2173659-mbpo-silently-converts-actions-from-cell-to-double-then-creates-errors-when-actions-aren-t-given-as

I'm attempting to create an MBPO model to solve a problem (files attached, mbpo.m is the file to run) and I'm getting a strange error I don't know how to fix.

Running my code produces the error:

Error using cell
Size inputs must be integers.
 
Error using rl.internal.train.MBPOAgentSeriesTrainer/run_internal_/nestedRunEpisode (line 371)
There was an error executing the environment's step method.
Caused by:
	Error using rl.internal.function.ITransitionFunction/predict (line 19)
	Invalid argument at position 3. Value must be of type cell or be convertible to cell.
	Error in rl.env.rlNeuralNetworkEnvironment/step (line 65)
	                nextObservation = predict(this.TransitionFcn(this.TransitionModelNum),this.Observation, action);
	                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	Error in rl.env.MATLABEnvironment>@(a)step(env,a) (line 89)
	                stepfcn = @(a) step(env,a);
	                               ^^^^^^^^^^^
	Error in rl.env.internal.MATLABFunctionHandleSimulator/step_ (line 22)
	            [next_observation,reward,isdone] = feval(this.StepFcn_,action);
	                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
	Error in rl.env.internal.MATLABSimulator/step (line 15)
	            [next_observation,reward,isdone] = step_(this,action);
	                                               ^^^^^^^^^^^^^^^^^^
	Error in rl.env.internal.MATLABSimulator/simInternal_ (line 113)
	                        [nobs,rwd,isd] = step(this,act);
	                                         ^^^^^^^^^^^^^^
	Error in rl.env.internal.MATLABSimulator/sim_ (line 67)
	                out = simInternal_(this,simPkg);
	                      ^^^^^^^^^^^^^^^^^^^^^^^^^
	Error in rl.env.internal.AbstractSimulator/sim (line 30)
	            out = sim_(this,simData,policy,processExpFcn,processExpData);
	                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	Error in rl.env.AbstractEnv/runEpisode (line 144)
	    out = sim(simulator,simData,policy,processExpFcn,processExpData);
	          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	Error in rl.internal.train.MBPOAgentSeriesTrainer/run_internal_/nestedRunEpisode (line 371)
	                out_or_F = runEpisode(env,p,...
	                           ^^^^^^^^^^^^^^^^^^^^
	Error in rl.internal.train.MBPOAgentSeriesTrainer/run_internal_ (line 447)
	                    out = nestedRunEpisode(policy);
	                          ^^^^^^^^^^^^^^^^^^^^^^^^
	Error in rl.internal.train.MBPOAgentSeriesTrainer/run_ (line 39)
	            result = run_internal_(this);
	                     ^^^^^^^^^^^^^^^^^^^
	Error in rl.internal.train.Trainer/run (line 8)
	            result = run_(this);
	                     ^^^^^^^^^^
	Error in rl.internal.trainmgr.OnlineTrainingManager/run_ (line 123)
	            trainResult = run(trainer);
	                          ^^^^^^^^^^^^
	Error in rl.internal.trainmgr.TrainingManager/run (line 4)
	            result = run_(this);
	                     ^^^^^^^^^^
	Error in rl.agent.AbstractAgent/train (line 86)
	    trainingResult = run(tm);
	                     ^^^^^^^
	Error in mbpo (line 102)
	trainingStats = train(agent,generativeEnv);
	                ^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in rl.internal.train.MBPOAgentSeriesTrainer/run_internal_ (line 447)
                    out = nestedRunEpisode(policy);
                          ^^^^^^^^^^^^^^^^^^^^^^^^
Error in rl.internal.train.MBPOAgentSeriesTrainer/run_ (line 39)
            result = run_internal_(this);
                     ^^^^^^^^^^^^^^^^^^^
Error in rl.internal.train.Trainer/run (line 8)
            result = run_(this);
                     ^^^^^^^^^^
Error in rl.internal.trainmgr.OnlineTrainingManager/run_ (line 123)
            trainResult = run(trainer);
                          ^^^^^^^^^^^^
Error in rl.internal.trainmgr.TrainingManager/run (line 4)
            result = run_(this);
                     ^^^^^^^^^^
Error in rl.agent.AbstractAgent/train (line 86)
    trainingResult = run(tm);
                     ^^^^^^^
Error in mbpo (line 102)
trainingStats = train(agent,generativeEnv);
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
                

Digging through the stack trace, I'm fine until the MATLABSimulator.step call. The function calling this (MATLABSimulator.simInternal_) has the action as a cell array, but step runs:

if iscell(action) && isscalar(action)
    action = action{1};
end

which converts the action to an array of doubles. Nothing else operates on the action until ITransitionFunction.predict, which checks if the action is a cell (and will crash because it isn't).
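
A minimal sketch of the type change described above, runnable without the toolbox. The action value 0.37 and the variable names are made up for illustration. The step that calls cell(value) is an assumption about how the type check in predict attempts the conversion, inferred only from the "Error using cell" message at the top of the trace, not from the toolbox source.

```matlab
% Made-up continuous action value, purely for illustration.
actionIn = {0.37};                 % cell, as held by simInternal_

% Same unwrapping logic as quoted from MATLABSimulator.step.
unwrapped = actionIn;
if iscell(unwrapped) && isscalar(unwrapped)
    unwrapped = unwrapped{1};      % the cell wrapper is dropped here
end
fprintf('after unwrap: %s\n', class(unwrapped));

% Assumption: checking a double against type cell ends up calling
% cell(value), which treats the value as a size argument.
try
    converted = cell(unwrapped);
    fprintf('conversion succeeded, size %s\n', mat2str(size(converted)));
catch err
    fprintf('conversion failed: %s\n', err.message);
end

% Wrapping the value again restores the cell form.
rewrapped = {unwrapped};
fprintf('after rewrap: %s\n', class(rewrapped));
```

If that assumption holds, a non-integer action such as 0.37 would fail the conversion, while an integer-valued action such as 2 would convert without error into an empty 2-by-2 cell array, so the failure may only show up for some action values.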

My question is: did I do something wrong with my transition functions? I basically lifted them straight from the Cart-Pole MBPO example. My code is attached below. Apologies in advance for the lack of comments in the mbpo file itself; I only intended it as a proof of concept before building the code in a more systematic way.

Answers (0)
