MBPO silently converts actions from cell to double, then creates errors when actions aren't given as cell

I'm attempting to create an MBPO model to solve a problem (files attached, mbpo.m is the file to run) and I'm getting a strange error I don't know how to fix.
Running my code produces the error:
Error using cell
Size inputs must be integers.
Error using rl.internal.train.MBPOAgentSeriesTrainer/run_internal_/nestedRunEpisode (line 371)
There was an error executing the environment's step method.
Caused by:
Error using rl.internal.function.ITransitionFunction/predict (line 19)
Invalid argument at position 3. Value must be of type cell or be convertible to cell.
Error in rl.env.rlNeuralNetworkEnvironment/step (line 65)
nextObservation = predict(this.TransitionFcn(this.TransitionModelNum),this.Observation, action);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in rl.env.MATLABEnvironment>@(a)step(env,a) (line 89)
stepfcn = @(a) step(env,a);
^^^^^^^^^^^
Error in rl.env.internal.MATLABFunctionHandleSimulator/step_ (line 22)
[next_observation,reward,isdone] = feval(this.StepFcn_,action);
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in rl.env.internal.MATLABSimulator/step (line 15)
[next_observation,reward,isdone] = step_(this,action);
^^^^^^^^^^^^^^^^^^
Error in rl.env.internal.MATLABSimulator/simInternal_ (line 113)
[nobs,rwd,isd] = step(this,act);
^^^^^^^^^^^^^^
Error in rl.env.internal.MATLABSimulator/sim_ (line 67)
out = simInternal_(this,simPkg);
^^^^^^^^^^^^^^^^^^^^^^^^^
Error in rl.env.internal.AbstractSimulator/sim (line 30)
out = sim_(this,simData,policy,processExpFcn,processExpData);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in rl.env.AbstractEnv/runEpisode (line 144)
out = sim(simulator,simData,policy,processExpFcn,processExpData);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in rl.internal.train.MBPOAgentSeriesTrainer/run_internal_/nestedRunEpisode (line 371)
out_or_F = runEpisode(env,p,...
^^^^^^^^^^^^^^^^^^^^
Error in rl.internal.train.MBPOAgentSeriesTrainer/run_internal_ (line 447)
out = nestedRunEpisode(policy);
^^^^^^^^^^^^^^^^^^^^^^^^
Error in rl.internal.train.MBPOAgentSeriesTrainer/run_ (line 39)
result = run_internal_(this);
^^^^^^^^^^^^^^^^^^^
Error in rl.internal.train.Trainer/run (line 8)
result = run_(this);
^^^^^^^^^^
Error in rl.internal.trainmgr.OnlineTrainingManager/run_ (line 123)
trainResult = run(trainer);
^^^^^^^^^^^^
Error in rl.internal.trainmgr.TrainingManager/run (line 4)
result = run_(this);
^^^^^^^^^^
Error in rl.agent.AbstractAgent/train (line 86)
trainingResult = run(tm);
^^^^^^^
Error in mbpo (line 102)
trainingStats = train(agent,generativeEnv);
^^^^^^^^^^^^^^^^^^^^^^^^^^
Digging through the stack trace, I'm fine until the MATLABSimulator.step call. The function calling this (MATLABSimulator.simInternal_) has the action as a cell array, but step runs:
if iscell(action) && isscalar(action)
action = action{1};
end
which converts the action to an array of doubles. Nothing else operates on the action until ITransitionFunction.predict, which checks if the action is a cell (and will crash because it isn't).
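To make the conversion concrete, here is a minimal sketch that reproduces the unwrapping outside the toolbox. The action value is made up for illustration, the variable names `unwrapped` and `rewrapped` are hypothetical, and the idea that the "convertible to cell" check ends up calling `cell()` on the numeric value is only my assumption about where the first error line comes from. The re-wrap at the end is likewise just an illustration, not a confirmed fix.

```matlab
% Hypothetical continuous action; in the real run this comes from the policy.
action = {[0.5; -0.2]};           % 1x1 cell, as it appears in simInternal_

% Same unwrapping logic as the snippet quoted from MATLABSimulator.step.
unwrapped = action;
if iscell(unwrapped) && isscalar(unwrapped)
    unwrapped = unwrapped{1};
end
disp(class(unwrapped))            % prints: double

% Assumption: the "convertible to cell" check calls cell() on the value,
% which treats a numeric input as size arguments rather than wrapping it.
try
    cell(unwrapped);
catch err
    disp(err.message)
end

% Illustrative re-wrap; whether this is the right fix is not confirmed.
rewrapped = unwrapped;
if ~iscell(rewrapped)
    rewrapped = {rewrapped};
end
```

If the message printed by the try/catch matches the first line of the trace, that would suggest the failure is in the type conversion inside predict rather than in the transition functions themselves.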
My question is: did I do something wrong with my transition functions? I basically lifted them straight from the Cart-Pole MBPO example. My code is attached below. Apologies in advance for the lack of comments in the mbpo file itself; I only intended it as a proof of concept before building the code in a more systematic way.

Answers (0)

Release

R2024b
