MBPO silently converts actions from cell to double, then creates errors when actions aren't given as cell
I'm trying to train an MBPO agent to solve a problem (files attached; mbpo.m is the file to run), and training fails with an error I don't know how to fix.
Running my code produces the error:
Error using cell
Size inputs must be integers.
Error using rl.internal.train.MBPOAgentSeriesTrainer/run_internal_/nestedRunEpisode (line 371)
There was an error executing the environment's step method.
Caused by:
Error using rl.internal.function.ITransitionFunction/predict (line 19)
Invalid argument at position 3. Value must be of type cell or be convertible to cell.
Error in rl.env.rlNeuralNetworkEnvironment/step (line 65)
nextObservation = predict(this.TransitionFcn(this.TransitionModelNum),this.Observation, action);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in rl.env.MATLABEnvironment>@(a)step(env,a) (line 89)
stepfcn = @(a) step(env,a);
^^^^^^^^^^^
Error in rl.env.internal.MATLABFunctionHandleSimulator/step_ (line 22)
[next_observation,reward,isdone] = feval(this.StepFcn_,action);
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in rl.env.internal.MATLABSimulator/step (line 15)
[next_observation,reward,isdone] = step_(this,action);
^^^^^^^^^^^^^^^^^^
Error in rl.env.internal.MATLABSimulator/simInternal_ (line 113)
[nobs,rwd,isd] = step(this,act);
^^^^^^^^^^^^^^
Error in rl.env.internal.MATLABSimulator/sim_ (line 67)
out = simInternal_(this,simPkg);
^^^^^^^^^^^^^^^^^^^^^^^^^
Error in rl.env.internal.AbstractSimulator/sim (line 30)
out = sim_(this,simData,policy,processExpFcn,processExpData);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in rl.env.AbstractEnv/runEpisode (line 144)
out = sim(simulator,simData,policy,processExpFcn,processExpData);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error in rl.internal.train.MBPOAgentSeriesTrainer/run_internal_/nestedRunEpisode (line 371)
out_or_F = runEpisode(env,p,...
^^^^^^^^^^^^^^^^^^^^
Error in rl.internal.train.MBPOAgentSeriesTrainer/run_internal_ (line 447)
out = nestedRunEpisode(policy);
^^^^^^^^^^^^^^^^^^^^^^^^
Error in rl.internal.train.MBPOAgentSeriesTrainer/run_ (line 39)
result = run_internal_(this);
^^^^^^^^^^^^^^^^^^^
Error in rl.internal.train.Trainer/run (line 8)
result = run_(this);
^^^^^^^^^^
Error in rl.internal.trainmgr.OnlineTrainingManager/run_ (line 123)
trainResult = run(trainer);
^^^^^^^^^^^^
Error in rl.internal.trainmgr.TrainingManager/run (line 4)
result = run_(this);
^^^^^^^^^^
Error in rl.agent.AbstractAgent/train (line 86)
trainingResult = run(tm);
^^^^^^^
Error in mbpo (line 102)
trainingStats = train(agent,generativeEnv);
^^^^^^^^^^^^^^^^^^^^^^^^^^
Digging through the stack trace, everything looks correct until the MATLABSimulator.step call. Its caller, MATLABSimulator.simInternal_, holds the action as a cell array, but step runs:
if iscell(action) && isscalar(action)
action = action{1};
end
which unwraps the action into an array of doubles. Nothing else operates on the action until ITransitionFunction.predict, whose argument validation requires a cell (or a value convertible to cell) at position 3 and fails because the action is now a double array.
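To make the mechanism concrete, here is a minimal sketch in base MATLAB (no Reinforcement Learning Toolbox calls) of what I believe is happening. The action values are made up for illustration, and the idea that the "convertible to cell" check ends up calling cell() on the double array is my assumption; it would explain the "Error using cell / Size inputs must be integers" message at the top of the trace, since cell() interprets numeric input as dimensions.

```matlab
% Action as MATLABSimulator.simInternal_ holds it: a 1x1 cell wrapping a
% numeric array. The values here are hypothetical.
action = {[0.5; 1.5]};

% Same unwrapping as the snippet from MATLABSimulator.step quoted above.
if iscell(action) && isscalar(action)
    action = action{1};
end
unwrappedClass = class(action);   % the action is no longer a cell

% Assumption: the validation in predict tries to convert the value with
% cell(). cell() treats numeric input as size arguments, so non-integer
% action values are rejected.
try
    converted = cell(action); %#ok<NASGU>
    conversionMessage = '';
catch err
    conversionMessage = err.message;
end

% Re-wrapping the array gives a value that passes an iscell check again.
rewrapped = {action};

fprintf('class after unwrap: %s\n', unwrappedClass);
fprintf('cell(action) error: %s\n', conversionMessage);
fprintf('rewrapped is cell:  %d\n', iscell(rewrapped));
```

If this matches the internal behavior, predict would accept the action only when it is still wrapped in a cell at that point, but I don't see a place in my own code where I could re-wrap it.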
My question: did I do something wrong with my transition functions? I lifted them more or less straight from the Cart-Pole MBPO example. My code is attached. Apologies in advance for the lack of comments in the mbpo file itself; I only intended it as a proof of concept before building the code in a more systematic way.