rlMBPOAgentOptions
Description
Use an rlMBPOAgentOptions object to specify options for model-based policy optimization (MBPO) agents. To create an MBPO agent, use rlMBPOAgent.
For more information, see Model-Based Policy Optimization (MBPO) Agent.
Creation
Description
opt = rlMBPOAgentOptions
creates an options object for use as an argument when creating an MBPO agent using all default options. You can modify the object properties using dot notation.
opt = rlMBPOAgentOptions(Name=Value)
creates the options set opt and sets its properties using one or more name-value arguments. For example, rlMBPOAgentOptions(MiniBatchSize=256) creates an options set with an environment model training mini-batch size of 256. You can specify multiple name-value arguments.
Properties
NumEpochForTrainingModel
— Number of epochs
1
(default) | positive integer
Number of epochs for training the environment model, specified as a positive integer.
Example: NumEpochForTrainingModel=2
NumMiniBatches
— Number of mini-batches
10
(default) | positive integer | "all"
Number of mini-batches used in each environment model training epoch, specified as a positive integer or "all". When you set NumMiniBatches to "all", the agent selects the number of mini-batches such that all samples in the base agent's experience buffer are used to train the model.
Example: NumMiniBatches=20
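To make the "all" setting concrete, here is a minimal base-MATLAB sketch. It assumes that "all" means enough mini-batches of MiniBatchSize samples to cover every experience in the base agent's buffer once, rounding up; the toolbox's exact rule is not stated on this page, and the buffer size and variable names are hypothetical.

```matlab
% Hypothetical sketch: resolve NumMiniBatches for one model training epoch.
% Assumption: "all" = enough mini-batches to cover the whole buffer once.
numExperiences = 1000;   % hypothetical number of samples in the base agent buffer
miniBatchSize  = 128;    % MiniBatchSize default
numMiniBatches = "all";  % NumMiniBatches setting (10 by default)

if isstring(numMiniBatches) && numMiniBatches == "all"
    % Round up so that the last, partial mini-batch is also counted
    resolvedNumMiniBatches = ceil(numExperiences/miniBatchSize);
else
    resolvedNumMiniBatches = numMiniBatches;
end

% Samples drawn per epoch (can slightly exceed the buffer size after rounding up)
samplesPerEpoch = resolvedNumMiniBatches*miniBatchSize;
fprintf('Mini-batches per epoch: %d (%d samples)\n', ...
    resolvedNumMiniBatches, samplesPerEpoch);
```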
MiniBatchSize
— Size of random experience mini-batch
128
(default) | positive integer
Size of random experience mini-batch for training the environment model, specified as a positive integer. During each model training epoch, the agent randomly samples experiences from the experience buffer when computing gradients for updating the environment model parameters. Large mini-batches reduce the variance when computing gradients but increase the computational effort.
Example: MiniBatchSize=256
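The sampling described for MiniBatchSize can be pictured with the following sketch of one model training epoch. It draws NumMiniBatches sets of random buffer indices; sampling without replacement inside each mini-batch is an assumption, the buffer length is hypothetical, and the gradient computation itself is only indicated by a comment.

```matlab
% Hypothetical sketch of one environment model training epoch: draw
% NumMiniBatches random mini-batches of MiniBatchSize indices each.
rng(0);                 % reproducible random draws
bufferLength   = 5000;  % hypothetical number of stored experiences
miniBatchSize  = 128;   % MiniBatchSize
numMiniBatches = 10;    % NumMiniBatches

batchIdx = zeros(numMiniBatches, miniBatchSize);
for b = 1:numMiniBatches
    % Sample distinct experience indices for this mini-batch
    batchIdx(b,:) = randperm(bufferLength, miniBatchSize);
    % ...compute model loss gradients on these experiences and update...
end
```

Larger miniBatchSize values average each gradient over more experiences, which is the variance reduction described above, at the cost of more computation per update.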
TransitionOptimizerOptions
— Transition function optimizer options
rlOptimizerOptions
object | array of rlOptimizerOptions
objects
Transition function optimizer options, specified as one of the following:
rlOptimizerOptions object — When your neural network environment has a single transition function, or if you want to use the same options for multiple transition functions, specify a single options object.
Array of rlOptimizerOptions objects — When your neural network environment has multiple transition functions and you want to use different optimizer options for the transition functions, specify an array of options objects with length equal to the number of transition functions.
Using these objects, you can specify training parameters for the transition deep neural network approximators as well as the optimizer algorithms and parameters.
If you have previously trained transition models and do not want the MBPO agent to modify these models during training, set TransitionOptimizerOptions.LearnRate to 0.
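Why a learning rate of 0 freezes a pretrained model can be seen from a single gradient step. This sketch uses plain gradient descent as a simplified stand-in for the configured optimizer algorithm, with hypothetical weight and gradient values; optimizers such as Adam also scale their update by the learning rate, so the conclusion is the same. The same reasoning applies to RewardOptimizerOptions and IsDoneOptimizerOptions.

```matlab
% Simplified illustration (plain gradient descent, not the toolbox optimizer):
% with a learning rate of 0, the update step leaves the parameters unchanged.
w    = [0.5; -1.2; 2.0];    % hypothetical pretrained transition model weights
grad = [0.3;  0.1; -0.4];   % hypothetical loss gradient for one mini-batch
learnRate = 0;              % TransitionOptimizerOptions.LearnRate

wNew = w - learnRate*grad;  % gradient descent step
disp(isequal(wNew, w))      % displays 1 (true): the model is not modified
```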
RewardOptimizerOptions
— Reward function optimizer options
rlOptimizerOptions
object
Reward function optimizer options, specified as an rlOptimizerOptions object. Using this object, you can specify training parameters for the reward deep neural network approximator as well as the optimizer algorithm and its parameters.
If you specify a ground-truth reward function using a custom function, the MBPO agent ignores these options.
If you have a previously trained reward model and do not want the MBPO agent to modify the model during training, set RewardOptimizerOptions.LearnRate to 0.
IsDoneOptimizerOptions
— Is-done function optimizer options
rlOptimizerOptions
object
Is-done function optimizer options, specified as an rlOptimizerOptions object. Using this object, you can specify training parameters for the is-done deep neural network approximator as well as the optimizer algorithm and its parameters.
If you specify a ground-truth is-done function using a custom function, the MBPO agent ignores these options.
If you have a previously trained is-done model and do not want the MBPO agent to modify the model during training, set IsDoneOptimizerOptions.LearnRate to 0.
ModelExperienceBufferLength
— Generated experience buffer size
100000
(default) | positive integer
Generated experience buffer size, specified as a positive integer. When the agent generates experiences, they are added to the model experience buffer.
Example: ModelExperienceBufferLength=50000
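A common way to implement a fixed-length experience buffer is a circular (first-in, first-out) buffer, in which new experiences overwrite the oldest ones once the buffer is full. This page does not state the replacement policy of the model experience buffer, so the following sketch is an assumption; the capacity is kept tiny so that the wrap-around is visible.

```matlab
% Hypothetical sketch: a fixed-capacity model experience buffer that, once
% full, overwrites its oldest entries (FIFO). Capacity is kept small here.
capacity  = 5;           % stands in for ModelExperienceBufferLength
buffer    = zeros(1, capacity);
nextSlot  = 1;           % position of the next write
numStored = 0;           % number of valid entries

for experienceId = 1:8   % add 8 generated experiences (IDs 1..8)
    buffer(nextSlot) = experienceId;
    nextSlot  = mod(nextSlot, capacity) + 1;   % wrap around at the end
    numStored = min(numStored + 1, capacity);
end
disp(sort(buffer))       % oldest experiences 1-3 have been overwritten
```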
ModelRolloutOptions
— Model roll-out options
rlModelRolloutOptions
object
Model roll-out options for controlling the number and length of generated experience trajectories, specified as an rlModelRolloutOptions object with the following properties. At the start of each epoch, the agent generates the roll-out trajectories and adds them to the model experience buffer. To modify the roll-out options, use dot notation.
NumRollout
— Number of trajectories
2000
(default) | positive integer
Number of trajectories for generating samples, specified as a positive integer.
Example: NumRollout=4000
Horizon
— Initial trajectory horizon
1
(default) | positive integer
Initial trajectory horizon, specified as a positive integer.
Example: Horizon=2
HorizonUpdateSchedule
— Option for increasing horizon length
"none"
(default) | "piecewise"
Option for increasing the horizon length, specified as one of the following values.
"none" — Do not increase the horizon length.
"piecewise" — Increase the horizon length by one after every N model training epochs, where N is equal to HorizonUpdateFrequency.
Example: HorizonUpdateSchedule="piecewise"
HorizonUpdateFrequency
— Number of epochs after which the horizon increases
100
(default) | positive integer
Number of epochs after which the horizon increases, specified as a positive integer. When HorizonUpdateSchedule is "none", this option is ignored.
Example: HorizonUpdateFrequency=200
HorizonMax
— Maximum horizon length
20
(default) | positive integer
Maximum horizon length, specified as a positive integer greater than or equal to Horizon. When HorizonUpdateSchedule is "none", this option is ignored.
Example: HorizonMax=5
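The following sketch computes the roll-out horizon for a few epochs under the "piecewise" schedule, using the default property values. It assumes that epochs are counted from 1 and that the horizon grows by one after every HorizonUpdateFrequency completed epochs, capped at HorizonMax; it ignores HorizonUpdateStartEpoch. The MaxNewExperiences column is only an upper bound of NumRollout times the horizon, because a trajectory can end early when the is-done function signals termination.

```matlab
% Hypothetical sketch of the horizon schedule (not the toolbox's internal code).
horizon0   = 1;            % Horizon (initial value)
updFreq    = 100;          % HorizonUpdateFrequency
horizonMax = 20;           % HorizonMax
numRollout = 2000;         % NumRollout
schedule   = "piecewise";  % HorizonUpdateSchedule ("none" keeps horizon0)

if schedule == "piecewise"
    % One extra step per updFreq completed epochs, never above horizonMax
    horizonAt = @(epoch) min(horizon0 + floor((epoch - 1)/updFreq), horizonMax);
else
    horizonAt = @(epoch) horizon0;
end

epochs = [1 100 101 250 5000];
h = arrayfun(horizonAt, epochs);
% Each trajectory contributes at most h experiences (fewer if it terminates)
maxNewExperiences = numRollout*h;
disp(table(epochs.', h.', maxNewExperiences.', ...
    'VariableNames', {'Epoch', 'Horizon', 'MaxNewExperiences'}))
```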
HorizonUpdateStartEpoch
— Training epoch at which to start generating trajectories
1
(default) | positive integer
Training epoch at which to start generating trajectories, specified as a positive integer.
Example: HorizonUpdateStartEpoch=100
NoiseOptions
— Exploration model options
[]
(default) | EpsilonGreedyExploration
object | GaussianActionNoise
object
Exploration model options for generating experiences using the internal environment model, specified as one of the following:
[] — Use the exploration policy of the base agent. You must use this option when training a SAC base agent.
EpsilonGreedyExploration object — You can use this option when training a DQN base agent.
GaussianActionNoise object — You can use this option when training a DDPG or TD3 base agent.
The exploration model uses only the initial noise option values and does not update the values during training.
To specify NoiseOptions, create a default model object. Then, specify any nondefault model properties using dot notation.
Specify epsilon greedy exploration options.
opt = rlMBPOAgentOptions;
opt.ModelRolloutOptions.NoiseOptions = ...
    rl.option.EpsilonGreedyExploration;
opt.ModelRolloutOptions.NoiseOptions.EpsilonMin = 0.03;
Specify Gaussian action noise options.
opt = rlMBPOAgentOptions;
opt.ModelRolloutOptions.NoiseOptions = ...
    rl.option.GaussianActionNoise;
opt.ModelRolloutOptions.NoiseOptions.StandardDeviation = sqrt(0.15);
For more information on noise models, see Noise Models.
RealSampleRatio
— Ratio of real experiences in a mini-batch
0.2
(default) | nonnegative scalar less than or equal to 1
Ratio of real experiences in a mini-batch for agent training, specified as a nonnegative scalar less than or equal to 1.
Example: RealSampleRatio=0.1
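As an illustration of RealSampleRatio, the sketch below splits one agent-training mini-batch between the real experience buffer and the model experience buffer. The mini-batch size and the use of round to obtain an integer count are assumptions made for the example.

```matlab
% Hypothetical sketch: split one agent-training mini-batch between real
% experiences and model-generated experiences using RealSampleRatio.
% Assumption: the real count is rounded to the nearest integer.
miniBatchSize   = 64;    % mini-batch size of the base agent (hypothetical)
realSampleRatio = 0.2;   % RealSampleRatio default

numReal  = round(realSampleRatio*miniBatchSize);  % drawn from the real buffer
numModel = miniBatchSize - numReal;               % drawn from the model buffer
fprintf('%d real + %d generated experiences\n', numReal, numModel);
```

With RealSampleRatio=1 the base agent would train only on real experiences, and with 0 only on generated ones.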
InfoToSave
— Options to save additional agent data
structure (default)
Options to save additional agent data, specified as a structure containing the Optimizer field.
You can save an agent object in several ways, for example:
Using the save command
Specifying SaveAgentCriteria and SaveAgentValue in an rlTrainingOptions object
Specifying an appropriate logging function within a FileLogger object.
When you save an agent using any method, the fields in the InfoToSave structure determine whether the corresponding data is saved with the agent. For example, if you set the Optimizer field to true, then the transition, reward, and is-done function optimizers are saved along with the agent.
You can modify the InfoToSave property only after the agent options object is created.
Example: options.InfoToSave.Optimizer=true
Optimizer
— Option to save agent optimizer
false
(default) | true
Option to save the agent optimizer, specified as a logical value. If the Optimizer field is set to false, then the transition, reward, and is-done function optimizers (which are hidden properties of the agent and can contain internal states) are not saved along with the agent, which saves disk space and memory. However, when the optimizers contain internal states, the state of the saved agent is not identical to the state of the original agent.
Example: true
Object Functions
rlMBPOAgent | Model-based policy optimization (MBPO) reinforcement learning agent |
Examples
Create MBPO Agent Options Object
Create an MBPO agent options object, specifying the ratio of real experiences to use for training the agent as 30%.
opt = rlMBPOAgentOptions(RealSampleRatio=0.3)
opt = 
  rlMBPOAgentOptions with properties:
       NumEpochForTrainingModel: 1
                 NumMiniBatches: 10
                  MiniBatchSize: 128
     TransitionOptimizerOptions: [1x1 rl.option.rlOptimizerOptions]
         RewardOptimizerOptions: [1x1 rl.option.rlOptimizerOptions]
         IsDoneOptimizerOptions: [1x1 rl.option.rlOptimizerOptions]
    ModelExperienceBufferLength: 100000
            ModelRolloutOptions: [1x1 rl.option.rlModelRolloutOptions]
                RealSampleRatio: 0.3000
                     InfoToSave: [1x1 struct]
You can modify options using dot notation. For example, set the mini-batch size to 64.
opt.MiniBatchSize = 64;
Algorithms
Noise Models
A GaussianActionNoise object has the following numeric value properties. When generating experiences, MBPO agents do not update their exploration model parameters.
Property | Description | Default Value |
---|---|---|
Mean | Noise model mean | 0 |
StandardDeviation | Noise model standard deviation | sqrt(0.2) |
StandardDeviationDecayRate | Decay rate of the standard deviation (not used for generating samples) | 0 |
StandardDeviationMin | Minimum standard deviation, which must be less than StandardDeviation (not used for generating samples) | 0.1 |
LowerLimit | Noise sample lower limit | -Inf |
UpperLimit | Noise sample upper limit | Inf |
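At each time step k, the Gaussian noise v is sampled as shown in the following code.
% Scale standard normal samples and shift them by the mean
w = Mean + randn(ActionSize).*StandardDeviation(k);
% Clip the noise to the specified limits
v(k+1) = min(max(w,LowerLimit),UpperLimit);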
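The following self-contained sketch runs the sampling step for many time steps, drawing standard normal values with randn. It uses the default Mean and StandardDeviation from the table and keeps StandardDeviation constant, since MBPO agents do not update it while generating experiences; the action size and the finite limits are hypothetical values chosen so that the clipping is visible.

```matlab
% Gaussian action noise with fixed parameters and clipping (illustrative values).
rng(1);                          % reproducible draws
actionSize        = [2 1];       % hypothetical action dimensions
noiseMean         = 0;           % Mean (default)
standardDeviation = sqrt(0.2);   % StandardDeviation (default), kept constant
lowerLimit        = -0.5;        % LowerLimit (hypothetical; default is -Inf)
upperLimit        = 0.5;         % UpperLimit (hypothetical; default is Inf)

numSteps = 1000;
v = zeros([actionSize numSteps]);
for k = 1:numSteps
    w = noiseMean + randn(actionSize).*standardDeviation;   % Gaussian sample
    v(:,:,k) = min(max(w, lowerLimit), upperLimit);         % clip to limits
end
```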
An EpsilonGreedyExploration object has the following numeric value properties. When generating experiences, MBPO agents do not update their exploration model parameters.
Property | Description | Default Value |
---|---|---|
Epsilon | Probability threshold to either randomly select an action or select the action that maximizes the state-action value function. A larger value of Epsilon means that the agent randomly explores the action space at a higher rate. | 1 |
EpsilonMin | Minimum value of Epsilon (not used for generating samples) | 0.01 |
EpsilonDecay | Decay rate (not used for generating samples) | 0.005 |
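Epsilon-greedy selection with a fixed Epsilon, as used when generating experiences, can be sketched as follows for a discrete action set. The Q-values and the value of Epsilon are hypothetical. Over many steps, the greedy action should be chosen with probability close to 1 - Epsilon + Epsilon/N, where N is the number of actions, because random exploration also picks it sometimes. With the default Epsilon of 1, every roll-out action would be random.

```matlab
% Hypothetical sketch of epsilon-greedy action selection with a fixed Epsilon.
rng(2);
qValues  = [0.1 0.7 0.3];  % hypothetical Q(s,a) for three actions
epsilon  = 0.1;            % Epsilon (not decayed during roll-outs)
numSteps = 10000;

actions = zeros(1, numSteps);
for k = 1:numSteps
    if rand < epsilon
        actions(k) = randi(numel(qValues));   % explore: uniform random action
    else
        [~, actions(k)] = max(qValues);       % exploit: greedy action
    end
end
greedyFraction = mean(actions == 2);
fprintf('Greedy action chosen %.3f of the time\n', greedyFraction);
```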
Version History
Introduced in R2022a