rlTrainingOptions
Options for training reinforcement learning agents
Description
Use an rlTrainingOptions object to specify training options for an agent. To train an agent, use train.
For more information on training agents, see Train Reinforcement Learning Agents.
Creation
Description
trainOpts = rlTrainingOptions returns the default options for training a reinforcement learning agent. Use training options to specify parameters for the training session, such as the maximum number of episodes to train, criteria for stopping training, criteria for saving agents, and options for using parallel computing. After configuring the options, use trainOpts as an input argument for train.
opt = rlTrainingOptions(Name,Value) creates a training option set and sets object Properties using one or more name-value pair arguments.
Properties
MaxEpisodes — Maximum number of episodes to train the agent
500 (default) | positive integer
Maximum number of episodes to train the agent, specified as a positive integer. Regardless of other criteria for termination, training terminates after MaxEpisodes.
Example: 'MaxEpisodes',1000
MaxStepsPerEpisode — Maximum number of steps to run per episode
500 (default) | positive integer
Maximum number of steps to run per episode, specified as a positive integer. In general, you define episode termination conditions in the environment. This value is the maximum number of steps to run in the episode if other termination conditions are not met.
Example: 'MaxStepsPerEpisode',1000
ScoreAveragingWindowLength — Window length for averaging
5 (default) | positive integer scalar | positive integer vector
Window length for averaging the scores, rewards, and number of steps for each agent, specified as a scalar or vector.
If the training environment contains a single agent, specify ScoreAveragingWindowLength as a scalar.
If the training environment is a multi-agent Simulink® environment, specify a scalar to apply the same window length to all agents. To use a different window length for each agent, specify ScoreAveragingWindowLength as a vector. In this case, the order of the elements in the vector corresponds to the order of the agents used during environment creation.
For options expressed in terms of averages, ScoreAveragingWindowLength is the number of episodes included in the average. For instance, if StopTrainingCriteria is "AverageReward" and StopTrainingValue is 500 for a given agent, then for that agent, training terminates when the average reward over the number of episodes specified in ScoreAveragingWindowLength equals or exceeds 500. For the other agents, training continues until:
All agents reach their stop criteria.
The number of episodes reaches MaxEpisodes.
You stop training by clicking the Stop Training button in Episode Manager or pressing Ctrl-C at the MATLAB® command line.
Example: 'ScoreAveragingWindowLength',10
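As a brief illustration of how the window interacts with an average-based stop criterion (the values shown are illustrative, not recommendations), the following options set stops training when the reward averaged over the last 10 episodes reaches 500:
trainOpts = rlTrainingOptions( ...
    'ScoreAveragingWindowLength',10, ...
    'StopTrainingCriteria',"AverageReward", ...
    'StopTrainingValue',500);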
StopTrainingCriteria — Training termination condition
"AverageSteps" (default) | "AverageReward" | "EpisodeCount" | ...
Training termination condition, specified as one of the following strings:
"AverageSteps" — Stop training when the running average number of steps per episode equals or exceeds the critical value specified by the option StopTrainingValue. The average is computed using the window ScoreAveragingWindowLength.
"AverageReward" — Stop training when the running average reward equals or exceeds the critical value.
"EpisodeReward" — Stop training when the reward in the current episode equals or exceeds the critical value.
"GlobalStepCount" — Stop training when the total number of steps in all episodes (the total number of times the agent is invoked) equals or exceeds the critical value.
"EpisodeCount" — Stop training when the number of training episodes equals or exceeds the critical value.
Example: 'StopTrainingCriteria',"AverageReward"
StopTrainingValue — Critical value of training termination condition
500 (default) | scalar | vector
Critical value of the training termination condition, specified as a scalar or a vector.
If the training environment contains a single agent, specify StopTrainingValue as a scalar.
If the training environment is a multi-agent Simulink environment, specify a scalar to apply the same termination criterion to all agents. To use a different termination criterion for each agent, specify StopTrainingValue as a vector. In this case, the order of the elements in the vector corresponds to the order of the agents used during environment creation.
For a given agent, training ends when the termination condition specified by the StopTrainingCriteria option equals or exceeds this value. For the other agents, training continues until:
All agents reach their stop criteria.
The number of episodes reaches MaxEpisodes.
You stop training by clicking the Stop Training button in Episode Manager or pressing Ctrl-C at the MATLAB command line.
For instance, if StopTrainingCriteria is "AverageReward" and StopTrainingValue is 100 for a given agent, then for that agent, training terminates when the average reward over the number of episodes specified in ScoreAveragingWindowLength equals or exceeds 100.
Example: 'StopTrainingValue',100
SaveAgentCriteria — Condition for saving agents during training
"none" (default) | "EpisodeReward" | "AverageReward" | "EpisodeCount" | ...
Condition for saving agents during training, specified as one of the following strings:
"none" — Do not save any agents during training.
"EpisodeReward" — Save the agent when the reward in the current episode equals or exceeds the critical value.
"AverageSteps" — Save the agent when the running average number of steps per episode equals or exceeds the critical value specified by the option SaveAgentValue. The average is computed using the window ScoreAveragingWindowLength.
"AverageReward" — Save the agent when the running average reward over all episodes equals or exceeds the critical value.
"GlobalStepCount" — Save the agent when the total number of steps in all episodes (the total number of times the agent is invoked) equals or exceeds the critical value.
"EpisodeCount" — Save the agent when the number of training episodes equals or exceeds the critical value.
Set this option to store candidate agents that perform well according to the criteria you specify. When you set this option to a value other than "none", the software sets the SaveAgentValue option to 500. You can change that value to specify the condition for saving the agent.
For instance, suppose you want to store for further testing any agent that yields an episode reward that equals or exceeds 100. To do so, set SaveAgentCriteria to "EpisodeReward" and set the SaveAgentValue option to 100. When an episode reward equals or exceeds 100, train saves the corresponding agent in a MAT file in the folder specified by the SaveAgentDirectory option. The MAT file is called AgentK.mat, where K is the number of the corresponding episode. The agent is stored within that MAT file as saved_agent.
Example: 'SaveAgentCriteria',"EpisodeReward"
SaveAgentValue — Critical value of condition for saving agents
"none" (default) | 500 | scalar | vector
Critical value of the condition for saving agents, specified as a scalar or a vector.
If the training environment contains a single agent, specify SaveAgentValue as a scalar.
If the training environment is a multi-agent Simulink environment, specify a scalar to apply the same saving criterion to each agent. To save the agents when one meets a particular criterion, specify SaveAgentValue as a vector. In this case, the order of the elements in the vector corresponds to the order of the agents used when creating the environment. When a criterion for saving an agent is met, all agents are saved in the same MAT file.
When you specify a condition for saving candidate agents using SaveAgentCriteria, the software sets this value to 500. Change the value to specify the condition for saving the agent. See the SaveAgentCriteria option for more details.
Example: 'SaveAgentValue',100
SaveAgentDirectory — Folder for saved agents
"savedAgents" (default) | string | character vector
Folder for saved agents, specified as a string or character vector. The folder name can contain a full or relative path. When an episode occurs that satisfies the condition specified by the SaveAgentCriteria and SaveAgentValue options, the software saves the agents in a MAT file in this folder. If the folder does not exist, train creates it.
When SaveAgentCriteria is "none", this option is ignored and train does not create a folder.
Example: 'SaveAgentDirectory', pwd + "\run1\Agents"
UseParallel — Flag for using parallel training
false (default) | true
Flag for using parallel training, specified as a logical. Setting this option to true configures training to use parallel processing to simulate the environment, thereby enabling the use of multiple cores, processors, computer clusters, or cloud resources to speed up training. To specify options for parallel training, use the ParallelizationOptions property.
When UseParallel is true, then for DQN, DDPG, TD3, and SAC agents the NumStepsToLookAhead property of the corresponding agent option object must be set to 1; otherwise an error is generated. This guarantees that experiences are stored contiguously. When AC agents are trained in parallel, a warning is generated if the StepsUntilDataIsSent property of the ParallelizationOptions object is set to a value different from the NumStepsToLookAhead property of the AC agent option object.
Note that if you want to speed up deep neural network calculations (such as gradient computation, parameter update, and prediction) using a local GPU, you do not need to set UseParallel to true. Instead, when creating your actor or critic representation, use an rlRepresentationOptions object in which the UseDevice option is set to "gpu". Using parallel computing or the GPU requires Parallel Computing Toolbox™ software. Using computer clusters or cloud resources additionally requires MATLAB Parallel Server™. For more information about training using multicore processors and GPUs, see Train Agents Using Parallel Computing and GPUs.
Example: 'UseParallel',true
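A minimal sketch of the GPU alternative described in the note above; it only creates the representation options object and assumes you then pass it to your own actor or critic representation:
criticOpts = rlRepresentationOptions('UseDevice',"gpu");
% Pass criticOpts as the options input when creating your actor or critic
% representation; gradient computation and prediction then run on the local GPU.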
ParallelizationOptions — Options to control parallel training
ParallelTraining object
Parallelization options to control parallel training, specified as a ParallelTraining object. For more information about training using parallel computing, see Train Reinforcement Learning Agents.
The ParallelTraining object has the following properties, which you can modify using dot notation after creating the rlTrainingOptions object.
Mode — Parallel computing mode
"sync" (default) | "async"
Parallel computing mode, specified as one of the following:
"sync" — Use parpool to run synchronous training on the available workers. In this case, workers pause execution until all workers are finished. The host updates the actor and critic parameters based on the results from all the workers and sends the updated parameters to all workers. Note that synchronous training is required for gradient-based parallelization; that is, when DataToSendFromWorkers is set to "gradients", Mode must be set to "sync".
"async" — Use parpool to run asynchronous training on the available workers. In this case, workers send their data back to the host as soon as they finish and receive updated parameters from the host. The workers then continue with their task.
DataToSendFromWorkers — Type of data that workers send to the host
"experiences" (default) | "gradients"
Type of data that workers send to the host, specified as one of the following strings:
"experiences" — The simulation is performed by the workers, and the learning is performed by the host. Specifically, the workers simulate the agent against the environment and send experience data (observation, action, reward, next observation, and a flag indicating whether a terminal condition has been reached) to the host. For agents with gradients, the host computes gradients from the experiences, updates the network parameters, and sends the updated parameters back to the workers so they can perform a new simulation against the environment.
"gradients" — Both simulation and learning are performed by the workers. Specifically, the workers simulate the agent against the environment, compute the gradients from the experiences, and send the gradients to the host. The host averages the gradients, updates the network parameters, and sends the updated parameters back to the workers so they can perform a new simulation against the environment. This option requires synchronous training; that is, it requires Mode to be set to "sync".
Note
For AC and PG agents, you must specify DataToSendFromWorkers as "gradients".
For DQN, DDPG, PPO, TD3, and SAC agents, you must specify DataToSendFromWorkers as "experiences".
StepsUntilDataIsSent — Number of steps after which workers send data to the host
–1 (default) | positive integer
Number of steps after which workers send data to the host and receive updated parameters, specified as –1 or a positive integer. When this option is –1, the worker waits until the end of the episode and then sends all step data to the host. Otherwise, the worker waits the specified number of steps before sending data.
Note
AC agents do not accept StepsUntilDataIsSent = -1. For AC training, set StepsUntilDataIsSent equal to the NumStepsToLookAhead AC agent option.
For PG agents, you must specify StepsUntilDataIsSent = -1.
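Putting the AC-specific requirements together, the following is a hedged sketch of a parallel configuration for an AC agent. It assumes an AC agent options object agentOpts (for example, created with rlACAgentOptions) already exists; the 32-step value is illustrative:
agentOpts.NumStepsToLookAhead = 32;                                       % AC agent option (illustrative value)
trainOpts = rlTrainingOptions('UseParallel',true);
trainOpts.ParallelizationOptions.Mode = "sync";                            % required for gradient-based parallelization
trainOpts.ParallelizationOptions.DataToSendFromWorkers = "gradients";      % required for AC and PG agents
trainOpts.ParallelizationOptions.StepsUntilDataIsSent = agentOpts.NumStepsToLookAhead;   % match the agent option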
WorkerRandomSeeds — Randomizer initialization for workers
–1 (default) | –2 | vector
Randomizer initialization for workers, specified as one of the following:
–1 — Assign a unique random seed to each worker. The value of the seed is the worker ID.
–2 — Do not assign a random seed to the workers.
Vector — Manually specify the random seed for each worker. The number of elements in the vector must match the number of workers.
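For example, assuming a four-worker parallel pool, you could seed each worker explicitly:
trainOpts = rlTrainingOptions('UseParallel',true);
trainOpts.ParallelizationOptions.WorkerRandomSeeds = [1 2 3 4];   % one seed per worker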
TransferBaseWorkspaceVariables — Option to send model and workspace variables to parallel workers
"on" (default) | "off"
Option to send model and workspace variables to parallel workers, specified as "on" or "off". When the option is "on", the host sends variables used in models and defined in the base MATLAB workspace to the workers.
AttachedFiles — Additional files to attach to the parallel pool
[] (default) | string | string array
Additional files to attach to the parallel pool, specified as a string or string array.
SetupFcn — Function to run before training starts
[] (default) | function handle
Function to run before training starts, specified as a handle to a function having no input arguments. This function is run once per worker before training begins. Write this function to perform any processing that you need prior to training.
CleanupFcn — Function to run after training ends
[] (default) | function handle
Function to run after training ends, specified as a handle to a function having no input arguments. You can write this function to clean up the workspace or perform other processing after training terminates.
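As a hedged sketch of the setup and cleanup hooks, the following assigns handles to two hypothetical functions, mySetupFcn and myCleanupFcn, which you would define yourself on the MATLAB path (each taking no input arguments):
trainOpts = rlTrainingOptions('UseParallel',true);
trainOpts.ParallelizationOptions.SetupFcn = @mySetupFcn;       % runs once per worker before training begins
trainOpts.ParallelizationOptions.CleanupFcn = @myCleanupFcn;   % runs after training ends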
Verbose — Display training progress on the command line
false (0) (default) | true (1)
Display training progress on the command line, specified as the logical values false (0) or true (1). Set to true to write information from each training episode to the MATLAB command line during training.
StopOnError — Option to stop training when error occurs
"on" (default) | "off"
Option to stop training when an error occurs during an episode, specified as "on" or "off". When this option is "off", errors are captured and returned in the SimulationInfo output of train, and training continues to the next episode.
Plots — Option to display training progress with Episode Manager
"training-progress" (default) | "none"
Option to display training progress with Episode Manager, specified as "training-progress" or "none". By default, calling train opens the Reinforcement Learning Episode Manager, which graphically and numerically displays information about the training progress, such as the reward for each episode, average reward, number of episodes, and total number of steps. (For more information, see train.) To turn off this display, set this option to "none".
Object Functions
train | Train reinforcement learning agents within a specified environment
Examples
Configure Options for Training
Create an options set for training a reinforcement learning agent. Set the maximum number of episodes and the maximum number of steps per episode to 1000. Configure the options to stop training when the average reward equals or exceeds 480, and turn on both the command-line display and Reinforcement Learning Episode Manager for displaying training results. You can set the options using name-value pair arguments when you create the options set. Any options that you do not explicitly set have their default values.
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',1000,...
    'MaxStepsPerEpisode',1000,...
    'StopTrainingCriteria',"AverageReward",...
    'StopTrainingValue',480,...
    'Verbose',true,...
    'Plots',"training-progress")
trainOpts = 
  rlTrainingOptions with properties:

                   MaxEpisodes: 1000
            MaxStepsPerEpisode: 1000
    ScoreAveragingWindowLength: 5
          StopTrainingCriteria: "AverageReward"
             StopTrainingValue: 480
             SaveAgentCriteria: "none"
                SaveAgentValue: "none"
            SaveAgentDirectory: "savedAgents"
                       Verbose: 1
                         Plots: "training-progress"
                   StopOnError: "on"
                   UseParallel: 0
        ParallelizationOptions: [1x1 rl.option.ParallelTraining]
Alternatively, create a default options set and use dot notation to change some of the values.
trainOpts = rlTrainingOptions;
trainOpts.MaxEpisodes = 1000;
trainOpts.MaxStepsPerEpisode = 1000;
trainOpts.StopTrainingCriteria = "AverageReward";
trainOpts.StopTrainingValue = 480;
trainOpts.Verbose = true;
trainOpts.Plots = "training-progress";
trainOpts
trainOpts = 
  rlTrainingOptions with properties:

                   MaxEpisodes: 1000
            MaxStepsPerEpisode: 1000
    ScoreAveragingWindowLength: 5
          StopTrainingCriteria: "AverageReward"
             StopTrainingValue: 480
             SaveAgentCriteria: "none"
                SaveAgentValue: "none"
            SaveAgentDirectory: "savedAgents"
                       Verbose: 1
                         Plots: "training-progress"
                   StopOnError: "on"
                   UseParallel: 0
        ParallelizationOptions: [1x1 rl.option.ParallelTraining]
You can now use trainOpts as an input argument to the train command.
Configure Parallel Computing Options for Training
To turn on parallel computing for training a reinforcement learning agent, set the UseParallel training option to true.
trainOpts = rlTrainingOptions('UseParallel',true);
To configure the parallel training, set the fields of trainOpts.ParallelizationOptions. For example, specify the following training options:
Asynchronous mode
Workers send data to the host every 100 steps within a training episode
Workers compute and send gradients to the host
trainOpts.ParallelizationOptions.Mode = "async";
trainOpts.ParallelizationOptions.StepsUntilDataIsSent = 100;
trainOpts.ParallelizationOptions.DataToSendFromWorkers = "gradients";
trainOpts.ParallelizationOptions
ans = 
  ParallelTraining with properties:

                              Mode: "async"
                 WorkerRandomSeeds: -1
    TransferBaseWorkspaceVariables: "on"
                     AttachedFiles: []
                          SetupFcn: []
                        CleanupFcn: []
You can now use trainOpts as an input argument to the train command to perform training with parallel computing.
Configure Options for Training a Multi-Agent Environment
Create an options object for concurrently training three agents in the same environment.
Set the maximum number of episodes and the maximum steps per episode to 1000. Configure the options to stop training the first agent when its average reward over 5 episodes equals or exceeds 400, the second agent when its average reward over 10 episodes equals or exceeds 500, and the third agent when its average reward over 15 episodes equals or exceeds 600. The order of the agents is the one used during environment creation.
Save the agents when the reward for the first agent in the current episode equals or exceeds 100, when the reward for the second agent equals or exceeds 120, or when the reward for the third agent equals or exceeds 140.
Turn on both the command-line display and Reinforcement Learning Episode Manager for displaying training results. You can set the options using name-value pair arguments when you create the options set. Any options that you do not explicitly set have their default values.
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',1000,...
    'MaxStepsPerEpisode',1000,...
    'ScoreAveragingWindowLength',[5 10 15],...
    'StopTrainingCriteria',"AverageReward",...
    'StopTrainingValue',[400 500 600],...
    'SaveAgentCriteria',"EpisodeReward",...
    'SaveAgentValue',[100 120 140],...
    'Verbose',true,...
    'Plots',"training-progress")
trainOpts = 
  rlTrainingOptions with properties:

                   MaxEpisodes: 1000
            MaxStepsPerEpisode: 1000
    ScoreAveragingWindowLength: [5 10 15]
          StopTrainingCriteria: "AverageReward"
             StopTrainingValue: [400 500 600]
             SaveAgentCriteria: "EpisodeReward"
                SaveAgentValue: [100 120 140]
            SaveAgentDirectory: "savedAgents"
                       Verbose: 1
                         Plots: "training-progress"
                   StopOnError: "on"
                   UseParallel: 0
        ParallelizationOptions: [1x1 rl.option.ParallelTraining]
Alternatively, create a default options set and use dot notation to change some of the values.
trainOpts = rlTrainingOptions;
trainOpts.MaxEpisodes = 1000;
trainOpts.MaxStepsPerEpisode = 1000;
trainOpts.ScoreAveragingWindowLength = [5 10 15];
trainOpts.StopTrainingCriteria = "AverageReward";
trainOpts.StopTrainingValue = [400 500 600];
trainOpts.SaveAgentCriteria = "EpisodeReward";
trainOpts.SaveAgentValue = [100 120 140];
trainOpts.Verbose = true;
trainOpts.Plots = "training-progress";
trainOpts
trainOpts = 
  rlTrainingOptions with properties:

                   MaxEpisodes: 1000
            MaxStepsPerEpisode: 1000
    ScoreAveragingWindowLength: [5 10 15]
          StopTrainingCriteria: "AverageReward"
             StopTrainingValue: [400 500 600]
             SaveAgentCriteria: "EpisodeReward"
                SaveAgentValue: [100 120 140]
            SaveAgentDirectory: "savedAgents"
                       Verbose: 1
                         Plots: "training-progress"
                   StopOnError: "on"
                   UseParallel: 0
        ParallelizationOptions: [1x1 rl.option.ParallelTraining]
You can specify a scalar to apply the same criterion to all agents. For example, use a window length of 10 for all three agents.
trainOpts.ScoreAveragingWindowLength = 10
trainOpts = 
  rlTrainingOptions with properties:

                   MaxEpisodes: 1000
            MaxStepsPerEpisode: 1000
    ScoreAveragingWindowLength: 10
          StopTrainingCriteria: "AverageReward"
             StopTrainingValue: [400 500 600]
             SaveAgentCriteria: "EpisodeReward"
                SaveAgentValue: [100 120 140]
            SaveAgentDirectory: "savedAgents"
                       Verbose: 1
                         Plots: "training-progress"
                   StopOnError: "on"
                   UseParallel: 0
        ParallelizationOptions: [1x1 rl.option.ParallelTraining]
You can now use trainOpts as an input argument to the train command.
Version History
See Also