train
Train reinforcement learning agents within a specified environment
Syntax
Description
trainStats = train(env,agents) trains one or more reinforcement learning agents within the environment env, using default training options, and returns training results in trainStats. Although agents is an input argument, after each training episode, train updates the parameters of each agent specified in agents to maximize their expected long-term reward from the environment. This is possible because each agent is a handle object. When training terminates, agents reflects the state of each agent at the end of the final training episode.
Note
To train an off-policy agent offline using existing data, use trainFromData.
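For example, the following is a minimal sketch of the default-options syntax. The predefined environment name and the choice of a default DQN agent are illustrative assumptions, not requirements of train.
env = rlPredefinedEnv("CartPole-Discrete");   % predefined environment with a discrete action space
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
agent = rlDQNAgent(obsInfo,actInfo);          % agent with default networks
trainStats = train(env,agent);                % train using default training options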
trainStats = train(agents,env) performs the same training as the previous syntax.
trainStats = train(___,trainOpts) trains agents within env, using the training options object trainOpts. Use training options to specify training parameters such as the criteria for terminating training, when to save agents, the maximum number of episodes to train, and the maximum number of steps per episode.
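For example, a sketch of training with a custom options object (the option values shown are illustrative):
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=500, ...
    MaxStepsPerEpisode=200, ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=480);
trainStats = train(agent,env,trainOpts);      % stop when the average reward reaches 480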
trainStats = train(___,prevTrainStats) resumes training from the last values of the agent parameters and training results contained in prevTrainStats, which is returned by the previous function call to train.
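For example, a sketch of resuming training from a previously returned result (assuming agent, env, and trainOpts already exist):
trainStats = train(agent,env,trainOpts);      % initial training run
% ... inspect trainStats, then continue from where training stopped ...
trainStats = train(agent,env,trainStats);     % resume using the previous results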
trainStats = train(___,Name=Value) trains agents with additional name-value arguments. Use this syntax to specify a logger or evaluator object to be used in training. Logger and evaluator objects allow you to periodically log results to disk and to evaluate agents, respectively.
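For example, a sketch that attaches a file logger during training. This assumes the Logger name-value argument and a default rlDataLogger configuration; see the logger documentation for the available options.
logger = rlDataLogger();                      % logs training data to disk
trainStats = train(agent,env,trainOpts,Logger=logger);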
Examples
Input Arguments
Output Arguments
Tips
train updates the agents as training progresses. To preserve the original agent parameters for later use, save the agents to a MAT file.

By default, calling train opens the Reinforcement Learning Training Monitor, which lets you visualize the progress of the training. The Reinforcement Learning Training Monitor plot shows the reward for each episode, a running average reward value, and the critic estimate Q0 (for agents that have critics). The Reinforcement Learning Training Monitor also displays various episode and training statistics. To turn off the Reinforcement Learning Training Monitor, set the Plots option of trainOpts to "none".

If you use a predefined environment for which there is a visualization, you can use plot(env) to visualize the environment. If you call plot(env) before training, then the visualization updates during training to allow you to visualize the progress of each episode. (For custom environments, you must implement your own plot method.)

Training terminates when the conditions specified in trainOpts are satisfied. To terminate training in progress, in the Reinforcement Learning Training Monitor, click Stop Training. Because train updates the agent at each episode, you can resume training by calling train(agent,env,trainOpts) again, without losing the trained parameters learned during the first call to train.

During training, you can save candidate agents that meet conditions you specify with trainOpts. For instance, you can save any agent whose episode reward exceeds a certain value, even if the overall condition for terminating training is not yet satisfied. train stores saved agents in a MAT file in the folder you specify with trainOpts. Saved agents can be useful, for instance, to allow you to test candidate agents generated during a long-running training process. For details about saving criteria and saving location, see rlTrainingOptions.
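The following sketch illustrates the first and third tips above: preserving the original agent parameters in a MAT file and plotting the environment before training so the visualization updates during each episode. The file name is an arbitrary choice.
save("initialAgent.mat","agent")              % preserve the untrained parameters
plot(env)                                     % works for predefined environments with a visualization
trainStats = train(agent,env,trainOpts);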
Algorithms
In general, train performs the following iterative steps:
1. Initialize agent.
2. For each episode:
   a. Reset the environment.
   b. Get the initial observation s0 from the environment.
   c. Compute the initial action a0 = μ(s0).
   d. Set the current action to the initial action (a←a0) and set the current observation to the initial observation (s←s0).
   e. While the episode is not finished or terminated:
      - Step the environment with action a to obtain the next observation s' and the reward r.
      - Learn from the experience set (s,a,r,s').
      - Compute the next action a' = μ(s').
      - Update the current action with the next action (a←a') and update the current observation with the next observation (s←s').
      - Break if the episode termination conditions defined in the environment are met.
3. If the training termination condition defined by trainOpts is met, terminate training. Otherwise, begin the next episode.
The specifics of how train performs these computations depend on your configuration of the agent and environment. For instance, resetting the environment at the start of each episode can include randomizing initial state values, if you configure your environment to do so.
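As a conceptual illustration only, the steps above roughly correspond to the following sketch. It is not the actual implementation of train; in particular, learnFromExperience is a hypothetical placeholder for the agent update step, and maxEpisodes stands for the episode limit set in trainOpts.
for episode = 1:maxEpisodes
    obs = reset(env);                                 % reset the environment, get the initial observation
    action = getAction(agent,obs);                    % compute the initial action
    isDone = false;
    while ~isDone
        [nextObs,reward,isDone] = step(env,action);   % step the environment
        learnFromExperience(agent,obs,action,reward,nextObs);  % hypothetical agent update
        action = getAction(agent,nextObs);            % compute the next action
        obs = nextObs;                                % advance the current observation
    end
    % training terminates here if the trainOpts termination condition is met
end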