
Specify Training Options in Reinforcement Learning Designer

To configure the training of an agent in the Reinforcement Learning Designer app, specify training options on the Train tab.

The Train tab, showing example training options.

Specify Basic Options

On the Train tab, you can specify the following basic training options.

Max Episodes

Maximum number of episodes to train the agent, specified as a positive integer.

Max Episode Length

Maximum number of steps to run per episode, specified as a positive integer.
Stopping Criteria

Training termination condition, specified as one of the following values.

  • AverageSteps — Stop training when the running average number of steps per episode equals or exceeds the critical value specified by Stopping Value.

  • AverageReward — Stop training when the running average reward equals or exceeds the critical value.

  • EpisodeReward — Stop training when the reward in the current episode equals or exceeds the critical value.

  • GlobalStepCount — Stop training when the total number of steps in all episodes (the total number of times the agent is invoked) equals or exceeds the critical value.

  • EpisodeCount — Stop training when the number of training episodes equals or exceeds the critical value.

Stopping Value

Critical value of the training termination condition in Stopping Criteria, specified as a scalar.

Average Window Length

Window length for averaging the scores, rewards, and number of steps for the agent when either Stopping Criteria or Save agent criteria specify an averaging condition.
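These basic options correspond to properties of the rlTrainingOptions object used for programmatic training. The following sketch shows a possible equivalent configuration; the values are illustrative rather than defaults, and the name=value syntax assumes a recent MATLAB release.

% Illustrative values; adjust to your task.
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=500, ...                      % Max Episodes
    MaxStepsPerEpisode=200, ...               % Max Episode Length
    StopTrainingCriteria="AverageReward", ... % Stopping Criteria
    StopTrainingValue=480, ...                % Stopping Value
    ScoreAveragingWindowLength=5);            % Average Window Length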

Specify Agent Evaluation Options

To enable agent evaluation at regular intervals during training, on the Train tab, click the Agent evaluation icon.

To specify agent evaluation options, select Evaluate Agent > Agent evaluation options.

Agent evaluation options dialog box.

In the Agent Evaluation Options dialog box, you can specify the following training options.

Enable agent evaluation

Enables periodic agent evaluation during training. This option is also selected automatically when you click the Agent evaluation icon on the Train tab.

Number of evaluation episodes

Number of consecutive evaluation episodes, specified as a positive integer. After the number of consecutive training episodes specified in Evaluation frequency, the software runs the number of evaluation episodes specified in this field, consecutively.

For example, if you specify 100 in the Evaluation frequency field and 3 in this field, then three evaluation episodes are run, consecutively, after 100 training episodes. These three evaluation episodes are used to calculate a single statistic, specified by the Evaluation statistic type field, which is returned as the 100th element of the training result object created after training. After 200 training episodes, three new evaluation episodes are run, with their statistic returned in the 200th element of the training result object, and so on. The default value is 3.

Evaluation frequency

Evaluation period, specified as a positive integer. This value is the number of consecutive training episodes after which the evaluation episodes specified in the Number of evaluation episodes field are run. For example, if you specify 100 in this field and 3 in the Number of evaluation episodes field, three evaluation episodes are run, consecutively, after every 100 training episodes. The default value is 100.

Max evaluation episode length

Maximum number of steps to run for an evaluation episode, specified as a positive integer. An evaluation episode runs for at most this number of steps if no other termination condition is met first. To accurately assess the stability and performance of the agent, it is often useful to allow more steps for an evaluation episode than for a training episode.

If you leave this field empty (default), the maximum number of steps per episode specified in the Max Episode Length field is used.

Evaluation random seeds

Random seeds used for evaluation episodes, specified as one of the following.

  • Empty field — The random seed is not initialized before an evaluation episode.

  • Nonnegative integer — The random seed is reinitialized to the specified value before the first episode of each evaluation sequence, that is, after the number of consecutive training episodes specified in the Evaluation frequency field. This is the default behavior, with the seed initialized to 1.

  • Vector of nonnegative integers with the same number of elements as the number of evaluation episodes specified in the Number of evaluation episodes field — Before each episode of an evaluation sequence, the random seed is reinitialized to the corresponding element of the specified vector. This guarantees that the ith episode of each evaluation sequence always runs with the same random seed, which helps when comparing evaluation episodes occurring at different stages of training.

The current random seed used for training is stored before the first episode of an evaluation sequence and restored after the evaluation sequence ends. This ensures that the training results obtained with evaluation are the same as the results obtained without evaluation.

Evaluation statistic type

Type of evaluation statistic for each group of consecutive evaluation episodes, specified as one of these strings:

  • "MeanEpisodeReward" — Mean value of the evaluation episodes rewards. This is the default behavior.

  • "MedianEpisodeReward" — Median value of the evaluation episodes rewards.

  • "MaxEpisodeReward" — Maximum value of the evaluation episodes rewards.

  • "MinEpisodeReward" — Minimum value of the evaluation episodes rewards.

The resulting statistic is returned in the training result object, as the element of the EvaluationStatistics vector corresponding to the training episode that precedes the group of consecutive evaluation episodes.

Use exploration policy

Option to use an exploration policy during evaluation episodes. When this option is disabled (default), the agent uses its base greedy policy when selecting actions during an evaluation episode. When you enable this option, the agent uses its base exploration policy when selecting actions during an evaluation episode.

For more information on evaluation options, see rlEvaluator.
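Programmatically, these settings map to properties of an rlEvaluator object. The following is a possible sketch with illustrative values; the property names follow the rlEvaluator reference page, and the final commented line assumes that the train function accepts an Evaluator argument.

% Illustrative values corresponding to the dialog box fields described above.
evl = rlEvaluator( ...
    NumEpisodes=3, ...                            % Number of evaluation episodes
    EvaluationFrequency=100, ...                  % Evaluation frequency
    MaxStepsPerEpisode=1000, ...                  % Max evaluation episode length
    RandomSeeds=[1 2 3], ...                      % one seed per evaluation episode
    EvaluationStatisticType="MeanEpisodeReward", ...
    UseExplorationPolicy=false);

% When training at the command line, pass the evaluator to train, for example:
% results = train(agent, env, trainOpts, Evaluator=evl);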

Specify Parallel Training Options

To enable the use of multiple processes for training, on the Train tab, click the Parallel computing icon. Training agents using parallel computing requires Parallel Computing Toolbox™ software. For more information, see Train Agents Using Parallel Computing and GPUs.

To specify options for parallel training, select Use Parallel > Parallel training options.

Parallel training options dialog box.

In the Parallel Training Options dialog box, you can specify the following training options.

Enable parallel training

Enables the use of multiple processes to perform environment simulations during training. This option is also selected automatically when you click the Parallel computing icon on the Train tab.

Parallel computing mode

Parallel computing mode, specified as one of the following values.

  • sync — Use parpool to run synchronous training on the available workers. The parallel pool client (the process that starts the training) updates the parameters of its actor and critic, based on the results from all the workers, and sends the updated parameters to all workers. In this case, workers must pause execution until all workers are finished, and as a result the training only advances as fast as the slowest worker allows.

  • async — Use parpool to run asynchronous training on the available workers. In this case, workers send their data back to the client as soon as they finish and receive updated parameters from the client. The workers then continue with their task.

Transfer workspace variables to workers

Select this option to send model and workspace variables to parallel workers. When you select this option, the parallel pool client (the process that starts the training) sends variables used in models and defined in the MATLAB® workspace to the workers.

Random seed for workers

Randomizer initialization for workers, specified as one of the following values.

  • –1 — Assign a unique random seed to each worker. The value of the seed is the worker ID.

  • –2 — Do not assign a random seed to the workers.

  • Vector — Manually specify the random seed for each worker. The number of elements in the vector must match the number of workers.

Files to attach to parallel pool

Additional files to attach to the parallel pool. Specify the names of files in the current working directory, with one name on each line.

Worker setup function

Function to run before training starts, specified as a handle to a function having no input arguments. This function is run once per worker before training begins. Write this function to perform any processing that you need prior to training.

Worker cleanup function

Function to run after training ends, specified as a handle to a function having no input arguments. You can write this function to clean up the workspace or perform other processing after training terminates.

The following figure shows an example parallel training configuration for the following files and functions.

  • Data file attached to the parallel pool — workerData.mat

  • Worker setup function — mySetup.m

  • Worker cleanup function — myCleanup.m

Parallel training options dialog showing file and function information.

For more information on parallel training options, see the UseParallel and ParallelizationOptions properties in rlTrainingOptions. For more information on parallel training, see Train Agents Using Parallel Computing and GPUs.
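The following is a minimal sketch of an equivalent programmatic configuration mirroring the example above; the property names follow rlTrainingOptions and its ParallelizationOptions, and the mode, seed, and file values are illustrative.

% Illustrative parallel training settings mirroring the example configuration above.
trainOpts = rlTrainingOptions(UseParallel=true);
trainOpts.ParallelizationOptions.Mode = "async";                        % Parallel computing mode
trainOpts.ParallelizationOptions.WorkerRandomSeeds = -1;                % unique seed per worker
trainOpts.ParallelizationOptions.TransferBaseWorkspaceVariables = "on"; % transfer workspace variables
trainOpts.ParallelizationOptions.AttachedFiles = "workerData.mat";      % file attached to the pool
trainOpts.ParallelizationOptions.SetupFcn = @mySetup;                   % worker setup function
trainOpts.ParallelizationOptions.CleanupFcn = @myCleanup;               % worker cleanup function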

Specify Additional Options

To specify additional training options, on the Train tab, click More Options.

In the More Training Options dialog box, you can specify the following options.

Save agent criteria

Condition for saving agents during training, specified as one of the following values.

  • none — Do not save any agents during training.

  • AverageSteps — Save the agent when the running average number of steps per episode equals or exceeds the critical value specified by Save agent value.

  • AverageReward — Save the agent when the running average reward equals or exceeds the critical value.

  • EpisodeReward — Save the agent when the reward in the current episode equals or exceeds the critical value.

  • GlobalStepCount — Save the agent when the total number of steps in all episodes (the total number of times the agent is invoked) equals or exceeds the critical value.

  • EpisodeCount — Save the agent when the number of training episodes equals or exceeds the critical value.

Save agent value

Critical value of the save agent condition in Save agent criteria, specified as a scalar or "none".
Save directory

Folder for saved agents. If you specify a name and the folder does not exist, the app creates the folder in the current working directory.

To interactively select a folder, click Browse.

Show verbose output

Select this option to display training progress at the command line.

Stop on Error

Select this option to stop training when an error occurs during an episode.

For more information on training options, see rlTrainingOptions.
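These additional options also map to rlTrainingOptions properties. Below is a minimal sketch with illustrative values; the folder name is hypothetical.

% Illustrative values; "savedAgents" is a hypothetical folder name.
trainOpts = rlTrainingOptions( ...
    SaveAgentCriteria="EpisodeReward", ...  % Save agent criteria
    SaveAgentValue=500, ...                 % Save agent value
    SaveAgentDirectory="savedAgents", ...   % Save directory
    Verbose=true, ...                       % Show verbose output
    StopOnError="on");                      % Stop on Error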
