Create Custom Simulink Environments
To create a custom Simulink® environment, first create a Simulink environment model that represents the world as seen from the agent. Such a system is often referred to as the plant or open-loop system, while the whole (integrated) system that includes both the agent and the environment is often referred to as the closed-loop system.
Your environment model must have an input signal, the action, which influences (through some discrete, continuous, or mixed dynamics) its next internal state and its outputs, which are the observation, the reward, and the is-done signals. The is-done signal is a scalar that indicates the termination of an episode, causing the simulation to stop when its value is true.
Note
A reinforcement learning environment is normally assumed to be strictly causal from the current action to the current observation. That is, it is assumed that the current observation does not depend on the current action (while the next state generally does). In other words, there must be no direct feedthrough between the current action and the current observation.
Note
The reward signal at time t must be the one corresponding to the transition between the observation output at time t-1 and the observation output at time t.
If your observation contains multiple channels, group the signals carried by the channels into a single observation bus. Similarly, for a hybrid environment, your action must be a two-element bus containing both the discrete (first) and the continuous (second) action channel. For more information about bus signals, see Simulink Bus Capabilities (Simulink).
For critical considerations on defining reward and observation signals in custom environments, see Define Reward and Observation Signals in Custom Environments.
Once you have created the Simulink model that represents the environment, you must add the RL Agent block to it. You can do so automatically or manually.
To automatically create a new closed-loop Simulink model that contains an RL Agent block and references your environment model from its Environment block, use createIntegratedEnv, specifying the names of both your existing environment model and the new closed-loop model (containing the agent) to be created. You can specify as input arguments the names of the action, observation, is-done, and reward ports in your environment model. If your action or observation space is finite, you can also specify its possible values (otherwise the signals are assumed to be continuous).
This function returns an environment object as well as the block path of the agent and the environment observation and action specifications. For more information on model referencing, see Model Reference Basics (Simulink).
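For example, the following call is a minimal sketch in which the model names are placeholders: myEnvModel is assumed to be your existing environment model, and myIntegratedModel is the new closed-loop model that the function creates.

% "myEnvModel" and "myIntegratedModel" are hypothetical model names.
% The function creates the new closed-loop model and returns the
% environment object, the agent block path, and the signal specifications.
[env, agentBlock, obsInfo, actInfo] = createIntegratedEnv( ...
    "myEnvModel", "myIntegratedModel");

If your port names differ from the defaults, you can pass them as additional arguments; see the createIntegratedEnv reference page for the available options.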
To manually add the agent to your model, drag and drop the RL Agent block from the Reinforcement Learning Simulink library. Connect the action, observation, reward and is-done signals to the appropriate output and input ports of the block.
Unless you already have an agent object for this environment in the MATLAB® workspace, you must create specification objects for the action and observation signals using rlNumericSpec (for continuous signals) or rlFiniteSetSpec (for discrete signals). For bus signals, create specifications using bus2RLSpec.

Once you connect the blocks, create an environment object using rlSimulinkEnv, specifying the model filename, the block path of the RL Agent block within the model, and the specification objects for the observation and action channels, respectively. If your agent block already references an agent object in the MATLAB workspace, you do not need to supply the specification objects as input arguments. For an example, see Water Tank Reinforcement Learning Environment Model.
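As a minimal sketch, the following code creates specifications for a continuous observation channel and a continuous scalar action and then builds the environment object. The model name, agent block path, and signal dimensions are illustrative placeholders, not values from a specific shipped example.

% Specification objects for one continuous observation channel with three
% elements and one continuous scalar action (dimensions are illustrative).
obsInfo = rlNumericSpec([3 1]);
obsInfo.Name = "observations";
actInfo = rlNumericSpec([1 1]);
actInfo.Name = "action";

% The model name and agent block path are placeholders for your own model.
mdl = "myEnvWithAgentModel";
agentBlk = mdl + "/RL Agent";
env = rlSimulinkEnv(mdl, agentBlk, obsInfo, actInfo);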
Both rlSimulinkEnv and createIntegratedEnv return a custom Simulink environment as a SimulinkEnvWithAgent object. This environment object acts as an interface so that when you call sim or train, these functions in turn call the (compiled) Simulink model associated with the object to generate experiences for the agents. You can use this object to train and simulate agents in the same way as with any other environment.
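For instance, the following sketch trains and then simulates an agent against the environment object. It assumes an agent object named agent, compatible with the environment specifications, already exists in the MATLAB workspace; the option values are illustrative.

% Train the agent (assumes a compatible agent object named "agent").
trainOpts = rlTrainingOptions(MaxEpisodes=500, MaxStepsPerEpisode=200);
trainResults = train(agent, env, trainOpts);

% Simulate the trained agent against the Simulink environment.
simOpts = rlSimulationOptions(MaxSteps=200);
experience = sim(env, agent, simOpts);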
Note
Before training or simulating an agent within a Simulink environment, to make sure that the RL Agent block runs at the intended sample time, set the SampleTime property of your agent object appropriately.
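For example, a minimal sketch (using a DDPG agent options object as one possible case; other agent options objects expose SampleTime in the same way):

Ts = 0.1;  % intended sample time of the RL Agent block, in seconds

% Set the sample time when creating the agent options ...
agentOpts = rlDDPGAgentOptions(SampleTime=Ts);

% ... or change it on an existing agent object.
agent.AgentOptions.SampleTime = Ts;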
You can also create a multiagent Simulink environment. To do so, create a Simulink model that has one action input and one set of outputs (observation, reward, and is-done) for every agent. Then manually add an agent block for each agent. Once you connect the blocks, create an environment object using rlSimulinkEnv. Unless each agent block already references an agent object in the MATLAB workspace, you must supply to rlSimulinkEnv two cell arrays containing the observation and action specification objects, respectively, as input arguments. For an example, see Train Multiple Agents to Perform Collaborative Task, and see the sketch below.
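The following minimal sketch shows the multiagent call pattern. The model name, agent block names, and signal dimensions are hypothetical placeholders.

% Hypothetical model containing two RL Agent blocks, "Agent A" and "Agent B".
mdl = "myMultiAgentModel";
agentBlks = [mdl + "/Agent A", mdl + "/Agent B"];

% One observation and one action specification per agent, passed as cell
% arrays ordered like the agent block paths (dimensions are illustrative).
obsInfos = {rlNumericSpec([4 1]), rlNumericSpec([4 1])};
actInfos = {rlFiniteSetSpec([-1 0 1]), rlNumericSpec([1 1])};

env = rlSimulinkEnv(mdl, agentBlks, obsInfos, actInfos);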
Your environment can also include third-party functionality. For more information, see Integrate Components from External Tools (Simulink).
Algebraic Loops Between Environment and Agent
To avoid (potentially unsolvable) algebraic loops, you must avoid any direct feedthrough (that is, any direct dependency within the same time step) from the action to the observation output signal. This is because in the Simulink implementation of the agent block, the action at a given time step depends on the observation at the same time step. In other words, the agent block has a direct feedthrough from its observation input to its action output (similarly to an output feedback controller).
Avoiding a direct feedthrough from the action to the observation output signal is also in line with the fact that the standard formulation of a reinforcement learning environment as a Markov Decision Process is strictly causal from the current action to the current observation, since the current state does not depend on the current action (while the next state generally does).
However, note that for models created using createIntegratedEnv the environment block is a referenced subsystem. Referenced subsystems are normally treated as direct feedthrough blocks (including the path from action to observation) unless the Minimize algebraic loop occurrences parameter of the referenced subsystem is enabled. When the referenced model has no direct feedthrough from an input port that participates in an artificial algebraic loop to any of its output ports, enabling this parameter can remove artificial algebraic loops involving the model.
In general, adding a Delay (Simulink) or Memory (Simulink) block to the action signal between the agent block and environment block removes the algebraic loop. When you add an action delay, make sure that your reset function, which is called at the beginning of each training or simulation episode, initializes the delay to a feasible value.
Alternatively, you can add delay blocks to all the environment output signals after the environment block. If you do so, make sure that your reset function initializes the delays to feasible values that are also consistent with the initial state of the environment.
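As a minimal sketch of such a reset function, assume the Delay block on the action signal reads its initial condition from a workspace variable a0 (a hypothetical name). You can then reset it to a feasible action value at the beginning of every episode through the Simulink.SimulationInput object that the reset function receives.

% "a0" is a hypothetical workspace variable used as the Delay block's
% initial condition. Reset it to a feasible value at each episode start.
env.ResetFcn = @(in) setVariable(in, "a0", 0);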
Note
In general, adding delays to solve algebraic loops should be done with extreme care, as it involves a modification of the loop dynamics.
If you have separate state and output functions (instead of a single step function), you can call them using separate MATLAB Function (Simulink) blocks, using a delay to represent the environment state. If you do so, your reset function only needs to initialize the state.
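For example, the two MATLAB Function blocks might contain functions along these lines, with a Delay block carrying the state between them. The function names, dynamics, and dimensions are purely illustrative.

function xNext = stateFcn(x, action)
% State transition (contents are illustrative): computes the next
% environment state from the current state and action.
xNext = 0.9*x + 0.1*action;
end

function [observation, reward, isdone] = outputFcn(x)
% Output function (contents are illustrative): computes the observation,
% reward, and is-done signals from the current state.
observation = x;
reward = -x^2;
isdone = abs(x) > 10;
end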
For more information on algebraic loops and how to remove some of them, see Algebraic Loop Concepts (Simulink) and Remove Algebraic Loops (Simulink). For a related example about using delays in a reinforcement learning loop implemented in Simulink, see Create and Simulate Same Environment in Both MATLAB and Simulink.
See Also
Functions
createIntegratedEnv | rlSimulinkEnv | bus2RLSpec | train | sim
Objects
rlNumericSpec | rlFiniteSetSpec | SimulinkEnvWithAgent
Related Examples
- Water Tank Reinforcement Learning Environment Model
- Create and Simulate Same Environment in Both MATLAB and Simulink