Simulate a reinforcement learning environment with an agent configured for that environment. For this example, load an environment and agent that are already configured. The environment is a discrete cart-pole environment created with rlPredefinedEnv. The agent is a policy gradient agent (rlPGAgent). For more information about the environment and agent used in this example, see Train PG Agent to Balance Cart-Pole System.
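Create the environment. The following is a minimal sketch that assumes the predefined "CartPole-Discrete" environment keyword.

% Create the predefined discrete cart-pole environment (assumed keyword)
env = rlPredefinedEnv("CartPole-Discrete")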
env =
CartPoleDiscreteAction with properties:
Gravity: 9.8000
MassCart: 1
MassPole: 0.1000
Length: 0.5000
MaxForce: 10
Ts: 0.0200
ThetaThresholdRadians: 0.2094
XThreshold: 2.4000
RewardForNotFalling: 1
PenaltyForFalling: -5
State: [4x1 double]
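Load the pretrained agent into the workspace. This is a minimal sketch; the MAT-file name RLSimExampleAgent.mat is a hypothetical placeholder for wherever the trained agent is saved.

% Load the trained PG agent (hypothetical file name)
load("RLSimExampleAgent.mat","agent")
agent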
agent =
rlPGAgent with properties:
AgentOptions: [1x1 rl.option.rlPGAgentOptions]
Typically, you train the agent using train and simulate the environment to test the performance of the trained agent. For this example, simulate the environment using the agent that you loaded. Configure simulation options, specifying that the simulation run for 100 steps.
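A minimal sketch of the configuration, assuming the MaxSteps property of rlSimulationOptions is used to limit the simulation length.

% Limit the simulation to 100 steps
simOpts = rlSimulationOptions(MaxSteps=100);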
For the predefined cart-pole environment used in this example, you can use plot to generate a visualization of the cart-pole system. When you simulate the environment, this plot updates automatically so that you can watch the system evolve during the simulation.
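For example, the following sketch opens the visualization before running the simulation.

% Visualize the cart-pole system
plot(env)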
Simulate the environment.
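A minimal sketch of the call, assuming the simOpts options object created above.

% Run the simulation and collect the experience data
experience = sim(env,agent,simOpts)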
experience = struct with fields:
Observation: [1x1 struct]
Action: [1x1 struct]
Reward: [1x1 timeseries]
IsDone: [1x1 timeseries]
SimulationInfo: [1x1 struct]
The output structure experience records the observations collected from the environment, the action and reward, and other data collected during the simulation. Each field contains a timeseries object or a structure of timeseries data objects. For instance, experience.Action is a timeseries containing the action imposed on the cart-pole system by the agent at each step of the simulation.
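For example, the following sketch displays the action data.

% Inspect the recorded actions
experience.Action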
ans = struct with fields:
CartPoleAction: [1x1 timeseries]