
getAction

Obtain action from agent, actor, or policy object given environment observations

Since R2020a

Description

Agent

action = getAction(agent,obs) returns the action generated from the policy of a reinforcement learning agent, given environment observations. If agent contains internal states, they are updated.


[action,agent] = getAction(agent,obs) also returns the updated agent as an output argument.
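For instance, the following sketch, which assumes a default DQN agent created for the predefined cart-pole environment (separate from the examples below), requests both the action and the updated agent.

% assumed setup: default DQN agent for a predefined environment
env = rlPredefinedEnv("CartPole-Discrete");
agent = rlDQNAgent(getObservationInfo(env),getActionInfo(env));

% return the action and the updated agent for a random observation
[action,agent] = getAction(agent,{rand(getObservationInfo(env).Dimension)});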

Actor

action = getAction(actor,obs) returns the action generated from the policy represented by the actor actor, given environment observations obs.


[action,nextState] = getAction(actor,obs) also returns the updated state of the actor when it uses a recurrent neural network.

Policy

action = getAction(policy,obs) returns the action generated from the policy object policy, given environment observations.


[action,updatedPolicy] = getAction(policy,obs) also returns the updated policy as an output argument (any internal state of the policy, if used, is updated).

Use Forward

___ = getAction(___,UseForward=useForward) allows you to explicitly call a forward pass when computing gradients.

Examples


Create an environment with a discrete action space, and obtain its observation and action specifications. For this example, load the environment used in the example Create DQN Agent Using Deep Network Designer and Train Using Image Observations.

% load predefined environment
env = rlPredefinedEnv("SimplePendulumWithImage-Discrete");

Obtain the observation and action specifications for this environment.

obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

Create a TRPO agent from the environment observation and action specifications.

agent = rlTRPOAgent(obsInfo,actInfo);

Use getAction to return the action from a random observation.

getAction(agent, ...
    {rand(obsInfo(1).Dimension), ...
     rand(obsInfo(2).Dimension)})
ans = 1x1 cell array
    {[-2]}

You can also obtain actions for a batch of observations. For example, obtain actions for a batch of 10 observations.

actBatch = getAction(agent, ...
    {rand([obsInfo(1).Dimension 10]), ...
     rand([obsInfo(2).Dimension 10])});
size(actBatch{1})
ans = 1×3

     1     1    10

actBatch{1}(1,1,7)
ans = 
-2

actBatch contains one action for each observation in the batch.

Create observation and action specification objects. You can also obtain these specifications from an environment.

obsinfo = rlNumericSpec([4 1]);
actinfo = rlNumericSpec([2 1]);

Create a deep neural network for the actor.

net = [featureInputLayer(obsinfo.Dimension(1), ...
           "Normalization","none","Name","state")
       fullyConnectedLayer(10,"Name","fc1")
       reluLayer("Name","relu1")
       fullyConnectedLayer(20,"Name","fc2")
       fullyConnectedLayer(actinfo.Dimension(1),"Name","fc3")
       tanhLayer("Name","tanh1")];
net = dlnetwork(net);

Create a continuous deterministic actor for the network.

actor = rlContinuousDeterministicActor(net, ...
    obsinfo,actinfo,...
    ObservationInputNames={"state"});

Obtain an action from this actor for a random batch of 10 observations.

act = getAction(actor,{rand(4,1,10)})
act = 1x1 cell array
    {2x1x10 single}

act is a single-element cell array containing a 2-by-1-by-10 array, with one two-element action for each of the 10 observations in the batch.

act{1}(:,1,7)
ans = 2x1 single column vector

    0.2643
   -0.2934

Create observation and action specification objects. For this example, define the observation and action spaces as continuous four- and two-dimensional spaces, respectively.

obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);

Alternatively, you can use getObservationInfo and getActionInfo to extract the specification objects from an environment.
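For instance, this sketch extracts the specifications from an assumed predefined environment (the resulting dimensions differ from the specifications defined above).

% extract specifications from an assumed predefined environment
envDI = rlPredefinedEnv("DoubleIntegrator-Continuous");
obsInfoDI = getObservationInfo(envDI);
actInfoDI = getActionInfo(envDI);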

Create a continuous deterministic actor. This actor must accept an observation as input and return an action as output.

To approximate the policy function within the actor, use a recurrent deep neural network model. Define the network as an array of layer objects, and get the dimension of the observation and action spaces from the environment specification objects. To create a recurrent network, use a sequenceInputLayer as the input layer (with size equal to the number of dimensions of the observation channel) and include at least one lstmLayer.

layers = [ 
    sequenceInputLayer(obsInfo.Dimension(1))
    lstmLayer(2)
    reluLayer
    fullyConnectedLayer(actInfo.Dimension(1)) 
    ];

Convert the network to a dlnetwork object and display the number of weights.

model = dlnetwork(layers);
summary(model)
   Initialized: true

   Number of learnables: 62

   Inputs:
      1   'sequenceinput'   Sequence input with 4 dimensions

Create the actor using model, and the observation and action specifications.

actor = rlContinuousDeterministicActor(model,obsInfo,actInfo)
actor = 
  rlContinuousDeterministicActor with properties:

    ObservationInfo: [1x1 rl.util.rlNumericSpec]
         ActionInfo: [1x1 rl.util.rlNumericSpec]
      Normalization: "none"
          UseDevice: "cpu"
         Learnables: {5x1 cell}
              State: {2x1 cell}

Check the actor with a random observation input.

act = getAction(actor,{rand(obsInfo.Dimension)});
act{1}
ans = 2x1 single column vector

    0.0568
    0.0691
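Because the actor uses a recurrent network, getAction can also return the updated network state as a second output. Here, nextState contains the updated state of the recurrent network.

[act,nextState] = getAction(actor,{rand(obsInfo.Dimension)});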

Create an additive noise policy object from actor.

policy = rlAdditiveNoisePolicy(actor)
policy = 
  rlAdditiveNoisePolicy with properties:

               Actor: [1x1 rl.function.rlContinuousDeterministicActor]
           NoiseType: "gaussian"
        NoiseOptions: [1x1 rl.option.GaussianActionNoise]
    EnableNoiseDecay: 1
       Normalization: "none"
      UseNoisyAction: 1
     ObservationInfo: [1x1 rl.util.rlNumericSpec]
          ActionInfo: [1x1 rl.util.rlNumericSpec]
          SampleTime: -1

Use dot notation to set the standard deviation decay rate.

policy.NoiseOptions.StandardDeviationDecayRate = 0.9;

Use getAction to generate an action from the policy, given a random observation input.

act = getAction(policy,{rand(obsInfo.Dimension)});
act{1}
ans = 2×1

    0.5922
   -0.3745

Display the state of the recurrent neural network in the policy object.

xNN = getRNNState(policy);
xNN{1}
ans = 2x1 single column vector

     0
     0

Use getAction to also return the updated policy as a second argument.

[act, updatedPolicy] = getAction(policy,{rand(obsInfo.Dimension)});

Display the state of the recurrent neural network in the updated policy object.

xpNN = getRNNState(updatedPolicy);
xpNN{1}
ans = 2x1 single column vector

    0.3327
   -0.2479

As expected, the state is updated.

Input Arguments


Reinforcement learning agent, specified as an agent object, such as an rlTRPOAgent or rlDQNAgent object.

Note

agent is a handle object, so it is updated whether or not it is returned as an output argument. For more information about handle objects, see Handle Object Behavior.

Actor, specified as an actor object, such as an rlContinuousDeterministicActor object.

Reinforcement learning policy, specified as a policy object, such as an rlAdditiveNoisePolicy object.

Environment observations, specified as a cell array with as many elements as there are observation input channels. Each element of obs contains an array of observations for a single observation input channel.

The dimensions of each element in obs are MO-by-LB-by-LS, where:

  • MO corresponds to the dimensions of the associated observation input channel.

  • LB is the batch size. To specify a single observation, set LB = 1. To specify a batch of observations, specify LB > 1. If the agent, actor, or policy has multiple observation input channels, then LB must be the same for all elements of obs.

  • LS specifies the sequence length for a recurrent neural network. If the agent, actor, or policy does not use a recurrent neural network, then LS = 1. If there are multiple observation input channels, then LS must be the same for all elements of obs.

The returned action has the same LB and LS as obs.

For more information on input and output formats for recurrent neural networks, see the Algorithms section of lstmLayer.
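For instance, the following sketch (with assumed specification sizes, batch size, and sequence length, independent of the examples above) passes a batch of five observation sequences, each of length eight, to a small recurrent actor.

% assumed specifications and recurrent actor for illustration
oInfo = rlNumericSpec([4 1]);
aInfo = rlNumericSpec([2 1]);
rnnLayers = [
    sequenceInputLayer(oInfo.Dimension(1))
    lstmLayer(2)
    fullyConnectedLayer(aInfo.Dimension(1))
    ];
rnnActor = rlContinuousDeterministicActor(dlnetwork(rnnLayers),oInfo,aInfo);

% batch of 5 observation sequences of length 8 (dimensions [4 1 5 8])
act = getAction(rnnActor,{rand([oInfo.Dimension 5 8])});
size(act{1})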

Option to use a forward pass, specified as a logical value. When you specify UseForward=true, the function calculates its outputs using forward instead of predict. This allows layers such as batch normalization and dropout to behave as they do during training.

Example: true
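For instance, the following sketch (with an assumed small network containing a dropout layer, independent of the examples above) calls getAction with UseForward=true, so the dropout layer behaves as it does during training rather than in inference mode.

% assumed specifications and actor with a dropout layer for illustration
oInfo = rlNumericSpec([4 1]);
aInfo = rlNumericSpec([2 1]);
dropLayers = [
    featureInputLayer(oInfo.Dimension(1))
    fullyConnectedLayer(8)
    reluLayer
    dropoutLayer(0.5)
    fullyConnectedLayer(aInfo.Dimension(1))
    ];
dropActor = rlContinuousDeterministicActor(dlnetwork(dropLayers),oInfo,aInfo);

% explicitly use a forward pass instead of a prediction pass
act = getAction(dropActor,{rand(oInfo.Dimension)},UseForward=true);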

Output Arguments


Action, returned as a cell array containing either one (for discrete or continuous action spaces) or two (for hybrid action spaces) elements. Each element contains an array of actions corresponding to obs, with dimensions MA-by-LB-by-LS, where:

  • MA corresponds to the dimensions of the associated action specification.

  • LB is the batch size.

  • LS is the sequence length for recurrent neural networks. If the agent, actor, or policy calculating action does not use a recurrent neural network, then LS = 1.

For hybrid action spaces, the first element of the cell array contains the discrete part of the action, while the second element contains the continuous part of the action.

Note

The following continuous action-space actor, policy and agent objects do not enforce the constraints set by the action specification:

In these cases, you must enforce action space constraints within the environment.
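For instance, the following sketch (with assumed limits, unrelated to the examples above) shows one way to clip an out-of-range action to the limits stored in an action specification, as you might do inside a custom environment step function.

% assumed action specification with limits of -1 and 1
aInfo = rlNumericSpec([2 1],LowerLimit=-1,UpperLimit=1);

% clip an out-of-range action to the specification limits elementwise
rawAction = [1.5; -3];
clippedAction = min(max(rawAction,aInfo.LowerLimit),aInfo.UpperLimit)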

Next state of the actor, returned as a cell array. If actor does not use a recurrent neural network, then nextState is an empty cell array.

You can set the state of the actor to nextState using dot notation. For example:

actor.State = nextState;

Updated agent, returned as the same agent object as the agent in the input argument. Note that agent is a handle object, so its internal states (if any) are updated whether or not agent is returned as an output argument. For more information about handle objects, see Handle Object Behavior.

Updated policy object. It is identical to the policy object supplied as the first input argument, except that its internal states (if any) are updated.


Version History

Introduced in R2020a