getMaxQValue

Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations

Since R2020a

collapse all in page

Syntax

[maxQ,maxActionIndex] = getMaxQValue(qValueFcnObj,obs)

[maxQ,maxActionIndex,nextState] = getMaxQValue(___)

___ = getMaxQValue(___,UseForward=useForward)

Description

[maxQ,maxActionIndex] = getMaxQValue(qValueFcnObj,obs) evaluates the discrete-action-space Q-value function critic qValueFcnObj and returns the maximum estimated value over all possible actions maxQ, with the corresponding action index maxActionIndex, given environment observations obs.

example

[maxQ,maxActionIndex,nextState] = getMaxQValue(___) also returns the updated state of qValueFcnObj when it contains a recurrent neural network.

___ = getMaxQValue(___,UseForward=useForward) allows you to explicitly call a forward pass when computing gradients.

Examples

collapse all

Obtain Maximum Q-Value Function Estimates

Open Live Script

Create an observation and action specification objects (or alternatively use getObservationInfo and getActionInfo to extract the specification objects from an environment. For this example, define the observation space as a continuous three-dimensional space, and the action space as a finite set consisting of three possible values (named -1, 0, and 1).

obsInfo = rlNumericSpec([3 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

Create a default DQN agent and extract its critic.

agent = rlDQNAgent(obsInfo,actInfo);
critic = getCritic(agent);

Use getMaxQValue to return the maximum value, among the possible actions, given a random observation. Also return the index corresponding to the action that maximizes the value.

[v,i] = getMaxQValue(critic,{rand(3,1)})

v = single

-0.0430

i = 
3

Create a batch set of 64 random independent observations. The third dimension is the batch size, while the fourth is the sequence length for any recurrent neural network used by the critic (in this case not used).

batchobs = rand(3,1,64,1);

Obtain maximum values for all the observations.

bv = getMaxQValue(critic,{batchobs});
size(bv)

ans = 1×2

     1    64

Select the maximum value corresponding to the 44th observation.

bv(44)

ans = single

-0.0516

Input Arguments

collapse all

`qValueFcnObj` — Q-value function critic
`rlQValueFunction` object | `rlVectorQValueFunction` object

Q-value function critic, specified as an rlQValueFunction or rlVectorQValueFunction object.

`obs` — Environment observations
cell array

Environment observations, specified as a cell array with as many elements as there are observation input channels. Each element of obs contains an array of observations for a single observation input channel.

The dimensions of each element in obs are M_O-by-L_B-by-L_S, where:

M_O corresponds to the dimensions of the associated observation input channel.
L_B is the batch size. To specify a single observation, set L_B = 1. To specify a batch of observations, specify L_B > 1. If qValueFcnObj has multiple observation input channels, then L_B must be the same for all elements of obs.
L_S specifies the sequence length for a recurrent neural network. If qValueFcnObj does not use a recurrent neural network, then L_S = 1. If qValueFcnObj has multiple observation input channels, then L_S must be the same for all elements of obs.

L_B and L_S must be the same for both act and obs.

For more information on input and output formats for recurrent neural networks, see the Algorithms section of lstmLayer.

`useForward` — Option to use parallel training
`false` (default) | `true`

Option to use forward pass, specified as a logical value. When you specify UseForward=true the function calculates its outputs using forward instead of predict. This allows layers such as batch normalization and dropout to appropriately change their behavior for training.

Example: true

Output Arguments

collapse all

`maxQ` — Maximum Q-value estimate
array

Maximum Q-value estimate across all possible discrete actions, returned as a 1-by-L_B-by-L_S array, where:

L_B is the batch size.
L_S specifies the sequence length for a recurrent neural network. If qValueFcnObj does not use a recurrent neural network, then L_S = 1.

`maxActionIndex` — Action index
array

Action index corresponding to the maximum Q value, returned as a 1-by-L_B-by-L_S array, where:

L_B is the batch size.
L_S specifies the sequence length for a recurrent neural network. If qValueFcnObj does not use a recurrent neural network, then L_S = 1.

`nextState` — Updated state of the critic
cell array

Updated state of qValueFcnObj, returned as a cell array. If qValueFcnObj does not use a recurrent neural network, then nextState is an empty cell array.

You can set the state of the critic to state using dot notation. For example:

qValueFcnObj.State=state;

Tips

When the elements of the cell array in inData are dlarray objects, the elements of the cell array returned in outData are also dlarray objects. This allows getMaxQValue to be used with automatic differentiation.

Specifically, you can write a custom loss function that directly uses getMaxQValue and dlgradient within it, and then use dlfeval and dlaccelerate with your custom loss function. For an example, see Train Reinforcement Learning Policy Using Custom Training Loop and Custom Training Loop with Simulink Action Noise.

Version History

Introduced in R2020a

getMaxQValue

Syntax

Description

Examples

Obtain Maximum Q-Value Function Estimates

Input Arguments

`qValueFcnObj` — Q-value function critic
`rlQValueFunction` object | `rlVectorQValueFunction` object

`obs` — Environment observations
cell array

`useForward` — Option to use parallel training
`false` (default) | `true`

Output Arguments

`maxQ` — Maximum Q-value estimate
array

`maxActionIndex` — Action index
array

`nextState` — Updated state of the critic
cell array

Tips

Version History

See Also

Functions

Topics

getMaxQValue

Syntax

Description

Examples

Obtain Maximum Q-Value Function Estimates

Input Arguments

qValueFcnObj — Q-value function critic rlQValueFunction object | rlVectorQValueFunction object

obs — Environment observations cell array

useForward — Option to use parallel training false (default) | true

Output Arguments

maxQ — Maximum Q-value estimate array

maxActionIndex — Action index array

nextState — Updated state of the critic cell array

Tips

Version History

See Also

Functions

Topics

`qValueFcnObj` — Q-value function critic
`rlQValueFunction` object | `rlVectorQValueFunction` object

`obs` — Environment observations
cell array

`useForward` — Option to use parallel training
`false` (default) | `true`

`maxQ` — Maximum Q-value estimate
array

`maxActionIndex` — Action index
array

`nextState` — Updated state of the critic
cell array