rlQValueFunction
Q-value function approximator with a continuous or discrete action space, for reinforcement learning agents
Since R2022a
Description
This object implements a Q-value function approximator that you can use as a
critic for a reinforcement learning agent. A Q-value function (also known as action-value
function) is a mapping from an environment observation-action pair to the value of a policy.
Specifically, its output is a scalar that represents the expected discounted cumulative
long-term reward when an agent starts from the state corresponding to the given observation,
executes the given action, and keeps on taking actions according to the given policy
afterwards. A Q-value function critic therefore needs both the environment state and an action
as inputs. After you create an rlQValueFunction critic, use it to create an
agent such as rlQAgent, rlDQNAgent, rlSARSAAgent, rlDDPGAgent, or rlTD3Agent. For more
information on creating actors and critics, see Create Policies and Value Functions.
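For instance, the following is a minimal sketch of this workflow, assuming a four-element observation channel and a two-element discrete action channel (the specifications, layer names, and layer sizes below are illustrative, not tied to any particular environment):

% Assumed observation and action specifications (example values)
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 1]);

% Network with one input path per environment channel, merged into a single scalar output
obsPath = [featureInputLayer(4,Name="obsIn") fullyConnectedLayer(16,Name="obsFC")];
actPath = [featureInputLayer(1,Name="actIn") fullyConnectedLayer(16,Name="actFC")];
comPath = [additionLayer(2,Name="add") reluLayer(Name="relu") fullyConnectedLayer(1,Name="QValue")];

lg = layerGraph(obsPath);
lg = addLayers(lg,actPath);
lg = addLayers(lg,comPath);
lg = connectLayers(lg,"obsFC","add/in1");
lg = connectLayers(lg,"actFC","add/in2");
net = dlnetwork(lg);

% Create the critic; the input layers are matched to the channels by dimension
critic = rlQValueFunction(net,obsInfo,actInfo);

% Use the critic to create an agent (here, a DQN agent with default options)
agent = rlDQNAgent(critic);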
Creation
Syntax
critic = rlQValueFunction(net,observationInfo,actionInfo)
critic = rlQValueFunction(tab,observationInfo,actionInfo)
critic = rlQValueFunction({basisFcn,W0},observationInfo,actionInfo)
critic = rlQValueFunction(___,Name=Value)
Description
critic = rlQValueFunction(net,observationInfo,actionInfo) creates the Q-value
function object critic. Here, net is the deep neural network used as an
approximation model, and it must have both observation and action input layers
and a single scalar output layer. The network input layers are automatically
associated with the environment observation and action channels according to
the dimension specifications in observationInfo and actionInfo. This function
sets the ObservationInfo and ActionInfo properties of critic to the
observationInfo and actionInfo input arguments, respectively.
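As a sketch of this syntax (the observation and action dimensions, layer names, and sizes are assumptions for illustration), the two input layers below have different sizes, so the function associates them with the observation and action channels automatically:

% Assumed continuous observation and action channels (example dimensions)
obsInfo = rlNumericSpec([8 1]);
actInfo = rlNumericSpec([2 1]);

% Two input layers of different sizes, joined into a single scalar output
lg = layerGraph();
lg = addLayers(lg,[featureInputLayer(8,Name="obs") fullyConnectedLayer(32,Name="fcObs")]);
lg = addLayers(lg,[featureInputLayer(2,Name="act") fullyConnectedLayer(32,Name="fcAct")]);
lg = addLayers(lg,[concatenationLayer(1,2,Name="cat") reluLayer(Name="relu") fullyConnectedLayer(1,Name="QValue")]);
lg = connectLayers(lg,"fcObs","cat/in1");
lg = connectLayers(lg,"fcAct","cat/in2");
critic = rlQValueFunction(dlnetwork(lg),obsInfo,actInfo);

% Check the critic with a random observation-action pair
q = getValue(critic,{rand(8,1)},{rand(2,1)})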
critic = rlQValueFunction(tab,observationInfo,actionInfo) creates the Q-value
function object critic with discrete action and observation spaces from the
Q-value table tab. tab is an rlTable object containing a table with as many
rows as the possible observations and as many columns as the possible actions.
The function sets the ObservationInfo and ActionInfo properties of critic
respectively to the observationInfo and actionInfo input arguments, which in
this case must be scalar rlFiniteSetSpec objects.
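A minimal sketch of this syntax, assuming a small environment with four possible observations and two possible actions (the values are illustrative):

% Assumed finite observation and action sets (example values)
obsInfo = rlFiniteSetSpec(1:4);
actInfo = rlFiniteSetSpec([-1 1]);

% Q-table with one row per possible observation and one column per possible action
qTable = rlTable(obsInfo,actInfo);
qTable.Table = rand(4,2);   % replace the default all-zero values, for illustration

critic = rlQValueFunction(qTable,obsInfo,actInfo);

% Value of taking action -1 when the observation is 3
q = getValue(critic,{3},{-1})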
critic = rlQValueFunction({basisFcn,W0},observationInfo,actionInfo) creates a
Q-value function object critic using a custom basis function as the underlying
approximator. The first input argument is a two-element cell array whose first
element is the handle basisFcn to a custom basis function and whose second
element is the initial weight vector W0. Here, the basis function must have
both observation and action as inputs, and W0 must be a column vector. The
function sets the ObservationInfo and ActionInfo properties of critic to the
observationInfo and actionInfo input arguments, respectively.
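A sketch of this syntax, using a hypothetical basis over a three-element observation and a scalar action (the basis function and dimensions are assumptions for illustration):

% Assumed observation and action specifications (example dimensions)
obsInfo = rlNumericSpec([3 1]);
actInfo = rlNumericSpec([1 1]);

% Custom basis function: takes one observation input and one action input,
% and returns a column vector of basis components
basisFcn = @(obs,act) [obs; act; obs.*act; 1];

% Initial weights: a column vector with one element per basis component
W0 = 0.1*ones(8,1);

critic = rlQValueFunction({basisFcn,W0},obsInfo,actInfo);

% The critic output is W0'*basisFcn(obs,act)
q = getValue(critic,{rand(3,1)},{rand(1,1)})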
critic = rlQValueFunction(___,Name=Value) specifies names of the action or
observation input layers (for network-based approximators) or sets the
UseDevice property of critic using one or more name-value arguments.
Specifying the input layer names allows you to explicitly associate the layers
of your network approximator with specific environment channels. For all types
of approximators, you can specify the device where computations for critic are
executed, for example UseDevice="gpu".
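For example, the following sketch (with assumed specifications and hypothetical layer names) uses the ObservationInputNames and ActionInputNames arguments to associate the input layers with the channels explicitly, and requests GPU execution; UseDevice="gpu" requires a supported GPU device:

% Assumed specifications and a minimal two-input network (hypothetical layer names)
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);

lg = layerGraph();
lg = addLayers(lg,featureInputLayer(4,Name="netObsIn"));
lg = addLayers(lg,featureInputLayer(2,Name="netActIn"));
lg = addLayers(lg,[concatenationLayer(1,2,Name="cat") fullyConnectedLayer(1,Name="QValue")]);
lg = connectLayers(lg,"netObsIn","cat/in1");
lg = connectLayers(lg,"netActIn","cat/in2");
net = dlnetwork(lg);

% Explicitly associate the input layers with the channels and run computations on a GPU
critic = rlQValueFunction(net,obsInfo,actInfo, ...
    ObservationInputNames="netObsIn", ...
    ActionInputNames="netActIn", ...
    UseDevice="gpu");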
Input Arguments
Name-Value Arguments
Properties
Object Functions
rlDDPGAgent | Deep deterministic policy gradient (DDPG) reinforcement learning agent
rlTD3Agent | Twin-delayed deep deterministic (TD3) policy gradient reinforcement learning agent
rlDQNAgent | Deep Q-network (DQN) reinforcement learning agent
rlQAgent | Q-learning reinforcement learning agent
rlSARSAAgent | SARSA reinforcement learning agent
rlSACAgent | Soft actor-critic (SAC) reinforcement learning agent
getValue | Obtain estimated value from a critic given environment observations and actions
getMaxQValue | Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations
evaluate | Evaluate function approximator object given observation (or observation-action) input data
getLearnableParameters | Obtain learnable parameter values from agent, function approximator, or policy object
setLearnableParameters | Set learnable parameter values of agent, function approximator, or policy object
setModel | Set approximation model in function approximator object
getModel | Get approximation model from function approximator object
Examples
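For instance, the following minimal sketch (the basis function, specifications, and input values are illustrative assumptions) creates a critic for a discrete action space and queries it with getValue and getMaxQValue:

% Assumed 2-element observation channel and a discrete action set with three actions
obsInfo = rlNumericSpec([2 1]);
actInfo = rlFiniteSetSpec([1 2 3]);

% Simple (hypothetical) basis-function critic with random initial weights
basisFcn = @(obs,act) [obs; act; obs*act];
critic = rlQValueFunction({basisFcn,rand(5,1)},obsInfo,actInfo);

% Value of a specific observation-action pair
q = getValue(critic,{[0.5;-0.2]},{2})

% Maximum value over all possible actions, and the index of the maximizing action
[maxQ,maxActIdx] = getMaxQValue(critic,{[0.5;-0.2]})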
Version History
Introduced in R2022a

