# rlQValueFunction

Q-Value function approximator object for reinforcement learning agents

*Since R2022a*

## Description

This object implements a Q-value function approximator that you can use as a
critic for a reinforcement learning agent. A Q-value function (also known as action-value
function) is a mapping from an environment observation-action pair to the value of a policy.
Specifically, its output is a scalar that represents the expected discounted cumulative
long-term reward when an agent starts from the state corresponding to the given observation,
executes the given action, and keeps on taking actions according to the given policy
afterwards. A Q-value function critic therefore needs both the environment state and an action
as inputs. After you create an `rlQValueFunction`

critic, use it to create an
agent such as `rlQAgent`

, `rlDQNAgent`

, `rlSARSAAgent`

, `rlDDPGAgent`

, or `rlTD3Agent`

. For more
information on creating representations, see Create Policies and Value Functions.

## Creation

### Syntax

### Description

creates the Q-value function object `critic`

= rlQValueFunction(`net`

,`observationInfo`

,`actionInfo`

)`critic`

. Here,
`net`

is the deep neural network used as an approximation model,
and it must have both observation and action as inputs and a single scalar output. The
network input layers are automatically associated with the environment observation and
action channels according to the dimension specifications in
`observationInfo`

and `actionInfo`

. This
function sets the `ObservationInfo`

and
`ActionInfo`

properties of `critic`

to the
`observationInfo`

and `actionInfo`

input
arguments, respectively.

creates the Q-value function object `critic`

= rlQValueFunction(`tab`

,`observationInfo`

,`actionInfo`

)`critic`

with *discrete
action and observation spaces* from the Q-value table
`tab`

. `tab`

is a `rlTable`

object
containing a table with as many rows as the number of possible observations and as many
columns as the number of possible actions. The function sets the ObservationInfo
and ActionInfo
properties of `critic`

respectively to the
`observationInfo`

and `actionInfo`

input
arguments, which in this case must be scalar `rlFiniteSetSpec`

objects.

creates a Q-value function object `critic`

= rlQValueFunction({`basisFcn`

,`W0`

},`observationInfo`

,`actionInfo`

)`critic`

using a custom basis
function as underlying approximator. The first input argument is a two-element cell
array whose first element is the handle `basisFcn`

to a custom basis
function and whose second element is the initial weight vector `W0`

.
Here the basis function must have both observation and action as inputs and
`W0`

must be a column vector. The function sets the
`ObservationInfo`

and `ActionInfo`

properties of
`critic`

to the `observationInfo`

and
`actionInfo`

input arguments, respectively.

specifies one or more name-value arguments. You can specify the input and output layer
names (to mandate their association with the environment observation and action
channels) for deep neural network approximators. For all types of approximators, you can
specify the computation device, for example `critic`

= rlQValueFunction(___,`Name=Value`

)`UseDevice="gpu"`

.

### Input Arguments

## Properties

## Object Functions

`rlDDPGAgent` | Deep deterministic policy gradient (DDPG) reinforcement learning agent |

`rlTD3Agent` | Twin-delayed deep deterministic (TD3) policy gradient reinforcement learning agent |

`rlDQNAgent` | Deep Q-network (DQN) reinforcement learning agent |

`rlQAgent` | Q-learning reinforcement learning agent |

`rlSARSAAgent` | SARSA reinforcement learning agent |

`rlSACAgent` | Soft actor-critic (SAC) reinforcement learning agent |

`getValue` | Obtain estimated value from a critic given environment observations and actions |

`getMaxQValue` | Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations |

`evaluate` | Evaluate function approximator object given observation (or observation-action) input data |

`gradient` | Evaluate gradient of function approximator object given observation and action input data |

`accelerate` | Option to accelerate computation of gradient for approximator object based on neural network |

`getLearnableParameters` | Obtain learnable parameter values from agent, function approximator, or policy object |

`setLearnableParameters` | Set learnable parameter values of agent, function approximator, or policy object |

`setModel` | Set approximation model in function approximator object |

`getModel` | Get approximation model from function approximator object |

## Examples

## Version History

**Introduced in R2022a**