rlPrioritizedReplayMemory
Description
An off-policy reinforcement learning agent stores experiences in a circular experience buffer.
During training, the agent stores each of its experiences (S,A,R,S',D) in the buffer. Here:
S is the current observation of the environment.
A is the action taken by the agent.
R is the reward for taking action A.
S' is the next observation after taking action A.
D is the is-done signal after taking action A.
The agent then samples mini-batches of experiences from the buffer and uses these mini-batches to update its actor and critic function approximators.
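For illustration, the following minimal sketch shows this workflow: it creates a prioritized replay memory from observation and action specifications, appends one experience structure, and samples a mini-batch. The specification sizes and numeric values are hypothetical, and the experience structure fields (Observation, Action, Reward, NextObservation, IsDone) follow the replay memory interface listed under Object Functions.

% Hypothetical observation and action specifications for illustration.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1]);

% Create a prioritized replay memory that can hold up to 10,000 experiences.
buffer = rlPrioritizedReplayMemory(obsInfo,actInfo,10000);

% Assemble one experience (S,A,R,S',D). Observation and action data are
% stored in cell arrays, one cell per channel.
experience.Observation     = {rand(4,1)};
experience.Action          = {rand(1,1)};
experience.Reward          = 1;
experience.NextObservation = {rand(4,1)};
experience.IsDone          = 0;

% Append the experience to the buffer.
append(buffer,experience);

% Sample a mini-batch of experiences (here, of size 1) from the buffer.
miniBatch = sample(buffer,1);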
By default, built-in off-policy agents (DQN, DDPG, TD3, SAC, MBPO) use an rlReplayMemory object as their experience buffer. Agents uniformly sample data from this buffer. To perform nonuniform prioritized sampling [1], which can improve sample efficiency when training your agent, use an rlPrioritizedReplayMemory object.
For more information on prioritized sampling, see Algorithms.
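For example, the following sketch replaces the default buffer of a default DQN agent with a prioritized replay memory by assigning to the agent's ExperienceBuffer property. The environment, agent type, and buffer length are illustrative choices, not requirements.

% Create a predefined environment and a default DQN agent (illustrative choices).
env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
agent = rlDQNAgent(obsInfo,actInfo);

% Replace the default rlReplayMemory buffer with a prioritized replay memory.
agent.ExperienceBuffer = rlPrioritizedReplayMemory(obsInfo,actInfo,1e6);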
For goal-conditioned tasks, you can also replace your experience buffer with one of the following hindsight replay memory objects.
rlHindsightReplayMemory — Uniform sampling of experiences and generation of hindsight experiences by replacing goals with goal measurements
rlHindsightPrioritizedReplayMemory — Prioritized nonuniform sampling of experiences and generation of hindsight experiences
Creation
Syntax
Description
Input Arguments
Properties
Object Functions
append | Append experiences to replay memory buffer
sample | Sample experiences from replay memory buffer
resize | Resize replay memory experience buffer
reset | Reset environment, agent, experience buffer, or policy object
allExperiences | Return all experiences in replay memory buffer
validateExperience | Validate experiences for replay memory
getActionInfo | Obtain action data specifications from reinforcement learning environment, agent, or experience buffer
getObservationInfo | Obtain observation data specifications from reinforcement learning environment, agent, or experience buffer
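As a brief sketch of a few of these functions applied to a buffer such as the one created in the earlier sketch (the new capacity value is arbitrary, and the calls assume the one- and two-argument forms of these functions):

% Grow the maximum capacity of an existing buffer.
resize(buffer,50000);

% Return every experience currently stored in the buffer.
experiences = allExperiences(buffer);

% Clear the contents of the buffer.
reset(buffer);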
Examples
Limitations
Prioritized experience replay does not support agents that use recurrent neural networks.
Algorithms
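As a brief sketch of the scheme described in [1]: each stored experience i is assigned a priority p_i based on its temporal-difference error delta_i, the probability of sampling an experience grows with its priority through a priority exponent alpha, and an importance-sampling weight with exponent beta (annealed toward 1 during training) corrects the bias introduced by nonuniform sampling. In LaTeX notation, with N the number of experiences in the buffer and epsilon a small positive constant that keeps every experience sampleable:

\[
p_i = |\delta_i| + \epsilon, \qquad
P(i) = \frac{p_i^{\alpha}}{\sum_k p_k^{\alpha}}, \qquad
w_i = \frac{\left(N\,P(i)\right)^{-\beta}}{\max_j \left(N\,P(j)\right)^{-\beta}}
\]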
References
[1] Schaul, Tom, John Quan, Ioannis Antonoglou, and David Silver. "Prioritized Experience Replay." arXiv:1511.05952 [cs], February 25, 2016. https://arxiv.org/abs/1511.05952.
Version History
Introduced in R2022b