
rlPrioritizedReplayMemory

Replay memory experience buffer with prioritized sampling

Since R2022b

    Description

    An off-policy reinforcement learning agent stores experiences in a circular experience buffer.

    During training, the agent stores each of its experiences (S,A,R,S',D) in the buffer. Here:

    • S is the current observation of the environment.

    • A is the action taken by the agent.

    • R is the reward for taking action A.

    • S' is the next observation after taking action A.

    • D is the is-done signal after taking action A.

    The agent then samples mini-batches of experiences from the buffer and uses these mini-batches to update its actor and critic function approximators.

    By default, built-in off-policy agents (DQN, DDPG, TD3, SAC, MBPO) use an rlReplayMemory object as their experience buffer. Agents uniformly sample data from this buffer. To perform nonuniform prioritized sampling [1], which can improve sample efficiency when training your agent, use an rlPrioritizedReplayMemory object. For more information on prioritized sampling, see Algorithms.
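
    As an illustration, the following sketch appends a single experience to such a buffer and then samples from it. The specifications are hypothetical placeholders, and the experience structure is assumed to use the fields Observation, Action, Reward, NextObservation, and IsDone expected by the replay memory object functions.

    % Hypothetical observation and action specifications.
    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlFiniteSetSpec([-1 0 1]);

    % Create a prioritized replay memory buffer.
    buffer = rlPrioritizedReplayMemory(obsInfo,actInfo);

    % Append one experience (S,A,R,S',D) to the buffer.
    experience.Observation     = {rand(4,1)};
    experience.Action          = {0};
    experience.Reward          = 1;
    experience.NextObservation = {rand(4,1)};
    experience.IsDone          = 0;
    append(buffer,experience);

    % Sample a mini-batch of experiences (here, a mini-batch of one).
    miniBatch = sample(buffer,1);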

    For goal-conditioned tasks, you can also replace your experience buffer with a hindsight replay memory object, such as rlHindsightReplayMemory or rlHindsightPrioritizedReplayMemory.

    Creation

    Description

    buffer = rlPrioritizedReplayMemory(obsInfo,actInfo) creates a prioritized replay memory experience buffer that is compatible with the observation and action specifications in obsInfo and actInfo, respectively.


    buffer = rlPrioritizedReplayMemory(obsInfo,actInfo,maxLength) sets the maximum length of the buffer by setting the MaxLength property.
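
    For example, assuming obsInfo and actInfo already exist (for instance, extracted from an environment), the following sketch creates a buffer that stores up to one million experiences.

    buffer = rlPrioritizedReplayMemory(obsInfo,actInfo,1e6);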

    Input Arguments


    obsInfo - Observation specifications

    Observation specifications, specified as a reinforcement learning specification object or an array of specification objects defining properties such as dimensions, data types, and names of the observation signals.

    You can extract the observation specifications from an existing environment or agent using getObservationInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.

    actInfo - Action specifications

    Action specifications, specified as a reinforcement learning specification object defining properties such as dimensions, data types, and names of the action signals.

    You can extract the action specifications from an existing environment or agent using getActionInfo. You can also construct the specification manually using rlFiniteSetSpec or rlNumericSpec.
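
    For example, both routes below produce valid inputs for the buffer; the environment name and the manual specification values are placeholders.

    % Extract the specifications from an existing environment.
    env = rlPredefinedEnv("CartPole-Discrete");
    obsInfo = getObservationInfo(env);
    actInfo = getActionInfo(env);

    % Or construct them manually (placeholder dimensions and values).
    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlFiniteSetSpec([-10 10]);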

    Properties


    MaxLength - Maximum buffer length

    This property is read-only.

    Maximum buffer length, specified as a nonnegative integer.

    To change the maximum buffer length, use the resize function.
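
    For example, assuming buffer is an existing replay memory object, the following sketch increases its maximum length (the new length is a placeholder value).

    resize(buffer,2e6);   % grow the buffer to hold up to 2e6 experiences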

    Length - Number of experiences in buffer

    This property is read-only.

    Number of experiences in buffer, specified as a nonnegative integer.

    PriorityExponent - Priority exponent

    Priority exponent to control the impact of prioritization during probability computation, specified as a nonnegative scalar less than or equal to 1.

    If the priority exponent is zero, the agent uses uniform sampling.

    InitialImportanceSamplingExponent - Initial value of the importance sampling exponent

    Initial value of the importance sampling exponent, specified as a nonnegative scalar less than or equal to 1.

    NumAnnealingSteps - Number of annealing steps

    Number of annealing steps for updating the importance sampling exponent, specified as a positive integer.

    ImportanceSamplingExponent - Current value of the importance sampling exponent

    This property is read-only.

    Current value of the importance sampling exponent, specified as a nonnegative scalar less than or equal to 1.

    During training, ImportanceSamplingExponent is linearly increased from InitialImportanceSamplingExponent to 1 over NumAnnealingSteps steps.
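
    As a numeric sketch of this schedule (illustrative values only), the exponent after a given number of training steps can be computed as follows.

    beta0   = 0.5;    % InitialImportanceSamplingExponent
    nAnneal = 1e4;    % NumAnnealingSteps
    step    = 2500;   % current training step (placeholder)

    % Linear annealing from beta0 to 1 over nAnneal steps.
    currentExponent = min(1, beta0 + (1 - beta0)*step/nAnneal);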

    Object Functions

    append - Append experiences to replay memory buffer
    sample - Sample experiences from replay memory buffer
    resize - Resize replay memory experience buffer
    reset - Reset environment, agent, experience buffer, or policy object
    allExperiences - Return all experiences in replay memory buffer
    validateExperience - Validate experiences for replay memory
    getActionInfo - Obtain action data specifications from reinforcement learning environment, agent, or experience buffer
    getObservationInfo - Obtain observation data specifications from reinforcement learning environment, agent, or experience buffer

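    For example, assuming buffer is an existing replay memory object and experience is a valid experience structure, a few of these functions can be used as in the following sketch.

    validateExperience(buffer,experience);   % error if the experience is incompatible with the buffer
    data = allExperiences(buffer);           % return every experience currently stored
    obsInfo = getObservationInfo(buffer);    % recover the observation specifications from the buffer
    reset(buffer);                           % empty the buffer
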
    Examples


    Create an environment for training the agent. For this example, load a predefined environment.

    env = rlPredefinedEnv("SimplePendulumWithImage-Discrete");

    Extract the observation and action specifications from the environment.

    obsInfo = getObservationInfo(env);
    actInfo = getActionInfo(env);

    Create a DQN agent from the environment specifications.

    agent = rlDQNAgent(obsInfo,actInfo);

    By default, the agent uses a replay memory experience buffer with uniform sampling.

    Replace the default experience buffer with a prioritized replay memory buffer.

    agent.ExperienceBuffer = rlPrioritizedReplayMemory(obsInfo,actInfo);

    Configure the prioritized replay memory options. For example, set the number of annealing steps for updating the importance sampling exponent during training to 1e4, and set both the priority exponent and the initial importance sampling exponent to 0.5.

    agent.ExperienceBuffer.NumAnnealingSteps = 1e4;
    agent.ExperienceBuffer.PriorityExponent = 0.5;
    agent.ExperienceBuffer.InitialImportanceSamplingExponent = 0.5;
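
    After configuring the buffer, a typical next step is to train the agent; the training options below are illustrative placeholders.

    trainOpts = rlTrainingOptions( ...
        MaxEpisodes=500, ...
        MaxStepsPerEpisode=500, ...
        StopTrainingCriteria="AverageReward", ...
        StopTrainingValue=-740);
    trainingStats = train(agent,env,trainOpts);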

    Limitations

    • Prioritized experience replay does not support agents that use recurrent neural networks.

    Algorithms

    Prioritized experience replay [1] assigns each stored experience a priority based on the magnitude of its temporal difference (TD) error, so that experiences from which the agent can learn the most are sampled more often. The probability of sampling experience i is

    P(i) = p_i^α / Σ_k p_k^α

    where p_i is the priority of experience i and α is the PriorityExponent property. When α is zero, sampling is uniform. Because prioritized sampling changes the data distribution, each sampled experience is weighted during the learning update by the importance sampling weight

    w_i = (N P(i))^(-β)

    normalized by the largest weight in the mini-batch. Here, N is the number of experiences in the buffer and β is the ImportanceSamplingExponent property, which is linearly annealed from InitialImportanceSamplingExponent to 1 over NumAnnealingSteps steps. After each learning update, the priorities of the sampled experiences are updated using their new TD errors.
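
    As a standalone numeric sketch of these formulas (not toolbox code), the sampling probabilities and normalized importance sampling weights can be computed from a vector of priorities as follows.

    % Illustrative priorities (for example, absolute TD errors) of five experiences.
    p = [0.5 2.0 0.1 1.2 0.8];

    alpha = 0.6;    % priority exponent (PriorityExponent)
    beta  = 0.4;    % importance sampling exponent (ImportanceSamplingExponent)
    N     = numel(p);

    % Sampling probabilities: P(i) = p_i^alpha / sum_k p_k^alpha.
    P = p.^alpha / sum(p.^alpha);

    % Importance sampling weights, normalized by the maximum weight.
    w = (N*P).^(-beta);
    w = w / max(w);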

    References

    [1] Schaul, Tom, John Quan, Ioannis Antonoglou, and David Silver. "Prioritized Experience Replay." arXiv:1511.05952 [cs.LG], February 25, 2016. https://arxiv.org/abs/1511.05952.

    Version History

    Introduced in R2022b