How to format sequences to store in experience buffer for DRQN?

Question

Imola Fodor 2024-2-27

https://ww2.mathworks.cn/matlabcentral/answers/2087601-how-to-format-sequences-to-store-in-experience-buffer-for-drqn

Commented: Imola Fodor 2024-6-4

For DRQN (Deep Recurrent Q-Learning) in a POMDP, entire sequences need to be stored in the replay buffer instead of individual transitions. For the agent.ExperienceBuffer object, how should the data be constructed? For example, for the Observation element I have tried a 1x1 cell containing a numChannel x sequenceLength matrix, and also a cell array that is directly numChannel x sequenceLength. The idea was to then sample a minibatch of sequences instead of a minibatch of transitions.

In every attempt I get this error:

Error using rl.replay.rlReplayMemory/validateExperience
Observation dimensions must match the dimensions specified in the corresponding specifications.

More specifically, when debugging I see that in the first case (1x1 cell) the code fails at:

for obsCh = 1:numObsChannels
    if ~all(size(NewObs{obsCh}) == obj.InternalReplayMemory_.ObservationDimension{obsCh})
        error(message('rl:general:errIncorrectObservationDim'));
    end
    

And in the second case at:

    if numObsChannels ~=  numel(NewObs)
        error(message('rl:general:errIncorrectObservationDim'));
    end   

MATLAB supports DQN agents with recurrent layers, so there must be some way to store these sequences.

Thank you,

Imola


Answer 1

Shubham 2024-5-29

https://ww2.mathworks.cn/matlabcentral/answers/2087601-how-to-format-sequences-to-store-in-experience-buffer-for-drqn#answer_1464681

Hi Imola,

To handle sequences in the replay buffer for Deep Recurrent Q-Networks (DRQN) in a Partially Observable Markov Decision Process (POMDP) setting in MATLAB, you need to structure your observations and experiences so that they match the format expected by agent.ExperienceBuffer (an rlReplayMemory object, judging by your error message) or by any custom replay buffer you implement. The error you are seeing comes from a mismatch between the dimensions of the observations you are trying to store and the dimensions the replay memory expects from the observation specifications.

Here's how you can approach this:

1. Observation and Action Space Specification

First, ensure that your observation and action spaces are correctly specified to accommodate sequences. For a DRQN, the observation space must account for the sequence length as part of its dimensionality if you're not using a 1x1 cell to encapsulate the entire sequence.

2. Storing Sequences

When storing sequences, the key is to maintain consistency in how observations are represented. If your environment's observation for a single timestep is a vector of size [numChannels, 1], then for a sequence of length sequenceLength, you'd typically have an observation of size [numChannels, sequenceLength].

However, MATLAB's RL framework expects each observation to be encapsulated in a cell array where each cell corresponds to one "channel" or dimension of the observation space. For sequence data, you need to ensure that the entire sequence for a single channel is contained within a single cell, and the dimensions match what the environment and agent expect.
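
As a minimal sketch of that layout, assume a single observation channel of size [4 1] (a made-up size) and the experience field names given in the rlReplayMemory documentation (Observation, Action, Reward, NextObservation, IsDone). One transition could then be laid out as below. Only core MATLAB is used here, so the toolbox itself does not validate anything in this snippet.

```matlab
% Hypothetical sizes: one observation channel holding a 4x1 feature vector.
numFeatures = 4;

% One transition. Signal fields are cell arrays with one cell per channel;
% each cell holds data of exactly the size given in that channel's spec.
experience.Observation     = {rand(numFeatures, 1)};
experience.Action          = {1};      % assumed discrete action value
experience.Reward          = 0.5;
experience.NextObservation = {rand(numFeatures, 1)};
experience.IsDone          = 0;

% Quick look at what would be handed to the buffer.
fprintf('channels: %d, size of channel 1: %s\n', ...
    numel(experience.Observation), mat2str(size(experience.Observation{1})));
```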

3. Correct Approach for Sequences

Given the errors you're encountering, let's clarify the correct approach:

For a 1x1 Cell Approach: If you're trying to encapsulate the entire sequence in a 1x1 cell, ensure that the cell contains a matrix where each column represents a timestep, and the rows represent different features or channels of the observation. This approach might require custom handling in your experience replay mechanism to correctly sample and utilize these sequences.
For a Cell Array Directly Matching numChannel x sequenceLength: This seems to be a misunderstanding. If you're using a cell array where each cell is supposed to represent a channel over the sequence, ensure that each cell actually contains a vector representing the sequence for that channel. The correct dimensionality for a cell array storing sequences would be [1, numChannels] where each cell contains a vector of length sequenceLength, not a matrix of [numChannels, sequenceLength].
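
To make the two failures concrete, the sketch below replays the two checks quoted in the question against both attempts, using core MATLAB only. The spec size [4 1], the sequence length 8, and the reading of the second attempt as a 4x8 cell array of scalars are assumptions for illustration. Going only by those quoted checks, data that passes is one observation per timestep, so the last lines split the sequence that way.

```matlab
obsDim = {[4 1]};   % assumed spec dimensions: one channel, 4x1
seqLen = 8;         % hypothetical sequence length
seq = rand(4, seqLen);

attempt1 = {seq};           % 1x1 cell holding a 4x8 matrix
attempt2 = num2cell(seq);   % 4x8 cell array of scalars

% Check from the second excerpt: one cell per observation channel.
countOk1 = numel(attempt1) == numel(obsDim);   % true
countOk2 = numel(attempt2) == numel(obsDim);   % false: 32 cells, 1 channel

% Check from the first excerpt: cell contents must have the spec size.
% isequal is used instead of == so that differing sizes cannot error.
sizeOk1 = isequal(size(attempt1{1}), obsDim{1});   % false: [4 8] vs [4 1]

% Split the sequence into one observation per timestep.
steps = arrayfun(@(t) {seq(:, t)}, 1:seqLen, 'UniformOutput', false);
passes = @(o) numel(o) == numel(obsDim) && isequal(size(o{1}), obsDim{1});
allOk = all(cellfun(passes, steps));               % true
```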

4. Sampling Mini-batches

When sampling mini-batches of sequences, you must ensure that each sampled experience contains the full sequence as required for the DRQN's input. This might involve custom modifications to the sampling logic to ensure that sequences are kept intact and not broken up.
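
One possible shape for such sampling logic is sketched below in core MATLAB. It assumes transitions are stored one per timestep in a flat array, with a flag marking the last step of each episode; all names and sizes are hypothetical. It is also worth checking the rlReplayMemory documentation for your release: the sample function is documented with a SequenceLength option, which may make custom logic unnecessary.

```matlab
% Hypothetical flat store: T transitions with 4x1 observations, where
% isDone marks the last step of each episode.
T = 50; seqLen = 8; batchSize = 4;
obs = rand(4, T);
isDone = false(1, T);
isDone([20 35 50]) = true;

% A start index s is valid if the window s : s+seqLen-1 fits in the store
% and no episode ends before the window's last step.
starts = [];
for s = 1:(T - seqLen + 1)
    if ~any(isDone(s : s+seqLen-2))
        starts(end+1) = s; %#ok<AGROW>
    end
end

% Draw start indices (with replacement) and stack the windows as
% channels x batch x time, an assumed layout for a recurrent network.
pick = starts(randi(numel(starts), 1, batchSize));
batch = zeros(4, batchSize, seqLen);
for b = 1:batchSize
    idx = pick(b) : pick(b)+seqLen-1;
    batch(:, b, :) = reshape(obs(:, idx), 4, 1, seqLen);
end
disp(size(batch))
```

Filtering the start indices keeps each sampled window inside a single episode, so the recurrent state is never carried across an episode boundary.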

5. Debugging Tips

Check Dimensionality at Every Step: Print out the dimensions of your observations at various points (creation, before storing, and during retrieval) to ensure they match expectations.
Align with Agent Specifications: Double-check the agent's expected input dimensions, especially if you're using recurrent layers, to ensure compatibility.
Custom Replay Buffer: If the built-in replay memory behind agent.ExperienceBuffer doesn't meet your needs for sequence handling, consider implementing a custom replay buffer that explicitly supports sequences in the way you require.
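
For the dimension checks, a small helper along these lines may be enough. It is a sketch in core MATLAB; the two-channel observation is made up for illustration.

```matlab
% Summarize a cell-array observation as a list of per-channel sizes.
describe = @(c) strjoin(cellfun(@(x) mat2str(size(x)), c, ...
    'UniformOutput', false), ', ');

obs = {rand(4, 1), rand(2, 3)};   % hypothetical two-channel observation
fprintf('before storing: %d channel(s), sizes %s\n', numel(obs), describe(obs));
```

Calling the same helper at creation, before storing, and after retrieval makes it easy to spot the step where a size stops matching the specification.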

Remember, the key to successfully implementing DRQN in MATLAB is ensuring that your observation sequences are correctly formatted and that your replay buffer is capable of handling, storing, and sampling these sequences in a way that aligns with the expected input structure of your recurrent neural network.

1 Comment

Imola Fodor 2024-6-4

Hello Shubham, this answer is very long, but unfortunately I don't see any concrete solutions. Can you point me to some documentation where I can read about "...each observation to be encapsulated in a cell array where each cell corresponds to one "channel" or dimension of the observation space. For sequence data, ..."? Another thing: I see statements such as "This might involve custom modifications to the sampling logic" or "This approach might require custom handling in your experience replay mechanism"...
