append
Description
append(buffer,experience) appends the experiences in experience to the replay memory buffer.
append(buffer,experience,dataSourceID) appends experiences for the specified data source to the replay memory buffer.
Examples
Create Experience Buffer
Define observation specifications for the environment. For this example, assume that the environment has a single observation channel with three continuous signals in specified ranges.
obsInfo = rlNumericSpec([3 1],...
    LowerLimit=0,...
    UpperLimit=[1;5;10]);
Define action specifications for the environment. For this example, assume that the environment has a single action channel with two continuous signals in specified ranges.
actInfo = rlNumericSpec([2 1],...
    LowerLimit=0,...
    UpperLimit=[5;10]);
Create an experience buffer with a maximum length of 20,000.
buffer = rlReplayMemory(obsInfo,actInfo,20000);
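The maximum length you pass to rlReplayMemory bounds how many experiences the buffer holds. As a minimal sketch, assuming the MaxLength and Length property names of rlReplayMemory (verify them against the object reference for your release), you can inspect the capacity and current fill level of a freshly created buffer. The block recreates the specifications so that it runs on its own.

```matlab
% Recreate the specifications and buffer from this example
obsInfo = rlNumericSpec([3 1],LowerLimit=0,UpperLimit=[1;5;10]);
actInfo = rlNumericSpec([2 1],LowerLimit=0,UpperLimit=[5;10]);
buffer = rlReplayMemory(obsInfo,actInfo,20000);

% Capacity set at construction (assumed property name: MaxLength)
capacity = buffer.MaxLength;

% Number of stored experiences (assumed property name: Length);
% nothing has been appended yet
stored = buffer.Length;
```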
Append a single experience to the buffer using a structure. Each experience contains the following elements: current observation, action, next observation, reward, and is-done.
For this example, create an experience with random observation, action, and reward values. Indicate that this experience is not a terminal condition by setting the IsDone value to 0.
exp.Observation = {obsInfo.UpperLimit.*rand(3,1)};
exp.Action = {actInfo.UpperLimit.*rand(2,1)};
exp.Reward = 10*rand(1);
exp.NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
exp.IsDone = 0;
Before appending the experience to the buffer, you can validate whether it is compatible with the buffer. The validateExperience function generates an error if the experience is incompatible with the buffer.
validateExperience(buffer,exp)
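To see what an incompatible experience looks like, the following standalone sketch builds an experience whose action has three elements instead of the two required by the action specification, and catches the error that validateExperience raises. The variable names badExp and isCompatible are illustrative only.

```matlab
obsInfo = rlNumericSpec([3 1],LowerLimit=0,UpperLimit=[1;5;10]);
actInfo = rlNumericSpec([2 1],LowerLimit=0,UpperLimit=[5;10]);
buffer = rlReplayMemory(obsInfo,actInfo,20000);

% Deliberately malformed experience: the action is 3-by-1,
% but the action specification expects a 2-by-1 vector
badExp.Observation = {obsInfo.UpperLimit.*rand(3,1)};
badExp.Action = {rand(3,1)};
badExp.Reward = 10*rand(1);
badExp.NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
badExp.IsDone = 0;

% Catch the validation error instead of letting it stop the script
isCompatible = true;
try
    validateExperience(buffer,badExp)
catch err
    isCompatible = false;
    disp(err.message)
end
```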
Append the experience to the buffer.
append(buffer,exp);
You can also append a batch of experiences to the experience buffer using a structure array. For this example, append a sequence of 100 random experiences, with the final experience representing a terminal condition.
for i = 1:100
    expBatch(i).Observation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).Action = {actInfo.UpperLimit.*rand(2,1)};
    expBatch(i).Reward = 10*rand(1);
    expBatch(i).NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).IsDone = 0;
end
expBatch(100).IsDone = 1;
validateExperience(buffer,expBatch)
append(buffer,expBatch);
After appending experiences to the buffer, you can sample mini-batches of experiences for training your RL agent. For example, randomly sample a batch of 50 experiences from the buffer.
miniBatch = sample(buffer,50);
You can sample a horizon of data from the buffer. For example, sample a horizon of 10 consecutive experiences with a discount factor of 0.95.
horizonSample = sample(buffer,1,...
    NStepHorizon=10,...
    DiscountFactor=0.95);
The returned sample includes the following information.
- Observation and Action are the observation and action from the first experience in the horizon.
- NextObservation and IsDone are the next observation and termination signal from the final experience in the horizon.
- Reward is the cumulative reward across the horizon using the specified discount factor.
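Assuming the standard N-step discounted return, the Reward field of a horizon sample corresponds to r1 + gamma*r2 + ... + gamma^(N-1)*rN, where r1 through rN are the per-step rewards in the horizon. The plain MATLAB sketch below computes that sum for a hypothetical reward vector; it does not call sample, and it ignores episodes that end partway through the horizon.

```matlab
% Hypothetical per-step rewards over a horizon of N = 3 experiences
rewards = [1 1 1];
gamma = 0.5;   % discount factor

% Discount weights gamma^0, gamma^1, ..., gamma^(N-1)
weights = gamma.^(0:numel(rewards)-1);

% Discounted cumulative reward across the horizon: 1 + 0.5 + 0.25
horizonReward = sum(weights.*rewards);
```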
You can also sample a sequence of consecutive experiences. In this case, the structure fields contain arrays with values for all sampled experiences.
sequenceSample = sample(buffer,1,...
SequenceLength=20);
Create Experience Buffer with Multiple Observation Channels
Define observation specifications for the environment. For this example, assume that the environment has two observation channels: one channel with two continuous observations and one channel with a three-valued discrete observation.
obsContinuous = rlNumericSpec([2 1],...
    LowerLimit=0,...
    UpperLimit=[1;5]);
obsDiscrete = rlFiniteSetSpec([1 2 3]);
obsInfo = [obsContinuous obsDiscrete];
Define action specifications for the environment. For this example, assume that the environment has a single action channel with two continuous actions in specified ranges.
actInfo = rlNumericSpec([2 1],...
    LowerLimit=0,...
    UpperLimit=[5;10]);
Create an experience buffer with a maximum length of 5,000.
buffer = rlReplayMemory(obsInfo,actInfo,5000);
Append a sequence of 50 random experiences to the buffer.
for i = 1:50
    exp(i).Observation = ...
        {obsInfo(1).UpperLimit.*rand(2,1) randi(3)};
    exp(i).Action = {actInfo.UpperLimit.*rand(2,1)};
    exp(i).NextObservation = ...
        {obsInfo(1).UpperLimit.*rand(2,1) randi(3)};
    exp(i).Reward = 10*rand(1);
    exp(i).IsDone = 0;
end
append(buffer,exp);
After appending experiences to the buffer, you can sample mini-batches of experiences for training your RL agent. For example, randomly sample a batch of 10 experiences from the buffer.
miniBatch = sample(buffer,10);
Input Arguments
buffer — Experience buffer
rlReplayMemory object | rlPrioritizedReplayMemory object | rlHindsightReplayMemory object | rlHindsightPrioritizedReplayMemory object
Experience buffer, specified as one of the following replay memory objects.
- rlReplayMemory
- rlPrioritizedReplayMemory
- rlHindsightReplayMemory
- rlHindsightPrioritizedReplayMemory
experience — Experience to append to the buffer
structure | structure array | [] | {} | ''
Experience to append to the buffer, specified as a structure or structure array with the following fields. If experience is empty, or if it contains an empty structure, nothing is appended to the buffer.
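Because an empty experience input appends nothing, code that collects experiences conditionally can pass an empty value without a special case. A minimal sketch, assuming the Length property of rlReplayMemory reports the number of stored experiences:

```matlab
obsInfo = rlNumericSpec([3 1],LowerLimit=0,UpperLimit=[1;5;10]);
actInfo = rlNumericSpec([2 1],LowerLimit=0,UpperLimit=[5;10]);
buffer = rlReplayMemory(obsInfo,actInfo,100);

% An empty input is accepted and leaves the buffer unchanged
append(buffer,[]);
lengthAfterEmpty = buffer.Length;
```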
Observation — Observation
cell array
Observation, specified as a cell array with length equal to the number of observation specifications used to create the buffer. The dimensions of each element in Observation must match the dimensions in the corresponding observation specification.
Action — Agent action
cell array
Action taken by the agent, specified as a cell array with length equal to the number of action specifications used to create the buffer. The dimensions of each element in Action must match the dimensions in the corresponding action specification.
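Since each cell in Observation and Action must match the Dimension of its specification, one way to build placeholder cells is to loop over the specification array. This sketch assumes numeric specifications whose limits include zero; for an rlFiniteSetSpec channel, you would use a member of its Elements set instead of zeros. The names obsCells and placeholderExp are illustrative.

```matlab
% Two numeric observation channels and one numeric action channel
obsA = rlNumericSpec([2 1],LowerLimit=0,UpperLimit=[1;5]);
obsB = rlNumericSpec([3 1],LowerLimit=0,UpperLimit=[1;1;1]);
obsInfo = [obsA obsB];
actInfo = rlNumericSpec([2 1],LowerLimit=0,UpperLimit=[5;10]);
buffer = rlReplayMemory(obsInfo,actInfo,100);

% One cell per observation channel, sized from each specification
obsCells = cell(1,numel(obsInfo));
for k = 1:numel(obsInfo)
    obsCells{k} = zeros(obsInfo(k).Dimension);
end

placeholderExp.Observation = obsCells;
placeholderExp.Action = {zeros(actInfo.Dimension)};
placeholderExp.Reward = 0;
placeholderExp.NextObservation = obsCells;
placeholderExp.IsDone = 0;

% Check compatibility, then store the experience
validateExperience(buffer,placeholderExp)
append(buffer,placeholderExp);
```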
Reward — Reward value
scalar
Reward value obtained by taking the specified action from the starting observation, specified as a scalar.
NextObservation — Next observation
cell array
Next observation reached by taking the specified action from the starting observation, specified as a cell array with the same format as Observation.
IsDone — Termination signal
0 | 1 | 2
Termination signal, specified as one of the following values.
- 0 — This experience is not the end of an episode.
- 1 — The episode terminated because the environment generated a termination signal.
- 2 — The episode terminated by reaching the maximum episode length.
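When you collect experiences yourself, you need to map the reason an episode ended to one of these values. The sketch below uses a hypothetical helper, isDoneFlag, that takes two logical conditions; giving the environment termination signal precedence when both conditions hold is an assumption of this sketch, not documented behavior.

```matlab
% Hypothetical helper mapping episode-end conditions to IsDone:
%   terminated -> environment generated a termination signal (1)
%   truncated  -> maximum episode length reached (2)
% If both are true, this sketch returns 1 (termination takes precedence)
isDoneFlag = @(terminated,truncated) ...
    double(terminated) + 2*double(~terminated & truncated);

flagRunning   = isDoneFlag(false,false);
flagTerminal  = isDoneFlag(true,false);
flagTruncated = isDoneFlag(false,true);
flagBoth      = isDoneFlag(true,true);
```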
dataSourceID — Data source index
0 (default) | nonnegative integer | array of nonnegative integers
Data source index, specified as a nonnegative integer or array of nonnegative integers.
- If experience is a scalar structure, specify dataSourceID as a scalar integer.
- If experience is a structure array, specify dataSourceID as an array with length equal to the length of experience. You can specify different data source indices for each element of experience. If all elements in experience come from the same data source, you can specify dataSourceID as a scalar integer.
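The three dataSourceID forms described above can be exercised with a standalone sketch. The data source indices 1 through 4 are arbitrary illustrative values, and the final check assumes the Length property of rlReplayMemory.

```matlab
obsInfo = rlNumericSpec([3 1],LowerLimit=0,UpperLimit=[1;5;10]);
actInfo = rlNumericSpec([2 1],LowerLimit=0,UpperLimit=[5;10]);
buffer = rlReplayMemory(obsInfo,actInfo,1000);

% Five random experiences
for i = 1:5
    expBatch(i).Observation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).Action = {actInfo.UpperLimit.*rand(2,1)};
    expBatch(i).Reward = 10*rand(1);
    expBatch(i).NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
    expBatch(i).IsDone = 0;
end

% Scalar structure: scalar data source index
append(buffer,expBatch(1),1);

% Structure array from a single source: scalar index for all elements
append(buffer,expBatch(2:3),2);

% Structure array from different sources: one index per element
append(buffer,expBatch(4:5),[3 4]);
```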
Version History
Introduced in R2022a