Mixed Type Observation Variables in RL

Hi.
I want to design a DQN agent to train in an environment whose observation consists of 5 continuous double variables, one discrete variable with values [0 1], and two discrete variables with values [-1 0 1]. I define the observation info as:
ObsInfo = [
    rlNumericSpec([1 5], 'Name', 'X15'), ...      % 5 continuous observation variables
    rlFiniteSetSpec([0 1], 'Name', 'X6'), ...     % discrete observation variable with values [0 1]
    rlFiniteSetSpec([-1 0 1], 'Name', 'X7'), ...  % 1st discrete observation variable with values [-1 0 1]
    rlFiniteSetSpec([-1 0 1], 'Name', 'X8')       % 2nd discrete observation variable with values [-1 0 1]
];
ActionInfo = rlFiniteSetSpec([-2, -1, 0, 1, 2]);
Therefore, the reset and step functions return the observation in the form:
Obs = {[X1; X2; X3; X4; X5], X6, X7, X8}
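For reference, a minimal sketch of my reset function (myResetFunction is just an illustrative name, and the state values below are placeholders computed by the real environment):
function [InitialObservation, LoggedSignals] = myResetFunction()
    X15 = zeros(1, 5);                        % 5 continuous variables, 1x5 to match rlNumericSpec([1 5])
    X6 = 0;                                   % discrete variable from {0, 1}
    X7 = 0;                                   % discrete variable from {-1, 0, 1}
    X8 = 0;                                   % discrete variable from {-1, 0, 1}
    InitialObservation = {X15, X6, X7, X8};   % one cell entry per observation channel
    LoggedSignals.State = InitialObservation;
end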
Then I define a deep neural network as follows:
layers = [
    featureInputLayer(8, 'Normalization', 'none', 'Name', 'state')   % 8 observation variables
    fullyConnectedLayer(100, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(100, 'Name', 'fc2')
    reluLayer('Name', 'relu2')
    fullyConnectedLayer(5, 'Name', 'fc3')                            % number of actions
];
dnn = dlnetwork(layers);
critic = rlVectorQValueFunction(dnn, ObsInfo, ActionInfo);
However, this code leads to the following error:
The number of network input layers must be equal to the number of observation channels in the environment specification object.
Could you please help me fix this issue? Is the definition of ObsInfo correct for this type of problem? And is the network architecture OK?
Thank you.

Answers (1)

Shantanu Dixit on 10 Sep 2024
Hi Mahmood,
The issue is a mismatch between the number of observation channels in the environment specification and the number of input layers in the network. One way to incorporate both continuous and discrete observations is to use a single continuous observation channel; the discrete observations still take values from a finite set as dictated by the environment. This approach requires changing the environment to output all observation values as one continuous vector:
ObsInfo = rlNumericSpec([1, 8], 'Name', 'Observations');
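With this spec, the original single-input network from the question can be used unchanged; the reset and step functions then return one 8-element vector instead of a cell array. A minimal sketch, reusing ObsInfo from the line above:
ActionInfo = rlFiniteSetSpec([-2, -1, 0, 1, 2]);
layers = [
    featureInputLayer(8, 'Normalization', 'none', 'Name', 'state')   % all 8 observation variables
    fullyConnectedLayer(100, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(100, 'Name', 'fc2')
    reluLayer('Name', 'relu2')
    fullyConnectedLayer(5, 'Name', 'fc3')                            % one output per action
];
dnn = dlnetwork(layers);
critic = rlVectorQValueFunction(dnn, ObsInfo, ActionInfo);
% Inside reset/step, flatten the mixed observation, e.g.:
% Obs = [X1, X2, X3, X4, X5, X6, X7, X8];   % 1x8 row vector matching the spec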
Alternatively, if the observations are to be provided as separate channels, as in the code above, the network must be modified to handle multiple input channels. The following steps describe this briefly:
  1. Separate input layers for each observation channel, followed by fully connected layers for feature extraction
  2. Concatenating the outputs from the separate channels
  3. Passing the concatenated features to the base network for further processing
Below is reference code for the above, using the same base network as before:
%% 1. Separate input layers for each channel
continuousInput = featureInputLayer(5, 'Normalization', 'none', 'Name', 'continuousInput');
binaryInput = featureInputLayer(1, 'Normalization', 'none', 'Name', 'binaryInput');
ternaryInput1 = featureInputLayer(1, 'Normalization', 'none', 'Name', 'ternaryInput1');
ternaryInput2 = featureInputLayer(1, 'Normalization', 'none', 'Name', 'ternaryInput2');
continuousPath = [
    continuousInput
    fullyConnectedLayer(10, 'Name', 'fc_continuous')
    reluLayer('Name', 'relu_continuous')
];
binaryPath = [
    binaryInput
    fullyConnectedLayer(5, 'Name', 'fc_binary')
    reluLayer('Name', 'relu_binary')
];
ternaryPath1 = [
    ternaryInput1
    fullyConnectedLayer(5, 'Name', 'fc_ternary1')
    reluLayer('Name', 'relu_ternary1')
];
ternaryPath2 = [
    ternaryInput2
    fullyConnectedLayer(5, 'Name', 'fc_ternary2')
    reluLayer('Name', 'relu_ternary2')
];
%% 2. Concatenating outputs from all the channels
concatLayer = concatenationLayer(1, 4, 'Name', 'concat');
% Further processing after concatenation
commonPath = [
    fullyConnectedLayer(100, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(100, 'Name', 'fc2')
    reluLayer('Name', 'relu2')
    fullyConnectedLayer(5, 'Name', 'fc3')   % one output per action
];
% Assemble the network as a layer graph
lgraph = layerGraph();
lgraph = addLayers(lgraph, continuousPath);
lgraph = addLayers(lgraph, binaryPath);
lgraph = addLayers(lgraph, ternaryPath1);
lgraph = addLayers(lgraph, ternaryPath2);
lgraph = addLayers(lgraph, concatLayer);
lgraph = addLayers(lgraph, commonPath);
% Connect each channel's output to the concatenation layer
lgraph = connectLayers(lgraph, 'relu_continuous', 'concat/in1');
lgraph = connectLayers(lgraph, 'relu_binary', 'concat/in2');
lgraph = connectLayers(lgraph, 'relu_ternary1', 'concat/in3');
lgraph = connectLayers(lgraph, 'relu_ternary2', 'concat/in4');
lgraph = connectLayers(lgraph, 'concat', 'fc1');
%% 3. Pass the assembled graph to the base network and create the critic
dnn = dlnetwork(lgraph);
ObsInfoContinuous = rlNumericSpec([1 5], 'Name', 'ContinuousObs');
ObsInfoBinary = rlFiniteSetSpec([0 1], 'Name', 'BinaryObs');
ObsInfoTernary1 = rlFiniteSetSpec([-1 0 1], 'Name', 'TernaryObs1');
ObsInfoTernary2 = rlFiniteSetSpec([-1 0 1], 'Name', 'TernaryObs2');
ActionInfo = rlFiniteSetSpec([-2, -1, 0, 1, 2]);
critic = rlVectorQValueFunction(dnn, ...
    [ObsInfoContinuous, ObsInfoBinary, ObsInfoTernary1, ObsInfoTernary2], ...
    ActionInfo, ...
    'ObservationInputNames', {'continuousInput', 'binaryInput', 'ternaryInput1', 'ternaryInput2'});
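Once the critic is created without error, a quick sanity check and agent construction could look like this (a sketch assuming default DQN agent options):
% Query Q-values for one sample observation (one cell entry per channel)
sampleObs = {rand(1, 5), 0, 1, -1};
qValues = getValue(critic, sampleObs)   % 5x1 vector, one Q-value per action
% Create the DQN agent from the vector Q-value critic
agent = rlDQNAgent(critic);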
For more information on creating observation specifications with multiple channels, refer to the MathWorks documentation for rlNumericSpec and rlFiniteSetSpec.
