Output of generated policy function differs from trained agent
Dear Mathworks Team,
I have trained a DDPG agent which receives 2 observations. Using the approach described in:
I generated a function evaluatePolicy.m which accepts an input of shape (2,1,1) and outputs a scalar. However, the output differs from that of my agent during training.
During training, the following lines define the action properties in the environment and training setup (createSineAgent.m), not in the neural-network definition of the agent (createDDGPNetworks.m):
numAct = 1;
actionInfo = rlNumericSpec([numAct 1],'LowerLimit',0 ,'UpperLimit', 1);
actionInfo.Name = 'sine_amplitude';
This prevents actions greater than 1 or smaller than 0 from being applied: during training the output always lies between 0 and 1 and is clipped at those bounds.
However, the output of the generated evaluatePolicy.m seems to range between -1 and 1 rather than between 0 and 1. Why is that?
Examples:
>> evaluatePolicy(reshape([-0.1515581,-0.1515581],2,1,1))
ans = 0.9986
>> evaluatePolicy(reshape([-0.1515581,-0.6],2,1,1))
ans = -1
>> evaluatePolicy(reshape([-0.1515581,100],2,1,1))
ans = -1
>> evaluatePolicy(reshape([-0.1515581,-100],2,1,1))
ans = -1
I was expecting the output to lie between 0 and 1, as defined in:
numAct = 1;
actionInfo = rlNumericSpec([numAct 1],'LowerLimit',0 ,'UpperLimit', 1);
actionInfo.Name = 'sine_amplitude';
Does the approach described in:
not take the ActionInfo into account?
The output of
>> type evaluatePolicy.m
is:
function action1 = evaluatePolicy(observation1)
%#codegen
% Reinforcement Learning Toolbox
% Generated on: 22-Sep-2021 19:49:51
action1 = localEvaluate(observation1);
end
%% Local Functions
function action1 = localEvaluate(observation1)
persistent policy
if isempty(policy)
    policy = coder.loadDeepLearningNetwork('agentData.mat','policy');
end
action1 = predict(policy, observation1);
end
while it is stated in:
that the output should look more similar to:
function action1 = evaluatePolicy(observation1)
%#codegen
% Reinforcement Learning Toolbox
% Generated on: 23-Feb-2021 18:52:32
actionSet = [-10 10];
% Select action from sampled probabilities
probabilities = localEvaluate(observation1);
% Normalize the probabilities
p = probabilities(:)'/sum(probabilities);
% Determine which action to take
edges = min([0 cumsum(p)],1);
edges(end) = 1;
[~,actionIndex] = histc(rand(1,1),edges); %#ok<HISTC>
action1 = actionSet(actionIndex);
end
%% Local Functions
function probabilities = localEvaluate(observation1)
persistent policy
if isempty(policy)
policy = coder.loadDeepLearningNetwork('agentData.mat','policy');
end
observation1 = observation1(:)';
probabilities = predict(policy, observation1);
end
In this output I can see a variable
actionSet = [-10 10];
which appears to account for the action boundaries. In my generated function, this is missing.
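One way to confirm where the [-1,1] range comes from is to inspect the actor network's final layer. A sketch, assuming agent is the trained rlDDPGAgent in the workspace (getActor and getModel exist in recent Reinforcement Learning Toolbox releases, but the exact accessors may differ by version):

```matlab
% Inspect the actor to see whether it ends in a tanhLayer (range [-1,1]).
actor = getActor(agent);    % actor representation of the trained agent
net   = getModel(actor);    % underlying network of the actor
disp(net.Layers(end))       % a final tanhLayer would explain the [-1,1] output

% If so, one option is to append a scalingLayer when building the actor,
% mapping [-1,1] onto [0,1] before training and code generation:
%   scalingLayer('Name','actionScale','Scale',0.5,'Bias',0.5)
```

Generated code that ends in a bare predict call would then produce outputs already inside the [0,1] action bounds, with no post-processing needed.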