"Unable to evaluate the loss function. Check the loss function and ensure it runs successfully": `gradient` can't access the custom loss function

I am trying to build a custom reinforcement learning environment in which multiple agents each have their own policy network, and I am stuck on the training part (I am trying to follow an approach similar to this example).
My policy network accepts an array of size 21 as input and outputs a single element from [-1, 0, 1].
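For illustration only: a stochastic actor like this one ends in a softmax over the three possible actions, and the single output action is drawn from those probabilities. The sketch below shows that sampling step in plain MATLAB. The probabilities are hypothetical; in the question, the toolbox performs this sampling internally and the probabilities come from the network.

```matlab
% Action set from the question; probs is a hypothetical softmax output
% (one probability per element of actionElements, summing to 1).
actionElements = [-1 0 1];
probs = [0.25 0.5 0.25];

% Inverse-CDF sampling: pick the first index whose cumulative
% probability reaches a uniform random draw in (0, 1).
u = rand();
idx = find(u <= cumsum(probs), 1, 'first');
action = actionElements(idx);
fprintf('sampled action: %d\n', action)
```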
Here is my code (condensed from several files into one; sorry for the mess):
```matlab
clear
close all

%% Model parameters
T_init = 0;
T_final = 100;
dt = 1;

rng("shuffle")
baseEnv = baseEnvironment();
% validateEnvironment(baseEnv)

% Place the two agents at distinct random positions.
p1_pos = randi(baseEnv.L,1);
p2_pos = randi(baseEnv.L,1);
while p1_pos == p2_pos
    p2_pos = randi(baseEnv.L,1);
end
agent1 = IMAgent(baseEnv, p1_pos, 1, 'o');
agent2 = IMAgent(baseEnv, p2_pos, 2, 'x');
listOfAgents = [agent1; agent2];
multiAgentEnv = multiAgentEnvironment(listOfAgents);

actInfo = getActionInfo(baseEnv);
obsInfo = getObservationInfo(baseEnv);

%% Build agent1
actorNetwork = [imageInputLayer([obsInfo.Dimension(1) 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(24,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(24,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numel(actInfo.Elements),'Name','output')
    softmaxLayer('Name','actionProb')];
actorOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlStochasticActorRepresentation(actorNetwork,...
    obsInfo,actInfo,'Observation','state',actorOpts);
actor = setLoss(actor, @actorLossFunction);
%obj.brain = rlPGAgent(actor,baseline,agentOpts);
agentOpts = rlPGAgentOptions('UseBaseline',false, 'DiscountFactor', 0.99);
agent1.brain = rlPGAgent(actor,agentOpts);

%% Build agent2
actorNetwork = [imageInputLayer([obsInfo.Dimension(1) 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(24,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(24,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numel(actInfo.Elements),'Name','output')
    softmaxLayer('Name','actionProb')];
actorOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlStochasticActorRepresentation(actorNetwork,...
    obsInfo,actInfo,'Observation','state',actorOpts);
actor = setLoss(actor, @actorLossFunction);
%obj.brain = rlPGAgent(actor,baseline,agentOpts);
agentOpts = rlPGAgentOptions('UseBaseline',false, 'DiscountFactor', 0.99);
agent2.brain = rlPGAgent(actor,agentOpts);

%% Training loop
averageGrad = [];
averageSqGrad = [];
learnRate = 0.05;
gradDecay = 0.75;
sqGradDecay = 0.95;
numOfEpochs = 1;
numEpisodes = 5000;
maxStepsPerEpisode = 250;
discountFactor = 0.995;
aveWindowSize = 100;
trainingTerminationValue = 220;
loss_history = [];

for i = 1:numOfEpochs
    action_hist = [];
    reward_hist = [];
    observation_hist = [multiAgentEnv.baseEnv.state];

    % Roll out one episode, stacking per-step data along dimension 3.
    for t = T_init:1:T_final
        actionList = multiAgentEnv.act();
        [observation, reward, multiAgentEnv.isDone, ~] = multiAgentEnv.step(actionList);
        if t == T_final
            multiAgentEnv.isDone = true;
        end
        action_hist = cat(3, action_hist, actionList);
        reward_hist = cat(3, reward_hist, reward);
        if multiAgentEnv.isDone == true
            break
        else
            observation_hist = cat(3, observation_hist, observation);
        end
    end
    if size(observation_hist,3) ~= size(action_hist,3)
        warning("Observation and action histories have different lengths.")
    end
    clear observation reward

    % Retrieve the actor from agent1's PG agent and build the batch.
    actor = getActor(agent1.brain);
    batchSize = min(t,maxStepsPerEpisode);
    observations = observation_hist;
    actions = action_hist(1,:,:);
    rewards = reward_hist(1,:,:);
    observationBatch = permute(observations(:,:,1:batchSize), [2,1,3]);
    actionBatch = actions(:,:,1:batchSize);
    rewardBatch = rewards(:,1:batchSize);

    % Discounted return for each step of the episode.
    discountedReturn = zeros(1,int32(batchSize));
    for t = 1:batchSize
        G = 0;
        for k = t:batchSize
            G = G + discountFactor ^ (k-t) * rewardBatch(k);
        end
        discountedReturn(t) = G;
    end

    lossData.batchSize = batchSize;
    lossData.actInfo = actInfo;
    lossData.actionBatch = actionBatch;
    lossData.discountedReturn = discountedReturn;

    % Compute the gradient of the loss with respect to the policy
    % parameters.
    actorGradient = gradient(actor,'loss-parameters', {observationBatch},lossData);

    % Reset the environment with new, distinct agent positions.
    p1_pos = randi(baseEnv.L,1);
    p2_pos = randi(baseEnv.L,1);
    while p1_pos == p2_pos
        p2_pos = randi(baseEnv.L,1);
    end
    multiAgentEnv.reset([p1_pos; p2_pos]);
end

function loss = actorLossFunction(policy, lossData)
    % Create the action indication matrix.
    batchSize = lossData.batchSize;
    Z = repmat(lossData.actInfo.Elements',1,batchSize);
    actionIndicationMatrix = lossData.actionBatch(:,:) == Z;
    % Resize the discounted return to the size of policy.
    G = actionIndicationMatrix .* lossData.discountedReturn;
    G = reshape(G,size(policy));
    % Round any policy values less than eps to eps.
    policy(policy < eps) = eps;
    % Compute the loss.
    loss = -sum(G .* log(policy),'all');
end
```
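The double loop that fills `discountedReturn` costs O(n²) in the episode length. As an optional refactoring, the same values can be computed in one backward pass with the recursion G_t = r_t + γ·G_{t+1} (starting from G = 0 after the last step). The sketch below checks that recursion against the original double loop on hypothetical toy rewards; the reward values are made up for illustration.

```matlab
% Hypothetical toy rewards; discountFactor matches the question.
discountFactor = 0.995;
rewardBatch = [1 0 -1 2];
batchSize = numel(rewardBatch);

% Backward recursion: G_t = r_t + gamma * G_{t+1}, with G = 0 past the end.
discountedReturn = zeros(1, batchSize);
G = 0;
for t = batchSize:-1:1
    G = rewardBatch(t) + discountFactor * G;
    discountedReturn(t) = G;
end

% Reference: the O(n^2) double loop used in the question.
reference = zeros(1, batchSize);
for t = 1:batchSize
    for k = t:batchSize
        reference(t) = reference(t) + discountFactor^(k-t) * rewardBatch(k);
    end
end

disp(max(abs(discountedReturn - reference)))
```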
When I run the code, I am getting the following error:
```
Error using rl.representation.rlAbstractRepresentation/gradient (line 181)
Unable to compute gradient from representation.

Error in main1 (line 154)
actorGradient = gradient(actor,'loss-parameters', {observationBatch},lossData);

Caused by:
    Unable to evaluate the loss function. Check the loss function and ensure it runs successfully.
    Reference to non-existent field 'Advantage'.
```
I also tried running the example from the link: it works, but my code does not. I put a breakpoint in the loss function, but it is never hit during the gradient calculation. Judging from the error message, I suspect this is the problem, yet the same approach works when I run the example code from the MathWorks website.
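One way to confirm that the loss computation itself is sound, independently of `gradient`, is to evaluate its body on toy data. The sketch below replicates the statements of `actorLossFunction` on a hypothetical two-step batch. The struct standing in for `actInfo` is an assumption: only its `Elements` field is used, taken here to be the row vector `[-1 0 1]`.

```matlab
% Hypothetical two-step batch. actInfo is a plain struct standing in for
% the toolbox action spec; only .Elements (assumed row [-1 0 1]) is used.
lossData.batchSize = 2;
lossData.actInfo = struct('Elements', [-1 0 1]);
lossData.actionBatch = reshape([1 -1], 1, 1, 2);  % actions taken at steps 1 and 2
lossData.discountedReturn = [2 1];                % returns G_1 and G_2

% Toy policy: column t holds the probabilities of -1, 0, 1 at step t.
policy = [0.2 0.5; 0.3 0.25; 0.5 0.25];

% Same statements as actorLossFunction in the question.
batchSize = lossData.batchSize;
Z = repmat(lossData.actInfo.Elements', 1, batchSize);
actionIndicationMatrix = lossData.actionBatch(:,:) == Z;
G = actionIndicationMatrix .* lossData.discountedReturn;
G = reshape(G, size(policy));
policy(policy < eps) = eps;
loss = -sum(G .* log(policy), 'all');

% Only the taken actions contribute: -(2*log(0.5) + 1*log(0.5)) = 3*log(2)
fprintf('loss = %.4f\n', loss)
```

If this runs cleanly, the arithmetic in the loss is not the cause of the error, which points back at which loss function `gradient` is actually calling.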

Accepted Answer

Anh Tran, 2020-7-6
In the training loop, you retrieve the actor from agent1.brain, which is an rlPGAgent. That actor therefore uses the loss function defined inside rlPGAgent, not your loss function, actorLossFunction. I believe you can skip creating the rlPGAgent and use the actor representation directly throughout your custom training loop, that is, pass the actor you configured with setLoss to gradient instead of retrieving it with getActor(agent1.brain).
To be precise, creating the agent as below overrides the loss function of the actor stored in agent1.brain with a different one:
```matlab
agent1.brain = rlPGAgent(actor,agentOpts);
```
If you still have difficulty, feel free to include reproduction scripts so I can help further.


Release

R2020a
