Custom deep learning network - gradient function using dlfeval

I want to create a custom deep learning training function, the output of which is an array Y. I have two inputs, the arrays X1 and X2. I want to find the gradient of Y with respect to X1 and X2.
This is my network:
layers1 = [
    sequenceInputLayer(sizeInput,"Name","XTrain1")
    fullyConnectedLayer(numHiddenDimension,"Name","fc_1")
    softplusLayer('Name','s_1')];
layers2 = [
    sequenceInputLayer(sizeInput,"Name","XTrain2")
    fullyConnectedLayer(numHiddenDimension,"Name","fc_2")
    softplusLayer('Name','s_2')];
lgraph = layerGraph(layers1);
lgraph = addLayers(lgraph,layers2); % connect layers -> 2 in, 1 out
add = additionLayer(2,'Name','add');
lgraph = addLayers(lgraph,add);
lgraph = connectLayers(lgraph,'s_1','add/in1');
lgraph = connectLayers(lgraph,'s_2','add/in2');
fc = fullyConnectedLayer(sizeInput,"Name","fc_3");
lgraph = addLayers(lgraph,fc);
lgraph = connectLayers(lgraph,'add','fc_3');
dlnet = dlnetwork(lgraph);
The output of fc_3 becomes my output Y. Then every iteration, I do:
dlX1 = dlarray(X1,'CTB');
dlX2 = dlarray(X2,'CTB'); % to differentiate: dlarray/dlgradient
for i = 1:sizeInput
    [gradx1(i), gradx2(i), dlY] = dlfeval(@modelGradientsX,dlnet,dlX1(i),dlX2(i)); % here is where I get my error
end
and I call my function, which is supposed to compute the derivative of my output with respect to my inputs:
function [gradx1, gradx2, dlY] = modelGradientsX(dlnet,dlX1,dlX2)
    dlY = forward(dlnet,dlX1,dlX2);
    [gradx1, gradx2] = dlgradient(dlY,dlX1,dlX2);
end
And the error I get is: "Input data must be formatted dlarray objects". I have seen similar approaches in other examples (like this one: https://www.mathworks.com/matlabcentral/fileexchange/74760-image-classification-using-cnn-with-multi-input-cnn), so I don't understand why this is not the correct type of data.

Accepted Answer

Raunak Gupta 2020-7-18
Hi,
From the code, I only see a syntax error on the following line:
[gradx1(i), gradx2(i)] = dlfeval(@modelGradientsX,dlnet,dlX1(i),dlX2(i));
Here modelGradientsX returns three outputs, but only gradx1 and gradx2 are assigned in the call. This may be one issue. Beyond that, I think a loss should also be returned from modelGradientsX so that the weights can be updated for the next iteration.
If the error still persists, check that dlX1(i) and dlX2(i) are indeed formatted dlarray objects, because dlgradient only accepts dlarray objects.
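One way to check: dims returns the data format labels of a dlarray, and indexing into a formatted dlarray can drop them, which would explain the "formatted dlarray" error from forward. A minimal sketch of such a check (the re-labelling line assumes a scalar 'CTB' slice is what you want):
dims(dlX1)    % expected: 'CTB'
dims(dlX1(1)) % may be empty if indexing dropped the labels
% If the labels were dropped, re-attach them before calling dlfeval:
dlXi = dlarray(extractdata(dlX1(i)),'CTB');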
  2 Comments
Iris Soa 2020-7-19 (edited 2020-7-26)
Sir,
Thank you very much for your answer. I will reply to each of your ideas in turn:
  • On the line that you have emphasised,
[gradx1(i), gradx2(i)] = dlfeval(@modelGradientsX,dlnet,dlX1(i),dlX2(i));
unfortunately I had simply pasted the wrong code; I do in fact return three outputs. I have updated my question to reflect this.
  • I see now that I should return a loss from my function; thank you very much for this. I think this is my problem (see the sketch below).
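For reference, a minimal sketch of what the corrected function could look like. The target dlT and the mse loss are assumptions for illustration, not part of my original code:
function [gradx1, gradx2, loss] = modelGradientsX(dlnet, dlX1, dlX2, dlT)
    % Forward pass through the two-input network.
    dlY = forward(dlnet, dlX1, dlX2);
    % dlgradient differentiates a traced *scalar*, so reduce dlY to a loss first.
    loss = mse(dlY, dlT); % dlT is a hypothetical target dlarray
    [gradx1, gradx2] = dlgradient(loss, dlX1, dlX2);
end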
Iris Soa 2020-7-23
Here is the code that I am comparing against my own; for some reason, this one works:
iteration = 0;
start = tic;
% Loop over epochs.
for epoch = 1:numEpochs
    % Shuffle data.
    idx = randperm(numel(YTrain));
    XTrain1 = XTrain1(:,:,:,idx);
    XTrain2 = XTrain2(:,:,:,idx);
    YTrain = YTrain(idx);
    % Loop over mini-batches.
    for i = 1:numIterationsPerEpoch
        iteration = iteration + 1;
        % Read mini-batch of data.
        idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
        X1 = XTrain1(:,:,:,idx);
        X2 = XTrain2(:,:,:,idx);
        % Convert the labels into one-hot vectors to calculate the loss.
        Y = zeros(numClasses, miniBatchSize, 'single');
        for c = 1:numClasses
            Y(c,YTrain(idx)==classes(c)) = 1;
        end
        % Convert mini-batch of data to dlarray.
        dlX1 = dlarray(single(X1),'SSCB');
        dlX2 = dlarray(single(X2),'SSCB');
        % If training on a GPU, then convert data to gpuArray.
        if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"
            dlX1 = gpuArray(dlX1);
            dlX2 = gpuArray(dlX2);
        end
        % The training loss and the gradients from backpropagation are
        % calculated using the helper function modelGradients_demo.
        % --------------- below: call to my dlfeval function, working -----------------
        [gradients1,gradients2,gradients3,loss] = dlfeval(@modelGradients_demo,dlnet1,dlnet2,dlnet3,dlX1,dlX2,dlarray(Y));
        % -----------------------------------------------------------------------------
        learnRate = initialLearnRate/(1 + decay*iteration);
        % Update the network parameters using the SGDM optimizer.
        % Update the parameters in dlnet1 to dlnet3 sequentially.
        [dlnet3.Learnables, velocity3] = sgdmupdate(dlnet3.Learnables, gradients3, velocity3, learnRate, momentum);
        [dlnet2.Learnables, velocity2] = sgdmupdate(dlnet2.Learnables, gradients2, velocity2, learnRate, momentum);
        [dlnet1.Learnables, velocity1] = sgdmupdate(dlnet1.Learnables, gradients1, velocity1, learnRate, momentum);
        % Display the training progress.
        D = duration(0,0,toc(start),'Format','hh:mm:ss');
        addpoints(lineLossTrain,iteration,double(gather(extractdata(loss))))
        title("Epoch: " + epoch + ", Elapsed: " + string(D))
        drawnow
    end
end
function dlnet = createLayer(XTrain,numHiddenDimension)
    layers = [
        imageInputLayer([14 28 1],"Name","imageinput","Mean",mean(XTrain,4))
        convolution2dLayer([3 3],8,"Name","conv_1","Padding","same")
        batchNormalizationLayer("Name","batchnorm_1")
        reluLayer("Name","relu_1")
        maxPooling2dLayer([2 2],"Name","maxpool_1","Stride",[2 2])
        convolution2dLayer([3 3],16,"Name","conv_2","Padding","same")
        batchNormalizationLayer("Name","batchnorm_2")
        reluLayer("Name","relu_2")
        maxPooling2dLayer([2 2],"Name","maxpool_2","Stride",[2 2])
        convolution2dLayer([3 3],32,"Name","conv_3","Padding","same")
        batchNormalizationLayer("Name","batchnorm_3")
        reluLayer("Name","relu_3")
        fullyConnectedLayer(numHiddenDimension,"Name","fc")];
    lgraph = layerGraph(layers);
    dlnet = dlnetwork(lgraph);
end
function dlnet = createLayerFullyConnect(numHiddenDimension)
    layers = [
        imageInputLayer([1 numHiddenDimension*2 1],"Name","imageinput","Normalization","none")
        fullyConnectedLayer(20,"Name","fc_1")
        fullyConnectedLayer(10,"Name","fc_2")];
    lgraph = layerGraph(layers);
    dlnet = dlnetwork(lgraph);
end
% ----------------- below - the function called by dlfeval, working --------------------
function [gradients1,gradients2,gradients3,loss] = modelGradients_demo(dlnet1,dlnet2,dlnet3,dlX1,dlX2,Y)
    dlYPred1 = forward(dlnet1,dlX1);
    dlYPred2 = forward(dlnet2,dlX2);
    dlX_concat = [dlYPred1;dlYPred2];
    dlX_concat = reshape(dlX_concat,[1 40 1 128]); % the value 128 corresponds to the mini-batch size
    dlX_concat = dlarray(single(dlX_concat),'SSCB');
    dlY_concat = forward(dlnet3,dlX_concat);
    dlYPred_concat = softmax(dlY_concat);
    loss = crossentropy(dlYPred_concat,Y);
    [gradients1,gradients2,gradients3] = dlgradient(loss,dlnet1.Learnables,dlnet2.Learnables,dlnet3.Learnables);
end


More Answers (1)

Iris Soa 2020-7-27
Derivative Trace
To evaluate a gradient numerically, a dlarray constructs a data structure for reverse mode differentiation, as described in Automatic Differentiation Background. This data structure is the trace of the derivative computation. Keep in mind these guidelines when using automatic differentiation and the derivative trace:
  • Do not introduce a new dlarray inside of an objective function calculation and attempt to differentiate with respect to that object. For example:
function [dy,dy1] = fun(x1)
    x2 = dlarray(0);
    y = x1 + x2;
    dy = dlgradient(y,x2);  % Error: x2 is untraced
    dy1 = dlgradient(y,x1); % No error even though y has an untraced portion
end
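For contrast, a minimal sketch of the corrected pattern: create the dlarray outside the traced function and pass it in through dlfeval, so that it becomes part of the trace (the helper name fun2 is hypothetical):
x1 = dlarray(2);
x2 = dlarray(0);
[dy1,dy2] = dlfeval(@fun2,x1,x2); % both gradients are now defined

function [dy1,dy2] = fun2(x1,x2)
    y = x1 + x2;
    [dy1,dy2] = dlgradient(y,x1,x2); % x1 and x2 are both traced
end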
