Custom deep learning network - gradient function using dlfeval

I want to create a custom deep learning training function, the output of which is an array Y. I have two inputs, the arrays X1 and X2. I want to find the gradient of Y with respect to X1 and X2.
This is my network:
layers1 = [
    sequenceInputLayer(sizeInput,"Name","XTrain1")
    fullyConnectedLayer(numHiddenDimension,"Name","fc_1")
    softplusLayer('Name','s_1')];
layers2 = [
    sequenceInputLayer(sizeInput,"Name","XTrain2")
    fullyConnectedLayer(numHiddenDimension,"Name","fc_2")
    softplusLayer('Name','s_2')];
lgraph = layerGraph(layers1);
lgraph = addLayers(lgraph,layers2); % connect layers -> 2 in, 1 out
add = additionLayer(2,'Name','add');
lgraph = addLayers(lgraph,add);
lgraph = connectLayers(lgraph,'s_1','add/in1');
lgraph = connectLayers(lgraph,'s_2','add/in2');
fc = fullyConnectedLayer(sizeInput,"Name","fc_3");
lgraph = addLayers(lgraph,fc);
lgraph = connectLayers(lgraph,'add','fc_3');
dlnet = dlnetwork(lgraph);
The output of fc_3 becomes my output Y. Then every iteration, I do:
dlX1 = dlarray(X1,'CTB');
dlX2 = dlarray(X2,'CTB'); % to differentiate: dlarray/dlgradient
for i = 1:sizeInput
    [gradx1(i), gradx2(i), dlY] = dlfeval(@modelGradientsX,dlnet,dlX1(i),dlX2(i)); % here is where I get my error
end
and I call my function, which is supposed to compute the derivative of my output with respect to my inputs:
function [gradx1, gradx2, dlY] = modelGradientsX(dlnet,dlX1,dlX2)
    dlY = forward(dlnet,dlX1,dlX2);
    [gradx1, gradx2] = dlgradient(dlY,dlX1,dlX2);
end
And the error I get is: "Input data must be formatted dlarray objects". I have seen similar approaches in other examples (like this one: https://www.mathworks.com/matlabcentral/fileexchange/74760-image-classification-using-cnn-with-multi-input-cnn), so I don't understand why this is not the correct type of data.

Accepted Answer

Raunak Gupta 2020-7-18
Hi,
From the code, I only see a syntax error on the following line:
[gradx1(i), gradx2(i)] = dlfeval(@modelGradientsX,dlnet,dlX1(i),dlX2(i));
Here modelGradientsX returns three outputs, but only gradx1 and gradx2 are assigned in the call. This may be one issue. Beyond that, I think a loss should also be returned from modelGradientsX so that the weights can be updated for the next iteration.
If the error still persists, check that dlX1(i) and dlX2(i) are indeed formatted dlarray objects, because dlgradient only accepts dlarray objects.
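One way to check: dims returns the data format labels of a dlarray, and indexing into a formatted dlarray can drop them, which would explain the "formatted dlarray" error from forward. A minimal sketch of such a check (the re-labelling line assumes a scalar 'CTB' slice is what you want):
dims(dlX1)    % expected: 'CTB'
dims(dlX1(1)) % may be empty if indexing dropped the labels
% If the labels were dropped, re-attach them before calling dlfeval:
dlXi = dlarray(extractdata(dlX1(i)),'CTB');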
  2 Comments
Iris Soa 2020-7-19 (edited 2020-7-26)
Sir,
Thank you very much for your answer. I will reply to each of your ideas in turn:
  • On the line that you have emphasised,
[gradx1(i), gradx2(i)] = dlfeval(@modelGradientsX,dlnet,dlX1(i),dlX2(i));
unfortunately I had simply pasted the wrong code; I do in fact return three outputs. I have updated my question to reflect this.
  • I see now that I should return a loss from my function; thank you very much for this. I think this is my problem (see the sketch below).
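For reference, a minimal sketch of what the corrected function could look like. The target dlT and the mse loss are assumptions for illustration, not part of my original code:
function [gradx1, gradx2, loss] = modelGradientsX(dlnet, dlX1, dlX2, dlT)
    % Forward pass through the two-input network.
    dlY = forward(dlnet, dlX1, dlX2);
    % dlgradient differentiates a traced *scalar*, so reduce dlY to a loss first.
    loss = mse(dlY, dlT); % dlT is a hypothetical target dlarray
    [gradx1, gradx2] = dlgradient(loss, dlX1, dlX2);
end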
Iris Soa 2020-7-23
Here is the code that I am comparing against my own; for some reason, this one works:
iteration = 0;
start = tic;
% Loop over epochs.
for epoch = 1:numEpochs
    % Shuffle data.
    idx = randperm(numel(YTrain));
    XTrain1 = XTrain1(:,:,:,idx);
    XTrain2 = XTrain2(:,:,:,idx);
    YTrain = YTrain(idx);
    % Loop over mini-batches.
    for i = 1:numIterationsPerEpoch
        iteration = iteration + 1;
        % Read mini-batch of data.
        idx = (i-1)*miniBatchSize+1:i*miniBatchSize;
        X1 = XTrain1(:,:,:,idx);
        X2 = XTrain2(:,:,:,idx);
        % Convert the labels into one-hot vectors to calculate the loss.
        Y = zeros(numClasses, miniBatchSize, 'single');
        for c = 1:numClasses
            Y(c,YTrain(idx)==classes(c)) = 1;
        end
        % Convert mini-batch of data to dlarray.
        dlX1 = dlarray(single(X1),'SSCB');
        dlX2 = dlarray(single(X2),'SSCB');
        % If training on a GPU, then convert data to gpuArray.
        if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"
            dlX1 = gpuArray(dlX1);
            dlX2 = gpuArray(dlX2);
        end
        % The training loss and the gradients from backpropagation are
        % calculated using the helper function modelGradients_demo.
        % --------------- below: call to my dlfeval function, working -----------------
        [gradients1,gradients2,gradients3,loss] = dlfeval(@modelGradients_demo,dlnet1,dlnet2,dlnet3,dlX1,dlX2,dlarray(Y));
        % -----------------------------------------------------------------------------
        learnRate = initialLearnRate/(1 + decay*iteration);
        % Update the network parameters using the SGDM optimizer.
        % Update the parameters in dlnet1 to dlnet3 sequentially.
        [dlnet3.Learnables, velocity3] = sgdmupdate(dlnet3.Learnables, gradients3, velocity3, learnRate, momentum);
        [dlnet2.Learnables, velocity2] = sgdmupdate(dlnet2.Learnables, gradients2, velocity2, learnRate, momentum);
        [dlnet1.Learnables, velocity1] = sgdmupdate(dlnet1.Learnables, gradients1, velocity1, learnRate, momentum);
        % Display the training progress.
        D = duration(0,0,toc(start),'Format','hh:mm:ss');
        addpoints(lineLossTrain,iteration,double(gather(extractdata(loss))))
        title("Epoch: " + epoch + ", Elapsed: " + string(D))
        drawnow
    end
end
function dlnet = createLayer(XTrain,numHiddenDimension)
    layers = [
        imageInputLayer([14 28 1],"Name","imageinput","Mean",mean(XTrain,4))
        convolution2dLayer([3 3],8,"Name","conv_1","Padding","same")
        batchNormalizationLayer("Name","batchnorm_1")
        reluLayer("Name","relu_1")
        maxPooling2dLayer([2 2],"Name","maxpool_1","Stride",[2 2])
        convolution2dLayer([3 3],16,"Name","conv_2","Padding","same")
        batchNormalizationLayer("Name","batchnorm_2")
        reluLayer("Name","relu_2")
        maxPooling2dLayer([2 2],"Name","maxpool_2","Stride",[2 2])
        convolution2dLayer([3 3],32,"Name","conv_3","Padding","same")
        batchNormalizationLayer("Name","batchnorm_3")
        reluLayer("Name","relu_3")
        fullyConnectedLayer(numHiddenDimension,"Name","fc")];
    lgraph = layerGraph(layers);
    dlnet = dlnetwork(lgraph);
end
function dlnet = createLayerFullyConnect(numHiddenDimension)
    layers = [
        imageInputLayer([1 numHiddenDimension*2 1],"Name","imageinput","Normalization","none")
        fullyConnectedLayer(20,"Name","fc_1")
        fullyConnectedLayer(10,"Name","fc_2")];
    lgraph = layerGraph(layers);
    dlnet = dlnetwork(lgraph);
end
% ----------------- below - the function called by dlfeval, working --------------------
function [gradients1,gradients2,gradients3,loss] = modelGradients_demo(dlnet1,dlnet2,dlnet3,dlX1,dlX2,Y)
    dlYPred1 = forward(dlnet1,dlX1);
    dlYPred2 = forward(dlnet2,dlX2);
    dlX_concat = [dlYPred1;dlYPred2];
    dlX_concat = reshape(dlX_concat,[1 40 1 128]); % the value 128 corresponds to the mini-batch size
    dlX_concat = dlarray(single(dlX_concat),'SSCB');
    dlY_concat = forward(dlnet3,dlX_concat);
    dlYPred_concat = softmax(dlY_concat);
    loss = crossentropy(dlYPred_concat,Y);
    [gradients1,gradients2,gradients3] = dlgradient(loss,dlnet1.Learnables,dlnet2.Learnables,dlnet3.Learnables);
end


More Answers (1)

Iris Soa 2020-7-27
Derivative Trace
To evaluate a gradient numerically, a dlarray constructs a data structure for reverse mode differentiation, as described in Automatic Differentiation Background. This data structure is the trace of the derivative computation. Keep in mind these guidelines when using automatic differentiation and the derivative trace:
  • Do not introduce a new dlarray inside of an objective function calculation and attempt to differentiate with respect to that object. For example:
function [dy,dy1] = fun(x1)
    x2 = dlarray(0);
    y = x1 + x2;
    dy = dlgradient(y,x2);  % Error: x2 is untraced
    dy1 = dlgradient(y,x1); % No error even though y has an untraced portion
end
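For contrast, a minimal sketch of the corrected pattern: create the dlarray outside the traced function and pass it in through dlfeval, so that it becomes part of the trace (the helper name fun2 is hypothetical):
x1 = dlarray(2);
x2 = dlarray(0);
[dy1,dy2] = dlfeval(@fun2,x1,x2); % both gradients are now defined

function [dy1,dy2] = fun2(x1,x2)
    y = x1 + x2;
    [dy1,dy2] = dlgradient(y,x1,x2); % x1 and x2 are both traced
end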
