Can automatic differentiation in a custom deep learning layer keep track of the random numbers generated in the forward function of the layer?

I'm trying to create a gating neural network (NN) to use in a Mixture of Experts (MoE) setting, similar to the schematic shown below.
The MoE network will output the probability of selecting each expert, and the gate network (which I'm building) will stochastically pick one expert based on those probabilities at training time (only).
Since the behavior of the gate network is stochastic at training time, its forward function will generate a random vector every time it is invoked.
My understanding is that I also have to keep track of this random vector and reuse it in a backward function, because if I leave the backward pass to automatic differentiation, another random vector will be generated during backpropagation and that would ruin my training. (Right?)
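To make this concrete, the forward pass I have in mind does something roughly like the following (just a simplified sketch; it assumes the layer input holds one selection probability per expert (rows) for each observation (columns), and the sampling details here stand in for my real implementation):
function Y = forward(layer,X)
    % X holds one selection probability per expert (rows) per observation (columns).
    p = extractdata(X);                           % numeric copy, used only for sampling
    r = rand(1,size(p,2));                        % fresh random draw on every forward call
    pick = sum(cumsum(p,1) < r,1) + 1;            % sampled expert index per observation
    mask = zeros(size(p),'like',p);
    mask(sub2ind(size(p),pick,1:size(p,2))) = 1;  % one-hot gate built from the random draw
    Y = X .* mask;                                % only the sampled expert passes through
end
A handwritten backward function would need exactly this mask (i.e. the same random draw), and that is what I don't know how to store between the forward and backward calls.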
My problem is that I'm not sure how I can keep track of this random vector. There are three possibilities in my opinion:
  1. Create an ordinary property for the random number so that it can be recalled each time the backward function is called. My attempts to do this have so far failed, as custom NN layers strangely do not keep the property value as the program runs. (Maybe this is because such objects aren't handle objects?)
  2. Use the memory property of the custom layers. This is not allowed by the framework, as it seems that using memory in dlnetwork objects is not permitted for some reason!
  3. Use a state property. In that case, I would also have to provide derivatives of the state. However, I do not want the framework to make any changes to the state, so providing a derivative is meaningless in this case.
How can I solve this problem?
Thanks.

Answers (1)

Katja Mogalle 2022-1-21
The automatic differentiation framework stores the actual random numbers generated during the forward pass and uses them directly during the backward pass. So you shouldn't have to do anything special and you can make use of automatic differentiation in your custom layer (by not defining your own backward function).
You can also read a bit more about automatic differentiation in MATLAB here: https://www.mathworks.com/help/deeplearning/ug/deep-learning-with-automatic-differentiation-in-matlab.html
There it says: "In other words, automatic differentiation evaluates derivatives at particular numeric values; it does not construct symbolic expressions for derivatives." Maybe this piece of information helps with understanding the behaviour when using random numbers.
I also put together a small example to illustrate what I mean. It is quite simplified compared to your setup, but hopefully you can use it to better understand and play around with the autodiff framework, and transfer the idea to your implementation:
% Construct a simple network with some learnable layers and a custom layer
% in the middle which sets some channels of the data to zero.
layers = [ featureInputLayer(10)
    fullyConnectedLayer(5,Name="fc1")
    randomChannelDropLayer(5,"channelDrop")
    fullyConnectedLayer(1,Name="fc2")];
net = dlnetwork(layers);
in = dlarray(rand(10,3),'CB');
% Now let's compute gradients. Note that the custom layer does not specify
% a backward function and hence automatic differentiation is used.
for i = 1:5
    disp("Execution #"+i)
    % Every time we do a forward pass a different channel is dropped. The
    % gradient of the custom layer's output with respect to its input
    % contains zeros in the same channel that was randomly dropped during
    % forward.
    [layerOutput,layerGrad] = dlfeval(@customLayerGradients,net,in)
end

function [layerOutput,grad] = customLayerGradients(net,in)
    % Compute gradients of the custom layer output with respect to its input.
    % This gradient is used for backpropagation through the whole network.
    [layerInput,layerOutput] = net.forward(in,Outputs=["fc1","channelDrop"]);
    combinedOutput = sum(layerOutput,'all');
    grad = dlgradient(combinedOutput,layerInput);
end
And here is the definition of the custom layer:
classdef randomChannelDropLayer < nnet.layer.Layer
    % randomChannelDropLayer sets one randomly selected input channel to
    % all zeros during training. The data passes through the layer
    % unchanged during prediction.
    properties
        NumChannels
    end
    methods
        function layer = randomChannelDropLayer(numChannels,name)
            layer.NumChannels = numChannels;
            layer.Name = name;
        end
        function Y = forward(layer,X)
            channelToDrop = randi(layer.NumChannels,1);
            Y = X;
            Y(channelToDrop,:,:) = 0;
        end
        function Y = predict(~,X)
            Y = X;
        end
    end
end
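As a quick sanity check (a rough, untested sketch reusing net, in, and customLayerGradients from above), you can compare which channel is zero in the layer output with which channel has a zero gradient; they should always match, confirming that the random draw made during the forward pass is the one reused for backpropagation:
[layerOutput,layerGrad] = dlfeval(@customLayerGradients,net,in);
droppedInForward  = find(all(extractdata(layerOutput)==0,2));   % channel zeroed by the layer
droppedInBackward = find(all(extractdata(layerGrad)==0,2));     % channel with zero gradient
isequal(droppedInForward,droppedInBackward)                      % expected: logical 1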
