Gradient of loss for variational autoencoder?

Question

Hi, I have the following code for a variational autoencoder. My data is sequence data, not images, so 'Train' consists of ~5,000 univariate sequences, each around 400 observations long. When I run the code below, 'genGrad' comes out as entirely 0s (not NaNs), and I get the same loss value every time over multiple epochs. I'm very unfamiliar with deep learning in MATLAB and not sure where I'm going wrong.

inputsize = height(Train);
R = 2;
numLatentChannels = 2; 
layersE1 = layerGraph([
    sequenceInputLayer(inputsize,"Name","input",'Normalization','none')
    fullyConnectedLayer(150*R,"Name","fc_1")    % R can be any number/factor
    leakyReluLayer(0.01,"Name","leakyrelu_1")
    fullyConnectedLayer(100*R,"Name","fc_2")
    leakyReluLayer(0.01,"Name","leakyrelu_2")
    fullyConnectedLayer(50*R,"Name","fc_3")
    leakyReluLayer(0.01,"Name","leakyrelu_3")
    fullyConnectedLayer(25*R,"Name","fc_4")
    leakyReluLayer(0.01,"Name","leakyrelu_4")
    fullyConnectedLayer(10*R,"Name","fc_5")
    leakyReluLayer(0.01,"Name","leakyrelu_5")
    fullyConnectedLayer(5*R,"Name","fc_6")
    leakyReluLayer(0.01,"Name","leakyrelu_6")
    fullyConnectedLayer(2*numLatentChannels)
    ]);

%% Decoder
numInputChannels = size(Train,1);
outputsize = height(Train);
layersD = layerGraph([
    sequenceInputLayer(numLatentChannels,"Name","Dinput")
    fullyConnectedLayer(5*R,"Name","fc_ou2")
    leakyReluLayer(0.01,"Name","leakyrelu_ou2")
    fullyConnectedLayer(10*R,"Name","fc_ou3")
    leakyReluLayer(0.01,"Name","leakyrelu_ou3")
    fullyConnectedLayer(25*R,"Name","fc_ou4")
    leakyReluLayer(0.01,"Name","leakyrelu_ou4")
    fullyConnectedLayer(50*R,"Name","fc_ou5")
    leakyReluLayer(0.01,"Name","leakyrelu_ou5")
    fullyConnectedLayer(100*R,"Name","fc_ou6")
    leakyReluLayer(0.01,"Name","leakyrelu_ou6")
    fullyConnectedLayer(150*R,"Name","fc_ou7")
    leakyReluLayer(0.01,"Name","leakyrelu_ou7")
    fullyConnectedLayer(outputsize,"Name","fc_16")
    ]);

%% create networks from layers
encoderNet1 = dlnetwork(layersE1);
decoderNet = dlnetwork(layersD);

%%
miniBatchSize = 64;
numTrainSeq = width(Train);
%Set training options
executionEnvironment = "auto"; % set execution environment
dsTrain = arrayDatastore(Train,IterationDimension=2);
numOutputs = 1;
mbq = minibatchqueue(dsTrain,numOutputs, ...
    MiniBatchSize = miniBatchSize, ...
    MiniBatchFormat="CT",...
    MiniBatchFcn=@preprocessMiniBatch, ...
    PartialMiniBatch="discard");

numEpochs = 50; % Num of epochs
lr = 1e-4; % Learning rate
numIterationsperEpoch = ceil(numTrainSeq/miniBatchSize); % Num of Iteration per epoch
numIterations = numEpochs * numIterationsperEpoch;

avgGradientsEncoder = [];
avgGradientsSquaredEncoder = [];
avgGradientsDecoder = [];
avgGradientsSquaredDecoder = [];
monitor = trainingProgressMonitor( ...
    Metrics="Loss", ...
    Info="Epoch", ...
    XLabel="Iteration");
epoch = 0;
iteration = 0;
%Train the model
while epoch < numEpochs && ~monitor.Stop
    epoch = epoch + 1
    shuffle(mbq);
    while hasdata(mbq) && ~monitor.Stop
        iteration = iteration + 1
        XBatch = next(mbq);
        if (executionEnvironment == "auto" && canUseGPU) || executionEnvironment == "gpu"
            XBatch = gpuArray(XBatch);
        end
        compressed = forward(encoderNet1, XBatch);
        d = size(compressed,1)/2;
        zMean = compressed(1:d,:);
        zLogvar = compressed(1+d:end,:);
        sz = size(zMean);
        epsilon = randn(sz);
        sigma = exp(.5 * zLogvar);
        z = epsilon .* sigma + zMean;
        z = reshape(z, [sz]);
        zSampled = dlarray(z, 'CT');
        % calculate gradient of loss
        [infGrad, genGrad] = dlfeval(@modelGradients1, encoderNet1, decoderNet, XBatch, zSampled,zMean,zLogvar);
        % update parameters of Encoder/Decoder
        [decoderNet.Learnables, avgGradientsDecoder, avgGradientsSquaredDecoder] = ...
            adamupdate(decoderNet.Learnables, ...
                genGrad, avgGradientsDecoder, avgGradientsSquaredDecoder, iteration, lr);
        [encoderNet1.Learnables, avgGradientsEncoder, avgGradientsSquaredEncoder] = ...
            adamupdate(encoderNet1.Learnables, ...
                infGrad, avgGradientsEncoder, avgGradientsSquaredEncoder, iteration, lr);
    end
    % Update the training progress monitor.
    recordMetrics(monitor,iteration,Loss=loss);
    updateInfo(monitor,Epoch=epoch + " of " + numEpochs);
    monitor.Progress = 100*iteration/numIterations;
end

function [infGrad, genGrad] = modelGradients1(encoderNet1, decoderNet, XBatch, zSampled,zMean,zLogvar)
xPred = forward(decoderNet, zSampled);
xPred = dlarray(xPred, 'CT');
loss = elboLoss(XBatch, xPred, zMean, zLogvar);
[genGrad, infGrad] = dlgradient(loss, decoderNet.Learnables, ...
    encoderNet1.Learnables);
end

function elbo = elboLoss(x,xPred,zMean,zLogvar)
reconstructionLoss = mse(x,xPred);    % Reconstruction loss.
KL = -0.5 * sum(1 + zLogvar - zMean.^2 - exp(zLogvar),1);    % KL divergence.
KL = mean(KL);
elbo = reconstructionLoss + KL;    % Combined loss.
end
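For reference, the sampling step and the KL term in elboLoss above correspond to the standard closed-form expressions for a diagonal Gaussian encoder with a standard-normal prior, where zMean plays the role of $\mu$ and zLogvar the role of $\log\sigma^2$ (so exp(zLogvar) is $\sigma^2$). Note that the KL term depends only on the encoder outputs, so its gradient can only reach the encoder if the encoder pass is traced.

```latex
z = \mu + \sigma \odot \varepsilon, \qquad
\sigma = \exp\!\left(\tfrac{1}{2}\log\sigma^2\right), \qquad
\varepsilon \sim \mathcal{N}(0, I)

D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu, \operatorname{diag}\sigma^2) \,\|\, \mathcal{N}(0, I)\right)
  = -\tfrac{1}{2} \sum_{j=1}^{d} \left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
```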

Richard · Accepted Answer

Zero gradients are usually caused by the computation linking the inputs to the output loss not being traced. When dlgradient cannot see that the loss depends on an input, it always assigns zero gradients to that input. Only computations inside the function that is passed to dlfeval are traced.

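A toy illustration of this tracing rule, using hypothetical scalars w and v (they are not from the question): v.^2 is evaluated before dlfeval is called, so the only thing the traced function sees is an independent input vTerm, and, as described above, dlgradient reports a zero gradient for v. This is a sketch for intuition, not part of the original answer.

```matlab
% Hypothetical toy values: w and v stand in for learnable parameters.
w = dlarray(3);
v = dlarray(5);

% v.^2 is computed OUTSIDE dlfeval, so this operation is not recorded.
vTermOutside = v.^2;

% Inside dlfeval, w.^2 is computed and therefore traced; vTerm enters only
% as an independent input with no recorded link back to v.
lossFcn = @(w, v, vTerm) dlgradient(w.^2 + vTerm, w, v);
[gW, gV] = dlfeval(lossFcn, w, v, vTermOutside);

disp(extractdata(gW))  % 6, since d(w^2)/dw = 2*w at w = 3
disp(extractdata(gV))  % 0, since no traced path links v to the loss
```

This mirrors the question: zMean and zLogvar play the role of vTermOutside, and the encoder learnables play the role of v.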
In this case, the chunk of code that computes zSampled, including the forward pass through the encoder, runs outside the dlfeval call. Try moving that code inside the modelGradients1 function.
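A minimal sketch of that restructuring, assuming the same network layout, "CT" data format and elboLoss as in the question (elboLoss is copied in so the block stands alone). It has not been run on the asker's data: it only moves the encoder pass and the sampling step inside the traced function, and it additionally returns loss, because the loop's recordMetrics call refers to a loss variable that the original script never defines outside modelGradients1.

```matlab
% Encoder pass, reparameterization, decoder pass and loss all run inside
% the function given to dlfeval, so dlgradient can trace the loss back to
% both the encoder and the decoder learnables.
function [infGrad, genGrad, loss] = modelGradients1(encoderNet1, decoderNet, XBatch)
    % Encoder forward pass: first half of the output is the latent mean,
    % second half is the latent log-variance.
    compressed = forward(encoderNet1, XBatch);
    d = size(compressed, 1) / 2;
    zMean = compressed(1:d, :);
    zLogvar = compressed(1+d:end, :);

    % Reparameterization trick: z = mean + sigma .* epsilon, epsilon ~ N(0, 1).
    sz = size(zMean);
    epsilon = randn(sz);
    sigma = exp(0.5 * zLogvar);
    z = reshape(epsilon .* sigma + zMean, sz);   % same reshape/relabel as the original loop
    zSampled = dlarray(z, "CT");

    % Decoder forward pass and combined loss.
    xPred = forward(decoderNet, zSampled);
    loss = elboLoss(XBatch, xPred, zMean, zLogvar);

    % Gradients with respect to both networks' learnables.
    [genGrad, infGrad] = dlgradient(loss, decoderNet.Learnables, ...
        encoderNet1.Learnables);
end

function elbo = elboLoss(x, xPred, zMean, zLogvar)
    reconstructionLoss = mse(x, xPred);                          % Reconstruction loss.
    KL = -0.5 * sum(1 + zLogvar - zMean.^2 - exp(zLogvar), 1);   % KL divergence.
    KL = mean(KL);
    elbo = reconstructionLoss + KL;                              % Combined loss.
end
```

In the training loop, the lines from `compressed = forward(encoderNet1, XBatch);` through `zSampled = dlarray(z, 'CT');` would then be deleted, and the gradient call becomes `[infGrad, genGrad, loss] = dlfeval(@modelGradients1, encoderNet1, decoderNet, XBatch);`.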
