Train VAE for RGB image generation

15 views (last 30 days)
debojit sharma 2023-6-17
Commented: Ben 2023-6-26
I am trying to implement the code to train a VAE for image generation given in the following link, using my own dataset of RGB images of size 200*200: https://in.mathworks.com/help/deeplearning/ug/train-a-variational-autoencoder-vae-to-generate-images.html
I am getting the following errors in the Train Model part:
The code in the above link uses MNIST dataset images as input to the encoder of the VAE, and the decoder is said to output an image of size 28-by-28-by-1. However, I am trying to generate RGB images of size 200*200 by training this VAE model, so my input images are RGB images of size 200*200. I am getting the above-mentioned error in the Train Model part and I am not able to resolve it. Could somebody please guide me on what changes I have to make in this code so that I can train this VAE model to generate RGB images of size 200*200? I will be thankful to you.

Answers (1)

Ben 2023-6-23
The error is stating that the VAE output Y and the training images T have different sizes when you try to compute the mean-squared error (mse) loss between them.
Note that the VAE output size is determined by both the input image sizes and the layers in the network. I think there are a few things to check first:
  1. Make sure the output of the VAE has the same number of channels as the target images - for the MNIST example this will be 1, for RGB images it would be 3.
  2. Make sure the VAE output has the same height and width as the target images, 200x200. The VAE in the example downsamples the spatial sizes by using Stride=2 in the two convolution layers of the encoder, then upsamples again using Stride=2 with the two transposed convolution layers in the decoder. You have to be careful to ensure the decoder upsamples back to the original image size.
  3. Ensure the custom projectAndReshapeLayer is configured for your encoder latent size - in the example the projectionSize is [7,7,64], but for the same network on 200x200 images I would expect this to be [50,50,64] (see the sketch after this list).
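A minimal sketch of that calculation for point 3, assuming the two Stride=2 encoder convolutions from the example (the 64 is the number of filters in the second convolution):
% Each Stride=2 convolution with Padding="same" halves the spatial size
% (rounding up for odd sizes), so two of them take 200x200 down to 50x50.
imageSize = [200 200 3];
numDownsamples = 2; % two Stride=2 convolution layers in the encoder
spatialSize = ceil(imageSize(1:2) / 2^numDownsamples); % [50 50]
projectionSize = [spatialSize 64] % [50 50 64]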
If you can't get this working, could you let us know whether you have modified the encoder or decoder layers at all? If not, can you ensure that all the images input to the VAE have the same size?
Hope that helps,
Ben
  3 Comments
Aniketh 2023-6-25
Have you tried printing the dimensions of the arguments being passed to the loss function via dlfeval()? The upsampling, downsampling, and projection corrections pointed out by Ben should solve your issue; however, the exact difference between the output dimensions of layersE and layersD should point you in the right direction.
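For example, a quick check along those lines (a sketch assuming the netE, netD, and mini-batch variable X from the linked example):
% Push one mini-batch through both networks and compare sizes before
% calling dlfeval; size(Y) must match size(X) for mse(Y,X) to work.
size(X) % e.g. 200x200x3xbatchSize
Z = forward(netE,X);
size(Z) % numLatentChannels x batchSize
Y = forward(netD,Z);
size(Y) % must equal size(X)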
Ben 2023-6-26
@debojit sharma - I've written some code showing how this could work for 200x200x3 images. I noticed the main issue I had was that numInputChannels in the example is computed incorrectly, so perhaps that is the issue you are having. I fixed that in the code below:
numLatentChannels = 16;
imageSize = [200 200 3]; % updated for 200x200x3 images
layersE = [
    imageInputLayer(imageSize,Normalization="none")
    convolution2dLayer(3,32,Padding="same",Stride=2) % 200 -> 100
    reluLayer
    convolution2dLayer(3,64,Padding="same",Stride=2) % 100 -> 50
    reluLayer
    fullyConnectedLayer(2*numLatentChannels)
    samplingLayer];
projectionSize = [50 50 64]; % recomputed manually
numInputChannels = imageSize(3); % fixed from the example.
layersD = [
    featureInputLayer(numLatentChannels)
    projectAndReshapeLayer(projectionSize) % latent vector -> 50x50x64
    transposedConv2dLayer(3,64,Cropping="same",Stride=2) % 50 -> 100
    reluLayer
    transposedConv2dLayer(3,32,Cropping="same",Stride=2) % 100 -> 200
    reluLayer
    transposedConv2dLayer(3,numInputChannels,Cropping="same") % 3 output channels
    sigmoidLayer];
netE = dlnetwork(layersE);
netD = dlnetwork(layersD);
% Test forward
batchSize = 5;
imageBatch = dlarray(randn([imageSize,batchSize]),"SSCB");
latentBatch = forward(netE,imageBatch);
size(latentBatch)
generatedBatch = forward(netD,latentBatch);
size(generatedBatch)
% Test loss and gradients
if canUseGPU
    netE = dlupdate(@gpuArray,netE);
    netD = dlupdate(@gpuArray,netD);
    imageBatch = gpuArray(imageBatch);
end
[loss,gradE,gradD] = dlfeval(@modelLoss,netE,netD,imageBatch);
function [loss,gradientsE,gradientsD] = modelLoss(netE,netD,X)
    % Forward through encoder.
    [Z,mu,logSigmaSq] = forward(netE,X);
    % Forward through decoder.
    Y = forward(netD,Z);
    % Calculate loss and gradients.
    loss = elboLoss(Y,X,mu,logSigmaSq);
    [gradientsE,gradientsD] = dlgradient(loss,netE.Learnables,netD.Learnables);
end

function loss = elboLoss(Y,T,mu,logSigmaSq)
    % Reconstruction loss.
    reconstructionLoss = mse(Y,T);
    % KL divergence.
    KL = -0.5 * sum(1 + logSigmaSq - mu.^2 - exp(logSigmaSq),1);
    KL = mean(KL);
    % Combined loss.
    loss = reconstructionLoss + KL;
end
Hope that helps.
