How to change architecture of conditional GAN to generate 224x224x3 images?

7 views (last 30 days)
This example is for 64x64x3 images. I am wondering what changes should be made in layersGenerator and layersDiscriminator to generate 224x224x3 images.
The target is inputSize = [224 224 3] (or [256 256 3]).
Note that with Factor = 2 (below) I get 128x128x3 images, and with Factor = 4 the generated image size is 256x256x3. However, the training loop then throws an error that TrainedVariance is negative.
This is my code:
inputSize = [64 64 3];
Factor = 4; % Factor = 2 generates 128x128x3 images
inputSize = [Factor*inputSize(1:2) 3]; % scale height and width, keep 3 channels
numClasses = 2;
augimds = augmentedImageDatastore(inputSize(1:2),XTrain,YTrain);
augimdsValidation = augmentedImageDatastore(inputSize(1:2),XValidation,YValidation);
numLatentInputs = 100;
embeddingDimension = 50;
numFilters = Factor*64;
filterSize = 5;
projectionSize = Factor*[4 4 1024]; % note: Factor scales every element, giving [16 16 4096] when Factor = 4
layersGenerator = [
    featureInputLayer(numLatentInputs)
    fullyConnectedLayer(prod(projectionSize))
    functionLayer(@(X) feature2image(X,projectionSize),Formattable=true)
    concatenationLayer(3,2,Name="cat")
    transposedConv2dLayer(filterSize,4*numFilters,Stride=2,Cropping="same") % each Stride=2 layer doubles height and width
    batchNormalizationLayer
    reluLayer
    transposedConv2dLayer(filterSize,2*numFilters,Stride=2,Cropping="same")
    batchNormalizationLayer
    reluLayer
    transposedConv2dLayer(filterSize,numFilters,Stride=2,Cropping="same")
    batchNormalizationLayer
    reluLayer
    transposedConv2dLayer(filterSize,3,Stride=2,Cropping="same")
    tanhLayer];
lgraphGenerator = layerGraph(layersGenerator);
layers = [
    featureInputLayer(1)
    embeddingLayer(embeddingDimension,numClasses)
    fullyConnectedLayer(prod(projectionSize(1:2)))
    functionLayer(@(X) feature2image(X,[projectionSize(1:2) 1]),Formattable=true,Name="emb_reshape")];
lgraphGenerator = addLayers(lgraphGenerator,layers);
lgraphGenerator = connectLayers(lgraphGenerator,"emb_reshape","cat/in2");
netG = dlnetwork(lgraphGenerator);
dropoutProb = 0.75;
%numFilters = 64;
scale = 0.2;
filterSize = 5;
layersDiscriminator = [
    imageInputLayer(inputSize,Normalization="none")
    dropoutLayer(dropoutProb)
    concatenationLayer(3,2,Name="cat")
    convolution2dLayer(filterSize,numFilters,Stride=2,Padding="same") % each Stride=2 layer halves height and width
    leakyReluLayer(scale)
    convolution2dLayer(filterSize,2*numFilters,Stride=2,Padding="same")
    batchNormalizationLayer
    leakyReluLayer(scale)
    convolution2dLayer(filterSize,4*numFilters,Stride=2,Padding="same")
    batchNormalizationLayer
    leakyReluLayer(scale)
    convolution2dLayer(filterSize,8*numFilters,Stride=2,Padding="same")
    batchNormalizationLayer
    leakyReluLayer(scale)
    convolution2dLayer(Factor*4,1)]; % filter size must equal the final feature-map size
lgraphDiscriminator = layerGraph(layersDiscriminator);
layers = [
    featureInputLayer(1)
    embeddingLayer(embeddingDimension,numClasses)
    fullyConnectedLayer(prod(inputSize(1:2)))
    functionLayer(@(X) feature2image(X,[inputSize(1:2) 1]),Formattable=true,Name="emb_reshape")];
lgraphDiscriminator = addLayers(lgraphDiscriminator,layers);
lgraphDiscriminator = connectLayers(lgraphDiscriminator,"emb_reshape","cat/in2");
netD = dlnetwork(lgraphDiscriminator);
However, the above code gives an error at
[~,~,gradientsG,gradientsD,stateG,scoreG,scoreD] = ...
    dlfeval(@modelLoss2,netG,netD,X,T,Z,flipFactor);
The size of the generated image at
[XGenerated,stateG] = forward(netG,Z,T);
is 256x256x3, but the error states that TrainedVariance is not positive.
Could you advise which transposedConv2dLayer settings to change so that the output is 224x224x3 or 256x256x3?
Thanks for your help.

Answers (1)

Ayush Aniket on 9 May 2025
If you use a projection size that doesn't align with the upsampling path, the generator output won't match the expected image size, which can cause downstream errors (such as negative variance or shape mismatches).
Each transposedConv2dLayer with Stride=2 doubles the spatial resolution, so the number of upsampling layers and the initial projection size must align so that, after all upsampling, you reach the desired output size. The general rule: if the initial projection size is [h, w, c] and there are n upsampling layers (each with Stride=2), the output size is [h*2^n, w*2^n, outputChannels]. Applying this rule (a code sketch follows the list):
1. 256x256x3 output
  • Each doubling step goes 4 → 8 → 16 → 32 → 64 → 128 → 256, so reaching 256 from the example's [4, 4, ...] projection takes 6 upsampling layers.
  • Your code has only 4 upsampling layers, which take [4, 4] to just [64, 64] (4*2^4 = 64).
  • To keep 4 upsampling layers, start from a [16, 16, ...] projection instead: 16 → 32 → 64 → 128 → 256 (16*2^4 = 256).
2. 224x224x3 output
  • 224 is not a power of 2, so you need to start with a projection size that, after upsampling, results in 224.
  • 224 = 14 * 2^4
  • So, start with [14,14, ...] and 4 upsampling layers.
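Concretely, here is a minimal, untested sketch of the generator for the 224x224x3 case, assuming the same feature2image helper and label-embedding branch as in your code (numFilters = 64 is just a placeholder; tune it as needed):
inputSize = [224 224 3];
numLatentInputs = 100;
numFilters = 64; % placeholder base filter count
filterSize = 5;
projectionSize = [14 14 1024]; % 224 = 14 * 2^4
layersGenerator = [
    featureInputLayer(numLatentInputs)
    fullyConnectedLayer(prod(projectionSize))
    functionLayer(@(X) feature2image(X,projectionSize),Formattable=true)
    concatenationLayer(3,2,Name="cat")
    transposedConv2dLayer(filterSize,4*numFilters,Stride=2,Cropping="same") % 14 -> 28
    batchNormalizationLayer
    reluLayer
    transposedConv2dLayer(filterSize,2*numFilters,Stride=2,Cropping="same") % 28 -> 56
    batchNormalizationLayer
    reluLayer
    transposedConv2dLayer(filterSize,numFilters,Stride=2,Cropping="same") % 56 -> 112
    batchNormalizationLayer
    reluLayer
    transposedConv2dLayer(filterSize,3,Stride=2,Cropping="same") % 112 -> 224
    tanhLayer];
The discriminator must mirror this arithmetic: its four Stride=2 convolutions reduce 224 -> 112 -> 56 -> 28 -> 14, so replace the final convolution2dLayer(Factor*4,1) with convolution2dLayer(14,1) to get a single 1x1 score. For 256x256x3, the same reasoning gives projectionSize = [16 16 1024] and a final discriminator filter of 256/2^4 = 16. In both cases, check that size(forward(netG,Z,T)) matches inputSize before starting the training loop.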
