How to change architecture of conditional GAN to generate 224x224x3 images?

7 views (last 30 days)
This example is for 64x64x3 images. I am wondering what changes should be made in layersGenerator and layersDiscriminator to generate 224x224x3 images.
The target is inputSize = [224 224 3] (or [256 256 3]).
Note that with Factor = 2 (below) I get 128x128x3 images, and with Factor = 4 the generated image size is 256x256x3. However, the training loop then throws an error that TrainedVariance is negative.
This is my code:
inputSize = [64 64 3];
Factor = 4; % Factor = 2 generates 128x128x3 images
inputSize = [Factor*inputSize(1:2) 3]; % scale height and width, keep 3 channels
numClasses = 2;
augimds = augmentedImageDatastore(inputSize(1:2),XTrain,YTrain);
augimdsValidation = augmentedImageDatastore(inputSize(1:2),XValidation,YValidation);
numLatentInputs = 100;
embeddingDimension = 50;
numFilters = Factor*64;
filterSize = 5;
projectionSize = Factor*[4 4 1024]; % note: Factor scales every element, giving [16 16 4096] when Factor = 4
layersGenerator = [
    featureInputLayer(numLatentInputs)
    fullyConnectedLayer(prod(projectionSize))
    functionLayer(@(X) feature2image(X,projectionSize),Formattable=true)
    concatenationLayer(3,2,Name="cat")
    transposedConv2dLayer(filterSize,4*numFilters,Stride=2,Cropping="same") % each Stride=2 layer doubles height and width
    batchNormalizationLayer
    reluLayer
    transposedConv2dLayer(filterSize,2*numFilters,Stride=2,Cropping="same")
    batchNormalizationLayer
    reluLayer
    transposedConv2dLayer(filterSize,numFilters,Stride=2,Cropping="same")
    batchNormalizationLayer
    reluLayer
    transposedConv2dLayer(filterSize,3,Stride=2,Cropping="same")
    tanhLayer];
lgraphGenerator = layerGraph(layersGenerator);
layers = [
    featureInputLayer(1)
    embeddingLayer(embeddingDimension,numClasses)
    fullyConnectedLayer(prod(projectionSize(1:2)))
    functionLayer(@(X) feature2image(X,[projectionSize(1:2) 1]),Formattable=true,Name="emb_reshape")];
lgraphGenerator = addLayers(lgraphGenerator,layers);
lgraphGenerator = connectLayers(lgraphGenerator,"emb_reshape","cat/in2");
netG = dlnetwork(lgraphGenerator);
dropoutProb = 0.75;
%numFilters = 64;
scale = 0.2;
filterSize = 5;
layersDiscriminator = [
    imageInputLayer(inputSize,Normalization="none")
    dropoutLayer(dropoutProb)
    concatenationLayer(3,2,Name="cat")
    convolution2dLayer(filterSize,numFilters,Stride=2,Padding="same") % each Stride=2 layer halves height and width
    leakyReluLayer(scale)
    convolution2dLayer(filterSize,2*numFilters,Stride=2,Padding="same")
    batchNormalizationLayer
    leakyReluLayer(scale)
    convolution2dLayer(filterSize,4*numFilters,Stride=2,Padding="same")
    batchNormalizationLayer
    leakyReluLayer(scale)
    convolution2dLayer(filterSize,8*numFilters,Stride=2,Padding="same")
    batchNormalizationLayer
    leakyReluLayer(scale)
    convolution2dLayer(Factor*4,1)]; % filter size must equal the final feature-map size
lgraphDiscriminator = layerGraph(layersDiscriminator);
layers = [
    featureInputLayer(1)
    embeddingLayer(embeddingDimension,numClasses)
    fullyConnectedLayer(prod(inputSize(1:2)))
    functionLayer(@(X) feature2image(X,[inputSize(1:2) 1]),Formattable=true,Name="emb_reshape")];
lgraphDiscriminator = addLayers(lgraphDiscriminator,layers);
lgraphDiscriminator = connectLayers(lgraphDiscriminator,"emb_reshape","cat/in2");
netD = dlnetwork(lgraphDiscriminator);
However, the above code gives an error at
[~,~,gradientsG,gradientsD,stateG,scoreG,scoreD] = ...
    dlfeval(@modelLoss2,netG,netD,X,T,Z,flipFactor);
The size of the generated image at
[XGenerated,stateG] = forward(netG,Z,T);
is 256x256x3, but the error states that TrainedVariance is not positive.
Could you advise which transposedConv2dLayer settings to change so that the output is 224x224x3 or 256x256x3?
Thanks for your help.

Answers (1)

Ayush Aniket on 9 May 2025
If you use a projection size that doesn't align with the upsampling path, the generator output won't match the expected image size, which can cause downstream errors (such as negative variance or shape mismatches).
Each transposedConv2dLayer with Stride=2 doubles the spatial resolution, so the number of upsampling layers and the initial projection size must align so that, after all upsampling, you reach the desired output size. The general rule: if the initial projection size is [h, w, c] and there are n upsampling layers (each with Stride=2), the output size is [h*2^n, w*2^n, outputChannels]. Applying this rule (a code sketch follows the list):
1. 256x256x3 output
  • Each doubling step goes 4 → 8 → 16 → 32 → 64 → 128 → 256, so reaching 256 from the example's [4, 4, ...] projection takes 6 upsampling layers.
  • Your code has only 4 upsampling layers, which take [4, 4] to just [64, 64] (4*2^4 = 64).
  • To keep 4 upsampling layers, start from a [16, 16, ...] projection instead: 16 → 32 → 64 → 128 → 256 (16*2^4 = 256).
2. 224x224x3 output
  • 224 is not a power of 2, so you need to start with a projection size that, after upsampling, results in 224.
  • 224 = 14 * 2^4
  • So, start with [14,14, ...] and 4 upsampling layers.
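Concretely, here is a minimal, untested sketch of the generator for the 224x224x3 case, assuming the same feature2image helper and label-embedding branch as in your code (numFilters = 64 is just a placeholder; tune it as needed):
inputSize = [224 224 3];
numLatentInputs = 100;
numFilters = 64; % placeholder base filter count
filterSize = 5;
projectionSize = [14 14 1024]; % 224 = 14 * 2^4
layersGenerator = [
    featureInputLayer(numLatentInputs)
    fullyConnectedLayer(prod(projectionSize))
    functionLayer(@(X) feature2image(X,projectionSize),Formattable=true)
    concatenationLayer(3,2,Name="cat")
    transposedConv2dLayer(filterSize,4*numFilters,Stride=2,Cropping="same") % 14 -> 28
    batchNormalizationLayer
    reluLayer
    transposedConv2dLayer(filterSize,2*numFilters,Stride=2,Cropping="same") % 28 -> 56
    batchNormalizationLayer
    reluLayer
    transposedConv2dLayer(filterSize,numFilters,Stride=2,Cropping="same") % 56 -> 112
    batchNormalizationLayer
    reluLayer
    transposedConv2dLayer(filterSize,3,Stride=2,Cropping="same") % 112 -> 224
    tanhLayer];
The discriminator must mirror this arithmetic: its four Stride=2 convolutions reduce 224 -> 112 -> 56 -> 28 -> 14, so replace the final convolution2dLayer(Factor*4,1) with convolution2dLayer(14,1) to get a single 1x1 score. For 256x256x3, the same reasoning gives projectionSize = [16 16 1024] and a final discriminator filter of 256/2^4 = 16. In both cases, check that size(forward(netG,Z,T)) matches inputSize before starting the training loop.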
