Confused about expected conv3dLayer input size
I'm trying to use a series of conv3d layers to reduce the third dimension of my data while maintaining the spatial information between the slices. I've been using the Deep Network Designer and have attached a few pictures of what's going on here. The pictures are kind of large in the browser, so I've attached them rather than posting them in the text box.
I have a 16x16x480 array that I want to eventually reduce to 16x16x1. My plan was to use a series of 3-D convolutions with a stride of [1 1 2], maintaining the 16x16 portion while halving the third dimension with each layer. However, I keep getting error messages saying that there is an input size mismatch. I read the documentation for convolution3dLayer, and it says the layer takes a 3-D input, but the example on the documentation page uses a 28x28x28x3 input - 4 dimensions.
So I'm not really sure what to do here, and the documentation seems to be self-contradicting.
Does anyone know how I might go about correctly using convolution3dLayers? And if these are not the appropriate method by which to thin layers in the third dimension while preserving some of the data from them, what other methods might I use?
Srivardhan Gadila 2021-10-7
In the 16x16x480 data you have mentioned, 480 corresponds to the channel dimension, so the input is treated as a 2-D image with 480 channels, where 16 and 16 are the height and width of the image (just as the error itself states: 'relu_8_16' (size 16(S) x 16(S) x 480(C) x 1(B))). Refer to the documentation on dlarray data formats to learn more about the different dimension labels Deep Learning Toolbox supports (note that I am pointing to the doc page on deep learning with custom training loops only because it explains the different data formats in Deep Learning Toolbox).
The same applies to the 28x28x28x3 data, where 28, 28, 28 are the height, width, and depth, and 3 is the channel dimension.
For convolution3dLayer to work, the input must be a 3-D image input (with image3dInputLayer as the input layer), which means the array must be 4-D with the last dimension corresponding to the channel dimension, i.e., 16(S)x16(S)x480(S)x1(C) or 16(S)x16(S)x1(S)x480(C).
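As a sketch of that option (layer arguments here are illustrative, not tuned for any particular use case): if the 480 slices are declared as a spatial depth dimension with a trailing singleton channel, 3-D convolutions with stride [1 1 2] will halve the depth at each layer while 'same' padding keeps the 16x16 spatial size.

```matlab
% Assumed sketch: treat 480 as spatial depth (D), channel dimension = 1.
layers = [
    image3dInputLayer([16 16 480 1])                                % H x W x D x C
    convolution3dLayer([3 3 3],8,'Stride',[1 1 2],'Padding','same') % depth 480 -> 240
    reluLayer
    convolution3dLayer([3 3 3],8,'Stride',[1 1 2],'Padding','same') % depth 240 -> 120
    reluLayer
];
analyzeNetwork(layers)   % inspect the activation sizes at each layer
```

With this layout the data fed to the network must also be reshaped to 16x16x480x1 per observation.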
In this case, since you want to reduce the 16(S)x16(S)x480(C) data to 16(S)x16(S)x1(C), you can instead use convolution2dLayer, defined along these lines (update the layer arguments according to your use case):
convolution2dLayer([a b],1,'Stride',[c d],'Padding','same')
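For instance, a minimal sketch (filter size and stride here are placeholder choices): a single 2-D convolution with one filter collapses all 480 channels to 1 in one step, while 'Padding','same' and stride [1 1] preserve the 16x16 spatial size.

```matlab
% Assumed sketch: one filter -> output has 1 channel; 'same' padding
% with stride [1 1] keeps the 16x16 spatial dimensions.
layers = [
    imageInputLayer([16 16 480])                            % H x W x C
    convolution2dLayer([3 3],1,'Stride',[1 1],'Padding','same')
];
analyzeNetwork(layers)   % output activation is 16 x 16 x 1
```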