Why does AlexNet have non-scalar values for NumChannels and NumFilters in some convolutional layers?

Take the layers 'conv1' and 'conv2' for example. 'conv1' has 3 channels and 96 filters; that's fine. It should follow that 'conv2' has 96 channels, but instead what I find is the following:
  • conv2.NumChannels = [48 48]
  • conv2.Weights has the following form: [5×5×48×256 single]
The total of the elements in conv2.NumChannels is the required 96, but why is this split into 2 48s?
The conv2.Weights property suggests that there are only 48 channels, not the required 96. Are half the filters in 'conv1' redundant as a result?
It is impossible to construct a Convolution2DLayer with a 2-element NumChannels, so how did this happen?
From here the confusion continues, because conv2.NumFilters = [128 128]. The total is 256, which is the correct number of filters and is consistent with the conv2.Weights property shown above. But again, why is this split across 2 elements? And how did this happen, given that it is impossible to construct a Convolution2DLayer with a non-scalar NumFilters property?
If anyone can help me to overcome this confusion, I would be very grateful.

Accepted Answer

Chaitral Date 2017-6-27
In AlexNet, certain convolutional layers use “filter groups”. In these layers, the filters are split into two groups. The input to a layer with “filter groups” is split into two sections along the channel dimension, and then each “filter group” is applied to a different section. The two resulting sections are then concatenated together to produce the output. This may seem convoluted, but this was done in the original implementation of AlexNet to make it easier to split the network between two GPUs for training.
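The split, apply, and concatenate steps described above can be sketched in a few lines of plain Python. This is only an illustration, not how the toolbox implements the layer: it works on a single spatial position (equivalent to a 1×1 kernel) so the channel handling is easy to follow, and the function name `grouped_pointwise_conv` and the toy weights are made up for the example.

```python
def grouped_pointwise_conv(x, weights_per_group):
    """Grouped convolution at one spatial position (a 1x1 kernel).

    x: list of input channel values; its length must be divisible by
       the number of groups.
    weights_per_group: one entry per group. Each entry is a list of
       filters, and each filter is a list with one weight per channel
       of that group's input section.
    """
    groups = len(weights_per_group)
    if len(x) % groups != 0:
        raise ValueError("channel count must be divisible by the number of groups")
    per_group = len(x) // groups

    out = []
    for g, filters in enumerate(weights_per_group):
        # Each group only sees its own section of the input channels.
        section = x[g * per_group:(g + 1) * per_group]
        for f in filters:
            if len(f) != per_group:
                raise ValueError("each filter needs one weight per channel in its group")
            out.append(sum(w * v for w, v in zip(f, section)))
    # Appending group after group concatenates the outputs along the
    # channel dimension.
    return out


# Toy case: 4 input channels, 2 groups, 3 filters per group.
x = [1.0, 2.0, 3.0, 4.0]
weights = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],   # group 1 sees channels 1-2
    [[1.0, 1.0], [1.0, -1.0], [0.0, 2.0]],  # group 2 sees channels 3-4
]
y = grouped_pointwise_conv(x, weights)
print(y)  # [1.0, 2.0, 3.0, 7.0, -1.0, 8.0]
```

Note that each filter carries only 2 weights even though the input has 4 channels, which is the same effect that makes the conv2 filters 48 channels deep rather than 96.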
So for the second convolutional layer in AlexNet, the weights are split into two groups of 128 filters. Each filter has 48 channels. The input to the layer has 96 channels, but it gets split into two sections with 48 channels each. Each group of filters is applied to a different section, to produce two outputs with 128 channels each. These two outputs are then concatenated together to give a final output with 256 channels.
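The numbers in the question follow directly from this arithmetic. The sketch below is a hypothetical helper (`grouped_conv_shapes` is not a toolbox function) that reproduces them from the 96 input channels, 256 filters, 2 groups, and 5×5 kernel mentioned in this thread.

```python
def grouped_conv_shapes(in_channels, num_filters, groups, kernel_h, kernel_w):
    """Return the per-group sizes and weight shape of a grouped convolution."""
    if in_channels % groups or num_filters % groups:
        raise ValueError("channels and filters must be divisible by groups")
    channels_per_group = in_channels // groups
    filters_per_group = num_filters // groups
    return {
        # One entry per group, mirroring the two-element properties.
        "num_channels": [channels_per_group] * groups,
        "num_filters": [filters_per_group] * groups,
        # Each filter is only as deep as one group's input section.
        "weight_shape": (kernel_h, kernel_w, channels_per_group, num_filters),
        "num_weights": kernel_h * kernel_w * channels_per_group * num_filters,
    }


conv2 = grouped_conv_shapes(in_channels=96, num_filters=256, groups=2,
                            kernel_h=5, kernel_w=5)
print(conv2["num_channels"])  # [48, 48]
print(conv2["num_filters"])   # [128, 128]
print(conv2["weight_shape"])  # (5, 5, 48, 256)
print(conv2["num_weights"])   # 307200

# The same layer without groups would need twice as many weights.
ungrouped = grouped_conv_shapes(96, 256, 1, 5, 5)
print(ungrouped["num_weights"])  # 614400
```

So none of the 'conv1' filters are redundant: all 96 of its output channels are consumed by 'conv2', but each one feeds only one of the two filter groups.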
