Why does layerNormalizationLayer in Deep Learning Toolbox include T dimension into the batch?

John Smith 2023-3-13

Answered: John Smith 2023-3-24

Hello,

While implementing a ViT transformer in MATLAB, I found that layerNormalizationLayer includes the T dimension in the statistics calculated for each sample in the batch. This is problematic when implementing a transformer, since tokens correspond to the T dimension and reference implementations calculate the statistics separately for each token.

Thx

Accepted Answer

John Smith 2023-3-24

It seems MathWorks have listened and changed the behavior of layerNormalizationLayer in R2023a:

https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.layernormalizationlayer.html

Starting in R2023a, by default, the layer normalizes sequence data over the channel and spatial dimensions. In previous versions, the software normalizes over all dimensions except for the batch dimension (the spatial, time, and channel dimensions). Normalization over the channel and spatial dimensions is usually better suited for this type of data. To reproduce the previous behavior, set OperationDimension to "batch-excluded".
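
To make the difference between the two behaviors concrete, here is a minimal plain-Python sketch (not MATLAB code and not the toolbox implementation). It assumes a single sample with T = 2 tokens and C = 3 channels, no spatial dimensions, and no learnable scale or offset. The function names are made up for illustration.

```python
from statistics import fmean, pvariance

def normalize(values, eps=1e-5):
    """Zero-mean, unit-variance normalization of a flat list of values."""
    mu = fmean(values)
    var = pvariance(values, mu)
    return [(v - mu) / (var + eps) ** 0.5 for v in values]

def norm_per_token(sample):
    """Statistics computed separately for each token, over channels only."""
    return [normalize(token) for token in sample]

def norm_batch_excluded(sample):
    """Statistics computed once per sample, pooled over T and C."""
    flat = [v for token in sample for v in token]
    normed = normalize(flat)
    c = len(sample[0])
    return [normed[i * c:(i + 1) * c] for i in range(len(sample))]

# One sample: T = 2 tokens (rows), C = 3 channels; the tokens differ in scale.
sample = [[1.0, 2.0, 3.0],
          [10.0, 20.0, 30.0]]

per_token = norm_per_token(sample)    # each row has zero mean on its own
pooled = norm_batch_excluded(sample)  # only the whole sample has zero mean
```

With per-token statistics both rows come out almost identical, because they differ only by a scale factor. With pooled statistics the small-valued token is shifted well below zero, which is the effect the question describes.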

More Answers (1)

Matt J 2023-3-13

Perhaps you can fold your T dimension into the C dimension and use a groupNormalizationLayer instead, with the groups defined so that different T belong to different groups.
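
The idea can be checked numerically with a small plain-Python sketch (again not MATLAB code). It assumes the fold is token-major, so that each token's C channels end up contiguous, and that groups are contiguous, equally sized blocks of channels. The helper `group_norm` is hypothetical, and learnable scale and offset are left out.

```python
from statistics import fmean, pvariance

def normalize(values, eps=1e-5):
    """Zero-mean, unit-variance normalization of a flat list of values."""
    mu = fmean(values)
    var = pvariance(values, mu)
    return [(v - mu) / (var + eps) ** 0.5 for v in values]

def group_norm(channels, num_groups):
    """Normalize contiguous, equally sized groups of channels independently."""
    size = len(channels) // num_groups
    out = []
    for g in range(num_groups):
        out.extend(normalize(channels[g * size:(g + 1) * size]))
    return out

# One sample: T = 2 tokens (rows), C = 3 channels.
sample = [[1.0, 2.0, 3.0],
          [10.0, 20.0, 30.0]]
T, C = len(sample), len(sample[0])

# Fold T into C (token-major), giving a single vector of T*C channels.
folded = [v for token in sample for v in token]

# One group per token reproduces per-token normalization.
via_groups = group_norm(folded, num_groups=T)
per_token = [v for token in sample for v in normalize(token)]
```

Here `via_groups` and `per_token` agree element by element. One caveat of the workaround: any per-channel scale and offset would then be defined over T*C folded channels rather than shared across tokens.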