Do LSTM and fully connected layers change channels or neurons?

2 views (last 30 days)
When I was building my network, I was surprised to find that LSTM and fully connected layers change the number of channels rather than the number of neurons. My input is a one-dimensional signal (1024 sampling points), which the network analyzer shows as 1 (C) × 1 (B) × 1024 (T). After passing through a BiLSTM, the channel dimension changes while the others remain unchanged, which seems strange and inconsistent with the theory. Additionally, how can I build an SE attention module in MATLAB? The multiplication layer has no broadcasting and can only multiply element by element, so how can a 1 × 1 × C array be multiplied with an H × W × C array? Thank you all for your guidance!

Accepted Answer

Ben 2023-9-20
We use "channels" (C) to refer to the feature dimension. In the case of LSTM, BiLSTM, or GRU, I think of the operation as a loop over a sequence of vector inputs, where each vector has size C. In practice we have a multi-dimensional array of size C×B×T: the operation loops over the sequence (T) dimension, vectorizes over the batch (B) dimension, and uses the C dimension in the computation.
Perhaps the confusion is that LSTMs often have their output fed back as their input, but this is not necessary, so you can have an LSTM with input vectors of size C1 and output vectors of size C2. You can find a description of the algorithm in the "Algorithms" section of the doc page: https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.lstmlayer.html, or you can see that the input dimension and the output/hidden state dimension can differ in the algorithm described here: https://en.wikipedia.org/wiki/Long_short-term_memory
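To make this concrete, here is a minimal sketch (not the asker's actual network; the hidden size 128 and the final output size 10 are made-up values) showing the C dimension changing through a BiLSTM while B and T stay fixed:

```matlab
% Sketch: a 1-channel signal of 1024 time steps through a BiLSTM.
layers = [
    sequenceInputLayer(1)      % input size C = 1 (one channel)
    bilstmLayer(128)           % 128 hidden units per direction -> C = 256
    fullyConnectedLayer(10)];  % maps the 256 channels down to 10
net = dlnetwork(layers);

% A CxBxT dlarray: C = 1 channel, B = 1 observation, T = 1024 steps
X = dlarray(rand(1,1,1024),"CBT");
Y = forward(net,X);
size(Y)   % expect 10 x 1 x 1024: C changed, B and T unchanged
```

The fully connected layer behaves the same way on sequences: it is applied independently at every time step, so it too changes only the C dimension.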
Could you clarify what an SE attention module is?
We can implement attention mechanisms that require batch matrix multiplies by using pagemtimes: https://www.mathworks.com/help/matlab/ref/pagemtimes.html. This is supported as a dlarray method, so it can be used in autodiff workflows such as custom training loops and custom layers. Alternatively, if you need standard self-attention, you can use selfAttentionLayer: https://www.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.selfattentionlayer.html
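As an illustration of the pagemtimes approach, here is a sketch of scaled dot-product attention over a batch (all sizes and data are made up; each C×T page is one observation):

```matlab
% Sketch: batched scaled dot-product attention with pagemtimes.
C = 8; T = 16; B = 4;                 % channels, time steps, batch size
Q = dlarray(rand(C,T,B));             % queries
K = dlarray(rand(C,T,B));             % keys
V = dlarray(rand(C,T,B));             % values

% scores(i,j) = q_i . k_j for each observation, scaled by sqrt(C)
scores = pagemtimes(Q,"transpose",K,"none")/sqrt(C);   % T x T x B

% Softmax over the key dimension (dim 2) of each page
A = exp(scores - max(scores,[],2));
A = A ./ sum(A,2);

% Weighted sum of values: output column i is sum_j A(i,j)*v_j
Y = pagemtimes(V,"none",A,"transpose");                % C x T x B
```

Because pagemtimes multiplies page by page over the trailing dimension, the whole batch is handled in one call, and dlarray support means gradients flow through it in a custom training loop.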

More Answers (1)

秀新 2023-9-21
Firstly, thank you very much for your answer. Let me explain the SE attention module: it comes from the 2018 paper "Squeeze-and-Excitation Networks". When the input is a sequence, the structure described in that paper cannot be built directly in MATLAB. I tried replacing the global pooling with average pooling, which achieves a similar result.
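Regarding the broadcast multiply the SE block needs: while the trainable-layer multiplication may be restrictive, dlarray element-wise arithmetic in a custom training loop or functionLayer does support implicit expansion, matching dimensions by label and expanding singletons. A minimal sketch of the SE "scale" step on sequence data (sizes and weights are made up; the sigmoid output here stands in for a learned excitation branch):

```matlab
% Sketch: SE rescaling of a CxBxT sequence via dlarray implicit expansion.
C = 64; B = 2; T = 1024;
X = dlarray(rand(C,B,T),"CBT");          % feature sequence
s = sigmoid(dlarray(rand(C,B),"CB"));    % stand-in for the excitation output
Y = X .* s;                              % s expands across all T time steps
size(Y)                                  % same size as X: 64 x 2 x 1024
```

The same idea covers the image case from the question: an H×W×C feature map times a 1×1×C weight vector broadcasts the per-channel weights over every spatial location, e.g. `rand(H,W,C) .* reshape(w,1,1,C)`.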

Release

R2023a
