Projection of LSTM layer vs GRU layer

Question

Silvia 2024-5-28

Direct link to this question: https://ww2.mathworks.cn/matlabcentral/answers/2123176-projection-of-lstm-layer-vs-gru-layer

Commented: Silvia on 2024-6-10

I am training two RNNs, one with an LSTM layer and the other with a GRU layer. The two architectures are:

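numFeatures = 1;        % number of input channels per time step
numHiddenUnits = 32;    % hidden units in the recurrent layer
% Network with an LSTM layer, one output per time step
layersLSTM = [
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits, OutputMode="sequence")
    fullyConnectedLayer(numFeatures)
    ];
% Same network with a GRU layer in place of the LSTM layer
layersGRU = [
    sequenceInputLayer(numFeatures)
    gruLayer(numHiddenUnits, OutputMode="sequence")
    fullyConnectedLayer(numFeatures)
    ];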

I am following the projection examples on these pages: https://it.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.lstmprojectedlayer.html and https://it.mathworks.com/help/deeplearning/ref/nnet.cnn.layer.gruprojectedlayer.html

When I train the projected version of the GRU network, the validation RMSE and loss do not follow the training RMSE and loss, as shown in the image below:

This is the first time this has happened. I have never had this problem with the LSTM network (neither with the lstmLayer architecture nor with the lstmProjectedLayer one), nor when training the GRU network without projection; in those cases the validation metrics tracked the training metrics properly. What could be causing this?

I also have a second question:

Following the two MATLAB examples, I set outputProjectorSize and inputProjectorSize to:

75% of the number of hidden units and 25% of the input size, respectively, for LSTM
25% of the number of hidden units and 75% of the input size, respectively, for GRU

So, for the GRU it's the opposite. Is there a reason behind this choice?

Thank you in advance!


Answer 1

Maksym Tymchenko 2024-6-3

Direct link to this answer: https://ww2.mathworks.cn/matlabcentral/answers/2123176-projection-of-lstm-layer-vs-gru-layer#answer_1466876

Hi @Silvia,

I am glad to see that you are using our new projection features.

I'll start by answering the second question.

As far as I can see, both examples use exactly the same definitions of OutputProjectorSize and InputProjectorSize in the section "Compare Network Projection Sizes":

An output projector size of 25% of the number of hidden units.
An input projector size of 75% of the input size.

These are reasonable sizes to choose because they give the lstmProjectedLayer fewer learnable parameters than an lstmLayer with the same number of hidden units. Note that it is possible to choose values that make a projected layer larger than the original layer. To avoid this, use the compressNetworkUsingProjection function, which determines these parameter sizes automatically based on the amount of compression you specify.

Alternatively, if you want to create the projected layers from scratch, follow the Tips in the descriptions of the OutputProjectorSize and InputProjectorSize parameters. They state that, to ensure that the projected layer requires fewer learnable parameters than the corresponding non-projected layer:

For an lstmProjectedLayer: set the OutputProjectorSize property to a value less than 4*NumHiddenUnits/5, and set the InputProjectorSize property to a value less than 4*NumHiddenUnits*inputSize/(4*NumHiddenUnits+inputSize)
For a gruProjectedLayer: set the OutputProjectorSize property to a value less than 3*NumHiddenUnits/4, and set the InputProjectorSize property to a value less than 3*NumHiddenUnits*inputSize/(3*NumHiddenUnits+inputSize)
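To see what these Tips mean for the configuration in the question (numHiddenUnits = 32, numFeatures = 1), the MATLAB sketch below counts the learnable weights of each layer with and without projection. It assumes the weight shapes that reproduce the Tips' thresholds (gH-by-C input weights and gH-by-H recurrent weights without projection; gH-by-Qi input weights, a C-by-Qi input projector, gH-by-Qo recurrent weights and an H-by-Qo output projector with projection, where g is the number of gates), leaves the biases out on the assumption that they are the same size in both layers, and rounds the 25%/75% sizes listed above to positive integers. The anonymous functions plainWeights, projectedWeights, maxOutputProjector and maxInputProjector are hypothetical helpers for illustration, not toolbox functions; the script uses only base MATLAB arithmetic.

```matlab
% Sketch: compare learnable weight counts of projected vs. non-projected
% LSTM/GRU layers, assuming the weight shapes implied by the Tips.
% Biases are omitted (assumed identical in both layers, so they cancel).
% g = number of gates (4 for LSTM, 3 for GRU), H = NumHiddenUnits,
% C = input size, Qo = OutputProjectorSize, Qi = InputProjectorSize.
plainWeights     = @(g, H, C)         g*H*C + g*H*H;
projectedWeights = @(g, H, C, Qo, Qi) g*H*Qi + C*Qi + g*H*Qo + H*Qo;

% Upper bounds from the Tips: the projected layer is smaller when
% Qo and Qi are strictly below these values
maxOutputProjector = @(g, H)    g*H/(g + 1);
maxInputProjector  = @(g, H, C) g*H*C/(g*H + C);

% Configuration from the question
H = 32;   % numHiddenUnits
C = 1;    % numFeatures

% 25% of the hidden units and 75% of the input size. Projector sizes must
% be positive integers; rounding and clamping at 1 is an assumption of
% this sketch, not something taken from the documentation examples.
Qo = max(1, round(0.25*H));
Qi = max(1, round(0.75*C));

names = {'LSTM', 'GRU'};
gates = [4, 3];
for k = 1:numel(gates)
    g = gates(k);
    fprintf('%s: %d weights without projection, %d with projection\n', ...
        names{k}, plainWeights(g, H, C), projectedWeights(g, H, C, Qo, Qi));
    fprintf('  Qo = %d (bound %.2f), Qi = %d (bound %.4f)\n', ...
        Qo, maxOutputProjector(g, H), Qi, maxInputProjector(g, H, C));
end
```

For this configuration the script reports 4224 versus 1409 weights for the LSTM layer and 3168 versus 1121 for the GRU layer. With a single input feature the input bound is below 1 for both layer types, so no positive integer InputProjectorSize reduces the input weights; in this sketch the whole saving comes from the output projector.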

These formulas can be derived by expressing the total number of learnable parameters as a function of the number of hidden units and the input size. For more information, see the Algorithms section of the lstmProjectedLayer and gruProjectedLayer reference pages.
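A worked version of that derivation, as a sketch under stated assumptions: let g be the number of gates (g = 4 for LSTM, g = 3 for GRU), H the number of hidden units, C the input size, Q_o and Q_i the output and input projector sizes, and N_b the bias count, assumed equal in both layers. Assume the non-projected layer has gH-by-C input weights and gH-by-H recurrent weights, while the projected layer has gH-by-Q_i input weights plus a C-by-Q_i input projector, and gH-by-Q_o recurrent weights plus an H-by-Q_o output projector. Comparing the input and recurrent terms separately gives:

```latex
\begin{aligned}
N_{\text{plain}} &= gHC + gH^2 + N_b, \\
N_{\text{proj}}  &= (gH + C)\,Q_i + (g + 1)H\,Q_o + N_b, \\[4pt]
(g + 1)H\,Q_o < gH^2 &\iff Q_o < \frac{gH}{g + 1}, \\
(gH + C)\,Q_i < gHC &\iff Q_i < \frac{gHC}{gH + C}.
\end{aligned}
```

Setting g = 4 yields 4H/5 and 4HC/(4H+C) for lstmProjectedLayer, and g = 3 yields 3H/4 and 3HC/(3H+C) for gruProjectedLayer, which matches the Tips. Meeting both inequalities is sufficient, though not necessary, for N_proj < N_plain.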

Regarding your first question, I would need full reproduction steps, including the script and the dataset you used, to investigate the issue. Please feel free to share these as an attachment to this post, or open a technical support request with the reproduction steps.


Silvia 2024-6-10

Hello @Maksym Tymchenko,

Thank you for the detailed explanations and the interesting insight into the compressNetworkUsingProjection function!

Unfortunately, I cannot share the code or the datasets for data privacy reasons.

But thank you again for your help!

Silvia
