"Deep Learning Toolbox" products - NARX versus LSTM NN

23 views (last 30 days)
I'm using a NN to predict in real time, one step ahead for control, from the multi-input, multi-output time series of a system.
I recorded the input and output of the system in three sets of experiments:
the first for training, and the others, recorded under slightly different operating conditions of the system, for testing and validation.
Originally I was using NARX nets, where the three mutually exclusive sets are incorporated into the optimization algorithm.
In the Deep Learning Toolbox, training the NARX net on the first set somehow took the other two sets into account (the error performance indices on them were also reduced).
This technique showed that the net was robust enough to explain the behaviour of the system even in slightly different operating conditions.
I also tried merging all three sets into the training data. I noticed the difference in the longer time the algorithm took to reach convergence.
In this case the overall performance was slightly better than in the previous case, but not by much.
For the same application, I was also using an LSTM network for sequence-to-sequence regression.
This net also accepts training and validation sets. In the validation set I included both the previous test and validation sets.
However, training on the training set was not reducing the error on the validation set. To obtain a satisfactory result all around, with the same size of net, I had to merge the validation data into the training set as well.
I'm not able to look inside the training/optimization algorithms used by the toolbox for the two kinds of nets.
Can someone explain these different behaviours?
Giuseppe

Answer (1)

Kartik Saxena 2023-12-8
Hi,
The different behaviors you're observing between the NARX (Nonlinear AutoRegressive with eXogenous inputs) network and the LSTM (Long Short-Term Memory) network during training and validation on your dataset can be attributed to several factors, including the nature of the algorithms, data preprocessing, network architectures, and how the training process is handled in each case.
Let's break down the potential reasons for these differences:
NARX Network:
1. Incorporation of Feedback: NARX networks are a type of recurrent neural network that includes feedback connections from output to input. This architecture is particularly well-suited for time-series prediction tasks where past outputs are used to predict future outputs. When you train a NARX network, it learns not only from the input-output pairs but also from the structure of the time series itself.
2. Use of Validation and Test Sets: In the context of NARX networks, the validation set is used for early stopping and the test set for assessing the generalization performance of the network. The fact that training on the first set also improved performance on the other two sets suggests that the network learned generalizable features that are robust to slight variations in operating conditions (a typical way of setting up this data division is sketched just after this list).
3. Performance: The slight improvement in performance when merging all three sets into the training set could be due to the network having access to more diverse data, which helps it to better capture the underlying system dynamics.
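A minimal sketch of such a setup, assuming the three experiments are available as cell-array time series X1/T1, X2/T2 and X3/T3 (hypothetical variable names, not taken from the question), with each whole experiment assigned to the training, validation or test role:

net = narxnet(1:2, 1:2, 10);               % input delays, feedback delays, hidden units (illustrative values)
net.trainFcn = 'trainlm';                  % Levenberg-Marquardt, the shallow-network default
% Combine the experiments as separate sequences and divide the data by sample,
% so each whole experiment plays the role of training, validation or test set.
X = catsamples(X1, X2, X3, 'pad');
T = catsamples(T1, T2, T3, 'pad');
net.divideFcn  = 'divideind';
net.divideMode = 'sample';
net.divideParam.trainInd = 1;              % experiment 1 -> training
net.divideParam.valInd   = 2;              % experiment 2 -> validation (early stopping)
net.divideParam.testInd  = 3;              % experiment 3 -> independent test
[Xs, Xi, Ai, Ts] = preparets(net, X, {}, T);
net = train(net, Xs, Ts, Xi, Ai);
netc = closeloop(net);                     % closed-loop form for one-step-ahead prediction

With this division the validation experiment stops training as soon as its error starts rising (the max_fail behaviour of the shallow training functions), which is why its error indices decrease during training even though it never contributes to the weight updates.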
LSTM Network:
1. Sequence Learning: LSTM networks are designed to learn long-term dependencies in sequence data. They are powerful for sequence-to-sequence regression tasks but may require careful tuning and sufficient data to generalize well.
2. Overfitting: If the LSTM network is not reducing error on the validation set during training, it could be overfitting to the training set. This means the network is learning patterns that are too specific to the training data and not generalizing to unseen data.
3. Data Preprocessing: LSTM networks can be sensitive to the way data is preprocessed. Normalization, scaling, and the way sequences are batched can significantly affect the network's ability to learn.
4. Network Capacity: The size and complexity of the LSTM network can also play a role. A network that is too small may not have enough capacity to learn the complexities of the data, while a network that is too large may overfit.
5. Validation Strategy: By including both the previous test and validation sets in the validation data for the LSTM, you are effectively evaluating the network's performance on a larger set of unseen data. If the network has not been trained on similar conditions, its performance on this validation set may not improve (see the sketch below for how validation data is passed to the LSTM training).
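A minimal sketch of how validation data is passed to the LSTM training, assuming XTrain/YTrain and XVal/YVal are cell arrays of numFeatures-by-numTimeSteps sequences (hypothetical names and layer sizes, not taken from the question):

numFeatures  = 4;                          % assumed number of input channels
numResponses = 2;                          % assumed number of output channels
layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(100, 'OutputMode', 'sequence')
    fullyConnectedLayer(numResponses)
    regressionLayer];
options = trainingOptions('adam', ...
    'MaxEpochs', 200, ...
    'ValidationData', {XVal, YVal}, ...    % monitored, but never used to update the weights
    'ValidationPatience', 10, ...          % optional early stopping on the validation loss
    'Shuffle', 'every-epoch', ...
    'Plots', 'training-progress');
net = trainNetwork(XTrain, YTrain, layers, options);

Note that by default trainNetwork does not stop early on the validation loss ('ValidationPatience' defaults to Inf), whereas the shallow-network training used by narxnet stops as soon as the validation error rises for several consecutive checks; this difference alone can explain part of the behaviour you observed.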
General Considerations:
  • Data Variability: If the operating conditions of the system change significantly between datasets, the network may struggle to learn a model that generalizes across all conditions without seeing examples from each during training.
  • Random Initialization: Neural networks are initialized with random weights, which can lead to different training outcomes in different runs. It's important to ensure that the comparison between networks is fair by using the same initialization seed or averaging results over multiple runs (a small sketch after this list shows where the seed and optimizer are set in each workflow).
  • Hyperparameter Tuning: Both types of networks require careful hyperparameter tuning, including learning rate, number of layers, number of units per layer, and regularization techniques. The optimal settings for a NARX network may differ from those for an LSTM network.
  • Training Algorithms: The optimization algorithms used for training neural networks (e.g., SGD, Adam, RMSprop) have different characteristics and hyperparameters. The choice of optimizer and its settings can affect the training process and outcomes.
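A small sketch of making the comparison reproducible and of where the optimizer and regularization are selected in each workflow (reusing the hypothetical variables from the sketches above):

rng(0);                                    % fix the seed so weight initialization is repeatable
% Shallow NARX net: the optimizer is chosen through trainFcn
net.trainFcn = 'trainbr';                  % e.g. Bayesian regularization instead of the default 'trainlm'
% LSTM: the optimizer is the first argument of trainingOptions
options = trainingOptions('sgdm', ...      % or 'adam', 'rmsprop'
    'InitialLearnRate', 1e-3, ...
    'L2Regularization', 1e-4);             % regularization strength is a hyperparameter as well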
To better understand why one network type is outperforming the other, you would need to conduct a systematic analysis of the factors mentioned above. This might include experiments with different network architectures, hyperparameter settings, data preprocessing techniques, and training strategies.
Lastly, it's worth noting that the MATLAB Deep Learning Toolbox provides functions and options to customize and control the training process, including setting aside data for validation (the `'ValidationData'` option in `trainingOptions`), choosing different optimization algorithms (`'sgdm'`, `'adam'`, etc.), and setting callbacks for custom behavior during training (e.g., the `'OutputFcn'` option). Understanding and utilizing these options can help you achieve better training outcomes.
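For example, a custom stopping rule can be supplied through the 'OutputFcn' option; the threshold below is purely hypothetical:

% Stop training once the validation loss drops below a chosen threshold.
stopIfGoodEnough = @(info) ~isempty(info.ValidationLoss) && info.ValidationLoss < 0.05;
options = trainingOptions('adam', ...
    'ValidationData', {XVal, YVal}, ...
    'OutputFcn', stopIfGoodEnough);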
Refer to the MathWorks documentation for detailed information about the use of NARX, LSTM, and the Deep Learning Toolbox.
I hope this resolves your issue.
