Hi,
The different behaviors you're observing between the NARX (Nonlinear AutoRegressive with eXogenous inputs) network and the LSTM (Long Short-Term Memory) network during training and validation on your dataset can be attributed to several factors: the nature of the algorithms, data preprocessing, the network architectures, and how the training process is handled in each case.
Let's break down the potential reasons for these differences:
NARX Network:
1. Incorporation of Feedback: NARX networks are a type of recurrent neural network that includes feedback connections from output to input. This architecture is particularly well-suited for time-series prediction tasks where past outputs are used to predict future outputs. When you train a NARX network, it learns not only from the input-output pairs but also from the structure of the time series itself.
2. Use of Validation and Test Sets: In the context of NARX networks, validation and test sets are often used for early stopping and for assessing the generalization performance of the network. The fact that training on the first set also improved performance on the other two sets suggests that the network was able to learn generalizable features that are robust to slight variations in operating conditions.
3. Performance: The slight improvement in performance when merging all three sets into the training set is likely because the network has access to more diverse data, which helps it better capture the underlying system dynamics (see the sketch after this list).
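For reference, here is a minimal sketch of that open-loop NARX workflow; `X` and `T` are placeholder cell arrays for your input and target series, and the delays, hidden-layer size, and split ratios are only illustrative values:

```matlab
% Minimal NARX setup (shallow-network workflow in Deep Learning Toolbox).
% X and T are illustrative cell arrays of input and target time series.
net = narxnet(1:2, 1:2, 10);           % input delays, feedback delays, hidden neurons
net.divideFcn = 'divideblock';         % contiguous train/validation/test split for time series
net.divideParam.trainRatio = 0.7;
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;
[Xs, Xi, Ai, Ts] = preparets(net, X, {}, T);   % shift inputs/targets to fill the delay lines
net = train(net, Xs, Ts, Xi, Ai);              % default trainlm, early stopping on the validation split
netClosed = closeloop(net);                    % closed-loop form for multi-step-ahead prediction
```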
LSTM Network:
1. Sequence Learning: LSTM networks are designed to learn long-term dependencies in sequence data. They are powerful for sequence-to-sequence regression tasks but may require careful tuning and sufficient data to generalize well.
2. Overfitting: If the LSTM network is not reducing error on the validation set during training, it could be overfitting to the training set. This means the network is learning patterns that are too specific to the training data and not generalizing to unseen data.
3. Data Preprocessing: LSTM networks can be sensitive to the way data is preprocessed. Normalization, scaling, and the way sequences are batched can significantly affect the network's ability to learn.
4. Network Capacity: The size and complexity of the LSTM network can also play a role. A network that is too small may not have enough capacity to learn the complexities of the data, while a network that is too large may overfit.
5. Validation Strategy: By including both the test and validation sets in the validation phase for the LSTM, you are effectively evaluating the network's performance on a larger set of unseen data. If the network has not been trained on similar conditions, its performance on the validation set may not improve (a basic setup is sketched below).
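As a rough sketch of the LSTM side, the following sets up a sequence-to-sequence regression network and passes held-out data for validation monitoring. `XTrain`, `YTrain`, `XVal`, `YVal`, `numFeatures`, `numResponses`, and `numHiddenUnits` are placeholders you would replace with your own (already normalized) sequences and sizes:

```matlab
% Minimal LSTM sequence-to-sequence regression setup.
% XTrain/YTrain and XVal/YVal are placeholder cell arrays of
% numFeatures-by-numTimeSteps (and numResponses-by-numTimeSteps) sequences.
layers = [
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits, 'OutputMode', 'sequence')
    fullyConnectedLayer(numResponses)
    regressionLayer];

options = trainingOptions('adam', ...
    'MaxEpochs', 200, ...
    'InitialLearnRate', 1e-3, ...
    'Shuffle', 'never', ...                 % preserve the temporal order of the sequences
    'ValidationData', {XVal, YVal}, ...     % held-out data monitored during training
    'ValidationPatience', 10, ...           % stop if validation loss stops improving
    'Plots', 'training-progress');

net = trainNetwork(XTrain, YTrain, layers, options);
```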
General Considerations:
- Data Variability: If the operating conditions of the system change significantly between datasets, the network may struggle to learn a model that generalizes across all conditions without seeing examples from each during training.
- Random Initialization: Neural networks are initialized with random weights, which can lead to different training outcomes in different runs. It's important to ensure that the comparison between networks is fair by using the same initialization seed or averaging results over multiple runs.
- Hyperparameter Tuning: Both types of networks require careful hyperparameter tuning, including learning rate, number of layers, number of units per layer, and regularization techniques. The optimal settings for a NARX network may differ from those for an LSTM network.
- Training Algorithms: The optimization algorithms used to train neural networks (e.g., SGD with momentum, Adam, RMSProp) have different characteristics and hyperparameters. The choice of solver and its settings can affect both the training process and the outcome; a sketch of a controlled comparison follows this list.
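To keep such comparisons fair, one simple pattern is to fix the random seed and vary a single factor at a time, for example the solver. This is only a sketch and reuses the placeholders from the LSTM example above:

```matlab
% Fix the random seed so weight initialization is repeatable, then vary one
% factor (here the solver) at a time. layers, XTrain, YTrain, XVal, YVal are
% the same placeholders as in the LSTM sketch.
solvers = {'sgdm', 'adam', 'rmsprop'};
for k = 1:numel(solvers)
    rng(0);                                 % same random seed, hence same initialization, per run
    options = trainingOptions(solvers{k}, ...
        'MaxEpochs', 100, ...
        'ValidationData', {XVal, YVal}, ...
        'Verbose', false);
    nets{k} = trainNetwork(XTrain, YTrain, layers, options); %#ok<SAGROW>
end
```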
To better understand why one network type is outperforming the other, you would need to conduct a systematic analysis of the factors mentioned above. This might include experiments with different network architectures, hyperparameter settings, data preprocessing techniques, and training strategies.
Lastly, it's worth noting that the MATLAB Deep Learning Toolbox provides functions and options to customize and control the training process, including supplying held-out data for validation (the `ValidationData` option in `trainingOptions`), choosing different solvers (`'sgdm'`, `'adam'`, `'rmsprop'`), and registering callbacks for custom behavior during training (the `OutputFcn` option). Understanding and using these options can help you achieve better training outcomes.
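For example, a custom `OutputFcn` might look like the following hypothetical function, which stops training once the validation loss falls below an illustrative threshold:

```matlab
% Hypothetical custom output function: stop training once the validation loss
% drops below an illustrative threshold. trainNetwork passes an info struct
% with fields such as State and ValidationLoss at each iteration.
% (Save as stopAtValidationTarget.m, or define at the end of your script.)
function stop = stopAtValidationTarget(info)
    stop = false;
    if info.State == "iteration" && ~isempty(info.ValidationLoss)
        stop = info.ValidationLoss < 0.05;   % illustrative threshold, tune for your data
    end
end
```

You would pass it to training via `trainingOptions(..., 'OutputFcn', @stopAtValidationTarget)` together with the `ValidationData` option.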
Refer to the MathWorks documentation for detailed information about NARX networks, LSTM networks, and the Deep Learning Toolbox.
I hope this resolves your issue.