Closed loop with LSTM for time series

Question

massimo giannini 2024-8-15


Direct link to this question

https://ww2.mathworks.cn/matlabcentral/answers/2145449-closed-loop-with-lstm-for-time-series

Edited: Umar 2024-8-17

data_question.mat

Dear All

I am having trouble performing multi-step-ahead (closed-loop) forecasting for a time series. I use MATLAB 2024, and the "old" command predictAndUpdateState does not work with dlnetwork objects. I have read all the documentation I could find; I understand the problem in theory, but I cannot make it work in practice. To sum up the exercise: a single univariate time series (a stock price) is fed to an LSTM network. Training with trainnet and predicting with predict on the historical training and test data work without problems. Now I would like to forecast beyond the test data. This is the code:

X = (XTest);
T = YTest;
%net2 is a dlnetwork object
offset = length(X);
[Z,state] = predict(net2,X(1:offset)); %predict and update state using test data
net2.State = state; 
% five-steps ahead
numPredictionTimeSteps = 5;
Y = (zeros(numPredictionTimeSteps));
Y(1,:) = Z(end); %use the last forecast as starting point for the loop
for t = 2:numPredictionTimeSteps
    [Y(:,t),state] = predict(net2,Y(:,t-1));
    net2.State = state;
end
%I got:
Error using extractState (line 41)
    If the hidden state argument is a matrix, then its number of observations must match the number of observations of the
    input data.
    

I found a similar question on the web for a GRU network. If I reset the network state before the loop (as suggested there), the code runs but the forecast is very poor, so I wonder whether the reset affects the result. Is there an alternative to resetting? I attach the data and the net.

thanks in advance


Answer 1

Umar 2024-8-15


Direct link to this answer

https://ww2.mathworks.cn/matlabcentral/answers/2145449-closed-loop-with-lstm-for-time-series#answer_1499309

Hi @massimo giannini,

In LSTM networks, keeping the state dimensions consistent is crucial when making successive predictions. When you call predict, the input must have the layout the network was trained on, and the stored state must match the number of observations in that input; the error message you quote points at this kind of mismatch. Instead of resetting the network before forecasting, keep the input shape consistent across calls. The typical approach is to seed the loop with your last prediction and update the state iteratively. Here is an adjusted version of your loop that carries the state forward without resetting:

X = (XTest);
T = YTest;
offset = length(X);
[Z,state] = predict(net2,X(1:offset)); % Initial prediction
net2.State = state;
% Prepare for multi-step ahead forecasting
numPredictionTimeSteps = 5;
% Adjust Y's size based on Z
Y = zeros(size(Z, 1), numPredictionTimeSteps);
Y(:, 1) = Z(end); % Use last forecast as starting point
for t = 2:numPredictionTimeSteps
    % Predict using previous output
    [Y(:, t), state] = predict(net2, Y(:, t-1));
    net2.State = state; % Update the network state
end

Also, if your LSTM expects a specific input size or format (for example, a column vector rather than a matrix), make sure that `Y(:, t-1)` matches it. Beyond this fix, you may want to explore other forecasting strategies, such as ensemble methods or combining LSTM outputs with other models, to make the predictions more robust. I hope this helps. Please let me know if you have any further questions.
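For reference, here is a minimal self-contained sketch of a closed loop that yields one scalar per forecast step. It is not your net2: the small untrained network, the random series and the names net, x and numHidden are placeholders I am assuming for illustration, and the inputs are wrapped in "CBT"-formatted dlarray objects so that the channel, batch and time dimensions are explicit. The point is only the bookkeeping: Y has one row per forecast step, and every call to predict sees exactly one observation, so the stored state always matches the input.

```matlab
% Closed-loop forecasting sketch with a placeholder network (assumption:
% one input channel, one output channel, a single observation).
numHidden = 8;
layers = [
    sequenceInputLayer(1)
    lstmLayer(numHidden)          % OutputMode defaults to "sequence"
    fullyConnectedLayer(1)];
net = dlnetwork(layers);          % initialized, but NOT trained

x = rand(50,1);                   % placeholder for the observed series

% Warm up the state on the observed data: 1 channel, 1 observation, 50 steps.
X = dlarray(reshape(x,1,1,[]),"CBT");
net = resetState(net);
[Z,state] = predict(net,X);
net.State = state;

numPredictionTimeSteps = 5;
Y = zeros(numPredictionTimeSteps,1);      % one scalar per forecast step
Y(1) = extractdata(Z(:,:,end));           % last one-step-ahead prediction

for t = 2:numPredictionTimeSteps
    xt = dlarray(Y(t-1),"CBT");           % previous forecast, one time step
    [zt,state] = predict(net,xt);
    net.State = state;                    % carry the state forward
    Y(t) = extractdata(zt);
end

disp(size(Y))
```

With a trained network in place of the placeholder, Y(t) would be the forecast t steps beyond the last observed value, with no averaging needed.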

5 comments

Umar 2024-8-16

Hi @massimo giannini,

To address your first query, “About improving robustness, any suggestions?”:

I would suggest adding dropout layers to your LSTM architecture; they help prevent overfitting and help the model generalize to unseen data. K-fold cross-validation adapted to time series data, with folds that respect temporal order, helps to check that your model is not just fitted well to one particular split of the data but generalizes across different periods. I would also consider ensemble methods, which combine predictions from multiple models to reduce variance and improve accuracy. For example, you could average the predictions of several LSTM models trained on different subsets of your data or with different hyperparameters. Here is a simple code snippet demonstrating an ensemble approach:

% Example of ensemble averaging from two LSTM models
pred1 = predict(net1, XTest);
pred2 = predict(net2, XTest);
finalPrediction = (pred1 + pred2) / 2; % Average of predictions
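The two-model average above extends to any number of models. The sketch below is only an illustration under my own assumptions: the three small networks are untrained placeholders (each with a dropout layer, as suggested above), the input is random data, and the names nets, preds, numModels and numSteps are hypothetical. In practice each entry of nets would be a network you trained on a different subset or with different hyperparameters.

```matlab
numModels = 3;
numSteps = 20;
nets = cell(1,numModels);
for k = 1:numModels
    layers = [
        sequenceInputLayer(1)
        lstmLayer(8)
        dropoutLayer(0.2)         % active only during training
        fullyConnectedLayer(1)];
    nets{k} = dlnetwork(layers);  % random initialization, NOT trained
end

% Placeholder input: 1 channel, 1 observation, numSteps time steps.
X = dlarray(reshape(rand(numSteps,1),1,1,[]),"CBT");

preds = zeros(numModels,numSteps);
for k = 1:numModels
    Zk = predict(nets{k},X);                      % 1-by-1-by-numSteps
    preds(k,:) = reshape(extractdata(Zk),1,[]);
end

finalPrediction = mean(preds,1);   % average across models, per time step
spread = std(preds,0,1);           % disagreement between the models
```

Keeping the per-model predictions in preds also gives a rough measure of how much the models disagree at each step, which a plain average hides.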

Now let me address your second query: “When I run the forecasting with the changes suggested by you, it work perfectly but I obtain (for each step) a vector of forecasting instead of a scalar. I do understand that for the iterative process I must obtain a vector for each t to feed the net for next t. But for each vector of prediction, which value must I choose? Is reasonable the mean for each vector as representative of the final prediction at step t?”

When an LSTM network returns a vector instead of a scalar at each prediction step, you need to decide how to summarize that vector into a single representative value per time step. The mean is a reasonable choice if the predicted values have similar scale and relevance. Depending on the context, you might also consider other statistics, such as the median or the mode, if they capture the central tendency of your predictions better. Here is how you can modify your code to calculate the mean of each prediction vector:

% Assuming Y is already defined as in your previous code
Y = zeros(size(Z, 1), numPredictionTimeSteps);
Y(:, 1) = Z(end); % Use last forecast as starting point
for t = 2:numPredictionTimeSteps
    [Y(:, t), state] = predict(net2, Y(:, t-1));
    net2.State = state; % Update the network state
end
% Calculate mean for each prediction vector
finalPredictions = mean(Y, 1); % Mean across rows for each time step

For more information and guidance on these functions, please refer to the documentation for mean, median and mode.

Please note that summarizing vector outputs into scalars can have domain-specific implications. For instance, if your application requires preserving variance or capturing outliers, the mean alone may not suffice. Moreover, if you continue exploring forecasting methods beyond LSTMs, I would consider combining traditional statistical methods such as ARIMA (AutoRegressive Integrated Moving Average) with machine learning techniques, since they may capture different aspects of the underlying patterns in your data.
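To make the difference concrete, here is a small made-up example (the numbers in Y are invented for illustration, not taken from your data). Each column plays the role of one forecast step, and the first column contains one value far from the others.

```matlab
% Invented numbers: rows are entries of a prediction vector,
% columns are forecast steps.
Y = [1 2;
     2 4;
     9 6];

stepMean   = mean(Y,1);      % [4 4]
stepMedian = median(Y,1);    % [2 4]
stepStd    = std(Y,0,1);     % spread within each step
```

In the first column the mean is pulled towards 9 while the median stays at 2; in the second column the two agree. Reporting the spread alongside the summary keeps some of the information that a single number discards.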

Feel free to reach out if you have further questions or need more clarification on specific points.

massimo giannini 2024-8-16

Hi Umar, many thanks! I am an "old" econometrician and I am studying deep learning now. My dissertation was on ARIMA and GARCH (35 years ago), but now I would like to investigate "new" methods. As I said, I am moving from R to MATLAB, but I find the MathWorks explanations of limited use; they have a lot of examples, but technical details are missing. As an example: if I use OutputMode=sequence in the LSTM, I am able to obtain one single prediction at each out-of-sample step, but if I use last (as I did in the code I sent you) I obtain a vector, following your help. This is not clear to me. As I want a sequence-to-one net, "last" should be the right choice.

Do you have any good technical textbooks to suggest?

many thanks

Umar 2024-8-17

Edited: Umar 2024-8-17

Hi @massimo giannini,

Regarding your question about OutputMode in LSTM configurations: when you set OutputMode=sequence, the LSTM layer processes the entire input sequence and returns an output for each time step. This is useful for tasks where you need a prediction at every step, such as time series forecasting where you want to track how the predictions evolve over time. Conversely, when you use OutputMode=last, the layer returns only the output corresponding to the last time step of the input sequence. This is particularly useful for sequence-to-one tasks, where you want a single output for a given input sequence.
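The difference is easiest to see from the output sizes. The following sketch uses two untrained placeholder networks (the names netSeq, netLast and numHidden are my own assumptions) that differ only in the OutputMode of the LSTM layer, and feeds both the same 20-step sequence as a "CBT"-formatted dlarray.

```matlab
numHidden = 8;

netSeq = dlnetwork([
    sequenceInputLayer(1)
    lstmLayer(numHidden,OutputMode="sequence")
    fullyConnectedLayer(1)]);     % placeholder, NOT trained

netLast = dlnetwork([
    sequenceInputLayer(1)
    lstmLayer(numHidden,OutputMode="last")
    fullyConnectedLayer(1)]);     % placeholder, NOT trained

% Same input for both: 1 channel, 1 observation, 20 time steps.
X = dlarray(reshape(rand(20,1),1,1,[]),"CBT");

Zseq  = predict(netSeq,X);        % one output per time step
Zlast = predict(netLast,X);       % one output for the whole sequence

disp(size(Zseq))
disp(size(Zlast))
```

With "sequence" there is one output per input time step (20 here); with "last" there is a single output for the whole sequence.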

Now, let us focus on your comment: “As I want a sequence-to-one net, "last" should be the right choice.”

I will break down my provided code snippet to clarify how to implement a sequence-to-one network using the last mode.

X = (XTest); % Input test data
T = YTest;   % Target test data
offset = length(X); % Determine the length of the input data
% Initial prediction using the entire input sequence
[Z,state] = predict(net2,X(1:offset)); % Predict and update state
net2.State = state; % Update the network state for future predictions
% Prepare for multi-step ahead forecasting
numPredictionTimeSteps = 5; % Number of future time steps to predict
Y = zeros(size(Z, 1), numPredictionTimeSteps); % Initialize output matrix
Y(:, 1) = Z(end); % Use the last forecast as the starting point
for t = 2:numPredictionTimeSteps
    % Predict the next time step using the last output
    [Y(:, t), state] = predict(net2, Y(:, t-1));
    net2.State = state; % Update the network state
end

In the code, the input data X and the target data T are defined first. The offset variable holds the length of the input data, which determines how much data is fed to the network for the initial prediction. The first prediction is then made using the entire input sequence: the output Z contains the predictions for each time step, and the returned state is stored back in the network. Next, the output matrix Y is initialized to store the predictions for the requested number of future time steps, and its first column is set to the last prediction in Z, which serves as the starting point. Each iteration of the loop then uses the previous prediction as input for the next one, chaining the predictions together, and the network state is updated after each prediction to maintain continuity.
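If it helps, the state that is carried between calls can be inspected directly. The sketch below uses an untrained placeholder network (the names net, state1, state3, h1 and h3 are hypothetical) and shows, as far as I understand the behaviour, that the hidden state returned by predict has one column per observation in the input. A state produced from several observations therefore cannot be reused with a single-observation input, which is what the error message in the question describes.

```matlab
numHidden = 8;
net = dlnetwork([
    sequenceInputLayer(1)
    lstmLayer(numHidden)
    fullyConnectedLayer(1)]);     % placeholder, NOT trained

% One observation, 10 time steps.
X1 = dlarray(rand(1,1,10),"CBT");
[~,state1] = predict(net,X1);

% Three observations, 10 time steps each.
X3 = dlarray(rand(1,3,10),"CBT");
[~,state3] = predict(net,X3);

% The returned state is a table with the variables Layer, Parameter, Value.
isHidden = state1.Parameter == "HiddenState";
h1 = state1.Value{isHidden};
h3 = state3.Value{isHidden};

disp(size(h1))    % numHidden rows, 1 column
disp(size(h3))    % numHidden rows, 3 columns
```

Checking the number of columns of the stored hidden state against the number of observations in the next input is a quick way to locate this kind of mismatch before entering the loop.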

I understand that the transition from traditional econometric models such as ARIMA and GARCH to deep learning techniques such as LSTM can be challenging, especially when moving between programming environments like R and MATLAB.

Finally, regarding your request for technical textbooks:

For a deeper understanding of LSTM and other deep learning techniques, I recommend the following textbooks (note that their examples use Python rather than MATLAB):

* Introduction to Machine Learning with Python: A Guide for Data Scientists, by Andreas C. Müller and Sarah Guido

* LSTM Networks: Exploring the Evolution and Impact of Long Short-Term Memory Networks in Machine Learning (Kindle Edition), by Henri van Maarseveen

* Deep Learning: Recurrent Neural Networks in Python: LSTM, GRU, and more RNN machine learning architectures in Python and Theano (Machine Learning in Python) (Kindle Edition), by LazyProgrammer

I hope I have answered all your questions.
