Does LSTM training reset after resuming learning?
I have a database called processed_data, which contains cells structured like this:
0.980999999999767 0.945912306864893 1
1.46300000000338 0.926617136227153 1
0.511999999995169 0.868790509137634 2
1.00600000000122 0.978074194186882 1
0.995999999999185 0.884817478795566 2
1.12400000000343 0.740093883803231 2
1.35399999999936 0.418494628137842 2
0.653999999994994 0.399199457500103 2
1.00600000000122 0.438938088213894 2
0.999000000003434 0.566427539286267 2
The first column represents the seconds that have passed since the previous row, the second column is the value at that given time (normalized), and the third column contains the categorical values 1 and 2. The first two columns are the predictors, and the third column is the target.
Each cell represents one day, and I have around 215 days' worth of data, each with a varying number of observations. The goal is to create an LSTM that, based on the predictors, can predict whether the value will increase (2) or decrease (1) in the future. During training, I'm keeping each day separate by stopping the learning when the last batch for a given day is reached, then loading the data for the next day and resuming the training.
The problem is that when training resumes after loading the next day's data, it's as if the network is completely reset and starts learning from scratch. It repeatedly produces the same accuracy results (excluding the first iteration) with only slight changes in loss values. The accuracy in the remaining iterations remains unchanged, always producing the same values, as if the network is not learning at all. Here is an example output for day 1:
1. day
Iteration Epoch TimeElapsed LearnRate TrainingLoss TrainingAccuracy
_________ _____ ___________ _________ ____________ ________________
1 1 00:00:00 0.001 0.69781 40.625
50 1 00:00:00 0.001 0.65881 64.844
100 1 00:00:00 0.001 0.70176 50.781
117 1 00:00:00 0.001 0.63057 69.531
Training stopped: Max epochs completed
...
...
...
1. day
Iteration Epoch TimeElapsed LearnRate TrainingLoss TrainingAccuracy
_________ _____ ___________ _________ ____________ ________________
1 1 00:00:00 0.001 0.70017 41.406
50 1 00:00:00 0.001 0.65913 64.844
100 1 00:00:00 0.001 0.6985 50.781
117 1 00:00:00 0.001 0.62994 69.531
Training stopped: Max epochs completed
...
...
...
1. day
Iteration Epoch TimeElapsed LearnRate TrainingLoss TrainingAccuracy
_________ _____ ___________ _________ ____________ ________________
1 1 00:00:00 0.001 0.69753 42.188
50 1 00:00:00 0.001 0.6619 64.844
100 1 00:00:00 0.001 0.70356 50.781
117 1 00:00:00 0.001 0.6291 69.531
Training stopped: Max epochs completed
...
...
...
Here is my code snippet:
%% Define training options
train_opts = trainingOptions( ...
    "adam", ...
    InitialLearnRate = 0.001, ...
    MiniBatchSize = 128, ...
    Plots = "none", ...
    Verbose = true, ...
    MaxEpochs = 1, ...
    Shuffle = "never", ...
    Metrics = "accuracy" ...
);
%% Define network.
net = dlnetwork;
temp_net = [
    sequenceInputLayer(2, "Name", "input")
    lstmLayer(256, "Name", "lstm", "OutputMode", "last")
    dropoutLayer(0.5, "Name", "dropout")
    fullyConnectedLayer(2, "Name", "output")
    softmaxLayer];
net = addLayers(net, temp_net);
net = initialize(net);
% clean up helper variable
clear temp_net;
%% Load the data for each day and train the network.
num_of_epochs = 30;
train_data_length = round(length(processed_data) * 0.9);
train_data = processed_data(1:train_data_length);
for epoch = 1:num_of_epochs
    for day = 1:train_data_length
        if train_opts.Verbose
            disp(day + ". day")
        end
        train_X = train_data{day}(:, 1:2);
        train_X = dlarray(train_X, "BCT");
        train_Y = categorical(train_data{day}(:, 3));
        net = trainnet(train_X, train_Y, net, "crossentropy", train_opts);
    end
end
Answers (1)
Karan Singh
2025-2-25
I think the issue is that each call to trainnet (with MaxEpochs=1) starts a fresh training session, so MATLAB's training routine reinitializes the internal training state. The network's learned weights are preserved between calls, because you pass the updated net back in, but the optimizer state (momentum, or Adam's moment estimates) and, by default, the LSTM's hidden and cell states are reset at the start of each new session.
So, to answer your question:
The LSTM’s learned weights are carried over, but the training “state” (including optimizer states and the internal sequence states) is reset each time you resume training.
This is expected behavior when using MATLAB’s built-in training routines in this manner. If you want to maintain the optimizer state, you would need to implement a custom training loop that preserves those states across batches.
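A minimal sketch of such a custom training loop, assuming the Deep Learning Toolbox functions dlfeval, adamupdate, forward, crossentropy, dlgradient, and onehotencode, and reusing the net, train_data, and train_data_length variables from the question. The helper function modelLoss and the choice of each day's final label as that day's sequence target are assumptions for illustration, not part of the original code:

```matlab
% Adam's moment estimates live in these variables; because they are passed
% back into adamupdate on every iteration, the optimizer state survives the
% transition from one day to the next (unlike repeated trainnet calls).
averageGrad = [];
averageSqGrad = [];
iteration = 0;
learnRate = 0.001;
num_of_epochs = 30;

for epoch = 1:num_of_epochs
    for day = 1:train_data_length
        % Shape one day as a single sequence: 2 channels x T time steps.
        X = dlarray(train_data{day}(:, 1:2)', "CTB");
        % With OutputMode = "last" the network emits one prediction per
        % sequence, so use the day's final label as the target (an
        % assumption; adjust to your labeling scheme).
        T = onehotencode(categorical(train_data{day}(end, 3), [1 2]), 1);

        iteration = iteration + 1;
        [loss, gradients] = dlfeval(@modelLoss, net, X, T);
        [net, averageGrad, averageSqGrad] = adamupdate( ...
            net, gradients, averageGrad, averageSqGrad, iteration, learnRate);
    end
end

% In a script, local functions must appear at the end of the file.
function [loss, gradients] = modelLoss(net, X, T)
    Y = forward(net, X);  % forward (not predict) runs dropout in training mode
    loss = crossentropy(Y, T);
    gradients = dlgradient(loss, net.Learnables);
end
```

Because adamupdate only updates what you hand it, the same pattern also lets you carry the LSTM's hidden state across days if you switch to a stateful formulation, but that is a separate design decision.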
Karan