Does LSTM training reset after resuming learning?
I have a database called processed_data, which contains cells structured like this:
0.980999999999767 0.945912306864893 1
1.46300000000338 0.926617136227153 1
0.511999999995169 0.868790509137634 2
1.00600000000122 0.978074194186882 1
0.995999999999185 0.884817478795566 2
1.12400000000343 0.740093883803231 2
1.35399999999936 0.418494628137842 2
0.653999999994994 0.399199457500103 2
1.00600000000122 0.438938088213894 2
0.999000000003434 0.566427539286267 2
The first column represents the seconds that have passed since the previous row, the second column is the value at that given time (normalized), and the third column contains the categorical values 1 and 2. The first two columns are the predictors, and the third column is the target.
Each cell represents one day, and I have around 215 days' worth of data, each with a varying number of observations. The goal is to create an LSTM that, based on the predictors, can predict whether the value will increase (2) or decrease (1) in the future. During training, I'm keeping each day separate by stopping the learning when the last batch for a given day is reached, then loading the data for the next day and resuming the training.
The problem is that when training resumes after loading the next day's data, it's as if the network is completely reset and starts learning from scratch. It repeatedly produces the same accuracy results (excluding the first iteration) with only slight changes in loss values. The accuracy in the remaining iterations remains unchanged, always producing the same values, as if the network is not learning at all. Here is an example output for day 1:
1. day
Iteration Epoch TimeElapsed LearnRate TrainingLoss TrainingAccuracy
_________ _____ ___________ _________ ____________ ________________
1 1 00:00:00 0.001 0.69781 40.625
50 1 00:00:00 0.001 0.65881 64.844
100 1 00:00:00 0.001 0.70176 50.781
117 1 00:00:00 0.001 0.63057 69.531
Training stopped: Max epochs completed
...
...
...
1. day
Iteration Epoch TimeElapsed LearnRate TrainingLoss TrainingAccuracy
_________ _____ ___________ _________ ____________ ________________
1 1 00:00:00 0.001 0.70017 41.406
50 1 00:00:00 0.001 0.65913 64.844
100 1 00:00:00 0.001 0.6985 50.781
117 1 00:00:00 0.001 0.62994 69.531
Training stopped: Max epochs completed
...
...
...
1. day
Iteration Epoch TimeElapsed LearnRate TrainingLoss TrainingAccuracy
_________ _____ ___________ _________ ____________ ________________
1 1 00:00:00 0.001 0.69753 42.188
50 1 00:00:00 0.001 0.6619 64.844
100 1 00:00:00 0.001 0.70356 50.781
117 1 00:00:00 0.001 0.6291 69.531
Training stopped: Max epochs completed
...
...
...
Here is my code snippet:
%% Define training options
train_opts = trainingOptions( ...
    "adam", ...
    InitialLearnRate = 0.001, ...
    MiniBatchSize = 128, ...
    Plots = "none", ...
    Verbose = true, ...
    MaxEpochs = 1, ...
    Shuffle = "never", ...
    Metrics = "accuracy" ...
);
%% Define network.
net = dlnetwork;
temp_net = [
    sequenceInputLayer(2, "Name", "input")
    lstmLayer(256, "Name", "lstm", "OutputMode", "last")
    dropoutLayer(0.5, "Name", "dropout")
    fullyConnectedLayer(2, "Name", "output")
    softmaxLayer];
net = addLayers(net, temp_net);
net = initialize(net);
% clean up helper variable
clear temp_net;
%% Load the data for each day and train the network.
num_of_epochs = 30;
train_data_length = round(length(processed_data) * 0.9);
train_data = processed_data(1:train_data_length);
for epoch = 1:num_of_epochs
    for day = 1:train_data_length
        if train_opts.Verbose
            disp(day + ". day")
        end
        train_X = train_data{day}(:, 1:2);
        train_X = dlarray(train_X, "BCT");
        train_Y = categorical(train_data{day}(:, 3));
        net = trainnet(train_X, train_Y, net, "crossentropy", train_opts);
    end
end
Answers (1)
Karan Singh
2025-2-25
I think the issue is that each call to trainnet (with MaxEpochs=1) starts a fresh training session, so MATLAB's training routine reinitializes the internal training state. The network's learned weights are preserved between calls, because you pass the updated net back in, but the optimizer state (momentum, or Adam's moment estimates) and, by default, the LSTM's hidden and cell states are reset at the start of each new session.
So, to answer your question:
The LSTM’s learned weights are carried over, but the training “state” (including optimizer states and the internal sequence states) is reset each time you resume training.
This is expected behavior when using MATLAB’s built-in training routines in this manner. If you want to maintain the optimizer state, you would need to implement a custom training loop that preserves those states across batches.
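A minimal sketch of such a custom training loop, assuming the Deep Learning Toolbox functions dlfeval, adamupdate, forward, crossentropy, dlgradient, and onehotencode, and reusing the net, train_data, and train_data_length variables from the question. The helper function modelLoss and the choice of each day's final label as that day's sequence target are assumptions for illustration, not part of the original code:

```matlab
% Adam's moment estimates live in these variables; because they are passed
% back into adamupdate on every iteration, the optimizer state survives the
% transition from one day to the next (unlike repeated trainnet calls).
averageGrad = [];
averageSqGrad = [];
iteration = 0;
learnRate = 0.001;
num_of_epochs = 30;

for epoch = 1:num_of_epochs
    for day = 1:train_data_length
        % Shape one day as a single sequence: 2 channels x T time steps.
        X = dlarray(train_data{day}(:, 1:2)', "CTB");
        % With OutputMode = "last" the network emits one prediction per
        % sequence, so use the day's final label as the target (an
        % assumption; adjust to your labeling scheme).
        T = onehotencode(categorical(train_data{day}(end, 3), [1 2]), 1);

        iteration = iteration + 1;
        [loss, gradients] = dlfeval(@modelLoss, net, X, T);
        [net, averageGrad, averageSqGrad] = adamupdate( ...
            net, gradients, averageGrad, averageSqGrad, iteration, learnRate);
    end
end

% In a script, local functions must appear at the end of the file.
function [loss, gradients] = modelLoss(net, X, T)
    Y = forward(net, X);  % forward (not predict) runs dropout in training mode
    loss = crossentropy(Y, T);
    gradients = dlgradient(loss, net.Learnables);
end
```

Because adamupdate only updates what you hand it, the same pattern also lets you carry the LSTM's hidden state across days if you switch to a stateful formulation, but that is a separate design decision.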
Karan