How to organize input dimensions for LSTM classification

Question

Fan 2024-8-15

Commented: Fan 2024-8-21

Hi guys,

I'm trying to train an LSTM on sequential data to predict classes, and I'm a little confused by the format of the input data and labels.

For the sake of simplicity, I'll use an example to mimic my situation.

Let's say I'm trying to use temperature data to predict which of 3 cities it came from: A, B, or C.

Within each city, I have temperature readings from 10 thermometers over 2 seconds at a sampling frequency of 100 Hz.

So far, for each observation, I have a 200 by 10 matrix (time point by thermometer).

temperature_matrix = randi(40, 200, 10); % pseudodata

We collected the temperature data 40 times throughout the day in each city, which gives us 120 observations (3 cities * 40). Each observation is a 200 by 10 matrix.

As for my input format, I now have a 120 by 1 cell array, where each cell contains a 200 by 10 matrix.

temperature_input = cell(120,1);
for ii = 1:numel(temperature_input)
    temperature_input{ii} = randi(40, 200, 10);
end
labels = [repmat("city A", 40,1); repmat("city B", 40,1); repmat("city C", 40,1)];

As I understand it, if I were to have a time step of 5, I should make a sliding window with a size of 5 and move it down the time dimension with a step of 1. That is to say, each 200 by 10 temperature_matrix is sliced into 196 2D arrays, where each array is 5 by 10 (window size by thermometer).

My question is how this sliding window fits into the input format. The sliding window creates a fourth dimension in my example; the other three dimensions are observation, time, and thermometer. I think my overall structure is still a 120 by 1 cell array, but I don't know how to organize the dimensions within each entry.

Also, out of curiosity, will it mess up the structure if I transpose the time point by thermometer matrices? I'm only asking because I've seen examples with the sequence running along either the rows or the columns.
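The slicing described above can be sketched as follows. This is only an illustration on pseudodata; the variable names (`windows`, `numWindows`, and so on) are made up for the example, and stacking the windows along a third dimension is just one possible layout.

```matlab
% One observation: 200 time points x 10 thermometers (pseudodata)
temperatureMatrix = randi(40, 200, 10);

windowSize    = 5;                              % time points per window
numTimePoints = size(temperatureMatrix, 1);     % 200
numSensors    = size(temperatureMatrix, 2);     % 10
numWindows    = numTimePoints - windowSize + 1; % 200 - 5 + 1 = 196

% windows(:, :, k) holds time points k..k+windowSize-1 for all thermometers
windows = zeros(windowSize, numSensors, numWindows);
for k = 1:numWindows
    windows(:, :, k) = temperatureMatrix(k:k+windowSize-1, :);
end

% size(windows) is [5 10 196]: window size x thermometers x windows
```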

Best,

FY

Jaimin 2024-8-16

Hi @Fan,

Is it possible to treat each thermometer reading as an individual sequence and categorize them as A, B, or C? If this approach is feasible, then the size of your training data would change from 120x200x10 (3D) to 1200x200 (2D).

Fan 2024-8-16

Hi @Jaimin,

Thanks for commenting!

It is certainly possible to reshape the matrix that way, and I believe the label should also be organized into a 1200 by 1 vector accordingly.
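As a rough sketch of that reshaping, assuming the 120 by 1 cell array and string labels from the question (the names `sequences` and `sequenceLabels` are invented for this example):

```matlab
% Pseudodata as in the question: 120 observations, each 200 x 10
numObs = 120;
temperatureInput = cell(numObs, 1);
for ii = 1:numObs
    temperatureInput{ii} = randi(40, 200, 10);
end
labels = [repmat("city A", 40, 1); repmat("city B", 40, 1); repmat("city C", 40, 1)];

% Concatenate the observations side by side (200 x 1200), then transpose
% to 1200 x 200. Rows 1-10 come from observation 1, rows 11-20 from
% observation 2, and so on.
sequences = [temperatureInput{:}]';

% Repeat each label once per thermometer so labels stay aligned with rows
numSensors = size(temperatureInput{1}, 2);
sequenceLabels = repelem(labels, numSensors, 1);   % 1200 x 1
```

Each row of `sequences` is then one thermometer's 200-sample trace, and `sequenceLabels(r)` is the city for row `r`.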

However, this new 1200 by 200 2D array still doesn't show the time step for the LSTM. For example, a time step of 5 means the input array should have a dimension of size 5. So the question remains.

Answer 1

Jaimin 2024-8-16

Edited: Jaimin 2024-8-16

Hi @Fan,

As described in the question, the difficulty is in training an LSTM-based deep learning model with 4-D data. However, as discussed in the comments, the 3-D data can also be represented in a 2-D format.

The data is now in a 2-D format of 1200x200: each of the 1200 cells contains a sequence of 200 elements. Applying a sliding window of size 5 turns each sequence into a 196x5 matrix (196 windows of 5 values each), which gives a training data size of 1200x196x5. In the code below, each 196x5 matrix is transposed to 5x196, so the 5 values in a window act as the features and the 196 windows act as the time steps.

Using this training data, a sample LSTM-based deep learning model has been created according to the provided requirements.

% Sample size
numSamples = 1200;
sequenceLength = 200;
windowSize = 5;
% Dummy data X (1200 cells, each with a sequence of 200 elements)
X = cell(numSamples, 1);
for i = 1:numSamples
    X{i} = randn(sequenceLength, 1);  % Replace with your real data
end
% Dummy labels Y (1200x1 vector, each element is either 'A', 'B', or 'C')
classes = {'A', 'B', 'C'};
Y = categorical(randi([1, 3], numSamples, 1), [1, 2, 3], classes);
% Create the sliding windows for each sequence
X_sliding = cell(numSamples, 1);
for i = 1:numSamples
    seq = X{i};
    numWindows = sequenceLength - windowSize + 1;
    windows = zeros(numWindows, windowSize);
    
    for j = 1:numWindows
        windows(j, :) = seq(j:j+windowSize-1);
    end
    
    X_sliding{i} = windows';  % Transpose to [windowSize x numWindows] for LSTM
end
% Define the LSTM model architecture
inputSize = windowSize;   % Number of features per time step (the 5 values in each window)
numHiddenUnits = 100;     % Number of hidden units in LSTM
numClasses = 3;           % A, B, C
layers = [ ...
    sequenceInputLayer(inputSize)
    lstmLayer(numHiddenUnits, 'OutputMode', 'last')
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
% Specify training options
options = trainingOptions('adam', ...
    'MaxEpochs', 50, ...
    'MiniBatchSize', 16, ...
    'Shuffle', 'every-epoch', ...
    'Verbose', false, ...
    'Plots', 'training-progress');
% Train the LSTM model
net = trainNetwork(X_sliding, Y, layers, options);
% Predict on the training data (illustration only; use held-out data to assess generalization)
YPred = classify(net, X_sliding);
% Compute accuracy
accuracy = sum(YPred == Y) / numSamples;
disp("Accuracy: " + accuracy);
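As an alternative that is not part of the answer above, the 10 thermometers could be kept together as the features of a single sequence instead of being split into 1200 separate sequences. For `trainNetwork`, each cell of the input cell array is expected to be a numFeatures-by-numTimeSteps matrix, so under that assumption each 200 by 10 observation from the question would be transposed to 10 by 200. This also bears on the transpose question: the orientation matters, because rows are read as features and columns as time steps. A minimal data-layout sketch on pseudodata:

```matlab
numObs = 120;
X = cell(numObs, 1);
for ii = 1:numObs
    obs = randi(40, 200, 10);   % pseudodata: time points x thermometers
    X{ii} = obs';               % 10 x 200: features x time steps
end

% One categorical label per observation (40 observations per city)
Y = categorical([repmat("city A", 40, 1); ...
                 repmat("city B", 40, 1); ...
                 repmat("city C", 40, 1)]);

numFeatures = size(X{1}, 1);          % 10, the input size for the network
numClasses  = numel(categories(Y));   % 3
```

With this layout, the layer array above should only need `sequenceInputLayer(numFeatures)` in place of `sequenceInputLayer(inputSize)`; no explicit sliding window is built, since the LSTM layer steps through the 200 time points itself. Other training functions may expect a different orientation, so it is worth checking the documentation for the release in use.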

For more information on LSTM networks, please refer to the following documentation:

https://www.mathworks.com/help/deeplearning/ug/long-short-term-memory-networks.html

For more information on "Sequence Classification Using Deep Learning", please refer to the following documentation:

https://www.mathworks.com/help/deeplearning/ug/classify-sequence-data-using-lstm-networks.html

I hope this is helpful.

Fan 2024-8-21

@Jaimin Thanks!

I'll try that out!
