Low LSTM Accuracy in Speech Recognition

Hello everyone, I am applying LSTM to speech emotion recognition. I have performed feature extraction using MFCC, resulting in a matrix of dimensions 60,575 × 39. I subsequently transformed this matrix into a cell array named "AllCellTrain" with dimensions 280 × 1, containing signals of varying sizes, as illustrated in the image below. I then utilized "AllCellTrain" as input for the trainNetwork function, along with the labels YCA, network layers, and training options. However, I encountered a significant issue with accuracy, achieving only around 20%. I'm unsure where I may have made a mistake. Could someone please offer some assistance?
num_hidden_units = 1024;

% Define the network: sequence input -> LSTM (last output) -> classifier
layers = [
    sequenceInputLayer(num_features)                     % 39 MFCC features per time step
    lstmLayer(num_hidden_units, 'OutputMode', 'last')    % keep only the final hidden state
    fullyConnectedLayer(num_classes)
    softmaxLayer
    classificationLayer];

% Specify the training options
max_epochs = 36;
mini_batch_size = 28;
initial_learning_rate = 0.001;
options = trainingOptions('adam', ...
    'MaxEpochs', max_epochs, ...
    'MiniBatchSize', mini_batch_size, ...
    'InitialLearnRate', initial_learning_rate, ...
    'SequenceLength', 'shortest', ...
    'Shuffle', 'every-epoch', ...
    'ExecutionEnvironment', 'gpu', ...
    'Verbose', false, ...
    'Plots', 'training-progress');

% Train on the training cell array and evaluate on the held-out test set
net = trainNetwork(AllCellTrain, YCA, layers, options);
predicted_labels = classify(net, AllCellTest, 'ExecutionEnvironment', 'gpu');
acc = mean(predicted_labels == YCT)
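
For reference, below is a minimal sketch of the kind of matrix-to-cell conversion described above. It is not the original script: the frameCounts values and the random data are placeholders. The important detail is that trainNetwork expects each cell to hold a numFeatures-by-numTimeSteps matrix, so each per-utterance block of the 60,575 × 39 matrix is transposed to 39 × T.

% Sketch only: split the stacked MFCC matrix (frames x coefficients) into a
% 280-by-1 cell array of per-utterance sequences. frameCounts is a placeholder
% for the real frames-per-utterance vector (it must sum to 60,575).
allMFCC = rand(60575, 39);                    % placeholder for the extracted MFCC matrix
frameCounts = [repmat(216, 279, 1); 311];     % placeholder lengths, sum = 60,575
AllCellTrain = cell(numel(frameCounts), 1);
idx = 1;
for k = 1:numel(frameCounts)
    block = allMFCC(idx:idx + frameCounts(k) - 1, :);   % frames of utterance k
    AllCellTrain{k} = block.';                          % 39-by-T: features along rows
    idx = idx + frameCounts(k);
end
num_features = size(AllCellTrain{1}, 1);      % 39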
  4 Comments
Hamza on 2023-11-6
Edited: Hamza on 2023-11-6
Hi @Christopher McCausland, thanks for your answer. I am trying to classify 7 emotion classes. For your information, I used the same data with a 1D CNN and got 90% accuracy, so I don't know what the issue is with the LSTM. Also, when I shuffled the columns (the features) I got a different result, which shouldn't be the case. You can find the attached curve. Thanks in advance!
Christopher McCausland
Hi @Hamza,
To me this looks like classic overfitting: your model appears to train well and learn features, but these features are overfitted to the training data and are not representative of generalised data.
A few things to consider:
  1. Do you have multiple speakers? If so, how do you pick which speakers go into the test/train sets?
  2. You have 280 input sequences and seven classes; if the data is perfectly balanced, that is 40 observations per class. Is this enough?
  3. Can you include a validation split to prevent overfitting? (See the sketch after this comment.)
  4. These are just a few ways to prevent overfitting and ensure your data is appropriate for training; there are many others which I would suggest you take a look at.
In terms of the CNN performance, were the test/train sets the same, and how many epochs did you train the CNN for?
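
For illustration, here is a hedged sketch of how a validation split (point 3) might be added. AllCellVal, YVal and the patience value are assumptions rather than anything from the original code, and cvpartition is just one way to make a stratified split.

% Sketch: hold out a stratified 20% of the training data for validation.
% AllCellVal/YVal and the patience value of 5 are assumed, not from the question.
summary(YCA)                                   % observations per class (assumes YCA is categorical)
cv = cvpartition(YCA, 'HoldOut', 0.2);         % stratified 80/20 split
AllCellVal = AllCellTrain(test(cv));
YVal       = YCA(test(cv));
AllCellTr  = AllCellTrain(training(cv));
YTr        = YCA(training(cv));

options = trainingOptions('adam', ...
    'MaxEpochs', max_epochs, ...
    'MiniBatchSize', mini_batch_size, ...
    'InitialLearnRate', initial_learning_rate, ...
    'SequenceLength', 'shortest', ...
    'Shuffle', 'every-epoch', ...
    'ValidationData', {AllCellVal, YVal}, ...  % monitor generalisation during training
    'ValidationPatience', 5, ...               % stop when validation loss stops improving
    'Plots', 'training-progress');

net = trainNetwork(AllCellTr, YTr, layers, options);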

Answers (0)
