Low LSTM Accuracy in Speech Recognition
2 次查看(过去 30 天)
显示 更早的评论
Hello everyone, I am applying LSTM to speech emotion recognition. I have performed feature extraction using MFCC, resulting in a matrix of dimensions 60,575 × 39. I subsequently transformed this matrix into a cell array named "AllCellTrain" with dimensions 280 × 1, containing signals of varying sizes, as illustrated in the image below. I then utilized "AllCellTrain" as input for the trainNetwork function, along with the labels YCA, network layers, and training options. However, I encountered a significant issue with accuracy, achieving only around 20%. I'm unsure where I may have made a mistake. Could someone please offer some assistance?
num_hidden_units = 1024;
layers = [
sequenceInputLayer(num_features)
lstmLayer(num_hidden_units, 'OutputMode', 'last')
fullyConnectedLayer(num_classes)
softmaxLayer
classificationLayer];
% Specify the training options
max_epochs = 36;
mini_batch_size = 28;
initial_learning_rate = 0.001;
options = trainingOptions('adam', ...
'MaxEpochs', max_epochs, ...
'MiniBatchSize', mini_batch_size, ...
'InitialLearnRate', initial_learning_rate, ...
'SequenceLength','shortest', ...
'Shuffle','every-epoch',...
'ExecutionEnvironment','gpu', ...
'Verbose', false, ...
'Plots','training-progress');
net = trainNetwork(AllCellTrain, YCA, layers, options);
predicted_labels = classify(net, AllCellTest,'ExecutionEnvironment','gpu');
acc = mean(predicted_labels == YCT)
4 个评论
Christopher McCausland
2023-11-6
To me this looks like classic overfitting, your model appears to train well and learn features, however these features are overfitted to the training data, and are not representative of genralised data.
A few things to consider;
- Do you have multiple speakers? If so, how do you pick which speakers are in the test/train set.
- You have 280 input sequences, and seven classes, if the data is perfectly ballanced you have 40 observations per class, is this enough?
- Can you include a validation split to prevent overfitting?
- These are just a few ways to prevent overfitting/ ensure your data is appropreate for training, there are many other which I would suggest you take a look at.
In terms of the CNN preformance, were the test/train set the same and how many epochs did you train the CNN for?
回答(0 个)
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Speech Recognition 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!