CNN1d Speech Emotion Recognition

Question

young 2024-4-8

Direct link to this question

https://ww2.mathworks.cn/matlabcentral/answers/2104366-cnn1d-speech-emotion-recognition

Edited: Ayush Anand 2024-5-23

audio_features.mat

Hello everyone,

I am working on a speech emotion recognition (SER) project with a 1-D CNN. I extract features such as MFCC, Mel, pitch, and intensity. This gives me a training set stored as a 936x1 cell array (a 200x1 double in every cell) and a 1x936 categorical label array. However, the results are poor, and I am not sure what I need to change to improve them.

The .mat file is attached.

My 1-D CNN code is below.

    % Load the pre-extracted features and labels.
    load('audio_features.mat', 'X_train', 'X_test', 'y_train', 'y_test');

    % Put each observation (one row of X) into its own cell.
    x_traincnn = num2cell(X_train, 2);
    y_traincnn = categorical(y_train.');

    x_testcnn = num2cell(X_test, 2);
    y_testcnn = categorical(y_test.');

    % Transpose each observation into a numFeatures-by-1 column.
    x_traincnn = cellfun(@(x) x', x_traincnn, 'UniformOutput', false);
    x_testcnn = cellfun(@(x) x', x_testcnn, 'UniformOutput', false);

    disp(size(x_traincnn));
    disp(size(x_testcnn));
    disp(size(y_traincnn));
    disp(size(y_testcnn));

    numFeatures = 200;
    % Count the classes on the categorical labels (categories needs a categorical input).
    numClasses = numel(categories(y_traincnn));

    filterSize = 5;
    numFilters = 32;
    rng('default');
    layers = [ ...
        sequenceInputLayer(numFeatures)
        convolution1dLayer(filterSize,numFilters,Padding="causal")
        reluLayer
        layerNormalizationLayer
        convolution1dLayer(filterSize,2*numFilters,Padding="causal")
        reluLayer
        layerNormalizationLayer
        globalAveragePooling1dLayer
        fullyConnectedLayer(numClasses)
        softmaxLayer
        classificationLayer];

    miniBatchSize = 27;
    % Pass miniBatchSize explicitly; otherwise the variable above is never used.
    options = trainingOptions("adam", ...
        MiniBatchSize=miniBatchSize, ...
        MaxEpochs=200, ...
        InitialLearnRate=0.01, ...
        SequencePaddingDirection="left", ...
        ValidationData={x_testcnn,y_testcnn}, ...
        Plots="training-progress", ...
        Verbose=0);

    net = trainNetwork(x_traincnn, y_traincnn, layers, options);

    % Evaluate on the test set.
    YPred = classify(net,x_testcnn, ...
        SequencePaddingDirection="left");
    acc = mean(YPred == y_testcnn);
    fprintf("Accuracy: %.4f\n", acc);
    confMat = confusionmat(y_testcnn, YPred);
    disp(confMat);
    figure;
    confusionchart(y_testcnn,YPred);

Could anyone please help me? Thanks!

Answer 1

Ayush Anand 2024-5-23

Direct link to this answer

https://ww2.mathworks.cn/matlabcentral/answers/2104366-cnn1d-speech-emotion-recognition#answer_1462241

Edited: Ayush Anand 2024-5-23

training.png

Hi Young,

I think the problem lies in the data itself rather than in the training parameters. I analysed the attached data file and it is extremely sparse, meaning that the values range from the order of […] to […] (200). This sparsity inherently makes it hard for the network to learn useful features. Even after scaling or normalizing the data, the […] values are so small compared with the values at positive orders of magnitude that they are scaled to the same value because of precision limits.
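One way to check this kind of spread yourself is to look at the order of magnitude of each feature column. The sketch below is only an illustration: the matrix `X` is a small made-up stand-in for `X_train` (observations in rows, features in columns), and the variable names are hypothetical.

```matlab
% Made-up stand-in for X_train so the snippet runs on its own.
% In practice, replace it with the X_train loaded from audio_features.mat.
X = [2e-8  0.5  200;
     3e-9  1.2  150;
     0     0.8  180];

absX = abs(X);
absX(absX == 0) = NaN;     % exact zeros have no order of magnitude

% Smallest and largest order of magnitude per feature (column).
minOrder = floor(log10(min(absX, [], 1, 'omitnan')));
maxOrder = floor(log10(max(absX, [], 1, 'omitnan')));

% Orders of magnitude spanned within each feature.
spreadOrders = maxOrder - minOrder;

disp(table((1:size(X, 2)).', minOrder.', maxOrder.', spreadOrders.', ...
    'VariableNames', {'Feature', 'MinOrder', 'MaxOrder', 'Spread'}));
```

Features whose `MinOrder` sits far below the rest are the ones that end up indistinguishable after a single global scaling.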

One improvement is to drop the first feature and every feature after the 13th, since they do not carry useful information for the network to learn. Keeping only features 2:13 (12 features) and scaling them improves the results slightly; make sure to add a dropout layer (with a fraction of, say, 0.25) to limit overfitting. The accuracy then reaches roughly 50%, but learning still saturates there (see the attached image).
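A minimal sketch of that change follows, reusing the variable names from the question. The thread does not say which scaling was used or where the dropout layer was placed, so the z-score scaling (with statistics taken from the training set only) and the dropout position before the fully connected layer are assumptions.

```matlab
% Sketch only: the scaling method and the dropout position are assumptions.
load('audio_features.mat', 'X_train', 'X_test', 'y_train', 'y_test');

keep = 2:13;                            % the 12 features suggested above

% Per-feature statistics computed on the training set only.
mu = mean(X_train(:, keep), 1);
sigma = std(X_train(:, keep), 0, 1);
sigma(sigma == 0) = 1;                  % avoid dividing by zero for constant features

XTrainScaled = (X_train(:, keep) - mu) ./ sigma;
XTestScaled  = (X_test(:, keep) - mu) ./ sigma;   % reuse the training statistics

% Same cell layout as in the question: one numFeatures-by-1 column per observation.
x_traincnn = cellfun(@(x) x.', num2cell(XTrainScaled, 2), 'UniformOutput', false);
x_testcnn  = cellfun(@(x) x.', num2cell(XTestScaled, 2), 'UniformOutput', false);
y_traincnn = categorical(y_train.');
y_testcnn  = categorical(y_test.');

numClasses = numel(categories(y_traincnn));

layers = [ ...
    sequenceInputLayer(numel(keep))
    convolution1dLayer(5, 32, Padding="causal")
    reluLayer
    layerNormalizationLayer
    convolution1dLayer(5, 64, Padding="causal")
    reluLayer
    layerNormalizationLayer
    globalAveragePooling1dLayer
    dropoutLayer(0.25)                  % dropout fraction mentioned above
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
```

Training and evaluation can then proceed with the same `trainingOptions`, `trainNetwork`, and `classify` calls as in the question.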

At this point, the limit is not the network but the information contained in the dataset: it is not sufficient for the network to learn a generalized model that gives good validation accuracy. I would suggest recalculating the feature vectors and experimenting with more feature engineering to extract useful features for training.
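If the features are recomputed, one possible direction is to keep the frame (time) axis, so that each observation is a numFeatures-by-numFrames sequence rather than a single 200x1 column, which the sequence input treats as one time step. The sketch below assumes Audio Toolbox is available and uses `audioFeatureExtractor`; the sample rate, the synthetic test signal, and the choice of MFCC plus pitch are placeholders for illustration, not the settings used in this thread.

```matlab
% Sketch only: requires Audio Toolbox. The signal is a synthetic tone with a
% little noise, standing in for one recorded utterance.
fs = 16000;                                   % assumed sample rate
t = (0:fs-1).' / fs;                          % one second of samples
audioIn = 0.5 * sin(2*pi*220*t) + 0.01 * randn(fs, 1);

% Frame-wise MFCC and pitch with the extractor's default window settings.
afe = audioFeatureExtractor(SampleRate=fs, mfcc=true, pitch=true);
feats = extract(afe, audioIn);                % numFrames-by-numFeatures

% sequenceInputLayer expects numFeatures-by-numTimeSteps per observation.
seq = feats.';
numFeatures = size(seq, 1);                   % value to pass to sequenceInputLayer
```

Collecting one such `seq` per recording into a cell array gives training data in the same form that `trainNetwork` already receives in the question, with the convolution now running over frames.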
