Apply LSTM network to .ogg files

Question

Pooyan Mobtahej 2020-10-26

Direct link to this question

https://ww2.mathworks.cn/matlabcentral/answers/626153-apply-lstm-network-to-ogg-files

Edited: Pooyan Mobtahej 2020-10-27

I need to train an LSTM network on a large dataset of .ogg audio files in MATLAB and get classification results. The data has two classes (normal and anomaly) and can be separated into three parts: for example, 80% of all normal and anomaly signals for training, 10% for validation, and 10% for testing.

I have used the code below, but it needs modification; please suggest the proper changes. In particular:

How do I define the normal and anomaly arrays with different sizes?

How do I define the test set?

%ADS = audioDatastore('/Users/pooyan/OneDrive - lamar.edu','FileExtensions','.ogg')
folder='/Users/pooyan/Documents/computer Vision';
audio_files=dir(fullfile(folder,'*.ogg'));
j=length(audio_files);
normal = zeros(132300,1); %return matrix size(normal_name)
anomaly = zeros(132300,1);
Fs=44100; %sample rate according to .ogg file
for i = 1:length(audio_files) 
    normal_name = strcat('normal_',num2str(i),'.ogg'); 
    anomaly_name = strcat('anomaly_',num2str(i),'.ogg');
    
    %[y,Fs] = audioread(filename)
   [normal(i)] = audioread(normal_name); 
   [anomaly(i)] = audioread(anomaly_name); %can add Fs sample rate?
   %normal(i) = zeros(size(normal_name),1); %return matrix size(normal_name)
   %anomaly(i) = zeros(size(anomaly_name),1);
end 
audioTrain = [normal(:,0.8*(1:length(audio_files))),anomaly(:,0.8*(1:length(audio_files)))]; %percentage
audioValidation = [normal(:,0.1*(1:length(audio_files))),anomaly(:,0.1*(1:length(audio_files)))];
%  Create an audioFeatureExtractor object 
%to extract the centroid and slope of the mel spectrum over time.
aFE = audioFeatureExtractor("SampleRate",Fs, ...    %Fs
    "SpectralDescriptorInput","melSpectrum", ...
    "spectralCentroid",true, ...
    "spectralSlope",true);
featuresTrain = extract(aFE,audioTrain);
[numHopsPerSequence,numFeatures,numSignals] = size(featuresTrain);
numHopsPerSequence;
numFeatures;
numSignals;
%treat the extracted features as sequences and use a
%sequenceInputLayer as the first layer of your deep learning model. 
featuresTrain = permute(featuresTrain,[2,1,3]);
featuresTrain = squeeze(num2cell(featuresTrain,[1,2]));
numSignals = numel(featuresTrain);
[numFeatures,numHopsPerSequence] = size(featuresTrain{1});
%Extract the validation features.
featuresValidation = extract(aFE,audioValidation);
featuresValidation = permute(featuresValidation,[2,1,3]);
featuresValidation = squeeze(num2cell(featuresValidation,[1,2]));
%Define the network architecture.
layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(50,"OutputMode","last")
    fullyConnectedLayer(numel(unique(audioTrain))) %%labelTrain=audio
    softmaxLayer
    classificationLayer];
%To define the training options
options = trainingOptions("adam", ...
    "Shuffle","every-epoch", ...
    "ValidationData",{featuresValidation,audioValidation}, ... %%labelValidation=audioValidation
    "Plots","training-progress", ...
    "Verbose",false);
%To train the network
net = trainNetwork(featuresTrain,audioTrain,layers,options);
%Test the network (10 percent)
normalTest = normal(:,0.1*(1:length(audio_files)));
classify(net,extract(aFE,normalTest)')
anomalyTest = anomaly(:,0.1*(1:length(audio_files)));
classify(net,extract(aFE,anomalyTest)')
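
For the two questions above, here is a minimal sketch of one way the loading, labeling, and 80/10/10 split could be organized. It is a sketch under assumptions, not a tested solution for this dataset. The folder path and the normal_*/anomaly_* file naming are taken from the code in the question. Every other name (audio, labels, splitIndices, toFeatures, and so on) is hypothetical and introduced only for illustration. It also assumes every file is sampled at 44100 Hz, that the `folder` field returned by `dir` is available (R2016b or later), and that Audio Toolbox and Deep Learning Toolbox are installed, as the code in the question already requires.

```matlab
% Sketch only: path and file naming come from the question; all variable and
% function names below are illustrative assumptions.
folder = '/Users/pooyan/Documents/computer Vision';
Fs = 44100;  % assumed to be the sample rate of every file

% List each class separately so the label follows from the file name.
normalFiles  = dir(fullfile(folder, 'normal_*.ogg'));
anomalyFiles = dir(fullfile(folder, 'anomaly_*.ogg'));
files  = [normalFiles; anomalyFiles];
labels = categorical([repmat("normal",  numel(normalFiles),  1); ...
                      repmat("anomaly", numel(anomalyFiles), 1)]);

% A cell array holds one column vector per file, so signals of different
% lengths can be stored together (a numeric matrix would need equal lengths).
audio = cell(numel(files), 1);
for k = 1:numel(files)
    y = audioread(fullfile(files(k).folder, files(k).name));
    audio{k} = mean(y, 2);  % mix down to mono in case a file is stereo
end

% Shuffled 80/10/10 split; the test set is whatever remains after the
% training and validation indices are taken.
[trainIdx, valIdx, testIdx] = splitIndices(numel(files), 0.8, 0.1);

% Same feature extractor configuration as in the question.
aFE = audioFeatureExtractor("SampleRate", Fs, ...
    "SpectralDescriptorInput", "melSpectrum", ...
    "spectralCentroid", true, ...
    "spectralSlope", true);

% extract returns hops-by-features for one signal; transposing gives the
% features-by-time layout that sequenceInputLayer expects.
toFeatures = @(x) extract(aFE, x)';
featuresTrain      = cellfun(toFeatures, audio(trainIdx), 'UniformOutput', false);
featuresValidation = cellfun(toFeatures, audio(valIdx),   'UniformOutput', false);
featuresTest       = cellfun(toFeatures, audio(testIdx),  'UniformOutput', false);

numFeatures = size(featuresTrain{1}, 1);
numClasses  = numel(categories(labels));

layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(50, "OutputMode", "last")
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

options = trainingOptions("adam", ...
    "Shuffle", "every-epoch", ...
    "ValidationData", {featuresValidation, labels(valIdx)}, ...
    "Plots", "training-progress", ...
    "Verbose", false);

% The labels, not the audio, are the training targets.
net = trainNetwork(featuresTrain, labels(trainIdx), layers, options);

% Evaluate on the held-out 10 percent.
predicted = classify(net, featuresTest);
testAccuracy = mean(predicted == labels(testIdx));

function [trainIdx, valIdx, testIdx] = splitIndices(n, trainFrac, valFrac)
    % Randomly partition 1..n into train, validation, and test index sets.
    order  = randperm(n);
    nTrain = round(trainFrac * n);
    nVal   = round(valFrac * n);
    trainIdx = order(1:nTrain);
    valIdx   = order(nTrain+1 : nTrain+nVal);
    testIdx  = order(nTrain+nVal+1 : end);
end
```

Note that this split shuffles all files together, so the class proportions in each part are only approximately equal. If each class must be split 80/10/10 on its own, the same hypothetical splitIndices helper could be called once for the normal files and once for the anomaly files, and the resulting index sets concatenated.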