I want to use an LSTM-based audio network with live audio

Question

Arslan Munim 2022-7-27


https://ww2.mathworks.cn/matlabcentral/answers/1768630-i-want-to-use-lstm-based-audio-network-to-work-with-live-audio

Commented: Arslan Munim 2022-9-28

Hello Matlab team,

I am using this example with my own audio dataset: https://www.mathworks.com/matlabcentral/fileexchange/74611-fault-detection-using-deep-learning-classification#examples_tab. The network is trained, but I want to run the application live on a PC. For example, I have a microphone and want to build an application that uses my trained model to predict the output.

Can you guide me or help me with that?

Regards,

Arslan Munaim


Answer 1

jibrahim 2022-7-27


https://ww2.mathworks.cn/matlabcentral/answers/1768630-i-want-to-use-lstm-based-audio-network-to-work-with-live-audio#answer_1016040


Hi Arslan,

There is a function in that repo (streamingClassifier) that should get the job done in conjunction with an audio device reader:

% Create a microphone reader (the sample rate should match the training data)
adr = audioDeviceReader(SampleRate=16e3,SamplesPerFrame=512);
% Placeholder normalization statistics; use the mean (M) and standard
% deviation (S) computed from your training features instead
M = 0;
S = 1;
while true
    % Read a frame of data from the microphone
    frame = adr();
    % Pass the frame to the network
    scores = streamingClassifier(frame,M,S);
    % Use the scores any way you want
end
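
As a rough sketch of where M and S could come from, assuming the training features are stored as a cell array with one numFeatures-by-numFrames matrix per recording. The variable trainingFeatures and the random data are stand-ins for illustration and are not taken from the repository:

```matlab
% Hypothetical stand-in for real training features:
% one 11-by-numFrames matrix per training recording.
rng(0);
trainingFeatures = {randn(11,40)*2 + 5, randn(11,25)*2 + 5};

% Pool all frames, then compute one mean and one standard deviation
% per feature (per row).
allFeatures = cat(2,trainingFeatures{:});   % 11-by-65
M = mean(allFeatures,2);                    % 11-by-1
S = std(allFeatures,0,2);                   % 11-by-1

% At prediction time, normalize a new 1-by-11 feature row the same way.
newFeatures = randn(1,11)*2 + 5;
normalized = (newFeatures - M.')./S.';      % 1-by-11
```

Whether M and S need to be transposed depends on the orientation of the feature vector inside your own streamingClassifier, so check that against your training code.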


Arslan Munim 2022-7-28

Edited: Arslan Munim 2022-7-28

Hi jibrahim,

Thanks for your reply. I tried using streamingClassifier. However, I am trying to use the extract function instead of the extractFeatures function (because of dependency issues), and with the extract function I can only use one feature at a time, whereas I trained the network with 11 features.

Can you please show me how I can use the extract function in streamingClassifier? I am attaching my code for your reference:

windowLength = 512;
overlapLength = 0;
aFE = audioFeatureExtractor('SampleRate',44100, ...
    'Window',hamming(windowLength,'periodic'),...
    'OverlapLength',overlapLength,...
    'spectralCentroid',true, ...
    'spectralCrest',true,...
    'spectralDecrease',true, ...
    'spectralEntropy',true,...
    'spectralFlatness',true,...
    'spectralFlux',true,...
    'spectralKurtosis',true,...
    'spectralRolloffPoint',true,...
    'spectralSkewness',true,...
    'spectralSlope',true,...
    'spectralSpread',true);
features = extract(aFE , audioIn)
%%%%%%%%%features = extractFeatures(audioIn);
% Normalize
features = ((features - M')./S');
[net, scores] = predictAndUpdateState(net,features);

jibrahim 2022-7-28


Hi Arslan,

The extract function should also return 11 features. For example, if you replace the existing function extractFeatures with this modified function, things should work the same:

function featureVector = extractFeatures2(x)
%#codegen
persistent afe
if isempty(afe)
    windowLength = 512;
    overlapLength = 0;
    afe = audioFeatureExtractor('SampleRate',44100, ...
        'Window',hamming(windowLength,'periodic'),...
        'OverlapLength',overlapLength,...
        'spectralCentroid',true, ...
        'spectralCrest',true,...
        'spectralDecrease',true, ...
        'spectralEntropy',true,...
        'spectralFlatness',true,...
        'spectralFlux',true,...
        'spectralKurtosis',true,...
        'spectralRolloffPoint',true,...
        'spectralSkewness',true,...
        'spectralSlope',true,...
        'spectralSpread',true);
end
featureVector = extract(afe,x);
end

For a single 512-sample frame, the size of featureVector will be 1-by-11, with each element in the vector representing one of your features.

Notice I declared afe as persistent. This ensures the audio feature extractor is not recreated every time you call this function in your loop. The extractor goes through some one-time setup computations when you first call it, so there is no need to waste time repeating those.

jibrahim 2022-8-2


Hi Arslan,

Since you trained the network with a sample rate of 16e3, you will have to perform sample-rate conversion from 44.1 kHz to 16 kHz. This code is a possible implementation, where you essentially feed the network frames of length 512 sampled at 16 kHz, just like the original code:

% Create a microphone object
%adr = audioDeviceReader(SampleRate=16e3,SamplesPerFrame=512);
src = dsp.SampleRateConverter(InputSampleRate=44100,OutputSampleRate=16e3,...
                              Bandwidth=15800);
[~,D] = src.getRateChangeFactors;
% The frame size must be a multiple of 441 (the decimation factor of the
% sample rate converter)
L = floor(22000/D);
frameLength = L*D; % get as close to desired frame size
adr = audioDeviceReader(SampleRate=44100,SamplesPerFrame=frameLength);
buff = dsp.AsyncBuffer;
% These statistic values should come from your training...
M = 0;
S = 1;
while 1
    % Read a frame of data from microphone
    frame = adr();
    % Convert to 16 kHz
    frame = src(frame); 
    % Save to buffer
    write(buff,frame)
    while buff.NumUnreadSamples >= 512
        frame = read(buff,512);
        % Pass to network
        scores = streamingClassifier(frame,M,S);
        % Use the scores any way you want
    end
end
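
The numbers behind the frame-size comment can be checked by hand. This sketch assumes the converter realizes the exact 16000/44100 ratio (no output rate tolerance) and uses plain arithmetic instead of getRateChangeFactors; the variable names are only illustrative:

```matlab
fsIn = 44100;
fsOut = 16000;

% Reduce the rate ratio fsOut/fsIn to lowest terms.
g = gcd(fsIn,fsOut);               % 100
interpFactor = fsOut/g;            % 160
decimFactor = fsIn/g;              % 441

% Largest input frame of at most 22000 samples that is a multiple of 441.
frameLength = floor(22000/decimFactor)*decimFactor;   % 21609

% Samples produced at 16 kHz for each input frame.
outPerFrame = frameLength*interpFactor/decimFactor;   % 7840

% Complete 512-sample frames available after a single read.
numFullFrames = floor(outPerFrame/512);               % 15
```

Since 7840 is not a multiple of 512, some converted samples are always left over, which is why the loop above reads from a dsp.AsyncBuffer instead of slicing the converted frame directly.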

Note that you can also potentially feed the network longer frames. That should also work, and is probably more efficient, as the network will run faster if you give it a long input (as opposed to multiple short ones):

% Create a microphone object
%adr = audioDeviceReader(SampleRate=16e3,SamplesPerFrame=512);
src = dsp.SampleRateConverter(InputSampleRate=44100,OutputSampleRate=16e3,Bandwidth=15800);
[~,D] = src.getRateChangeFactors;
% The frame size must be a multiple of 441 (the decimation factor of the
% sample rate converter)
L = floor(22000/D);
frameLength = L*D;
adr = audioDeviceReader(SampleRate=44100,SamplesPerFrame=frameLength);
buff = dsp.AsyncBuffer;
% These statistic values should come from your training...
M = 0;
S = 1;
while 1
    % Read a frame of data from microphone
    frame = adr();
    % Convert to 16 kHz
    frame = src(frame); 
    % Save to buffer
    write(buff,frame)
    N = buff.NumUnreadSamples;
    L = floor(N/512);
    if L>0
        frame = read(buff,512*L);
        % Pass to network
        scores = streamingClassifier(frame,M,S);
        % Use the scores any way you want
    end
end
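
To see how many 512-sample hops each call receives in this variant, here is a small simulation of the buffer bookkeeping only. It assumes 7840 converted samples arrive per iteration (49*160, for a 21609-sample input frame and an exact 160/441 conversion) and does not touch any audio device:

```matlab
samplesPerWrite = 7840;   % assumed converted samples per microphone read
hopLength = 512;          % frame length the network was trained on

unread = 0;               % samples currently waiting in the buffer
hopsPerCall = zeros(1,4); % hops passed to the classifier on each iteration
for k = 1:4
    unread = unread + samplesPerWrite;      % write(buff,frame)
    numHops = floor(unread/hopLength);      % L = floor(N/512)
    hopsPerCall(k) = numHops;
    unread = unread - numHops*hopLength;    % read(buff,512*L)
end
disp(hopsPerCall)
```

Under these assumptions most calls receive 15 hops and an occasional call receives 16, because the leftover samples carry over to the next iteration.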

If you can't change the frame size on the microphone, then you can handle that using another buffer, for example:

% Create a microphone object
%adr = audioDeviceReader(SampleRate=16e3,SamplesPerFrame=512);
src = dsp.SampleRateConverter(InputSampleRate=44100,OutputSampleRate=16e3,Bandwidth=15800);
[~,D] = src.getRateChangeFactors;
% The frame size must be a multiple of 441 (the decimation factor of the
% sample rate converter)
L = floor(22000/D);
frameLength = L*D;
adr = audioDeviceReader(SampleRate=44100,SamplesPerFrame=22000);
buffSRC = dsp.AsyncBuffer;
buff = dsp.AsyncBuffer;
% These statistic values should come from your training...
M = 0;
S = 1;
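while 1
    % Read a frame of data from microphone
    frame = adr();
    write(buffSRC,frame);
    % 22000 is not a multiple of frameLength, so drain the input buffer in
    % a loop; otherwise the leftover samples pile up on every iteration
    while buffSRC.NumUnreadSamples >= frameLength
        frame = read(buffSRC,frameLength);
        % Convert to 16 kHz
        frame = src(frame);
        % Save to buffer
        write(buff,frame)
    end
    N = buff.NumUnreadSamples;
    L = floor(N/512);
    if L>0
        frame = read(buff,512*L);
        % Pass to network
        scores = streamingClassifier(frame,M,S);
        % Use the scores any way you want
    end
end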
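
One thing to watch with a fixed microphone frame size: 22000 is not a multiple of 441, so every microphone read leaves a remainder in the first buffer. The sketch below simulates only the sample counts, assuming frameLength is 21609 (49*441), to show that draining the first buffer in an inner loop keeps the backlog bounded:

```matlab
micFrame = 22000;      % fixed number of samples delivered by the microphone
frameLength = 21609;   % assumed converter input size (49*441)

unread = 0;            % samples waiting in the first buffer
maxUnread = 0;         % largest backlog seen after draining
numConverted = 0;      % converter calls made so far
for k = 1:200
    unread = unread + micFrame;             % write(buffSRC,frame)
    while unread >= frameLength             % drain every complete frame
        unread = unread - frameLength;      % read(buffSRC,frameLength)
        numConverted = numConverted + 1;
    end
    maxUnread = max(maxUnread,unread);
end
fprintf('%d conversions, backlog never above %d samples\n', ...
    numConverted,maxUnread);
```

Each read leaves 391 extra samples behind (22000 - 21609), so roughly every 56th iteration has enough backlog for a second conversion. With only one read per iteration, that backlog would keep growing instead.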

Arslan Munim 2022-8-9

Hi jibrahim,

Thank you for your support, it was very helpful.

Now I want to use multiple microphones for prediction. Can you please give me some idea of how I can use the streaming classifier with 3 or 4 microphones for the prediction?

Thanks and have a nice day.

Regards,

Arslan


Answer 2

jibrahim 2022-8-9


https://ww2.mathworks.cn/matlabcentral/answers/1768630-i-want-to-use-lstm-based-audio-network-to-work-with-live-audio#answer_1023635

Hi Arslan,

audioDeviceReader supports multi-mic devices. Use the ChannelMappingSource and ChannelMapping properties to map between device input channels and the output data.

This network was trained on mono data, so, to adapt it to multi-channel data, you either have to retrain your network for multi-channel data, or somehow combine your input channels into one channel (by a weighted sum, or selecting a particular channel, etc) and proceed like above.
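
For the second option, here is a sketch of two simple ways to reduce a multi-channel frame to mono, assuming one column per microphone (the layout audioDeviceReader returns). The frame values and the equal weights are made up for illustration:

```matlab
% Hypothetical 3-sample frame from 4 microphones, one column per channel.
multiFrame = [1 2 3 4; ...
              5 6 7 8; ...
              9 10 11 12];

% Option 1: weighted sum of the channels (equal weights assumed here).
weights = [0.25; 0.25; 0.25; 0.25];
monoWeighted = multiFrame*weights;     % numSamples-by-1

% Option 2: select a single channel, here the second microphone.
monoSelected = multiFrame(:,2);        % numSamples-by-1
```

Either mono vector can then be passed to streamingClassifier in place of the single-microphone frame. Which combination works best is something you would need to check against your own data.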


Arslan Munim 2022-8-17

Edited: Walter Roberson 2022-8-19


Hi jibrahim,

I tried to read data from multiple microphones, but I get the error below every time. I am trying to read a frame from each microphone and send that data to the streaming classifier to predict the output, but the error always occurs at frame1 = adr1():

Error using audioDeviceReader/setup
A given audio device may only be opened once.
Error in audioDeviceReader/setupImpl
Error in multipleMic (line 10)
frame1 = adr1()

adr1 = audioDeviceReader(SampleRate=44.1e3,SamplesPerFrame=22000, Device="Microphone (4- USB PnP Sound Device)",BitDepth="16-bit integer");
adr2 = audioDeviceReader(SampleRate=44.1e3,SamplesPerFrame=22000, Device="Microphone (USB PnP Sound Device)",BitDepth="16-bit integer");
% These statistic value should come from your training...
% M = 0;
% S = 1;
while 1
    % Read a frame of data from microphone
    frame1 = adr1()
    frame2 = adr2()  
    % Pass to network
    [class] = streamingClassifier2(frame1,frame2,M,S)
    % Use the scores any way you want
end
function [class] = streamingClassifier2(frame1,frame2,M,S)
% This is a streaming classifier function 
persistent net; 
if isempty(net)
    net = coder.loadDeepLearningNetwork('net.mat');
end
% Extract features using function
%features = extract(aFE , audioIn)
features1 = extractFeatures2(frame1);
features2 = extractFeatures2(frame2);
% Normalize 
features1 = ((features1 - M)./S).';
features2 = ((features2 - M)./S).';
% Classify
[class] = classify(net,{features1,features2});
%[net, scores] = classify(net,feature)
end

jibrahim 2022-8-20

OK, this helps. You will need other hardware (one device, multiple mics) for the system to recognize it. You could also give the UDP idea a shot, see how viable that is.

Arslan Munim 2022-9-28

Hi again,

I am trying to train my network after lowering BitsPerSample from 16 to 8. Every time I start training the model, it throws the warning given below and terminates.

I tried different sample rates, but I get the same error every time. I also tried changing my layer structure and setting 'InitialLearnRate' to 0.001, but I still get the same warning.

Warning: Training stopped at iteration 1 because training loss is NaN. Predictions using the output network might contain NaN values.

Model:

layers = [ ...
    sequenceInputLayer(size(trainingFeatures{1},1))
    lstmLayer(100,"OutputMode","sequence")
    dropoutLayer(0.1)
    lstmLayer(100,"OutputMode","last")
    fullyConnectedLayer(5)
    softmaxLayer
    classificationLayer];
miniBatchSize = 30;
validationFrequency = floor(numel(trainingFeatures)/miniBatchSize);
options = trainingOptions("adam", ...
    "MaxEpochs",100, ...
    "MiniBatchSize",miniBatchSize, ...
    "Plots","training-progress", ...
    "Verbose",false, ...
    "Shuffle","every-epoch", ...
    "LearnRateSchedule","piecewise", ...
    "LearnRateDropFactor",0.1, ...
    "LearnRateDropPeriod",20,...
    'InitialLearnRate',0.001,...
    'ValidationData',{validationFeatures,adsValidation.Labels}, ...
    'ValidationFrequency',validationFrequency);

Regards,

Arslan
