MFCC returns nonzero values for silent periods of audio, but I expected zeros

Hi there! I am planning to extract timbre features from audio using the mfcc function in MATLAB. There is 1 s of silence at the beginning and at the end of the target audio (4 s in total). The function works well, except that the silent periods also get specific values for each coefficient, which I assumed should be 0. I don't know the reason for this or how to resolve it.
I know that I could just delete the silent regions before computing the MFCC, but the 300 audio files I need to process all have different lengths. Some of them have 0.5 s of silence at the end, some have 0.7 s, and so on.
So I am wondering whether there is a better solution to this problem.
Thanks very much!

Answers (1)

Brian Hemmat 2024-5-6
Hi Elaine,
Depending on what you're doing with this, removing the silence may not be necessary. A lot of machine learning models can handle that kind of "noise" and just ignore it--unless the amount of silence is correlated to the type of audio you're analyzing.
Even if your audio is not speech, the detectSpeech function will probably give reasonable start and end points for your region of interest. For example:
[audioIn,fs] = audioread('foo.wav');
% Call detectSpeech to get the beginning and end samples of a speech region
% (will probably work OK for lots of types of audio)
roi = detectSpeech(audioIn,fs);
% Remove the silence.
audioIn = audioIn(roi(1):roi(end));
% Extract MFCC.
featuresOut = mfcc(audioIn,fs);
Another option would be to use short-time energy. You can do that before calculating the mfcc, or at the same time while using audioFeatureExtractor, as in the sketch below.
[audioIn,fs] = audioread('foo.wav');
% Extract MFCC and short-time energy
afe = audioFeatureExtractor(mfcc=true,shortTimeEnergy=true,SampleRate=fs);
featuresOut = extract(afe,audioIn);
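% Remove MFCC that correspond to silent regions.
% info(afe) returns the column indices of each feature in featuresOut,
% so use it to look up the short-time energy column.
idx = info(afe);
threshold = 0.2; % set empirically based on your dataset
isSilent = featuresOut(:,idx.shortTimeEnergy) < threshold;
featuresOut(isSilent,:) = [];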
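The other variant mentioned above, applying short-time energy before calculating the MFCC, might look roughly like the following. This is only a sketch under stated assumptions: the synthetic test signal (1 s silence, 2 s tone, 1 s silence), the 30 ms non-overlapping frames, and the relative threshold of 1% of the peak frame energy are all illustrative choices, not values from the thread. Replace the synthetic signal with the output of audioread for real files.

```matlab
% Synthetic stand-in for a real file: 1 s silence, 2 s tone, 1 s silence.
fs = 16000;
t = (0:2*fs-1)'/fs;
audioIn = [zeros(fs,1); 0.5*sin(2*pi*440*t); zeros(fs,1)];

% Split the signal into non-overlapping frames (assumed 30 ms each).
frameLen = round(0.03*fs);
numFrames = floor(numel(audioIn)/frameLen);
frames = reshape(audioIn(1:numFrames*frameLen),frameLen,numFrames);

% Short-time energy: sum of squared samples in each frame.
energy = sum(frames.^2,1);

% Frames above an assumed relative threshold count as active.
% Note: active is empty if the whole signal is silent.
active = find(energy > 0.01*max(energy));

% Keep everything from the first active frame to the last one.
firstSample = (active(1)-1)*frameLen + 1;
lastSample = active(end)*frameLen;
audioTrimmed = audioIn(firstSample:lastSample);

% Extract MFCC from the trimmed signal only.
featuresOut = mfcc(audioTrimmed,fs);
```

Because only the leading and trailing frames are dropped, any quiet gaps inside the sound are kept, which differs from the per-frame removal shown with audioFeatureExtractor.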
Whatever method you choose, if the end goal is some kind of machine learning, make sure to mimic the same steps for inference.
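Since the amount of silence differs from file to file, one way to keep the steps identical for every file (and later for inference) is to run the same trimming code in a loop. The following is a minimal sketch: the folder name audioFolder is hypothetical, it assumes the first channel is representative for multichannel files, and it falls back to the untrimmed signal if detectSpeech finds no region.

```matlab
% Hypothetical folder name; replace with the location of your files.
files = dir(fullfile('audioFolder','*.wav'));
features = cell(numel(files),1);

for k = 1:numel(files)
    [audioIn,fs] = audioread(fullfile(files(k).folder,files(k).name));
    audioIn = audioIn(:,1);            % assumption: use the first channel

    % Find the region of interest; its position can differ per file.
    roi = detectSpeech(audioIn,fs);
    if ~isempty(roi)
        % First detected start sample through last detected end sample.
        audioIn = audioIn(roi(1,1):roi(end,2));
    end

    % One MFCC matrix per file (rows are frames, columns are coefficients).
    features{k} = mfcc(audioIn,fs);
end
```

The matrices in features will generally have different numbers of rows, because each trimmed file has a different duration.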

Release

R2022b
