Silence period of audio still gets values in MFCC, but it shouldn't be like this

Question

Elaine 2024-5-3

https://ww2.mathworks.cn/matlabcentral/answers/2114836-silence-period-of-audio-still-got-values-in-mfcc-but-it-shouldn-t-be-like-this

Commented: Elaine 2024-5-6

Hi there! I am planning to extract timbre features from audio using the mfcc function in MATLAB. There is 1 s of silence at the beginning and at the end of the target audio (4 s in total). The function works well, except that the silent periods also get specific values for each coefficient, which I assumed should be 0. I don't know the reason for this or how to resolve it.

I know that I could just delete the silent regions before computing the MFCC, but the 300 audio files I'm going to process have different lengths. Some of them have a 0.5 s silence period at the end, some have 0.7 s, etc.

So I am wondering whether there are better solutions for this problem.

Thanks very much!

Answer 1

Brian Hemmat 2024-5-6

https://ww2.mathworks.cn/matlabcentral/answers/2114836-silence-period-of-audio-still-got-values-in-mfcc-but-it-shouldn-t-be-like-this#answer_1453046

Hi Elaine,

Depending on what you're doing with this, removing the silence may not be necessary. A lot of machine learning models can handle that kind of "noise" and just ignore it--unless the amount of silence is correlated to the type of audio you're analyzing.

Even if your audio isn't speech, the detectSpeech function will probably give reasonable start and end points for your region of interest. For example:

[audioIn,fs] = audioread('foo.wav');
% Call detectSpeech to get the beginning and end samples of a speech region
% (will probably work OK for lots of types of audio)
roi = detectSpeech(audioIn,fs);
% Remove the silence.
audioIn = audioIn(roi(1):roi(end));
% Extract MFCC.
featuresOut = mfcc(audioIn,fs);
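
Since the question mentions about 300 files with different amounts of silence, the same trimming could be applied in a loop. This is only a rough sketch: the folder name 'audio' is a placeholder, the mono mixdown is an assumption about your files, and the isempty check guards against the case where detectSpeech finds no region at all.

```matlab
% Hypothetical folder name; change it to wherever your files live.
files = dir(fullfile('audio','*.wav'));
features = cell(numel(files),1);
for k = 1:numel(files)
    [audioIn,fs] = audioread(fullfile(files(k).folder,files(k).name));
    % Mix down to mono (assumption: channels can simply be averaged).
    audioIn = mean(audioIn,2);
    % Each row of roi is the [start end] sample of one detected region.
    roi = detectSpeech(audioIn,fs);
    if ~isempty(roi)
        % Keep everything from the first start to the last end.
        audioIn = audioIn(roi(1,1):roi(end,2));
    end
    % One feature matrix per file; the number of rows depends on file length.
    features{k} = mfcc(audioIn,fs);
end
```

Storing the results in a cell array avoids problems with the files having different lengths after trimming.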

Another option would be to use short-time energy. You can do that before calculating the mfcc, or at the same time while using audioFeatureExtractor, as in the sketch below.

[audioIn,fs] = audioread('foo.wav');
% Extract MFCC and short-time energy
afe = audioFeatureExtractor(mfcc=true,shortTimeEnergy=true,SampleRate=fs);
featuresOut = extract(afe,audioIn);
% Remove MFCC that correspond to silent regions
idx = info(afe);
threshold = 0.2; % set empirically based on your dataset
% idx.shortTimeEnergy is the column index of the energy feature in featuresOut
featuresOut(featuresOut(:,idx.shortTimeEnergy)<threshold,:) = [];
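
For the "before calculating the mfcc" variant, short-time energy can also be computed by hand and used to trim the signal first. The following is a minimal sketch on a synthetic signal (1 s of silence, 2 s of tone, 1 s of silence) so that it runs without any audio file. The 30 ms window, 20 ms hop, and the 1% relative threshold are illustrative assumptions, not recommended values.

```matlab
fs = 16000;
% Synthetic test signal: 1 s silence, 2 s of a 440 Hz tone, 1 s silence.
t = (0:2*fs-1)'/fs;
x = [zeros(fs,1); 0.5*sin(2*pi*440*t); zeros(fs,1)];

winLen = round(0.03*fs);   % 30 ms analysis window (assumed)
hop    = round(0.02*fs);   % 20 ms hop (assumed)
numFrames = floor((numel(x)-winLen)/hop) + 1;

% Short-time energy: sum of squared samples in each frame.
energy = zeros(numFrames,1);
for k = 1:numFrames
    startIdx = (k-1)*hop + 1;
    frame = x(startIdx:startIdx+winLen-1);
    energy(k) = sum(frame.^2);
end

% Mark frames whose energy exceeds 1% of the maximum (assumed threshold).
isActive = energy > 0.01*max(energy);

% Convert the first and last active frames back to sample indices.
firstSample = (find(isActive,1,'first')-1)*hop + 1;
lastSample  = (find(isActive,1,'last')-1)*hop + winLen;
trimmed = x(firstSample:lastSample);
```

The trimmed signal can then be passed to mfcc in place of the original. A threshold relative to the maximum energy adapts to files recorded at different levels, but it would still need to be tuned on your own data.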

Whatever method you choose, if the end goal is some kind of machine learning, make sure to mimic the same steps for inference.

Elaine 2024-5-6

I'm so grateful for your kind response. It's very helpful! Thanks a lot!
