Different MFCC obtained from audioFeatureExtractor and MFCC function

Question

Hi,

I'm trying to use "audioFeatureExtractor" and the mfcc function to get the MFCCs of an audio sample, but the coefficients they return are different. I assume some default settings differ between the two, but I cannot figure out exactly which ones. Could you please help? Below is a simple script with more detail. I'm comparing "MFCC1" and "MFCC2". I've tried several .wav and .m4a files and the MFCCs were never the same, so I'm just using a generic "xxxxxxx" as the file name.

[audioIn,fs] = audioread("xxxxxxx");
win1 = hamming(round(0.03*fs),"periodic");
win2 = round(0.015*fs);
aFE = audioFeatureExtractor(SampleRate=fs,Window=win1,OverlapLength=win2,mfcc=1);
features = extract(aFE,audioIn);
idx = info(aFE);
MFCC1 = features(:,idx.mfcc);
MFCC2 = mfcc(audioIn,fs,"LogEnergy","ignore","Window",win1,"OverlapLength",win2,"NumCoeffs",13);

Answer 1

MathWorks Audio Toolbox Team 2024-6-18

The mfcc function follows the historically popular Auditory Toolbox implementation by Slaney. In that implementation, the mel bandpass filters are spaced linearly up to 1 kHz and logarithmically above it, and the first band edge sits at 133.33 Hz. The default spacing of the mel bands in the audioFeatureExtractor object follows the O'Shaughnessy formula instead. The audioFeatureExtractor formulation is somewhat more common now, especially for the mel spectrogram intermediate step.
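To make the difference between the two spacings concrete, here is a minimal sketch that builds both sets of band edges over the same frequency range. The Slaney edges use the same constants as the slaneybandedges supporting function in this answer. The O'Shaughnessy scale is taken in its commonly quoted form, mel = 2595*log10(1 + f/700). The names osHz2Mel, osMel2Hz, melEdges and maxDiffHz are illustrative, and the sketch does not claim to reproduce how audioFeatureExtractor places its bands internally.

```matlab
% Slaney-style band edges, using the same constants as slaneybandedges.
lowestEdge = 400/3;                      % 133.33... Hz
numEdges = 42;                           % 40 bands need 42 edges
slaneyEdges = zeros(1,numEdges);
slaneyEdges(1:13) = lowestEdge + (lowestEdge/2)*(0:12);   % linear part
for ii = 14:numEdges
    slaneyEdges(ii) = slaneyEdges(ii-1)*1.0711703;        % logarithmic part
end

% O'Shaughnessy mel scale and its inverse (illustrative helper names).
osHz2Mel = @(f) 2595*log10(1 + f/700);
osMel2Hz = @(m) 700*(10.^(m/2595) - 1);

% Edges equally spaced in mel over the same frequency range.
melEdges = osMel2Hz(linspace(osHz2Mel(slaneyEdges(1)), ...
    osHz2Mel(slaneyEdges(end)),numEdges));

% Largest disagreement between the two sets of edges, in Hz.
maxDiffHz = max(abs(slaneyEdges - melEdges));
fprintf("Largest band-edge difference: %.1f Hz\n",maxDiffHz);
```

Both vectors start and end at the same frequencies by construction, so any difference reported comes from how the interior edges are distributed. Different filter positions give different mel spectra, and therefore different cepstral coefficients.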

What follows is one way to make the two implementations approximately equal. An alternative is to change the mfcc call instead, by setting its band-edges option.

Get the band edges of the Slaney implementation that the mfcc function uses.

bE = slaneybandedges();

Define your input and parameters.

[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
win1 = hamming(round(0.03*fs),"periodic");
overlapLength = round(0.015*fs); % note: not used below, where the extractor is given round(0.02*fs)

Get the default output of the mfcc function.

mfcc_output = mfcc(audioIn,fs,LogEnergy="ignore");

Create an audioFeatureExtractor object and set its options to extract the same features as the mfcc function.

aFE = audioFeatureExtractor(SampleRate=fs, ...
    Window=hamming(round(0.03*fs),"periodic"), ...
    OverlapLength=round(0.02*fs), ...
    mfcc=true, ...
    FFTLength=numel(win1));
setExtractorParameters(aFE,"melSpectrum", ...
    MelStyle="slaney", ...
    SpectrumType="magnitude", ...
    WindowNormalization=false, ...
    FilterBankDesignDomain="linear", ...
    FilterBankNormalization="bandwidth", ...
    NumBands=40, ...
    FrequencyRange=[bE(1),bE(end)])

features = extract(aFE,audioIn);
idx = info(aFE);
afe_output = features(:,idx.mfcc);

coeffToInspect = 1;
plot(afe_output(:,coeffToInspect),'bo'),hold on
plot(mfcc_output(:,coeffToInspect),'r*'),hold off

rms(afe_output(:)-mfcc_output(:))

ans = 2.3120e-04

Supporting Function

function bE = slaneybandedges()
% Default band edges as defined by the documentation for the
% Auditory Toolbox.
factor = 133.33333333333333;
bE = zeros(1,42);
for ii = 1:13
    bE(ii) = factor + (factor/2)*(ii-1);
end
for ii = 14:42
    bE(ii) = bE(ii-1)*1.0711703;
end
end

Fabiano Guimaraes 2024-6-19

Thank you very much for the clear answer.
