Extract MFCC, log energy, delta, and delta-delta of audio signal
coeffs = mfcc(audioIn,fs,'LogEnergy','Replace') returns mel frequency cepstral coefficients for the audio input signal sampled at fs Hz. The first coefficient in the coeffs vector is replaced with the log energy value.
Compute the mel frequency cepstral coefficients of a speech signal using the
mfcc function. The function returns
delta, the change in coefficients, and
deltaDelta, the change in delta values. The function can either prepend the log energy value to the coefficients vector or replace the first element of the coefficients vector with it, depending on whether you set the
'LogEnergy' argument to 'Append' or 'Replace'.
Read an audio signal from the
'Counting-16-44p1-mono-15secs.wav' file using the
audioread function. The
mfcc function processes the entire speech data in a batch. Based on the number of input rows, the window length, and the overlap length,
mfcc partitions the speech into 1551 frames and computes the cepstral features for each frame. Each row in the
coeffs matrix corresponds to the log-energy value followed by the 13 mel-frequency cepstral coefficients for the corresponding frame of the speech file. The function also computes
loc, the location of the last sample in each input frame.
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
[coeffs,delta,deltaDelta,loc] = mfcc(audioIn,fs);
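As a rough check on the frame count, the relationship between input length, window, and overlap can be sketched as follows. The 30 ms window and 20 ms overlap are illustrative choices passed explicitly, not values taken from the example above, and the variable names win, overlapLength, hopLength, and numFrames are hypothetical.

```matlab
% Sketch: predict the number of analysis frames from the window and overlap.
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');

win = hamming(round(0.03*fs),'periodic');   % illustrative 30 ms window
overlapLength = round(0.02*fs);             % illustrative 20 ms overlap
hopLength = numel(win) - overlapLength;     % samples advanced per frame

% Number of full windows that fit in the signal
numFrames = floor((size(audioIn,1) - numel(win))/hopLength) + 1;

[coeffs,~,~,loc] = mfcc(audioIn,fs,'Window',win,'OverlapLength',overlapLength);

% Each row of coeffs is one frame; loc holds the last sample of each frame
fprintf('Predicted frames: %d, returned frames: %d\n',numFrames,size(coeffs,1));
```

If the two numbers differ, the assumed framing rule does not match the release in use.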
Read in an audio file and convert it to a frequency representation.
[audioIn,fs] = audioread("Rainbow-16-8-mono-114secs.wav");
win = hann(1024,"periodic");
S = stft(audioIn,"Window",win,"OverlapLength",512,"Centered",false);
To extract the mel-frequency cepstral coefficients, call
mfcc with the frequency-domain audio. Ignore the log-energy.
coeffs = mfcc(S,fs,"LogEnergy","Ignore");
In many applications, MFCC observations are converted to summary statistics for use in classification tasks. Plot a probability density function for one of the mel-frequency cepstral coefficients to observe its distribution.
nbins = 60;
coefficientToAnalyze = 4;
histogram(coeffs(:,coefficientToAnalyze+1),nbins,"Normalization","pdf")
title(sprintf("Coefficient %d",coefficientToAnalyze))
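One possible way to turn the per-frame observations into summary statistics is sketched below. The choice of mean and standard deviation, and the names mu, sigma, and featureVector, are assumptions made for illustration; the example starts from the time-domain signal so that it is self-contained.

```matlab
% Sketch: collapse per-frame MFCCs into one fixed-length feature vector.
[audioIn,fs] = audioread("Rainbow-16-8-mono-114secs.wav");
coeffs = mfcc(audioIn,fs,"LogEnergy","Ignore");   % one row per frame

mu = mean(coeffs,1);        % mean of each coefficient over all frames
sigma = std(coeffs,0,1);    % standard deviation of each coefficient

% Concatenate the statistics into a single row vector (hypothetical layout)
featureVector = [mu,sigma];
```

A vector like this has the same length for any recording, which is convenient for classifiers that expect fixed-size inputs.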
audioIn— Input signal
Input signal, specified as a vector, matrix, or 3-D array.
If audioIn is real, it is interpreted as a
time-domain signal and must be a column vector or a matrix. Columns
of the matrix are treated as independent audio channels.
If audioIn is complex, it is interpreted as a
frequency-domain signal. In this case, audioIn
must be an L-by-M-by-N
array, where L is the number of DFT points,
M is the number of individual spectrums, and
N is the number of individual channels.
Complex Number Support: Yes
fs— Sample rate (Hz)
Sample rate of the input signal in Hz, specified as a positive scalar.
Specify optional comma-separated pairs of Name,Value arguments. Name is
the argument name and
Value is the corresponding value.
Name must appear inside quotes. You can specify several name and value
pair arguments in any order as Name1,Value1,...,NameN,ValueN.
[coeffs,delta,deltaDelta,loc] = mfcc(audioIn,fs,'LogEnergy','Replace','DeltaWindowLength',5) returns mel frequency cepstral coefficients for the audio input signal sampled at fs Hz. The first coefficient in the coeffs vector is replaced with the log energy value. A set of 5 cepstral coefficients is used to compute the delta and the delta-delta values.
'Window'— Window applied in time domain
hamming(round(fs*0.03),'periodic') (default) | vector
'OverlapLength'— Number of overlapping samples between adjacent windows
round((default) | integer
'NumCoeffs'— Number of coefficients returned
13 (default) | positive scalar integer
Number of coefficients returned for each window of data, specified as an integer in the range [2, v], where v is the number of valid passbands.
The number of valid passbands is defined as
sum(BandEdges <= floor(fs/2))-2. A passband is valid if its edges
fall below fs/2, where fs is the
sample rate of the input audio signal, specified as the second argument, fs.
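The counting rule for valid passbands can be sketched in a few lines. The sample rate and band edges below are hypothetical values chosen only to show the arithmetic; they are not passed to mfcc.

```matlab
% Sketch: count valid passbands for a given sample rate.
fs = 8000;                                        % hypothetical sample rate (Hz)
bandEdges = [100 500 1000 2000 3500 4000 5000];   % hypothetical band edges (Hz)

% Edges at or below the Nyquist frequency, minus the two outer edges
numValid = sum(bandEdges <= floor(fs/2)) - 2;

fprintf('Valid passbands: %d\n',numValid);
```

Here six edges lie at or below 4000 Hz, so four passbands are valid and NumCoeffs could be at most 4 under this rule.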
'BandEdges'— Band edges of filter bank (Hz)
Band edges of the filter bank in Hz, specified as a nonnegative
monotonically increasing row vector in the range [0,
fs/2]. The number of band edges must be in the
range [4, 160]. The
mfcc function designs
half-overlapped triangular filters based on
BandEdges. This means that all band edges,
except for the first and last, are also center frequencies of the
designed bandpass filters.
By default, BandEdges is a 42-element vector,
which results in a 40-band filter bank that spans approximately 133 Hz
to 6864 Hz. The default bands are spaced as described in [2].
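The relationship between band edges and half-overlapped triangular filters can be illustrated with a short sketch. The edge values and the names numFilters and filters are hypothetical; the sketch only tabulates where each triangle would start, peak, and end, and does not reproduce how mfcc designs the filters internally.

```matlab
% Sketch: map a band-edge vector to [lower center upper] triplets.
bandEdges = [0 100 200 400 800 1600];    % hypothetical band edges (Hz)

numFilters = numel(bandEdges) - 2;       % each interior edge is one center

% Row k describes filter k: it rises from edge k, peaks at edge k+1,
% and falls to zero at edge k+2
filters = [bandEdges(1:end-2); bandEdges(2:end-1); bandEdges(3:end)].';

disp(filters)
```

With 6 edges this gives 4 filters, in the same way that 42 edges give a 40-band filter bank.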
'FFTLength'— Number of bins for calculating DFT
numel((default) | positive scalar integer
Number of bins used to calculate the discrete Fourier transform (DFT)
of windowed input samples. The FFT length must be greater than or equal
to the number of elements in the Window argument.
'Rectification'— Type of non-linear rectification
Type of nonlinear rectification applied prior to the discrete cosine
transform, specified as
'DeltaWindowLength'— Number of coefficients for calculating delta and delta-delta
9 (default) | odd integer greater than 2
Number of coefficients used to calculate the delta and the delta-delta
values, specified as the comma-separated pair consisting of
'DeltaWindowLength' and an odd integer greater
than two. If unspecified, DeltaWindowLength defaults to 9.
Deltas are computed using the audioDelta function.
'LogEnergy'— Specify how the log energy is shown
How the log energy is shown in the coefficients vector output, specified as one of the following:
'Append' –– The function prepends the log energy to the coefficients vector. The length of the coefficients vector is 1 + NumCoeffs.
'Replace' –– The function replaces the first coefficient with the log energy of the signal. The length of the coefficients vector is NumCoeffs.
'Ignore' –– The function does not calculate or return the log energy.
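The effect of each option on the output width can be checked with a sketch like the following. The audio file is the one used in the example earlier on this page, and the variable names cAppend, cReplace, and cIgnore are hypothetical.

```matlab
% Sketch: compare the number of columns returned for each LogEnergy option.
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');

cAppend  = mfcc(audioIn,fs,'LogEnergy','Append');    % log energy + coefficients
cReplace = mfcc(audioIn,fs,'LogEnergy','Replace');   % log energy in column 1
cIgnore  = mfcc(audioIn,fs,'LogEnergy','Ignore');    % coefficients only

fprintf('Append: %d, Replace: %d, Ignore: %d columns\n', ...
    size(cAppend,2),size(cReplace,2),size(cIgnore,2));
```

Only 'Append' should add a column relative to the other two options.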
coeffs— Mel frequency cepstral coefficients (MFCCs)
Mel frequency cepstral coefficients, returned as an L-by-M matrix or an L-by-M-by-N array, where:
L –– Number of analysis windows the audio signal is partitioned into. The input size, Window, and OverlapLength control this dimension:
L = floor((size(audioIn,1) - numel(Window))/(numel(Window) - OverlapLength)) + 1
M –– Number of coefficients returned per frame. This value depends on NumCoeffs and LogEnergy. When LogEnergy is set to:
'Append' –– The function prepends the log energy value to the coefficients vector. The length of the coefficients vector is 1 + NumCoeffs.
'Replace' –– The function replaces the first coefficient with the log energy of the signal. The length of the coefficients vector is NumCoeffs.
'Ignore' –– The function does not calculate or return the log energy. The length of the coefficients vector is NumCoeffs.
N –– Number of input channels (columns).
This value is
delta— Change in coefficients
Change in coefficients from one frame of data to another, returned as an
L-by-M matrix or an L-by-M-by-N array. The
delta array is the same size and data type as the coeffs array.
loc— Location of the last sample in each input frame
Location of the last sample in each analysis window, returned as a column
vector with the same number of rows as coeffs.
Mel frequency cepstrum coefficients are popular features extracted from speech signals for use in recognition tasks. In the source-filter model of speech, cepstral coefficients are understood to represent the filter (vocal tract). The vocal tract frequency response is relatively smooth, whereas the source of voiced speech can be modeled as an impulse train. As a result, the vocal tract can be estimated by the spectral envelope of a speech segment.
The motivating idea of mel frequency cepstral coefficients is to compress information about the vocal tract (smoothed spectrum) into a small number of coefficients based on an understanding of the cochlea. Although there is no hard standard for calculating the coefficients, the basic steps are outlined by the diagram.
The default mel filter bank linearly spaces the first 10 triangular filters and logarithmically spaces the remaining filters.
The information contained in the zeroth mel frequency cepstral coefficient is often augmented with or replaced by the log energy. The log energy calculation depends on the input domain.
If the input (audioIn) is a time-domain signal, the log energy is computed using the following equation:
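The equation did not survive in this copy of the page. A plausible form, assuming the log energy of a frame is the natural logarithm of the summed squared samples, is sketched below; the symbols E, x, and n are introduced here for illustration, and the exact expression should be checked against the release documentation.

```latex
E = \log\left( \sum_{n} x[n]^{2} \right)
```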
If the input (audioIn) is a frequency-domain signal, the log energy is computed using the following equation:
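The equation did not survive in this copy of the page. A plausible form, assuming the spectrum energy is normalized by the FFT length so that it agrees with the time-domain energy through Parseval's relation, is sketched below; the symbols E, X, and k are introduced here for illustration, and the exact expression should be checked against the release documentation.

```latex
E = \log\left( \frac{1}{\mathrm{FFTLength}} \sum_{k} \left| X[k] \right|^{2} \right)
```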
Behavior changed in R2020b
The delta and delta-delta calculations are now computed using the
audioDelta function, which has a different startup behavior than the
previous algorithm. The default value of the DeltaWindowLength
parameter has changed from 2 to 9. A delta
window length of 2 is no longer supported.
WindowLength will be removed in a future release
Behavior change in future release
The WindowLength parameter will be removed from the
mfcc function in a future release. Use the
Window parameter instead.
In releases prior to R2020b, you could only specify the length of a time-domain window. The window was always designed as a periodic Hamming window. You can replace instances of the code
coeffs = mfcc(audioIn,fs,'WindowLength',1024);
with this code:
coeffs = mfcc(audioIn,fs,'Window',hamming(1024,'periodic'));
[1] Rabiner, Lawrence R., and Ronald W. Schafer. Theory and Applications of Digital Speech Processing. Upper Saddle River, NJ: Pearson, 2010.
[2] Auditory Toolbox. https://engineering.purdue.edu/~malcolm/interval/1998-010/AuditoryToolboxTechReport.pdf