detectSpeech
Syntax
Description
idx = detectSpeech(audioIn,fs,Name,Value) specifies options using one or more Name,Value pair arguments.
Example: detectSpeech(audioIn,fs,'Window',hann(512,'periodic'),'OverlapLength',256)
detects speech using a 512-point periodic Hann window with 256-point overlap.
[idx,thresholds] = detectSpeech(___) also returns the thresholds used to compute the boundaries of speech.
detectSpeech(___) with no output arguments displays a plot of the detected speech regions in the input signal.
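The calling forms above can be sketched together as follows. This is a minimal usage sketch, not an official example: it assumes Audio Toolbox is installed, that the sample file named below is available on the MATLAB path (substitute any mono recording), and that idx is returned with one row per detected region in the form [start end], in samples.

```matlab
% Assumption: this sample file is on the path; replace it with your own audio.
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');

% Default options.
idx = detectSpeech(audioIn,fs);

% Custom window and overlap, also returning the decision thresholds.
win = hann(512,'periodic');
[idx2,thresholds] = detectSpeech(audioIn,fs,'Window',win,'OverlapLength',256);

% Assumption: each row of idx2 is [startSample endSample].
% Extract the first detected region, if any was found.
if ~isempty(idx2)
    firstRegion = audioIn(idx2(1,1):idx2(1,2));
end

% No output arguments: plot the detected regions over the waveform.
detectSpeech(audioIn,fs,'Window',win,'OverlapLength',256)
```

Comparing idx and idx2 on the same recording is a quick way to see how the window and overlap settings shift the detected boundaries.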
Examples
Input Arguments
Output Arguments
Algorithms
The detectSpeech algorithm is based on [1], modified so that the thresholded statistics are short-term energy and spectral spread instead of short-term energy and spectral centroid. The following steps provide a high-level overview of the algorithm. For details, see [1].
1. The audio signal is converted to a time-frequency representation using the specified Window and OverlapLength.
2. The short-term energy and spectral spread are calculated for each frame. The spectral spread is calculated according to spectralSpread.
3. Histograms are created for both the short-term energy and spectral spread distributions.
4. For each histogram, a threshold is determined according to T = (W × M1 + M2) / (W + 1), where M1 and M2 are the first and second local maxima, respectively. W is set to 5.
5. Both the spectral spread and the short-term energy are smoothed across time by passing through successive five-element moving median filters.
6. Masks are created by comparing the short-term energy and spectral spread with their respective thresholds. To declare a frame as containing speech, a feature must be above its threshold.
7. The masks are combined. For a frame to be declared as speech, both the short-term energy and the spectral spread must be above their respective thresholds.
8. Regions declared as speech are merged if the distance between them is less than MergeDistance.
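The steps above can be approximated in a short script. This is an illustrative sketch under stated assumptions, not the toolbox implementation: the synthetic test signal, the 30 ms window, the 20-bin histograms, the fallback threshold, the merge distance of five window lengths, and the helper histThreshold are all hypothetical choices made for illustration. The threshold uses the weighting from [1], T = (W × M1 + M2) / (W + 1), and the spectral spread is computed by hand as the power-weighted standard deviation of frequency around the spectral centroid.

```matlab
% Illustrative sketch only; every numeric choice below is an assumption.
fs = 16000;
rng(0);                                        % reproducible test signal
nQ = fs/2;                                     % half-second segments
quiet = 0.005*sin(2*pi*100*(0:nQ-1)'/fs);      % low-level hum (stand-in for silence)
burst = 0.5*randn(nQ,1);                       % loud broadband noise (stand-in for speech)
x = [quiet; burst; quiet(1:1600); burst; quiet; burst; quiet];

% Step 1: frame the signal and window each frame.
winLen = round(0.03*fs);                       % assumed window length
ovl = round(0.02*fs);                          % assumed overlap length
hop = winLen - ovl;
win = hann(winLen,'periodic');
numFrames = floor((numel(x) - winLen)/hop) + 1;
sampleIdx = (1:winLen)' + (0:numFrames-1)*hop; % one column of indices per frame
frames = x(sampleIdx) .* win;

% Step 2: short-term energy and spectral spread per frame.
X = fft(frames);
nBins = floor(winLen/2) + 1;
P = abs(X(1:nBins,:)).^2;                      % one-sided power spectrum
f = (0:nBins-1)'*fs/winLen;                    % bin frequencies in Hz
energy = mean(frames.^2,1);
centroid = sum(f.*P,1)./(sum(P,1) + eps);
spread = sqrt(sum((f - centroid).^2.*P,1)./(sum(P,1) + eps));

% Steps 3-4: histogram-based thresholds with W = 5.
W = 5;
energyThresh = histThreshold(energy,W);
spreadThresh = histThreshold(spread,W);

% Step 5: two successive five-element moving median filters.
energy = movmedian(movmedian(energy,5),5);
spread = movmedian(movmedian(spread,5),5);

% Steps 6-7: both features must exceed their thresholds.
isSpeech = energy > energyThresh & spread > spreadThresh;

% Convert runs of speech frames to [startSample endSample] rows.
d = diff([0 isSpeech 0]);
startFrame = find(d == 1);
endFrame = find(d == -1) - 1;
regions = [(startFrame(:)-1)*hop + 1, (endFrame(:)-1)*hop + winLen];

% Step 8: merge regions separated by less than the merge distance.
mergeDistance = 5*winLen;                      % assumed value, in samples
merged = zeros(0,2);
for k = 1:size(regions,1)
    if ~isempty(merged) && regions(k,1) - merged(end,2) < mergeDistance
        merged(end,2) = regions(k,2);          % extend the previous region
    else
        merged(end+1,:) = regions(k,:);        %#ok<AGROW>
    end
end
disp(merged)

function T = histThreshold(v,W)
% Hypothetical helper: threshold from the first two local maxima of a histogram.
    [counts,edges] = histcounts(v,20);         % 20 bins is an assumption
    centers = (edges(1:end-1) + edges(2:end))/2;
    c = [0 counts 0];                          % pad so edge bins can be maxima
    isPeak = c(2:end-1) > c(1:end-2) & c(2:end-1) >= c(3:end);
    peaks = centers(isPeak);
    if numel(peaks) >= 2
        T = (W*peaks(1) + peaks(2))/(W + 1);
    else
        T = mean(v)/2;                         % hypothetical fallback for one maximum
    end
end
```

Because W weights the first maximum five times as heavily as the second, each threshold sits close to the lower mode of its histogram, so frames only need to rise modestly above the background level to be flagged.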
References
[1] Giannakopoulos, Theodoros. "A Method for Silence Removal and Segmentation of Speech Signals, Implemented in MATLAB." University of Athens, Athens, 2009.
Extended Capabilities
Version History
Introduced in R2020a