Voice Activity Detector
Detect presence of speech in audio signal
Libraries: Audio Toolbox / Measurements
Description
The Voice Activity Detector block detects the presence of speech in an audio signal. You can also use the Voice Activity Detector block to output an estimate of the noise variance per frequency bin.
Examples
Ports
Input
Output
Parameters
Block Characteristics
Data Types |
|
Direct Feedthrough |
|
Multidimensional Signals |
|
Variable-Size Signals |
|
Zero-Crossing Detection |
|
Algorithms
The Voice Activity Detector implements the algorithm described in [1].
If Domain of the input is specified as Time, the input signal is windowed and then converted to the frequency domain according to the Window, Sidelobe attenuation of the window (dB), and FFT length parameters. If Domain of the input is specified as Frequency, the input is assumed to be a windowed discrete-time Fourier transform (DTFT) of an audio signal. In both cases, the signal is then converted to the power domain. Noise variance is estimated according to [2]. The posterior and prior SNR are estimated according to the Minimum Mean-Square Error (MMSE) formula described in [3]. A log-likelihood ratio test with a Hidden Markov Model (HMM)-based hang-over scheme is used, according to [1].
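The decision stage described above can be illustrated with a minimal Python sketch. This is not the block's implementation: the function names are hypothetical, the smoothing factor `alpha` and the transition probabilities are illustrative values rather than the block's defaults, and the noise variance is taken as given instead of being estimated as in [2]. The sketch assumes a decision-directed prior SNR estimate in the style of [3], the per-bin Gaussian log-likelihood ratio from [1], and a two-state HMM recursion for the hang-over.

```python
import math


def posterior_snr(power, noise_var):
    """Posterior SNR per bin: observed power divided by noise variance."""
    return [p / n for p, n in zip(power, noise_var)]


def prior_snr_decision_directed(gamma, prev_clean_power, noise_var, alpha=0.98):
    """Decision-directed prior SNR estimate (alpha is an assumed value)."""
    return [
        alpha * a / n + (1.0 - alpha) * max(g - 1.0, 0.0)
        for g, a, n in zip(gamma, prev_clean_power, noise_var)
    ]


def mean_log_likelihood_ratio(gamma, xi):
    """Average over bins of gamma*xi/(1+xi) - log(1+xi)."""
    terms = [g * x / (1.0 + x) - math.log1p(x) for g, x in zip(gamma, xi)]
    return sum(terms) / len(terms)


def hmm_speech_probability(mean_llr, prev_ratio,
                           p_sil_to_speech=0.1, p_speech_to_sil=0.1):
    """One step of a two-state HMM hang-over recursion.

    Returns (probability of speech, updated likelihood ratio).
    The transition probabilities are assumed, illustrative values.
    """
    a01, a10 = p_sil_to_speech, p_speech_to_sil
    a00, a11 = 1.0 - a01, 1.0 - a10
    # Clamp the exponent so that exp() cannot overflow on very loud frames.
    lam = math.exp(min(mean_llr, 50.0))
    ratio = lam * (a01 + a11 * prev_ratio) / (a00 + a10 * prev_ratio)
    return ratio / (1.0 + ratio), ratio


def frame_speech_probability(power, noise_var, prev_clean_power, prev_ratio):
    """Chain the steps above for a single frame of power-domain data."""
    gamma = posterior_snr(power, noise_var)
    xi = prior_snr_decision_directed(gamma, prev_clean_power, noise_var)
    llr = mean_log_likelihood_ratio(gamma, xi)
    return hmm_speech_probability(llr, prev_ratio)


# Toy data: four frequency bins with unit noise variance.
noise_var = [1.0, 1.0, 1.0, 1.0]
no_history = [0.0, 0.0, 0.0, 0.0]  # no clean-speech estimate from a prior frame

p_quiet, _ = frame_speech_probability([1.0, 1.0, 1.0, 1.0], noise_var, no_history, 1.0)
p_loud, _ = frame_speech_probability([20.0, 30.0, 25.0, 15.0], noise_var, no_history, 1.0)
print(p_quiet, p_loud)
```

In this sketch, a frame whose power matches the noise variance yields a log-likelihood ratio of zero, so the evidence is uninformative and the output is driven only by the transition probabilities. A frame well above the noise floor pushes the probability of speech toward 1. Carrying the returned ratio into the next frame is what lets the decision persist briefly after speech energy drops.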
References
[1] Sohn, Jongseo, Nam Soo Kim, and Wonyong Sung. "A Statistical Model-Based Voice Activity Detection." IEEE Signal Processing Letters. Vol. 6, No. 1, 1999, pp. 1–3.
[2] Martin, R. "Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics." IEEE Transactions on Speech and Audio Processing. Vol. 9, No. 5, 2001, pp. 504–512.
[3] Ephraim, Y., and D. Malah. "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator." IEEE Transactions on Acoustics, Speech, and Signal Processing. Vol. 32, No. 6, 1984, pp. 1109–1121.
Extended Capabilities
Version History
Introduced in R2018a