speechClient
Description
Use a speechClient object to interface with a pretrained speech-to-text model, a pretrained text-to-speech model, or a third-party cloud-based speech service. Use the object with speech2text or text2speech.
Note
Using the Emformer, Whisper, or HiFi-GAN pretrained models requires Deep Learning Toolbox™ and Audio Toolbox™ Interface for SpeechBrain and Torchaudio Libraries. You can download this support package from the Add-On Explorer. For more information, see Get and Manage Add-Ons.
To interface with third-party speech services, you must download the extended Audio Toolbox functionality from File Exchange. The File Exchange submission includes a tutorial to get started with the third-party services.
Using wav2vec 2.0 requires Deep Learning Toolbox and a separate installation of the pretrained model.
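The following is a minimal sketch of the workflow described in the Description: it creates one client for a pretrained text-to-speech model and one for a pretrained speech-to-text model, synthesizes speech with text2speech, and then transcribes that audio with speech2text. It assumes the model names "hifigan" and "wav2vec2.0", that the requirements listed in the note are met, and that text2speech returns the synthesized audio together with its sample rate. The variable names and input text are illustrative only, and the exact transcript depends on the models.

```matlab
% Round trip (illustrative sketch): synthesize speech with a pretrained
% text-to-speech model, then transcribe it with a pretrained
% speech-to-text model.
% Assumes the required toolboxes, support package, and pretrained
% models are installed (see the note above).

% Create a text-to-speech client (assumed model name "hifigan").
ttsClient = speechClient("hifigan");

% Synthesize speech; assumed to return the audio and its sample rate.
inputText = "Hello world";
[speech,fs] = text2speech(ttsClient,inputText);

% Create a speech-to-text client (assumed model name "wav2vec2.0").
sttClient = speechClient("wav2vec2.0");

% Transcribe the synthesized audio, passing its sample rate along
% with the samples.
transcript = speech2text(sttClient,speech,fs);
disp(transcript)
```

Using the generated audio as input keeps the sketch self-contained, so no recording or audio file is needed to try both directions.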
Creation
Description
Input Arguments
Output Arguments
Properties
Object Functions
reset | Reset states for streaming-enabled speech clients
Note
For the third-party speech services, you can configure server-specific options using the following functions. See the documentation for the specific service for option names and values.
setOptions | Set server options
getOptions | Get server options
clearOptions | Remove all server options
Examples
Version History
Introduced in R2022b