Audio Toolbox Interface for SpeechBrain and Torchaudio Libraries
Deep Learning models supporting Audio Toolbox AI-powered functions for speech and audio signal processing
409 downloads
Updated 14 May 2025
The Audio Toolbox Interface for SpeechBrain and Torchaudio Libraries enables the use of a collection of AI-powered speech processing functions in Audio Toolbox™ for automatic speech recognition (ASR) and speech synthesis.
Using Audio Toolbox and the Audio Toolbox Interface for SpeechBrain and Torchaudio Libraries, MATLAB users can take advantage of state-of-the-art AI models, without requiring any familiarity with Deep Learning.
The add-on automates the installation of Python® and PyTorch®, and it downloads selected Deep Learning models from the SpeechBrain and Torchaudio libraries. Once installed, it lets users run the following functions on local AI models:
- The speech2text function accepts a speechClient object with the model set to emformer or whisper for speech-to-text (STT) and automatic speech recognition (ASR). These models complement the local wav2vec model and the cloud service options Google, IBM, Microsoft, and Amazon. Using whisper also requires downloading the model weights separately, as described in Download Whisper Speech-to-Text Model.
- The text2speech function accepts a speechClient object with the model set to hifigan for text-to-speech (TTS) and speech synthesis. This complements the cloud service options Google, IBM, Microsoft, and Amazon.
The speech2text function takes audio samples and returns text; the text2speech function takes text and returns audio samples. Both automate the full end-to-end pipeline for automatic speech recognition or speech synthesis, hiding signal pre-processing, feature extraction, model prediction, and output post-processing from the user. speech2text can also be used interactively through the Signal Labeler App.
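As a rough illustration of how the two functions fit together, the sketch below synthesizes a short phrase with the hifigan model and transcribes the result with the emformer model. It assumes Audio Toolbox and this add-on are installed on R2024a or later, and that the Client name-value syntax applies in your release; the input phrase and variable names are illustrative only.

```matlab
% Minimal sketch, assuming Audio Toolbox plus this add-on (R2024a or later).
% Variable names and the input phrase are illustrative, not from the add-on.

% Text-to-speech with the local hifigan model.
ttsClient = speechClient("hifigan");
[speech, fs] = text2speech("hello world", Client=ttsClient);

% Speech-to-text on the synthesized audio with the local emformer model.
sttClient = speechClient("emformer");
transcript = speech2text(speech, fs, Client=sttClient);

disp(transcript)
```

To try whisper instead of emformer, the client name would change to "whisper", after downloading its model weights separately as noted above.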
Follow the links below for practical code examples:
- Perform Speech-to-Text Transcription
- Label Speech Recordings Using Speech-to-Text in Signal Labeler
- Identify languages from Speech Signals
- Translate and transcribe multi-language speech using Whisper
- Synthesize speech from text using a local model
- Use Emformer for Streaming Speech-to-Text
Version history
For detailed release notes, see the Audio Toolbox Release Notes (Filtered by Audio Toolbox Interface for SpeechBrain and Torchaudio Libraries).
MATLAB Release Compatibility
Created with R2024a
Compatible with R2024a to R2025a
Platform Compatibility
Windows, macOS (Apple silicon), macOS (Intel), Linux