Audio Toolbox™ provides functionality to develop machine and deep learning solutions for audio, speech, and acoustic applications including speaker identification, speech command recognition, acoustic scene recognition, and many more.
Use audioDatastore to ingest large audio data sets and process files in parallel.
Use Audio Labeler to build audio data sets by annotating audio recordings manually and automatically.
Use audioDataAugmenter to create randomized pipelines of built-in or custom signal processing methods for augmenting and synthesizing audio data sets.
Use audioFeatureExtractor to extract combinations of different features while sharing intermediate computations.
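As a rough illustration of how these pieces might fit together, the following sketch writes two synthetic tones to a scratch folder, reads one back through audioDatastore, augments it, and extracts features. The folder name, file names, probabilities, and feature selection are assumptions made for the example, not recommended settings.

```matlab
% Create a small scratch data set so the sketch is self-contained.
% The folder and file names are hypothetical.
fs = 16000;
dataFolder = fullfile(tempdir, "audioSketch");
if ~isfolder(dataFolder)
    mkdir(dataFolder);
end
t = (0:fs-1).' / fs;                          % one second of samples
audiowrite(fullfile(dataFolder, "tone440.wav"), 0.5*sin(2*pi*440*t), fs);
audiowrite(fullfile(dataFolder, "tone880.wav"), 0.5*sin(2*pi*880*t), fs);

% Ingest the files with a datastore and read the first one.
ads = audioDatastore(dataFolder, 'FileExtensions', '.wav');
[audioIn, fileInfo] = read(ads);

% Randomized augmentation: each effect is applied with the given probability.
augmenter = audioDataAugmenter( ...
    'NumAugmentations', 2, ...
    'TimeStretchProbability', 0.5, ...
    'PitchShiftProbability', 0.5);
augmented = augment(augmenter, audioIn, fileInfo.SampleRate);  % table: Audio, AugmentationInfo

% Extract several features in one pass over the first augmented signal.
afe = audioFeatureExtractor( ...
    'SampleRate', fileInfo.SampleRate, ...
    'mfcc', true, ...
    'spectralCentroid', true);
features = extract(afe, augmented.Audio{1});  % rows are analysis frames, columns are features
featureMap = info(afe);                       % maps each feature name to its column indices
```

For a real data set, the datastore would typically point at an existing folder, and the LabelSource option can derive labels from subfolder names.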
Audio Toolbox also provides access to third-party APIs for text-to-speech and speech-to-text, and it includes pretrained VGGish and YAMNet models so that you can perform transfer learning, classify sounds, and extract feature embeddings. Using pretrained networks requires Deep Learning Toolbox™.
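A minimal sketch of sound classification with the pretrained YAMNet model is shown below. It assumes that Deep Learning Toolbox is installed and that the YAMNet model files have already been downloaded. The input is a placeholder noise signal, so the returned labels are not meaningful.

```matlab
% Placeholder input: two seconds of low-level noise at 16 kHz.
% Replace with audio read from a file or a datastore.
fs = 16000;
audioIn = 0.1 * randn(2*fs, 1);

% classifySound runs YAMNet internally and returns the detected
% sound classes as a string array (possibly empty).
sounds = classifySound(audioIn, fs);
disp(sounds)
```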