Audio Toolbox™ provides functionality to develop machine learning and deep learning solutions for audio, speech, and acoustic applications, including speaker identification, speech command recognition, and acoustic scene recognition, among others.
Use audioDatastore to ingest large audio data sets and process files in parallel.
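As a rough illustration of the datastore workflow, the sketch below reads files one at a time and splits the datastore into partitions. It assumes the sample recordings folder that ships with Audio Toolbox is available under matlabroot; the per-file peak measurement and the variable names are only stand-ins for real processing. The loop over partitions is written as a plain for loop, and with Parallel Computing Toolbox it could likely be changed to parfor.

```matlab
% Assumption: the Audio Toolbox sample recordings are installed here.
folder = fullfile(matlabroot,"toolbox","audio","samples");
ads = audioDatastore(folder);

% Split the datastore into partitions that can be processed independently.
numPartitions = 2;
peakLevel = cell(numPartitions,1);
for ii = 1:numPartitions            % candidate for parfor
    subds = partition(ads,numPartitions,ii);
    peaks = [];
    while hasdata(subds)
        audioIn = read(subds);      % samples of the next file in this partition
        peaks(end+1) = max(abs(audioIn(:))); % placeholder per-file computation
    end
    peakLevel{ii} = peaks;
end
```

Each partition is itself a datastore, so the same read loop works on the whole collection or on any subset.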
Use Audio Labeler to build audio data sets by annotating audio recordings manually and automatically.
Use audioDataAugmenter to create randomized pipelines of built-in or custom signal processing methods for augmenting and synthesizing audio data sets.
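A minimal sketch of a randomized augmentation pipeline follows. The input is a synthetic tone used only as stand-in audio, and the probabilities and ranges are arbitrary illustrative values, not recommended settings.

```matlab
fs = 16000;                         % assumed sample rate, in Hz
t = (0:fs-1)'/fs;
audioIn = 0.5*sin(2*pi*440*t);      % one second of a 440 Hz tone as stand-in audio

% Each augmentation is applied with the given probability, with parameters
% drawn at random from the given range.
augmenter = audioDataAugmenter( ...
    "AugmentationMode","sequential", ...
    "NumAugmentations",3, ...
    "TimeStretchProbability",0, ...
    "PitchShiftProbability",0.5, ...
    "SemitoneShiftRange",[-2 2], ...
    "VolumeControlProbability",0.8, ...
    "VolumeGainRange",[-6 6], ...
    "AddNoiseProbability",0.5, ...
    "SNRRange",[10 20], ...
    "TimeShiftProbability",0);

% augment returns a table with one row per augmented signal.
data = augment(augmenter,audioIn,fs);
firstAudio = data.Audio{1};             % augmented samples
firstInfo = data.AugmentationInfo(1);   % parameters that were drawn for this row
```

Because the parameters are drawn at random, each call to augment generally produces different output for the same input.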
Use audioFeatureExtractor to extract combinations of different features while sharing intermediate computations.
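The sketch below shows one way to request several features from a single extractor object, so that shared steps such as windowing and the spectrum computation need not be repeated for each feature. The noise input, window length, and choice of features are assumptions made for illustration.

```matlab
fs = 16000;                         % assumed sample rate, in Hz
audioIn = randn(fs,1);              % one second of noise as stand-in audio

% Enable several features on one extractor object.
afe = audioFeatureExtractor( ...
    "SampleRate",fs, ...
    "Window",hamming(512,"periodic"), ...
    "OverlapLength",256, ...
    "mfcc",true, ...
    "spectralCentroid",true, ...
    "spectralRolloffPoint",true);

% Rows are analysis windows; columns are the concatenated features.
features = extract(afe,audioIn);

% info maps each enabled feature to its column indices in the output.
idx = info(afe);
mfccs = features(:,idx.mfcc);
centroid = features(:,idx.spectralCentroid);
```

Indexing through the structure returned by info avoids hard-coding column positions, which change when features are enabled or disabled.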
Audio Toolbox also provides access to third-party APIs for text-to-speech and speech-to-text, and it includes pretrained VGGish and YAMNet models so that you can perform transfer learning, classify sounds, and extract feature embeddings. Using pretrained networks requires Deep Learning Toolbox™.
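As a hedged example of using a pretrained model, the sketch below classifies the sounds in a recording with classifySound, which relies on YAMNet. It assumes that Deep Learning Toolbox is installed, that the pretrained YAMNet model has already been downloaded and is on the MATLAB path, and that the sample file named here is available with your installation.

```matlab
% Assumption: this sample recording ships with Audio Toolbox.
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

% Returns the sound classes detected in the signal as a string array.
sounds = classifySound(audioIn,fs);
disp(sounds)
```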