Pretrained Models

Transfer learning, sound classification, feature embeddings, pretrained audio deep learning networks

Audio Toolbox™ provides MATLAB® and Simulink® support for pretrained audio deep learning networks. Locate and classify sounds with YAMNet and estimate pitch with CREPE. Extract VGGish or OpenL3 feature embeddings to input to machine learning and deep learning systems. Use i-vector systems to produce compact representations of audio signals for applications such as speaker recognition, verification, identification, and diarization. Use detectspeechnn to perform voice activity detection (VAD).

Using pretrained deep learning networks requires Deep Learning Toolbox™. The Audio Toolbox pretrained networks are available in Deep Network Designer (Deep Learning Toolbox).
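
As a rough illustration of the function-level workflow, the following minimal sketch calls three of the pretrained-network functions listed on this page on a synthetic tone. The signal, sample rate, and variable names are assumptions chosen for illustration only; real recordings would normally be loaded with audioread. Depending on the release, a function may ask you to download the model files before first use.

```matlab
% Illustrative sketch: requires Audio Toolbox and Deep Learning Toolbox.
% The test signal is synthetic and stands in for real audio.
fs = 16e3;                          % sample rate in Hz (assumed)
t = (0:2*fs-1)'/fs;                 % two seconds of time samples, column vector
audioIn = 0.5*sin(2*pi*220*t);      % 220 Hz tone

% Classify sounds in the signal with YAMNet.
sounds = classifySound(audioIn,fs);

% Estimate pitch with CREPE; f0 holds one estimate in Hz per analysis frame.
f0 = pitchnn(audioIn,fs);

% Extract VGGish feature embeddings; each row is a 128-element embedding.
embeddings = vggishEmbeddings(audioIn,fs);

disp(sounds)
disp(size(embeddings))
```

The same pattern applies to the other functions in the list, for example openl3Embeddings for OpenL3 embeddings or detectspeechnn for voice activity detection.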

Functions

Pretrained Networks

`audioPretrainedNetwork`	Pretrained audio neural networks (Since R2024a)

VGGish

`vggishEmbeddings`	Extract VGGish feature embeddings (Since R2022a)
`vggishPreprocess`	Preprocess audio for VGGish feature extraction (Since R2021a)

YAMNet

`classifySound`	Classify sounds in audio signal
`yamnetGraph`	Graph of YAMNet AudioSet ontology
`yamnetPreprocess`	Preprocess audio for YAMNet classification (Since R2021a)

OpenL3

`openl3Embeddings`	Extract OpenL3 feature embeddings (Since R2022a)
`openl3Preprocess`	Preprocess audio for OpenL3 feature extraction (Since R2021a)

CREPE

`pitchnn`	Estimate pitch with deep learning neural network (Since R2021a)
`crepePreprocess`	Preprocess audio for CREPE deep learning network (Since R2021a)
`crepePostprocess`	Postprocess output of CREPE deep learning network (Since R2021a)

i-Vectors

`speakerRecognition`	Pretrained speaker recognition system (Since R2021b)
`ivectorSystem`	Create i-vector system (Since R2021a)

VAD

`detectspeechnn`	Detect boundaries of speech in audio signal using AI (Since R2023a)
`vadnetPreprocess`	Preprocess audio for voice activity detection (VAD) network (Since R2023a)
`vadnetPostprocess`	Postprocess frame-based VAD probabilities (Since R2023a)

Blocks

VGGish

VGGish Embeddings	Extract VGGish embeddings (Since R2022a)
VGGish Preprocess	Preprocess audio for VGGish feature extraction (Since R2022a)
VGGish	VGGish embeddings extraction network (Since R2022a)

YAMNet

Sound Classifier	Classify sounds in audio signal (Since R2021b)
YAMNet	YAMNet sound classification network (Since R2021b)
YAMNet Preprocess	Preprocess audio for YAMNet classification (Since R2021b)

OpenL3

OpenL3 Embeddings	Extract OpenL3 embeddings (Since R2022b)
OpenL3 Preprocess	Preprocess audio for OpenL3 embeddings extraction (Since R2022b)
OpenL3	OpenL3 embeddings extraction network (Since R2022b)

CREPE

Deep Pitch Estimator	Estimate pitch with CREPE deep learning neural network (Since R2023a)
CREPE	CREPE deep pitch estimation neural network (Since R2023a)
CREPE Preprocess	Preprocess audio for CREPE deep pitch estimation (Since R2023a)
CREPE Postprocess	Postprocess output of CREPE pitch estimation network (Since R2023a)

Apps

Deep Network Designer	Design and visualize deep learning networks

Topics

Audio Transfer Learning Using Experiment Manager
Configure an experiment that compares the performance of multiple pretrained networks applied to a speech command recognition task using transfer learning.
Speaker Diarization Using Pretrained AI Models
Use the speakerEmbeddings function to extract compact speaker representations and perform speaker diarization. (Since R2024b)
Classify Human Voice Using YAMNet on Android Device (Simulink)
This example shows how to use the Simulink® Support Package for Android® Devices and a pretrained YAMNet network to classify human voices.

Related Information

Featured Examples

Adapt Pretrained Audio Network for New Data Using Deep Network Designer

Interactively fine-tune a pretrained network to classify new audio signals using Deep Network Designer.

Investigate Audio Classifications Using Deep Learning Interpretability Techniques

Use interpretability techniques to investigate the predictions of a deep neural network trained to classify audio data.
