Audio Processing

Extend deep learning workflows with audio and speech processing applications

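Apply deep learning to audio and speech processing applications by using Audio Toolbox™ with Deep Learning Toolbox™. For signal processing applications, see Signal Processing. For applications in wireless communications, see Wireless Communications.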

App

Label signal attributes, regions, and points of interest, and extract features

Functions

Data Management and Augmentation

`audioDatastore`	Datastore for collection of audio files
`audioDataAugmenter`	Augment audio data (since R2019b)
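One common kind of audio augmentation is mixing in noise at a chosen signal-to-noise ratio. The Python sketch below illustrates only that idea; it is not the `audioDataAugmenter` API, and the function name `add_noise_at_snr` and its use of white Gaussian noise are assumptions made for illustration.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=None):
    """Return a copy of `signal` with white Gaussian noise added at about `snr_db` dB SNR.

    Hypothetical helper for illustration; not a MATLAB function.
    """
    if not signal:
        raise ValueError("signal must be non-empty")
    rng = random.Random(seed)
    # Average signal power: mean of the squared samples.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10*log10(P_signal / P_noise), so P_noise = P_signal / 10**(SNR_dB / 10).
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, noise_std) for x in signal]
```

Drawing `snr_db` at random for each training file (for example, uniformly between 0 and 20 dB) produces a range of noise conditions from a single clean recording.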

Feature Extraction

`audioFeatureExtractor`	Streamline audio feature extraction (since R2019b)
`openl3Embeddings`	Extract OpenL3 feature embeddings (since R2022a)
`pitchnn`	Estimate pitch with deep learning neural network (since R2021a)
`vggishEmbeddings`	Extract VGGish feature embeddings (since R2022a)

Pretrained Networks

`yamnet`	(Not recommended) YAMNet neural network (since R2020b)
`classifySound`	Classify sounds in audio signal (since R2020b)
`crepe`	(Not recommended) CREPE neural network (since R2021a)
`pitchnn`	Estimate pitch with deep learning neural network (since R2021a)
`vggish`	(Not recommended) VGGish neural network (since R2020b)
`vggishEmbeddings`	Extract VGGish feature embeddings (since R2022a)
`openl3`	(Not recommended) OpenL3 neural network (since R2021a)
`openl3Embeddings`	Extract OpenL3 feature embeddings (since R2022a)
`vadnet`	(Not recommended) Voice activity detection (VAD) neural network (since R2023a)
`detectspeechnn`	Detect boundaries of speech in audio signal using AI (since R2023a)
`separateSpeakers`	Separate signal by speakers (since R2023b)

Blocks

VGGish

VGGish	VGGish embeddings extraction network (since R2022a)
VGGish Embeddings	Extract VGGish embeddings (since R2022a)

YAMNet

YAMNet	YAMNet sound classification network (since R2021b)
Sound Classifier	Classify sounds in audio signal (since R2021b)

OpenL3

OpenL3	OpenL3 embeddings extraction network (since R2022b)
OpenL3 Embeddings	Extract OpenL3 embeddings (since R2022b)

CREPE

CREPE	CREPE deep pitch estimation neural network (since R2023a)
Deep Pitch Estimator	Estimate pitch with CREPE deep learning neural network (since R2023a)

Topics

Deep Learning for Audio Applications (Audio Toolbox)
Learn common tools and workflows to apply deep learning to audio applications.
Classify Sound Using Deep Learning (Audio Toolbox)
Train, validate, and test a simple long short-term memory (LSTM) network to classify sounds.
Adapt Pretrained Audio Network for New Data Using Deep Network Designer
This example shows how to interactively adapt a pretrained network to classify new audio signals using Deep Network Designer.
Audio Transfer Learning Using Experiment Manager
Configure an experiment that compares the performance of multiple pretrained networks applied to a speech command recognition task using transfer learning.
Compare Speaker Separation Models
Compare the performance, size, and speed of multiple deep learning speaker separation models.
Speaker Identification Using Custom SincNet Layer and Deep Learning
Perform speaker identification using a custom deep learning layer that implements a mel-scale filter bank.
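A mel-scale filter bank places its band edges at equal intervals on the mel scale, so bands are narrow at low frequencies and wide at high frequencies. The Python sketch below shows that spacing using the common formula m = 2595·log10(1 + f/700); this formula is an assumption for illustration, and MATLAB's own filter-bank functions may default to a different mel variant. The function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    # Common (O'Shaughnessy / HTK-style) mel formula.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    # Exact inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_low, f_high, num_bands):
    """Return num_bands + 2 frequencies (Hz) equally spaced on the mel scale.

    For triangular filters, band k spans edges[k]..edges[k + 2] and peaks at edges[k + 1].
    """
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (num_bands + 1)
    return [mel_to_hz(m_low + i * step) for i in range(num_bands + 2)]
```

For example, `mel_band_edges(0.0, 8000.0, 10)` returns 12 edges whose spacing in hertz grows steadily toward 8 kHz.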
Dereverberate Speech Using Deep Learning Networks
Train a deep learning model that removes reverberation from speech.
Speech Command Recognition in Simulink
Detect the presence of speech commands in audio using a Simulink® model.
Sequential Feature Selection for Audio Features
This example shows a typical workflow for feature selection applied to the task of spoken digit recognition.
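Sequential (forward) feature selection grows a feature set greedily: at each step it adds the single candidate that most improves a score, and it stops when no candidate helps. The Python sketch below shows that loop in isolation; `score` is a hypothetical callable (for example, the validation accuracy of a classifier trained on the given features), and nothing here reproduces the example's actual code.

```python
def sequential_forward_selection(features, score):
    """Greedy forward feature selection.

    features: iterable of candidate feature names.
    score: callable taking a tuple of feature names and returning a number
           (higher is better). Hypothetical stand-in for model evaluation.
    Returns (selected_features, best_score).
    """
    remaining = list(features)
    selected = []
    best = float("-inf")
    while remaining:
        # Evaluate the current set extended by each remaining candidate.
        candidate_scores = [(score(tuple(selected + [f])), f) for f in remaining]
        top_score, top_feature = max(candidate_scores, key=lambda t: t[0])
        if top_score <= best:
            break  # no remaining feature improves the score
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best
```

Because every step retrains or re-evaluates a model once per remaining candidate, the cost grows roughly quadratically with the number of features, which is why a cheap scoring model is usually chosen for the search.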
Train Spoken Digit Recognition Network Using Out-of-Memory Audio Data
This example trains a spoken digit recognition network on out-of-memory audio data using a transformed datastore.
Train Spoken Digit Recognition Network Using Out-of-Memory Features
This example trains a spoken digit recognition network on out-of-memory auditory spectrograms using a transformed datastore.
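The point of a transformed datastore is that files are read and transformed one at a time, on demand, so the full data set never has to fit in memory. Python generators express the same idea; the sketch below is an analogy, not the MATLAB datastore API, and `load` and `mean_power` are hypothetical stand-ins for an audio reader and a feature extractor.

```python
from typing import Callable, Iterable, Iterator, List


def read_lazily(paths: Iterable[str], load: Callable[[str], List[float]]) -> Iterator[List[float]]:
    """Yield one decoded signal at a time; a file is read only when requested."""
    for path in paths:
        yield load(path)


def transform_lazily(stream: Iterable, fn: Callable) -> Iterator:
    """Apply `fn` to each item as it is pulled, similar in spirit to a transformed datastore."""
    for item in stream:
        yield fn(item)


def mean_power(signal: List[float]) -> float:
    # Placeholder "feature": average power of the whole signal.
    return sum(x * x for x in signal) / len(signal)
```

Chaining `transform_lazily(read_lazily(paths, load), mean_power)` builds the pipeline without reading any file; each file is loaded only when the training loop asks for its next item.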
Investigate Audio Classifications Using Deep Learning Interpretability Techniques
This example shows how to use interpretability techniques to investigate the predictions of a deep neural network trained to classify audio data.
Accelerate Audio Deep Learning Using GPU-Based Feature Extraction
Leverage GPUs for feature extraction to decrease the time required to train an audio deep learning model.

Featured Examples

Compress Machine Fault Recognition Neural Network Using Projection

Compress a pretrained acoustics-based machine fault recognition neural network using projection and principal component analysis.
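One way to see why projection shrinks a network: if a fully connected layer's input is projected onto its top k principal components, its n_out × n_in weight matrix can be stored as two smaller factors of sizes n_out × k and k × n_in. The Python sketch below only does that parameter arithmetic (biases ignored); it assumes this factorized form and does not reproduce MATLAB's projection implementation. The function names are hypothetical.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights of a fully connected layer (bias ignored): W is n_out x n_in."""
    return n_in * n_out


def projected_params(n_in: int, n_out: int, rank: int) -> int:
    """Weights after projecting the layer input onto a rank-dimensional subspace.

    W is replaced by (W @ Q), of size n_out x rank, and Q^T, of size rank x n_in,
    where the columns of Q are the top `rank` principal components of the input activations.
    """
    return rank * (n_in + n_out)


def break_even_rank(n_in: int, n_out: int) -> float:
    """Projection saves parameters only when rank is below this value."""
    return n_in * n_out / (n_in + n_out)
```

For a 256-input, 512-output layer, a rank of 32 reduces the weight count from 131,072 to 24,576, under a fifth of the original, while any rank above about 170 would make the layer larger.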

Audio-Based Anomaly Detection for Machine Health Monitoring

Design an autoencoder neural network to perform anomaly detection for machine sounds using unsupervised learning.
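An autoencoder trained only on normal machine sounds reconstructs normal inputs well and anomalous inputs poorly, so anomaly detection reduces to thresholding the reconstruction error. The Python sketch below shows that decision step; the mean + k·std threshold rule and the function names are assumptions for illustration, and the example itself may choose its threshold differently.

```python
import statistics


def reconstruction_error(x, x_hat):
    # Mean squared error between input features and the autoencoder's reconstruction.
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(normal_errors, k=3.0):
    """Threshold = mean + k * standard deviation of errors measured on normal data."""
    mu = statistics.fmean(normal_errors)
    sigma = statistics.pstdev(normal_errors)
    return mu + k * sigma


def is_anomaly(error, threshold):
    # Inputs the model cannot reconstruct well are flagged as anomalous.
    return error > threshold
```

Raising `k` trades missed faults for fewer false alarms; in practice the threshold is usually tuned on a validation set that contains some labeled anomalies.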

3-D Speech Enhancement Using Trained Filter and Sum Network

Perform speech enhancement using a pretrained filter and sum network (FaSNet) with ambisonic data.

3-D Sound Event Localization and Detection Using Trained Recurrent Convolutional Neural Network

Perform 3-D sound event localization and detection using a pretrained deep learning model.

Speaker Recognition Using x-vectors

Develop an x-vector system to perform speaker recognition.

Speaker Diarization Using x-vectors

Speaker diarization is the process of partitioning an audio signal into segments according to speaker identity. It answers the question "who spoke when" without prior knowledge of the speakers and, depending on the application, without prior knowledge of the number of speakers.
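The final step of answering "who spoke when" is turning per-frame speaker labels into time segments. The Python sketch below assumes an earlier stage (for example, clustering x-vectors of short, fixed-hop windows) has already assigned one label per frame, with `None` marking non-speech; the function name and input format are hypothetical.

```python
from itertools import groupby


def labels_to_segments(frame_labels, hop_seconds):
    """Collapse per-frame speaker labels into (start_s, end_s, speaker) segments.

    Frames labeled None (non-speech) are dropped from the output.
    """
    segments = []
    index = 0
    for speaker, run in groupby(frame_labels):
        n = len(list(run))  # number of consecutive frames with this label
        if speaker is not None:
            segments.append((index * hop_seconds, (index + n) * hop_seconds, speaker))
        index += n
    return segments
```

For frame labels `["A", "A", None, "B", "B", "B", "A"]` with a 0.5 s hop, the result is `[(0.0, 1.0, "A"), (1.5, 3.0, "B"), (3.0, 3.5, "A")]`.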

Train Speech Command Recognition Model Using Deep Learning

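This example shows how to train a deep learning model that detects the presence of speech commands in audio. The example uses the Speech Commands Dataset [1] to train a convolutional neural network to recognize a set of commands.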

Keyword Spotting in Noise Using MFCC and LSTM Networks

Identify a keyword in noisy speech using a deep learning network. In particular, the example uses a Bidirectional Long Short-Term Memory (BiLSTM) network and mel frequency cepstral coefficients (MFCC).

Denoise Speech Using Deep Learning Networks

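This example shows how to denoise speech signals using deep learning networks. The example compares two types of networks applied to the same task: fully connected and convolutional.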

Train Generative Adversarial Network (GAN) for Sound Synthesis

Train and use a generative adversarial network (GAN) to generate sounds.

Voice Activity Detection in Noise Using Deep Learning

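In this example, you perform batch and streaming voice activity detection (VAD) in a low-SNR environment using a pretrained deep learning model. For details about the model and how it was trained, see Train Voice Activity Detection in Noise Model Using Deep Learning (Audio Toolbox).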

Speech Emotion Recognition

This example illustrates a simple speech emotion recognition (SER) system using a BiLSTM network. You begin by downloading the data set and then test the trained network on individual files. The network was trained on a small German-language database [1].

Acoustic Scene Recognition Using Late Fusion

Create a multi-model late fusion system for acoustic scene recognition. The example trains a convolutional neural network (CNN) using mel spectrograms and an ensemble classifier using wavelet scattering. The example uses the TUT dataset for training and evaluation [1].
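In late fusion, each model classifies the input independently and only their output scores are combined. The Python sketch below uses weighted averaging of per-class probabilities, which is one common fusion rule; it is an assumption for illustration and not necessarily the rule the example uses. The function name is hypothetical.

```python
def late_fusion(prob_lists, weights=None):
    """Fuse per-class probability vectors from several models by weighted averaging.

    prob_lists: one probability vector per model, all with the same class order.
    weights: optional per-model weights; defaults to equal weights.
    Returns (fused_probabilities, index_of_predicted_class).
    """
    n_models = len(prob_lists)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    if len(weights) != n_models:
        raise ValueError("need one weight per model")
    n_classes = len(prob_lists[0])
    fused = [sum(w * probs[c] for w, probs in zip(weights, prob_lists)) for c in range(n_classes)]
    predicted = max(range(n_classes), key=lambda c: fused[c])
    return fused, predicted
```

With a CNN output of `[0.6, 0.3, 0.1]` and an ensemble output of `[0.2, 0.7, 0.1]`, equal weights give `[0.4, 0.5, 0.1]` and predict class 1, even though the CNN alone would have chosen class 0.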

End-to-End Deep Speaker Separation

Use an end-to-end deep learning network for speaker-independent speech separation.

Acoustics-Based Machine Fault Recognition

Develop a deep learning model to detect faults in an air compressor and package the system to operate on streaming data.
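Operating on streaming data usually means converting arbitrarily sized incoming chunks of samples into the fixed-length, possibly overlapping analysis frames the model was trained on. The Python sketch below is a hypothetical illustration of that buffering step; `FrameBuffer`, `frame_len`, and `hop_len` are placeholder names and values, not part of the example.

```python
from collections import deque
from itertools import islice


class FrameBuffer:
    """Turn arbitrarily sized chunks of samples into fixed-length, overlapping frames."""

    def __init__(self, frame_len, hop_len):
        if not 0 < hop_len <= frame_len:
            raise ValueError("need 0 < hop_len <= frame_len")
        self.frame_len = frame_len
        self.hop_len = hop_len
        self._buf = deque()

    def push(self, chunk):
        """Append a chunk and return every complete frame that is now available."""
        self._buf.extend(chunk)
        frames = []
        while len(self._buf) >= self.frame_len:
            frames.append(list(islice(self._buf, self.frame_len)))
            # Advance by one hop; the remaining samples overlap with the next frame.
            for _ in range(self.hop_len):
                self._buf.popleft()
        return frames
```

Each returned frame can then go through the same feature extraction and network used during training, so streaming and batch processing see identically shaped inputs.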

Audio Event Classification Using TensorFlow Lite on Raspberry Pi

Perform audio event classification on Raspberry Pi® using the YAMNet pretrained deep neural network from the TensorFlow™ Lite library.

Keyword Spotting in Noise Code Generation on Raspberry Pi

This example demonstrates code generation for keyword spotting on Raspberry Pi™ using a Bidirectional Long Short-Term Memory (BiLSTM) network and mel frequency cepstral coefficient (MFCC) feature extraction. MATLAB® Coder™ with Deep Learning Support enables the generation of a standalone executable (.elf) file on Raspberry Pi. The MATLAB (.mlx) file and the generated executable communicate over asynchronous User Datagram Protocol (UDP). The incoming speech signal is displayed using a timescope, and a blue rectangle (mask) surrounds each spotted instance of the keyword YES. For more details on MFCC feature extraction and deep learning network training, see Keyword Spotting in Noise Using MFCC and LSTM Networks (Audio Toolbox).

Speech Command Recognition Code Generation with Intel MKL-DNN

Deploy feature extraction and a convolutional neural network (CNN) for speech command recognition on Intel® processors. To generate the feature extraction and network code, you use MATLAB® Coder™ and the Intel® Math Kernel Library for Deep Neural Networks (MKL-DNN). In this example, the generated code is a MATLAB executable (MEX) function, which is called by a MATLAB script that displays the predicted speech command along with the time domain signal and auditory spectrogram. For details about audio preprocessing and network training, see Train Speech Command Recognition Model Using Deep Learning (Audio Toolbox).