Quantization, Projection, and Pruning

Compress a deep neural network by performing quantization, projection, or pruning

Use Deep Learning Toolbox™ together with the Deep Learning Toolbox Model Compression Library support package to reduce the memory footprint and computational requirements of a deep neural network by:

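Pruning filters from convolution layers using a first-order Taylor approximation. You can then generate C/C++ or CUDA® code from the pruned network.
Projecting layers by performing principal component analysis (PCA) on the layer activations, using a data set representative of the training data, and then applying linear projections to the layer learnable parameters. Forward passes of a projected deep neural network are typically faster when you deploy the network to embedded hardware using library-free C/C++ code generation.
Quantizing the weights, biases, and activations of layers to reduced-precision scaled integer data types. You can then generate C/C++, CUDA, or HDL code from the quantized network for GPU, FPGA, or CPU deployment.
Analyzing a network for compression using Deep Network Designer.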

For a detailed overview of the compression techniques available in Deep Learning Toolbox Model Compression Library, see Reduce Memory Footprint of Deep Neural Networks.

Functions

Pruning

`taylorPrunableNetwork`	Neural network suitable for compression using Taylor pruning (since R2022a)
`forward`	Compute deep learning network output for training
`predict`	Compute deep learning network output for inference
`updatePrunables`	Remove filters from prunable layers based on importance scores (since R2022a)
`updateScore`	Compute and accumulate Taylor-based importance scores for pruning (since R2022a)
`dlnetwork`	Deep learning neural network
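
The pruning loop these functions support (accumulate Taylor-based importance scores, then remove the least important filters) can be illustrated with a small, framework-free Python sketch. It assumes one common form of the first-order Taylor criterion, in which a filter's importance is the absolute mean of activation times gradient over its feature map. The helper names `taylor_scores`, `accumulate_scores`, and `filters_to_prune` are hypothetical and are not part of the toolbox, whose exact scoring may differ.

```python
def taylor_scores(activations, gradients):
    """First-order Taylor importance of each filter for one observation.

    activations[f] and gradients[f] are flat lists: the feature map produced
    by filter f and the gradient of the loss with respect to that feature map.
    """
    scores = []
    for act, grad in zip(activations, gradients):
        # Estimated change in loss if this feature map were removed:
        # |mean_i(a_i * dL/da_i)|
        total = sum(a * g for a, g in zip(act, grad))
        scores.append(abs(total) / len(act))
    return scores


def accumulate_scores(running, scores):
    """Add one mini-batch of scores to the running total (pass None to start)."""
    if running is None:
        return list(scores)
    return [r + s for r, s in zip(running, scores)]


def filters_to_prune(running, num_to_prune):
    """Indices of the num_to_prune filters with the lowest accumulated score."""
    order = sorted(range(len(running)), key=lambda f: running[f])
    return sorted(order[:num_to_prune])


if __name__ == "__main__":
    # Three filters, each with a two-element feature map (toy values)
    acts = [[1.0, 2.0], [0.1, 0.1], [3.0, -1.0]]
    grads = [[0.5, 0.5], [0.2, 0.2], [0.1, 0.3]]
    running = None
    for _ in range(2):  # two identical mini-batches, for illustration only
        running = accumulate_scores(running, taylor_scores(acts, grads))
    print(filters_to_prune(running, 1))  # prints [2]
```

In a real workflow the scores are accumulated over many mini-batches before filters are removed, and the network is usually fine-tuned between pruning iterations.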

Projection

`compressNetworkUsingProjection`	Compress neural network using projection (since R2022b)
`neuronPCA`	Principal component analysis of neuron activations (since R2022b)
`unpackProjectedLayers`	Unpack projected layers of neural network (since R2023b)
`ProjectedLayer`	Compressed neural network layer using projection (since R2023b)
`gruProjectedLayer`	Gated recurrent unit (GRU) projected layer for recurrent neural network (RNN) (since R2023b)
`lstmProjectedLayer`	Long short-term memory (LSTM) projected layer for recurrent neural network (RNN) (since R2022b)
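
To illustrate the idea behind projection, the following Python sketch (not the `compressNetworkUsingProjection` implementation) finds the top principal directions Q of a layer's input activations by power iteration and factors a fully connected weight matrix W (m×n) into A = W·Q (m×k) and B = Qᵀ (k×n), so that W·x ≈ A·(B·x). For brevity it assumes uncentred activations, omits the bias, and uses hypothetical function names. Storing A and B costs k(m + n) parameters instead of mn.

```python
import math


def second_moment(samples):
    """n x n matrix E[x x^T] of activation vectors (uncentred, for brevity)."""
    n = len(samples[0])
    c = [[0.0] * n for _ in range(n)]
    for x in samples:
        for i in range(n):
            for j in range(n):
                c[i][j] += x[i] * x[j] / len(samples)
    return c


def top_components(c, k, iters=200):
    """Top-k eigenvectors of a symmetric PSD matrix via power iteration and deflation."""
    n = len(c)
    c = [row[:] for row in c]  # work on a copy
    comps = []
    for _ in range(k):
        # Fixed start vector; assumed not orthogonal to the wanted eigenvector
        v = [1.0 / math.sqrt(n)] * n
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # remaining matrix is zero; no further variance to capture
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(c[i][j] * v[j] for j in range(n)) for i in range(n))
        comps.append(v)
        # Deflate so the next pass converges to the next component
        for i in range(n):
            for j in range(n):
                c[i][j] -= lam * v[i] * v[j]
    return comps


def project_dense(W, comps):
    """Factor W (m x n) into A = W Q (m x k) and B = Q^T (k x n)."""
    A = [[sum(row[j] * q[j] for j in range(len(q))) for q in comps] for row in W]
    B = [q[:] for q in comps]
    return A, B


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


if __name__ == "__main__":
    # Toy activations that all lie along the direction [1, 2, 2]
    samples = [[t, 2 * t, 2 * t] for t in (1.0, -2.0, 3.0, 0.5)]
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
    A, B = project_dense(W, top_components(second_moment(samples), k=1))
    x = [2.0, 4.0, 4.0]
    print(matvec(W, x), matvec(A, matvec(B, x)))  # original vs projected output
```

Because the toy activations span a single direction, one component reproduces the layer output almost exactly; with real activations, the number of components kept trades accuracy against size.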

Quantization

`dlquantizer`	Quantize a deep neural network to 8-bit scaled integer data types
`dlquantizationOptions`	Options for quantizing a trained deep neural network
`prepareNetwork`	Prepare deep neural network for quantization (since R2024b)
`calibrate`	Simulate and collect ranges of a deep neural network
`quantize`	Quantize deep neural network (since R2022a)
`validate`	Quantize and validate a deep neural network
`quantizationDetails`	Display quantization details for a neural network (since R2022a)
`estimateNetworkMetrics`	Estimate network metrics for specific layers of a neural network (since R2022a)
`equalizeLayers`	Equalize layer parameters of deep neural network (since R2022b)
`exportNetworkToSimulink`	Generate Simulink model that contains deep learning layer blocks and subsystems that correspond to deep learning layer objects (since R2024b)
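
The calibrate-then-quantize workflow can be illustrated for a single tensor with the following Python sketch. It assumes symmetric power-of-two scaling, with the fraction length chosen from the range observed during calibration so that the largest magnitude fits in a signed 8-bit integer; the actual scaling rules used by `dlquantizer` and `quantize` may differ, and the helper names are hypothetical.

```python
import math


def calibrate_range(batches):
    """Track the minimum and maximum value seen across calibration batches."""
    lo = min(min(b) for b in batches)
    hi = max(max(b) for b in batches)
    return lo, hi


def fraction_length(lo, hi, word_length=8):
    """Largest fraction length fl such that max|x| * 2**fl fits the signed integer."""
    max_abs = max(abs(lo), abs(hi))
    if max_abs == 0:
        return 0  # all zeros: any scale works
    int_max = 2 ** (word_length - 1) - 1  # 127 for int8
    # Require max_abs * 2**fl <= int_max
    return math.floor(math.log2(int_max / max_abs))


def quantize(values, fl, word_length=8):
    """Scale by 2**fl, round to nearest (Python rounds ties to even), saturate."""
    q_min, q_max = -(2 ** (word_length - 1)), 2 ** (word_length - 1) - 1
    return [max(q_min, min(q_max, round(v * 2 ** fl))) for v in values]


def dequantize(q, fl):
    """Map stored integers back to real values: real = stored * 2**-fl."""
    return [x * 2.0 ** -fl for x in q]


if __name__ == "__main__":
    weights = [0.5, -1.2, 0.03, 0.9]
    lo, hi = calibrate_range([weights[:2], weights[2:]])
    fl = fraction_length(lo, hi)  # 6, i.e. a scale of 2**-6
    q = quantize(weights, fl)     # [32, -77, 2, 58]
    print(fl, q, dequantize(q, fl))
```

With power-of-two scaling, the rounding error of each in-range value is at most half a step, here 2**-7; values outside the calibrated range saturate at -128 or 127.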

App

Deep Network Quantizer

Quantize deep neural network to 8-bit scaled integer data types

Topics

Overview

Reduce Memory Footprint of Deep Neural Networks
Learn about neural network compression techniques, including pruning, projection, and quantization.

Pruning

Analyze and Compress 1-D Convolutional Neural Network
Analyze a 1-D convolutional network for compression and compress it using Taylor pruning and projection. (since R2024b)
Parameter Pruning and Quantization of Image Classification Network
Use parameter pruning and quantization to reduce network size.
Prune Image Classification Network Using Taylor Scores
This example shows how to reduce the size of a deep neural network using Taylor pruning.
Prune Filters in a Detection Network Using Taylor Scores
This example shows how to reduce network size and increase inference speed by pruning convolutional filters in a you only look once (YOLO) v3 object detection network.
Prune and Quantize Convolutional Neural Network for Speech Recognition
Compress a convolutional neural network (CNN) to prepare it for deployment on an embedded system.

Projection and Knowledge Distillation

Compress Neural Network Using Projection
This example shows how to compress a neural network using projection and principal component analysis.
Evaluate Code Generation Inference Time of Compressed Deep Neural Network
This example shows how to compare the inference time of a compressed deep neural network for battery state of charge estimation. (since R2023b)
Train Smaller Neural Network Using Knowledge Distillation
This example shows how to reduce the memory footprint of a deep learning network by using knowledge distillation. (since R2023b)

Quantization

Quantization of Deep Neural Networks
Overview of the deep learning quantization tools and workflows.
Data Types and Scaling for Quantization of Deep Neural Networks
Understand the effects of quantization and how to visualize the dynamic ranges of network convolution layers.
Quantization Workflow Prerequisites
Products required for the quantization of deep learning networks.
Supported Layers for Quantization
Deep neural network layers that are supported for quantization.
Prepare Data for Quantizing Networks
Supported datastores for quantization workflows.
Quantize Multiple-Input Network Using Image and Feature Data
Quantize a network that has multiple inputs, using image and feature data.
Export Quantized Networks to Simulink and Generate Code
Export a quantized neural network to Simulink and generate code from the exported model.

Quantization for GPU Targets

Generate INT8 Code for Deep Learning Networks (GPU Coder)
Quantize and generate code for a pretrained convolutional neural network.
Quantize Residual Network Trained for Image Classification and Generate CUDA Code
This example shows how to quantize the learnable parameters in the convolution layers of a deep learning neural network that has residual connections and has been trained for image classification with CIFAR-10 data.
Quantize Layers in Object Detectors and Generate CUDA Code
This example shows how to generate CUDA® code for an SSD vehicle detector and a YOLO v2 vehicle detector that performs inference computations in 8-bit integers for the convolutional layers.
Quantize Semantic Segmentation Network and Generate CUDA Code
Quantize a convolutional neural network trained for semantic segmentation and generate CUDA code.

Quantization for FPGA Targets

Quantize Network for FPGA Deployment (Deep Learning HDL Toolbox)
Reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types.
Classify Images on FPGA Using Quantized Neural Network (Deep Learning HDL Toolbox)
This example shows how to use Deep Learning HDL Toolbox™ to deploy a quantized deep convolutional neural network (CNN) to an FPGA.
Classify Images on FPGA by Using Quantized GoogLeNet Network (Deep Learning HDL Toolbox)
This example shows how to use the Deep Learning HDL Toolbox™ to deploy a quantized GoogLeNet network to classify an image.

Quantization for CPU Targets

Generate int8 Code for Deep Learning Networks (MATLAB Coder)
Quantize and generate code for a pretrained convolutional neural network.
Generate INT8 Code for Deep Learning Network on Raspberry Pi (MATLAB Coder)
Generate code for a deep learning network that performs inference computations in 8-bit integers.
Compress Image Classification Network for Deployment to Resource-Constrained Embedded Devices
This example shows how to reduce the memory footprint and computation requirements of an image classification network for deployment on resource-constrained embedded devices such as the Raspberry Pi™.

Featured Examples

Prune Image Classification Network Using Taylor Scores

Reduce the size of a deep neural network using Taylor pruning. By using the taylorPrunableNetwork function to remove convolution layer filters, you can reduce the overall network size and increase the inference speed.

Prune Filters in a Detection Network Using Taylor Scores

Reduce network size and increase inference speed by pruning convolutional filters in a you only look once (YOLO) v3 object detection network.

Compress Neural Network Using Projection

Compress a neural network using projection and principal component analysis.

Quantize Residual Network Trained for Image Classification and Generate CUDA Code

Quantize the learnable parameters in the convolution layers of a deep learning neural network that has residual connections and has been trained for image classification with CIFAR-10 data.

Prune and Quantize Semantic Segmentation Network

Reduce the memory footprint of a semantic segmentation network and speed up inference by compressing the network using pruning and quantization.
