Quantization, Projection, and Pruning

Compress a deep neural network by performing quantization, projection, or pruning

Use Deep Learning Toolbox™ together with the Deep Learning Toolbox Model Quantization Library support package to reduce the memory footprint and computational requirements of a deep neural network by:

  • Pruning filters from convolution layers by using first-order Taylor approximation. You can then generate C/C++ or CUDA® code from the pruned network. (See the pruning sketch after this list.)

  • Projecting layers by performing principal component analysis (PCA) on the layer activations, using a data set representative of the training data, and then applying linear projections to the layer learnable parameters. Forward passes of a projected deep neural network are typically faster when you deploy the network to embedded hardware using library-free C/C++ code generation. (See the projection sketch after this list.)

  • Quantizing the weights, biases, and activations of layers to reduced-precision scaled integer data types. You can then generate C/C++, CUDA, or HDL code from the quantized network for deployment to CPU, GPU, or FPGA targets. (See the quantization sketch after this list.)
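For example, here is a minimal sketch of the Taylor pruning loop. It assumes a trained dlnetwork net, a minibatchqueue mbq that yields mini-batches of representative data, and a target filter count targetNumPrunables; these names, and the modelLossPruning helper, are illustrative rather than part of the API.

    % Convert the trained network into a network suitable for pruning.
    prunableNet = taylorPrunableNetwork(net);

    while prunableNet.NumPrunables > targetNumPrunables
        % Accumulate Taylor-based importance scores over the data set.
        shuffle(mbq);
        while hasdata(mbq)
            [X,T] = next(mbq);
            [~,pruningActivations,pruningGradients] = ...
                dlfeval(@modelLossPruning,prunableNet,X,T);
            prunableNet = updateScore(prunableNet, ...
                pruningActivations,pruningGradients);
        end
        % Remove the lowest-scoring convolution filters.
        prunableNet = updatePrunables(prunableNet,MaxToPrune=8);
    end

    % Convert back to a dlnetwork for fine-tuning and code generation.
    prunedNet = dlnetwork(prunableNet);

    function [loss,pruningActivations,pruningGradients] = modelLossPruning(net,X,T)
        % The forward pass of a taylorPrunableNetwork also returns the
        % activations that the Taylor scores are computed from.
        [Y,~,pruningActivations] = forward(net,X);
        loss = crossentropy(Y,T);
        pruningGradients = dlgradient(loss,pruningActivations);
    end

In practice you also retrain between pruning iterations so the network recovers accuracy before more filters are removed.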
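Similarly, a sketch of the projection workflow under the same assumptions (a trained dlnetwork net and a representative minibatchqueue mbq; the variable names and the 0.95 goal are illustrative):

    % Analyze the statistics of neuron activations using PCA.
    npca = neuronPCA(net,mbq);

    % Replace supported layers with projected layers. The explained
    % variance goal trades accuracy against the amount of compression.
    projectedNet = compressNetworkUsingProjection(net,npca, ...
        ExplainedVarianceGoal=0.95);

    % Optionally replace each ProjectedLayer with its underlying network
    % layers, for example before library-free C/C++ code generation.
    unpackedNet = unpackProjectedLayers(projectedNet);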
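Finally, a sketch of the int8 quantization workflow, assuming a trained network net, a calibration datastore calDS, and a GPU target (again, the variable names are illustrative):

    % Create a quantizer for the network and the intended target.
    quantObj = dlquantizer(net,ExecutionEnvironment="GPU");

    % Exercise the network on representative data so the quantizer can
    % collect dynamic ranges of weights, biases, and activations.
    calResults = calibrate(quantObj,calDS);

    % Produce a quantized network that uses 8-bit scaled integer types.
    qNet = quantize(quantObj);

    % Inspect which layers were quantized and to which data types.
    qDetails = quantizationDetails(qNet);

To check accuracy after quantization, validate takes the same dlquantizer object together with validation data and a dlquantizationOptions object.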

For a detailed overview of the compression techniques available in Deep Learning Toolbox Model Quantization Library, see Reduce Memory Footprint of Deep Neural Networks.

Functions

Pruning

  taylorPrunableNetwork - Neural network suitable for compression using Taylor pruning (Since R2022a)
  forward - Compute deep learning network output for training
  predict - Compute deep learning network output for inference
  updatePrunables - Remove filters from prunable layers based on importance scores (Since R2022a)
  updateScore - Compute and accumulate Taylor-based importance scores for pruning (Since R2022a)
  dlnetwork - Deep learning neural network

Projection

  compressNetworkUsingProjection - Compress neural network using projection (Since R2022b)
  neuronPCA - Principal component analysis of neuron activations (Since R2022b)
  unpackProjectedLayers - Unpack projected layers of neural network (Since R2023b)
  ProjectedLayer - Compressed neural network layer using projection (Since R2023b)
  gruProjectedLayer - Gated recurrent unit (GRU) projected layer for recurrent neural network (RNN) (Since R2023b)
  lstmProjectedLayer - Long short-term memory (LSTM) projected layer for recurrent neural network (RNN) (Since R2022b)

Quantization

  dlquantizer - Quantize a deep neural network to 8-bit scaled integer data types (Since R2020a)
  dlquantizationOptions - Options for quantizing a trained deep neural network (Since R2020a)
  prepareNetwork - Prepare deep neural network for quantization (Since R2024b)
  calibrate - Simulate and collect ranges of a deep neural network (Since R2020a)
  quantize - Quantize deep neural network (Since R2022a)
  validate - Quantize and validate a deep neural network (Since R2020a)
  quantizationDetails - Display quantization details for a neural network (Since R2022a)
  estimateNetworkMetrics - Estimate network metrics for specific layers of a neural network (Since R2022a)
  equalizeLayers - Equalize layer parameters of deep neural network (Since R2022b)
  exportNetworkToSimulink - Generate Simulink model that contains deep learning layer blocks that correspond to deep learning layer objects (Since R2024b)

Apps

Deep Network Quantizer - Quantize deep neural network to 8-bit scaled integer data types (Since R2020a)

Topics

Overview

Pruning

Projection and Knowledge Distillation

Quantization

Quantization for GPU Target

Quantization for FPGA Target

Quantization for CPU Target

Featured Examples