Quantization

Quantize network parameters to reduced-precision data types; prepare deep learning network for fixed-point code generation

Quantize the weights, biases, and activations of layers to reduced-precision scaled integer data types. You can then generate C/C++ code for CPU deployment, CUDA® code for GPU deployment, or HDL code for FPGA deployment from the quantized network.
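The workflow described above can be sketched with the quantization functions listed on this page. This is a minimal, hedged example; it assumes that net is a trained network object and that calDS and valDS are hypothetical calibration and validation datastores you have already created.

```matlab
% Sketch of a typical INT8 quantization workflow.
% Assumptions: `net` is a trained network; `calDS` and `valDS`
% are calibration and validation datastores (names are illustrative).

% Create a quantization object targeting GPU deployment.
quantObj = dlquantizer(net, ExecutionEnvironment="GPU");

% Exercise the network on calibration data to collect the dynamic
% ranges of weights, biases, and activations.
calResults = calibrate(quantObj, calDS);

% Produce the quantized network.
qNet = quantize(quantObj);

% Compare the quantized network against the original on validation data.
valResults = validate(quantObj, valDS);
```

The calibration step is what determines the scaling of each integer data type, so the calibration datastore should be representative of the data the deployed network will see.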

For a detailed overview of the compression techniques available in Deep Learning Toolbox™ Model Compression Library, see Reduce Memory Footprint of Deep Neural Networks.

Figure: Simplified illustration of quantization. A three-layer neural network (two, three, and one neurons) is shown before and after quantization; in the quantized network, the connections are drawn with dotted lines to indicate that the weights are stored at reduced precision.

Functions

dlquantizer - Quantize a deep neural network to 8-bit scaled integer data types
dlquantizationOptions - Options for quantizing a trained deep neural network
prepareNetwork - Prepare deep neural network for quantization (Since R2024b)
calibrate - Simulate and collect ranges of a deep neural network
quantize - Quantize deep neural network (Since R2022a)
validate - Quantize and validate a deep neural network
quantizationDetails - Display quantization details for a neural network (Since R2022a)
estimateNetworkMetrics - Estimate network metrics for specific layers of a neural network (Since R2022a)
equalizeLayers - Equalize layer parameters of deep neural network (Since R2022b)
exportNetworkToSimulink - Generate Simulink model that contains deep learning layer blocks and subsystems that correspond to deep learning layer objects (Since R2024b)
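After quantizing, the inspection functions in the list above let you check what was done and estimate the resulting footprint. A minimal sketch, assuming qNet is a network returned by quantize and net is the original trained network:

```matlab
% Inspect which layers were quantized and with what data types.
% Assumption: `qNet` was produced by quantize(quantObj).
qDetails = quantizationDetails(qNet);

% Estimate per-layer metrics (such as learnable parameter memory)
% for the original network, to gauge potential savings.
% Assumption: `net` is the original trained network.
metrics = estimateNetworkMetrics(net);
```

estimateNetworkMetrics can be run before quantization to identify the layers that dominate memory use and are therefore the best candidates for compression.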

Apps

Deep Network Quantizer - Quantize deep neural network to 8-bit scaled integer data types

Topics

Understanding Quantization

Pre-Deployment Workflows

Deployment

Considerations

Featured Examples