Code Generation for Quantized Deep Learning Networks

Deep learning uses neural network architectures that contain many processing layers, including convolutional layers. Deep learning models typically work on large sets of labeled data. Performing inference on these models is computationally intensive and consumes a significant amount of memory. Neural networks use memory to store input data, parameters (weights), and activations from each layer as the input propagates through the network. Deep neural networks trained in MATLAB® use single-precision floating-point data types. Even networks that are small in size require a considerable amount of memory and hardware to perform floating-point arithmetic operations. These requirements can inhibit deployment of deep learning models to devices that have low computational power and limited memory resources. By using a lower precision to store the weights and activations, you can reduce the memory requirements of the network.

You can use Deep Learning Toolbox™ in tandem with the Deep Learning Toolbox Model Quantization Library support package to reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types. Then, you can use MATLAB Coder™ to generate optimized code for the quantized network. The generated code takes advantage of ARM® processor SIMD instructions by using the ARM Compute Library. You can integrate the generated code into your project as source code, static or dynamic libraries, or executables that you can deploy to a variety of ARM CPU platforms, such as Raspberry Pi™.

Supported Layers and Classes

You can generate C++ code that uses the ARM Compute Library and performs inference computations in 8-bit integers for the supported deep learning layers.

C++ code generation for quantized deep learning networks supports DAGNetwork (Deep Learning Toolbox) and SeriesNetwork (Deep Learning Toolbox) objects.

Generating Code

To generate code that performs inference computations in 8-bit integers, in your coder.ARMNEONConfig object dlcfg, set these additional properties:

dlcfg.CalibrationResultFile = 'dlquantizerObjectMatFile'; 
dlcfg.DataType = 'int8';

Alternatively, in the MATLAB Coder app, on the Deep Learning tab, set Target library to ARM Compute. Then set the Data type and Calibration result file path parameters.

Here, 'dlquantizerObjectMatFile' is the name of the MAT-file that dlquantizer (Deep Learning Toolbox) generates for a specific set of calibration data. For calibration, set the ExecutionEnvironment property of the dlquantizer object to 'CPU'.
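For example, the calibration step might look like the following sketch, where net is a trained network and calData is a datastore of representative input data; both names are placeholders for your own objects.

```matlab
% Create a quantization object for CPU deployment.
% 'net' is a trained SeriesNetwork or DAGNetwork (placeholder name).
quantObj = dlquantizer(net, 'ExecutionEnvironment', 'CPU');

% Exercise the network with representative data to collect the
% dynamic ranges of weights, biases, and activations.
% 'calData' is a placeholder for your calibration datastore.
calResults = calibrate(quantObj, calData);

% Save the calibrated object. Pass this MAT-file name to the
% CalibrationResultFile property of the coder.ARMNEONConfig object.
save('dlquantizerObjectMatFile.mat', 'quantObj');
```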

For the remaining code generation steps, follow the workflow described in Code Generation for Deep Learning Networks with ARM Compute Library.
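Putting the pieces together, a code generation call for a Raspberry Pi target might look like this sketch. The entry-point function name, input size, ARM architecture, and ARM Compute Library version are illustrative assumptions; adjust them to match your network and toolchain.

```matlab
% Configure deep learning code generation for the ARM Compute Library.
dlcfg = coder.DeepLearningConfig('arm-compute');
dlcfg.ArmArchitecture = 'armv7';            % assumption: 32-bit Raspberry Pi OS
dlcfg.ArmComputeVersion = '20.02.1';        % assumption: installed library version
dlcfg.CalibrationResultFile = 'dlquantizerObjectMatFile.mat';
dlcfg.DataType = 'int8';

% Generate a C++ static library.
cfg = coder.config('lib');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = dlcfg;

% 'net_predict' is a placeholder entry-point function that loads the
% quantized network and calls predict; the example input size assumes
% a network with 224-by-224 RGB image input.
codegen -config cfg net_predict -args {ones(224,224,3,'single')}
```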

For an example, see Code Generation for Quantized Deep Learning Network on Raspberry Pi.
