Get Started with Deep Learning FPGA Deployment on Xilinx ZC706 SoC
This example shows how to create, compile, and deploy a handwritten digit classification network to an FPGA and how to use MATLAB® to retrieve the prediction results.
Prerequisites
Xilinx® Zynq® ZC706 Evaluation Kit
Load Pretrained Network
Load the pretrained network trained on the Modified National Institute of Standards and Technology (MNIST) database [1].
net = getDigitsNetwork;
View the layers of the pretrained network by using the Deep Network Designer app.
deepNetworkDesigner(net)
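If you prefer to inspect the architecture programmatically instead of in the app, a minimal sketch:

```matlab
% Programmatic alternative to the app: list the network layers at the command line
disp(net.Layers)
% analyzeNetwork(net) opens an interactive analysis report instead
```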
Define FPGA Board Interface
Define the target FPGA board programming interface by using the dlhdl.Target object. Create a programming interface with a custom name for your target device and a JTAG interface to connect the target device to the host computer. To use JTAG, install Xilinx® Vivado® Design Suite 2023.1 and set the tool path by using the hdlsetuptoolpath function.
hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2023.1\bin\vivado.bat');
hTarget = dlhdl.Target('Xilinx');
Prepare Network for Deployment
Prepare the network for deployment by creating a dlhdl.Workflow object. Specify the network and bitstream name. Ensure that the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Xilinx® Zynq® ZC706 Evaluation Kit and the bitstream uses the single data type.
hW = dlhdl.Workflow(Network=net,Bitstream='zc706_single',Target=hTarget);
Compile Network
Run the compile method of the dlhdl.Workflow object to compile the network and generate the instructions, weights, and biases for deployment.
dn = compile(hW)
### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream zc706_single.
### An output layer called 'Output1_softmax' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network.
### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### Notice: The layer 'imageinput' of type 'ImageInputLayer' is split into an image input layer 'imageinput' and an addition layer 'imageinput_norm' for normalization on hardware.
### The network includes the following layers:
1 'imageinput' Image Input 28×28×1 images with 'zerocenter' normalization (SW Layer)
2 'conv_1' 2-D Convolution 8 3×3×1 convolutions with stride [1 1] and padding 'same' (HW Layer)
3 'relu_1' ReLU ReLU (HW Layer)
4 'maxpool_1' 2-D Max Pooling 2×2 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
5 'conv_2' 2-D Convolution 16 3×3×8 convolutions with stride [1 1] and padding 'same' (HW Layer)
6 'relu_2' ReLU ReLU (HW Layer)
7 'maxpool_2' 2-D Max Pooling 2×2 max pooling with stride [2 2] and padding [0 0 0 0] (HW Layer)
8 'conv_3' 2-D Convolution 32 3×3×16 convolutions with stride [1 1] and padding 'same' (HW Layer)
9 'relu_3' ReLU ReLU (HW Layer)
10 'fc' Fully Connected 10 fully connected layer (HW Layer)
11 'softmax' Softmax softmax (SW Layer)
12 'Output1_softmax' Regression Output mean-squared-error (SW Layer)
### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'Output1_softmax' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
### Compiling layer group: conv_1>>maxpool_2 ...
### Compiling layer group: conv_1>>maxpool_2 ... complete.
### Compiling layer group: conv_3>>relu_3 ...
### Compiling layer group: conv_3>>relu_3 ... complete.
### Compiling layer group: fc ...
### Compiling layer group: fc ... complete.
### Allocating external memory buffers:
offset_name offset_address allocated_space
_______________________ ______________ _________________
"InputDataOffset" "0x00000000" "184.0 kB"
"OutputResultOffset" "0x0002e000" "4.0 kB"
"SchedulerDataOffset" "0x0002f000" "204.0 kB"
"SystemBufferOffset" "0x00062000" "76.0 kB"
"InstructionDataOffset" "0x00075000" "52.0 kB"
"ConvWeightDataOffset" "0x00082000" "28.0 kB"
"FCWeightDataOffset" "0x00089000" "76.0 kB"
"EndOffset" "0x0009c000" "Total: 624.0 kB"
### Network compilation complete.
dn = struct with fields:
weights: [1×1 struct]
instructions: [1×1 struct]
registers: [1×1 struct]
syncInstructions: [1×1 struct]
constantData: {{} [1×1568 single]}
ddrInfo: [1×1 struct]
resourceTable: [6×2 table]
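Before deploying, you can examine the struct returned by compile. A minimal sketch; the exact contents of each field depend on the network and bitstream, so treat this as exploratory:

```matlab
% Inspect the compiled artifacts returned by compile
disp(dn.resourceTable)       % summary table of FPGA resources (6x2 table above)
fieldnames(dn.instructions)  % generated instruction data for the DL processor
```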
Program Bitstream onto FPGA and Download Network Weights
To deploy the network on the Xilinx® Zynq® ZC706 hardware, run the deploy method of the dlhdl.Workflow object. This method programs the FPGA board by using the output of the compile method and the programming file, downloads the network weights and biases, and displays progress messages and the time it takes to deploy the network.
deploy(hW)
Test Network
Load the example image.
inputImg = imread('five_28x28.pgm');
inputImg = dlarray(single(inputImg),'SSCB');
Classify the image on the FPGA by using the predict method of the dlhdl.Workflow object and display the results.
[prediction,speed] = hW.predict(single(inputImg),'Profile','on');
### Finished writing input activations.
### Running single input activation.
Deep Learning Processor Profiler Performance Results
LastFrameLatency(cycles) LastFrameLatency(seconds) FramesNum Total Latency Frames/s
------------- ------------- --------- --------- ---------
Network 62340 0.00062 1 63218 1581.8
imageinput_norm 3527 0.00004
conv_1 10376 0.00010
maxpool_1 6774 0.00007
conv_2 11786 0.00012
maxpool_2 5450 0.00005
conv_3 17181 0.00017
fc 7214 0.00007
* The clock frequency of the DL processor is: 100MHz
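The throughput figure follows directly from the reported clock frequency and total latency; a quick arithmetic check, assuming the 100 MHz clock shown above:

```matlab
% Frames/s is the DL processor clock rate divided by the total latency in cycles
clockHz = 100e6;               % reported DL processor clock frequency
totalCycles = 63218;           % total latency from the profiler table
framesPerSecond = clockHz / totalCycles   % approximately 1581.8, matching the table
```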
[val,idx] = max(prediction);
fprintf('The prediction result is %d\n', idx-1);
The prediction result is 5
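To sanity-check the hardware result, you can run the same image through the network in MATLAB and compare the predicted digit. A minimal sketch, assuming inputImg is the dlarray created above:

```matlab
% Software inference on the same preprocessed image
swScores = predict(net, inputImg);
[~, swIdx] = max(extractdata(swScores));
fprintf('Software prediction: %d\n', swIdx-1);  % should match the FPGA prediction
```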
Bibliography
LeCun, Y., C. Cortes, and C. J. C. Burges. "The MNIST Database of Handwritten Digits." https://yann.lecun.com/exdb/mnist/.
See Also
dlhdl.Workflow | dlhdl.Target | compile | deploy | predict