Classify Images on FPGA by Using Quantized GoogLeNet Network
This example shows how to use the Deep Learning HDL Toolbox™ to deploy a quantized GoogLeNet network to classify an image. The example uses the pretrained GoogLeNet network to demonstrate transfer learning, quantization, and deployment for the quantized network. Quantization helps reduce the memory requirement of a deep neural network by quantizing weights, biases, and activations of network layers to 8-bit scaled integer data types. Use MATLAB® to retrieve the prediction results.
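To make the idea of an 8-bit scaled integer concrete, the following base-MATLAB sketch quantizes a small weight vector with a power-of-two scale. The weight values, the variable names, and the scaling rule are illustrative assumptions; the toolbox determines its own scaling during calibration, and this sketch does not claim to reproduce it.

```matlab
% Illustrative sketch only: quantize made-up weights to int8 with a
% power-of-two scale. This is not the toolbox's internal algorithm.
w = single([-0.82 0.03 0.41 1.27 -0.005]);   % hypothetical weights

% Pick the largest fraction length that keeps max|w| inside the int8 range.
maxAbs  = max(abs(w));
fracLen = 7 - ceil(log2(maxAbs));            % number of fractional bits

% Scale, round, and saturate to int8; then map back to real values.
q    = int8(round(w * 2^fracLen));           % int8() saturates at [-128, 127]
wHat = single(q) * 2^(-fracLen);

% For in-range values, the round-off error is at most half a quantization step.
maxErr   = max(abs(w - wHat));
halfStep = 2^(-fracLen) / 2;
fprintf('fraction length = %d, max error = %.5f (half step = %.5f)\n', ...
    fracLen, maxErr, halfStep);
```

Each quantized value occupies one byte instead of the four bytes of a single-precision value, which is where the memory saving comes from.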
Deploy the quantized GoogLeNet network by creating a dlhdl.Workflow object. Use the dlhdl.Workflow object to:
Generate a list of instructions, weights and biases by using the compile method.
Generate a programming file for the FPGA by using the deploy method.
Retrieve the network prediction results and performance by using the predict method.
GoogLeNet has been trained on over a million images and can classify images into 1000 object categories (such as keyboard, coffee mug, pencil, and many animals). The network has learned rich feature representations for a wide range of images. The network takes an image as input, and then outputs a label for the object in the image together with the probabilities for each of the object categories.
Prerequisites
Deep Learning Toolbox™
Deep Learning HDL Toolbox™
Deep Learning Toolbox Model for GoogLeNet Network
Deep Learning HDL Toolbox™ Support Package for Intel FPGA and SoC
Image Processing Toolbox™
Intel Arria10 SoC development kit
Deep Learning Toolbox™ Model Quantization Library support package
MATLAB Coder Interface for Deep Learning Libraries
Transfer Learning Using GoogLeNet
To perform classification on a new set of images, you fine-tune a pretrained GoogLeNet convolutional neural network by transfer learning. In transfer learning, you can take a pretrained network and use it as a starting point to learn a new task. Fine-tuning a network with transfer learning is usually much faster and easier than training a network with randomly initialized weights from scratch. You can quickly transfer learned features to a new task using a smaller number of training images.
Load Pretrained DAG Network
Load the pretrained DAG network, GoogLeNet.
net = googlenet;
Use the analyzeNetwork function to obtain information about the network layers.
analyzeNetwork(net);
The first layer, the image input layer, requires input images of size 224-by-224-by-3, where 3 is the number of color channels.
inputSize = net.Layers(1).InputSize
inputSize = 1×3
224 224 3
Define Training and Validation Data Sets
This example uses the MathWorks MerchData data set. This is a small data set containing 75 images of MathWorks merchandise, belonging to five different classes (cap, cube, playing cards, screwdriver, and torch).
unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
Divide the data into training and validation data sets. Use 70% of the images for training and 30% for validation. splitEachLabel splits the image datastore into two new datastores.
[imdsTrain,imdsValidation] = splitEachLabel(imds,0.7,'randomized');
This data set now contains 55 training images and 20 validation images. Display some sample images.
numTrainImages = numel(imdsTrain.Labels);
idx = randperm(numTrainImages,16);
figure
for i = 1:16
    subplot(4,4,i)
    I = readimage(imdsTrain,idx(i));
    imshow(I)
end
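The 55/20 split reported above can be reproduced with plain arithmetic. The sketch below assumes that the 75 images are spread evenly over the five classes and that the 70% share is rounded to the nearest whole image per class; both are assumptions that happen to match the reported counts, not a description of how splitEachLabel is implemented.

```matlab
% Sketch: reproduce the 55/20 split with plain arithmetic.
% Assumption: the 75 images are spread evenly over the 5 classes.
numImages    = 75;
numClasses   = 5;
trainPercent = 70;

perClass      = numImages / numClasses;               % 15 images per class
trainPerClass = round(trainPercent * perClass / 100); % 10.5 rounds to 11
valPerClass   = perClass - trainPerClass;             % 4 images per class

numTrain = trainPerClass * numClasses;
numVal   = valPerClass * numClasses;
fprintf('%d training images, %d validation images\n', numTrain, numVal);
```

On the real datastores, countEachLabel(imdsTrain) and countEachLabel(imdsValidation) report the actual per-class counts.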
Replace Final Layers
The fully connected layer and classification layer of the pretrained network net are configured for 1000 classes. These two layers, loss3-classifier and output in GoogLeNet, contain information on how to combine the features that the network extracts into class probabilities, a loss value, and predicted labels. To retrain a pretrained network to classify new images, replace these two layers with new layers adapted to the new data set.
Extract the layer graph from the trained network.
lgraph = layerGraph(net)
lgraph = 
  LayerGraph with properties:
         Layers: [144×1 nnet.cnn.layer.Layer]
    Connections: [170×2 table]
     InputNames: {'data'}
    OutputNames: {'output'}
Replace the fully connected layer with a new fully connected layer whose number of outputs equals the number of classes. To make learning faster in the new layers than in the transferred layers, increase the WeightLearnRateFactor and BiasLearnRateFactor values of the fully connected layer.
numClasses = numel(categories(imdsTrain.Labels))
numClasses = 5
Remove the 'loss3-classifier', 'prob', and 'output' layers from the lgraph.
layers = net.SortedLayers;
for i = 0:2
    lgraph = removeLayers(lgraph,layers(end-i).Name);
end
Create three new layers and add them to the lgraph. Ensure the transferred and new layers are properly connected together in the lgraph.
newLayers = [
    fullyConnectedLayer(numClasses,'WeightLearnRateFactor',20,'BiasLearnRateFactor',20,'Name','newFC')
    softmaxLayer('Name','newProb')
    classificationLayer('Name','newClassOutput',"Classes","auto")];
lgraph = addLayers(lgraph,newLayers);
lgraph = connectLayers(lgraph,layers(end-3).Name,'newFC');
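The end-relative indexing used above is easier to follow on a plain list of names. The sketch below assumes that net.SortedLayers ends with the five GoogLeNet layers shown (the names are taken from this example); the variable names are hypothetical.

```matlab
% Sketch: the tail of the sorted layer list, written out as plain names.
% Assumption: net.SortedLayers ends with these five GoogLeNet layers.
tailNames = {'pool5-7x7_s1','pool5-drop_7x7_s1','loss3-classifier','prob','output'};

% layers(end-i) for i = 0:2 walks backwards over the last three entries.
removed = cell(1,3);
for i = 0:2
    removed{i+1} = tailNames{end-i};
end

% layers(end-3) is the last layer that is kept; newFC attaches to it.
connectFrom = tailNames{end-3};

disp(removed)
disp(connectFrom)
```

Under that assumption, the loop removes 'output', 'prob', and 'loss3-classifier' in that order, and the new fully connected layer is connected to the dropout layer 'pool5-drop_7x7_s1'.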
Train Network
The network requires input images of size 224-by-224-by-3, but the images in the image datastores have different sizes. Use an augmented image datastore to automatically resize the training images. Specify additional augmentation operations to perform on the training images: randomly flip the training images along the vertical axis, and randomly translate them up to 30 pixels horizontally and vertically. Data augmentation helps prevent the network from over-fitting and memorizing the exact details of the training images.
pixelRange = [-30 30];
imageAugmenter = imageDataAugmenter( ...
    'RandXReflection',true, ...
    'RandXTranslation',pixelRange, ...
    'RandYTranslation',pixelRange);
augimdsTrain = augmentedImageDatastore(inputSize(1:2),imdsTrain, ...
    'DataAugmentation',imageAugmenter);
To automatically resize the validation images without performing further data augmentation, use an augmented image datastore without specifying any additional preprocessing operations.
augimdsValidation = augmentedImageDatastore(inputSize(1:2),imdsValidation);
Specify the training options. For transfer learning, keep the features from the early layers of the pretrained network (the transferred layer weights). To slow down learning in the transferred layers, set the initial learning rate to a small value. In the previous step, you increased the learning rate factors for the fully connected layer to speed up learning in the new final layers. This combination of learning rate settings results in fast learning only in the new layers and slower learning in the other layers. When performing transfer learning, you do not need to train for as many epochs. An epoch is a full training cycle on the entire training data set. Specify the mini-batch size to be 11. The software validates the network every ValidationFrequency iterations during training.
options = trainingOptions('sgdm', ...
    'MiniBatchSize',11, ...
    'MaxEpochs',5, ...
    'InitialLearnRate',2e-4, ...
    'Shuffle','every-epoch', ...
    'ValidationData',augimdsValidation, ...
    'ValidationFrequency',3, ...
    'Verbose',false, ...
    'Plots','training-progress');
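The schedule implied by these options can be worked out with a few lines of arithmetic. The sketch below uses the image count and option values from this example; the variable names are hypothetical, and the "effective" learning rate is simply the initial rate multiplied by the layer's learn rate factor, ignoring momentum.

```matlab
% Sketch: derive the training schedule implied by the options above.
numTrain      = 55;     % training images reported earlier in this example
miniBatchSize = 11;
maxEpochs     = 5;
valFrequency  = 3;
initialLR     = 2e-4;
fcFactor      = 20;     % WeightLearnRateFactor and BiasLearnRateFactor of newFC

itersPerEpoch = floor(numTrain / miniBatchSize);        % 5 iterations per epoch
totalIters    = itersPerEpoch * maxEpochs;              % 25 iterations in total
valIters      = valFrequency:valFrequency:totalIters;   % periodic validation points
newLayerLR    = initialLR * fcFactor;                   % rate applied to newFC

fprintf('%d iterations, periodic validation at: %s\n', totalIters, mat2str(valIters));
fprintf('learning rate: %g (factor 1), %g (newFC)\n', initialLR, newLayerLR);
```

The mini-batch size of 11 divides the 55 training images evenly, so no images are discarded at the end of an epoch.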
Train the network that consists of the transferred and new layers. By default, trainNetwork uses a GPU if one is available (requires Parallel Computing Toolbox™ and a supported GPU device). Otherwise, trainNetwork uses a CPU (requires MATLAB Coder Interface for Deep Learning Libraries). You can also specify the execution environment by using the 'ExecutionEnvironment' name-value argument of trainingOptions.
netTransfer = trainNetwork(augimdsTrain,lgraph,options);
Create dlquantizer Object
Create a quantized network by using the dlquantizer object. Set the target execution environment to FPGA.
dlQuantObj = dlquantizer(netTransfer,'ExecutionEnvironment','FPGA');
Calibrate Quantized Network
Use the calibrate function to exercise the network by using sample inputs to collect the range information. The calibrate function exercises the network and collects the dynamic ranges for the learnable parameters of the convolution and fully connected layers of the network.
For best quantization results, the calibration data must be representative of the actual inputs that the network predicts on.
dlQuantObj.calibrate(augimdsTrain);
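Conceptually, collecting a dynamic range means tracking the smallest and largest values seen while data flows through the network. The base-MATLAB sketch below shows that bookkeeping for a single hypothetical layer; the random data stands in for real activations, and the loop is an illustration of the idea rather than what calibrate does internally.

```matlab
% Sketch: track the dynamic range of one layer's values across batches.
% The random data stands in for real activations; it is not toolbox output.
rng(0);                                  % reproducible stand-in data
numBatches = 5;
runMin =  inf;
runMax = -inf;
for b = 1:numBatches
    act = randn(11, 64, 'single');       % hypothetical activations for one mini-batch
    runMin = min(runMin, min(act(:)));   % running minimum over all batches so far
    runMax = max(runMax, max(act(:)));   % running maximum over all batches so far
end
fprintf('observed range: [%.3f, %.3f]\n', runMin, runMax);
```

Values outside the observed range cannot be represented after quantization and would saturate, which is why unrepresentative calibration data tends to hurt accuracy.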
Set Up Intel Quartus Prime Standard
Set the synthesis tool path to point to an installed Intel® Quartus® Prime Standard Edition 20.1 executable file. You must have already installed Altera® Quartus II.
% hdlsetuptoolpath('ToolName','Altera Quartus II','ToolPath','C:\intel\20.1\quartus\bin\quartus.exe');
Create Target Object
Create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet.
hTarget = dlhdl.Target('Intel','Interface','JTAG');
Generate Bitstream to Run Network
The GoogLeNet network contains multiple cross channel normalization layers. To support this layer type on hardware, the 'LRNBlockGeneration' property of the conv module must be turned on in the bitstream used for FPGA inference. The shipping arria10soc_int8 bitstream does not have the 'LRNBlockGeneration' property turned on. Generate a new bitstream by using the following lines of code, then use the generated bitstream with a workflow object for inference.
Update the processor configuration with the 'LRNBlockGeneration' property turned on and the 'SegmentationBlockGeneration' property turned off. Turn off 'SegmentationBlockGeneration' to fit the Deep Learning IP on the FPGA and avoid overutilization of resources.
% hPC = dlhdl.ProcessorConfig('Bitstream', 'arria10soc_int8');
% hPC.setModuleProperty('conv', 'LRNBlockGeneration', 'on');
% hPC.setModuleProperty('conv', 'SegmentationBlockGeneration', 'off');
% dlhdl.buildProcessor(hPC)
To learn how to use the generated bitstream file, see Generate Custom Bitstream.
Create Workflow Object
Create an object of the dlhdl.Workflow class. Specify dlQuantObj as the network. Make sure to use the generated bitstream, which enables processing of cross channel normalization layers on the FPGA. In this example, the target FPGA board is the Intel Arria10 SoC board and the generated bitstream uses the int8 data type.
hW = dlhdl.Workflow('network', dlQuantObj, 'Bitstream', 'dlprocessor.sof','Target',hTarget);
Compile Workflow Object
To compile the GoogLeNet network, run the compile function of the dlhdl.Workflow object.
dn = hW.compile
### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream arria10soc_int8.
### The network includes the following layers:
     1   'data'                            Image Input                   224×224×3 images with 'zerocenter' normalization (SW Layer)
     2   'conv1-7x7_s2'                    Convolution                   64 7×7×3 convolutions with stride [2 2] and padding [3 3 3 3] (HW Layer)
     3   'conv1-relu_7x7'                  ReLU                          ReLU (HW Layer)
     4   'pool1-3x3_s2'                    Max Pooling                   3×3 max pooling with stride [2 2] and padding [0 1 0 1] (HW Layer)
     5   'pool1-norm1'                     Cross Channel Normalization   cross channel normalization with 5 channels per element (HW Layer)
     6   'conv2-3x3_reduce'                Convolution                   64 1×1×64 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
     7   'conv2-relu_3x3_reduce'           ReLU                          ReLU (HW Layer)
     8   'conv2-3x3'                       Convolution                   192 3×3×64 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer)
     9   'conv2-relu_3x3'                  ReLU                          ReLU (HW Layer)
    10   'conv2-norm2'                     Cross Channel Normalization   cross channel normalization with 5 channels per element (HW Layer)
    11   'pool2-3x3_s2'                    Max Pooling                   3×3 max pooling with stride [2 2] and padding [0 1 0 1] (HW Layer)
    12   'inception_3a-1x1'                Convolution                   64 1×1×192 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    13   'inception_3a-relu_1x1'           ReLU                          ReLU (HW Layer)
    14   'inception_3a-3x3_reduce'         Convolution                   96 1×1×192 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    15   'inception_3a-relu_3x3_reduce'    ReLU                          ReLU (HW Layer)
    16   'inception_3a-3x3'                Convolution                   128 3×3×96 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer)
    17   'inception_3a-relu_3x3'           ReLU                          ReLU (HW Layer)
    18   'inception_3a-5x5_reduce'         Convolution                   16 1×1×192 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    19   'inception_3a-relu_5x5_reduce'    ReLU                          ReLU (HW Layer)
    20   'inception_3a-5x5'                Convolution                   32 5×5×16 convolutions with stride [1 1] and padding [2 2 2 2] (HW Layer)
    21   'inception_3a-relu_5x5'           ReLU                          ReLU (HW Layer)
    22   'inception_3a-pool'               Max Pooling                   3×3 max pooling with stride [1 1] and padding [1 1 1 1] (HW Layer)
    23   'inception_3a-pool_proj'          Convolution                   32 1×1×192 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    24   'inception_3a-relu_pool_proj'     ReLU                          ReLU (HW Layer)
    25   'inception_3a-output'             Depth concatenation           Depth concatenation of 4 inputs (HW Layer)
    26   'inception_3b-1x1'                Convolution                   128 1×1×256 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    27   'inception_3b-relu_1x1'           ReLU                          ReLU (HW Layer)
    28   'inception_3b-3x3_reduce'         Convolution                   128 1×1×256 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    29   'inception_3b-relu_3x3_reduce'    ReLU                          ReLU (HW Layer)
    30   'inception_3b-3x3'                Convolution                   192 3×3×128 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer)
    31   'inception_3b-relu_3x3'           ReLU                          ReLU (HW Layer)
    32   'inception_3b-5x5_reduce'         Convolution                   32 1×1×256 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    33   'inception_3b-relu_5x5_reduce'    ReLU                          ReLU (HW Layer)
    34   'inception_3b-5x5'                Convolution                   96 5×5×32 convolutions with stride [1 1] and padding [2 2 2 2] (HW Layer)
    35   'inception_3b-relu_5x5'           ReLU                          ReLU (HW Layer)
    36   'inception_3b-pool'               Max Pooling                   3×3 max pooling with stride [1 1] and padding [1 1 1 1] (HW Layer)
    37   'inception_3b-pool_proj'          Convolution                   64 1×1×256 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    38   'inception_3b-relu_pool_proj'     ReLU                          ReLU (HW Layer)
    39   'inception_3b-output'             Depth concatenation           Depth concatenation of 4 inputs (HW Layer)
    40   'pool3-3x3_s2'                    Max Pooling                   3×3 max pooling with stride [2 2] and padding [0 1 0 1] (HW Layer)
    41   'inception_4a-1x1'                Convolution                   192 1×1×480 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    42   'inception_4a-relu_1x1'           ReLU                          ReLU (HW Layer)
    43   'inception_4a-3x3_reduce'         Convolution                   96 1×1×480 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    44   'inception_4a-relu_3x3_reduce'    ReLU                          ReLU (HW Layer)
    45   'inception_4a-3x3'                Convolution                   208 3×3×96 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer)
    46   'inception_4a-relu_3x3'           ReLU                          ReLU (HW Layer)
    47   'inception_4a-5x5_reduce'         Convolution                   16 1×1×480 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    48   'inception_4a-relu_5x5_reduce'    ReLU                          ReLU (HW Layer)
    49   'inception_4a-5x5'                Convolution                   48 5×5×16 convolutions with stride [1 1] and padding [2 2 2 2] (HW Layer)
    50   'inception_4a-relu_5x5'           ReLU                          ReLU (HW Layer)
    51   'inception_4a-pool'               Max Pooling                   3×3 max pooling with stride [1 1] and padding [1 1 1 1] (HW Layer)
    52   'inception_4a-pool_proj'          Convolution                   64 1×1×480 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    53   'inception_4a-relu_pool_proj'     ReLU                          ReLU (HW Layer)
    54   'inception_4a-output'             Depth concatenation           Depth concatenation of 4 inputs (HW Layer)
    55   'inception_4b-1x1'                Convolution                   160 1×1×512 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    56   'inception_4b-relu_1x1'           ReLU                          ReLU (HW Layer)
    57   'inception_4b-3x3_reduce'         Convolution                   112 1×1×512 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    58   'inception_4b-relu_3x3_reduce'    ReLU                          ReLU (HW Layer)
    59   'inception_4b-3x3'                Convolution                   224 3×3×112 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer)
    60   'inception_4b-relu_3x3'           ReLU                          ReLU (HW Layer)
    61   'inception_4b-5x5_reduce'         Convolution                   24 1×1×512 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    62   'inception_4b-relu_5x5_reduce'    ReLU                          ReLU (HW Layer)
    63   'inception_4b-5x5'                Convolution                   64 5×5×24 convolutions with stride [1 1] and padding [2 2 2 2] (HW Layer)
    64   'inception_4b-relu_5x5'           ReLU                          ReLU (HW Layer)
    65   'inception_4b-pool'               Max Pooling                   3×3 max pooling with stride [1 1] and padding [1 1 1 1] (HW Layer)
    66   'inception_4b-pool_proj'          Convolution                   64 1×1×512 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    67   'inception_4b-relu_pool_proj'     ReLU                          ReLU (HW Layer)
    68   'inception_4b-output'             Depth concatenation           Depth concatenation of 4 inputs (HW Layer)
    69   'inception_4c-1x1'                Convolution                   128 1×1×512 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    70   'inception_4c-relu_1x1'           ReLU                          ReLU (HW Layer)
    71   'inception_4c-3x3_reduce'         Convolution                   128 1×1×512 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    72   'inception_4c-relu_3x3_reduce'    ReLU                          ReLU (HW Layer)
    73   'inception_4c-3x3'                Convolution                   256 3×3×128 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer)
    74   'inception_4c-relu_3x3'           ReLU                          ReLU (HW Layer)
    75   'inception_4c-5x5_reduce'         Convolution                   24 1×1×512 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    76   'inception_4c-relu_5x5_reduce'    ReLU                          ReLU (HW Layer)
    77   'inception_4c-5x5'                Convolution                   64 5×5×24 convolutions with stride [1 1] and padding [2 2 2 2] (HW Layer)
    78   'inception_4c-relu_5x5'           ReLU                          ReLU (HW Layer)
    79   'inception_4c-pool'               Max Pooling                   3×3 max pooling with stride [1 1] and padding [1 1 1 1] (HW Layer)
    80   'inception_4c-pool_proj'          Convolution                   64 1×1×512 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    81   'inception_4c-relu_pool_proj'     ReLU                          ReLU (HW Layer)
    82   'inception_4c-output'             Depth concatenation           Depth concatenation of 4 inputs (HW Layer)
    83   'inception_4d-1x1'                Convolution                   112 1×1×512 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    84   'inception_4d-relu_1x1'           ReLU                          ReLU (HW Layer)
    85   'inception_4d-3x3_reduce'         Convolution                   144 1×1×512 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    86   'inception_4d-relu_3x3_reduce'    ReLU                          ReLU (HW Layer)
    87   'inception_4d-3x3'                Convolution                   288 3×3×144 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer)
    88   'inception_4d-relu_3x3'           ReLU                          ReLU (HW Layer)
    89   'inception_4d-5x5_reduce'         Convolution                   32 1×1×512 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    90   'inception_4d-relu_5x5_reduce'    ReLU                          ReLU (HW Layer)
    91   'inception_4d-5x5'                Convolution                   64 5×5×32 convolutions with stride [1 1] and padding [2 2 2 2] (HW Layer)
    92   'inception_4d-relu_5x5'           ReLU                          ReLU (HW Layer)
    93   'inception_4d-pool'               Max Pooling                   3×3 max pooling with stride [1 1] and padding [1 1 1 1] (HW Layer)
    94   'inception_4d-pool_proj'          Convolution                   64 1×1×512 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    95   'inception_4d-relu_pool_proj'     ReLU                          ReLU (HW Layer)
    96   'inception_4d-output'             Depth concatenation           Depth concatenation of 4 inputs (HW Layer)
    97   'inception_4e-1x1'                Convolution                   256 1×1×528 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
    98   'inception_4e-relu_1x1'           ReLU                          ReLU (HW Layer)
    99   'inception_4e-3x3_reduce'         Convolution                   160 1×1×528 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
   100   'inception_4e-relu_3x3_reduce'    ReLU                          ReLU (HW Layer)
   101   'inception_4e-3x3'                Convolution                   320 3×3×160 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer)
   102   'inception_4e-relu_3x3'           ReLU                          ReLU (HW Layer)
   103   'inception_4e-5x5_reduce'         Convolution                   32 1×1×528 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
   104   'inception_4e-relu_5x5_reduce'    ReLU                          ReLU (HW Layer)
   105   'inception_4e-5x5'                Convolution                   128 5×5×32 convolutions with stride [1 1] and padding [2 2 2 2] (HW Layer)
   106   'inception_4e-relu_5x5'           ReLU                          ReLU (HW Layer)
   107   'inception_4e-pool'               Max Pooling                   3×3 max pooling with stride [1 1] and padding [1 1 1 1] (HW Layer)
   108   'inception_4e-pool_proj'          Convolution                   128 1×1×528 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
   109   'inception_4e-relu_pool_proj'     ReLU                          ReLU (HW Layer)
   110   'inception_4e-output'             Depth concatenation           Depth concatenation of 4 inputs (HW Layer)
   111   'pool4-3x3_s2'                    Max Pooling                   3×3 max pooling with stride [2 2] and padding [0 1 0 1] (HW Layer)
   112   'inception_5a-1x1'                Convolution                   256 1×1×832 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
   113   'inception_5a-relu_1x1'           ReLU                          ReLU (HW Layer)
   114   'inception_5a-3x3_reduce'         Convolution                   160 1×1×832 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
   115   'inception_5a-relu_3x3_reduce'    ReLU                          ReLU (HW Layer)
   116   'inception_5a-3x3'                Convolution                   320 3×3×160 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer)
   117   'inception_5a-relu_3x3'           ReLU                          ReLU (HW Layer)
   118   'inception_5a-5x5_reduce'         Convolution                   32 1×1×832 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
   119   'inception_5a-relu_5x5_reduce'    ReLU                          ReLU (HW Layer)
   120   'inception_5a-5x5'                Convolution                   128 5×5×32 convolutions with stride [1 1] and padding [2 2 2 2] (HW Layer)
   121   'inception_5a-relu_5x5'           ReLU                          ReLU (HW Layer)
   122   'inception_5a-pool'               Max Pooling                   3×3 max pooling with stride [1 1] and padding [1 1 1 1] (HW Layer)
   123   'inception_5a-pool_proj'          Convolution                   128 1×1×832 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
   124   'inception_5a-relu_pool_proj'     ReLU                          ReLU (HW Layer)
   125   'inception_5a-output'             Depth concatenation           Depth concatenation of 4 inputs (HW Layer)
   126   'inception_5b-1x1'                Convolution                   384 1×1×832 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
   127   'inception_5b-relu_1x1'           ReLU                          ReLU (HW Layer)
   128   'inception_5b-3x3_reduce'         Convolution                   192 1×1×832 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
   129   'inception_5b-relu_3x3_reduce'    ReLU                          ReLU (HW Layer)
   130   'inception_5b-3x3'                Convolution                   384 3×3×192 convolutions with stride [1 1] and padding [1 1 1 1] (HW Layer)
   131   'inception_5b-relu_3x3'           ReLU                          ReLU (HW Layer)
   132   'inception_5b-5x5_reduce'         Convolution                   48 1×1×832 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
   133   'inception_5b-relu_5x5_reduce'    ReLU                          ReLU (HW Layer)
   134   'inception_5b-5x5'                Convolution                   128 5×5×48 convolutions with stride [1 1] and padding [2 2 2 2] (HW Layer)
   135   'inception_5b-relu_5x5'           ReLU                          ReLU (HW Layer)
   136   'inception_5b-pool'               Max Pooling                   3×3 max pooling with stride [1 1] and padding [1 1 1 1] (HW Layer)
   137   'inception_5b-pool_proj'          Convolution                   128 1×1×832 convolutions with stride [1 1] and padding [0 0 0 0] (HW Layer)
   138   'inception_5b-relu_pool_proj'     ReLU                          ReLU (HW Layer)
   139   'inception_5b-output'             Depth concatenation           Depth concatenation of 4 inputs (HW Layer)
   140   'pool5-7x7_s1'                    2-D Global Average Pooling    2-D global average pooling (HW Layer)
   141   'pool5-drop_7x7_s1'               Dropout                       40% dropout (HW Layer)
   142   'newFC'                           Fully Connected               5 fully connected layer (HW Layer)
   143   'newProb'                         Softmax                       softmax (HW Layer)
   144   'newClassOutput'                  Classification Output         crossentropyex with 'MathWorks Cap' and 4 other classes (SW Layer)
### Notice: The layer 'data' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'newClassOutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.
### Compiling layer group: conv1-7x7_s2>>pool2-3x3_s2 ...
### Compiling layer group: conv1-7x7_s2>>pool2-3x3_s2 ... complete.
### Compiling layer group: inception_3a-1x1>>inception_3a-relu_1x1 ...
### Compiling layer group: inception_3a-1x1>>inception_3a-relu_1x1 ... complete.
### Compiling layer group: inception_3a-3x3_reduce>>inception_3a-relu_3x3 ...
### Compiling layer group: inception_3a-3x3_reduce>>inception_3a-relu_3x3 ... complete.
### Compiling layer group: inception_3a-5x5_reduce>>inception_3a-relu_5x5 ...
### Compiling layer group: inception_3a-5x5_reduce>>inception_3a-relu_5x5 ... complete.
### Compiling layer group: inception_3a-pool>>inception_3a-relu_pool_proj ...
### Compiling layer group: inception_3a-pool>>inception_3a-relu_pool_proj ... complete.
### Compiling layer group: inception_3b-1x1>>inception_3b-relu_1x1 ...
### Compiling layer group: inception_3b-1x1>>inception_3b-relu_1x1 ... complete.
### Compiling layer group: inception_3b-3x3_reduce>>inception_3b-relu_3x3 ...
### Compiling layer group: inception_3b-3x3_reduce>>inception_3b-relu_3x3 ... complete.
### Compiling layer group: inception_3b-5x5_reduce>>inception_3b-relu_5x5 ...
### Compiling layer group: inception_3b-5x5_reduce>>inception_3b-relu_5x5 ... complete.
### Compiling layer group: inception_3b-pool>>inception_3b-relu_pool_proj ...
### Compiling layer group: inception_3b-pool>>inception_3b-relu_pool_proj ... complete.
### Compiling layer group: pool3-3x3_s2 ...
### Compiling layer group: pool3-3x3_s2 ... complete.
### Compiling layer group: inception_4a-1x1>>inception_4a-relu_1x1 ...
### Compiling layer group: inception_4a-1x1>>inception_4a-relu_1x1 ... complete.
### Compiling layer group: inception_4a-3x3_reduce>>inception_4a-relu_3x3 ...
### Compiling layer group: inception_4a-3x3_reduce>>inception_4a-relu_3x3 ... complete.
### Compiling layer group: inception_4a-5x5_reduce>>inception_4a-relu_5x5 ...
### Compiling layer group: inception_4a-5x5_reduce>>inception_4a-relu_5x5 ... complete.
### Compiling layer group: inception_4a-pool>>inception_4a-relu_pool_proj ...
### Compiling layer group: inception_4a-pool>>inception_4a-relu_pool_proj ... complete.
### Compiling layer group: inception_4b-1x1>>inception_4b-relu_1x1 ...
### Compiling layer group: inception_4b-1x1>>inception_4b-relu_1x1 ... complete.
### Compiling layer group: inception_4b-3x3_reduce>>inception_4b-relu_3x3 ...
### Compiling layer group: inception_4b-3x3_reduce>>inception_4b-relu_3x3 ... complete.
### Compiling layer group: inception_4b-5x5_reduce>>inception_4b-relu_5x5 ...
### Compiling layer group: inception_4b-5x5_reduce>>inception_4b-relu_5x5 ... complete.
### Compiling layer group: inception_4b-pool>>inception_4b-relu_pool_proj ...
### Compiling layer group: inception_4b-pool>>inception_4b-relu_pool_proj ... complete.
### Compiling layer group: inception_4c-1x1>>inception_4c-relu_1x1 ...
### Compiling layer group: inception_4c-1x1>>inception_4c-relu_1x1 ... complete.
### Compiling layer group: inception_4c-3x3_reduce>>inception_4c-relu_3x3 ...
### Compiling layer group: inception_4c-3x3_reduce>>inception_4c-relu_3x3 ... complete.
### Compiling layer group: inception_4c-5x5_reduce>>inception_4c-relu_5x5 ...
### Compiling layer group: inception_4c-5x5_reduce>>inception_4c-relu_5x5 ... complete.
### Compiling layer group: inception_4c-pool>>inception_4c-relu_pool_proj ...
### Compiling layer group: inception_4c-pool>>inception_4c-relu_pool_proj ... complete.
### Compiling layer group: inception_4d-1x1>>inception_4d-relu_1x1 ...
### Compiling layer group: inception_4d-1x1>>inception_4d-relu_1x1 ... complete.
### Compiling layer group: inception_4d-3x3_reduce>>inception_4d-relu_3x3 ...
### Compiling layer group: inception_4d-3x3_reduce>>inception_4d-relu_3x3 ... complete.
### Compiling layer group: inception_4d-5x5_reduce>>inception_4d-relu_5x5 ...
### Compiling layer group: inception_4d-5x5_reduce>>inception_4d-relu_5x5 ... complete.
### Compiling layer group: inception_4d-pool>>inception_4d-relu_pool_proj ...
### Compiling layer group: inception_4d-pool>>inception_4d-relu_pool_proj ... complete.
### Compiling layer group: inception_4e-1x1>>inception_4e-relu_1x1 ...
### Compiling layer group: inception_4e-1x1>>inception_4e-relu_1x1 ... complete.
### Compiling layer group: inception_4e-3x3_reduce>>inception_4e-relu_3x3 ...
### Compiling layer group: inception_4e-3x3_reduce>>inception_4e-relu_3x3 ... complete.
### Compiling layer group: inception_4e-5x5_reduce>>inception_4e-relu_5x5 ...
### Compiling layer group: inception_4e-5x5_reduce>>inception_4e-relu_5x5 ... complete.
### Compiling layer group: inception_4e-pool>>inception_4e-relu_pool_proj ...
### Compiling layer group: inception_4e-pool>>inception_4e-relu_pool_proj ... complete.
### Compiling layer group: pool4-3x3_s2 ...
### Compiling layer group: pool4-3x3_s2 ... complete.
### Compiling layer group: inception_5a-1x1>>inception_5a-relu_1x1 ...
### Compiling layer group: inception_5a-1x1>>inception_5a-relu_1x1 ... complete.
### Compiling layer group: inception_5a-3x3_reduce>>inception_5a-relu_3x3 ...
### Compiling layer group: inception_5a-3x3_reduce>>inception_5a-relu_3x3 ... complete.
### Compiling layer group: inception_5a-5x5_reduce>>inception_5a-relu_5x5 ...
### Compiling layer group: inception_5a-5x5_reduce>>inception_5a-relu_5x5 ... complete.
### Compiling layer group: inception_5a-pool>>inception_5a-relu_pool_proj ...
### Compiling layer group: inception_5a-pool>>inception_5a-relu_pool_proj ... complete.
### Compiling layer group: inception_5b-1x1>>inception_5b-relu_1x1 ...
### Compiling layer group: inception_5b-1x1>>inception_5b-relu_1x1 ... complete.
### Compiling layer group: inception_5b-3x3_reduce>>inception_5b-relu_3x3 ...
### Compiling layer group: inception_5b-3x3_reduce>>inception_5b-relu_3x3 ... complete.
### Compiling layer group: inception_5b-5x5_reduce>>inception_5b-relu_5x5 ...
### Compiling layer group: inception_5b-5x5_reduce>>inception_5b-relu_5x5 ... complete.
### Compiling layer group: inception_5b-pool>>inception_5b-relu_pool_proj ...
### Compiling layer group: inception_5b-pool>>inception_5b-relu_pool_proj ... complete.
### Compiling layer group: pool5-7x7_s1 ...
### Compiling layer group: pool5-7x7_s1 ... complete.
### Compiling layer group: newFC ...
### Compiling layer group: newFC ... complete.
### Allocating external memory buffers:
          offset_name          offset_address    allocated_space
    _______________________    ______________    ________________
    "InputDataOffset"           "0x00000000"     "12.0 MB"
    "OutputResultOffset"        "0x00c00000"     "4.0 MB"
    "SchedulerDataOffset"       "0x01000000"     "4.0 MB"
    "SystemBufferOffset"        "0x01400000"     "28.0 MB"
    "InstructionDataOffset"     "0x03000000"     "8.0 MB"
    "ConvWeightDataOffset"      "0x03800000"     "32.0 MB"
    "FCWeightDataOffset"        "0x05800000"     "4.0 MB"
    "EndOffset"                 "0x05c00000"     "Total: 92.0 MB"
### Network compilation complete.
dn = struct with fields:
weights: [1×1 struct]
instructions: [1×1 struct]
registers: [1×1 struct]
syncInstructions: [1×1 struct]
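The external memory map in the compile report can be sanity-checked with base MATLAB: each buffer should start where the previous one ends, and the end offset should equal the reported total. The sketch below copies the offsets and sizes from the report; the variable names are hypothetical, and "MB" is taken to mean 2^20 bytes, which is the interpretation that makes the numbers agree.

```matlab
% Sketch: check that each buffer starts where the previous one ends.
% All numbers are copied from the compile report in this example.
sizesMB = [12 4 4 28 8 32 4];                    % allocated_space per buffer
offsets = {'00000000','00c00000','01000000','01400000', ...
           '03000000','03800000','05800000','05c00000'};   % last entry is EndOffset

bytesPerMB = 2^20;
reported   = hex2dec(offsets)';                  % offsets in bytes, as a row vector
expected   = [0 cumsum(sizesMB)] * bytesPerMB;   % running total of buffer sizes

assert(isequal(reported, expected), 'Offsets do not match buffer sizes.');
fprintf('Total external memory: %.1f MB\n', reported(end) / bytesPerMB);
```

The convolution weights (32 MB) and the system buffer (28 MB) account for most of the 92 MB footprint.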
Program Bitstream onto FPGA and Download Network Weights
To deploy the network on the Intel Arria10 SoC hardware, run the deploy function of the dlhdl.Workflow object. This function uses the output of the compile function to program the FPGA board by using the programming file. The function also downloads the network weights and biases. The deploy function starts programming the FPGA device and displays progress messages and the time it takes to deploy the network.
hW.deploy
### Programming FPGA Bitstream using JTAG...
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 11-Jun-2021 22:20:12
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 11-Jun-2021 22:20:12
Load Example Image
I = imresize(readimage(imdsValidation,1),[224 224]);
figure
imshow(I)
Retrieve Image Prediction
Execute the predict function of the dlhdl.Workflow object and display the prediction results.
[prediction, speed] = hW.predict(single(I),'Profile','off');
### Finished writing input activations.
### Running single input activation.
[val, index] = max(prediction);
label = netTransfer.Layers(end).ClassNames{index}
label = 'MathWorks Cap'
title(string(label));
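Beyond the single best label, it is often useful to see how the remaining scores are ranked. The following base-MATLAB sketch sorts a score vector and prints the top three classes. The score values are made up for illustration, and the class names other than 'MathWorks Cap' are assumed from the class descriptions earlier in this example; neither comes from the board.

```matlab
% Sketch: rank class scores. The class names (other than 'MathWorks Cap')
% and the score values are hypothetical stand-ins, not results from the board.
classNames = {'MathWorks Cap','MathWorks Cube','MathWorks Playing Cards', ...
              'MathWorks Screwdriver','MathWorks Torch'};
prediction = single([0.88 0.06 0.03 0.01 0.02]);   % made-up softmax output

% Sort the scores from highest to lowest and keep the class order.
[scores, order] = sort(prediction, 'descend');
for k = 1:3
    fprintf('%d. %-24s %.2f\n', k, classNames{order(k)}, scores(k));
end
topLabel = classNames{order(1)};
```

With the real output, replace the stand-in vector with the prediction returned by hW.predict and take the class names from the trained network.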
Retrieve Deployed Network Performance
View the performance of the deployed network by using the predict method with the Profile argument set to on.
[~, speed] = hW.predict(single(I),'Profile','on')
### Finished writing input activations.
### Running single input activation.
              Deep Learning Processor Profiler Performance Results
                   LastFrameLatency(cycles)   LastFrameLatency(seconds)   FramesNum   Total Latency   Frames/s
                         -------------             -------------          ---------     ---------     ---------
Network                        15836394    0.10558    1    15845325    9.5
    conv1-7x7_s2                1139964    0.00760
    pool1-3x3_s2                 268928    0.00179
    pool1-norm1                  310985    0.00207
    conv2-3x3_reduce             278740    0.00186
    conv2-3x3                    823735    0.00549
    conv2-norm2                  952105    0.00635
    pool2-3x3_s2                 273479    0.00182
    inception_3a-1x1             198078    0.00132
    inception_3a-3x3_reduce      280845    0.00187
    inception_3a-3x3             196410    0.00131
    inception_3a-5x5_reduce       73846    0.00049
    inception_3a-5x5              35295    0.00024
    inception_3a-pool             94554    0.00063
    inception_3a-pool_proj       115223    0.00077
    inception_3b-1x1             619945    0.00413
    inception_3b-3x3_reduce      620509    0.00414
    inception_3b-3x3             367297    0.00245
    inception_3b-5x5_reduce      207909    0.00139
    inception_3b-5x5             178552    0.00119
    inception_3b-pool            179959    0.00120
    inception_3b-pool_proj       344959    0.00230
    pool3-3x3_s2                 293640    0.00196
    inception_4a-1x1             332992    0.00222
    inception_4a-3x3_reduce      181829    0.00121
    inception_4a-3x3              83777    0.00056
    inception_4a-5x5_reduce       55639    0.00037
    inception_4a-5x5              14500    0.00010
    inception_4a-pool             77187    0.00051
    inception_4a-pool_proj       130965    0.00087
    inception_4b-1x1             300254    0.00200
    inception_4b-3x3_reduce      220515    0.00147
    inception_4b-3x3             101764    0.00068
    inception_4b-5x5_reduce       73096    0.00049
    inception_4b-5x5              25720    0.00017
    inception_4b-pool             82277    0.00055
    inception_4b-pool_proj       139530    0.00093
    inception_4c-1x1             246715    0.00164
    inception_4c-3x3_reduce      246987    0.00165
    inception_4c-3x3             129291    0.00086
    inception_4c-5x5_reduce       72855    0.00049
    inception_4c-5x5              25444    0.00017
    inception_4c-pool             82661    0.00055
    inception_4c-pool_proj       139761    0.00093
    inception_4d-1x1             220154    0.00147
    inception_4d-3x3_reduce      273136    0.00182
    inception_4d-3x3             159811    0.00107
    inception_4d-5x5_reduce       86719    0.00058
    inception_4d-5x5              32485    0.00022
    inception_4d-pool             82309    0.00055
    inception_4d-pool_proj       139464    0.00093
    inception_4e-1x1             474515    0.00316
    inception_4e-3x3_reduce      309661    0.00206
    inception_4e-3x3             193442    0.00129
    inception_4e-5x5_reduce       88661    0.00059
    inception_4e-5x5              62881    0.00042
    inception_4e-pool             85098    0.00057
    inception_4e-pool_proj       254234    0.00169
    pool4-3x3_s2                 164072    0.00109
    inception_5a-1x1             385821    0.00257
    inception_5a-3x3_reduce      250827    0.00167
    inception_5a-3x3              99439    0.00066
    inception_5a-5x5_reduce       69697    0.00046
    inception_5a-5x5              32465    0.00022
    inception_5a-pool             53624    0.00036
    inception_5a-pool_proj       205084    0.00137
    inception_5b-1x1             567107    0.00378
    inception_5b-3x3_reduce      295819    0.00197
    inception_5b-3x3             139308    0.00093
    inception_5b-5x5_reduce       92415    0.00062
    inception_5b-5x5              46311    0.00031
    inception_5b-pool             53882    0.00036
    inception_5b-pool_proj       205632    0.00137
    pool5-7x7_s1                  69837    0.00047
    newFC                         23215    0.00015
 * The clock frequency of the DL processor is: 150MHz
speed=75×5 table
Latency(cycles) Latency(seconds) NumFrames Total Latency(cycles) Frame/s
_______________ ________________ _________ _____________________ ________
Network 1.5836e+07 0.10558 "1" "15845325" "9.4665"
____conv1-7x7_s2 1.14e+06 0.0075998 "" "" ""
____pool1-3x3_s2 2.6893e+05 0.0017929 "" "" ""
____pool1-norm1 3.1098e+05 0.0020732 "" "" ""
____conv2-3x3_reduce 2.7874e+05 0.0018583 "" "" ""
____conv2-3x3 8.2374e+05 0.0054916 "" "" ""
____conv2-norm2 9.521e+05 0.0063474 "" "" ""
____pool2-3x3_s2 2.7348e+05 0.0018232 "" "" ""
____inception_3a-1x1 1.9808e+05 0.0013205 "" "" ""
____inception_3a-3x3_reduce 2.8084e+05 0.0018723 "" "" ""
____inception_3a-3x3 1.9641e+05 0.0013094 "" "" ""
____inception_3a-5x5_reduce 73846 0.00049231 "" "" ""
____inception_3a-5x5 35295 0.0002353 "" "" ""
____inception_3a-pool 94554 0.00063036 "" "" ""
____inception_3a-pool_proj 1.1522e+05 0.00076815 "" "" ""
____inception_3b-1x1 6.1994e+05 0.004133 "" "" ""
⋮
The speed table contains the latency information for every layer, total network latency, and the overall network performance in frames per second (FPS). For more information, see Profile Inference Run.
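The headline numbers in the profiler report follow directly from the cycle counts and the 150 MHz clock. The sketch below recomputes them from values copied out of the report; the variable names are hypothetical, and the frames-per-second formula is an inference that is consistent with the reported value rather than a documented definition.

```matlab
% Sketch: recompute the headline numbers of the profiler report.
clockHz       = 150e6;       % DL processor clock reported by the profiler
latencyCycles = 15836394;    % LastFrameLatency(cycles) for the network
totalCycles   = 15845325;    % Total Latency(cycles) for one frame
numFrames     = 1;

latencySeconds = latencyCycles / clockHz;             % about 0.10558 s
framesPerSec   = numFrames * clockHz / totalCycles;   % about 9.4665 frames/s

% Share of the network latency spent in the slowest layer in the report.
conv1Cycles = 1139964;       % conv1-7x7_s2
conv1Share  = conv1Cycles / latencyCycles;

fprintf('latency = %.5f s, throughput = %.4f frames/s\n', latencySeconds, framesPerSec);
fprintf('conv1-7x7_s2 accounts for %.1f%% of the latency\n', 100 * conv1Share);
```

The same per-layer ratio, applied to the other rows, shows where optimization effort would pay off most.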
See Also
dlhdl.Workflow | dlhdl.Target | compile | deploy | predict | dlquantizer | dlquantizationOptions | calibrate | validate