Accelerate Vehicle Detection with SIMD
This example shows how to perform automatic detection and tracking of vehicles. You can generate SIMD code using Intel™ AVX2 technology to increase the number of frames per second in the video. A higher frame rate improves the quality and speed of the detection and tracking system.
Detect Vehicles Using ACF Vehicle Detector
Create an acfObjectDetector
(Computer Vision Toolbox) object for detecting vehicles.
detector = vehicleDetectorACF('full-view');
To support code generation, the vehicle detector object must be in the form of a structure. Use the toStruct
(Computer Vision Toolbox) function to create a structure that stores the properties of the input vehicle detector object in the Classifier
and TrainingOptions
fields.
sModel = toStruct(detector);
Save the structure and a detection threshold value specified as detectionThresh
to a .mat
file.
detectionThresh = 17;
save('model.mat','sModel','detectionThresh');
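Before generating code, it can help to confirm that the MAT-file holds the variables and fields that the entry-point function reads. The following is a minimal sketch, not part of the original example. It writes a stand-in structure to a temporary file so that it runs without the toolboxes; the file name model_check.mat and the empty placeholder fields are hypothetical. In the actual workflow, you would inspect model.mat instead.

```matlab
% Stand-in values (hypothetical). In the example, sModel comes from
% toStruct(detector) and holds the real classifier data.
sModel = struct('Classifier', struct(), 'TrainingOptions', struct());
detectionThresh = 17;

% Save to a temporary file so the real model.mat is not overwritten
matFile = fullfile(tempdir, 'model_check.mat');
save(matFile, 'sModel', 'detectionThresh');

% List the variables stored in the file without loading them
vars = who('-file', matFile);
assert(all(ismember({'sModel','detectionThresh'}, vars)));

% Load the file and check the fields that vehicleDetection accesses
s = load(matFile);
assert(all(isfield(s.sModel, {'Classifier','TrainingOptions'})));
```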
Examine vehicleDetection
Entry-Point Function
The vehicleDetection.m
file is the main entry-point function for code generation. The vehicleDetection
function loads the model.mat
file that you just created and recreates an acfObjectDetector
object to detect vehicles within the input video. The input video file is from the Caltech lanes data set and ships with Automated Driving Toolbox™.
The vehicleDetection
function uses the vision.VideoFileReader
(Computer Vision Toolbox) system object to read frames from the input video and the vision.DeployableVideoPlayer
(Computer Vision Toolbox) system object to display the vehicle detection video output. The function draws boxes around detected vehicles and displays the frame rate in the corner of the output video and in the MATLAB™ command window.
type vehicleDetection.m
function vehicleDetection()
model = coder.load('model.mat');
thresh = model.detectionThresh;
detector = acfObjectDetector(model.sModel.Classifier,model.sModel.TrainingOptions);

% Set up system objects to read a video file
videoFReader = vision.VideoFileReader('caltech_cordova1.avi');
% Use deployable video player to show result
depVideoPlayer = vision.DeployableVideoPlayer;

totalTime=0;
nFrames=0;
maxNumBoxes=30;

% Continue to read frames of video until the last frame is read
while ~isDone(videoFReader)
    boundedBoxes=zeros(maxNumBoxes,4,'int32');
    videoFrame = videoFReader();
    tic;
    [boxes_raw, scores] = detect(detector,videoFrame);
    time = toc;
    % Filter out low confident detections
    boxes = boxes_raw(scores>thresh, :);
    % Draw boxes around detected vehicles in frame
    nBoxes=size(boxes,1);
    if(nBoxes>0 && nBoxes<=maxNumBoxes)
        boundedBoxes(1:nBoxes,1:4)= int32(boxes(1:nBoxes,1:4));
        videoFrame = insertShape(videoFrame,'Rectangle',int32(boundedBoxes));
    end
    totalTime=totalTime+time;
    nFrames=nFrames+1;
    % Print frames per second to the frame corner
    frameRate = nFrames/totalTime;
    videoFrame = insertText(videoFrame, [20 20], sprintf('%0.2f FPS', frameRate), 'AnchorPoint', 'LeftBottom');
    depVideoPlayer(videoFrame);
end

% Release system objects
release(videoFReader);
release(depVideoPlayer);

% Print frames per second to command window
avgTime = totalTime/nFrames;
fprintf('Average time = %g \n', avgTime);
fprintf('Average frame rate = %g \n', 1/avgTime);
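The filtering step in the function keeps only the detections whose score exceeds the threshold, using logical indexing on the rows of the box matrix. The following sketch isolates that step. The boxes and scores are made-up values for illustration, not detector output.

```matlab
% Hypothetical detections: each row is [x y width height]
boxes_raw = [ 10  20 50 40;
             100 120 60 45;
             200  50 30 30];
scores = [25.3; 9.1; 17.0];   % hypothetical confidence scores
thresh = 17;                  % same threshold value the example saves

% Logical indexing keeps the rows whose score is strictly greater than
% the threshold, so a score equal to 17 is discarded
keep = scores > thresh;
boxes = boxes_raw(keep, :);
disp(boxes)
```

Because the comparison is strict, only the first row survives here. A lower threshold keeps more detections at the cost of more false positives.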
Configure Code Generation Configuration Object
To generate a standalone executable for the vehicleDetection
entry-point function, use the coder.config
function to create a coder.EmbeddedCodeConfig
object for an exe
target. This object contains the configuration parameters that the codegen
function uses for generating an executable program with Embedded Coder™.
ecfg = coder.config('exe');
Specify an example main C function that the code generator compiles to create a test executable.
ecfg.GenerateExampleMain = 'GenerateCodeAndCompile';
Optimize the build for faster running executables.
ecfg.BuildConfiguration = 'Faster Runs';
Specify that the code generator does not produce code to handle integer overflow and produces code to support nonfinite values (Inf
and NaN
) only if they are used.
ecfg.SaturateOnIntegerOverflow = false;
ecfg.SupportNonFinite = true;
Allocate memory dynamically on the heap for variable-size arrays whose size (in bytes) is greater than or equal to the value of the DynamicMemoryAllocationThreshold
parameter.
ecfg.DynamicMemoryAllocation = 'Threshold';
ecfg.DynamicMemoryAllocationThreshold = 2e8;
Because this example generates code from Automated Driving Toolbox™ and Computer Vision Toolbox™ functions, the generated code must be portable and not rely on third-party libraries. To generate portable code that you can retarget for an Intel device, create a coder.HardwareImplementation
object and specify a nonhost target. Then, configure the production hardware settings to match those of an Intel device.
ecfg.HardwareImplementation.ProdHWDeviceType = "Generic->Custom";
ecfg.HardwareImplementation.ProdBitPerLong = 64;
ecfg.HardwareImplementation.ProdBitPerPointer = 64;
ecfg.HardwareImplementation.ProdBitPerPtrDiffT = 64;
ecfg.HardwareImplementation.ProdBitPerSizeT = 64;
ecfg.HardwareImplementation.ProdEndianess = "LittleEndian";
ecfg.HardwareImplementation.ProdIntDivRoundTo = "Zero";
ecfg.HardwareImplementation.ProdLargestAtomicFloat = "Float";
ecfg.HardwareImplementation.ProdWordSize = 64;
For some Image Processing Toolbox™ functions, the code generator uses the OpenMP application interface to support shared-memory, multicore code generation. To achieve the highest frame rate and avoid inefficiencies due to the processor trying to use too many threads, consider specifying a maximum number of threads to run parallel for
-loops in the generated code. To do so, set the OpenMP environment variable, OMP_NUM_THREADS
, to a number less than or equal to the number of cores in your processor. For more information, see OpenMP Specifications. This example sets this variable to 4.
setenv('OMP_NUM_THREADS','4')
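If the machine might have fewer than four cores, one option is to cap the thread count at the core count instead of hard-coding it. The following is a minimal sketch under one assumption: that maxNumCompThreads, which reports the maximum number of computational threads MATLAB uses, is an adequate proxy for the number of cores. That holds by default but not if the value was changed earlier in the session.

```matlab
% Assumption: maxNumCompThreads approximates the number of physical cores
nCores = maxNumCompThreads;

% Cap at 4, the value this example uses, without exceeding the core count
nThreads = min(4, nCores);

% Programs launched from MATLAB with ! inherit this environment variable
setenv('OMP_NUM_THREADS', num2str(nThreads));
fprintf('OMP_NUM_THREADS = %s\n', getenv('OMP_NUM_THREADS'));
```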
Generate Non-SIMD Code
evalc('codegen -config ecfg vehicleDetection.m');
Run the executable and observe the frame rate at the top left of the video and in the command window. This example runs the executable on Windows. To run the executable on Linux, change the command to !./vehicleDetection
.
!vehicleDetection.exe
Average time = 0.0763511
Average frame rate = 13.0974
Generate SIMD Code
Configure the code generation configuration object to generate SIMD code using AVX2 technology.
ecfg.InstructionSetExtensions = "AVX2";
Generate code.
evalc('codegen -config ecfg vehicleDetection.m');
Run the executable and observe the higher frame rate at the top left of the video and in the command window.
!vehicleDetection.exe
Average time = 0.0632882
Average frame rate = 15.8007
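To quantify the improvement, compare the two average times reported above. The following sketch computes the speedup from those printed values. The numbers apply only to the machine that produced them, and your results will differ.

```matlab
% Average per-frame detection times reported by the two executables
tNonSimd = 0.0763511;   % seconds, non-SIMD build
tSimd    = 0.0632882;   % seconds, AVX2 build

% Speedup is the ratio of the times, which equals the ratio of frame rates
speedup = tNonSimd / tSimd;
fprintf('Speedup = %.2fx (%.1f%% higher frame rate)\n', ...
    speedup, (speedup - 1)*100);
```

For these values, the AVX2 build runs about 1.21 times as fast, roughly a 20.6% increase in frame rate.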