Lane Detection Optimized with GPU Coder
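This example shows how to develop a deep learning lane detection application that runs on NVIDIA® GPUs. The pretrained lane detection network, which is based on the AlexNet network, detects lane marker boundaries in an image and outputs them. The last few layers of the AlexNet network are replaced by a smaller fully connected layer and a regression output layer. The example generates a CUDA executable that runs on a CUDA-enabled GPU on the host machine.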
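Prerequisites
CUDA-enabled NVIDIA GPU.
NVIDIA CUDA toolkit and driver.
NVIDIA cuDNN library.
Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-Party Hardware (GPU Coder). For information on setting up the environment variables, see Setting Up the Prerequisite Products (GPU Coder).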
Verify GPU Environment
Use the coder.checkGpuInstall (GPU Coder) function to verify that the compilers and libraries required to run this example are set up correctly.
envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);
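With Quiet set to 1 the check should run without printing to the command line, so it can help to capture the return value and look for failed checks. The sketch below assumes that coder.checkGpuInstall returns a struct of pass/fail flags. It uses a hand-made stand-in struct with illustrative field names so that it runs without GPU Coder; in practice you would replace the stand-in with results = coder.checkGpuInstall(envCfg).

```matlab
% Hypothetical stand-in for: results = coder.checkGpuInstall(envCfg);
% The field names and values here are illustrative assumptions.
results = struct('gpu',true,'cuda',true,'cudnn',false);

% Collect the names of all checks whose flag is false.
names = fieldnames(results);
passed = cellfun(@(n) logical(results.(n)), names);
failedChecks = names(~passed);

if isempty(failedChecks)
    disp('All environment checks passed.');
else
    fprintf('Failed checks: %s\n', strjoin(failedChecks', ', '));
end
```

Iterating over fieldnames avoids hard-coding which checks the struct contains.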
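Get Pretrained Lane Detection Network
This example uses the trainedLaneNet MAT file, which contains the pretrained lane detection network. The file is approximately 143 MB in size. Download the file from the MathWorks website.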
laneNetFile = matlab.internal.examples.downloadSupportFile('gpucoder/cnn_models/lane_detection', ...
    'trainedLaneNet.mat');
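This network takes an image as input and outputs two lane boundaries that correspond to the left and right lanes of the ego vehicle. Each lane boundary is represented by the parabolic equation y = ax^2 + bx + c, where y is the lateral offset and x is the longitudinal distance from the vehicle. The network outputs the three parameters a, b, and c for each lane. The network architecture is similar to AlexNet, except that the last few layers are replaced by a smaller fully connected layer and a regression output layer.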
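To make the parabolic lane model concrete, the following sketch evaluates one boundary with polyval. The coefficient values are made up for illustration and are not network outputs; the range 3:30 matches the longitudinal points that laneNetPredict uses.

```matlab
% Hypothetical coefficients [a b c] for one lane boundary (illustrative only).
laneCoeffs = [0.001 0.02 1.8];

% Longitudinal distances ahead of the vehicle.
x = 3:30;

% polyval evaluates a*x.^2 + b*x + c, giving the lateral offset y at each x.
y = polyval(laneCoeffs, x);

fprintf('Lateral offset at x = %d: %.3f\n', x(1), y(1));
```

Because polyval expects coefficients in descending powers, the order [a b c] maps directly onto the equation.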
load(laneNetFile);
disp(laneNet)
  SeriesNetwork with properties:

         Layers: [23×1 nnet.cnn.layer.Layer]
     InputNames: {'data'}
    OutputNames: {'output'}
To view the network architecture, use the analyzeNetwork function.
analyzeNetwork(laneNet)
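Download Test Video
To test the model, the example uses a video file from the Caltech lanes dataset. The file is approximately 8 MB in size. Download the file from the MathWorks website.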
videoFile = matlab.internal.examples.downloadSupportFile('gpucoder/media','caltech_cordova1.avi');
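Main Entry-Point Function
The detectLanesInVideo.m file is the main entry-point function for code generation. The detectLanesInVideo function uses the vision.VideoFileReader (Computer Vision Toolbox) System object to read frames from the input video, calls the predict method of the LaneNet network object, and draws the detected lanes on the input video. A vision.DeployableVideoPlayer (Computer Vision Toolbox) System object displays the video output with the detected lanes.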
type detectLanesInVideo.m
function detectLanesInVideo(videoFile,net,laneCoeffMeans,laneCoeffsStds)
% detectLanesInVideo Entry-point function for the Lane Detection Optimized
% with GPU Coder example
%
% detectLanesInVideo(videoFile,net,laneCoeffMeans,laneCoeffsStds) uses the
% VideoFileReader system object to read frames from the input video, calls
% the predict method of the LaneNet network object, and draws the detected
% lanes on the input video. A DeployableVideoPlayer system object is used
% to display the lane detected video output.

% Copyright 2022 The MathWorks, Inc.

%#codegen

%% Create Video Reader and Video Player Object
videoFReader = vision.VideoFileReader(videoFile);
depVideoPlayer = vision.DeployableVideoPlayer(Name='Lane Detection on GPU');

%% Video Frame Processing Loop
while ~isDone(videoFReader)
    videoFrame = videoFReader();
    scaledFrame = 255.*(imresize(videoFrame,[227 227]));

    [laneFound,ltPts,rtPts] = laneNetPredict(net,scaledFrame, ...
        laneCoeffMeans,laneCoeffsStds);
    if(laneFound)
        pts = [reshape(ltPts',1,[]);reshape(rtPts',1,[])];
        videoFrame = insertShape(videoFrame, 'Line', pts, 'LineWidth', 4);
    end
    depVideoPlayer(videoFrame);
end
end
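The reshape calls in the loop turn each N-by-2 matrix of [x y] points into a single row [x1 y1 x2 y2 ...], which is the polyline layout that insertShape accepts for the 'Line' shape, one row per line. A minimal sketch with made-up image points shows the reordering:

```matlab
% Made-up image points for two short boundaries, one [x y] pair per row.
ltPts = [100 400; 120 350; 140 300];
rtPts = [500 400; 480 350; 460 300];

% Transpose first so that reshape, which reads column by column,
% yields x1, y1, x2, y2, ... for each boundary.
pts = [reshape(ltPts',1,[]); reshape(rtPts',1,[])];
disp(pts)
```

Without the transpose, reshape would list all x values first and then all y values, which insertShape would interpret as different vertices.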
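LaneNet Predict Function
The laneNetPredict function computes the left and right lane positions in a single video frame. The laneNet network computes the parameters a, b, and c that describe the parabolic equation for the left and right lane boundaries. From these parameters, the function computes the x and y coordinates that correspond to the lane positions. These coordinates must then be mapped to image coordinates.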
type laneNetPredict.m
function [laneFound,ltPts,rtPts] = laneNetPredict(net,frame,means,stds)
% laneNetPredict Predict lane markers on the input image frame using the
% lane detection network
%
% Copyright 2017-2022 The MathWorks, Inc.

%#codegen

% A persistent object lanenet is used to load the network object. At the
% first call to this function, the persistent object is constructed and
% setup. When the function is called subsequent times, the same object is
% reused to call predict on inputs, thus avoiding reconstructing and
% reloading the network object.
persistent lanenet;

if isempty(lanenet)
    lanenet = coder.loadDeepLearningNetwork(net, 'lanenet');
end

lanecoeffsNetworkOutput = predict(lanenet,frame);

% Recover original coeffs by reversing the normalization steps.
params = lanecoeffsNetworkOutput .* stds + means;

% 'c' should be more than 0.5 for it to be a lane.
isRightLaneFound = abs(params(6)) > 0.5;
isLeftLaneFound = abs(params(3)) > 0.5;

% From the networks output, compute left and right lane points in the image
% coordinates.
vehicleXPoints = 3:30;

ltPts = coder.nullcopy(zeros(28,2,'single'));
rtPts = coder.nullcopy(zeros(28,2,'single'));

if isRightLaneFound && isLeftLaneFound
    rtBoundary = params(4:6);
    rt_y = computeBoundaryModel(rtBoundary, vehicleXPoints);
    ltBoundary = params(1:3);
    lt_y = computeBoundaryModel(ltBoundary, vehicleXPoints);

    % Visualize lane boundaries of the ego vehicle.
    tform = get_tformToImage;

    % Map vehicle to image coordinates.
    ltPts = tform.transformPointsInverse([vehicleXPoints', lt_y']);
    rtPts = tform.transformPointsInverse([vehicleXPoints', rt_y']);
    laneFound = true;
else
    laneFound = false;
end

end

%% Helper Functions

% Compute boundary model.
function yWorld = computeBoundaryModel(model, xWorld)
yWorld = polyval(model, xWorld);
end

% Compute extrinsics.
function tform = get_tformToImage

% The camera coordinates are described by the caltech mono
% camera model.
yaw = 0;
pitch = 14; % Pitch of the camera in degrees
roll = 0;

translation = translationVector(yaw, pitch, roll);
rotation = rotationMatrix(yaw, pitch, roll);

% Construct a camera matrix.
focalLength = [309.4362, 344.2161];
principalPoint = [318.9034, 257.5352];
Skew = 0;

camMatrix = [rotation; translation] * intrinsicMatrix(focalLength, ...
    Skew, principalPoint);

% Turn camMatrix into 2-D homography.
tform2D = [camMatrix(1,:); camMatrix(2,:); camMatrix(4,:)]; % drop Z

tform = projective2d(tform2D);
tform = tform.invert();
end

% Translate to image co-ordinates.
function translation = translationVector(yaw, pitch, roll)
SensorLocation = [0 0];
Height = 2.1798; % mounting height in meters from the ground
rotationMatrix = (...
    rotZ(yaw)*... % last rotation
    rotX(90-pitch)*...
    rotZ(roll)... % first rotation
    );

% Adjust for the SensorLocation by adding a translation.
sl = SensorLocation;
translationInWorldUnits = [sl(2), sl(1), Height];
translation = translationInWorldUnits*rotationMatrix;
end

% Rotation around X-axis.
function R = rotX(a)
a = deg2rad(a);
R = [...
    1 0 0;
    0 cos(a) -sin(a);
    0 sin(a) cos(a)];
end

% Rotation around Y-axis.
function R = rotY(a)
a = deg2rad(a);
R = [...
    cos(a) 0 sin(a);
    0 1 0;
    -sin(a) 0 cos(a)];
end

% Rotation around Z-axis.
function R = rotZ(a)
a = deg2rad(a);
R = [...
    cos(a) -sin(a) 0;
    sin(a) cos(a) 0;
    0 0 1];
end

% Given the Yaw, Pitch, and Roll, determine the appropriate Euler angles
% and the sequence in which they are applied to align the camera's
% coordinate system with the vehicle coordinate system. The resulting
% matrix is a Rotation matrix that together with the Translation vector
% defines the extrinsic parameters of the camera.
function rotation = rotationMatrix(yaw, pitch, roll)
rotation = (...
    rotY(180)*... % last rotation: point Z up
    rotZ(-90)*... % X-Y swap
    rotZ(yaw)*... % point the camera forward
    rotX(90-pitch)*... % "un-pitch"
    rotZ(roll)... % 1st rotation: "un-roll"
    );
end

% Intrinsic matrix computation.
function intrinsicMat = intrinsicMatrix(FocalLength, Skew, PrincipalPoint)
intrinsicMat = ...
    [FocalLength(1) , 0 , 0; ...
    Skew , FocalLength(2) , 0; ...
    PrincipalPoint(1), PrincipalPoint(2), 1];
end
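The denormalization and lane-found test in laneNetPredict can be tried in isolation. The sketch below uses made-up values in place of the network output and the normalization statistics; only the arithmetic and the index layout (elements 1:3 for the left boundary, 4:6 for the right) follow the function above.

```matlab
% Hypothetical normalized network output and normalization statistics:
% [a b c] for the left lane, then [a b c] for the right lane.
networkOutput = single([0.5 -1.0 0.25 -0.5 1.0 -0.25]);
means = single([0 0 1.5 0 0 -1.5]);
stds  = single([0.002 0.05 0.4 0.002 0.05 0.4]);

% Reverse the normalization: scale by the standard deviation, add the mean.
params = networkOutput .* stds + means;

% A boundary counts as found when |c| exceeds 0.5.
isLeftLaneFound  = abs(params(3)) > 0.5;
isRightLaneFound = abs(params(6)) > 0.5;
laneFound = isLeftLaneFound && isRightLaneFound;
```

With these values both c coefficients have magnitude 1.6, so laneFound is true. Lanes are drawn only when both boundaries pass the test.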
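Generate CUDA Executable
To generate a standalone CUDA executable for the detectLanesInVideo entry-point function, create a GPU code configuration object for the 'exe' target and set the target language to C++. Use the coder.DeepLearningConfig (GPU Coder) function to create a cuDNN deep learning configuration object and assign it to the DeepLearningConfig property of the GPU code configuration object.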
cfg = coder.gpuConfig('exe');
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
cfg.GenerateReport = true;
cfg.GenerateExampleMain = "GenerateCodeAndCompile";
cfg.TargetLang = 'C++';
inputs = {coder.Constant(videoFile),coder.Constant(laneNetFile), ...
    coder.Constant(laneCoeffMeans),coder.Constant(laneCoeffsStds)};
Run the codegen command.
codegen -args inputs -config cfg detectLanesInVideo
Code generation successful: View report
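Generated Code Description
The series network is generated as a C++ class that contains an array of 18 layer classes (after layer fusion optimization). The setup() method of the class sets up handles and allocates memory for each layer object. The predict() method invokes prediction for each of the 18 layers in the network.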
class lanenet0_0 {
 public:
  lanenet0_0();
  void setSize();
  void resetState();
  void setup();
  void predict();
  void cleanup();
  float *getLayerOutput(int layerIndex, int portIndex);
  int getLayerOutputSize(int layerIndex, int portIndex);
  float *getInputDataPointer(int b_index);
  float *getInputDataPointer();
  float *getOutputDataPointer(int b_index);
  float *getOutputDataPointer();
  int getBatchSize();
  ~lanenet0_0();

 private:
  void allocate();
  void postsetup();
  void deallocate();

 public:
  boolean_T isInitialized;
  boolean_T matlabCodegenIsDeleted;

 private:
  int numLayers;
  MWTensorBase *inputTensors[1];
  MWTensorBase *outputTensors[1];
  MWCNNLayer *layers[18];
  MWCudnnTarget::MWTargetNetworkImpl *targetImpl;
};
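The cnn_lanenet*_conv*_w and cnn_lanenet*_conv*_b files are the binary weights and bias files for the convolution layers in the network. The cnn_lanenet*_fc*_w and cnn_lanenet*_fc*_b files are the binary weights and bias files for the fully connected layers in the network.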
codegendir = fullfile('codegen', 'exe', 'detectLanesInVideo');
dir([codegendir,filesep,'*.bin'])
cnn_lanenet0_0_conv1_b.bin  cnn_lanenet0_0_conv3_b.bin  cnn_lanenet0_0_conv5_b.bin      cnn_lanenet0_0_fc6_b.bin      cnn_lanenet0_0_fcLane2_b.bin
cnn_lanenet0_0_conv1_w.bin  cnn_lanenet0_0_conv3_w.bin  cnn_lanenet0_0_conv5_w.bin      cnn_lanenet0_0_fc6_w.bin      cnn_lanenet0_0_fcLane2_w.bin
cnn_lanenet0_0_conv2_b.bin  cnn_lanenet0_0_conv4_b.bin  cnn_lanenet0_0_data_offset.bin  cnn_lanenet0_0_fcLane1_b.bin  networkParamsInfo_lanenet0_0.bin
cnn_lanenet0_0_conv2_w.bin  cnn_lanenet0_0_conv4_w.bin  cnn_lanenet0_0_data_scale.bin   cnn_lanenet0_0_fcLane1_w.bin
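Because the executable loads these parameter files at run time, it can be useful to know how much space they take up. The following sketch, which uses only the dir function and assumes the same codegen folder layout as above, sums the file sizes:

```matlab
% Same folder as in the listing above.
codegendir = fullfile('codegen','exe','detectLanesInVideo');
binFiles = dir(fullfile(codegendir,'*.bin'));

% [binFiles.bytes] is empty if nothing has been generated yet, so the sum is 0.
totalMB = sum([binFiles.bytes]) / 2^20;
fprintf('%d parameter files, %.1f MB in total\n', numel(binFiles), totalMB);
```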
Run the Executable
To run the executable, uncomment the following lines of code.
if ispc
    [status,cmdout] = system("detectLanesInVideo.exe");
else
    [status,cmdout] = system("./detectLanesInVideo");
end