Traffic Sign Detection and Recognition

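This example shows how to generate CUDA® MEX code for a traffic sign detection and recognition application that uses deep learning. Traffic sign detection and recognition is an important application for driver assistance systems, because it gives the driver information about road signs.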

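This traffic sign detection and recognition example performs three steps: detection, non-maximal suppression (NMS), and recognition. First, the example detects the traffic signs in an input image by using an object detection network that is a variant of the You Only Look Once (YOLO) network. Then, the NMS algorithm suppresses overlapping detections. Finally, the recognition network classifies the detected traffic signs.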

Prerequisites

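  • CUDA-enabled NVIDIA® GPU with compute capability 3.2 or higher.

  • NVIDIA CUDA toolkit and driver.

  • NVIDIA cuDNN library.

  • Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-party Products (GPU Coder). For information on setting up the environment variables, see Setting Up the Prerequisite Products (GPU Coder).

  • GPU Coder Interface for Deep Learning Libraries support package. To install this support package, use the Add-On Explorer.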

Verify GPU Environment

Use the coder.checkGpuInstall function to verify that the compilers and libraries needed to run this example are set up correctly.

envCfg = coder.gpuEnvConfig('host');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 1;
coder.checkGpuInstall(envCfg);

Detection and Recognition Networks

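The detection network is trained in the Darknet framework and imported into MATLAB® for inference. Because traffic signs are small relative to the image and the training data contains few samples per class, all traffic signs are treated as a single class when training the detection network.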

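The detection network divides the input image into a 7-by-7 grid. A grid cell detects a traffic sign if the center of the traffic sign falls within that cell. Each cell predicts two bounding boxes and a confidence score for each of these boxes. The confidence score indicates whether the box contains an object. Each cell also predicts the probability of finding a traffic sign in the grid cell. The final score is the product of the confidence score and this probability. A threshold of 0.2 is applied to the final score to select the detections.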
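
The scoring described above can be sketched in a few lines of MATLAB. The grid size, the number of boxes per cell, the single class, and the 0.2 threshold come from the text; the confidence and probability values are made-up numbers for one cell, not network outputs. Under this layout the output vector has 539 elements, which agrees with the size of the final fully connected layer in the layer listing.

```matlab
% Layout of the detection output for a 7x7 grid, 2 boxes per cell, 1 class.
side    = 7;      % grid cells per image side
num     = 2;      % bounding boxes predicted per cell
classes = 1;      % all traffic signs are treated as one class
thresh  = 0.2;    % detection threshold on the final score

nClassProbs  = side*side*classes;   % one class probability per cell
nConfidences = side*side*num;       % one confidence score per box
nBoxCoords   = side*side*num*4;     % x, y, w, h for each box
outputLength = nClassProbs + nConfidences + nBoxCoords;

% Hypothetical values for a single cell (illustrative only).
boxConfidence = [0.80 0.30];   % confidence of the two boxes in the cell
classProb     = 0.60;          % probability that the cell contains a sign

finalScore  = boxConfidence * classProb;   % product of the two scores
isDetection = finalScore > thresh;         % keep boxes above the threshold

fprintf('output length = %d\n', outputLength);
disp(finalScore);
disp(isDetection);
```

With these assumed values only the first box survives the threshold, because its final score is 0.48 while the second box scores 0.18.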

The recognition network is trained on the same images by using MATLAB.

The trainRecognitionnet.m helper script shows how the recognition network is trained.

Get the Pretrained SeriesNetwork

Download the detection and recognition networks.

getTsdr();

The detection network contains 58 layers, including convolution, leaky ReLU, and fully connected layers.

load('yolo_tsr.mat');
yolo.Layers
ans = 

  58x1 Layer array with layers:

     1   'input'         Image Input             448x448x3 images
     2   'conv1'         Convolution             64 7x7x3 convolutions with stride [2  2] and padding [3  3  3  3]
     3   'relu1'         Leaky ReLU              Leaky ReLU with scale 0.1
     4   'pool1'         Max Pooling             2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     5   'conv2'         Convolution             192 3x3x64 convolutions with stride [1  1] and padding [1  1  1  1]
     6   'relu2'         Leaky ReLU              Leaky ReLU with scale 0.1
     7   'pool2'         Max Pooling             2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     8   'conv3'         Convolution             128 1x1x192 convolutions with stride [1  1] and padding [0  0  0  0]
     9   'relu3'         Leaky ReLU              Leaky ReLU with scale 0.1
    10   'conv4'         Convolution             256 3x3x128 convolutions with stride [1  1] and padding [1  1  1  1]
    11   'relu4'         Leaky ReLU              Leaky ReLU with scale 0.1
    12   'conv5'         Convolution             256 1x1x256 convolutions with stride [1  1] and padding [0  0  0  0]
    13   'relu5'         Leaky ReLU              Leaky ReLU with scale 0.1
    14   'conv6'         Convolution             512 3x3x256 convolutions with stride [1  1] and padding [1  1  1  1]
    15   'relu6'         Leaky ReLU              Leaky ReLU with scale 0.1
    16   'pool6'         Max Pooling             2x2 max pooling with stride [2  2] and padding [0  0  0  0]
    17   'conv7'         Convolution             256 1x1x512 convolutions with stride [1  1] and padding [0  0  0  0]
    18   'relu7'         Leaky ReLU              Leaky ReLU with scale 0.1
    19   'conv8'         Convolution             512 3x3x256 convolutions with stride [1  1] and padding [1  1  1  1]
    20   'relu8'         Leaky ReLU              Leaky ReLU with scale 0.1
    21   'conv9'         Convolution             256 1x1x512 convolutions with stride [1  1] and padding [0  0  0  0]
    22   'relu9'         Leaky ReLU              Leaky ReLU with scale 0.1
    23   'conv10'        Convolution             512 3x3x256 convolutions with stride [1  1] and padding [1  1  1  1]
    24   'relu10'        Leaky ReLU              Leaky ReLU with scale 0.1
    25   'conv11'        Convolution             256 1x1x512 convolutions with stride [1  1] and padding [0  0  0  0]
    26   'relu11'        Leaky ReLU              Leaky ReLU with scale 0.1
    27   'conv12'        Convolution             512 3x3x256 convolutions with stride [1  1] and padding [1  1  1  1]
    28   'relu12'        Leaky ReLU              Leaky ReLU with scale 0.1
    29   'conv13'        Convolution             256 1x1x512 convolutions with stride [1  1] and padding [0  0  0  0]
    30   'relu13'        Leaky ReLU              Leaky ReLU with scale 0.1
    31   'conv14'        Convolution             512 3x3x256 convolutions with stride [1  1] and padding [1  1  1  1]
    32   'relu14'        Leaky ReLU              Leaky ReLU with scale 0.1
    33   'conv15'        Convolution             512 1x1x512 convolutions with stride [1  1] and padding [0  0  0  0]
    34   'relu15'        Leaky ReLU              Leaky ReLU with scale 0.1
    35   'conv16'        Convolution             1024 3x3x512 convolutions with stride [1  1] and padding [1  1  1  1]
    36   'relu16'        Leaky ReLU              Leaky ReLU with scale 0.1
    37   'pool16'        Max Pooling             2x2 max pooling with stride [2  2] and padding [0  0  0  0]
    38   'conv17'        Convolution             512 1x1x1024 convolutions with stride [1  1] and padding [0  0  0  0]
    39   'relu17'        Leaky ReLU              Leaky ReLU with scale 0.1
    40   'conv18'        Convolution             1024 3x3x512 convolutions with stride [1  1] and padding [1  1  1  1]
    41   'relu18'        Leaky ReLU              Leaky ReLU with scale 0.1
    42   'conv19'        Convolution             512 1x1x1024 convolutions with stride [1  1] and padding [0  0  0  0]
    43   'relu19'        Leaky ReLU              Leaky ReLU with scale 0.1
    44   'conv20'        Convolution             1024 3x3x512 convolutions with stride [1  1] and padding [1  1  1  1]
    45   'relu20'        Leaky ReLU              Leaky ReLU with scale 0.1
    46   'conv21'        Convolution             1024 3x3x1024 convolutions with stride [1  1] and padding [1  1  1  1]
    47   'relu21'        Leaky ReLU              Leaky ReLU with scale 0.1
    48   'conv22'        Convolution             1024 3x3x1024 convolutions with stride [2  2] and padding [1  1  1  1]
    49   'relu22'        Leaky ReLU              Leaky ReLU with scale 0.1
    50   'conv23'        Convolution             1024 3x3x1024 convolutions with stride [1  1] and padding [1  1  1  1]
    51   'relu23'        Leaky ReLU              Leaky ReLU with scale 0.1
    52   'conv24'        Convolution             1024 3x3x1024 convolutions with stride [1  1] and padding [1  1  1  1]
    53   'relu24'        Leaky ReLU              Leaky ReLU with scale 0.1
    54   'fc25'          Fully Connected         4096 fully connected layer
    55   'relu25'        Leaky ReLU              Leaky ReLU with scale 0.1
    56   'fc26'          Fully Connected         539 fully connected layer
    57   'softmax'       Softmax                 softmax
    58   'classoutput'   Classification Output   crossentropyex
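
The 7-by-7 grid can be traced through the layer listing above. The sketch below applies the usual output-size formula for convolution and pooling layers to the six layers that have stride 2. The remaining layers leave the spatial size unchanged, because they are either 3x3 convolutions with stride 1 and padding 1 or 1x1 convolutions with stride 1 and no padding. The helper name outSize is introduced here for illustration and is not part of the example files.

```matlab
% Track the spatial size of the feature map through the stride-2 layers.
% Output size of a conv/pool layer: floor((in + 2*pad - kernel)/stride) + 1
outSize = @(in, kernel, stride, pad) floor((in + 2*pad - kernel)/stride) + 1;

sz = 448;                     % side length of the input image
sz = outSize(sz, 7, 2, 3);    % conv1:  7x7, stride 2, padding 3
sz = outSize(sz, 2, 2, 0);    % pool1:  2x2, stride 2
sz = outSize(sz, 2, 2, 0);    % pool2:  2x2, stride 2
sz = outSize(sz, 2, 2, 0);    % pool6:  2x2, stride 2
sz = outSize(sz, 2, 2, 0);    % pool16: 2x2, stride 2
sz = outSize(sz, 3, 2, 1);    % conv22: 3x3, stride 2, padding 1

fprintf('final feature map: %d x %d\n', sz, sz);
```

The result is a 7x7 feature map, matching the grid that the detection network uses.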

The recognition network contains 14 layers, including convolution, fully connected, and classification output layers.

load('RecognitionNet.mat');
convnet.Layers
ans = 

  14x1 Layer array with layers:

     1   'imageinput'    Image Input             48x48x3 images with 'zerocenter' normalization and 'randfliplr' augmentations
     2   'conv_1'        Convolution             100 7x7x3 convolutions with stride [1  1] and padding [0  0  0  0]
     3   'relu_1'        ReLU                    ReLU
     4   'maxpool_1'     Max Pooling             2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     5   'conv_2'        Convolution             150 4x4x100 convolutions with stride [1  1] and padding [0  0  0  0]
     6   'relu_2'        ReLU                    ReLU
     7   'maxpool_2'     Max Pooling             2x2 max pooling with stride [2  2] and padding [0  0  0  0]
     8   'conv_3'        Convolution             250 4x4x150 convolutions with stride [1  1] and padding [0  0  0  0]
     9   'maxpool_3'     Max Pooling             2x2 max pooling with stride [2  2] and padding [0  0  0  0]
    10   'fc_1'          Fully Connected         300 fully connected layer
    11   'dropout'       Dropout                 90% dropout
    12   'fc_2'          Fully Connected         35 fully connected layer
    13   'softmax'       Softmax                 softmax
    14   'classoutput'   Classification Output   crossentropyex with '0' and 34 other classes

The tsdr_predict Entry-Point Function

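The tsdr_predict.m entry-point function takes an image as input and detects the traffic signs in the image by using the detection network. The function suppresses overlapping detections (NMS) by using selectStrongestBbox and recognizes the traffic signs by using the recognition network. The function loads the network object from yolo_tsr.mat into the persistent variable detectionnet and the network object from RecognitionNet.mat into the persistent variable recognitionnet. The function reuses these persistent objects on subsequent calls.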

type('tsdr_predict.m')
function [selectedBbox,idx] = tsdr_predict(img)
%#codegen

% This function detects the traffic signs in the image using Detection Network
% (modified version of Yolo) and recognizes(classifies) using Recognition Network
%
% Inputs :
%
% img           : Input test image
%
% Outputs :
%
% selectedBbox  : Detected bounding boxes 
% idx           : Corresponding classes

% Copyright 2017-2019 The MathWorks, Inc.

coder.gpu.kernelfun;

% resize the image
img_rz = imresize(img,[448,448]);

% Converting into BGR format
img_rz = img_rz(:,:,3:-1:1);
img_rz = im2single(img_rz);

%% TSD
persistent detectionnet;
if isempty(detectionnet)   
    detectionnet = coder.loadDeepLearningNetwork('yolo_tsr.mat','Detection');
end

predictions = detectionnet.activations(img_rz,56,'OutputAs','channels');


%% Convert predictions to bounding box attributes
classes = 1;
num = 2;
side = 7;
thresh = 0.2;
[h,w,~] = size(img);


boxes = single(zeros(0,4));    
probs = single(zeros(0,1));    
for i = 0:(side*side)-1
    for n = 0:num-1
        p_index = side*side*classes + i*num + n + 1;
        scale = predictions(p_index);       
        prob = zeros(1,classes+1);
        for j = 0:classes
            class_index = i*classes + 1;
            tempProb = scale*predictions(class_index+j);
            if tempProb > thresh
                
                row = floor(i / side);
                col = mod(i,side);
                
                box_index = side*side*(classes + num) + (i*num + n)*4 + 1;
                bxX = (predictions(box_index + 0) + col) / side;
                bxY = (predictions(box_index + 1) + row) / side;
                
                bxW = (predictions(box_index + 2)^2);
                bxH = (predictions(box_index + 3)^2);
                
                prob(j+1) = tempProb;
                probs = [probs;tempProb];
                                
                boxX = (bxX-bxW/2)*w+1;
                boxY = (bxY-bxH/2)*h+1;
                boxW = bxW*w;
                boxH = bxH*h;
                boxes = [boxes; boxX,boxY,boxW,boxH];
            end
        end
    end
end

%% Run Non-Maximal Suppression on the detected bounding boxes
coder.varsize('selectedBbox',[98, 4],[1 0]);
[selectedBbox,~] = selectStrongestBbox(round(boxes),probs);

%% Recognition

persistent recognitionnet;
if isempty(recognitionnet) 
    recognitionnet = coder.loadDeepLearningNetwork('RecognitionNet.mat','Recognition');
end

idx = zeros(size(selectedBbox,1),1);
inpImg = coder.nullcopy(zeros(48,48,3,size(selectedBbox,1)));
for i = 1:size(selectedBbox,1)
    
    ymin = selectedBbox(i,2);
    ymax = ymin+selectedBbox(i,4);
    xmin = selectedBbox(i,1);
    xmax = xmin+selectedBbox(i,3);

    
    % Resize Image
    inpImg(:,:,:,i) = imresize(img(ymin:ymax,xmin:xmax,:),[48,48]);
end

for i = 1:size(selectedBbox,1)
    output = recognitionnet.predict(inpImg(:,:,:,i));
    [~,idx(i)]=max(output);
end
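
To make the NMS step more concrete, here is a minimal greedy suppression sketch in plain MATLAB. It is not the implementation of selectStrongestBbox; it only illustrates the idea of keeping the highest-scoring box and discarding boxes that overlap it too much. The boxes, scores, and the 0.5 overlap threshold are assumptions chosen for illustration, and overlap is measured as intersection over union.

```matlab
% Greedy non-maximum suppression on [x y w h] boxes (illustrative sketch).
boxes  = [ 10 10 50 50;    % hypothetical detection
           12 12 50 50;    % overlaps the first box heavily
          200 80 40 40];   % separate sign elsewhere in the image
scores = [0.9; 0.6; 0.7];
overlapThreshold = 0.5;    % assumed intersection-over-union limit

[~, order] = sort(scores, 'descend');
keep       = false(size(scores));
suppressed = false(size(scores));
for k = 1:numel(order)
    i = order(k);
    if suppressed(i)
        continue;          % already removed by a stronger box
    end
    keep(i) = true;
    for m = k+1:numel(order)
        j = order(m);
        % Width and height of the intersection rectangle of boxes i and j
        iw = min(boxes(i,1)+boxes(i,3), boxes(j,1)+boxes(j,3)) - max(boxes(i,1), boxes(j,1));
        ih = min(boxes(i,2)+boxes(i,4), boxes(j,2)+boxes(j,4)) - max(boxes(i,2), boxes(j,2));
        interArea = max(iw,0) * max(ih,0);
        unionArea = boxes(i,3)*boxes(i,4) + boxes(j,3)*boxes(j,4) - interArea;
        if interArea/unionArea > overlapThreshold
            suppressed(j) = true;
        end
    end
end
selected = boxes(keep,:);
disp(selected);
```

In this example the second box is suppressed because it overlaps the stronger first box, while the third box is kept because it does not overlap either of the others.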

Generate CUDA MEX for the tsdr_predict Function

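Create a GPU configuration object for a MEX target and set the target language to C++. Use the coder.DeepLearningConfig function to create a cuDNN deep learning configuration object and assign it to the DeepLearningConfig property of the GPU code configuration object. To generate CUDA MEX, use the codegen command and specify an input of size [480,704,3]. This value corresponds to the input image size of the tsdr_predict function.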

cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
codegen -config cfg tsdr_predict -args {ones(480,704,3,'uint8')} -report
Code generation successful: To view the report, open('codegen/mex/tsdr_predict/html/report.mldatx').

To generate code by using TensorRT, pass coder.DeepLearningConfig('tensorrt') to the coder configuration object instead of 'cudnn'.
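
As a sketch, the TensorRT variant differs from the cuDNN configuration shown earlier only in the argument passed to coder.DeepLearningConfig. It assumes that the TensorRT library is installed and that its environment variables are set up; this setup is not covered in this example.

```matlab
% Same configuration as the cuDNN case, but targeting TensorRT.
% Assumption: TensorRT is installed and visible to GPU Coder.
cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
codegen -config cfg tsdr_predict -args {ones(480,704,3,'uint8')} -report
```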

Run the Generated MEX

Load an input image.

im = imread('stop.jpg');
imshow(im);

Call tsdr_predict_mex on the input image.

im = imresize(im, [480,704]);
[bboxes,classes] = tsdr_predict_mex(im);

Map the class numbers to traffic sign names in the class dictionary.

classNames = {'addedLane','slow','dip','speedLimit25','speedLimit35','speedLimit40','speedLimit45',...
    'speedLimit50','speedLimit55','speedLimit65','speedLimitUrdbl','doNotPass','intersection',...
    'keepRight','laneEnds','merge','noLeftTurn','noRightTurn','stop','pedestrianCrossing',...
    'stopAhead','rampSpeedAdvisory20','rampSpeedAdvisory45','truckSpeedLimit55',...
    'rampSpeedAdvisory50','turnLeft','rampSpeedAdvisoryUrdbl','turnRight','rightLaneMustTurn',...
    'yield','yieldAhead','school','schoolSpeedLimit25','zoneAhead45','signalAhead'};

classRec = classNames(classes);

Display the detected traffic signs.

outputImage = insertShape(im,'Rectangle',bboxes,'LineWidth',3);

for i = 1:size(bboxes,1)
    outputImage = insertText(outputImage,[bboxes(i,1)+bboxes(i,3) bboxes(i,2)-20],classRec{i},'FontSize',20,'TextColor','red');
end

imshow(outputImage);

Traffic Sign Detection and Recognition on a Video

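The included helper file tsdr_testVideo.m grabs frames from the test video, performs traffic sign detection and recognition, and plots the results on each frame of the test video.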

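% Input video
v = VideoReader('stop.avi');
fps = 0;
while hasFrame(v)
    % Take a frame
    picture = readFrame(v);
    picture = imresize(picture,[920,1632]);
    % Call MEX function for Traffic Sign Detection and Recognition.
    % Note: the MEX must have been generated for this frame size; the
    % codegen command shown earlier uses a fixed [480,704,3] input.
    tic;
    [bboxes,classes] = tsdr_predict_mex(picture);
    newt = toc;
    % Smoothed frames per second (exponential moving average)
    fps = .9*fps + .1*(1/newt);
    % Display the detections
    displayDetections(picture,bboxes,classes,fps);
end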
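
The frame-rate update in the loop is an exponential moving average: each new measurement contributes 10% and the previous estimate 90%. The sketch below shows how the estimate behaves when it starts at 0, as in the helper file. The constant frame time of 0.05 s and the frame count are assumptions chosen for illustration.

```matlab
% Exponential moving average of the frame rate: fps = .9*fps + .1*(1/newt)
frameTime = 0.05;             % hypothetical constant time per frame (20 fps)
nFrames   = 50;               % hypothetical number of processed frames
fps       = 0;                % same starting value as in the video loop
history   = zeros(nFrames,1);
for k = 1:nFrames
    fps = 0.9*fps + 0.1*(1/frameTime);
    history(k) = fps;         % smoothed estimate after frame k
end
fprintf('after 1 frame:   %.2f fps\n', history(1));
fprintf('after %d frames: %.2f fps\n', nFrames, history(end));
```

Because the estimate starts at 0, the displayed value underestimates the true rate for the first few dozen frames and approaches it from below. The smoothing keeps the readout from jumping with every frame.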

Clear the static network objects that were loaded into memory.

clear mex;
