Main Content

voxelRCNNObjectDetector

Create Voxel R-CNN object detector

Since R2024b

    Description

    The voxelRCNNObjectDetector object creates a voxel region-based convolutional neural network (Voxel R-CNN) to detect 3-D objects in a lidar point cloud. Using this object, you can:

    • Create a pretrained 3-D object detector by using a Voxel R-CNN deep learning network trained on the KITTI or PandaSet data set.

    • Detect 3-D objects in a lidar point cloud by using the detect object function.

    • If you have training data, you can create an untrained voxelRCNNObjectDetector object and use the trainVoxelRCNNObjectDetector function to train the network. Using this function, you can also perform transfer learning to retrain a pretrained network.

    Creation

    Description

    detector = voxelRCNNObjectDetector creates a pretrained Voxel R-CNN object detector by using a Voxel R-CNN deep learning network trained on the PandaSet data set.

    To use this object, your system must have a CUDA®-enabled NVIDIA® GPU. For information on the supported compute capabilities, see GPU Computing Requirements (Parallel Computing Toolbox).


    detector = voxelRCNNObjectDetector(weights) creates a pretrained Voxel R-CNN object detector by using a Voxel R-CNN deep learning network trained on the data set specified by weights. Specify weights as "kitti" to use a Voxel R-CNN network trained on the KITTI data set, or as "pandaset" to use a Voxel R-CNN network trained on the PandaSet point cloud data set.

    detector = voxelRCNNObjectDetector(weights,ModelName=modelName) additionally sets the ModelName property of the Voxel R-CNN object detector.

    detector = voxelRCNNObjectDetector(weights,classNames,anchorBoxes) creates a pretrained or untrained Voxel R-CNN network using a specified set of classes and anchor boxes.

    If you specify weights as "kitti" or "pandaset", the function creates a pretrained network and configures it to perform transfer learning using the specified set of classes. For optimal results, train the detector on new training data before performing detection. Use the trainVoxelRCNNObjectDetector function to train the detector.

    If you specify weights as "none", the function creates an untrained Voxel R-CNN object detector. Use the trainVoxelRCNNObjectDetector function to train the detector before performing object detection.

    detector = voxelRCNNObjectDetector(weights,classNames,anchorBoxes,Name=Value) sets the ModelName, PointCloudRange, and VoxelSize properties by using one or more name-value arguments. For example, voxelRCNNObjectDetector("kitti",classNames,anchorBoxes,ModelName="objectDetector") creates a voxelRCNNObjectDetector object with the model name "objectDetector".
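As a sketch, the transfer-learning and from-scratch syntaxes can be exercised as follows. The class name, anchor box values, and model name here are illustrative choices, not values prescribed by this page.

```matlab
% Sketch: configure a pretrained KITTI network for transfer learning on one
% custom class. The class name and anchor box values are illustrative.
classNames = "car";
anchorBoxes = {[3.9 1.6 1.56 -1.78 0; 3.9 1.6 1.56 -1.78 90]};
detector = voxelRCNNObjectDetector("kitti",classNames,anchorBoxes, ...
    ModelName="carDetector");

% Specify "none" to start from an untrained network instead; train it with
% trainVoxelRCNNObjectDetector before calling detect.
untrained = voxelRCNNObjectDetector("none",classNames,anchorBoxes);
```

In both cases, train the detector with trainVoxelRCNNObjectDetector before performing detection.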

    Note

    This functionality requires Deep Learning Toolbox™, Parallel Computing Toolbox™, Lidar Toolbox™, and the Lidar Toolbox Interface for OpenPCDet Library support package. You can download and install the Lidar Toolbox Interface for OpenPCDet Library from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons.

    Input Arguments


    weights — Name of the data set for the pretrained network, specified as one of these options.

    • "pandaset" — Creates a voxelRCNNObjectDetector object using a pretrained Voxel R-CNN deep learning network trained on sequences using Pandar64 sensor data from the PandaSet data set.

    • "kitti" — Creates a voxelRCNNObjectDetector object using a pretrained Voxel R-CNN network trained on sequences using Velodyne® HDL-64 sensor data from the KITTI data set.

    • "none" — Creates a voxelRCNNObjectDetector object using an untrained Voxel R-CNN network. For this option, you must specify the class names and anchor boxes using the classNames and anchorBoxes input arguments, respectively.

    Note

    The pretrained networks have been trained to use intensity and location information of point cloud data that follows a right-handed coordinate system, where the positive x-axis points forward from the ego vehicle and positive y-axis points to the left of the ego vehicle. If your data follows a different coordinate system, use the pctransform function to apply a geometric transformation to your point cloud data, and the bboxwarp function to apply the same geometric transformation to your bounding boxes.

    Data Types: char | string
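For example, the coordinate-system note above might apply to data in which the positive x-axis points to the right of the ego vehicle rather than forward. A minimal sketch of the alignment, assuming a hypothetical file name:

```matlab
% Sketch: align a point cloud whose positive x-axis points to the right of
% the ego vehicle with the frame the pretrained networks expect (x forward,
% y left). A 90-degree rotation about the z-axis performs the alignment.
% "myLidarScan.pcd" is a hypothetical file name.
ptCloud = pcread("myLidarScan.pcd");
tform = rigidtform3d([0 0 90],[0 0 0]);   % yaw rotation only, no translation
ptCloudAligned = pctransform(ptCloud,tform);
```

Apply the same geometric transformation to any cuboid labels so that the boxes remain aligned with the transformed points.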

    Properties


    ModelName — Name of the Voxel R-CNN network, specified as a string scalar or character vector.

    • If you specify weights as "pandaset", the default value of ModelName is 'pandaset'.

    • If you specify weights as "kitti", the default value of ModelName is 'kitti'.

    • If you specify weights as "none", the default value of ModelName is ''.

    Data Types: char | string

    This property is read-only.

    ClassNames — Names of object classes for training the network, specified as a vector of strings, cell array of character vectors, or categorical vector. To set this property, you must specify the classNames input argument at object creation.

    • If you create an untrained network by specifying weights as "none", you must specify the classNames input argument to set this property and configure the voxelRCNNObjectDetector object for training.

    • If you create a pretrained network by specifying weights as "kitti" or "pandaset", you can set this property by specifying the classNames input argument, configuring the voxelRCNNObjectDetector object for transfer learning. Otherwise, the network uses the default classes for the pretrained network.

    Data Types: cell | string | categorical

    This property is read-only.

    AnchorBoxes — Set of anchor boxes, specified as an N-by-1 cell array, where N is the number of object classes. Each cell defines the anchor boxes for one class as rows of the form [xlen ylen zlen zctr zrot], where:

    • xlen, ylen, and zlen specify the length of the anchor box along the x-, y-, and z-axes, respectively. Units are in meters.

    • zctr specifies the center of the anchor box along the z-axis. Units are in meters.

    • zrot specifies the orientation of the anchor box about the z-axis, which is the yaw angle of the lidar sensor. The angle is clockwise-positive when looking in the forward direction of the z-axis. Units are in degrees.

    Each cell must contain at least two anchor boxes with different yaw angles. To set this property, specify the anchorBoxes input argument at object creation.

    Data Types: cell
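One way to obtain anchor box values is to average the dimensions of labeled cuboids for each class. A sketch under that assumption, using a hypothetical matrix of car labels in the 9-element cuboid format [xctr yctr zctr xlen ylen zlen xrot yrot zrot]:

```matlab
% Sketch: estimate one anchor per class from cuboid training labels.
% carLabels is a hypothetical M-by-9 matrix of labeled car cuboids of the
% form [xctr yctr zctr xlen ylen zlen xrot yrot zrot].
carLabels = [ ...
    10  2 -1.7 3.8 1.6 1.5 0 0  5
    22 -4 -1.8 4.1 1.7 1.6 0 0 87];
meanDims = mean(carLabels(:,4:6),1);    % average [xlen ylen zlen]
meanZctr = mean(carLabels(:,3));        % average z-center

% Pair the anchor with a 90-degree rotated copy, since each cell must
% contain at least two anchor boxes with different yaw angles.
anchorBoxes = {[meanDims meanZctr 0; meanDims meanZctr 90]};
```

The two rows in the cell differ only in zrot (0 and 90 degrees), satisfying the requirement that each cell contain at least two anchor boxes with different yaw angles.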

    This property is read-only.

    PointCloudRange — Range of the input point cloud, specified as a six-element real-valued vector of the form [xmin xmax ymin ymax zmin zmax]. Units are in meters.

    • xmin and xmax are the minimum and the maximum limits along the x-axis, respectively. You must specify these limits such that the ratio of (xmax - xmin) to the voxel dimension along the x-axis is a multiple of 16.

    • ymin and ymax are the minimum and the maximum limits along the y-axis, respectively. You must specify these limits such that the ratio of (ymax - ymin) to the voxel dimension along the y-axis is a multiple of 16.

    • zmin and zmax are the minimum and the maximum limits along the z-axis, respectively. You must specify these limits such that the ratio of (zmax - zmin) to the voxel dimension along the z-axis is 40.

    If you specify weights as "pandaset", the default value of PointCloudRange is [-70 70 -40 40 -3 1].

    If you specify weights as "kitti", the default value of PointCloudRange is [0 70.4 -40 40 -3 1].

    To set this property, specify it as a name-value argument at object creation. For example, voxelRCNNObjectDetector("kitti",classNames,anchorBoxes,PointCloudRange=[-80 80 -40 40 -2 2]) sets the range of the input point cloud to [-80 80 -40 40 -2 2].

    Data Types: single | double

    This property is read-only.

    VoxelSize — Size of the voxels, specified as a three-element vector of the form [length width height]. Units are in meters.

    • length is the dimension of a voxel along the x-axis. You must specify the voxel length such that the ratio of (xmax - xmin) to the voxel length is a multiple of 16. xmin and xmax are the minimum and the maximum limits of the point cloud range along the x-axis, respectively.

    • width is the dimension of a voxel along the y-axis. You must specify the voxel width such that the ratio of (ymax - ymin) to the voxel width is a multiple of 16. ymin and ymax are the minimum and the maximum limits of the point cloud range along the y-axis, respectively.

    • height is the dimension of a voxel along the z-axis. You must specify the voxel height such that the ratio of (zmax - zmin) to the voxel height is 40. zmin and zmax are the minimum and the maximum limits of the point cloud range along the z-axis, respectively.

    To set this property, specify it as a name-value argument at object creation. For example, voxelRCNNObjectDetector("kitti",classNames,anchorBoxes,VoxelSize=[0.1 0.05 0.1]) sets the size of voxels to [0.1 0.05 0.1].

    Data Types: single | double
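The constraints that PointCloudRange and VoxelSize impose on each other can be checked numerically before constructing the detector. A sketch using the KITTI default range and an assumed voxel size chosen to satisfy the constraints:

```matlab
% Sketch: verify that a candidate range and voxel size satisfy the grid
% constraints: (xmax-xmin)/length and (ymax-ymin)/width must be multiples
% of 16, and (zmax-zmin)/height must equal 40. The voxel size below is an
% assumed value chosen to be consistent with the KITTI default range.
range = [0 70.4 -40 40 -3 1];               % [xmin xmax ymin ymax zmin zmax]
voxel = [0.05 0.05 0.1];                    % [length width height]
nx = round((range(2)-range(1))/voxel(1));   % 1408, a multiple of 16
ny = round((range(4)-range(3))/voxel(2));   % 1600, a multiple of 16
nz = round((range(6)-range(5))/voxel(3));   % 40
assert(mod(nx,16)==0 && mod(ny,16)==0 && nz==40, ...
    "Range and voxel size violate the network grid constraints.")
```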

    Object Functions

    detect — Detect objects using Voxel R-CNN object detector

    Examples


    Create a Voxel R-CNN object detector.

    detector = voxelRCNNObjectDetector;

    Read the input point cloud.

    filename = "PandasetLidarData.pcd";
    ptCloud = pcread(filename);

    Run the pretrained Voxel R-CNN object detector on the input point cloud.

    [bboxes,scores,labels] = detect(detector,ptCloud); 

    Display the detected bounding boxes. For better visualization, select a region of interest, roi, from the point cloud data.

    roi = [0.0 89.12 -49.68 49.68 -5.0 5.0];
    indices = findPointsInROI(ptCloud,roi);
    figure
    ax = pcshow(select(ptCloud,indices).Location);
    zoom(ax,1.5)
    showShape("cuboid",bboxes,Parent=ax,Color="green",Opacity=0.1,LineWidth=0.5);

    References

    [1] Deng, Jiajun, Shaoshuai Shi, Peiwei Li, Wengang Zhou, Yanyong Zhang, and Houqiang Li. “Voxel R-CNN: Towards High Performance Voxel-Based 3D Object Detection.” Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (May 18, 2021): 1201–9. https://doi.org/10.1609/aaai.v35i2.16207.

    [2] Geiger, Andreas, Philip Lenz, and Raquel Urtasun. “Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite.” In 2012 IEEE Conference on Computer Vision and Pattern Recognition, 3354–61, 2012. https://doi.org/10.1109/CVPR.2012.6248074.

    Version History

    Introduced in R2024b