Get Started with Voxel R-CNN

The voxel region-based convolutional neural network (Voxel R-CNN) is a voxel-based, two-stage framework for detecting objects in 3-D space. After dividing the input point cloud into regular voxels, the framework extracts features and generates 3-D region proposals for objects. The framework then uses voxel region of interest (ROI) pooling and a detect head to generate precise 3-D bounding boxes for different object classes. The Voxel R-CNN maintains the 3-D structural context throughout the detection process, which preserves spatial information that is otherwise lost when the data is reduced to a 2-D representation. Preserving this context helps the network localize and detect objects more accurately. The Voxel R-CNN is well suited to real-world applications such as autonomous driving and robot navigation.
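The voxelization step described above can be illustrated with a minimal Python sketch. It is not the toolbox implementation: the function names, the voxel size, and the use of a per-voxel centroid as a feature are assumptions made only for illustration.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Group (x, y, z) points by the integer index of the voxel that contains them."""
    voxels = defaultdict(list)
    for x, y, z in points:
        index = (
            math.floor((x - origin[0]) / voxel_size[0]),
            math.floor((y - origin[1]) / voxel_size[1]),
            math.floor((z - origin[2]) / voxel_size[2]),
        )
        voxels[index].append((x, y, z))
    return dict(voxels)


def voxel_mean_features(voxels):
    """Use the centroid of the points in each voxel as a simple per-voxel feature (assumption)."""
    return {
        index: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for index, pts in voxels.items()
    }


# Hypothetical point cloud with four points and 0.5 m cubic voxels.
points = [(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (1.2, 0.4, 0.1), (-0.2, 0.0, 0.0)]
voxels = voxelize(points, voxel_size=(0.5, 0.5, 0.5))
features = voxel_mean_features(voxels)
```

Only occupied voxels are stored, which reflects the sparsity of typical lidar point clouds. A real network learns its voxel features rather than using centroids.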

Voxel R-CNN Network

The Voxel R-CNN consists of these core modules:

  • 3-D Backbone Network: The Voxel R-CNN first divides the input point cloud data into regular voxels, and then processes the voxels through a 3-D backbone network. The network captures the essential spatial context for 3-D object detection by abstracting the voxels into 3-D feature volumes.

  • 2-D Backbone Network: To reduce the cost of processing 3-D data, the Voxel R-CNN framework converts the 3-D feature volumes into bird's-eye-view (BEV) representations. The 2-D backbone network and the region proposal network (RPN) work together to process the BEV representations and generate dense 3-D region proposals for potential object locations.

  • Voxel ROI Pooling and Detect Head: Voxel ROI pooling extracts region of interest features for the proposed regions directly from the 3-D feature volumes, rather than from the BEV representations. This process retains more spatial context, which is crucial to accurately detect and localize objects in 3-D space. The detect head then uses these ROI features for further refinement and box regression.
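The data flow through the last two modules can be sketched in a few lines of Python. This is a heavily simplified illustration under stated assumptions: the feature volume is a hypothetical sparse dictionary, the BEV conversion stacks height slices as channels, and the ROI pooling uses an axis-aligned box with plain max pooling. The actual network works with rotated proposals and learned feature aggregation, which this sketch does not model.

```python
def to_bev(volume, num_z, channels):
    """Collapse a sparse 3-D feature volume into a BEV map by stacking height slices as channels."""
    bev = {}
    for (ix, iy, iz), feat in volume.items():
        cell = bev.setdefault((ix, iy), [0.0] * (num_z * channels))
        cell[iz * channels:(iz + 1) * channels] = feat
    return bev


def roi_max_pool(volume, roi_min, roi_max, channels):
    """Max-pool the features of all voxels whose indices lie inside an axis-aligned ROI (inclusive)."""
    pooled = None
    for index, feat in volume.items():
        if all(lo <= i <= hi for i, lo, hi in zip(index, roi_min, roi_max)):
            if pooled is None:
                pooled = list(feat)
            else:
                pooled = [max(a, b) for a, b in zip(pooled, feat)]
    # An ROI that contains no occupied voxels yields a zero feature vector.
    return pooled if pooled is not None else [0.0] * channels


# Hypothetical feature volume: voxel index -> 2-channel feature vector.
volume = {
    (0, 0, 0): [1.0, 2.0],
    (0, 0, 1): [3.0, 0.5],
    (1, 0, 0): [0.0, 4.0],
}
bev = to_bev(volume, num_z=2, channels=2)
roi_feature = roi_max_pool(volume, roi_min=(0, 0, 0), roi_max=(0, 0, 1), channels=2)
```

The sketch shows why pooling from the 3-D volume retains more context: the BEV map merges all height slices of a column into one cell, while the ROI pooling can still select voxels by height.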

Create Voxel R-CNN Network

To create a Voxel R-CNN network, use the voxelRCNNObjectDetector object. Using this object, you can create an untrained 3-D object detector, or load a pretrained detector that has been trained on the KITTI or PandaSet data set. For more information on this object, see voxelRCNNObjectDetector.

Transfer Learning

Transfer learning is a deep learning technique in which you use a pretrained network as a starting point to learn a new task through further training. This process involves fine-tuning the network with new data so that it applies its previously learned knowledge to the new task. Compared to training a network from scratch, this approach reduces the amount of data and training time required for the new task.

To perform transfer learning with a pretrained Voxel R-CNN network, specify new object classes and their corresponding anchor boxes. Then, train the network on a new data set.
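One common way to choose anchor boxes for new classes is to use the mean box dimensions of each class in the labeled training data. The following Python sketch illustrates that idea only. The label layout, the class names, and the function name are hypothetical, and the format in which the detector expects anchor boxes is not covered here; see voxelRCNNObjectDetector for that.

```python
from collections import defaultdict


def estimate_anchor_sizes(labels):
    """Return the mean (length, width, height) of the labeled boxes for each class."""
    grouped = defaultdict(list)
    for class_name, size in labels:
        grouped[class_name].append(size)
    return {
        name: tuple(sum(dim) / len(sizes) for dim in zip(*sizes))
        for name, sizes in grouped.items()
    }


# Hypothetical ground truth labels: (class name, (length, width, height) in meters).
labels = [
    ("Car", (4.0, 1.8, 1.5)),
    ("Car", (4.4, 1.6, 1.7)),
    ("Pedestrian", (0.8, 0.6, 1.8)),
]
anchor_sizes = estimate_anchor_sizes(labels)
```

Anchor sizes that match the typical size of each class give the region proposal stage a better starting point than sizes inherited from the classes of the pretrained network.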

Train Network and Detect Objects

Use the trainVoxelRCNNObjectDetector function to train a Voxel R-CNN network. To detect objects using a trained or pretrained Voxel R-CNN network, use the detect object function.

To evaluate the detection results, use the evaluateObjectDetection function.
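Detection metrics are typically built on the overlap between predicted and ground truth boxes, measured as intersection over union (IoU). The following Python sketch shows that overlap computation for axis-aligned 3-D boxes. It is an illustration, not a description of how evaluateObjectDetection is implemented. The box layout and the 0.5 threshold are assumptions, and box rotation is ignored.

```python
def box_volume(box):
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax); zero if degenerate."""
    xmin, ymin, zmin, xmax, ymax, zmax = box
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin) * max(0.0, zmax - zmin)


def iou_3d(box_a, box_b):
    """Intersection over union of two axis-aligned 3-D boxes."""
    overlap = (
        max(box_a[0], box_b[0]), max(box_a[1], box_b[1]), max(box_a[2], box_b[2]),
        min(box_a[3], box_b[3]), min(box_a[4], box_b[4]), min(box_a[5], box_b[5]),
    )
    intersection = box_volume(overlap)
    union = box_volume(box_a) + box_volume(box_b) - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(detection, ground_truth, threshold=0.5):
    """Count a detection as correct when its IoU with the ground truth meets the threshold."""
    return iou_3d(detection, ground_truth) >= threshold


# Two hypothetical 2 m cubes that overlap in a 1 m cube.
predicted = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
actual = (1.0, 1.0, 1.0, 3.0, 3.0, 3.0)
overlap_score = iou_3d(predicted, actual)
```

Sweeping the detection score threshold and counting true positives at a fixed IoU threshold yields the precision and recall values from which average precision is computed.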

References

[1] Deng, Jiajun, Shaoshuai Shi, Peiwei Li, Wengang Zhou, Yanyong Zhang, and Houqiang Li. “Voxel R-CNN: Towards High Performance Voxel-Based 3D Object Detection.” Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 2 (May 18, 2021): 1201–9. https://doi.org/10.1609/aaai.v35i2.16207
