Faster RCNN code in Matlab
I am trying to use trainFasterRCNNObjectDetector in MATLAB 2017. As I understand it, in the original Faster R-CNN paper the input size of the CNN's first layer is the image size, for example 256x256. But in the MATLAB example https://se.mathworks.com/help/vision/examples/object-detection-using-faster-r-cnn-deep-learning.html they recommend using the smallest object size in the image, such as 32x32; see the part below.
"Start with the imageInputLayer function, which defines the type and size of the input layer. For classification tasks, the input size is typically the size of the training images. For detection tasks, the CNN needs to analyze smaller sections of the image, so the input size must be similar in size to the smallest object in the data set. In this data set all the objects are larger than [16 16], so select an input size of [32 32]. This input size is a balance between processing time and the amount of spatial detail the CNN needs to resolve."
I don't understand this part. When applying a CNN, the input layer has the full image size. How can the region proposal network (RPN) be found using just a smaller part of the image?
Can anyone help me?
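For reference, this is roughly how the linked MathWorks tutorial defines the small input layer it recommends; this is a sketch based on that example, not my full network, and the filter sizes/counts are the tutorial's illustrative values:

```matlab
% Input layer sized near the smallest object in the data set,
% not the full image size (values from the MathWorks example).
inputLayer = imageInputLayer([32 32 3]);

% A couple of convolution/ReLU stages followed by max pooling.
filterSize = [3 3];
numFilters = 32;
middleLayers = [
    convolution2dLayer(filterSize, numFilters, 'Padding', 1)
    reluLayer()
    maxPooling2dLayer(3, 'Stride', 2)
    ];
```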
0 comments
Accepted Answer
Eric Psota
2017-4-4
Edited: Eric Psota
2017-5-22
The Faster RCNN network is designed to operate on a bunch of small regions of the image. For example, if you're trying to detect people, and they never take up more than 200x200 regions in a 1080x1920 image, you should use a network that takes as input a 200x200 image. If you think about it, convolutional kernels don't care how big the input image is. You can pass an image of any size through the convolutional layers of a network, and the only thing that will change is the spatial dimensions at each layer. That is why Faster RCNN shares the convolutional layers between the region proposal network (RPN) and the classification/regression networks.
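The point that convolutional kernels don't care about input size can be sketched in a couple of lines; this is a toy illustration with a random kernel, not part of any detector:

```matlab
% The same 5x5 kernel applies to images of any size; only the
% spatial dimensions of the output change.
k = rand(5, 5);

small = rand(200, 200);
large = rand(1080, 1920);

outSmall = conv2(small, k, 'valid');   % size: [196 196]
outLarge = conv2(large, k, 'valid');   % size: [1076 1916]
```

The fully connected layers are the only part that locks in a specific spatial size, which is why the shared convolutional trunk can process the whole image while the heads operate on fixed-size regions.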
Once you get to the fully connected layers, this is a different story, since the connections are trained with a specific spatial dimension in mind. For this reason, Faster RCNN trains the RPN and classification/regression layers separately.
2 comments
Eric Psota
2017-5-22
Edited: Eric Psota
2017-5-22
It doesn't check all possible 200x200 regions. Instead, it checks a subset of them which depends on a lot of factors. One of the factors is how much your convolutional layers reduce the spatial dimensionality of the original image. For example, consider a case where your original image was 1080x1920x3 and, after a series of convolution-ReLU-max pooling layers, your resulting feature map is 108x192x300. This is effectively a spatial downsampling of 1/10, so a 200x200 window in the full-scale image becomes 20x20 in the resulting feature map. At this point, Faster R-CNN might slide a 20x20x300 kernel through the feature map to determine if there are objects present in the spatial regions, effectively stepping by 10 pixels horizontally and vertically through the original image.
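The arithmetic above can be written out directly (the 1/10 downsampling factor is the assumption from the example):

```matlab
% Map a window in image coordinates onto the downsampled feature map.
imageSize   = [1080 1920];            % original image, H x W
featureSize = [108 192];              % feature map after conv/pool layers
downsample  = imageSize ./ featureSize;   % [10 10]

roiImage   = [200 200];               % window size in the image
roiFeature = roiImage ./ downsample;  % [20 20] window on the feature map

% Sliding the 20x20 kernel one element at a time on the feature map
% corresponds to stepping 10 pixels at a time in the original image.
stepInImage = downsample;             % [10 10]
```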
After the presence or absence of an object is established in a given location, a spatial chunk of the 108x192x300 feature map will be extracted (through ROI pooling) and passed through both the bounding box regressor and the classifier.
The other factors to consider are the height/width ratio of the regions, how many different sizes are considered, etc. But, hopefully this helps to explain how the regions get processed.
More Answers (2)
miao wang
2017-4-4
I am confused too. When I use my own dataset to train the Faster R-CNN and get a detector, testing a picture usually returns empty bboxes and scores: [bboxes, scores] = detect(detector, I); I don't know what the problem is. I also hope someone can help me.
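Not a guaranteed fix, but one hedged thing to try when detect returns empty results: no region may be scoring above the default detection threshold, so lower it and inspect what comes back ('Threshold' is a documented name-value argument of detect):

```matlab
% Lower the score threshold to see whether any weak detections exist.
[bboxes, scores] = detect(detector, I, 'Threshold', 0.1);

if isempty(bboxes)
    disp('Still no detections; check that I is at the training image scale.');
else
    % Draw whatever was found, with its score.
    annotated = insertObjectAnnotation(I, 'rectangle', bboxes, scores);
    figure; imshow(annotated);
end
```

If detections appear only at a very low threshold, the detector likely needs more training data or more training epochs; if nothing appears at all, verify that the test image resolution and object sizes match the training set.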
0 comments