What is the convention for SSD Object Detection Training Data
So I am trying to train an SSD object detector on a custom dataset. I did not label the dataset myself, but downloaded it with the bounding boxes given in an arbitrary format. I first converted the images into an imageDatastore and a boxLabelDatastore under the assumption that the bounding boxes are described as [centre_x, centre_y, width, height], as described here: https://ch.mathworks.com/help/vision/ref/trainssdobjectdetector.html. After 50 epochs with 10,000 images, the network did not recognize any person correctly, nor the correct number of people, even after experimenting with the detection threshold.
Then I saw in the vehicle example ( https://ch.mathworks.com/help/vision/ug/object-detection-using-single-shot-detector.html ) that the training data followed a different convention: the one MATLAB uses for describing rectangles, [x, y, width, height], where [x, y] is the top-left corner. Same amount of data, same number of epochs, same result.
The labelling app also follows this convention, so I tried again, but this time I first arranged my ground truth into a groundTruth object. I then followed the procedure from the example (https://ch.mathworks.com/help/vision/ug/object-detection-using-single-shot-detector.html) to extract the datastores again and combine them as stated, and this time I got a different network than before, although I used the exact same convention for describing the bounding boxes and the same data.
Finally, I saw a video about training object detectors in which the convention is not explicitly described, but is depicted as [left, bottom, width, height]: https://www.youtube.com/watch?v=UnXDQmjYvDk&t=770s. The video also states that every training function should accept the groundTruth type, but SSD training does not, so I realized it is indeed an exception.
I also realized that the groundTruth data contains information about the type of bounding box (rectangle, polygon, ...), which is lost when extracting the datastores from the groundTruth data.
Can someone state what the correct convention is for describing the bounding boxes, and maybe add a hint about what else I could be doing wrong? Thank you.
Answers (1)
T.Nikhil kumar
2023-9-28
Hello Florin,
I understand that you want to know the correct notation for representing bounding boxes so that you can train an SSD object detector on a custom dataset.
The standard notation to represent a bounding box is [x, y, width, height] where [x,y] denote the top-left coordinates of the bounding box and “width” and “height” denote the width and height of the bounding box respectively.
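Since your downloaded annotations use [centre_x, centre_y, width, height], they need to be converted to this top-left convention before building the boxLabelDatastore. A minimal sketch of such a conversion (the variable names centerBoxes and cornerBoxes are placeholders, not part of any MATLAB API):
% centerBoxes is an M-by-4 matrix of centre-based boxes [cx, cy, w, h]
cornerBoxes = centerBoxes;
cornerBoxes(:,1) = centerBoxes(:,1) - centerBoxes(:,3)/2;  % x = cx - w/2 (left edge)
cornerBoxes(:,2) = centerBoxes(:,2) - centerBoxes(:,4)/2;  % y = cy - h/2 (top edge)
% cornerBoxes is now in MATLAB's [x, y, width, height] convention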
You can follow the exact workflow as mentioned in the documentation to perform training on an SSD object detector if your bounding boxes are in this standard format.
After using the “Image Labeler” app to create ground truth in the form of a “groundTruth” object, you will need to use the “objectDetectorTrainingData” function to extract data from the “groundTruth” object as an “imageDatastore” and a “boxLabelDatastore”, or as a training data table. You can then pass this as an argument to the “trainSSDObjectDetector” function.
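A minimal sketch of that workflow, assuming your ground truth object is called gTruth and that an untrained SSD detector or layer graph, here called detectorToTrain, has already been set up as described in the documentation:
% Extract an imageDatastore and a boxLabelDatastore from the groundTruth object
[imds, blds] = objectDetectorTrainingData(gTruth);
% Combine them into a single datastore for training
trainingData = combine(imds, blds);
% Set up training options (values shown here are only examples)
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-3, ...
    'MaxEpochs', 50, ...
    'MiniBatchSize', 16);
% Train the SSD detector (detectorToTrain is a placeholder for the
% network/detector you prepared beforehand)
[detector, info] = trainSSDObjectDetector(trainingData, detectorToTrain, options);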
You can select the type of bounding boxes you want (rectangle/cuboid/polygon) in your ground truth data using the “selectLabelsByType” function.
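For example, to keep only the rectangle (bounding-box) labels from a groundTruth object before extracting the training data (a sketch, assuming the object is called gTruth):
% Keep only labels of type Rectangle from the ground truth
rectGTruth = selectLabelsByType(gTruth, labelType.Rectangle);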
You can refer to the “trainingData” section of the following documentation to understand the types of input accepted by the “trainSSDObjectDetector” function. It also contains the workflow for training an SSD object detector on a custom dataset.
You can refer to the following documentation to understand how to create training data for an object detector from ground truth.
You can refer to the following documentation to understand the “selectLabelsByType” function.
I hope this helps!