Hi Doron,
Some object detection frameworks might not handle empty labels or annotations gracefully. Instead of synthetically pasting a person into those images and retraining, here are some alternatives that could be more effective:
- Modify the Labeling: Update your data preprocessing or labeling script to include a "background" or "negative" class. Instead of leaving the label blank, label the whole image as background, which explicitly teaches the model what is not a person (see the first sketch after this list).
- Augment the Data: Instead of manually pasting a person into the images, use data augmentation techniques to programmatically add people to them (see the second sketch after this list).
- Custom Loss Modification: Modify the loss function so that images without any bounding boxes do not contribute to the localization loss. That way the model still learns from the negative samples through the classification term without being penalized for the absence of bounding boxes (see the third sketch after this list).
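If your ground truth is stored in a MATLAB table, the first idea could look roughly like the sketch below. This is only a minimal sketch, not a drop-in solution: the table name trainingData and its columns imageFilename and person are assumptions about how your data is organized.

```matlab
% Minimal sketch, assuming a training table "trainingData" with columns
% "imageFilename" (file paths) and "person" (cell array of [x y w h] boxes).
% Person-free images get a whole-image box in a new "background" column.
trainingData.background = cell(height(trainingData), 1);
for i = 1:height(trainingData)
    if isempty(trainingData.person{i})
        info = imfinfo(trainingData.imageFilename{i});   % read the image size from the file header
        trainingData.background{i} = [1 1 info.Width info.Height];
        trainingData.person{i} = zeros(0, 4);            % keep an explicit empty box list
    else
        trainingData.background{i} = zeros(0, 4);        % positive images carry no background box
    end
end
```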
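For the second idea, a basic copy-paste augmentation can be built from standard image operations. Again, this is only a sketch: srcImg, srcBox, and bgImg are placeholder names for an annotated source image, its person box, and a person-free background image, and a real pipeline would also need boundary checks and some blending so the paste does not look artificial.

```matlab
% Minimal copy-paste augmentation sketch (no blending or boundary handling).
personPatch = imcrop(srcImg, srcBox);            % cut the person out of the annotated source image
scale = 0.75 + 0.5*rand;                         % random rescale between 0.75x and 1.25x
personPatch = imresize(personPatch, scale);
[ph, pw, ~] = size(personPatch);
[bh, bw, ~] = size(bgImg);
x = randi([1, max(bw - pw, 1)]);                 % random paste location inside the background
y = randi([1, max(bh - ph, 1)]);
augImg = bgImg;
augImg(y:y+ph-1, x:x+pw-1, :) = personPatch;     % paste the patch into the background image
augBox = [x y pw ph];                            % matching ground-truth box for the pasted person
```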
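For the third idea, the exact change depends on how your training loop computes the loss, but the principle is simply to zero out the localization term for images that have no ground-truth boxes. The sketch below assumes per-image clsLoss and locLoss vectors and a gtBoxes cell array; those names are placeholders, not part of any toolbox API.

```matlab
% Minimal sketch of masking the localization term for negative images.
function totalLoss = maskedDetectionLoss(clsLoss, locLoss, gtBoxes)
    hasBoxes = ~cellfun(@isempty, gtBoxes);      % true for images that contain at least one box
    locLoss(~hasBoxes) = 0;                      % drop the box-regression loss for negatives
    totalLoss = sum(clsLoss) + sum(locLoss);     % negatives still contribute to the classification term
end
```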
You can refer to the following links for more information:
- https://www.mathworks.com/help/deeplearning/ug/image-augmentation-using-image-processing-toolbox.html (workflow for image augmentation using the Image Processing Toolbox)
- https://www.mathworks.com/help/vision/ug/object-detection-using-yolo-v3-deep-learning.html (example of training a YOLO v3 object detector; it also demonstrates several data augmentation techniques)
I hope this helps!