Hi Christos.
To address the class imbalance problem in your object detection dataset, here are a few approaches you can consider:
1) Hard Sampling Approach: As you mentioned, you can remove images with only 'cats' and delete 'cats' bounding boxes from images. To achieve this in MATLAB, you can use the following steps:
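- Load your groundTruth object.
- Iterate through the label data (gTruth.LabelData, or the table returned by objectDetectorTrainingData) and identify the images that contain only the over-represented class ('cats' here).
- Remove those images. As far as I know, groundTruth has no removeImages function, so instead build a new object from the rows you keep, e.g. groundTruth(groundTruthDataSource(keptFiles), gTruth.LabelDefinitions, gTruth.LabelData(keepRows,:)).
- In the images you keep, delete 'cats' bounding boxes by clearing the 'cats' entries in the label data. Note that bboxerase is not a class filter: [bboxesOut, indices] = bboxerase(bboxes, window, EraseThreshold=threshold) removes boxes that overlap a rectangular region-of-interest, so it is the right tool only if you also blank out that region of the image (for example with imerase). In that case, a bounding box is removed if the overlap between the bounding box region and the region-of-interest is equal to or greater than the specified threshold.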
- Save the modified groundTruth object for further processing.
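The steps above can be sketched on a small hand-made table. This is only an illustration: the file names and boxes are invented, the table stands in for what objectDetectorTrainingData(gTruth) would return, and 'cats' is assumed to be the over-represented class.

```matlab
% Toy stand-in for the label table: one row per image,
% one column of [x y width height] boxes per class (all values invented).
imageFilename = {'img1.jpg'; 'img2.jpg'; 'img3.jpg'; 'img4.jpg'; 'img5.jpg'};
cats = {[10 10 50 50; 80 20 40 40]; ...
        [5 5 30 30; 50 5 30 30; 90 5 30 30]; ...
        zeros(0,4); ...
        [15 25 60 60]; ...
        [30 30 40 40]};
dogs = {zeros(0,4); [60 60 40 40]; [20 20 70 70]; zeros(0,4); [70 10 50 50]};
trainingData = table(imageFilename, cats, dogs);

% Flag which classes appear in each image.
hasCats = ~cellfun(@isempty, trainingData.cats);
hasDogs = ~cellfun(@isempty, trainingData.dogs);

% Step 1: drop images that contain only the majority class.
onlyCats = hasCats & ~hasDogs;
balanced = trainingData(~onlyCats, :);

% Step 2: among the remaining images, clear the 'cats' boxes of the image
% with the most cats. Repeat or widen this selection as needed.
catCount = cellfun(@(b) size(b,1), balanced.cats);
[~, worst] = max(catCount);
balanced.cats{worst} = zeros(0,4);

% Count the boxes per class after both steps.
numCats = sum(cellfun(@(b) size(b,1), balanced.cats));
numDogs = sum(cellfun(@(b) size(b,1), balanced.dogs));
fprintf('cats: %d boxes, dogs: %d boxes\n', numCats, numDogs);
```

Keep in mind that clearing a box does not remove the animal from the picture, so the detector may see that unlabeled region as background during training.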
2) Random Selection/Deletion: Instead of manually selecting images to remove, you can randomly select a subset of images for each class to achieve a more balanced dataset. Here's how you can do it:
- Load your groundTruth object.
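- Use the randperm function to randomly choose which images of the over-represented class to keep, so that the number of boxes per class ends up roughly equal.
- Drop the images that were not selected by building a new groundTruth object from the kept rows of the data source and label data (groundTruth does not provide a removeImages function, as far as I know).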
- Save the modified groundTruth object.
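A minimal sketch of the random selection, assuming you already have one logical flag per image for each class. The flags and the numKeep target below are made up for illustration.

```matlab
rng(0);  % fix the seed so the selection is reproducible

% Toy per-image flags; in practice derive them from the label table.
hasCats = logical([1 1 1 1 1 1 0 0 1 0]');
hasDogs = logical([0 0 0 0 0 1 1 1 1 1]');

% Images containing only the majority class are the candidates for removal.
onlyCatsIdx = find(hasCats & ~hasDogs);
numKeep = 2;  % hypothetical number of only-cats images to keep

% Randomly pick numKeep of the candidates without replacement.
pick = randperm(numel(onlyCatsIdx), min(numKeep, numel(onlyCatsIdx)));
keepCats = onlyCatsIdx(pick);

% Keep every image that contains a dog, plus the sampled only-cats images.
keepMask = hasDogs;
keepMask(keepCats) = true;
keepRows = find(keepMask);
```

keepRows can then be used to index the label table and the file list when you assemble the reduced dataset.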
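3) Augmentation: Adding augmented copies of images that contain the under-represented class can help balance the dataset (that would be 'dogs' if 'cats' is the majority, as in option 1). Note that imageDataAugmenter is an object rather than a function, and it transforms only the images, so on its own it leaves the bounding boxes out of sync with the pixels. For detection data it is safer to apply one geometric transform to both the image and its boxes, for example with randomAffine2d, imwarp, and bboxwarp. The steps are:
- Load your groundTruth object.
- Select the images that contain the under-represented class by checking which rows of the label data have non-empty boxes for it. bboxerase is not needed for this.
- Define the random transform (e.g., rotation, scaling, flipping) with randomAffine2d. If you use imageDataAugmenter instead, the options are set as name-value arguments when the object is created, and the augment function applies them to images only.
- Warp each selected image with imwarp and its boxes with bboxwarp, using the same transform. augmentData is not a built-in function; it is a helper defined inside some MathWorks examples, so you would need to write your own version.
- Write the augmented images to disk and append them, with their warped boxes, to the original images and boxes.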
- Save the modified groundTruth object.
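Here is a rough sketch of augmenting one image together with its boxes. It assumes the Image Processing Toolbox and Computer Vision Toolbox are installed, uses a synthetic image and box in place of your data, and the transform ranges and overlap threshold are arbitrary choices.

```matlab
% Placeholder image with a white patch standing in for an object.
I = zeros(200, 300, 3, 'uint8');
I(50:120, 40:140, :) = 255;
bboxes = [40 50 101 71];            % [x y width height] of the patch
labels = categorical("minority");   % placeholder name for the class to boost

% Random horizontal flip and mild scaling.
tform = randomAffine2d('XReflection', true, 'Scale', [0.9 1.1]);

% Output view with the same size as the input, centered on the result.
rout = affineOutputView(size(I), tform, 'BoundsStyle', 'CenterOutput');

% Apply the same transform to the image and to the boxes.
augI = imwarp(I, tform, 'OutputView', rout);
[augBoxes, kept] = bboxwarp(bboxes, tform, rout, 'OverlapThreshold', 0.25);

% Boxes that fall mostly outside the output are dropped; keep labels in sync.
augLabels = labels(kept);
```

You would run this in a loop over the selected images, save each augI with imwrite, and add the new file name, augBoxes, and augLabels to your training table.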
Hope this helps.