Image Classification
From the series: Making Vehicles and Robots See: Getting Started with Perception for Students
Learn the basics of classifying images through deep learning. First, gain an understanding of what image classification and deep learning are, then discover how you can implement this workflow in MATLAB®. Start by creating a datastore to access and process your image data, then use the Deep Network Designer app to design and train a neural network, and finally test the performance of the network on a new data set using a confusion chart. At the end, complete an exercise to use this neural network in Simulink®.
Links Mentioned in this Video:
- How to Use Datastores - Video
- Training Options - Documentation
Published: 29 Nov 2023
When creating autonomous systems, they may need to know what type of environment they are in, or what objects they are looking at.
For example, let’s say you have a robotic arm that is meant to sort different kinds of trash and recycling. This robot has a camera that allows it to capture images of the items as they approach, but how can we teach it to distinguish between the different items?
In these situations, you’ll want to perform Image Classification. One of the most accurate ways to classify images is through deep learning.
So, what is Deep Learning?
It’s a process where you teach a machine to learn from experience, just like humans do. Using something called a “neural network” and LOTS of data, you teach the machine to start recognizing patterns and features until it can use those to make accurate predictions. Sometimes the machine can even exceed human-level performance.
In this video, I’ll show how to process image data and train a neural network that can accurately classify images based on whether they show a bottle, can, or detergent pouch. We will do this by creating a datastore to import our data, training a network in the deep network designer app, then testing the performance of the network in MATLAB.
Let’s start by looking at our data.
For today’s example, we have collected color images of 3D renderings of bottles, cans, and detergent pouches. For each of these objects, we have hundreds of images, taken from different angles.
Let’s go over to MATLAB to start working with these images.
When preparing data for deep learning, the best way to bring it into MATLAB is by creating a datastore! Specifically, we’ll use an ‘imageDatastore’.
If you are new to datastores, check out this video, linked in the description, for an introduction.
To create an image datastore, we must first specify where the data is stored. In this case, the ‘TrashImages’ folder.
In the data folder, the images have been broken up into 3 subfolders, so we also need to set “IncludeSubfolders” to true.
We then set “LabelSource” to foldernames, so the datastore will use the name of the subfolder that each image comes from as a class label.
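In code, those two steps come together in a single `imageDatastore` call; a minimal sketch, assuming your images live in a local “TrashImages” folder with one subfolder per class:

```matlab
% Create a datastore that reads images from the TrashImages folder and
% labels each image with the name of the subfolder it came from.
imds = imageDatastore("TrashImages", ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");

% Check how many images each class contains.
countEachLabel(imds)
```

The `countEachLabel` check at the end is a quick way to confirm the labels were picked up from the folder names as expected.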
Now we have a datastore for all of the data, but you don’t typically want to use all of your data for training the neural network because you’ll need some data to test its performance.
Let’s split our data into three sets: a training set used for teaching the network, a validation set used for checking its performance as it is trained, and a test set used after training to see how well the network performs on new data.
We can use the ‘splitEachLabel’ function to accomplish this. Here, I use 70% of the data for training, 10% for validation, and 20% for testing, as this is a common starting division.
Currently, the images in each datastore are sorted in order of class, since that is how they are pulled from the folders. Let’s randomize the order of the images using the ‘shuffle’ function, so the network will learn the classes at a more even rate. Using the ‘preview’ function, we can confirm that the first image of the datastore is now different.
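The split, shuffle, and preview steps look like this at the command line; a sketch, assuming `imds` is the image datastore created above (the variable names are illustrative):

```matlab
% Split 70/10/20 into training, validation, and test sets,
% keeping the class proportions the same in each set.
[imdsTrain, imdsVal, imdsTest] = splitEachLabel(imds, 0.7, 0.1, 0.2);

% Shuffle so the images are no longer grouped by class.
imdsTrain = shuffle(imdsTrain);
imdsVal   = shuffle(imdsVal);
imdsTest  = shuffle(imdsTest);

% Preview the first training image to confirm the new ordering.
imshow(preview(imdsTrain))
```

Because `splitEachLabel` splits per class, each of the three sets keeps roughly the same mix of bottles, cans, and pouches.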
Voila! Now that your datastores are ready, it’s time to start creating your neural network.
When creating a deep neural network, there are two main options. You can create one from scratch, which tends to be time consuming and requires a very large amount of data. Or you can start with a pre-trained network and adjust it to solve your problem. This option, called transfer learning, requires much less data and takes less time than training from scratch.
Since our dataset is pretty small, we will be using transfer learning for today’s example.
To access pre-trained networks, you can open the Deep Network Designer App, which can be done from the command line or by clicking on it in the Apps Tab.
When the app opens, it will present you with a list of pre-trained networks. Some will show this triangle, which just means you need to install a free support package to use it.
There are many options for networks that accept images. I will be using ResNet-18 for this example, but you can learn more about these options by clicking on this “Compare Pretrained Networks” link.
Opening “ResNet-18” will load the network into the app, where we can see its architecture. It starts with this image input layer, passes the data through all these layers in the middle that do most of the work, then these final few layers determine the output class. For transfer learning, we only really need to pay attention to these input and output layers.
First, let’s check the input layer to see the input size. ResNet-18 accepts images with 3 channels: a red, green, and blue channel. The height and width will be adjusted automatically later, so we know that our images are compatible with this network.
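You can also make the same check at the command line; a sketch, assuming the free ResNet-18 support package is installed (on newer releases the function for loading pretrained models may differ):

```matlab
% Load the pretrained ResNet-18.
net = resnet18;

% The first layer reports the expected input size:
% 224x224 pixels with 3 (RGB) channels.
inputSize = net.Layers(1).InputSize
```

The third element of `inputSize` is the channel count, which is the part that must match your images; the height and width are handled by resizing later.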
Then we scroll down to these last few layers, where we will need to make a few modifications. This fully connected layer and classification layer both have an output size of 1000, because that is how many classes ResNet-18 was originally trained for.
We only have three classes of objects, so we will remove these layers and replace them with new ones. For the new fully connected layer, specify an output size of 3, then the classification layer will automatically adapt to this value.
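The same layer replacement can be done programmatically; a sketch, assuming the shipped ResNet-18 layer names ("fc1000" and "ClassificationLayer_predictions"), which you should confirm by inspecting `lgraph.Layers` on your release:

```matlab
% Load the pretrained network as an editable layer graph.
lgraph = layerGraph(resnet18);

% Replace the 1000-class fully connected layer with a 3-class one.
newFC = fullyConnectedLayer(3, Name="fc_trash");
lgraph = replaceLayer(lgraph, "fc1000", newFC);

% Replace the classification layer; it infers its classes during training.
newClass = classificationLayer(Name="class_trash");
lgraph = replaceLayer(lgraph, "ClassificationLayer_predictions", newClass);
```

This mirrors what the app does when you drag in new final layers: only the last two layers change, and all the pretrained feature-extraction layers are kept.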
Now the network is ready to learn!
To start, move to the data tab, and import the training and validation datastores we created earlier.
Our training dataset has only 477 images, so it’s unlikely that these images show every angle or size of these objects. How can we make our dataset robust enough to train the network to classify the objects correctly in any situation?
One way to do this is through augmentation. In this window, we can choose to randomly apply effects or augmentations to the images every time the images are read in for training. Each time the network sees the image for training, it will look a little different.
You determine how much variation is created with these augmentations by setting minimum and maximum values for each effect. I picked these values because they are small changes that would be reasonable to see in the real world, but you can experiment with these.
Images in the validation set don’t need to be augmented since they are not used for training the network.
Here, we see that the app will automatically resize the images to the size specified by the input layer of the network.
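Outside the app, augmentation and resizing are handled by `imageDataAugmenter` and `augmentedImageDatastore`; a sketch, assuming `imdsTrain` and `imdsVal` are the training and validation datastores from the earlier split (the augmentation ranges here are illustrative, not the exact values from the video):

```matlab
% Small, real-world-plausible random augmentations for training images.
augmenter = imageDataAugmenter( ...
    RandRotation=[-10 10], ...       % small rotations in degrees
    RandXTranslation=[-5 5], ...     % small pixel shifts
    RandYTranslation=[-5 5], ...
    RandXReflection=true);           % occasional horizontal flips

% Resize to the network's input size and augment on the fly each epoch.
augTrain = augmentedImageDatastore([224 224], imdsTrain, ...
    DataAugmentation=augmenter);

% Validation images only need resizing, not augmentation.
augVal = augmentedImageDatastore([224 224], imdsVal);
```

Because the augmentations are applied each time an image is read, the network sees a slightly different version of every image on every epoch, without any extra files on disk.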
When you’re happy with your augmentation options, click “Import.”
This will bring you to a summary page that shows some high-level information about your dataset, the distribution of classes, and a preview of the augmented data.
If you’re happy with your data and augmentations, it’s time to move over to the training tab!
This is where we will train the neural network! You can view and change the way the network is trained by opening this ‘Training Options’ menu. We’ll use most of the default options here, but lower the Learning Rate to avoid overfitting, and lower ‘MaxEpochs’ so that training doesn’t take too long.
To learn more about what these options mean, check out this documentation page, which is also linked in the description. https://www.mathworks.com/help/deeplearning/ref/trainingoptions.html
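The equivalent command-line setup uses `trainingOptions` and `trainNetwork`; a sketch, assuming `augTrain` and `augVal` are augmented datastores wrapping the training and validation sets and `lgraph` is the modified ResNet-18 (the learning rate and epoch count are illustrative starting points, not the video’s exact values):

```matlab
% Training options comparable to those set in the app.
options = trainingOptions("sgdm", ...
    InitialLearnRate=1e-4, ...     % lowered to help avoid overfitting
    MaxEpochs=8, ...               % lowered so training doesn't take too long
    ValidationData=augVal, ...
    Shuffle="every-epoch", ...
    Plots="training-progress");    % same live accuracy/loss plot as the app

% Train the modified network.
trainedNet = trainNetwork(augTrain, lgraph, options);
```

The "training-progress" plot gives you the same accuracy and loss curves you see in the app, so the guidance in the next step applies either way.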
Once you’ve set the options, click train, and watch as the results are visualized. This view shows how far into training you are, some details about the training options, and the accuracy and loss of the network when used on the training and validation data.
Ideally, accuracy will increase and loss will decrease. If the performance of your model stays low or decreases with more training, the training data, network architecture, or training options likely need to be adjusted. If these graphs start to plateau, you may not need to train your network as long. For our example here, the performance steadily improves over time, plateauing a little bit towards the end.
Once your network is trained, you can export it to the MATLAB workspace.
Now that we have a trained network, let’s see how well it performs on new data!
First, let’s resize the testing data and preview the first image.
To use the neural network on a new dataset, we can use the ‘classify’ function. You can use this on a single image, or an entire datastore.
These predictions can be used to evaluate your model by comparing the expected classnames to the predicted classnames, and seeing how often they match.
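A sketch of that evaluation, assuming `trainedNet` is the exported network and `imdsTest` is the test datastore from the earlier split:

```matlab
% Resize the test images the same way as during training, then classify.
augTest = augmentedImageDatastore([224 224], imdsTest);
predicted = classify(trainedNet, augTest);

% Compare predicted labels to the true labels for overall accuracy.
accuracy = mean(predicted == imdsTest.Labels)
```

Since the labels are categorical arrays, the element-wise comparison followed by `mean` gives the fraction of test images classified correctly.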
If you want to know how confident the network is in its predictions, we can use the ‘predict’ function. This returns an array of scores, where each row corresponds to an image in our dataset and each column corresponds to one class. A higher score indicates a higher confidence that this image contains that class, so for this first image, the network is fairly certain that it is a bottle! If you have scores that are closer together, like this one:
That indicates that the model is less confident, but is still pretty sure this is a bottle. This can help identify which types of images the model performs well on, and which ones it could learn better.
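A sketch of retrieving those scores, assuming the same `trainedNet` and a resized test datastore `augTest`:

```matlab
% Per-class confidence scores: one row per image, one column per class.
scores = predict(trainedNet, augTest);

% The column order matches the classes of the final layer.
classNames = trainedNet.Layers(end).Classes;

% Scores for the first test image; the largest value marks the
% class the network is most confident about.
scores(1, :)
```

Comparing the largest score to the second-largest on a given row is a simple way to flag low-confidence predictions for closer inspection.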
Another useful tool for evaluating a neural network is the ‘confusionchart’, which shows how often the network guessed each class correctly and incorrectly.
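A sketch of building that chart, assuming `predicted` holds the labels returned by `classify` and `imdsTest` is the test datastore:

```matlab
% Rows are true classes, columns are predicted classes; the
% off-diagonal counts show which classes the network confuses.
confusionchart(imdsTest.Labels, predicted)
```

With only three classes, this makes it easy to spot, for example, whether pouches are being mistaken for bottles more often than for cans.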
And that’s it! With just a few steps, we’ve trained, visualized, and tested an image classification model for our trash-sorting robot. These steps and basic concepts can be applied to any other image classification problem you may encounter.
For further practice, we’ve included an exercise that tasks you with using a neural network in a Simulink model, which is a common workflow when using neural networks with robots or other hardware. Please open up this template model and attempt to fill in the missing parts, then verify your results by simulating this model. A sample solution is also provided.
In summary, we learned that datastores help import and process large amounts of data.
Transfer Learning can help reduce the amount of data and computational resources needed to train a neural network.
The Deep Network Designer App provides an interactive option for quickly building, training, and fine-tuning several neural networks iteratively.
Even after a network has been trained with a high level of accuracy, it is important to test it on new, unseen data to see how well it generalizes, and where it could improve.
If you have any questions while you are working through this video or the included exercise, please feel free to reach out to us at roboticsarena@mathworks.com. Thanks for watching!