Computer Vision Made Easy
From the series: Computer Vision with MATLAB
Overview
Learn how MATLAB makes it easy to get started with computer vision.
Computer vision uses images and video to detect, classify, and track objects or events in order to understand a real-world scene. In this introductory webinar you will learn how to use MATLAB to develop computer vision algorithms to solve real-world imaging problems.
We will demonstrate through real-world examples how MATLAB makes it easy to:
- Detect objects in a cluttered scene
- Measure true size of objects in an image
This webinar assumes some experience with MATLAB and no experience with computer vision. We will focus on the Computer Vision Toolbox.
About the Presenter
Sandeep Hiremath holds an M.S. in Mechanical Engineering from Clemson University. Sandeep has spent 14 years supporting MATLAB and Simulink users at MathWorks. In his previous role at MathWorks, he spent 7 years as a technical evangelist supporting academic users of MATLAB.
Recorded: 18 Nov 2020
Welcome to the Computer Vision Made Easy webinar. My name is Sandeep, and I'm on the product marketing team at MathWorks. Computer vision is used extensively by our customers to solve a wide variety of vision problems in different application areas. For example, in automated driving, to design lane departure warning systems. And in robotics, to help with path planning for the Mars Rover.
Here is a simple example of a computer vision-based traffic monitoring system, like what you might have noticed when driving past a traffic light. Here, a camera-based monitoring system is keeping track of the number of cars in the scene at any given point in time. In this system, computer vision was primarily used to detect and count the cars in each frame of the live video from the camera. Detecting objects like cars in a video is a common computer vision task, but there are many more like it.
So what are some of these common tasks, or problems? Well, if you're new to computer vision you might be interested in knowing: how do I detect objects in a scene? Or how do I measure objects or regions in an image? Or how do I detect objects or classify an event using deep learning? So in this video, our objective is to help you get started with addressing these common computer vision tasks, or problems. And we will do this through some real-world examples using MATLAB.
So here are the examples that we will cover in the next 30 minutes or so. So let's get started with our first example. In this example, I have an image of a cluttered pile of playing cards on the left, and an image of a specific playing card on the right. My objective here is to detect, or locate, the specific playing card in the image of the pile of cards. Now, this might seem like an easy task. However, there are many challenges to solving this problem. The object of interest could appear smaller or larger than the template image, could be rotated or skewed, or could be partially hidden behind other objects. These are some common challenges that you will notice in most object detection problems.
Here, you cannot use a standard image processing algorithm like template matching to find this object. You will need a slightly more sophisticated and robust approach. To solve this problem, I have chosen a very fundamental computer vision workflow called feature detection, extraction, and matching. So before we get into understanding this workflow, let's first understand what a feature is. In simple terms, a feature is a distinct region in an image that leads to a unique match in another image, and something that is repeatable across many images. Some good examples of features in images are corners, edges, blobs, or maybe a template of an object or region in the scene itself.
So what does this workflow look like? The feature-based object detection workflow involves three main steps. First, detect interesting features in the template image of the object; this is shown by the green markers. And you do the same for the other image as well. Second, take a region surrounding each detected feature, and encode some information about that region into what you call a feature descriptor. This is called feature extraction, and it is represented by the green circles around the markers. You do this for both images.
And third, you look for corresponding matches between the extracted features of the two images, and then remove any outliers. Once you have a set of matched features, you can then estimate the location of the object in the scene. Now, let's go to MATLAB and see how to solve this problem.
OK, so this is my MATLAB environment. Now, the first thing that I'm going to do is read the two images that we are working with. One is the reference image of the card, which is the king of hearts card that we are looking for. And the second is the cluttered cards image that contains this card. So I'm going to use imread to read the two images, and then use imshowpair to display them next to each other.
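As a rough sketch of that setup step (the file names here are hypothetical placeholders for the actual webinar files), the code looks something like this:

```matlab
% Read the reference card image and the cluttered cards image
% (file names are hypothetical placeholders).
refImage   = imread('kingOfHearts.jpg');
cardsImage = imread('clutteredCards.jpg');

% Display the two images side by side
figure; imshowpair(refImage, cardsImage, 'montage');
```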
So here are my images. As you can see, the king of hearts card in the cards image is not exactly in the same orientation and scale as the reference image, and probably also not in the same reference plane. So a simple image processing technique like template matching will not work in this case. We need a more robust approach, which is to look for features within these images and compare them to find a match. So in the next section, I will use an algorithm called SURF to detect features in these images.
Now, SURF works specifically on 2D grayscale images. So I have to use rgb2gray first to convert the color image to grayscale. And then, I've used detectSURFFeatures from the Computer Vision Toolbox on the reference image. This outputs the detected feature points for the image. Next, I would like to visualize the detected feature points for this image. So here, I've chosen to plot only the strongest 50 points.
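A minimal sketch of this detection step, continuing with the variables above, might look like this:

```matlab
% SURF works on 2D grayscale images, so convert first
refGray   = rgb2gray(refImage);
refPoints = detectSURFFeatures(refGray);

% Visualize the 50 strongest detected feature points
figure; imshow(refGray); hold on;
plot(selectStrongest(refPoints, 50));
```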
So here is the reference image, and the 50 strongest feature points across the entire image, whose locations are represented by crosses. The circles around these location points represent the scale of the features, and both together are important parts of a feature descriptor. These feature descriptors represent the unique and interesting information that describes each feature and differentiates it from other features in the image.
Now, the SURF algorithm defines these features by detecting blobs in the image. Blobs are nothing but connected areas in an image with high-contrast pixels. In our case, a blob is the heart suit in the card, or the king's eyes, or a few smaller ones in the character K area. Now, like SURF, there are many other detection algorithms available through the Computer Vision Toolbox. For example, we have FAST, which is really good at detecting corners in images. So refer to the help documentation to learn more about all these different feature detection algorithms.
Now, let's go back to our script and see what's next. So next, I'm going to detect SURF features on the cards image as well. Make sure to use the same detection algorithm on both images so that we can make an apples-to-apples comparison when matching these features. So let's run this section, and look at, this time, the 300 strongest points on the image. So here, you can see we have feature points in the king of hearts card region, but also across the other cards in the image.
Now that we have detected the feature points for this image, next we are going to extract the feature descriptors, which are the region maps around these points. To do that, I'm using the extractFeatures function from the Computer Vision Toolbox. I pass in the grayscale image and the detected points from before, and this gives us the feature vectors and corresponding locations for each image. Now, the extraction method used here depends on the detection algorithm used, which is SURF in our case. I have the feature descriptors for both images now.
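Sketched in code (continuing the variables above), the extraction step for both images might look like this:

```matlab
% Detect SURF features on the cluttered cards image as well
cardsGray   = rgb2gray(cardsImage);
cardsPoints = detectSURFFeatures(cardsGray);

% Extract SURF descriptors around the detected points in both images
[refFeatures,   refValidPts]   = extractFeatures(refGray,   refPoints);
[cardsFeatures, cardsValidPts] = extractFeatures(cardsGray, cardsPoints);
```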
Next, I need to compare them to find matches. So to do this, I'm going to use matchFeatures from the Computer Vision Toolbox, which returns the indices of the matching features in the two input feature sets. I can then obtain the corresponding matched feature points for both images, and use showMatchedFeatures to view the matched points. Let's run this section and see the matched points across the two images.
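A minimal sketch of this matching step might look like this:

```matlab
% Find corresponding feature pairs between the two descriptor sets
indexPairs = matchFeatures(refFeatures, cardsFeatures);

% Pull out the matched point locations and visualize them side by side
matchedRefPts   = refValidPts(indexPairs(:, 1));
matchedCardsPts = cardsValidPts(indexPairs(:, 2));
figure; showMatchedFeatures(refImage, cardsImage, matchedRefPts, matchedCardsPts, 'montage');
```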
As you can see here, matchFeatures has found numerous matches in the two images. And most of these matches are within the king of hearts region, except for a few, like these two points here in the jack of spades card. Now, notice how there are some feature matches from the card character K in the reference image to the character K in the cards image here, but also to the K that appears in reverse here. The reason for this is that the SURF algorithm is rotation invariant, which means that it will detect and match features irrespective of their orientation, and also irrespective of their scale. This is why feature-based detectors are much more robust than a simple template matching technique.
Now that we have found matches between the two cards, we are not done with the detection problem yet. I need to improve the matches by getting rid of the outliers, including the duplicate matches like the one to the character K in reverse, so that I can find the exact region within the cards image where the king of hearts card is located. For this, I'm going to use the estimateGeometricTransform function from the Computer Vision Toolbox to compute the transformation matrix that determines how the reference image has to be geometrically and spatially transformed so that it best fits within the bounding region of the king of hearts card in the cards image.
This function also returns the inlier points for both images, which are obtained by eliminating all the outliers, like the ones that we saw in the jack of spades card, using an algorithm called RANSAC. RANSAC, or random sample consensus, is a mathematical model that uses an iterative approach to estimate inliers by randomly sampling the observation points to find the optimal fitting result. In our case, this is the geometric transformation that best fits the reference card image to the cluttered cards image. You can learn more about this algorithm by referring to the help info for the estimateGeometricTransform function.
Now, let's go ahead and run the section to see the matched features again, but this time with the outliers removed. There you go. Those outliers in the jack of spades card have been removed. And also, notice how some of the other ambiguous matches that we had seen earlier have also been eliminated. And then finally, using the transformation matrix that we obtained earlier, we can go ahead and transform a rectangle of the dimensions of the reference card image, call that newBoxPolygon, and then use it to annotate the cards image to indicate the exact location of the detected card in the cluttered cards image. So here is the final result.
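A hedged sketch of these last two steps, outlier removal with RANSAC and localization of the card, might look like this (it follows the standard point-feature-matching pattern; the 'affine' transform type and variable names are assumptions):

```matlab
% Estimate the geometric transform; RANSAC discards outlier matches
[tform, inlierRefPts, inlierCardsPts] = estimateGeometricTransform( ...
    matchedRefPts, matchedCardsPts, 'affine');
figure; showMatchedFeatures(refImage, cardsImage, inlierRefPts, inlierCardsPts, 'montage');

% Map a rectangle of the reference card's dimensions into the cluttered image
boxPolygon = [1, 1; size(refImage, 2), 1; ...
              size(refImage, 2), size(refImage, 1); 1, size(refImage, 1); 1, 1];
newBoxPolygon = transformPointsForward(tform, boxPolygon);

% Annotate the cards image with the detected card location
figure; imshow(cardsImage); hold on;
line(newBoxPolygon(:, 1), newBoxPolygon(:, 2), 'Color', 'y', 'LineWidth', 2);
```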
Now, what we have seen so far is how to detect a card in an image using a reference image. But what if we wanted to detect the type of this card, or recognize it? That is, that it's a king of hearts card. Now, we could have again used the feature matching approach to do this. But let's look at a more sophisticated approach: we will use a trained detection algorithm to perform the detection.
With trained detectors, instead of using a template to find and match features, we use a SURF-like technique to collect feature sets from numerous images of, say, the character K. And then, using a machine learning model like an SVM, we design a detector that can detect that character in other images. Since a trained detector has been trained with features from hundreds to thousands of template images, it is a much more robust approach, for example when detecting objects in varying lighting conditions, or when trying to find an object by its class; that is, say, detecting a car in an image irrespective of its model or make.
So in our case, let's go ahead and use a trained detector to detect the card character by its font inside the detected card, which is, say, the letter K in the king of hearts card. And just so you know, I will do this using the optical character recognition, or OCR, algorithm. So let's go ahead and see how to use this algorithm in MATLAB. First, we will transform the cards image using fitgeotrans to match the reference image. We do this so that the image can be spatially adjusted to match the dimensions of the reference image. By doing this, we can then easily extract the character and suit information in the card, since we know this is always available in the top left region of any card.
So let's go ahead and de-warp the cards image using the imwarp function, and see the output. So here is the king of hearts card region in the cards image that has been transformed. Next, I need to extract the regions in the card containing the character and suit, and use the extracted regions to detect each of them individually. To extract these regions, I have a helper function, getCardROIs, that performs this extraction for me and returns the regions containing the character and suit. I'm using the montage command to display both the extracted regions. So let's run this section and see the output.
Here are the extracted regions as two separate sub images. Next, I need to identify the character in the first sub image. To do this, I can use the optical character recognition, or OCR, algorithm. OCR is a popularly used technique to detect and recognize text within an image; this could be printed or handwritten text. In MATLAB, we have a pre-trained ocr function that is available in the Computer Vision Toolbox. There are multiple fonts that it can detect by default, but it can also be trained to detect custom fonts. So in my script, I am using the ocr function to detect the character in the sub image.
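A minimal sketch of this recognition step (charROI is a hypothetical name for the first extracted sub image, and restricting the character set is an optional assumption, not something shown in the webinar):

```matlab
% Run pre-trained OCR on the sub image containing the card character;
% limiting the character set to valid card characters can help accuracy
results = ocr(charROI, 'CharacterSet', 'A2345678910JQK');
recognizedChar = strtrim(results.Text)
```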
And here is the output returned by ocr. As you can see, it has successfully detected the card character as K. Next, I need to find the suit type in the second sub image. Well, for this, I've taken the simpler approach of using template matching. I simply compare this sub image with a set of template images that I have of the four different suits. I'm using the vision.TemplateMatcher system object from the Computer Vision Toolbox to perform template matching. If you wish to learn more about this system object, please go and find vision.TemplateMatcher in the help documentation.
I run the matching across all the template images within a for loop. Here, within the for loop, I read one template image at a time, resize the sub image to match the template image, and then perform template matching. The system object in this case returns a match metric, where the best metric value corresponds to the best match. So let's run the section and see what the result for our sub image looks like. As you can see, we have successfully detected the suit by template matching as hearts.
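For reference, here is a rough sketch of that loop. The suit template file names and the suitROI variable are hypothetical, and the sketch assumes grayscale inputs and the default sum-of-absolute-differences metric, where the smallest value indicates the best match; the exact script in the webinar may differ.

```matlab
suitNames = {'hearts', 'diamonds', 'clubs', 'spades'};
tm = vision.TemplateMatcher('OutputValue', 'Metric matrix');
scores = zeros(1, numel(suitNames));

for k = 1:numel(suitNames)
    % Read one suit template at a time (hypothetical, grayscale template files)
    template = imread(['suit_' suitNames{k} '.png']);
    % Resize the extracted suit region (assumed grayscale) to the template size
    candidate = imresize(suitROI, size(template));
    metric = tm(candidate, template);
    scores(k) = min(metric(:));   % smaller SAD value = better match
end

[~, bestIdx]  = min(scores);
detectedSuit = suitNames{bestIdx}
```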
One thing to note here is that template matching is a very rudimentary detection technique, which worked well in this particular case. But if the suits were not aligned, not scaled equally, or varying in contrast, this would have failed miserably. So for a more complex detection problem, consider using a more robust approach like the feature-based detection that we saw earlier. And then finally, we can annotate the card image with the detected card character and suit names, that is, king of hearts.
Like OCR, there are many other object detectors available in MATLAB through the Computer Vision Toolbox, which can be used to detect some common objects like barcodes, people, faces, foreground objects in a video, and blobs. Some of these detectors can also be trained with your own data to detect other objects of interest; for example, to detect pedestrians in a road scene.
MATLAB, through the Computer Vision Toolbox, also provides you with a set of ready-to-use object detectors based on a few popular deep learning networks. In MATLAB, you can also import and use pre-trained deep learning networks to solve object detection problems. For example, in this video, what we are showing is a pre-trained network called AlexNet that has been trained to recognize about 1,000 different object classes. MATLAB lets you import this popularly used network with a single line of code and use it within your computer vision application.
There are many deep learning models like AlexNet that are popularly used in research and commercial fields that you could quickly import into MATLAB and start using in your solutions. Note that you can retrain these networks in MATLAB with new data to detect other objects of interest. Here is a list of some popularly used deep learning detectors; note that these are available in MATLAB as ready-to-use functions. One other thing I want to emphasize is that MATLAB, in addition to making it easy to apply deep learning to computer vision problems using pre-trained networks, also provides interactive apps, like the labeling apps and the Deep Network Designer app, that make the whole workflow of retraining these networks or designing new networks from scratch a very convenient and exploratory process.
Now, let's move to the second example, which is to measure the size of objects in an image. In this example, what I have is an image with a mix of different kinds of coins and a few other objects. What I would like to do is separate out the coins in the image, and then determine the total value of all the coins. For example, if I have two quarters and one nickel, the total value is $0.55. Now, one way to do this is to determine the different types of coins in the image based on their size. And then, if I know the true diameter of each coin type, say, a quarter is 24.26 millimeters, I can use both of these pieces of information to find out how many coins of each type are in the image.
After that, it is simple math to calculate the total value of the coins in the image. Now, to compare the pixel-based dimensions in the image with the actual size, or dimension, of an object, we need to compute the size of a pixel in real-world units. This factor will help us find the true size of any object or region within the image. Now, there are some challenges to accurately measuring the pixel size in real-world units. One major challenge is that distortions due to camera properties could affect the accuracy of such measurements.
To solve such distortion-related challenges, I'm going to use a camera calibration workflow. Calibration allows us to estimate the parameters, or properties, of the lens and camera. So what are these camera properties? Well, there are intrinsic properties like focal length, optical center, and lens distortion coefficients. And there are extrinsic parameters like the position and orientation of the camera with respect to the object. Using the estimated parameters, we can then correct the image taken from that lens for any distortions that might exist.
Camera calibration is a very commonly used technique in many computer vision applications. For example, as a pre-processing step to correct images to remove lens distortion issues, or when trying to build a panoramic view by stitching together multiple images taken by the same lens, or when estimating the depth, or proximity, of an object from the lens, especially when using stereo vision-based cameras. In our case, we will be primarily focusing on using calibration to remove distortions in the image to help accurately measure the pixel size in real-world units.
Now, let's go to MATLAB and see how to solve this problem. So here is the image that we'll be working with. As you can see, this image has some coins and a few other objects in the bottom part of the image. And in the top part of the image, we have a checkerboard pattern. The checkerboard pattern is really essential for two things. One is to be able to perform camera calibration. And the second is to transform this image to correct for orientation and any other distortions and skew that might exist in the image.
So let's go ahead with the first step, which is camera calibration. To perform camera calibration, I'm going to use the Camera Calibrator app that is available through the Computer Vision Toolbox. To get that, I go to the Apps tab. And here, within the Image Processing and Computer Vision section, I can access the Camera Calibrator app. Let's bring that up.
So the first step within the Camera Calibrator app is to bring in the images that we are going to be using for the calibration step itself. To do that, I click on Add Images, and here I can select the images that I'm going to be using for my calibration step. In this example, I'm using seven images, but typically 10 to 20 images are recommended in real situations. And also here, it asks you for the size of the checkerboard square, which it's going to use for the calibration process. I know each square in my checkerboard pattern is about 20 millimeters.
I provide that, and then the app goes ahead and starts looking at the images and detecting the checkerboards, and then it provides the detection results. So here, you can see that it has detected the checkerboard in six of the seven images, and it says it's going to reject one of the images. So we'll say OK. And now, it has shown me the six images that it's going to use for the calibration process.
So here are the results. You can see that it has detected the points within the checkerboard pattern in each of these images. Now, the next thing that I want to do is look at some of the options available in the Camera Calibrator app. The first thing here is the camera model. I can either pick a standard lens, or a fisheye, or wide angle, lens. In my case, it's a standard lens, so I'm just going to keep this as my default selection.
Also, I have options to help me improve my parameter estimation. For that, there are radial distortion coefficients that I can choose. Radial distortion is typically what you see along the edges of the lens, away from the optical center, and you would want to correct these distortions specifically for wide angle lenses. You can also compute skew and tangential distortion if you choose to. For now, I'm going to keep these options at their defaults. And next, I'm going to hit the Calibrate button. This should now go ahead and start the calibration process and give me some results.
The first thing that I want to show you here is the reprojection errors. The reprojection errors are essentially the calibration errors, and we want to make sure that we keep the error values low. The way to do that is to remove any outliers. And here in this case, the second image of the checkerboard is definitely a little above the overall mean error. I'm going to choose to remove this, and then re-calibrate with the remaining five images. So now, you can see that I have removed that image, and I have a better overall mean error, which is about 0.57 pixels; before, it was about 0.63 pixels. It's not very different, but this is the process of trying to improve your calibration results.
After this, I can go ahead and export the camera calibration parameters to the MATLAB workspace. The variable that I'm going to save this to is called cameraParams. Now, just so you know, if you were to do this camera calibration process manually, it could get really complicated and ugly. The Camera Calibrator app really helps in making this whole workflow very convenient and easy, without you needing to be an expert in the calibration process. It lets you automatically detect the checkerboard points, it lets you work with a variety of camera models, standard lenses or wide angle lenses, and it also automatically calibrates the images for you and makes those results available within the MATLAB environment for further analysis.
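As an aside, the same calibration can also be scripted. Here is a hedged sketch of the programmatic equivalent (the image file names are hypothetical; the app essentially wraps these steps):

```matlab
% Detect checkerboard corners in the calibration images (hypothetical files)
imageFiles = {'calib01.jpg', 'calib02.jpg', 'calib03.jpg', ...
              'calib04.jpg', 'calib05.jpg', 'calib06.jpg', 'calib07.jpg'};
[imagePoints, boardSize] = detectCheckerboardPoints(imageFiles);

% Generate the corresponding real-world corner locations (20 mm squares)
squareSize  = 20;   % millimeters
worldPoints = generateCheckerboardPoints(boardSize, squareSize);

% Estimate the camera parameters and save them for later use
cameraParams = estimateCameraParameters(imagePoints, worldPoints);
save('cameraParams.mat', 'cameraParams');
```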
OK. So now that we have completed the camera calibration process and I have the camera parameters saved as a MAT-file, we can go ahead and see how to use the camera parameters to undistort the coins image, and then work with this undistorted image for true measurement of coin size. My end goal is to identify the coins based on their sizes and find the total value of all the coins in the image. Now, let's look at this image.
So here is the image. As you can see, it contains the coins and a few other objects in the bottom half of the image, and the checkerboard pattern in the top half. Now, the checkerboard pattern is very significant in solving this problem, and we will go over that in more detail later on. This image needs to be corrected, or undistorted, using the camera parameters that we obtained from the camera calibration process. This way, the image is free of any distortions before we proceed further. To do that, I'm going to load the camera parameters first, and then use the undistortImage function from the Computer Vision Toolbox to correct the image. So let's run the section and see the output.
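A minimal sketch of this undistortion step (the MAT-file and image file names are hypothetical):

```matlab
% Load the calibration results exported earlier
load('cameraParams.mat', 'cameraParams');

% Read the coins image and remove lens distortion
coinsImage       = imread('coinsScene.jpg');
undistortedImage = undistortImage(coinsImage, cameraParams);
figure; imshowpair(coinsImage, undistortedImage, 'montage');
```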
In our case, the undistorted image doesn't look very different. But if we had an image that was taken using a wide angle lens, then you'd notice a significant difference after undistortion. Next, before I start detecting the coins and measuring them, I must make sure that all the pixels in the image have the same pixel-to-real-world-unit factor, which is a very important step towards making accurate measurements across the image. For this, we have to transform the original image using the reference checkerboard pattern, since we know its real-world dimensions. This transformation will ensure that the pixel-to-real-world measurements are uniform across the entire image.
Now, let's see how this transformation works. I'm going to first determine the checkerboard pattern points using the detectCheckerboardPoints function that is available in the Computer Vision Toolbox. Let's display these points on the image. Next, using the information I know about the checkerboard pattern, like how many squares there are along the rows and along the columns, I can determine the corner points of the detected checkerboard region. Then, we'll find the approximate location of these corners in a new plane that removes any perspective projection that might exist due to the angle of the camera to the plane containing the checkerboard pattern and coins.
These new corner points are available to us in a variable called basePts. So let's run the section and see the base points with respect to the corner points. Now, using the base points as a reference, we will find the transformation matrix that we need to project the image onto the plane represented by the base points. We do this using the fitgeotrans function from the Image Processing Toolbox. This returns a transformation matrix that I can use to de-warp the coins image using the imwarp function. Let's run the section and see the output.
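A rough sketch of this de-warping step (here, cornerPts stands for the four outer checkerboard corners derived from the detected points, and basePts for the corresponding fronto-parallel target points; both are assumed to come from the helper logic described above):

```matlab
% Detect the checkerboard corner points in the undistorted image;
% cornerPts would be derived from imagePoints and boardSize (four outer corners)
[imagePoints, boardSize] = detectCheckerboardPoints(undistortedImage);

% Fit a projective transform from the detected corners to the base points
tform = fitgeotrans(cornerPts, basePts, 'projective');

% De-warp the coins image onto the plane defined by the base points
outputView  = imref2d([size(undistortedImage, 1), size(undistortedImage, 2)]);
warpedImage = imwarp(undistortedImage, tform, 'OutputView', outputView);
figure; imshow(warpedImage);
```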
So here is the transformed image. This image is now ready for making accurate measurements. Now that I have a transformed image containing the coins, the next thing I need to do is detect the coins in the image, which are nothing but circular objects. So in this section here, I am first converting the transformed image to binary using the imbinarize function. I have a slider here to adjust the threshold value; this way, I can get the desired result for the binary output. And then, I'm using the imfindcircles function from the Image Processing Toolbox to find the circles, which are the coins, in the image.
As you can see, I have a few parameters here. And I have a slider here again to adjust the sensitivity parameter so that it picks up only the strongest circles that represent the coins in the image. You can learn more about the other parameters of the imfindcircles function by referring to the help info on it. Once I have detected the circles, and I know the pixel coordinates of the centers and the radii, I'm going to use viscircles to display these circular regions on the coins image. So let's run this section and see the results.
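A minimal sketch of this detection step (the threshold, radius range, and sensitivity values are placeholders to be tuned with the sliders mentioned above):

```matlab
% Binarize the de-warped image; the threshold would be tuned interactively
bwImage = imbinarize(rgb2gray(warpedImage), 0.5);

% Find circular regions (the coins) within a plausible radius range, in pixels
[centers, radii] = imfindcircles(bwImage, [20 60], ...
    'ObjectPolarity', 'bright', 'Sensitivity', 0.92);

% Overlay the detected circles on the coins image
figure; imshow(warpedImage); hold on;
viscircles(centers, radii, 'Color', 'r');
```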
As you can see, the coins have been detected as circles. It missed out on one coin, so I can go back to imfindcircles and increase the sensitivity by adjusting the slider here to make sure that I have all the coins detected. One thing to note here is that imfindcircles is not a very robust technique for such detection problems. We are assuming here that coins are the only circular objects in the image. Also, with varying light conditions, the detection of circles may not always work well. In that case, we will have to use a more robust approach, like using a trained detector as we discussed in the card detection example, to detect the coins in the image.
So now that we have detected the coin regions in the image, and I know the sizes of the coins in pixel values, next I have to compute the sizes of the coins in real-world units. To do this, I need to first know the pixel-to-real-world-unit factor, which I can then use to compute the real-world size, that is, the radius of the coins in millimeters. So to do this, I'm going to use, again, the checkerboard pattern as my reference. I will detect the checkerboard points like before, and then compute the size of a square in the pattern in pixels. I know that the size of a square of the checkerboard pattern is 20 millimeters. So using that, I can compute the pixel-to-millimeter factor for this image. So let's run the section and see the output.
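Sketched in code, the factor computation might look like this (using the distance between two adjacent detected corner points as the square size in pixels, which is one reasonable way to do it):

```matlab
% Detect the checkerboard points in the de-warped image
[imagePoints, boardSize] = detectCheckerboardPoints(warpedImage);

% Pixel distance between two adjacent corner points = one square, in pixels
squareSizePixels = norm(imagePoints(2, :) - imagePoints(1, :));

% Known square size in millimeters gives the pixel-to-millimeter factor
squareSizeMM = 20;
pixelToMM    = squareSizeMM / squareSizePixels
```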
So here is the pixel-to-millimeter factor using the checkerboard. Now, using this pixel-to-millimeter factor with the coin sizes in pixels, I can compute the coin sizes in millimeters. Like the known size of the square of the checkerboard, we also know the true radii of the coin types in our image; in our case, these are cents, nickels, and quarters.
So in this next section, I find the pixel size of the three coin types using the true size values and the pixel-to-millimeter factor that we computed earlier. With this as a reference, I can compare them with the pixel values of the circles that we obtained earlier from the image, and I can then bin each circle as either a cent, a nickel, or a quarter.
So here, I first sort the radii of the circles, and then use histogram, which is a MATLAB function, to bin the circles with the coin pixel sizes as the reference. Let's run the section to see the distribution of these circles as the three coin types. You can see in the graph that we have two circles which are binned as cents, three as nickels, and four as quarters. I use the histogram function again to obtain the counts this time, and I display the counts. Once I have the count of each coin type in the image, I can very easily calculate the total value of the coins in the image. And this is $1.17, which is the actual value of the coins in the image. And this way, we have verified that our coin measurements and binning actually worked well for this image.
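A hedged sketch of that binning and totaling logic (using histcounts here for compactness; the cent and nickel diameters are standard published values, and the bin edges are placed halfway between adjacent coin radii):

```matlab
% Convert the detected circle radii from pixels to millimeters
radiiMM = radii * pixelToMM;

% True radii (mm) of a cent, nickel, and quarter (diameters 19.05, 21.21, 24.26)
coinRadiiMM = [19.05, 21.21, 24.26] / 2;
coinValues  = [0.01, 0.05, 0.25];

% Bin each circle to the nearest coin type and total the value
edges  = [0, mean(coinRadiiMM(1:2)), mean(coinRadiiMM(2:3)), Inf];
counts = histcounts(radiiMM, edges);   % [numCents numNickels numQuarters]
totalValue = sum(counts .* coinValues)
```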
And finally, we can display our final results on the coins image. Notice that I've used viscircles again here to indicate the different coin types in the image with colored circles. Now, what if we had to measure the size of an object like a leaf, which doesn't have a very defined shape, and there is no easy way to automate the detection of such an object in the image? In this case, once we know the pixel-to-millimeter factor for the image that contains the leaf, we can use imdistline, which lets you interactively measure the distance between two pixels in the leaf region. Then, you can use that to compute the distance in real-world units using the pixel-to-millimeter factor.
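A minimal sketch of that interactive measurement (leafImage is a hypothetical image, and pixelToMM is the factor computed from the checkerboard in that image's scene):

```matlab
% Display the image and drop an interactive distance line over the leaf
figure; imshow(leafImage);
h = imdistline;

% Read the measured pixel distance and convert it to millimeters
distPixels = getDistance(h);
distMM     = distPixels * pixelToMM
```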
So here, I have replaced the pixel reading with the computed distance in millimeters. So in a sense, we have seen that if we have a reference like the checkerboard pattern in any image, we can compute the pixel-to-real-world-unit factor and use that to measure the true size of any object or region in that image. What we have seen in MATLAB was the Camera Calibrator app, which supports standard and wide angle lenses. The Computer Vision Toolbox also provides a separate Stereo Camera Calibrator app to help calibrate images taken from a stereo camera pair, typically useful in tasks like depth estimation.
So that brings us to this final summary slide. We have seen how MATLAB can make getting started easy for someone new to computer vision. We have seen through some examples that MATLAB makes it easy, specifically, to detect objects using readily available feature-based and deep learning-based detectors, or even customize them with your own data; to make real-world measurements in images using the interactive calibration apps and object analysis workflows; and to ramp up quickly on deep learning with the help of pre-trained networks and detailed examples in the documentation.
So in conclusion, if you're wondering where to go from here, here are some next steps. Go to the Computer Vision Toolbox product page to learn more about computer vision, its applications, and the other features and capabilities that come with the toolbox. If you want a more in-depth experience learning how to use the Computer Vision Toolbox, sign up for an instructor-led training course; this is also available as an online course. As mentioned before, there is also one on deep learning. If you're ready to explore the product and start solving your problems, get a trial license for the product today.
As we have seen, there are some deep learning-based object detectors that are available in the Computer Vision Toolbox. But there is also a lot more that you can do with deep learning. Go to the Deep Learning Toolbox page to learn more about this product and its extensive capabilities. Some of you might also be interested in eventually running your computer vision algorithms on a hardware platform like an ARM-based Raspberry Pi or a GPU-based NVIDIA Jetson board. Go to the Embedded Vision with MATLAB solution page to learn more about this and other hardware implementation-related capabilities from MATLAB. Thank you very much for your attention.