Computer vision uses images and video to “understand” a real-world scene. It goes beyond image processing, which can remove noise or blur and identify objects based on size, color, or location.
Computer vision goes further: it interprets the scene, identifying objects like pedestrians and traffic signs. More specifically, computer vision techniques can identify, track, measure, detect, and classify objects in images and video.
Let’s see a few of these in action.
We’ll first look at how computer vision identifies objects based on motion. This includes basic techniques like image subtraction as well as more advanced algorithms like optical flow, which takes groups of pixels into account rather than individual pixels.
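If you want a feel for how that looks in code, here is a minimal MATLAB sketch of frame differencing plus Farneback optical flow. It is not the exact demo from the video; the 'visiontraffic.avi' sample file, the threshold, and the plotting settings are illustrative assumptions.

% Motion detection by image subtraction (frame differencing) and optical flow.
% Assumes an RGB video; the threshold and decimation settings are arbitrary.
reader    = VideoReader('visiontraffic.avi');
opticFlow = opticalFlowFarneback;                 % dense optical flow estimator
prevGray  = rgb2gray(readFrame(reader));

while hasFrame(reader)
    gray = rgb2gray(readFrame(reader));

    % Image subtraction: pixels that changed between consecutive frames
    motionMask = imabsdiff(gray, prevGray) > 25;

    % Optical flow: per-pixel velocities estimated from neighborhoods of pixels
    flow = estimateFlow(opticFlow, gray);

    imshow(motionMask); hold on;                  % changed pixels shown in white
    plot(flow, 'DecimationFactor', [5 5], 'ScaleFactor', 10);   % flow vectors
    hold off; drawnow;

    prevGray = gray;
end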
Next, you can see how computer vision is used for object tracking. Examples include point trackers for robust face tracking and multi-object tracking that keeps following objects even when they are occluded.
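As a rough sketch of point tracking (again, not the exact demo from the video), a KLT point tracker in MATLAB could follow a detected face like this; the 'visionface.avi' file name, the cascade detector, and the error threshold are assumptions for illustration.

% Face tracking with a KLT point tracker.
% Assumes a face is visible in the first frame of the video.
reader = VideoReader('visionface.avi');
frame  = readFrame(reader);

faceDetector = vision.CascadeObjectDetector();             % frontal-face model
bbox   = faceDetector(frame);
points = detectMinEigenFeatures(rgb2gray(frame), 'ROI', bbox(1, :));

tracker = vision.PointTracker('MaxBidirectionalError', 2);
initialize(tracker, points.Location, frame);

while hasFrame(reader)
    frame = readFrame(reader);
    [pts, validity] = tracker(frame);                      % track into new frame
    out = insertMarker(frame, pts(validity, :), '+');
    imshow(out); drawnow;
end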
Camera calibration allows you to remove lens distortion, measure an object’s real-world size, or estimate an object’s distance from the camera.
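A minimal sketch of that calibration workflow in MATLAB, assuming a folder of checkerboard images and a 25 mm square size (both placeholders), might look like this:

% Camera calibration from checkerboard images, then lens distortion removal.
imds = imageDatastore('calibrationImages');                % hypothetical folder
[imagePoints, boardSize] = detectCheckerboardPoints(imds.Files);

squareSize  = 25;                                          % in millimeters (assumed)
worldPoints = generateCheckerboardPoints(boardSize, squareSize);

I = readimage(imds, 1);
cameraParams = estimateCameraParameters(imagePoints, worldPoints, ...
    'ImageSize', [size(I, 1), size(I, 2)]);

undistorted = undistortImage(I, cameraParams);             % remove lens distortion
imshowpair(I, undistorted, 'montage');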
Next, feature detection, extraction, and matching allow you to find an object in a cluttered scene. I can detect and follow the object around, even if it changes size or orientation.
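One way to sketch this in MATLAB is with SURF features and a robust similarity transform; the image file names here are placeholders, not files from the video.

% Finding an object in a cluttered scene with local features.
object = rgb2gray(imread('object.png'));
scene  = rgb2gray(imread('scene.png'));

objPoints   = detectSURFFeatures(object);
scenePoints = detectSURFFeatures(scene);

[objFeatures,   objPoints]   = extractFeatures(object, objPoints);
[sceneFeatures, scenePoints] = extractFeatures(scene, scenePoints);

pairs         = matchFeatures(objFeatures, sceneFeatures);
matchedObject = objPoints(pairs(:, 1));
matchedScene  = scenePoints(pairs(:, 2));

% A similarity transform tolerates changes in scale and rotation
tform = estimateGeometricTransform(matchedObject, matchedScene, 'similarity');
showMatchedFeatures(object, scene, matchedObject, matchedScene, 'montage');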
Here, you can see machine learning and deep learning techniques that allow you to identify or classify objects in a scene. For example, you can train a model to identify images of French fries or sushi.
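As a minimal sketch, you could start from a pretrained network in MATLAB; to recognize your own categories such as French fries or sushi, you would then fine-tune it with transfer learning. The resnet50 choice and the image file name are assumptions, not the model from the video.

% Image classification with a pretrained deep network.
% resnet50 requires its support package; 'food.jpg' is a placeholder.
net = resnet50;                                  % pretrained ImageNet classifier
img = imread('food.jpg');
img = imresize(img, net.Layers(1).InputSize(1:2));
label = classify(net, img)                       % predicted category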
You can also apply all of these computer vision techniques in 3D using point cloud processing and stereo vision, a growing research area for autonomous vehicles and robotics. The idea is to capture individual points in space and map them to a real-world scene.
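For instance, here is a minimal MATLAB sketch that reads and displays a point cloud; 'teapot.ply' is a sample file I’m assuming is available, so substitute any PLY file you have.

% Read and display a 3-D point cloud.
ptCloud = pcread('teapot.ply');                  % each point is an (x, y, z) location
pcshow(ptCloud);
xlabel('X'); ylabel('Y'); zlabel('Z');
title('Point cloud');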
Hopefully you now have a sense of what you can do with computer vision and what the term means. So, how do you start? Everything I showed today is a MATLAB example that you can try yourself.
You can get a free trial and watch videos to help you get started. For more information, check out the links below.