Getting Started with OCR
Optical character recognition (OCR) refers to the ability to detect text in an image. OCR is useful in many computer vision applications, such as image search, document analysis, and robot navigation. The images can be of any type of document, or of a scene that contains text (for example, license plates). Computer Vision Toolbox™ provides functionality to detect and recognize text in multiple languages and to train OCR models to recognize custom text. The ocr function is at the core of these capabilities, which use it to perform text detection and recognition in an image.
Text Detection
The first step for OCR is to detect regions of text in an image. Computer Vision Toolbox includes these approaches.
Built-in Layout Analysis
The ocr function uses the Tesseract OCR engine to perform automatic text detection and recognition, which performs best when the text is located on a uniform background and formatted like a scanned document. When the text has a different layout, use the LayoutAnalysis name-value argument of the ocr function to specify the layout of the text in the image. You can set the layout to "auto", "page", "block", "line", "word", or "character". When the layout is "line", "word", or "character", it is best to also specify the locations of the text regions. For more details, see Specify Text Regions.
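For example, this sketch recognizes text arranged as a single block rather than a full page. The image file name is a hypothetical placeholder; replace it with your own image.
I = imread("textBlock.png");   % hypothetical image of a block of text
ocrResults = ocr(I,LayoutAnalysis="block");   % treat the image as one text block
recognizedText = ocrResults.Text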
The built-in layout analysis in the ocr
function analyzes a binarized
input image by assuming the image has dark text against a uniform, light background.
If the image contains a nonuniform background or lighting, use binarization to
prepare the input image for text recognition. For more information, see the Troubleshoot OCR Function Results section.
CRAFT Text Detector
The detectTextCRAFT
function offers robust text detection based on the
Character-Region Awareness For Text detection (CRAFT) model, which can detect text
regions in images regardless of factors such as image background, contrast, and
intensity values. When you have difficulty segmenting the text regions in an image,
use the pretrained, deep-learning-based CRAFT model. This model requires more
computational resources than other text detection approaches, and also requires a
Deep Learning Toolbox™ license. For more information, see Automatically Detect and Recognize Text Using Pretrained CRAFT Network and OCR.
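For example, this sketch detects text regions in a natural-scene image with the CRAFT model and then recognizes the text in those regions. The image file name is a hypothetical placeholder, and passing LayoutAnalysis="word" assumes the detected regions correspond to words.
I = imread("sceneText.jpg");   % hypothetical scene image containing text
bboxes = detectTextCRAFT(I);   % detect text regions with the pretrained CRAFT model
ocrResults = ocr(I,bboxes,LayoutAnalysis="word");   % recognize text in each region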
Custom Text Detection
Computer Vision Toolbox provides several tools for developing custom algorithms to detect text in complex image scenes. These examples demonstrate different approaches to preprocessing images for text detection, and a minimal blob-analysis sketch follows the list:
Recognize Text Using Optical Character Recognition (OCR) — Overview of preprocessing, with an example using blob analysis.
Segment and Read Text in Image — ROI-based preprocessing.
Opening by Reconstruction — Remove artifacts to produce a cleaner image.
Correct Nonuniform Illumination and Analyze Foreground Objects — Enhance an image.
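As a minimal sketch of the blob-analysis approach, this code binarizes an image, finds candidate text regions with connected-component analysis, and passes those regions to ocr. The image file name and the area threshold are hypothetical values chosen for illustration.
I = imread("documentScene.jpg");          % hypothetical image; replace with your own
gray = im2gray(I);
bw = imcomplement(imbinarize(gray));      % assume dark text; make text pixels white
stats = regionprops(bw,"BoundingBox","Area");
bboxes = vertcat(stats([stats.Area] > 50).BoundingBox);   % keep blobs of plausible text size
ocrResults = ocr(I,bboxes);               % recognize text within the candidate regions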
Text Recognition
The ocr
function supports text recognition
functionality in 64 languages. The function recognizes text in English by default, and
works well on scanned documents when using all of the default values for the function.
For example, this image shows a scan of a business card.
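To recognize the text in such an image, a minimal call using the default settings looks like this. The file name is a hypothetical placeholder for your own scanned image.
I = imread("businessCard.png");   % hypothetical scanned business card
ocrResults = ocr(I);              % recognize English text with default settings
ocrResults.Text                   % display the recognized text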
Specify Text Regions
Using text regions for OCR can improve performance, especially when performing OCR
on images of a natural scene that contains words. To recognize text in these kinds
of images, specify ROI bounding boxes around text regions and then use the ocr
function with an ROI input. For
example, this code snippet uses an ROI to specify the location of the text on an accessible parking sign:
I = imread("handicapSign.jpg");
roi = [360 118 384 560];
ocrResults = ocr(I,roi);
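You can then inspect the recognized words and, as one possible way to visualize the result, annotate them on the image using their bounding boxes:
Iannotated = insertObjectAnnotation(I,"rectangle", ...
    ocrResults.WordBoundingBoxes,ocrResults.Words);   % label each recognized word
imshow(Iannotated)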
Specify OCR Model
The ocr function supports text recognition in 64 languages through built-in language model files. To use these models, specify the Model name-value argument of the ocr function. For faster performance (but with less accuracy) using the built-in models, you can append "-fast" to the language model string, for example, "english-fast", "japanese-fast", or "seven-segment-fast". This code snippet recognizes the seven-segment characters in an image:
I = imread("sevSegDisp.jpg"); roi = [506 725 1418 626]; ocrResults = ocr(I,roi,Language="seven-segment");
Note
Computer Vision Toolbox ships with language model files for recognizing English, Japanese, and seven-segment characters. To recognize characters in other languages using the ocr function, you must install the OCR Language Data Files support package. For more details, see Install OCR Language Data Files.
Troubleshoot OCR Function Results
If your OCR results are not what you expect, you can try one or more of these options:
The built-in layout analysis in the ocr function analyzes a binarized input image by assuming the image has dark text on a uniform, light background. If the image contains a nonuniform background or lighting, use binarization to prepare the input image for text recognition.
Increase the image size by a factor of 2 to 4.
If the characters in the image are too close together or their edges are touching, use morphology to thin out the characters. Thinning the characters separates them.
Use binarization to check for nonuniform lighting issues. Use the graythresh and imbinarize functions to binarize the image. If the characters are not visible in the result of the binarization, then the image has a potential nonuniform lighting issue. Try applying top-hat filtering by using the imtophat function, or use other techniques that address nonuniform illumination. A sketch of this approach follows this list.
Use the region-of-interest option to isolate the text. Specify the ROI manually or use text detection.
If your image looks like a natural scene that contains words, such as a street scene, rather than a scanned document, try setting the LayoutAnalysis name-value argument to either "block" or "word".
Ensure that the image contains dark text on a light background. If the image instead contains light text on a dark background, binarize the image and invert it before passing it to the ocr function.
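A minimal sketch of the binarization check and top-hat correction described above. The image file name and the structuring-element radius are hypothetical values chosen for illustration.
I = im2gray(imread("receipt.jpg"));        % hypothetical image with uneven lighting
bw = imbinarize(I,graythresh(I));          % binarize and inspect the result
imshow(bw)                                 % missing characters suggest a lighting issue
% One option: top-hat filter the complemented image so the dark text becomes
% bright features on a flattened background, then complement back.
Icorrected = imcomplement(imtophat(imcomplement(I),strel("disk",15)));
ocrResults = ocr(imbinarize(Icorrected));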
Train Custom OCR Models
In some cases, to get accurate recognition results, you must train a custom OCR model: for example, when the text in your images uses a proprietary font that is significantly different from any of the available fonts, or when the OCR results are not what you expect even after you try the troubleshooting steps. For more details on how to train a custom OCR model, see Train Custom OCR Model.
Create Ground Truth Data
You can use the Image Labeler app to interactively label images for ground truth data for training and evaluating OCR models. For more details, see Prepare Training Data.
Evaluate and Quantize OCR Results
Use the metrics generated by the evaluateOCR function to evaluate the quality of the OCR results. Optionally, you can quantize a trained model for faster performance by using the quantizeOCR function, but quantization can decrease the accuracy of the model.
See Also
Apps
Functions
ocr | trainOCR | evaluateOCR | quantizeOCR | ocrTrainingData