ocr
Recognize text using optical character recognition
Description
txt = ocr(I) returns an ocrText object containing optical character recognition information from the input image, I. The object contains the recognized text, the text location, and a metric indicating the confidence of the recognition result.
txt = ocr(I,roi) recognizes text in I within one or more rectangular regions of interest, roi.
Note
The ocr function also recognizes seven-segment digits in images.
[___] = ocr(___,Name=Value) specifies options using one or more name-value arguments in addition to any combination of arguments from previous syntaxes. For example, Language="english" sets English as the language to detect.
Examples
Recognize Text Within an Image
businessCard = imread("businessCard.png");
ocrResults = ocr(businessCard)
ocrResults = 
  ocrText with properties:
                      Text: '‘ MathWorks®...'
    CharacterBoundingBoxes: [103x4 double]
      CharacterConfidences: [103x1 single]
                     Words: {16x1 cell}
         WordBoundingBoxes: [16x4 double]
           WordConfidences: [16x1 single]
recognizedText = ocrResults.Text;
figure;
imshow(businessCard);
text(600,150,recognizedText,"BackgroundColor",[1 1 1]);
Recognize Text in Regions of Interest (ROI)
Read image.
I = imread("handicapSign.jpg");
Define one or more rectangular regions of interest in which to recognize text within the input image.
roi = [360 118 384 560];
You can also use imrect to select a region with the mouse. For example: figure; imshow(I); roi = round(getPosition(imrect))
ocrResults = ocr(I,roi);
Insert the recognized text into the original image.
Iocr = insertText(I,roi(1:2),ocrResults.Text,AnchorPoint="RightTop",FontSize=16);
figure;
imshow(Iocr);
Recognize Digits from Seven-Segment Display
Read an image containing the seven-segment display into the workspace.
I = imread("sevSegDisp.jpg");
Specify the ROI that contains the seven-segment display.
roi = [506 725 1418 626];
To recognize the digits from the seven-segment display, specify the Language argument as "seven-segment".
ocrResults = ocr(I,roi,Language="seven-segment");
Display the recognized digits and detection confidence.
fprintf("Recognized seven-segment digits: ""%s""\nDetection confidence: %0.4f",cell2mat(ocrResults.Words),ocrResults.WordConfidences)
Recognized seven-segment digits: "5405.9" Detection confidence: 0.7948
Insert the recognized digits into the image.
Iocr = insertObjectAnnotation(I,"rectangle", ...
    ocrResults.WordBoundingBoxes,ocrResults.Words,LineWidth=5,FontSize=72);
figure
imshow(Iocr)
Display Bounding Boxes of Words and Recognition Confidences
businessCard = imread("businessCard.png");
ocrResults = ocr(businessCard)
ocrResults = 
  ocrText with properties:
                      Text: '‘ MathWorks®...'
    CharacterBoundingBoxes: [103x4 double]
      CharacterConfidences: [103x1 single]
                     Words: {16x1 cell}
         WordBoundingBoxes: [16x4 double]
           WordConfidences: [16x1 single]
Iocr = insertObjectAnnotation(businessCard,"rectangle", ...
    ocrResults.WordBoundingBoxes, ...
    ocrResults.WordConfidences);
figure;
imshow(Iocr);
Find and Highlight Text in an Image
businessCard = imread("businessCard.png");
ocrResults = ocr(businessCard);
bboxes = locateText(ocrResults,"MathWorks",IgnoreCase=true);
Iocr = insertShape(businessCard,"FilledRectangle",bboxes);
figure;
imshow(Iocr);
Input Arguments
I
— Input image
M-by-N-by-3 truecolor image | M-by-N 2-D grayscale image | M-by-N binary image
Input image, specified as an M-by-N-by-3 truecolor, M-by-N 2-D grayscale, or M-by-N binary image. The input image must be real and nonsparse. Before recognition, the function converts truecolor and grayscale input images to binary images using Otsu's thresholding method. For best ocr results, the height of a lowercase x, or a comparable character, in the input image should be greater than 20 pixels. To improve recognition results, remove any text rotation greater than ±10 degrees from the horizontal or vertical axis.
Data Types: single | double | int16 | uint8 | uint16 | logical
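The following sketch approximates the Otsu conversion described above so that you can inspect the binary image before recognition. It is an illustration, not the function's internal code: it assumes the Image Processing Toolbox functions im2gray (R2020b or later), graythresh, and imbinarize, reuses businessCard.png from the examples, and uses an upscale factor of 2 only as an illustrative choice for text whose lowercase letters are shorter than about 20 pixels.

```matlab
% Preview an Otsu binarization similar to the conversion ocr performs.
% Assumes businessCard.png (from the examples above) is on the path.
I = imread("businessCard.png");
Igray = im2gray(I);                 % truecolor -> grayscale; grayscale passes through
level = graythresh(Igray);          % Otsu threshold, normalized to [0, 1]
BW = imbinarize(Igray, level);      % logical image, same height and width as Igray
figure; imshowpair(Igray, BW, "montage");

% If a lowercase "x" is shorter than about 20 pixels, enlarge the image
% before recognition. The factor 2 is illustrative only.
Ibig = imresize(I, 2);
txt = ocr(Ibig);
disp(txt.Text)
```

If characters vanish or merge in the binary preview, the recognizer is likely to struggle with them as well.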
roi
— Region of interest
M-by-4 element matrix
One or more rectangular regions of interest, specified as an M-by-4 matrix. Each row specifies one region of interest within the input image as a four-element vector, [x y width height]. The vector specifies the upper-left corner location, [x y], and the size, [width height], of the rectangular region, in pixels. Each rectangle must be fully contained within the input image, I. Before recognition, the function uses Otsu's thresholding to convert truecolor and grayscale regions of interest to binary regions. The function returns the text recognized in the rectangular regions as an array of objects.
To obtain the best results when using ocr to recognize seven-segment digits, specify an roi that encloses the seven-segment digits in the input image.
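A minimal sketch of the roi format, assuming the handicapSign.jpg image and ROI from the examples above. The containment check is a hypothetical, conservative reading of "fully contained" written for illustration; ocr performs its own validation. The loop shows that an M-by-4 roi yields one ocrText object per row.

```matlab
% Check that each [x y width height] row lies inside the image, then run ocr.
% The containment test below is an illustrative assumption, not ocr's code.
I = imread("handicapSign.jpg");
roi = [360 118 384 560];            % M-by-4, one region per row
[imgH, imgW, ~] = size(I);
inside = roi(:,1) >= 1 & roi(:,2) >= 1 & ...
    roi(:,1) + roi(:,3) - 1 <= imgW & ...
    roi(:,2) + roi(:,4) - 1 <= imgH;
assert(all(inside), "Every ROI must lie fully inside the image.");

ocrResults = ocr(I, roi);           % M-by-1 array of ocrText objects
for k = 1:numel(ocrResults)
    fprintf("ROI %d: %s\n", k, strtrim(ocrResults(k).Text));
end
```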
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
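For example, the following two calls request the same options, one in each syntax. This sketch assumes businessCard.png from the examples; because the image and options are identical, the recognized text is expected to match.

```matlab
I = imread("businessCard.png");

% R2021a and later: Name=Value syntax.
txtNew = ocr(I, TextLayout="block", Language="english");

% Before R2021a: comma-separated pairs with the name in quotes.
% Later releases still accept this form.
txtOld = ocr(I, "TextLayout", "block", "Language", "english");

% Same image and same options, so the recognized text should be identical.
disp(isequal(txtNew.Text, txtOld.Text))
```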
Example: ocr(I,TextLayout="block") sets the text layout to "block".
TextLayout
— Input text layout
"auto" (default) | "page" | "block" | "line" | "word" | "character"
Input text layout, specified as one of the following:
TextLayout | Text Treatment |
---|---|
"auto" | Treats the text in the image as a |
"page" | Treats the text in the image as a page containing blocks of text. |
"block" | Treats the text in the image as a single block of text. |
"line" | Treats the text in the image as a single line of text. |
"word" | Treats the text in the image as a single word of text. |
"character" | Treats the text in the image as a single character. |
Use the TextLayout argument to specify the layout of the text in the input image. For example, you can specify TextLayout as "page" to recognize text from a scanned document that has a specific format, such as two columns. This setting preserves the reading order in the returned text.
You may get poor results if your input image contains only a few regions of text or if the text is located in a cluttered scene. If you get poor OCR results, try a different layout that matches the text in your image. If the text is in a cluttered scene, also try specifying an ROI around the text in addition to trying a different layout.
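One way to follow this advice is to try each layout on the same region and compare the output. This sketch reuses the image and ROI from the ROI example; the "character" layout is left out because that region holds more than one character.

```matlab
% Try several layouts on the same ROI and compare the recognized text.
I = imread("handicapSign.jpg");
roi = [360 118 384 560];
layouts = ["auto" "page" "block" "line" "word"];
for k = 1:numel(layouts)
    r = ocr(I, roi, TextLayout=layouts(k));
    % Replace line breaks so each result prints on a single line.
    oneLine = strrep(strtrim(r.Text), newline, " ");
    fprintf("%-6s: %s\n", layouts(k), oneLine);
end
```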
Language
— Language to recognize
"english" (default) | "japanese" | "seven-segment" | character vector | string scalar | cell array of character vectors | string array
Language to recognize, specified as "english", "japanese", "seven-segment", a character vector, a string scalar, a string array, or a cell array of character vectors. If you specify Language as "seven-segment", the ocr function recognizes seven-segment digits in the input image.
You can also install additional languages (see Install OCR Language Data Files) or add a custom language. Specifying multiple languages enables simultaneous recognition of all the selected languages. However, selecting more than one language can reduce accuracy and increase the time it takes to perform OCR.
To use any of the additional languages installed through Install OCR Language Data Files, specify the language name the same way as for the built-in languages. You do not need to specify a path.
txt = ocr(img,Language="finnish");
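To see the trade-off of specifying multiple languages, you can time a single-language call against a multi-language call. This sketch assumes businessCard.png and the built-in "japanese" data; timings depend on your machine and are not benchmarks.

```matlab
% Compare a single-language run with a multi-language run on the same image.
I = imread("businessCard.png");

tic; txtEn = ocr(I, Language="english"); tEn = toc;
tic; txtMulti = ocr(I, Language=["english" "japanese"]); tMulti = toc;

fprintf("english: %.2f s, english+japanese: %.2f s\n", tEn, tMulti);
disp(txtMulti.Text)
```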
List of Support Package OCR Languages
"afrikaans"
"albanian"
"ancientgreek"
"arabic"
"azerbaijani"
"basque"
"belarusian"
"bengali"
"bulgarian"
"catalan"
"cherokee"
"chinesesimplified"
"chinesetraditional"
"croatian"
"czech"
"danish"
"dutch"
"english"
"esperanto"
"esperantoalternative"
"estonian"
"finnish"
"frankish"
"french"
"galician"
"german"
"greek"
"hebrew"
"hindi"
"hungarian"
"icelandic"
"indonesian"
"italian"
"italianold"
"japanese"
"kannada"
"korean"
"latvian"
"lithuanian"
"macedonian"
"malay"
"malayalam"
"maltese"
"mathequation"
"middleenglish"
"middlefrench"
"norwegian"
"polish"
"portuguese"
"romanian"
"russian"
"serbianlatin"
"slovakian"
"slovenian"
"spanish"
"spanishold"
"swahili"
"swedish"
"tagalog"
"tamil"
"telugu"
"thai"
"turkish"
"ukrainian"
To use your own custom languages, specify the path to the trained data file as the language character vector. The file must be named in the format <language>.traineddata and must be located in a folder named tessdata. For example:
txt = ocr(img,Language={"path/to/tessdata/eng.traineddata", ...
    "path/to/tessdata/jpn.traineddata"});
All traineddata files in a cell array must be in the same folder. In this example, both files are contained in the folder path/to/tessdata.
Because the following code points to two different containing folders, it does not work:
txt = ocr(img,Language={"path/one/tessdata/eng.traineddata", ...
    "path/two/tessdata/jpn.traineddata"});
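Because a mismatched folder only shows up when ocr runs, you might check the paths first. The check below is hypothetical (it is not part of ocr) and uses only base MATLAB fileparts and string comparison; the paths are the placeholders from the working example above.

```matlab
% Hypothetical pre-check, not part of ocr: verify that all custom
% traineddata files share one containing folder named tessdata.
files = ["path/to/tessdata/eng.traineddata" "path/to/tessdata/jpn.traineddata"];

folders = strings(size(files));
for k = 1:numel(files)
    folders(k) = fileparts(files(k));       % folder part of each path
end

sameFolder = all(folders == folders(1));    % one containing folder?
[~, leaf] = fileparts(folders(1));          % last folder name in the path
namedTessdata = (leaf == "tessdata");

fprintf("same folder: %d, named tessdata: %d\n", sameFolder, namedTessdata);
```

With the non-working paths (path/one/... and path/two/...), sameFolder evaluates to false.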
Some language data files depend on the data file of another language. For Hindi, the required traineddata file must also exist in the same folder as the Hindi traineddata file. The ocr function supports only traineddata files created using tesseract-ocr 3.02 or the OCR Trainer.
For deployment targets generated by MATLAB® Coder™: the generated ocr executable and the language data file folder must be colocated, and the data folder must be named tessdata. For example:
For English: C:/path/tessdata/eng.traineddata
For Japanese: C:/path/tessdata/jpn.traineddata
For seven-segment: C:/path/tessdata/seven_segment.traineddata
For custom data files: C:/path/tessdata/customlang.traineddata
Generated executable: C:/path/ocr_app.exe
You can copy the English, Japanese, and seven-segment trained data files from:
fullfile(matlabroot,"toolbox","vision","visionutilities","tessdata");
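A minimal sketch of staging those files for a generated executable, assuming the tessdata folder exists at the location given above in your release. The deployment folder (deployRoot, here under tempdir) is a hypothetical placeholder; replace it with the folder that contains your executable.

```matlab
% Copy the shipped language data into a tessdata folder next to the app.
src = fullfile(matlabroot, "toolbox", "vision", "visionutilities", "tessdata");
deployRoot = fullfile(tempdir, "ocr_app");      % hypothetical deployment folder
dst = fullfile(deployRoot, "tessdata");         % folder must be named tessdata
if ~isfolder(dst)
    mkdir(dst);
end
copyfile(fullfile(src, "*.traineddata"), dst);  % wildcard copies every data file
dir(fullfile(dst, "*.traineddata"))             % list the copied files
```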
CharacterSet
— Character subset
""
all characters (default) | character vector | string scalar
Character subset, specified as a character vector or string scalar. By default, CharacterSet is empty (""), which makes the function search for all characters in the language specified by the Language argument.
You can set this argument to a smaller set of known characters to constrain the classification process. The ocr function selects the best match from the CharacterSet. Using prior knowledge about the characters in the input image helps improve text recognition accuracy. For example, if you set CharacterSet to all numeric digits, "0123456789", the function attempts to match each character to only digits. In this case, a non-digit character can be incorrectly recognized as a digit.
If you specify Language as "seven-segment", the ocr function uses the CharacterSet "0123456789.:-".
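The effect of CharacterSet can be seen by recognizing the same image with and without a digits-only set. This sketch assumes businessCard.png from the examples. The final check reflects the behavior described above: with the restricted set, letters on the card are expected to come back as digits.

```matlab
% Compare unrestricted recognition with a digits-only character set.
I = imread("businessCard.png");
txtAll    = ocr(I);                                % all English characters
txtDigits = ocr(I, CharacterSet="0123456789");     % digits only

disp(txtAll.Words')
disp(txtDigits.Words')

% Concatenate the recognized words and test whether every character is a digit.
chars = char([txtDigits.Words{:}]);
onlyDigits = all(isstrprop(chars, "digit"));
```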
Output Arguments
txt
— Recognized text and metrics
ocrText object
Recognized text and metrics, returned as an ocrText object. The object contains the recognized text, the location of the recognized text within the input image, and metrics indicating the confidence of the results. Confidence values are in the range [0, 1] and represent a probability; for example, a confidence of 0.7948 corresponds to about 79%. When you specify an M-by-4 roi, the function returns txt as an M-by-1 array of ocrText objects.
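Because every word has its own confidence value, you can discard low-confidence words before using the results. In this sketch the threshold of 0.7 is an arbitrary illustrative value, not a recommended setting, and businessCard.png is assumed from the examples.

```matlab
% Keep only words whose confidence meets a threshold, then annotate them.
I = imread("businessCard.png");
txt = ocr(I);

minConf = 0.7;                                   % illustrative threshold
keep = txt.WordConfidences >= minConf;           % NaN values compare as false
confidentWords = txt.Words(keep);

Iout = insertObjectAnnotation(I, "rectangle", ...
    txt.WordBoundingBoxes(keep, :), confidentWords);
figure; imshow(Iout);
fprintf("%d of %d words kept\n", nnz(keep), numel(keep));
```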
If your ocr results are not what you expect, try one or more of the following options:
- Increase the image size to 2 to 4 times the original size.
- If the characters in the image are too close together or their edges touch, use morphology to thin out the characters. Thinning the characters separates them.
- Use binarization to check for non-uniform lighting issues. Use the graythresh and imbinarize functions to binarize the image. If the characters are not visible in the binarized result, the image may have non-uniform lighting. Try a top-hat filter, using the imtophat function, or other techniques that remove non-uniform illumination.
- Use the region of interest roi option to isolate the text. Specify the roi manually or use text detection.
- If your image looks like a natural scene containing words, such as a street scene, rather than a scanned document, try using an ROI input. You can also set the TextLayout argument to "block" or "word".
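The tips above can be chained into a single preprocessing sketch. The scale factor, structuring-element sizes, and the choice of erosion for thinning are illustrative assumptions to tune for your images; the functions used (im2gray, imresize, imtophat, imbinarize, graythresh, imerode) come from MATLAB and Image Processing Toolbox.

```matlab
% One possible preprocessing chain; all parameter values are illustrative.
I = imread("businessCard.png");
Igray = im2gray(I);
Ibig = imresize(Igray, 3);                       % enlarge 2-4x (3 chosen here)

% A top-hat filter keeps bright details smaller than the structuring element,
% so invert first when the text is dark on a light background.
Itop = imtophat(imcomplement(Ibig), strel("disk", 15));

BW = imbinarize(Itop, graythresh(Itop));         % text pixels are true here
BW = imerode(BW, strel("disk", 1));              % thin touching characters

% Invert back so the text is dark on a light background, then recognize.
txt = ocr(~BW, TextLayout="block");
disp(txt.Text)
```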
Limitations
The "seven-segment" language cannot be combined with other languages. For example, this syntax is not supported:
ocr(I,Language=["english","seven-segment"])
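If an image contains both seven-segment digits and ordinary text, one workaround is to make separate calls. This sketch reuses the image and ROI from the seven-segment example; running "english" over the whole image is an illustrative choice.

```matlab
% Separate ocr calls instead of combining "seven-segment" with another language.
I = imread("sevSegDisp.jpg");
roi = [506 725 1418 626];                         % display region from the example
digits = ocr(I, roi, Language="seven-segment");   % digits inside the display
words  = ocr(I, Language="english");              % any other text in the image
fprintf("Digits: %s\nOther text: %s\n", ...
    strtrim(digits.Text), strtrim(words.Text));
```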
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
'TextLayout', 'Language', and 'CharacterSet' must be compile-time constants.
Generated code for this function uses a precompiled platform-specific shared library.
Version History
Introduced in R2014a