ocrText

Store OCR results

Description

The ocrText object contains recognized text and metadata collected during optical character recognition (OCR). You can locate text that matches a specific pattern with the locateText function.

Creation

Create an ocrText object or array using the ocr function.

Properties

Text

Text recognized by OCR, specified as a cell array of characters. The text includes white space and new line characters.

CharacterBoundingBoxes

Character bounding box locations and sizes, specified as an M-by-4 matrix. Each row of the matrix is of the form [x y width height]. The [x y] elements specify the upper-left corner of the bounding box. The [width height] elements specify the horizontal and vertical size, respectively, of the rectangular region in pixels. The bounding boxes enclose the text found in an image by the ocr function. Bounding boxes that correspond to new line characters have a width and height of zero. Bounding boxes for character modifiers, found in languages such as Hindi, Tamil, and Bengali, also have a width and height of zero.
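As a rough illustration of the [x y width height] layout, the sketch below separates zero-size boxes from visible ones and computes the opposite corner of each visible box. The matrix charBoxes is a hypothetical stand-in for the character bounding box property of an ocrText object, and its values are made up for this example.

```matlab
% Hypothetical stand-in for the character bounding boxes of an ocrText
% object: one row per character, in the form [x y width height].
% The third row mimics a new line character (zero width and height).
charBoxes = [10 20 8 12;
             19 20 7 12;
             26 20 0  0;
             10 40 9 12];

% Rows with zero width and zero height carry no visible glyph.
isZeroSize = charBoxes(:,3) == 0 & charBoxes(:,4) == 0;
visibleBoxes = charBoxes(~isZeroSize,:);

% Opposite (lower-right) corner: upper-left corner plus the box size.
lowerRight = visibleBoxes(:,1:2) + visibleBoxes(:,3:4);
disp(lowerRight)
```

Filtering out zero-size rows before drawing or measuring avoids treating new line characters and character modifiers as visible regions.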

CharacterConfidences

Character recognition confidence, specified as an array. Confidence values are in the range [0, 1]. Interpret each confidence value set by the ocr function as a probability. The ocr function sets the confidence values for spaces between words and for new line characters to NaN, because OCR does not explicitly recognize these characters. To identify the location of misclassified text within the image, extract the characters that have low confidence values.
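A minimal sketch of extracting low-confidence characters is shown below. The variables txt and conf are hypothetical stand-ins for the recognized text and its per-character confidences, and the 0.5 threshold is an arbitrary assumption rather than a recommended value.

```matlab
% Hypothetical stand-ins for the recognized text and its per-character
% confidences. The space between the words has a confidence of NaN.
txt  = 'MATH W0RKS';
conf = [0.95 0.91 0.93 0.90 NaN 0.88 0.42 0.86 0.92 0.94];

threshold = 0.5;   % assumed cutoff, tune for your images

% Comparisons with NaN evaluate to false, so spaces and new lines
% are never flagged as low confidence.
isLow = conf < threshold;

lowChars     = txt(isLow);    % characters likely to be misclassified
lowPositions = find(isLow);   % their positions in the text
disp(lowChars)
```

The same logical index can select the matching rows of the character bounding box matrix to locate the suspect characters in the image.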

Words

Recognized words, specified as a cell array of character vectors.

WordBoundingBoxes

Word bounding box locations and sizes, specified as an N-by-4 matrix. Each row of the matrix is of the form [x y width height] and specifies the upper-left corner position and size of a rectangular region in pixels.

WordConfidences

Word recognition confidences, specified as a vector of probability values in the range [0, 1]. The ocr function sets the confidence values for spaces between words and for new line characters to NaN, because OCR does not explicitly recognize these characters. To identify the location of misclassified text within the image, extract the words that have low confidence values.
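The sketch below shows one way to pair low-confidence words with their bounding boxes. The variables words, wordConf, and wordBoxes are hypothetical stand-ins for the word, word confidence, and word bounding box properties of an ocrText object, and the threshold is an assumption.

```matlab
% Hypothetical stand-ins for the recognized words, their confidences,
% and their bounding boxes in the form [x y width height].
words     = {'MathWorks'; 'Natick'; 'Massachu5etts'};
wordConf  = [0.93; 0.89; 0.41];
wordBoxes = [10 20  90 14;
             10 40  55 14;
             70 40 120 14];

threshold = 0.5;   % assumed cutoff, tune for your images

% Select the words below the threshold and the matching box rows.
isLow    = wordConf < threshold;
lowWords = words(isLow);
lowBoxes = wordBoxes(isLow,:);
disp(lowWords)
```

With real OCR results, a matrix like lowBoxes could be passed to insertShape to highlight the suspect words in the image.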

TextLines

Recognized text lines, specified as a cell array of character vectors.

TextLineBoundingBoxes

Text line bounding box locations and sizes, specified as an N-by-4 matrix. Each row of the matrix is of the form [x y width height] and specifies the upper-left corner position and size of a rectangular region in pixels.

TextLineConfidences

Text line confidences, specified as a vector of probability values in the range [0, 1]. To identify the location of misclassified text lines within the image, extract the text lines that have low confidence values.

Object Functions

locateText    Locate text pattern

Examples

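Load an image containing text into the workspace, recognize the text, and then locate and highlight every occurrence of the word "Math", ignoring case.

% Read the image and run OCR on it.
businessCard = imread("businessCard.png");
ocrResults = ocr(businessCard);
% Find bounding boxes of text matching "Math", ignoring case.
bboxes = locateText(ocrResults,"Math",IgnoreCase=true);
% Highlight the matches and display the result.
Iocr = insertShape(businessCard,"FilledRectangle",bboxes);
figure
imshow(Iocr)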

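Load an image containing text into the workspace, recognize the text, and then use a regular expression to locate and highlight a web address.

% Read the image and run OCR on it.
businessCard = imread("businessCard.png");
ocrResults = ocr(businessCard);
% Find text that starts with "www" and ends with "com".
bboxes = locateText(ocrResults,"www.*com",UseRegexp=true);
% Highlight the matches and display the result.
img = insertShape(businessCard,"FilledRectangle",bboxes);
figure
imshow(img)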
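To get a feel for what the pattern "www.*com" matches, the sketch below applies it to a plain character vector with the MATLAB regexp function. This assumes locateText interprets the pattern with the same regular expression syntax as regexp, and sampleText is a made-up string for illustration.

```matlab
% Made-up sample text for illustration only.
sampleText = 'Visit www.mathworks.com for details';

% "." matches any single character and "*" repeats it greedily, so the
% pattern matches from "www" through the last "com" in the text.
matches = regexp(sampleText,'www.*com','match');
disp(matches{1})
```

Because the dot is unescaped, it matches any character and not only a literal period. A stricter pattern would escape it as "\." where a literal period is intended.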

Extended Capabilities

Version History

Introduced in R2014a