Why ocr function doesn't recognize the numbers?

25 次查看(过去 30 天)
Hi,
I have the image below:
I need to capture the numbers through ocr function. Thus, I use this code to do it:
capture = imread('Capture.png');
my_image = imresize(capture, 1.4);
ocrResults = ocr(my_image,'CharacterSet','.0123456789');
recognizedText = ocrResults.Words;
However, ocr function only recognize some numbers. In fact, the output is a cell array 37x1 while I should have 39 rows:
{'18.33'}
{'1423' }
{'1' }
{'6.55' }
{'1' }
{'5.65' }
{'12.54'}
{'14.77'}
{'10.33'}
{'13.79'}
{'12.94'}
{'1255' }
{'1' }
{'1.70' }
{'9.84' }
{'10.71'}
{'9.74' }
{'9.98' }
{'933' }
{'9.00' }
{'7.22' }
{'3.02' }
{'7.45' }
{'7.10' }
{'6.56' }
{'6.28' }
{'5.86' }
{'5.40' }
{'5.01' }
{'4.57' }
{'4.10' }
{'174' }
{'3.39' }
{'3.011'}
{'2.71' }
{'2.33' }
{'2.118'}
Then, many numbers are worng. Please, somone can help me? Thanks!

采纳的回答

Birju Patel
Birju Patel 2018-1-17
编辑:Birju Patel 2018-1-17
Hi,
A little bit of pre-processing and using ROIs to specify where the words are will help. By default, OCR uses page layout analysis to determine blocks of text. In this case, the image doesn't look like a normal page of text (like a PDF article for example).
To make it easier for OCR, first you can find the location of the words using regionprops and then pass the location of the words (as bounding boxes) to the OCR function. See the code below and results. They look accurate. You may have to play around more with the pre-processing to make this robust for a collection of different images. But hopefully, this gives you an idea on how to proceed:
capture = imread('Captura.PNG');
% Increase image size by 3x
my_image = imresize(capture, 3);
figure
imshow(my_image)
% Localize words
BW = imbinarize(rgb2gray(my_image));
BW1 = imdilate(BW,strel('disk',6));
s = regionprops(BW1,'BoundingBox');
bboxes = vertcat(s(:).BoundingBox);
% Sort boxes by image height
[~,ord] = sort(bboxes(:,2));
bboxes = bboxes(ord,:);
% Pre-process image to make letters thicker
BW = imdilate(BW,strel('disk',1));
% Call OCR and pass in location of words. Also, set TextLayout to 'word'
ocrResults = ocr(BW,bboxes,'CharacterSet','.0123456789','TextLayout','word');
words = {ocrResults(:).Text}';
words = deblank(words)
words =
39×1 cell array
{'0' }
{'18.33'}
{'0' }
{'14.23'}
{'0' }
{'16.55'}
{'0' }
{'15.65'}
{'12.64'}
{'14.77'}
{'10.83'}
{'13.79'}
{'12.94'}
{'12.55'}
{'0' }
{'11.70'}
{'9.84' }
{'10.71'}
{'9.74' }
{'9.98' }
{'9.33' }
{'9.00' }
{'7.22' }
{'8.02' }
{'7.45' }
{'7.10' }
{'6.56' }
{'6.28' }
{'5.86' }
{'5.40' }
{'5.02' }
{'4.57' }
{'4.10' }
{'3.74' }
{'3.39' }
{'3.00' }
{'2.71' }
{'2.33' }
{'2.08' }
  4 个评论

请先登录,再进行评论。

更多回答(0 个)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by