How to get better OCR results (without confusing digits for letters)

Question

Carl Youel 2021-7-6

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/873128-how-to-get-better-ocr-results-without-confusing-digits-for-letters

Accepted Answer: Image Analyst

Hello all,

I'm trying to use OCR to determine the axes scale on a graph:

(I want to be able to extract the numbers "0, 32000, 4000, etc." on the y-axis, and "-50, 50, 150, etc." on the x-axis)

My initial attempt is this code:

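% Run OCR on the cropped axes image, treating the text as one block
detect = ocr(justAxes, 'TextLayout', 'Block');

% Label each detected word with its text and its confidence score
labels = detect.Words + " " + detect.WordConfidences;
Iocr   = insertObjectAnnotation(justAxes, 'rectangle', ...
                                detect.WordBoundingBoxes, labels);
figure; imshow(Iocr);
words_string = detect.Words;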

Which gives me this result:

The results aren't bad, but I'm wondering if there is any preprocessing I can do to keep the OCR from misreading digits as letters (e.g. the '50' as 'so', the '8000' as 'sooo', and the '0' as 'o'). Can I somehow bias the OCR toward detecting digits rather than letters? Or do I have to preprocess the image further in some way?

Answer 1

Image Analyst 2021-7-6

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/873128-how-to-get-better-ocr-results-without-confusing-digits-for-letters#answer_741148

Your digits need to be at least 20 pixels high, as stated in the help. I had the same trouble when the image chunks I passed in contained numbers that were only 10 or 12 pixels high: a human could tell what they were, but the ocr() function misidentified them. I called imresize() on each image chunk to make the image 20 pixels high, and then it identified the numbers properly. If that doesn't work, write back and attach your code and image.
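
The resize step might look something like the sketch below. It is only an illustration under assumptions that are not in the original post: the stand-in image built with insertText() is hypothetical (use your own justAxes crop instead), and the 10-pixel digit height is a guess you would measure in your own image. The 'CharacterSet' option of ocr() is an extra that the answer above does not mention. It restricts recognition to the characters you list, which may also help with the "digits read as letters" part of the question.

```matlab
% Hypothetical stand-in for the cropped axes image; replace with your own.
blank    = 255 * ones(40, 120, 3, 'uint8');
justAxes = insertText(blank, [5 5], '8000', 'FontSize', 12, ...
                      'BoxOpacity', 0, 'TextColor', 'black');

% Assumed digit height in the original crop (measure this in your image).
currentDigitHeight = 10;
% Minimum digit height suggested in the answer.
targetDigitHeight  = 20;
scale = targetDigitHeight / currentDigitHeight;

% Upscale so the digits are roughly targetDigitHeight pixels tall.
bigAxes = imresize(justAxes, scale, 'bicubic');

% Optional: limit recognition to digits and the minus sign.
detect = ocr(bigAxes, 'TextLayout', 'Block', ...
             'CharacterSet', '0123456789-');
disp(detect.Words);
```

If the labels still come back wrong, trying a larger scale factor or dropping the 'CharacterSet' restriction one at a time should show which of the two changes is actually making the difference.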