High speed OCR and parallel processing
Hello,
I've written a number of programs to read tabular data in images using MATLAB's ocr function. I clean up the image files before running OCR (binarizing, etc.). However, it takes about 4 seconds per 100 rows of single-column data. Unfortunately, I have hundreds of thousands of rows to work with, so I need a way to speed this up. Using an ROI or cropping the image into individual table cells didn't make much difference. Can someone help by pointing out some options?
- Is there a way to make OCR run faster?
- I have seen some documentation on parallel processing and was wondering if that could help. My computer has 4 cores. Should I explore the following?
  - hyperthreading
  - increasing the number of workers beyond the number of cores
  - increasing the number of threads per worker
In essence I'm looking to split the hundreds of image files to be processed separately and want to maximise the speed.
Thank you.
Accepted Answer
Walter Roberson
2017-1-29
To get ocr() to possibly run faster, you would need to train a custom network. This assumes that the fonts and handwriting sloppiness are more restricted in your situation (e.g., one font at one size); if you have a general written OCR problem, then training your own network is not likely to speed anything up.
The general task of OCR could, I suspect, be done faster using different algorithms. I say that thinking about the speed of the automatic mail sorters. On the other hand those do not have to deal with hundreds of rows.
You need to profile your code. Hyperthreading is an advantage if you are waiting on I/O. If you are busy with computations, then hyperthreading can slow things down. Assigning more threads than cores, or more workers than cores, leads to contention for resources unless the threads or workers typically spend a lot of time waiting for interrupts.
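As a starting point for that profiling, something like the following sketch could separate the time spent reading a file from the time spent inside ocr(). It assumes Computer Vision Toolbox is installed and uses businessCard.png, a sample image that ships with that toolbox, as a stand-in for one of your own cleaned-up table images.

```matlab
% Stand-in image (assumption): a sample shipped with Computer Vision
% Toolbox. Replace it with one of your own pre-processed images.
imageFile = 'businessCard.png';

% Time file I/O and recognition separately to see which one dominates.
tic; I = imread(imageFile); tRead = toc;
tic; results = ocr(I);      tOcr  = toc;
fprintf('imread: %.3f s, ocr: %.3f s\n', tRead, tOcr);

% Profile the same two steps for a function-by-function breakdown.
profile on
I = imread(imageFile);
results = ocr(I);
profile off
p = profile('info');

% List the ten functions with the largest total time.
[~, order] = sort([p.FunctionTable.TotalTime], 'descend');
for k = order(1:min(10, numel(order)))
    fprintf('%-50s %8.3f s\n', p.FunctionTable(k).FunctionName, ...
            p.FunctionTable(k).TotalTime);
end
```

If nearly all of the time is in ocr() rather than imread, the work is compute-bound, and extra threads or workers beyond the physical cores are unlikely to help.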
parfor and spmd are not always more productive. They are most effective for low-I/O, high-computation work where the matrices involved are small or moderate in size and you do not do extensive operations such as eigenvalue computations or the \ (mldivide) operator. With larger matrices and vectorized code, especially code that does linear algebra, you would typically get better performance by leaving the code not explicitly parallel, so that it can use the multithreaded high-performance libraries. Those have much lower overhead than creating workers.
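If profiling shows that ocr() itself dominates, a parfor loop over the image files is one way to split the work, since each file can be processed independently. This is only a minimal sketch: it assumes the cleaned images are PNG files in a folder called cleaned_images (a made-up name) and that Parallel Computing Toolbox is available. Without that toolbox, parfor runs as an ordinary serial loop.

```matlab
% Hypothetical folder holding the pre-processed images (assumption).
imageDir = 'cleaned_images';
files = dir(fullfile(imageDir, '*.png'));
names = {files.name};              % cell array of file names
texts = cell(numel(names), 1);     % one recognized text per file

% Each iteration touches only its own file, so iterations are independent.
% With Parallel Computing Toolbox, a pool sized to the physical cores can
% be opened first, e.g. parpool(4) on a 4-core machine.
tic
parfor k = 1:numel(names)
    I = imread(fullfile(imageDir, names{k}));
    r = ocr(I);                    % requires Computer Vision Toolbox
    texts{k} = r.Text;
end
fprintf('Processed %d files in %.1f s\n', numel(names), toc);
```

Running the same loop with for in place of parfor gives a baseline, so you can check whether the workers actually pay for their startup and data-transfer overhead on your files.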