Skip to content

Optical character recognition (OCR)

Recognise print text in images with any of these tools.

ABBYY FineReader

ABBYY FineReader is considered very good for OCR in many languages. However, it is not cheap.

Adobe Acrobat

Adobe Acrobat has OCR capabilities for PDFs. However, not all languages are supported (e.g. Arabic).

Tesseract

Tesseract is open source and free to use. It has support for many languages, can be (re)trained for other languages and achieves good results. It requires a little knowledge of the command line and supports only a limited number of image formats for inputs.

OCRmyPDF

For PDFs without a text layer, OCRmyPDF is a very useful toolkit. It is a command-line tool that uses Tesseract under the hood, as well as various preprocessing tools to straighten (deskew) the text lines and removing noise – if you want it to.

Kraken / Ocropy

Kraken and Ocropy are related OCR applications, as they build on one another. They are both open source and free to use. Kraken is relatively easy to train. Instead of full pages, Kraken recognises text in single lines.

Calamari

Calamari is also open source and free to use.


Last update: <<<<<<< HEAD 2021-07-20 ======= 2022-12-13 >>>>>>> 8480c86cb981052994f4c719042109a009abe0fa