OCR (Optical Character Recognition) engine.

# To recognize text in an image and save it to 'output.txt' (the '.txt' extension is added automatically):
tesseract <image.png> <output>

# To specify a custom language (default is English) with an ISO 639-2 code (e.g. deu = Deutsch = German):
tesseract -l deu <image.png> <output>

# To list the ISO 639-2 codes of available languages:
tesseract --list-langs

# To specify a custom page segmentation mode (default is 3; releases before 4.0 spell the flag '-psm'):
tesseract --psm <0_to_13> <image.png> <output>

# To list page segmentation modes and their descriptions:
tesseract --help-psm
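
# To OCR every PNG in a directory with one language and segmentation mode.
# This is only a sketch: 'ocr_dir' is a hypothetical helper name, not part of
# Tesseract, and it assumes Tesseract 4 or later (the '--psm' spelling) and
# images ending in '.png'. Each text file is written next to its image.

```shell
# Hypothetical helper (not a Tesseract command): run OCR on all PNGs in a directory.
# Usage: ocr_dir <directory> [language] [psm]
ocr_dir() {
    dir=$1
    lang=${2:-eng}   # assumed default: English
    psm=${3:-3}      # assumed default: mode 3, Tesseract's own default
    for img in "$dir"/*.png; do
        [ -e "$img" ] || continue   # skip when the glob matches nothing
        # 'scan.png' produces 'scan.txt' alongside the image
        tesseract "$img" "${img%.png}" -l "$lang" --psm "$psm"
    done
}
```

# Example call: ocr_dir ./scans eng+deu 6
# Joining codes with '+' asks Tesseract to use several languages at once.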

# ---

# To convert an image to a searchable PDF saved as 'output-prefix.pdf' (from https://ocrmypdf.readthedocs.io/en/latest/cookbook.html#option-use-tesseract):
tesseract my-image.jpg output-prefix pdf