HowTo: Simple Tesseract Usage Guide (OCR)

Install:

(Ubuntu 9.10)

sudo apt-get install tesseract-ocr tesseract-ocr-eng

Preparing Images for Tesseract with GIMP:

  1. Load an image containing text into GIMP.
  2. Image > Mode > set the image to RGB or Grayscale.
  3. Tools > Color Tools > Threshold > pick the value that shows the text most clearly.
  4. Image > Mode > Indexed > choose the 1-bit palette with no dithering.
  5. Save the image as a TIFF, making sure the extension is .tif and not .tiff.
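
If you have a lot of images, clicking through GIMP gets tedious. Here is a rough command-line sketch of the same steps, assuming ImageMagick is installed and provides the convert command (it is not part of the GIMP steps above). The function names tif_name_for and prepare_for_ocr and the default threshold of 50% are my own placeholders, so tune the threshold per image just as you would in GIMP.

```shell
#!/bin/sh
# Sketch only: assumes ImageMagick's `convert` is on the PATH.

# Hypothetical helper: swap the extension of the source image for .tif
# (assumes the file name has an extension, e.g. scan.png -> scan.tif).
tif_name_for() {
    printf '%s\n' "${1%.*}.tif"
}

# Hypothetical helper: grayscale, threshold, and save as a 1-bit .tif.
# $1 = source image, $2 = threshold percentage (defaults to 50).
prepare_for_ocr() {
    src=$1
    threshold=${2:-50}
    convert "$src" -colorspace Gray -threshold "${threshold}%" \
        -type bilevel -compress none "$(tif_name_for "$src")"
}
```

For example, prepare_for_ocr scan.png 60 should write scan.tif next to the original image.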

Use:

The input file MUST have a .tif extension (not .tiff). Tesseract adds the .txt extension to the output name automatically, so the following command reads the input image (input.tif) and writes the recognised text to output.txt:

tesseract input.tif output
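
To avoid tripping over the extension rule, the command can be wrapped in a small shell function. This is only a sketch: ocr_output_base and ocr_file are names I made up, and the only real command involved is the tesseract call shown above.

```shell
#!/bin/sh
# Hypothetical helper: derive the output base name to pass to tesseract,
# rejecting any file name that does not end in .tif.
ocr_output_base() {
    case "$1" in
        *.tif) printf '%s\n' "${1%.tif}" ;;
        *)     echo "error: expected a .tif file, got: $1" >&2
               return 1 ;;
    esac
}

# Hypothetical helper: run OCR on one file.
# tesseract appends .txt itself, so input.tif produces input.txt.
ocr_file() {
    base=$(ocr_output_base "$1") || return 1
    tesseract "$1" "$base"
}
```

With this loaded, ocr_file input.tif should leave the text in input.txt, while ocr_file input.tiff stops with an error before tesseract is called.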

Example Output:
Hello World
Alex Sleat
Testing test

All of this information comes from the Ubuntu documentation linked below. I had some trouble finding it, so I have rewritten it as a clean, simple guide for anyone else having the same trouble. All thanks go to whoever wrote that page. :)

https://help.ubuntu.com/community/OCR

