Install:
(Ubuntu 9.10)
sudo apt-get install tesseract-ocr tesseract-ocr-eng
Preparing Images for Tesseract with GIMP:
- Load an image with text into GIMP
- Image > Mode > make the image RGB or Grayscale.
- Tools > Color Tools > Threshold > pick a value which best shows the text
- Image > Mode > Indexed > choose 1-bit & no dithering.
- Save the image as a TIFF, making sure the file extension is .tif and not .tiff.
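If you have many images, the GIMP steps above can be approximated on the command line. The following is only a sketch: it assumes ImageMagick (its convert command) is installed, which is not part of the original guide, and prepare_for_ocr is a hypothetical helper name. The 50% threshold is just a starting guess and will probably need tuning per image, much like picking a value by eye in GIMP.

```shell
# Sketch only: assumes ImageMagick's `convert` is installed (an assumption,
# not something the original guide uses). prepare_for_ocr is a hypothetical name.
# Usage: prepare_for_ocr INPUT OUTPUT.tif [THRESHOLD]
prepare_for_ocr() {
    src="$1"
    dst="$2"
    threshold="${3:-50%}"   # starting guess; tune per image

    # This guide requires the extension to be .tif, not .tiff.
    case "$dst" in
        *.tif) ;;
        *) echo "output name must end in .tif: $dst" >&2; return 1 ;;
    esac

    # Convert to grayscale, threshold to pure black and white (thresholding
    # does not dither), and store the result as a bilevel (1-bit) image.
    convert "$src" -colorspace Gray -threshold "$threshold" -type Bilevel "$dst"
}
```

For example, `prepare_for_ocr scan.png input.tif 60%` would write input.tif using a 60% threshold.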
Use:
The input file MUST be .tif (not .tiff). The output is a .txt file, and tesseract adds that extension automatically, so the following command exports the input image (input.tif) as a text file (output.txt):
tesseract input.tif output
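To run the same command over a whole folder of prepared images, a small loop is enough. This is a minimal sketch rather than anything from the Ubuntu documentation: ocr_dir is a hypothetical helper name, and it assumes every image was already saved with the .tif extension.

```shell
# Sketch: run tesseract on every .tif in a directory (default: current directory).
# ocr_dir is a hypothetical helper name, not part of tesseract.
ocr_dir() {
    dir="${1:-.}"
    for img in "$dir"/*.tif; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$img" ] || continue
        # Strip ".tif" to get the output base; tesseract appends ".txt" itself.
        tesseract "$img" "${img%.tif}"
    done
}
```

Calling `ocr_dir scans` would write scans/page1.txt next to scans/page1.tif, and so on for each .tif file in that folder.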
Example:
Example Output:
Hello World
Alex Sleat
Testing test
All of this information comes from the Ubuntu documentation linked below. I had some trouble finding it, so I have rewritten it as a clean, simple guide for anyone else having the same trouble. All thanks go to whoever wrote the page at the following link. :)
https://help.ubuntu.com/community/OCR
1 reply on “HowTo: Simple Tesseract Usage Guide (OCR)”
Thanks for this. I installed and got it working with a sample image.
I have a question: if I wanted to pass an image to tesseract from inside a C program, how would I go about doing that?
Do you have a sample program like that? (I am not very experienced in setting up a compilation environment and such, but I use the Microsoft Visual Express environment, so if you can tell me which libraries and include files I need, I may be able to do it.)
I appreciate your help in advance. Thank you very much.
Best Regards,
Iyer.