A photographed or scanned page of text is just a picture: you can adjust its brightness and contrast, resize it or rotate it, but you cannot edit the words. Even though the human eye can read it, a word processor does not recognize the text in an image, so it cannot be edited on the computer. The solution is OCR (Optical Character Recognition). As the name suggests, OCR is the process of turning photographed or scanned text into editable text by recognizing each character in the image.
The best way to get a perfect image for OCR is a scanner, preferably a document scanner. However, for those who haven’t got one, the alternative is a digital camera. The process involves a few simple steps. (Tip: Do not try to OCR handwriting)
1. Take a good photograph of the page(s). Keep in mind that OCR programs have great difficulty reading blurred or skewed text. Here are some tips for getting an OCR-friendly image:
- Put the paper on a vertical support
- Use a tripod for the camera to give it more stability
- Set a high resolution for the camera
- Make sure you have good light
- Make sure the camera is not too close
- Take several pictures, just in case
2. Transfer the images to your computer and save them as TIF files. Pick the clearest and straightest image and make any necessary adjustments (brightness, contrast, sharpening, straightening, removing noise, etc.).
3. Download your OCR software. I recommend TopOCR because it is free and it’s specifically designed for digital cameras. It comes integrated with a photo editor, a facility to acquire images directly from your camera and a word processor very much like MS Word.
4. Open your image in TopOCR. If the image is still on the camera, transfer it using the Acquire command in the File menu. If you haven’t edited it yet, you can do so at this stage using the TopOCR Image Processor.
5. Select the areas of the page you want to recognize (optional) and click OCR to start the recognition process. Allow it a few minutes to do the job. When it’s finished, the editable text will appear in a new window.
6. Save it in a rich text format and check it. Some characters might not have been recognized correctly, in which case you will have to make the changes manually. TopOCR is usually pretty accurate, provided the image is clear and straight.
7. Read the text one more time, compare it with the image, and make sure everything is OK before you save the final version. You can save the final result as a PDF, text document, rich text format or HTML.
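If you took several pictures and are not sure which one is the sharpest (step 2), a rough score can help you choose. The sketch below is only an illustration and is not something TopOCR does: it assumes each photo has already been reduced to a grid of grayscale brightness values (0 to 255), which in practice you would load with an imaging library. The function names `laplacian_variance` and `pick_sharpest` are made up for this example. The idea is that crisp text has abrupt edges, so an edge filter (a simple 4-neighbour Laplacian here) responds strongly, while a blurred photo gives a flat response.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Return the variance of a 4-neighbour Laplacian over a grayscale grid.

    `gray` is a list of rows of brightness values (0-255). Sharp edges
    produce large positive and negative responses, so a higher variance
    suggests a sharper image. Border pixels are skipped.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


def pick_sharpest(candidates):
    """Return the name of the candidate grid with the highest score."""
    return max(candidates, key=lambda name: laplacian_variance(candidates[name]))


# Two tiny synthetic "photos" of the same dark-to-light edge (assumed data).
sharp = [[0, 0, 255, 255] for _ in range(4)]      # abrupt edge
blurred = [[0, 85, 170, 255] for _ in range(4)]   # smeared into a ramp

print(pick_sharpest({"sharp.tif": sharp, "blurred.tif": blurred}))
```

The score is only meaningful when comparing photos of the same page taken under similar light; it says nothing about whether the page is straight, so you still need to check that by eye.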
Tip: TopOCR also has a TextToSpeech function, which can turn the recognized text into a sound file.
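When checking the recognized text (steps 6 and 7), it can help to get a quick list of words that look wrong before you read the whole thing. The following is a minimal sketch under one assumption: a word that mixes letters and digits, such as "c0mputer", is often a misread character. It works on the saved plain-text output and does not use any TopOCR feature; `flag_suspect_words` is a name invented for this example.

```python
import re


def flag_suspect_words(text):
    """Return (line_number, word) pairs for words mixing letters and digits."""
    suspects = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"\w+", line):
            has_letter = any(ch.isalpha() for ch in word)
            has_digit = any(ch.isdigit() for ch in word)
            # Pure numbers (e.g. years) are left alone; only mixes are flagged.
            if has_letter and has_digit:
                suspects.append((line_number, word))
    return suspects


# Made-up sample of OCR output with two typical misreads.
sample = "The c0mputer reads\nthe page in 2009 with 1ight"
for line_number, word in flag_suspect_words(sample):
    print(f"line {line_number}: {word}")
```

Expect some false alarms, since legitimate terms like "MP3" also mix letters and digits, and misreads made up of letters only will not be caught. The list is a starting point for the manual comparison with the image, not a replacement for it.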