A simple guide to scanning a document with Microsoft Office Document Imaging and converting it to text using OCR.
Microsoft Document Imaging
Microsoft Office 2007 includes two tools for scanning and converting documents: Microsoft Office Document Scanning and Microsoft Office Document Imaging. With these tools, you can scan a paper document such as a printed form as an image, then use OCR to convert that image into text that you can edit in an Office application like Word. This lets you edit the document or save an electronic copy of it without retyping the whole thing. This article shows you how to scan a document and convert it to text with Microsoft Office Document Imaging.
What is OCR?
OCR stands for Optical Character Recognition. OCR recognizes letters, numbers and punctuation within images in order to convert them to text format. The technology has been around for years and is widely used to convert paper documents into digitized ones that can be stored and transferred more easily and safely.
When scanning documents into Office, be careful with documents such as legal papers that have stamps or signatures on them, as these marks can interfere with the OCR process and produce stray or incorrect characters in the output. Always proofread the OCR copy to make sure every character was recognized properly, especially when working from damaged papers or a worn copy of a copy where the type is not very clear.
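Part of that proofreading can be automated once the converted text has been saved as a plain text file. The following Python snippet is a minimal sketch, not a feature of Microsoft Office: the function name flag_suspicious_lines and the set of "expected" characters are assumptions made for illustration, and the pattern should be adjusted for documents that legitimately contain other symbols or non-English text.

```python
import re

# Anything outside this set is treated as suspicious. The set is an assumption
# suited to plain English prose; extend it for your own documents.
SUSPICIOUS = re.compile(r"[^A-Za-z0-9\s.,;:!?'\"()\-/&%$#@]")


def flag_suspicious_lines(text):
    """Return (line_number, line, odd_characters) for each line worth rechecking."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        odd = sorted(set(SUSPICIOUS.findall(line)))
        if odd:
            flagged.append((number, line, odd))
    return flagged
```

A check like this only points to lines that deserve a second look. It cannot catch misreads that produce ordinary characters, such as a letter O recognized in place of a zero, so a manual read-through is still needed.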
Scan the Document
If you already have the scanned document ready in TIFF format, just go to File - Open and load the file into Microsoft Document Imaging.
To scan a new document, open Microsoft Document Imaging and go to File - Scan New Document to bring up the Scan New Document window. Place the item(s) you want to scan on your scanner, click the Scan button and wait for the pages to appear in the program window. Depending on the type of scanner you have, you may be able to scan more than one page at a time.
If you are using Office 2010, be sure to read our article on Document Imaging with Office 2010.
Convert to Text Using OCR
Once your document is showing in Microsoft Document Imaging, it is ready to be converted using OCR so you can open it in Word and edit the text. To convert the document, go to Tools - Send Text to Word, which brings up a window for converting the file using OCR. If the document contains graphics, check the box to preserve them. Then double-check the location where the converted file will be saved, and click OK. The hourglass icon will show for a few seconds, depending on how large the file is, and then a Microsoft Word window will open with your newly converted document.
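If you have many files to convert, the same OCR step can also be scripted, because Microsoft Document Imaging exposes a COM automation object named MODI.Document. The Python snippet below is a rough sketch under several assumptions: it requires Windows, the pywin32 package, and an Office installation that includes Document Imaging. The function name ocr_tiff_with_modi and its dispatch parameter are hypothetical helpers introduced here. The script returns the recognized text directly rather than sending it to Word.

```python
MI_LANG_ENGLISH = 9  # value of miLANG_ENGLISH in MODI's language enumeration


def ocr_tiff_with_modi(tiff_path, dispatch=None):
    """Run MODI's OCR on a TIFF file and return the recognized text of all pages."""
    if dispatch is None:
        # Assumption: pywin32 is installed and MODI is registered on this machine.
        from win32com.client import Dispatch as dispatch
    doc = dispatch("MODI.Document")
    doc.Create(tiff_path)  # load the scanned image file
    try:
        # Arguments: language, auto-rotate the page, straighten a skewed page.
        doc.OCR(MI_LANG_ENGLISH, True, True)
        images = doc.Images  # one entry per scanned page, indexed from zero
        pages = [images.Item(i).Layout.Text for i in range(images.Count)]
    finally:
        doc.Close(False)  # close without saving changes to the TIFF
    return "\n\n".join(pages)
```

The dispatch parameter exists only so the function can be exercised with a stand-in object on machines without Office. In normal use you would call ocr_tiff_with_modi with just the path to a scanned TIFF file.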
You will often find that some of the formatting is lost during the conversion process. This is expected: the idea is to convert the image into plain text so that you can rebuild the document however you want. If you only want to make small tweaks to existing files, consider using Adobe Acrobat or an open source alternative such as OpenOffice instead.