Optical Character Recognition (OCR) is the process of converting the text in a scanned image into machine-readable text that can be searched. The idea is then to perform a "full-text search" on the OCR output using key words and phrases that are known to appear in the document. The OCR process is sensitive to the quality of the image as well as to differences in the fonts used within the document. As a result, the output from an OCR process is seldom totally correct. Because of this inherent inaccuracy, an operator must manually review and correct the suspect characters.
A study conducted by Imaging Magazine, which analyzed the cost, speed and accuracy of seven of the top scanners, quickly identified the accuracy problem of OCR. Accuracy among the eight systems tested ran between 74% and 94%.
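To put those accuracy figures in terms of operator workload, the short Python sketch below estimates how many characters per page would need correction. It assumes the percentages are per-character accuracy and that a page holds about 2,000 characters; neither assumption comes from the study, and the function name is purely illustrative.

```python
def expected_corrections(chars_per_page: int, accuracy: float) -> int:
    """Estimate how many characters on one page the OCR gets wrong.

    Assumes `accuracy` is the fraction of characters recognized correctly.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0.0 and 1.0")
    return round(chars_per_page * (1.0 - accuracy))


# Assumed page size; adjust for real documents.
CHARS_PER_PAGE = 2000

for accuracy in (0.74, 0.94):
    errors = expected_corrections(CHARS_PER_PAGE, accuracy)
    print(f"{accuracy:.0%} accurate: about {errors} characters to correct per page")
```

Under these assumptions, even the best result in the reported range leaves more than a hundred characters per page for an operator to check.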
Many people have found that it is faster and more accurate to manually assign predetermined keywords, phrases and numbers to a document than to OCR the document and correct the suspect characters. Organizations that hold a large number of documents and want accurate searches value the flexibility of this labeling method. The freedom to use both letters and numbers when labeling documents allows operators to use existing terminology with which employees are already familiar. This makes for a quick and easy transition to using digital documents in place of paper documents throughout the organization.
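The labeling approach can be pictured as a small lookup table from labels to documents. The following Python sketch is only an illustration under assumed names: the `LabelIndex` class, its methods, and the sample file names and labels are hypothetical and do not describe any particular product.

```python
from collections import defaultdict


class LabelIndex:
    """Maps operator-entered labels to the documents that carry them."""

    def __init__(self) -> None:
        self._index: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _normalize(label: str) -> str:
        # Ignore case and extra spaces so "PO  4471" matches "po 4471".
        return " ".join(label.lower().split())

    def label(self, doc_id: str, labels: list[str]) -> None:
        """Attach keywords, phrases or numbers to a scanned document."""
        for text in labels:
            self._index[self._normalize(text)].add(doc_id)

    def search(self, *labels: str) -> set[str]:
        """Return the documents that carry every one of the given labels."""
        matches = [self._index.get(self._normalize(text), set()) for text in labels]
        return set.intersection(*matches) if matches else set()


# Hypothetical documents and labels, for illustration only.
index = LabelIndex()
index.label("scan-0001.tif", ["Invoice", "PO 4471", "Acme Supply"])
index.label("scan-0002.tif", ["invoice", "PO 4480"])

print(sorted(index.search("invoice")))
print(sorted(index.search("invoice", "po 4471")))
```

Because every label is typed by an operator rather than recognized from the image, a search matches exactly what was entered, with no suspect characters to correct afterward.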