Step 4 - Select how much automation you need
You might want to add automation to your document imaging system, but with automation come additional setup costs and IT support requirements. In other words, automation comes with a price.
It is important to evaluate your records to see if automation will work with enough accuracy to save time and money. It doesn't make sense to save a $10 an hour employee's time and replace it with a $50 an hour IT professional's time. Accuracy and time savings are the criteria necessary to add automation to your system.
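The trade-off above can be worked through as simple arithmetic. The sketch below is only an illustration: the $10 and $50 hourly rates come from the text, while the function name and the hours used in the example are hypothetical.

```python
def net_savings(clerk_hours_saved, clerk_rate, it_hours_added, it_rate):
    """Dollars saved (positive) or lost (negative) over one period."""
    return clerk_hours_saved * clerk_rate - it_hours_added * it_rate

# Hypothetical month: automation saves 20 clerk hours at $10 an hour
# but needs 5 hours of IT support at $50 an hour.
print(net_savings(20, 10, 5, 50))   # 200 - 250 = -50, automation loses money

# Same IT effort, but 40 clerk hours saved.
print(net_savings(40, 10, 5, 50))   # 400 - 250 = 150, automation pays off
```

With these rates, every hour of IT support has to be offset by more than five hours of saved clerical time before automation breaks even.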
Bar Code reading is a wonderful upgrade to a document imaging system because you can automatically reference a database table or trigger events such as document separation with a bar code.
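As a rough illustration of bar code driven document separation, the sketch below groups a batch of scanned pages into documents whenever a separator bar code is seen. It assumes the scanning software has already decoded each page's bar code; the page records, the "SEP" value and the function name are hypothetical.

```python
def split_on_separator(pages, separator="SEP"):
    """Group page names into documents, starting a new one at each separator page."""
    documents, current = [], []
    for page in pages:
        if page["barcode"] == separator:
            # A separator sheet closes the current document and is not kept.
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page["name"])
    if current:
        documents.append(current)
    return documents

batch = [
    {"name": "p1", "barcode": "SEP"},
    {"name": "p2", "barcode": None},
    {"name": "p3", "barcode": None},
    {"name": "p4", "barcode": "SEP"},
    {"name": "p5", "barcode": None},
]
print(split_on_separator(batch))  # [['p2', 'p3'], ['p5']]
```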
Bar codes come in different sizes and shapes. Each bar code requires certain parameters to read it accurately. For example, some bar codes are unreadable if there is a line through them, while others can be read accurately even though they have marks across them. To learn more about bar codes you can refer to the following page of our website.
One advantage of bar codes is that they are all or nothing when it comes to reading them: a bar code either reads correctly or does not read at all. They are ideal if you generate a lot of documents internally but have limited application with documents that originate outside the organization.
Bar code technology is the closest thing to just loading the hopper in the automatic document feeder, pressing a button and letting the technology do the rest.
Optical Character Recognition (OCR) is a process of converting text on a scanned image into text that can be searched. The decision to OCR a document versus labeling it should revolve around the issue of data mining versus document retrieval. In order for documents and/or information contained within those documents to be searched, they must be indexed. There are document imaging companies that advocate that customers OCR a document for search purposes, simply because it automates the process.
We believe the decision to use one method over the other needs to be made on a document-by-document basis. To understand our position, you must first understand the drawbacks associated with each process.
While search engines (indices) are easy to use, the results they return are all or nothing affairs. When you enter a word that describes what you are looking for you can be bombarded with endless lists of results that may or may not be relevant. For example, searching for "java" would return documents that describe java the programming language, java the coffee, and Java the Indonesian island. When doing full text searches, the greater the number of documents on your server, the greater the number of irrelevant search results.
Labeling documents provides the highest level of accuracy to your search results because it searches only the key words and numbers that you have assigned to a document.
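The difference between the two kinds of search can be sketched with the "java" example. The three documents, their labels and the function names below are made up for illustration and do not represent any particular product.

```python
documents = [
    {"label": "programming", "text": "Java is a programming language."},
    {"label": "coffee", "text": "A cup of java in the morning."},
    {"label": "travel", "text": "Java is an island in Indonesia."},
]

def full_text_search(docs, term):
    """Return every document whose text contains the term, relevant or not."""
    return [d for d in docs if term.lower() in d["text"].lower()]

def label_search(docs, label):
    """Return only the documents that were assigned this label."""
    return [d for d in docs if d["label"] == label]

print(len(full_text_search(documents, "java")))  # 3, all three meanings come back
print(len(label_search(documents, "coffee")))    # 1, only the labeled document
```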
OCR is Not 100% Accurate:
Optical Character Recognition (OCR) is at best an inaccurate process. The idea behind OCR is to execute a "full-text search" on the document with key words and phrases that are known to be included in a document.
The OCR process is extremely sensitive to the quality of the image as well as the differences in the fonts used within the document. As a result, the output from an OCR process is seldom totally correct. If the OCR process claims to be 95% accurate, that means that one character in 20 is not recognized. Errors are introduced when characters "bleed" and touch one another or when the scanner picks up "ghost" images from the reverse side of a document. OCR software invariably substitutes "1" for "I" and "e" for "c". Because of the inaccuracy inherent in the OCR process, it requires an operator to manually correct all the suspect characters.
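To see what a 95% character accuracy rate means in practice, the arithmetic can be sketched as follows. This assumes, purely for illustration, that every character is misread independently with the same probability, which real scans will not follow exactly; the function names and the page size are hypothetical.

```python
def expected_errors(accuracy, characters):
    """Expected number of misread characters on a page."""
    return (1 - accuracy) * characters

def word_ok_probability(accuracy, word_length):
    """Chance that every character of a word is read correctly (independence assumed)."""
    return accuracy ** word_length

# A hypothetical 2,000 character page at 95% accuracy.
print(round(expected_errors(0.95, 2000)))      # about 100 suspect characters
# An 8 letter keyword at 95% accuracy.
print(round(word_ok_probability(0.95, 8), 2))  # about 0.66
```

Under this assumption roughly one 8 letter keyword in three contains at least one error, which is why uncorrected OCR text can miss documents during a search.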
If you intend to use OCR in place of labeling a document because the process is automated, the need to correct suspect characters negates any time savings gained by the automated process. Furthermore, you still have the problem of capturing irrelevant OCR documents in a search because the keyword is included within the document.
You may encounter companies that propose that it is not necessary to correct the suspect characters and you will usually find the document you are looking for. They suggest that you use "fuzzy search" technology. This technology expands queries to include terms that sound like or are typographically similar to the term requested. The problem with fuzzy searches is that they produce an even larger number of documents that are irrelevant.
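The effect of a fuzzy search can be sketched with Python's standard difflib module, which ranks terms by typographic similarity. This is only a stand-in for whatever matching a given vendor uses; the term list and the 0.7 cutoff are arbitrary assumptions.

```python
import difflib

# Hypothetical index terms; "invo1ce" stands for an uncorrected OCR error.
index_terms = ["invoice", "invo1ce", "involve", "notice", "receipt"]

def exact_matches(term, terms):
    """Return only the terms identical to the query."""
    return [t for t in terms if t == term]

def fuzzy_matches(term, terms, cutoff=0.7):
    """Return every term whose similarity to the query is at least the cutoff."""
    return difflib.get_close_matches(term, terms, n=len(terms), cutoff=cutoff)

print(exact_matches("invoice", index_terms))
print(fuzzy_matches("invoice", index_terms))
```

The fuzzy query recovers the misread "invo1ce", but depending on the cutoff it can also pull in unrelated words that merely look similar, which is the larger result set described above.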
When We Would Recommend to OCR:
We recommend you OCR a document when you will need to conduct an intra-document search. This is especially useful if you are looking for information contained within a document. An example would be an attorney looking for a statement contained in a deposition or the need to find information contained within a research paper.
By searching within a document you can jump directly to the specific information needed within the document. You navigate from occurrence to occurrence of the word, starting with the first "hit". Each "hit" is highlighted within the document, thus making the search process much more efficient, and enabling users to retrieve and use information faster, in fewer steps.
Which method should you use? Should you label or OCR a document? The process you choose will depend on your own particular situation. We suggest that you normally label a document and, when needed, OCR those documents that will be subject to an intra-document search.
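Hit-by-hit navigation inside a single document can be sketched as below, assuming the OCR text is already available as a plain string. The sample sentence, the function names and the square-bracket highlighting are illustrative only.

```python
import re

def find_hits(text, term):
    """Return (start, end) positions of each case-insensitive occurrence of term."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return [m.span() for m in pattern.finditer(text)]

def highlight(text, term):
    """Wrap every occurrence of term in square brackets."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda m: f"[{m.group(0)}]", text)

deposition = "The witness saw the contract. The Contract was signed later."
hits = find_hits(deposition, "contract")
print(len(hits))  # 2
print(highlight(deposition, "contract"))
```

A viewer would step through the positions in hits in order, starting with the first one.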
Zone OCR allows you to only OCR a small area of a document and then transfer that information into an index. It is beneficial when you have structured documents because the zone of information to OCR is consistently in the same place.
Like document OCR, zone OCR isn't 100% accurate and requires careful quality control following the process. The level of accuracy will vary with the quality of the document and how accurately the paper feeds through the scanner. If a document becomes skewed as it passes through the scanner, it will affect the accuracy of the OCR results.
Zone OCR requires a significant amount of IT time to set up and tweak the system to make it worthwhile. You should test your documents by asking the vendor to run a sample of them through their system. If you get only a 70% - 85% accuracy rate, you are better off not purchasing this feature.
This is certainly an area of buyer beware! Most vendors demo this feature using pristine documents that they know will pass the accuracy test when they are processed through the system. Furthermore, you wouldn't want to purchase a zone OCR system unless it is supported by the person selling the system or you have in-house IT staff available to manage and maintain it.
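One way to score such a sample run is to compare the zone OCR output with values keyed in by hand for the same documents. The sketch below assumes you have both lists; the invoice numbers and the function name are made up.

```python
def field_accuracy(ocr_values, expected_values):
    """Fraction of zones whose OCR output exactly matches the hand-verified value."""
    if not expected_values:
        raise ValueError("need at least one verified value")
    correct = sum(
        1 for got, want in zip(ocr_values, expected_values)
        if got.strip() == want.strip()
    )
    return correct / len(expected_values)

# Hypothetical sample: the second value has the letter O read in place of a zero.
ocr = ["INV-1001", "INV-1O02", "INV-1003", "INV-1004"]
verified = ["INV-1001", "INV-1002", "INV-1003", "INV-1004"]
rate = field_accuracy(ocr, verified)
print(f"{rate:.0%}")  # 75%
```

A real test would use far more than four documents, chosen from your own everyday records rather than the vendor's demo set.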
Match and merge technology is used when manually entering information into the index fields. With this technology, when you complete one index field, one or more additional fields are completed by extracting the information from a database table. This technology allows you to connect your document imaging system to a database, such as a customer database, to minimize the number of keystrokes required to complete the index fields.
When you combine match and merge with a self completion feature, you can quickly complete index fields with a couple of keystrokes. It speeds up the indexing process enough that, unless you are indexing more than 1000 documents a day, it becomes questionable whether you can justify the additional cost of bar code or zone OCR automated technologies.
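A minimal sketch of match and merge is shown below, using an in-memory SQLite table as a stand-in for the customer database. The table layout, field names and sample customers are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id TEXT PRIMARY KEY, name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("C100", "Acme Supply", "Denver"), ("C200", "Baker Freight", "Omaha")],
)

def match_and_merge(conn, customer_id):
    """Fill the remaining index fields from the row that matches the typed ID."""
    row = conn.execute(
        "SELECT name, city FROM customers WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    fields = {"customer_id": customer_id, "name": "", "city": ""}
    if row is not None:
        fields["name"], fields["city"] = row
    return fields

print(match_and_merge(conn, "C100"))
```

Typing the one ID field completes the name and city fields; an ID that is not in the table leaves them blank for manual entry.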
This technology matches the information that you type into an index field against a database table or a list. Once you have typed in a unique string of characters, the field will self complete. This feature makes manual data entry of index fields fast and accurate.
You can usually complete an index field using this feature in a matter of 1-3 keystrokes.
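Self completion can be sketched as a unique-prefix lookup against a list. The vendor names and the function name below are hypothetical, and a real system may match against a database table instead.

```python
def self_complete(typed, choices):
    """Return the single choice that starts with what was typed, else None."""
    matches = [c for c in choices if c.lower().startswith(typed.lower())]
    return matches[0] if len(matches) == 1 else None

vendors = ["Acme Supply", "Acorn Tools", "Baker Freight"]
print(self_complete("Ac", vendors))   # None, still ambiguous
print(self_complete("Acm", vendors))  # Acme Supply
print(self_complete("B", vendors))    # Baker Freight
```

The field completes as soon as the typed characters match only one entry, which is why a short list often needs just one to three keystrokes.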