The general layout of a form matters to both the people who fill it out and the scanning recognition system. The guiding principle is to design a form that is user-friendly and that also meets the requirements of image-based recognition systems.
There are three steps towards good form design:
A well-designed form speeds up any Optical Character Recognition (OCR) or Intelligent Character Recognition (ICR) process and makes it more reliable, so efficient data recognition begins with correct form design.
A form works well if it has the following characteristics:
Most agencies implement an automated forms processing solution to reduce labor costs. More accurate OCR/ICR means more data is captured without human intervention, which results in greater cost savings.
Recognition of hand-printed forms can be challenging. ICR (hand-printing recognition) engines work best when agencies can design their own forms. Form design can be vital to ICR accuracy. In some cases, a properly redesigned form will result in the elimination of virtually all ICR errors while reducing the number of characters that require verification.
Over the years, certain form design practices have proven to yield the best results. A properly designed form encourages the user to fill it out completely and accurately. Forms should be visually appealing and well organized. The form must be easy to fill out and allow the user to write normally where possible. The most critical factor in ICR accuracy, however, is the separation of characters. A good form keeps users from running characters together.
The first phase in designing a form for OCR/ICR processing is to decide what data needs to be captured from it. The following steps should be taken:
Note: A date field should have six to eight characters representing day, month, and year. Other fields, such as phone and fax numbers and postal/ZIP codes, have an exact number of characters. For name and address fields, use the maximum number of characters that you would expect to see in that field.
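The length rules in the note above can be sketched as a small field specification table. This is only an illustration: the `FieldSpec` class, the `check_lengths` helper, the 5-character ZIP length, and the 30-character name limit are assumptions made for the example, not values prescribed by these guidelines.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldSpec:
    """Allowed character count for one form field (hypothetical helper)."""
    name: str
    min_len: int
    max_len: int

    def accepts(self, value: str) -> bool:
        # A value fits if its character count is within the allowed range.
        return self.min_len <= len(value) <= self.max_len


# Example specifications. Only the date range (6 to 8) comes from the note;
# the other lengths are assumed for illustration.
FIELDS: Dict[str, FieldSpec] = {
    "date": FieldSpec("date", 6, 8),
    "zip": FieldSpec("zip", 5, 5),               # assumption: 5-digit ZIP code
    "last_name": FieldSpec("last_name", 1, 30),  # assumption: longest expected name
}


def check_lengths(record: Dict[str, str]) -> List[str]:
    """Return the names of known fields whose value length is out of range."""
    return [
        name
        for name, value in record.items()
        if name in FIELDS and not FIELDS[name].accepts(value)
    ]
```

Calling `check_lengths` on a captured record lists the fields that should be sent for manual verification because their length cannot be right.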
Characters are naturally grouped in fields. A field is a set of data located in a particular region of the form that is to be read as a whole, such as a name or a telephone number. The length of the field is determined by the number of characters contained in it. In a well-designed form, the data fields are clearly defined to encourage answers that are correctly formatted. In addition, the better the system can locate and identify meaningful data in the image, the faster and more accurately it can read the data.
To make the fields easy to locate:
Check boxes can be used for multiple-choice selections or to indicate that a given item is relevant. The recognition system uses "mark sense recognition" to determine whether the box has been checked. The system treats any data within the mark sense box as a "yes" response. Therefore, the user can indicate a choice by filling in the entire box or simply marking it with an "X" or a check mark. A check box can be almost any size and can be used for applications such as checking an option or verifying that a signature is present. A well-designed form will contain as many yes/no or multiple-choice questions as possible. If space allows, it is worth giving a sample of a check box filled out with an "X", since this is preferable to a tick, which can easily stray into neighboring boxes.
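A minimal sketch of mark sense recognition follows, assuming the scanned page is available as rows of grayscale pixel values from 0 (black) to 255 (white). The function name `is_checked`, the darkness threshold, and the minimum fill ratio are assumptions for illustration; a real recognition engine may decide differently. The small fill ratio stands in for "any data within the box" while tolerating a few stray specks.

```python
from typing import Sequence, Tuple


def is_checked(
    image: Sequence[Sequence[int]],
    box: Tuple[int, int, int, int],
    dark_threshold: int = 128,
    min_fill: float = 0.05,
) -> bool:
    """Report whether a check box region contains a mark.

    image: rows of grayscale pixels, 0 = black, 255 = white.
    box: (top, left, bottom, right) in pixels; bottom and right are exclusive.
    dark_threshold: pixels darker than this count as ink (assumed value).
    min_fill: fraction of ink pixels needed to count as a mark (assumed value).
    """
    top, left, bottom, right = box
    total = (bottom - top) * (right - left)
    if total <= 0:
        raise ValueError("box must have a positive area")
    # Count the ink pixels inside the box region only.
    dark = sum(
        1
        for row in image[top:bottom]
        for pixel in row[left:right]
        if pixel < dark_threshold
    )
    return dark / total >= min_fill
```

Because the test only measures how much ink is inside the box, an "X", a tick, and a fully shaded box all count as "yes".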
Guidelines in designing check boxes
Field constraints are lines or boxes in a form to guide (or constrain) the user in entering data. They ensure that the data is in the correct location, is formatted correctly, and does not overlap other data.
Because individual handwriting varies so widely, the more constraint you impose on the user, the more likely the characters will be distinct and consistent.
Forms to be filled out with hand-printed information should be designed so that each letter or number is to be written in a specifically designated area.
Individual character boxes are highly recommended.
It is also a good idea to print on the form the character set recommended for the best recognition results.
The most common types of character fields are Isolated Character Fields, Semi-Constrained Character Fields, and Unconstrained Character Fields.
An isolated character field is a field type where each character position is clearly defined and is clearly separate from the other characters in that field.
Isolated character fields yield the best results for forms that are to be filled out with hand print. Isolated character fields promote faster processing of characters with higher accuracy.
A semi-constrained character field is a field type where each character position is well defined but not necessarily isolated. Semi-constrained character fields typically provide the best results in most practical situations. They are very similar to isolated fields, but there is a high risk that a printed character will leave its own area or "leak" into the next character's area. The best results occur when the character boxes are drawn separated from one another, just as with an isolated field. It is also possible to draw the character boxes so that they touch, although recognition accuracy may then be lower.
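The difference between separated and touching character boxes can be sketched as a layout calculation. The function `character_boxes` and its parameters are hypothetical, and the units are whatever the form designer works in (for example millimetres or pixels); no specific box size or gap is recommended here.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def character_boxes(
    x: int, y: int, count: int, box_w: int, box_h: int, gap: int
) -> List[Box]:
    """Lay out `count` character boxes in a row starting at (x, y).

    gap > 0 draws boxes separated from one another;
    gap == 0 draws boxes that touch.
    """
    if count < 1:
        raise ValueError("a field needs at least one character box")
    if box_w <= 0 or box_h <= 0 or gap < 0:
        raise ValueError("box size must be positive and gap non-negative")
    boxes = []
    for i in range(count):
        left = x + i * (box_w + gap)  # each box is shifted by width plus gap
        boxes.append((left, y, left + box_w, y + box_h))
    return boxes
```

For example, `character_boxes(10, 20, 3, 5, 7, 2)` places three separated boxes, while passing `gap=0` makes each box start exactly where the previous one ends.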
An unconstrained character field is a field that does not contain any lines or boxes restricting the position of each character entered. Unconstrained fields are more difficult to recognize and require more processing time, but they are invaluable where field designs cannot be controlled. Hand-printed, machine-printed, numeric, and alpha fields can all be unconstrained, although the best results are obtained for numeric fields. Alpha fields where characters are broken or touching are the most difficult to recognize. For the best design of unconstrained fields, provide ample white space for the field, with no lines bounding it. If the field is to be delineated (e.g., the courtesy amount on a check), the white space for the field should be surrounded with borders printed in drop-out color.
If a form has multiple pages, or is double-sided, it is necessary to include page indicators on each page. This will, in most cases, be a page number.
The recognition system can perform pre-recognition on the page number to determine which page is being processed and, therefore, what data to expect.
An alternative page indication method uses rectangles, filled in according to the binary number system, to tell the recognition system which page is being read.
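The binary page indicator can be sketched as follows. The function names and the bit order (most significant rectangle first, filled meaning 1) are assumptions for illustration; the actual order depends on how the recognition system is configured.

```python
from typing import List, Sequence


def decode_page_number(filled: Sequence[bool]) -> int:
    """Turn a row of rectangles into a page number.

    filled: one entry per rectangle, True if the rectangle is filled in.
    Assumption: the first rectangle is the most significant bit.
    """
    page = 0
    for is_filled in filled:
        # Shift the bits read so far and append the next one.
        page = (page << 1) | (1 if is_filled else 0)
    return page


def encode_page_number(page: int, width: int) -> List[bool]:
    """Return which of `width` rectangles to fill in for a page number."""
    if width < 1 or page < 0 or page >= (1 << width):
        raise ValueError("page number does not fit in the given rectangles")
    return [bool((page >> shift) & 1) for shift in range(width - 1, -1, -1)]
```

With three rectangles, the pattern filled, empty, filled decodes to page 5, and `width` rectangles can distinguish `2 ** width` different values.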
Add “internal use” boxes for workers to stamp or comment on the original paper form. This will help ensure that workers don’t write in areas that interfere with the OCR fields.
The following forms design standards must be used for all forms. Exceptions to the standards will be reviewed on an individual basis for automated processes.
It is essential that the owner of the form, the form designer, and the IT group work together in the form design process. Without cooperation, the final form may not be optimized both for the user to fill out and for the automated extraction of data.
Forms that are not designed to meet these guidelines will have a high chance of recognition errors. If it is decided that forms that do not meet these guidelines must be used, the solution will be a capture profile.
An Electronic Document Management System (EDMS) is a collection of technologies that work together to provide a comprehensive solution for managing the creation, capture, indexing, storage, retrieval, and disposition of records and information assets of the organization.