Effective Date: Friday, August 26, 2016
Revision Number: 1
Revision Date: Friday, August 26, 2016
Last Reviewed: Friday, August 26, 2016

1. Introduction

The general layout of the form is very important to both human users and the scanning recognition system. The general principle is to design a form that is user-friendly and, at the same time, meets the requirements of image-based recognition systems.

There are three steps towards good form design:

  • Deciding What to Capture
  • Designing the Physical Layout of the Form
  • Designing the Data-Entry Fields

A well-designed form speeds up any Optical Character Recognition (OCR)/Intelligent Character Recognition (ICR) process and makes it more reliable, so efficient data recognition begins with designing the form correctly.

A form works well if it has the following characteristics:

  • It is easy for the user to fill out
  • It uses as few methods as possible for collecting the information. Examples of methods are multiple choice questions, yes/no questions, constrained answers, and unconstrained answers.
  • The data fields are clearly defined to encourage answers that are correctly formatted
  • The instructions are written in clear, simple language

Most agencies implement an automated forms processing solution to reduce labor costs. More accurate OCR/ICR means more data is captured without human intervention, resulting in greater cost savings.

Recognition of hand-printed forms can be challenging. ICR (hand-printing recognition) engines work best when agencies can design their own forms. Form design can be vital to ICR accuracy. In some cases, a properly redesigned form will result in the elimination of virtually all ICR errors while reducing the number of characters that require verification.

Over the years, certain form design practices have proven to yield the best results. A properly designed form encourages the user to fill out the form completely and accurately. Forms should be visually appealing and well organized. The form must be easy to fill out and allow the user to write normally where possible. But the most critical element to ICR accuracy is the separation of characters. A good form keeps users from running characters together.

2. Deciding What to Capture

The first phase in designing a form for OCR/ICR processing is to decide what data needs to be captured from it.  The following steps should be taken:

  • Identify the fields that will require OCR/ICR
  • List the fields by field name and identify the number of characters required for each field

Note: A date field should have six to eight characters representing day, month, and year. For other fields, such as phone and fax numbers and postal/ZIP codes, there is an exact number of characters for each field. For name and address fields, use the maximum number of characters that you would expect to see in that field.

  • Group the fields you have listed according to function or type. A good form will contain as many yes/no and multiple choice questions as possible.
  • Decide on the headings for each group of data on the form.
  • Write out the instructions and examples you want to use in the form:
    • Place long or detailed instructions on the back of the form or in a separate document, and provide a short note on the face of the form directing the user to them.
    • Keep the instructions and examples simple and use plain language. Short instructions such as “PLEASE PRINT USING BLOCK CAPITAL LETTERS” or “PLEASE PRINT WITH BLACK INK” can prove invaluable.
    • If space allows, it is also useful to give a small example of a correctly filled in field.
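The field inventory described above can be kept as a simple table. The Python sketch below is illustrative only: the field names, groups, and character counts are hypothetical, and the FieldSpec structure is an assumption rather than part of any forms processing product. It groups the fields under their headings and totals the character positions that OCR/ICR will have to read on each form.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str       # field name as it will appear on the form
    group: str      # heading the field is grouped under
    kind: str       # "icr" for character fields, "check_box" for mark sense
    max_chars: int  # character positions to print; 0 for check boxes


# Hypothetical inventory for an illustrative application form.
INVENTORY = [
    FieldSpec("Last Name", "Applicant", "icr", 20),
    FieldSpec("First Name", "Applicant", "icr", 15),
    FieldSpec("Date of Birth (MMDDYYYY)", "Applicant", "icr", 8),
    FieldSpec("Telephone", "Contact", "icr", 10),
    FieldSpec("ZIP Code", "Contact", "icr", 5),
    FieldSpec("U.S. Citizen", "Eligibility", "check_box", 0),
]


def group_fields(fields):
    """Group fields under their headings, keeping the order they were listed in."""
    groups = defaultdict(list)
    for spec in fields:
        groups[spec.group].append(spec)
    return dict(groups)


def icr_character_total(fields):
    """Total character positions that OCR/ICR must read on one form."""
    return sum(spec.max_chars for spec in fields if spec.kind == "icr")


if __name__ == "__main__":
    for heading, specs in group_fields(INVENTORY).items():
        print(heading, [spec.name for spec in specs])
    print("ICR characters per form:", icr_character_total(INVENTORY))
```

Here the date field uses eight characters (MMDDYYYY); a six-character layout such as MMDDYY would shorten it by two positions.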

3. Designing the Data-entry Fields

Characters are naturally grouped in fields. A field is a set of data located in a particular region of the form that is to be read as a whole, such as a name or a telephone number. The length of the field is determined by the number of characters it contains. In a well-designed form, the data fields are clearly defined to encourage answers that are correctly formatted. In addition, the better the system can locate and identify meaningful data in the image, the faster and more accurately it can read the data.

To make the fields easy to locate:

  • A minimum of 1/4" (6.4 mm) clear space should be left around the data.
  • A field label should be included.
  • As many constraints as possible should be used to guide the user to enter the data correctly.
  • Formats for dollar amounts, dates, and times should be specified. Check boxes should be used for multiple choice selections or to indicate that a given item is relevant.
  • Create separate fields wherever possible. For example, City, State, and ZIP Code should be 3 separate fields.
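The 1/4" clear-space rule can be checked mechanically once field positions are known. A minimal sketch, assuming each field is described by a hypothetical Rect in inches measured from the top-left corner of the page: a pair of fields is flagged when one intrudes on the clear-space band around the other. At 300 DPI, the 1/4" band corresponds to 75 pixels.

```python
from itertools import combinations
from typing import NamedTuple

DPI = 300
CLEAR_SPACE_IN = 0.25  # minimum clear space around data: 1/4" (6.4 mm)


class Rect(NamedTuple):
    name: str
    left: float    # inches from the left edge of the page
    top: float     # inches from the top edge of the page
    right: float
    bottom: float


def too_close(a, b, clearance=CLEAR_SPACE_IN):
    """True if field b intrudes on the clear-space band drawn around field a."""
    return (a.left - clearance < b.right and b.left < a.right + clearance
            and a.top - clearance < b.bottom and b.top < a.bottom + clearance)


def spacing_violations(fields, clearance=CLEAR_SPACE_IN):
    """Names of every pair of fields closer together than the clear space."""
    return [(a.name, b.name) for a, b in combinations(fields, 2)
            if too_close(a, b, clearance)]


if __name__ == "__main__":
    print("Clear space at", DPI, "DPI:", round(CLEAR_SPACE_IN * DPI), "pixels")
    layout = [
        Rect("City", 0.5, 2.0, 3.5, 2.3),
        Rect("State", 3.75, 2.0, 4.5, 2.3),
        Rect("ZIP Code", 4.6, 2.0, 6.0, 2.3),
    ]
    print(spacing_violations(layout))  # State and ZIP Code are only 0.1" apart
```

In this hypothetical layout, City, State, and ZIP Code are kept as three separate fields, and only the State/ZIP Code pair falls short of the required clear space.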

Check Boxes

Check boxes can be used for multiple choice selections or to indicate that a given item is relevant. The recognition system uses "mark sense recognition" to determine whether the box has been checked. The system treats any data within the mark sense box as a "yes" response. Therefore, the user can indicate a choice by filling in the entire box or simply marking it with an 'X' or a check mark. A check box can be almost any size and can be used for applications such as checking an option or verifying that a signature is present. A well-designed form will contain as many yes/no or multiple choice questions as possible. If space allows, it is worth giving a sample of a check box filled out with an 'X', since an X is preferable to a tick, which can easily stray into neighboring boxes.
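Mark sense recognition amounts to asking whether the interior of a box contains ink. The sketch below assumes the box has already been located and binarized (1 = dark pixel). The border and min_fill values are illustrative assumptions: a real system would tune them, and in practice "any data" has to exclude stray scanner speckle, which is what the small min_fill threshold approximates.

```python
def box_is_marked(pixels, border=2, min_fill=0.03):
    """Decide whether a scanned check box has been marked.

    pixels: rows of 0/1 values covering the box, where 1 is a dark pixel.
    border: pixels trimmed from each side so the printed outline is ignored.
    min_fill: assumed fraction of dark interior pixels that counts as a mark.
    """
    interior = [row[border:len(row) - border]
                for row in pixels[border:len(pixels) - border]]
    total = sum(len(row) for row in interior)
    if total == 0:
        raise ValueError("box is smaller than its border")
    dark = sum(sum(row) for row in interior)
    return dark / total >= min_fill


def sample_box(size=20, border=2, marked=False):
    """Build a synthetic scanned box: a dark outline, optionally with an X."""
    px = [[1 if min(r, c) < border or max(r, c) >= size - border else 0
           for c in range(size)] for r in range(size)]
    if marked:
        for i in range(border, size - border):
            px[i][i] = 1              # one stroke of the X
            px[i][size - 1 - i] = 1   # the other stroke
    return px


if __name__ == "__main__":
    print(box_is_marked(sample_box()))             # empty box
    print(box_is_marked(sample_box(marked=True)))  # box marked with an X
```

Filling the whole box, an X, or a check mark all raise the dark-pixel fraction well above the threshold, which matches the behavior described above.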

Guidelines in designing check boxes

  • The white space within the check box must be large enough to provide clear and accurate marks.
  • At least 1/4" (6.4 mm) should be left between check boxes so that a mark intended for one box does not stray into another.
  • Clear instructions and examples should be provided to show the user how to fill in the boxes correctly and distinctly. It is recommended to print an X inside the check boxes as a guide for the user so that he or she knows not to circle the box. An example: “Mark the appropriate box with an X”.

Character Fields

Field constraints are lines or boxes in a form to guide (or constrain) the user in entering data. They ensure that the data is in the correct location, is formatted correctly, and does not overlap other data.

Because individual handwriting varies so widely, the more constraint you impose on the user, the more likely the characters will be distinct and consistent.

Forms to be filled out with hand-printed information should be designed so that each letter or number is to be written in a specifically designated area.

Individual character boxes are highly recommended.

It is also a good idea to print on the form a sample of the recommended character set, showing users how to write the characters that give the best results.

The most common types of character fields are Isolated Character Fields, Semi-Constrained Character Fields, and Unconstrained Character Fields.

Isolated Character Fields

An isolated character field is a field type where each character position is clearly defined and is clearly separate from the other characters in that field.

Isolated character fields yield the best results for forms that are to be filled out with hand print. Isolated character fields promote faster processing of characters with higher accuracy.

Recommendations:

  • The size of each character box should be a minimum of 5 mm wide x 6 mm tall.
  • There should be enough white space between character boxes to prevent printed characters from overflowing the box boundaries.
  • Each character box should be slightly taller than it is wide. People tend to fill boxes to their shape, so short, wide boxes produce short, wide characters.
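The isolated character field recommendations translate directly into box geometry. A sketch under stated assumptions: "5 x 6 mm" is read as 5 mm wide by 6 mm tall (consistent with boxes being taller than they are wide), and the 1.5 mm gap between boxes is a hypothetical value that this standard does not specify.

```python
MM_PER_INCH = 25.4
MIN_BOX_W_MM = 5.0  # assumed reading of "5 x 6 mm": 5 mm wide
MIN_BOX_H_MM = 6.0  # and 6 mm tall


def mm_to_px(mm, dpi=300):
    """Convert millimetres to scanner pixels."""
    return mm / MM_PER_INCH * dpi


def character_boxes(n_chars, box_w=5.0, box_h=6.0, gap=1.5, left=0.0, top=0.0):
    """(left, top, right, bottom) in mm for each box of an isolated character field.

    The 1.5 mm default gap is a hypothetical value, not part of the standard.
    """
    if box_w < MIN_BOX_W_MM or box_h < MIN_BOX_H_MM:
        raise ValueError("boxes must be at least 5 mm wide and 6 mm tall")
    if box_h <= box_w:
        raise ValueError("boxes should be taller than they are wide")
    boxes = []
    for i in range(n_chars):
        x = left + i * (box_w + gap)
        boxes.append((x, top, x + box_w, top + box_h))
    return boxes


def field_length_mm(n_chars, box_w=5.0, gap=1.5):
    """Overall length of the field from the first box to the last."""
    return n_chars * box_w + max(n_chars - 1, 0) * gap


if __name__ == "__main__":
    print(field_length_mm(8))  # an eight-character date field
    print(round(mm_to_px(MIN_BOX_W_MM)), round(mm_to_px(MIN_BOX_H_MM)))
```

With these assumptions, an eight-character date field is 50.5 mm long, and each minimum-size box is roughly 59 x 71 pixels at 300 DPI.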

 

Semi-Constrained Character Fields

A semi-constrained character field is a field type where each character position is well defined but not necessarily isolated. Semi-constrained character fields typically provide the best results in most practical situations. They are very similar to isolated fields, but the potential for a character to leave its own area, or "leak" into the next character's area, is high. The best results occur when the character boxes are drawn separated from one another, just as with an isolated field. It is also possible to draw the character boxes so that they touch, although recognition accuracy may be lower.

Unconstrained Character Fields

An unconstrained character field is a field that does not contain any lines or boxes restricting the position of each character entered. Unconstrained fields are more difficult to recognize and require more processing time, but they are invaluable where field designs cannot be controlled. Hand-printed, machine-printed, numeric, and alphabetic fields can all be unconstrained, although the best results are achieved with numeric fields. Alphabetic fields where characters are broken or touching are the most difficult to recognize. For the best design of unconstrained fields, ample white space should be provided for the field, with no lines bounding it. If the field is to be delineated (e.g., the courtesy amount on a check), the white space for the field should be surrounded with borders printed in drop-out color.

  • Signature fields – should be large enough, and positioned so the user is not forced to mark nearby fields when signing the document.

Multiple Page Issues

If a form has multiple pages, or is double-sided, it is necessary to include page indicators on each page. This will, in most cases, be a page number.

The recognition system can perform pre-recognition on the page number to determine which page is being processed and, therefore, what data to expect.

Recommendations:

  • Page numbers should be placed in exactly the same location on every page of the form.
  • A comfortable margin of white space should be left around the number, about 1/8 inch or 4 mm. The word "page" should not be in this margin.

An alternative page indication method uses a set of rectangles, filled in according to the binary number system, to signal to the recognition system which page is being read.

Recommendations:

  • The indicator should be placed in exactly the same location on every page of the form.
  • A comfortable margin of white space should be left around it, about 1/8 inch or 4 mm.
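The binary page indicator can be sketched as follows, assuming a hypothetical row of four rectangles read left to right with the most significant bit first, and a hypothetical TEMPLATES table that maps each page number to the fields expected on that page. Page 0 is excluded so that an all-blank indicator is never mistaken for a valid page.

```python
N_BOXES = 4  # hypothetical: four rectangles encode pages 1 through 15


def encode_page(page, n_boxes=N_BOXES):
    """Fill pattern for a page number (True = filled), most significant bit first."""
    if not 0 < page < 2 ** n_boxes:
        raise ValueError(f"page {page} cannot be encoded in {n_boxes} boxes")
    return [bool((page >> bit) & 1) for bit in reversed(range(n_boxes))]


def decode_page(pattern):
    """Page number from the filled/empty states read off the scanned page."""
    page = 0
    for filled in pattern:
        page = (page << 1) | int(filled)
    return page


# Hypothetical templates: which fields to expect on each page.
TEMPLATES = {
    1: ["Last Name", "First Name", "Date of Birth"],
    2: ["Telephone", "ZIP Code", "Signature"],
}


def expected_fields(pattern):
    """Pre-recognition step: pick the field template for the detected page."""
    page = decode_page(pattern)
    if page not in TEMPLATES:
        raise KeyError(f"no template for page {page}")
    return TEMPLATES[page]


if __name__ == "__main__":
    print(encode_page(5))
    print(expected_fields(encode_page(2)))
```

Because the indicator sits in exactly the same location on every page, the system can read these few rectangles before anything else and then apply the matching template.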

Internal Use Boxes

Add “internal use” boxes for workers to stamp or comment on the original paper form. This will help ensure that workers do not write in areas that interfere with the OCR fields.

4. Best Practices for Scanning

  • Use a good scanner that will handle the volume of scanning that is required
  • Use a scanner that allows you to drop out one color from your scan to enable reduced storage
  • Scan documents at a resolution of 300 DPI or greater. Optimal resolution will depend on whether the content is machine or hand-printed.
  • Use a file format such as TIFF or PDF depending on your needs.
  • Avoid putting key fields where they are likely to get creased either before or after the form is completed (such as folds to fit into an envelope).
  • Test forms through the entire process of printing, folding, mailing, receiving back in the mail, opening mail, scanning, and data entry prior to signing off on printing or publishing the form.
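Resolution and color depth drive storage. The arithmetic below estimates the size of one 8 1/2 x 11 inch page as a raw, uncompressed bitmap; TIFF and PDF files will normally be much smaller once compressed (for example, with CCITT Group 4 compression for black-and-white images). The function names are illustrative.

```python
def page_pixels(width_in=8.5, height_in=11.0, dpi=300):
    """Pixel dimensions of a scanned page."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_in=8.5, height_in=11.0, dpi=300, bits_per_pixel=1):
    """Raw bitmap size, with each row padded to a whole number of bytes."""
    width_px, height_px = page_pixels(width_in, height_in, dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


if __name__ == "__main__":
    for label, bpp in [("black and white", 1), ("grayscale", 8), ("color", 24)]:
        print(f"{label:>15}: {uncompressed_bytes(bits_per_pixel=bpp):,} bytes")
```

At 300 DPI a letter page is 2550 x 3300 pixels: about 1.05 MB uncompressed in black and white, 8.4 MB in 8-bit grayscale, and 25.2 MB in 24-bit color.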

The following forms design standards must be used for all forms. Exceptions to the standards will be reviewed on an individual basis for automated processes.

Title Block

  1. All forms must be identified with a title block that contains:
    1. The title of the form, which accurately identifies the function or purpose of the form. Printed in 12-point bold sans serif font.
    2. The name of the agency that is the source of, or is responsible for, the form. Printed in 10-point sans serif font.
    3. The state form number (SFN). Printed in 10-point sans serif font.
    4. The revision date of the form. Printed in 10-point sans serif font.
  2. The title block must be placed in the upper left corner of the form, whenever possible.
  3. The Great Seal of the State of North Dakota or agency logo must be part of the title block.
  4. If the Great Seal of the State of North Dakota is not used, the words "North Dakota" must be included in the name of the agency in the title block.
  5. Forms must not be printed or reproduced on letterhead.

Paper and Ink

  1. The standard paper size for state forms is 8 1/2 x 11 inches, or a size that can be cut from it with a minimum of waste.
  2. The standard color for state forms is white, unless volume of usage or other factors justify the use of colored paper.
  3. The standard color ink for state forms is black. Only one color of ink will be used on a form unless it is determined that using drop-out ink is beneficial.
  4. The standard weight of the paper should be twenty pound.  
  5. All state forms must be readily and clearly reproducible on copy machines and scanners.
  6. If the printing process for a form requires collating or padding, all parts of the form will be on the same size paper.
  7. Forms for senior citizens and persons with visual disabilities will be printed on matte finished paper with readable type style in black ink.

Captions

  1. Captions must be brief, clear, and concise.
    1. A caption must only cover one item or point.
    2. Captions must be worded to avoid confusion.
  2. Forms must be designed in a box format with upper left captions.
    1. Type size must be 8-point or larger, where appropriate.
    2. Type style must be sans serif, regular weight.
      1. Bold type may be used for headings, but not for captions.
      2. Italic type may be used for instructions, but not captions.
      3. Script or cursive type style must not be used on any form.
    3. Type must be in lower case with only appropriate capitalization.

Spaces

  1. Standard vertical spacing (throw) on forms is:
    1. Six lines per inch, or equal increments thereof.
    2. Uniform layout over the entire form.
  2. Standard horizontal spacing (pitch) on forms is:
    1. Determined through forms analysis.
    2. Designed to fit data to be gathered by the form.
    3. Designed to fit the method or equipment used with the form.
    4. Uniform layout over the entire form.
  3. Using White Space includes:
    1. A margin of at least ¼” around the entire form should be provided.  A margin of ½” is recommended.
  4. The cornerstones should be at least ½” away from the edge of the paper. The cornerstones should be ⅜” away from other marks on the form.
  5. Specified areas for placing endorsement stamps, initials, or signatures should be kept as far away from recognition fields as possible.
  6. Routine space requirements (Date, SSN, Telephone, etc.) are defined within Space Requirements for Forms.
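The spacing figures above can be converted into scanner pixels and checked. A sketch, assuming a 300 DPI scan of an 8 1/2 x 11 inch page, with each cornerstone and other mark described as a (left, top, right, bottom) rectangle in inches; the helper names are hypothetical.

```python
import math

DPI = 300
LINES_PER_INCH = 6        # standard vertical spacing (throw)
CORNER_EDGE_IN = 0.5      # cornerstone distance from the paper edge
CORNER_CLEAR_IN = 0.375   # cornerstone distance from other marks (3/8")


def line_pitch_px(dpi=DPI, lpi=LINES_PER_INCH):
    """Height of one line of vertical spacing, in scanner pixels."""
    return dpi / lpi


def usable_lines(page_h=11.0, margin=0.5, lpi=LINES_PER_INCH):
    """Lines that fit between the top and bottom margins."""
    return int((page_h - 2 * margin) * lpi)


def rect_distance(a, b):
    """Shortest distance between two rectangles (0 if they overlap)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    dx = max(bx0 - ax1, ax0 - bx1, 0.0)
    dy = max(by0 - ay1, ay0 - by1, 0.0)
    return math.hypot(dx, dy)


def cornerstone_ok(corner, marks=(), page_w=8.5, page_h=11.0):
    """Check a cornerstone against the edge and clearance rules."""
    left, top, right, bottom = corner
    if min(left, top, page_w - right, page_h - bottom) < CORNER_EDGE_IN:
        return False
    return all(rect_distance(corner, m) >= CORNER_CLEAR_IN for m in marks)


if __name__ == "__main__":
    print(line_pitch_px(), usable_lines())
    corner = (0.5, 0.5, 0.75, 0.75)
    print(cornerstone_ok(corner, marks=[(1.0, 0.5, 2.0, 0.7)]))  # only 1/4" away
```

At six lines per inch, each line is 50 pixels tall at 300 DPI, and a page with the recommended 1/2" margins holds 60 lines.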

Appearance

  1. All State of North Dakota forms must have a professional appearance.
    1. No decorations or embellishments
    2. No more than two type styles on a form
    3. Shading or screening is not to be used for decorative purposes
  2. Forms must be simple and easy to read and complete.
    1. Clear, unsmudged black ink
    2. Clear, clean, neat, basic good design
  3. No typographical or grammatical errors.
  4. Adequate "white space" to enhance appearance.
  5. Economical use of paper without excessive white spaces.
  6. No names of any person will be used on any state form.

Barcode/Patch code

  1. Form ID – When forms must be processed with a number of other forms, use a barcode so the system can identify the form. A form ID can be preprinted, stamped, or printed on a sticker and attached upon return of the form.
  2. Data lookup – A barcode can be preprinted on a form so that when the form is returned and scanned, a data lookup can be performed to pull back more information, such as name and address, eliminating the need to key these fields or set up OCR on them.
  3. Patch codes – These can be printed on a separate sheet from the form to allow for form separation when adding a barcode to the form is not possible.
  4. 2-D/PDF417 – A two-dimensional barcode that can be added to a fillable PDF form. The barcode can capture the data from the form fields and store it in the barcode. When the form is returned, the barcode can be read, eliminating the need to enter the data manually or set up OCR on the data fields.
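The form ID and patch code approaches both solve the same problem: splitting a batch of scanned pages into individual forms. A minimal sketch, assuming the scanning software has already reported, for each page, whether it is a patch code separator sheet and which form ID (if any) was decoded from a barcode. It also assumes the form ID barcode appears only on the first page of each form. The ScannedPage and Document types and the IDs used are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScannedPage:
    image_name: str
    is_patch_sheet: bool = False   # a patch code separator sheet was detected
    form_id: Optional[str] = None  # form ID decoded from a barcode, if any


@dataclass
class Document:
    form_id: Optional[str]
    pages: List[str] = field(default_factory=list)


def split_batch(pages):
    """Split a scanned batch into documents.

    Assumes a patch sheet separates documents and is not itself kept, and that a
    form ID barcode appears only on the first page of each form.
    """
    documents = []
    current = None
    for page in pages:
        if page.is_patch_sheet:
            current = None  # the next content page starts a new document
            continue
        if current is None or page.form_id is not None:
            current = Document(page.form_id)
            documents.append(current)
        current.pages.append(page.image_name)
    return documents


if __name__ == "__main__":
    batch = [
        ScannedPage("p1", form_id="FORM-A"), ScannedPage("p2"),
        ScannedPage("sep", is_patch_sheet=True), ScannedPage("p3"),
        ScannedPage("p4", form_id="FORM-B"), ScannedPage("p5"),
    ]
    for doc in split_batch(batch):
        print(doc.form_id, doc.pages)
```

A page following a patch sheet with no barcode becomes a document with no form ID, which a real workflow would route to manual identification.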

Fonts

  1. For data typed or machine printed by the user, a simple font at least 10 points in size should be used.
  2. Fixed-pitch fonts such as Courier, OCR-A, and OCR-B are excellent choices for scanning: each character occupies exactly the same horizontal space regardless of its actual width. Arial is also an excellent choice for scanning, but it is a proportionally spaced font, not a fixed-pitch one.
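Because every character in a fixed-pitch font has the same advance width, the space a typed field needs can be calculated in advance. The sketch below assumes Courier's advance width of 600/1000 em; other fixed-pitch fonts should be checked against their own metrics. The function names are illustrative.

```python
from fractions import Fraction

POINTS_PER_INCH = 72
COURIER_ADVANCE_EM = Fraction(600, 1000)  # every Courier glyph advances 600/1000 em


def chars_per_inch(point_size, advance_em=COURIER_ADVANCE_EM):
    """Characters per inch for a fixed-pitch font at a given point size."""
    return POINTS_PER_INCH / (advance_em * point_size)


def field_width_in(n_chars, point_size=10, advance_em=COURIER_ADVANCE_EM):
    """Horizontal space, in inches, needed to type n_chars characters."""
    return n_chars * advance_em * point_size / POINTS_PER_INCH


if __name__ == "__main__":
    print(chars_per_inch(10))   # characters per inch at 10 points
    print(field_width_in(20))   # width of a 20-character field, in inches
```

At the minimum 10-point size, Courier sets 12 characters per inch, so a 20-character field needs 1 2/3 inches.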

Other Considerations

  1. Drop-out ink allows inclusion of information that is visible to the human reader but is not needed for the recognition process. The information printed in a drop-out color is removed during the scanning process. The use of drop-out ink should be considered for high volume, pre-printed forms.
  2. Shading – Avoid using shading or solid color blocks as they will increase the size of the scanned image.
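One common way to implement drop-out is channel selection: for red drop-out ink, the image is binarized from the red channel alone, where red ink is nearly as bright as white paper while black or blue pen ink stays dark. A minimal sketch in plain Python, assuming the image is a list of rows of (red, green, blue) values from 0 to 255; the threshold of 128 and the sample colors are assumed values.

```python
def drop_out_red(rgb_rows, threshold=128):
    """Binarize an RGB image using only its red channel.

    Red drop-out ink reflects strongly in the red channel, so it reads like white
    paper and disappears; black or blue pen ink stays dark and is kept.
    Returns rows of 0/1 values, where 1 is a dark (kept) pixel.
    """
    return [[1 if red < threshold else 0 for red, _green, _blue in row]
            for row in rgb_rows]


if __name__ == "__main__":
    WHITE, RED_INK = (255, 255, 255), (230, 40, 40)
    BLACK_PEN, BLUE_PEN = (20, 20, 20), (30, 40, 160)
    print(drop_out_red([[WHITE, RED_INK, BLACK_PEN, BLUE_PEN]]))  # [[0, 0, 1, 1]]
```

The printed red box lines vanish from the result while the user's entries remain, which is what allows field borders and captions to guide the user without interfering with recognition.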

It is essential that the owner of the form, the form designer, and the IT group work together in the form design process. Without cooperation, the final form may not be optimized both for the user to fill out and for the automated extraction of data.

Forms that are not designed to meet these guidelines will have a high chance of error. If it is decided that forms that do not meet these guidelines must be used, the solution will be a capture profile.