Ensuring Data Quality Through Input Validation

So far, we have discussed ensuring the effective capturing of data onto source documents and the data’s efficient entry into the system through various input devices. Although these conditions are necessary for ensuring quality data, they alone are not sufficient.

Errors cannot be ruled out entirely, and the critical importance of catching errors during input, prior to processing and storage, cannot be overemphasized. The snarl of problems created by incorrect input can be a nightmare, not the least of which is that many problems take a long time to surface. The systems analyst must assume that errors in data will occur and must work with users to design input validation tests to prevent erroneous data from being processed and stored, because initial errors that go undiscovered for long periods are expensive and time consuming to correct.

You cannot imagine everything that will go awry with input, but you must cover the kinds of errors that give rise to the largest percentage of problems. A summary of potential problems that must be considered when validating input is given in the illustration below.

This Type of Validation	Can Prevent These Problems
Validating Input Transactions	Submitting the wrong data; data submitted by an unauthorized person; asking the system to perform an unacceptable function
Validating Input Data	Missing data; incorrect field length; data have unacceptable composition; data are out of range; data are invalid; data do not match with stored data

Validating Input Transactions

Validating input transactions is largely done through software, which is the programmer’s responsibility, but it is important that the systems analyst know what common problems might invalidate a transaction. Businesses committed to quality will include validity checks as part of their routine software.

Three main problems can occur with input transactions: submitting the wrong data to the system, having data submitted by an unauthorized person, and asking the system to perform an unacceptable function.

SUBMITTING THE WRONG DATA

An example of submitting the wrong data to the system is the attempt to input a patient’s Social Security number into a hospital’s payroll system. This error is usually an accidental one, but it should be flagged before data are processed.

SUBMITTING OF DATA BY AN UNAUTHORIZED PERSON

The system should also be able to discover if otherwise correct data are submitted by an unauthorized person. For instance, only the supervising pharmacist should be able to enter inventory totals for controlled substances in the pharmacy. Invalidation of transactions submitted by an unauthorized individual applies to privacy and security concerns surrounding payroll systems and employee evaluation records that determine pay levels, promotions, or discipline; files containing trade secrets; and files holding classified information, such as national defense data.

ASKING THE SYSTEM TO PERFORM AN UNACCEPTABLE FUNCTION

The third error that invalidates input transactions is asking the system to perform an unacceptable function. For instance, it would be logical for a human resources manager to update the existing record of a current employee, but it would be invalid to ask the system to create a new file rather than merely to update an existing record.
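The three transaction checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the role names, function names, and the `ALLOWED_FUNCTIONS` table are hypothetical, and a real system would take them from its own security configuration.

```python
# Hypothetical table of which functions each role may request.
ALLOWED_FUNCTIONS = {
    "hr_manager": {"update_employee_record"},
    "supervising_pharmacist": {"enter_controlled_inventory_total"},
}


def validate_transaction(role, function, record_type, expected_record_type):
    """Return the reasons a transaction is invalid (an empty list means it passed)."""
    problems = []
    # 1. Wrong data: the record does not belong in this system.
    if record_type != expected_record_type:
        problems.append("wrong data for this system")
    # 2. Unauthorized person: the role has no permissions here at all.
    allowed = ALLOWED_FUNCTIONS.get(role)
    if allowed is None:
        problems.append("unauthorized person")
    # 3. Unacceptable function: the role is known but may not request this.
    elif function not in allowed:
        problems.append("unacceptable function")
    return problems


print(validate_transaction("hr_manager", "update_employee_record", "employee", "employee"))
# []
print(validate_transaction("hr_manager", "create_new_file", "employee", "employee"))
# ['unacceptable function']
```

Returning a list of reasons, rather than a single true or false value, lets the system flag every problem with a transaction before any data are processed.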

Validating Input Data

It is essential that the input data themselves, along with the transactions requested, are valid. Several tests can be incorporated into software to ensure this validity. We consider eight possible ways to validate input.

TEST FOR MISSING DATA

The first kind of validity test examines data to see if there are any missing items. For some situations, all data items must be present. For example, a Social Security file for paying out retirement or disability benefits would be invalid if it did not include the payee’s Social Security number.

In addition, the record should include both the key data that distinguish one record from all others and the function code telling the computer what to do with the data. The systems analyst needs to interact with users to determine what data items are essential and to find out whether exceptional cases ever occur that would allow data to be considered valid even if some data items were missing. For example, a second address line containing an apartment number or a person’s middle initial may not be a required entry.
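A test for missing data might look like the following minimal sketch. The field names and the split between required and optional items are assumptions made for illustration; in practice the analyst settles that list with users.

```python
# Hypothetical field lists; the analyst would agree on these with users.
REQUIRED_FIELDS = ("ssn", "last_name", "function_code")
OPTIONAL_FIELDS = ("middle_initial", "address_line_2")


def missing_fields(record):
    """Return the names of required fields that are absent or blank."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing


record = {"ssn": "", "last_name": "Rivera", "function_code": "ADD", "middle_initial": ""}
print(missing_fields(record))  # ['ssn']
```

The blank middle initial is not reported because it is not in the required list.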

TEST FOR CORRECT FIELD LENGTH

A second kind of validity test checks input to ensure it is of the correct length for the field. For example, if the Omaha, Nebraska, weather station reports into the national weather service computer but mistakenly provides a two-letter city code (OM) instead of the national three-letter city code (OMA), the input data might be deemed invalid, and hence would not be processed.
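As a minimal sketch, a field-length test can be a single comparison. The function name and the choice to ignore surrounding spaces are assumptions.

```python
def has_correct_length(value, expected_length):
    """True if the entry, ignoring surrounding spaces, has exactly the expected length."""
    return len(value.strip()) == expected_length


print(has_correct_length("OMA", 3))  # True
print(has_correct_length("OM", 3))   # False
```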

TEST FOR CLASS OR COMPOSITION

The test for class or composition validity checks to see that data fields that are supposed to be exclusively composed of numbers do not include letters, and vice versa. For example, a credit card account number for American Express should not include any letters. Using a composition test, the program should not accept an American Express account number that includes both letters and numbers.
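A composition test can be sketched with Python's built-in string methods. The sample account numbers below are made up for illustration and are not real card numbers.

```python
def is_all_digits(value):
    """True only if the entry is non-empty and every character is 0-9."""
    return value.isascii() and value.isdigit()


def is_all_letters(value):
    """True only if the entry is non-empty and every character is A-Z or a-z."""
    return value.isascii() and value.isalpha()


print(is_all_digits("371449635398431"))  # True
print(is_all_digits("37144963539843A"))  # False
print(is_all_letters("OMA"))             # True
```

The `isascii()` guard is a design choice: `str.isdigit()` alone also accepts digit characters from other scripts, which a numeric account field would normally reject.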

TEST FOR RANGE OR REASONABLENESS

Validity tests for range or reasonableness are really common-sense measures of input that answer the question of whether data fall within an acceptable range or whether they are reasonable within predetermined parameters. For instance, if a user was trying to verify a proposed shipment date, the range test would neither permit a shipping date on the 32nd day of October nor accept shipment in the 13th month, the respective ranges being 1 to 31 days and 1 to 12 months.

A reasonableness test ascertains whether the item makes sense for the transaction. For example, when adding a new employee to the payroll, entering an age of 120 years would not be reasonable. Reasonableness tests are used for data that are continuous, that is, data that have a smooth range of values. These tests can include a lower limit, an upper limit, or both a lower and an upper limit.
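Both tests reduce to comparing a value against a lower limit, an upper limit, or both. The sketch below assumes inclusive limits, and the age limits of 16 and 75 are hypothetical values chosen only for the example.

```python
def in_range(value, lower=None, upper=None):
    """True if value respects whichever limits are given (inclusive)."""
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


def is_valid_shipping_date(month, day):
    """Range test from the text: months 1 to 12 and days 1 to 31."""
    return in_range(month, 1, 12) and in_range(day, 1, 31)


# Hypothetical reasonableness limits for a new employee's age.
MIN_AGE, MAX_AGE = 16, 75

print(is_valid_shipping_date(10, 32))    # False
print(is_valid_shipping_date(13, 1))     # False
print(in_range(120, MIN_AGE, MAX_AGE))   # False
print(in_range(34, MIN_AGE, MAX_AGE))    # True
```

A plain range test like this still accepts impossible dates such as day 31 of a 30-day month, so a production check would likely also validate the date against the calendar.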

TEST FOR INVALID VALUES

Checking input for invalid values works if there are only a few valid values. This test is not feasible for situations in which values are neither restricted nor predictable. This kind of test is useful for checking responses where data are divided into a limited number of classes. For example, a brokerage firm divides accounts into three classes only: class 1 active account, class 2 inactive account, and class 3 closed account. If data are assigned to any other class through an error, the values are invalid. Value checks are usually performed for discrete data, which are data that have only certain values. If there are many values, they are usually stored in a table of codes file. Having the values in a file provides an easy way to add or change values.
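For discrete data, the test is a lookup against the set of valid codes. In this sketch the dictionary stands in for the table of codes file; the name `ACCOUNT_CLASSES` is an assumption.

```python
# Stand-in for a table of codes file; adding a class means adding one entry.
ACCOUNT_CLASSES = {
    1: "active account",
    2: "inactive account",
    3: "closed account",
}


def is_valid_account_class(code):
    """True if the code is one of the known account classes."""
    return code in ACCOUNT_CLASSES


print(is_valid_account_class(2))  # True
print(is_valid_account_class(4))  # False
```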

CROSS-REFERENCE CHECKS

Cross-reference checks are used when one element has a relationship with another one. To perform a cross-reference check, each field must be correct in itself. For example, the price for which an item is sold should be greater than the cost paid for the item. Price must be entered, numeric, and greater than zero. The same criterion is used to validate cost. When both price and cost are valid, they may be compared.

A geographical check is another type of cross-reference check. In the United States, the state abbreviation may be used to ensure that a telephone area code is valid for that state and that the first two digits of the zip code are valid for the state.
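The price and cost example can be sketched as follows: each field is validated on its own first, and the comparison runs only when both pass. The function names and error messages are assumptions.

```python
def is_positive_number(value):
    """True if the value was entered, is numeric, and is greater than zero."""
    try:
        return float(value) > 0
    except (TypeError, ValueError):
        return False


def check_price_and_cost(price, cost):
    """Return a list of error messages (an empty list means the pair is valid)."""
    errors = []
    if not is_positive_number(price):
        errors.append("price must be entered, numeric, and greater than zero")
    if not is_positive_number(cost):
        errors.append("cost must be entered, numeric, and greater than zero")
    # Cross-reference the two fields only when each is valid in itself.
    if not errors and float(price) <= float(cost):
        errors.append("price must be greater than cost")
    return errors


print(check_price_and_cost("19.95", "12.50"))  # []
print(check_price_and_cost("10.00", "12.50"))  # ['price must be greater than cost']
```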

TEST FOR COMPARISON WITH STORED DATA

The next test for validity of input data that we consider is one comparing it with data that the computer has already stored. For example, a newly entered part number can be compared with the complete parts inventory to ensure that the number exists and is being entered correctly.
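One way to sketch a comparison with stored data is a lookup query. The example below uses Python's built-in `sqlite3` module with an in-memory database; the `parts` table, its columns, and the sample row are hypothetical.

```python
import sqlite3


def part_number_exists(connection, part_number):
    """True if the part number is already present in the stored parts table."""
    row = connection.execute(
        "SELECT 1 FROM parts WHERE part_number = ?", (part_number,)
    ).fetchone()
    return row is not None


# Hypothetical parts inventory held in an in-memory database.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE parts (part_number TEXT PRIMARY KEY, description TEXT)"
)
connection.execute("INSERT INTO parts VALUES (?, ?)", ("548236", "radiator hose"))

print(part_number_exists(connection, "548236"))  # True
print(part_number_exists(connection, "548263"))  # False
```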

SETTING UP SELF-VALIDATING CODES (CHECK DIGITS)

Another method for ensuring the accuracy of data, particularly identification numbers, is to use a check digit in the code itself. This procedure involves beginning with an original numeric code, performing some mathematics to arrive at a derived check digit, and then adding the check digit to the original code. The mathematical process involves multiplying each of the digits in the original code by some predetermined weights, summing these results, and then dividing this sum by a modulus number.

The modulus number is needed because the sum usually is a large number, and we need to reduce the result to a single digit. Finally, the remainder is subtracted from the modulus number, giving us the check digit.

The figure below shows how a five-digit part number for a radiator hose (54823) is converted to a six-digit number containing a check digit. In this example, the weights chosen were the “1-3-1” system; in other words, the weights alternate between 1 and 3. After the digits 5, 4, 8, 2, and 3 were multiplied by 1, 3, 1, 3, and 1, they became 5, 12, 8, 6, and 3. These products sum to 34. Next, 34 is divided by the chosen modulus number, 10, giving a quotient of 3 and a remainder of 4. The remainder, 4, is subtracted from the modulus number, 10, giving a check digit of 6. The digit 6 is now appended to the original number, giving the official product code for the radiator hose (548236).

Steps in converting a five-digit part number to a six-digit number containing a check digit.
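The calculation just described can be sketched in Python as follows. The function names are assumptions, and so is one detail the text does not cover: when the remainder is 0, this sketch uses a check digit of 0 rather than the modulus itself.

```python
from itertools import cycle


def check_digit(code, weights=(1, 3), modulus=10):
    """Derive a check digit by weighting the digits, summing, and taking the modulus."""
    # Multiply each digit by its weight; the weights repeat as 1, 3, 1, 3, ...
    total = sum(int(digit) * weight for digit, weight in zip(code, cycle(weights)))
    remainder = total % modulus
    # Subtract the remainder from the modulus. The outer % modulus is an
    # assumption that maps a remainder of 0 to a check digit of 0.
    return (modulus - remainder) % modulus


def add_check_digit(code):
    """Append the derived check digit to the original code."""
    return code + str(check_digit(code))


print(check_digit("54823"))      # 6
print(add_check_digit("54823"))  # 548236
```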

Using Check Digits

The check digit system works in the following way. Suppose we had the part number 53411. This number has to be typed into the system, and while that is being done, different types of errors can occur. One possible error is the single-digit miskey; for example, the clerk types in 54411 instead of 53411. Only the digit in the thousands place is incorrect, but this error may result in the wrong part being shipped.

A second type of error is transposed digits. It commonly occurs that the intended number 53411 gets typed in as number 54311 instead, just because two keys are pressed in reverse order. Transposition errors are also difficult for humans to detect.

These errors can be caught through the use of a check digit because each of these numbers, the correct one and the two erroneous ones, would have a different check digit, as shown in the figure below.

Avoiding common data-entry errors through the use of a check digit.

If part number 53411 was modified to 534118 (including the check digit 8) and either of the two errors just described occurred, the mistake would be caught. If the second digit was miskeyed as a 4, the computer would not accept 544118 as a valid number, because the check digit for 54411 would be 5, not 8. Similarly, if the second and third digits were transposed, as in 543118, the computer would also reject the number because the check digit for 54311 would be 6, not 8.
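To illustrate, the sketch below recomputes the check digit for each entered number and compares it with the digit that was typed. It assumes the same 1-3-1 weights and modulus 10 as in the radiator hose example; the function names are hypothetical.

```python
def check_digit(code, weights=(1, 3), modulus=10):
    """Weighted-sum check digit using alternating 1-3 weights and modulus 10."""
    total = sum(int(d) * weights[i % len(weights)] for i, d in enumerate(code))
    return (modulus - total % modulus) % modulus


def is_valid_code(full_code):
    """True if the last digit matches the check digit derived from the rest."""
    body, supplied = full_code[:-1], int(full_code[-1])
    return check_digit(body) == supplied


for entered in ("534118", "544118", "543118"):
    print(entered, is_valid_code(entered))
# 534118 True
# 544118 False
# 543118 False
```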

The systems analyst chooses the weights and the modulus number, but once chosen, they must not change. Some examples of weighting methods and modulus numbers can be found in the figure below.

Examples of weighting methods and modulus numbers.

VERIFYING CREDIT CARDS

When a credit card number is entered into a Web site or computer program, the first check is the length of the number. Credit card companies use numbers of different lengths. For example, Visa card numbers are 16 digits long, while American Express card numbers are 15 digits long.

Another test is to match the credit card company and bank to verify that it is indeed a card issued by that company. The first four digits usually signify the type of card. The middle digits usually represent the bank and the customer. The last digit is a check digit.

In addition to these verification methods, credit card processing uses a check digit formula called the Luhn formula, created in the 1960s. Suppose we are given the number 7-7-7-8-8-8, where the first five digits represent a bank account number and the last digit is a check digit. Let’s apply the Luhn formula to see if this is a valid number.

1. Double the second-to-last digit, then double every other digit moving left (i.e., skip a digit, double the next, skip a digit, double the next, and so on). For example, the number 7-7-7-8-8-8 becomes 14-7-14-8-16-8.
2. If doubling any digit results in a number that is 10 or larger, reduce this two-digit number to a single digit by adding its digits together. In our example, the 14 becomes 1 + 4 = 5 and the 16 becomes 1 + 6 = 7. In doing so, our original number, 7-7-7-8-8-8, has been transformed into a new number, 5-7-5-8-7-8.
3. Now add all the digits in the new number together: 5 + 7 + 5 + 8 + 7 + 8 = 40.
4. Look at the total. If it ends in zero, the number is valid according to the Luhn formula. Since 40 ends in zero, the number passes the Luhn formula test.
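The steps above can be sketched in Python as follows. The function names are assumptions; the logic follows the four steps as written, counting positions from the right so that the second-to-last digit is the first one doubled.

```python
def luhn_total(number):
    """Return the Luhn sum of the digits in number (non-digits such as dashes are ignored)."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the last digit leftward; positions 1, 3, 5, ... are doubled.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit >= 10:
                # Reduce a two-digit result by adding its digits (14 -> 1 + 4 = 5).
                digit = digit // 10 + digit % 10
        total += digit
    return total


def passes_luhn(number):
    """True if the Luhn total ends in zero."""
    return luhn_total(number) % 10 == 0


print(luhn_total("7-7-7-8-8-8"))   # 40
print(passes_luhn("7-7-7-8-8-8"))  # True
```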

The Luhn formula can be used to identify mistakes made in entering a credit card number. For example, the credit card number 1334-1334-1334-1334 is assumed to be valid because the digits of the transformed number 2364-2364-2364-2364 add up to 60, a number ending in zero. If a user enters a single wrong digit, the total will no longer be a multiple of 10.

The Luhn formula does not catch every error, however. If a user makes mistakes in entering more than one digit, for example entering 1334-1334-1334-3314, the total of the transformed number, 2364-2364-2364-6324, is still 60. This transposition error (swapping the second-to-last and fourth-to-last digits) will not be caught, because both digits sit in positions that are doubled, so exchanging them leaves the total unchanged.
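A short sketch makes the strength and the blind spot concrete. It uses the illustrative card numbers from the text; the function name is an assumption, and subtracting 9 is used as a shortcut that gives the same result as adding the two digits of a doubled value.

```python
def passes_luhn(number):
    """True if the number satisfies the Luhn formula (non-digits are ignored)."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:   # second-to-last, fourth-to-last, ...
            digit *= 2
            if digit >= 10:
                digit -= 9      # same as adding the two digits (16 -> 7)
        total += digit
    return total % 10 == 0


print(passes_luhn("1334-1334-1334-1334"))  # True: the original number
print(passes_luhn("1334-1334-1334-1335"))  # False: one wrong digit is caught
print(passes_luhn("1334-1334-1334-1343"))  # False: adjacent digits swapped
print(passes_luhn("1334-1334-1334-3314"))  # True: this swap goes undetected
```

The last case passes because the two exchanged digits are both in doubled positions, which is why additional checks such as the expiration date remain useful.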

For more security, credit card companies also use the expiration date and a three- or four-digit verification code, often printed on the reverse side of the card.

The eight tests for checking the validity of input can go a long way toward protecting the system from the entry and storage of erroneous data. Always assume that human errors in input are more likely than not to occur. It is your responsibility to understand which errors will invalidate data and how to use the computer to guard against those human errors and thus limit their intrusion into system data.
