Each data element should be defined once in the data dictionary and may also be entered previously on an element description form, such as the one illustrated in the figure below.
Characteristics commonly included on the element description form are the following:
- Element ID. This optional entry allows the analyst to build automated data dictionary entries.
- The name of the element. The name should be descriptive, unique, and based on what the element is commonly called in most programs or by the major user of the element.
- Aliases, which are synonyms or other names for the element. Aliases are names used by different users in different systems. For example, a CUSTOMER NUMBER may also be called a RECEIVABLE ACCOUNT NUMBER or a CLIENT NUMBER.
- A short description of the element.
- Whether the element is base or derived. A base element is one that is initially keyed into the system, such as a customer name, address, or city. Base elements must be stored in files. Derived elements are created by processes as the result of a calculation or a series of decision-making statements.
- The length of an element. Some elements have standard lengths. In the United States, for example, lengths for state name abbreviations, zip codes, and telephone numbers are all standard. For other elements, the lengths may vary, and the analyst and user community must jointly decide the final length based on the following considerations:If the element is too small, the data that need to be entered will be truncated. The analyst must decide how that will affect the system outputs. For example, if a customer’s last name is truncated, mail would usually still be delivered; if an email address is truncated, however, it will be returned as not found.
- Numeric amount lengths should be determined by figuring the largest number the amount will probably contain and then allowing reasonable room for expansion. Lengths designated for totals should be large enough to accommodate the sum of the numbers accumulated in them.
- Name and address fields may be given lengths based on the following table. For example, a last name field of 11 characters will accommodate 98 percent of the last names in the United States.
- For other fields, it is often useful to examine or sample historical data found in the organization to determine a suitable field length.
Field Length Percentage of Data That Will Fit (U.S.) Last Name 11 98 First Name 18 95 Company Name 20 95 Street 18 90 City 17 99
- The type of data—numeric, date, alphabetic, varchar, or character, which is sometimes called alphanumeric or text data. Varchar data may contain any number of characters, up to a limit set by the database software. When using varchar, specifying the length is optional. Several of these formats are shown in the table illustrated below. Character fields may contain a mixture of letters, numbers, and special characters. If the element is a date, its format—for example, MMDDYYYY—must be determined. If the element is numeric, its storage type should be determined.
Personal computer formats, such as currency, number, or scientific, depend on how the data will be used. Number formats are further defined as integer, long integer, single precision, double precision, and so on. There are many other types of formats used with PC systems. Unicode is a standardized coding system for defining graphic symbols, such as Chinese or Japanese characters. Unicode is described in greater detail in a later chapter. There are three standard formats for mainframe computers: zoned decimal, packed decimal, and binary. The zoned decimal format is used for printing and displaying data. The packed decimal format is commonly used to save space on file layouts and for elements that require a high level of arithmetic to be performed on them. The binary format is suitable for the same purposes as the packed decimal format but is less commonly used.Data Type Meaning Bit A value of 1 or 0, a true/false value Char, varchar, text Any alphanumeric character Datetime, smalldatetime Alphanumeric data, several formats Decimal, numeric Numeric data that are accurate to the least significant digit; can contain a whole and decimal portion Float, real Floating-point values that contain an approximate decimal value Int, smallint, tinyint Only integer (whole digit) data Currency, money, smallmoney Monetary numbers accurate to four decimal places Binary, varbinary, image Binary strings (sound, pictures, video) Cursor, timestamp, uniqueidentifier A value that is always unique within a database Autonumber A number that is always incremented by one when a record is added to a database table - Input and output formats should be included, using special coding symbols to indicate how the data should be presented. These symbols and their uses are illustrated in the table below. Each symbol represents one character or digit. If the same character repeats several times, the character followed by a number in parentheses indicating how many times the character repeats is substituted for the group. For example, XXXXXXXX would be represented as X(8).
Formatting Character Meaning X May enter or display/print any character 9 Enter or display only numbers Z Display leading zeros as spaces , Insert commas into a numeric display · Insert a period into a numeric display / Insert slashes into a numeric display - Insert a hyphen into a numeric display V Indicate a decimal position (when the decimal point is not included) - Validation criteria for ensuring that accurate data are captured by the system. Elements are either discrete, meaning they have certain fixed values, or continuous, with a smooth range of values. Here are common editing criteria:
- A range of values is suitable for elements that contain continuous data. For example, in the United States a student grade point average may be from 0.00 through 4.00. If there is only an upper or lower bound to the data, a limit is used instead of a range.
- A list of values is indicated if the data are discrete. Examples are codes representing the colors of items for sale in World’s Trend’s catalog.
- A table of codes is suitable if the list of values is extensive (for example, state abbreviations, telephone country codes, or U.S. telephone area codes.)
- For key or index elements, a check digit is often included.
- Any default value the element may have. The default value is displayed on entry screens and is used to reduce the amount of keying that the operator may have to do. Usually, several fields in each system have default values. When using GUI lists or drop-down lists, the default value is the one currently selected and highlighted. When using radio buttons, the option for the default value is selected, and when using check boxes, the default value (either “yes” or “no”) determines whether or not the check box will have an initial check in it.
- An additional comment or remarks area. This might be used to indicate the format of the date, special validation that is required, the check digit method used (explained in Chapter 15), and so on.
Data element descriptions such as CUSTOMER NUMBER may be called CLIENT NUMBER elsewhere in the system (perhaps old code written with this alias needs to be updated). Another kind of data element is an alphabetic element. At World’s Trend Catalog Division, codes are used to describe colors: for example, BL for blue, WH for white, and GR for green. When this element is implemented, a table will be needed for users to look up the meanings of these codes. (Coding is discussed further in Chapter 15.)