The following guidelines should be taken into account when designing master files or database relations:
- Each separate data entity should create a master database table. Do not combine two distinct entities on one file. For example, items are purchased from vendors. The ITEM MASTER table should contain only item information, and the VENDOR MASTER table should contain only vendor information.
- A specific data field should exist only on one master table. For example, the CUSTOMER NAME should exist only on the CUSTOMER MASTER table, not on the ORDER table or any other master table. The exceptions to this guideline are the key or index fields, which may be on as many tables as necessary. If a report or screen needs information from many tables, the indexes should provide the linkage for obtaining the required records.
- Each master table or database relation should have programs to Create, Read, Update, and Delete (abbreviated CRUD) the records. Ideally, only one program should add new records and only one program should delete specified records. Many programs, however, may be responsible for changing data fields in the course of normal business activities. For example, a CUSTOMER MASTER file may have a CURRENT BALANCE field that is increased by the ORDER TOTAL in the order processing program and decreased by a PAYMENT AMOUNT or an AMOUNT RETURNED from two additional programs.
Integrity Constraints
Integrity constraints are rules that govern changing and deleting records, and that help keep the data in the database accurate. Three types of integrity constraints apply to a database:
- Entity integrity.
- Referential integrity.
- Domain integrity.
Entity integrity constraints are rules that govern the composition of primary keys. The primary key cannot have a null value, and if the primary key is a composite key, none of the component fields in the key can contain a null value. Some databases allow you to define a unique constraint or a unique key. This unique key identifies only one record, which is not a primary key. The difference between a unique key and a primary key is that a unique key may contain a null value.
Referential integrity governs the nature of records in a one-to-many relationship. The table that is connected to the one end of the relationship is called the parent. The table connected to the many end of the relationship is called the child table. Referential integrity means that all foreign keys in the many table (the child table) must have a matching record in the parent table. Hence, you cannot add a record in the child (many) table without a matching record in the parent table.
A second implication is that you cannot change a primary key that has matching child table records. If you could change the parent record, the result would be a child record that would have a different parent record or an orphan record, or a child record without a parent record. Examples are a GRADE record for a student that would not be on the STUDENT MASTER table and an ORDER record for a CUSTOMER NUMBER that did not exist. The last implication of referential integrity is that you cannot delete a parent record that has child records. That would also lead to the orphan records mentioned earlier.
Referential integrity is implemented in two different ways. One way is to have a restricted database, in which the system can update or delete a parent record only if there are no matching child records. A cascaded database will delete or update all child records when a parent record is deleted or changed (the parent triggers the changes).
A restricted relationship is better when deleting records. You would not want to delete a customer record and have all the outstanding invoices deleted as well! The cascaded approach is better when changing records. If the primary key of a student record is changed, all the course records for that student would have their foreign keys (the STUDENT NUMBER on the COURSE MASTER) changed as well.
Domain integrity rules are used to validate the data, such as table, limit, range, and other validation checks. They are further explained in Chapter “Designing Accurate Data Entry Procedures”. The domain integrity rules are usually stored in the database structure in one of two forms. Check constraints are defined at the table level and can refer to one or more fields in the table. An example is that the DATE OF PURCHASE is always less than or equal to the current date. Rules are defined at the database level as separate objects and can be used with a number of fields. An example is a value that is greater than zero, used to validate a number of elements.
Anomalies
Four anomalies may occur when creating database tables:
- Data redundancy.
- Insert anomaly.
- Deletion anomaly.
- Update anomaly.
Data redundancy occurs when the same data are stored in more than one place in the database (except for primary keys stored as foreign keys). This problem is solved by creating tables that are in 3NF.
An insert anomaly occurs when the entire primary key is not known and the database cannot insert a new record, which would violate entity integrity. This problem usually occurs when the primary key is a composite key containing several smaller attributes. An insert anomaly may be avoided by using a sequence number for the primary key.
A deletion anomaly happens when a record is deleted, resulting in the loss of other related data. An example is an item that has a vendor number and a particular item is the only reference to a certain vendor. If that item is deleted, there would be no reference to the vendor record.
An update anomaly results when a change to one attribute value either causes the database to contain inconsistent data or causes multiple records to need changing. An example is when a street name changes in a city. You might change some of the street names and not others, or you will have to ensure that all street names have been changed. This can occur when you have transitive dependencies and may be prevented by creating tables that are in 3NF (although in the street example, the data may be in 3NF).