Often the element structure of XML content is defined using a document type definition (DTD). A DTD is used to determine whether the content of an XML document is valid, that is, whether it conforms to the order and type of data that must be present in the document. A DTD is easy to create and is well supported by standard software. Once the DTD has been completed, it can be used to validate the XML document with standard XML tools. The DTD is easier to create if a data dictionary has already been completed, because the analyst will have worked with users and made decisions about the structure of the data.
The figure below illustrates the document type definition for the Customer XML document. Keywords such as !DOCTYPE, which indicates the start of the DTD, must be in capital letters. !ELEMENT declares an element, and !ATTLIST declares an attribute by listing the element name followed by the attribute name. An element declared with the keyword #PCDATA (parsed character data) is a primitive element that is not further defined. An element declared with a series of other elements within parentheses has those elements as its children, and they must appear in the order listed. For example, the declaration for the name element, which lists the last name, first name, and middle initial in parentheses, means that a name must contain the last name, followed by the first name, followed by the middle initial.

Symbols placed after an element name control how often that element may occur. The question mark after “middle_initial” means that the element is optional and may be left out of the document for a particular customer. A plus sign means that the element must occur one or more times: customers must contain at least one customer element but may contain many. An asterisk means that the element may occur zero or more times: each customer may have zero to many orders. A vertical bar separates two or more child elements that are mutually exclusive options: payment contains either a check or a credit card.
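Because a DTD content model is essentially a regular expression over the names of an element's children, the rules above can be sketched in a few lines of Python. This is an illustration only, not how any particular validator is implemented: the hypothetical helper content_model_to_regex translates a parenthesized content model (commas for sequence, vertical bars for choice, and the ?, +, and * indicators) into a Python regular expression, and the hypothetical children_match checks an element's children against it. It assumes element-only content; #PCDATA, mixed content, EMPTY, and ANY are not handled, and the element names used in the example (last_name, first_name, check, credit_card) are assumed for illustration rather than taken from the figure.

```python
import re
import xml.etree.ElementTree as ET


def content_model_to_regex(model: str) -> re.Pattern:
    """Translate an element-only DTD content model such as
    '(last_name, first_name, middle_initial?)' into a compiled regex that
    matches a string of child element names, each followed by one space.
    Hypothetical helper: #PCDATA, mixed content, EMPTY, and ANY are not handled."""
    # Tokens: element names, grouping parentheses, ',' (sequence),
    # '|' (choice), and the occurrence indicators '?', '*', '+'.
    tokens = re.findall(r"[A-Za-z_][\w.\-]*|[(),|?*+]", model)
    parts = []
    for tok in tokens:
        if tok == "(":
            parts.append("(?:")            # open a non-capturing group
        elif tok == ")":
            parts.append(")")
        elif tok == ",":
            continue                       # a sequence is plain concatenation
        elif tok in {"|", "?", "*", "+"}:
            parts.append(tok)              # same meaning in regex syntax
        else:
            parts.append(f"(?:{re.escape(tok)} )")  # exactly one child named tok
    return re.compile("".join(parts))


def children_match(element: ET.Element, model: str) -> bool:
    """Return True if the element's children, in document order, satisfy the model."""
    names = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(names) is not None


if __name__ == "__main__":
    name_model = "(last_name, first_name, middle_initial?)"  # hypothetical names
    # Middle initial omitted: allowed by '?'
    print(children_match(ET.fromstring("<name><last_name/><first_name/></name>"), name_model))  # True
    # Wrong order: a sequence must follow the listed order
    print(children_match(ET.fromstring("<name><first_name/><last_name/></name>"), name_model))  # False
    # Both options of a choice present: '|' allows only one
    print(children_match(ET.fromstring("<payment><check/><credit_card/></payment>"),
                         "(check | credit_card)"))  # False
```

Matching against the whole string of child names with fullmatch mirrors the DTD rule that the children must satisfy the content model exactly, with no extra or missing elements.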
The attribute list definition for the customer number contains the keyword ID (in uppercase letters). This means that the value of the number attribute must be unique: it may appear only once in the XML document as the value of any attribute declared as an ID. This is somewhat similar to a primary key. The difference is that uniqueness applies across the whole document: if the document had several different elements, each with an ID attribute, a given ID value (C15008 in this example) could still appear only once. An ID must start with a letter or an underscore and cannot consist solely of digits. Declaring the customer number as an ID ensures that it is not repeated in a longer document. The keyword #REQUIRED means that the attribute must be present, and the keyword #IMPLIED means that the attribute is optional. A document may also have an IDREF attribute, which links one element to another element that has an ID. The ORDER element has a customer_number attribute defined as an IDREF, so the value C15008 must appear as an ID somewhere in the document. An attribute list that contains values in parentheses, separated by vertical bars, means that the attribute must contain one of those values. For example, the definition <!ATTLIST credit_card type (M|V|A|D|O) #REQUIRED> means that the credit card type must be M, V, A, D, or O.
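The ID, IDREF, #REQUIRED, and enumerated-value rules can likewise be sketched with Python's standard xml.etree.ElementTree module, which parses XML but does not validate it against a DTD. In this minimal sketch, the AttDecl class and check_attributes function are hypothetical; the attribute declarations are written by hand rather than parsed from a DTD; the ID syntax check is simplified to the letter-or-underscore rule described above; and the element names (customer, order, credit_card) are assumed from the text rather than copied from the figure.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class AttDecl:
    """Hypothetical, hand-written stand-in for one <!ATTLIST ...> entry."""
    element: str
    name: str
    kind: str                     # "ID", "IDREF", "CDATA", or "ENUM"
    required: bool                # True for #REQUIRED, False for #IMPLIED
    values: tuple[str, ...] = ()  # allowed values when kind == "ENUM"


# Simplified ID syntax: must start with a letter or underscore
# (the full XML name rules allow more characters than this).
ID_PATTERN = re.compile(r"[A-Za-z_][\w.\-]*")


def check_attributes(root: ET.Element, decls: list[AttDecl]) -> list[str]:
    """Return a list of error messages; an empty list means every check passed."""
    errors: list[str] = []
    ids: set[str] = set()   # a single set shared by every ID attribute in the document
    idrefs: list[str] = []  # resolved only after the whole tree has been scanned
    for elem in root.iter():
        for d in decls:
            if d.element != elem.tag:
                continue
            value = elem.get(d.name)
            if value is None:
                if d.required:  # #REQUIRED attribute is absent
                    errors.append(f"<{elem.tag}> is missing required attribute {d.name}")
                continue        # #IMPLIED attributes may be omitted
            if d.kind == "ID":
                if not ID_PATTERN.fullmatch(value):
                    errors.append(f"ID {value!r} is not a valid name")
                elif value in ids:
                    errors.append(f"ID {value!r} appears more than once")
                ids.add(value)
            elif d.kind == "IDREF":
                idrefs.append(value)
            elif d.kind == "ENUM" and value not in d.values:
                errors.append(f"{d.name}={value!r} is not one of {d.values}")
    # Every IDREF must match an ID defined somewhere in the document.
    errors.extend(f"IDREF {ref!r} matches no ID" for ref in idrefs if ref not in ids)
    return errors


if __name__ == "__main__":
    decls = [
        AttDecl("customer", "number", "ID", required=True),
        AttDecl("order", "customer_number", "IDREF", required=True),
        AttDecl("credit_card", "type", "ENUM", required=True,
                values=("M", "V", "A", "D", "O")),
    ]
    doc = ET.fromstring(
        '<customers><customer number="C15008">'
        '<order customer_number="C15008"/>'
        '<payment><credit_card type="V"/></payment>'
        '</customer></customers>'
    )
    print(check_attributes(doc, decls))  # []
```

Collecting IDREF values and resolving them only after the whole tree has been scanned lets an IDREF point to an ID that appears later in the document, and keeping all ID values in one set reflects the document-wide uniqueness described above.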