
Lesson 4: Discerning Inherent Structure
Objective: Determine the Inherent Structure of Information within XML Documents

Determine Inherent Structure of Information in XML Documents

The first step in creating any XML document should be to determine the inherent structure of the information within the document.
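There is no single correct structure: the same information can be marked up in many ways, and the choice depends partly on individual preference. Even a simple document can be structured in several different ways.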
Examine the following frames to consider the structure of sample documents. You will examine a business letter and a product catalog:

  1. Examine this basic business letter. What do you see in terms of structure?
  2. One very basic way of structuring the data in the business letter.
  3. An XML structure for the document could be written like this (tags are in bold print to make them easy to see).
  4. The same letter structured into more specific pieces of information. In this example, much more specificity has been added.
  5. XML code for the version of the business letter in the previous frame. Remember that XML is case-sensitive.
  6. While the above is, strictly speaking, a syntactically well-formed document, it means nothing from a human point of view.
  7. One of the most natural uses of XML is describing the items in a product list or catalog.
  8. The catalog data represented in XML.
  9. Documents that are well-formed create natural tree-like structures that stem from the root.
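
As a rough sketch of what the frames describe, the Python snippet below marks up one business letter at two levels of detail and then walks the resulting tree from the root. The letter text and every element name (letter, date, address, and so on) are assumptions made up for illustration; they are not taken from the lesson's frames. The snippet uses the standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical letter; element names and content are illustrative only.
COARSE = """<letter>
  <date>March 3, 2024</date>
  <address>Jane Doe, 12 Main Street, Springfield</address>
  <greeting>Dear Ms. Doe,</greeting>
  <body>Thank you for your order.</body>
  <closing>Sincerely, John Smith</closing>
</letter>"""

# The same letter with the date and address broken into smaller pieces.
FINE = """<letter>
  <date><month>March</month><day>3</day><year>2024</year></date>
  <address>
    <name>Jane Doe</name>
    <street>12 Main Street</street>
    <city>Springfield</city>
  </address>
  <greeting>Dear Ms. Doe,</greeting>
  <body>Thank you for your order.</body>
  <closing>Sincerely, John Smith</closing>
</letter>"""


def outline(element, depth=0):
    """Return the element tree as indented lines, one element name per line."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


coarse_root = ET.fromstring(COARSE)
fine_root = ET.fromstring(FINE)

# Print the tree that stems from the root element of the detailed version.
print("\n".join(outline(fine_root)))

# XML is case-sensitive: <Letter> is not closed by </letter>.
print(is_well_formed("<Letter>hi</Letter>"))
print(is_well_formed("<Letter>hi</letter>"))
```

Both versions are well-formed; the finer version simply lets software reach individual pieces such as the city directly. Note also that well-formedness is a purely syntactic check, so a document with meaningless element names would pass it just as easily.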

The next lesson shows you how to create a well-formed document from text.

Data-Centric Versus Document-Centric

The examples you have seen concentrated on what are known as data-centric uses of XML. This is where raw data is combined with markup to help give it meaning, make it easier to use, and enable greater interoperability.
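
To make the data-centric idea concrete, here is a minimal sketch in which a small catalog is read as records by a program. The catalog content, the element names (catalog, product, name, price), and the sku and currency attributes are all hypothetical; the helper load_products is likewise invented for this example.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical catalog data; names and values are illustrative only.
CATALOG = """<catalog>
  <product sku="A100">
    <name>Desk lamp</name>
    <price currency="USD">24.50</price>
  </product>
  <product sku="B200">
    <name>Bookshelf</name>
    <price currency="USD">89.00</price>
  </product>
</catalog>"""


def load_products(xml_text):
    """Turn each <product> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    products = []
    for product in root.findall("product"):
        price = product.find("price")
        products.append({
            "sku": product.get("sku"),
            "name": product.findtext("name"),
            "price": Decimal(price.text),  # Decimal avoids float rounding
            "currency": price.get("currency"),
        })
    return products


products = load_products(CATALOG)
total = sum(p["price"] for p in products)
print(total)
```

Because the markup labels each value, the consuming code can ask for "the price of each product" by name instead of relying on column positions, which is the interoperability benefit described above.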
There is a second major use of XML, and of markup in general, known as document-centric. This is where more loosely structured content is annotated with metadata. HTML is usually considered a document-centric use of SGML (and XHTML is similarly a document-oriented application of XML) because HTML is generally content designed to be read by humans rather than data to be consumed by a piece of software. XML is designed to be read and understood by both humans and software but, as you will see later, the ways of processing the different styles of XML can vary considerably.
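
One practical difference shows up as soon as document-centric XML is parsed: running text and inline markup are interleaved (mixed content). The sketch below uses an invented XHTML-like paragraph to show how xml.etree.ElementTree exposes this; the sample sentence is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML-like paragraph with inline markup inside running text.
PARAGRAPH = (
    "<p>XML is <em>case-sensitive</em>, so <code>Letter</code> "
    "and <code>letter</code> differ.</p>"
)

para = ET.fromstring(PARAGRAPH)

# Text before the first child lives in .text; text that follows a child
# element lives in that child's .tail.
leading_text = para.text
after_em = para[0].tail

# itertext() walks text and tails in document order, which recovers
# the sentence as a human would read it.
reading_text = "".join(para.itertext())

# The inline elements act as annotations on the text.
inline_tags = [child.tag for child in para]

print(reading_text)
print(inline_tags)
```

In data-centric XML an element usually holds either child elements or a single value, so this text and tail bookkeeping rarely matters; in document-centric XML it is central to processing.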
Document-centric XML is generally used to facilitate multiple publishing channels and provide ways of reusing content. This is useful for instances in which regular content changes need to be applied to multiple forms of media at once. A few years ago I worked on a system that produced training materials for the online sector. A database held a large number of articles, quizzes, and revision aids that could be collated into general training materials. These were all in an XML format very similar to XHTML, the XML version of HTML. Once the content was finalized in this database, it was transformed using XSLT into media suitable for both the Web and a traditional printed output. When using document-centric XML in this sort of system, whenever content changes, it is only necessary to alter the underlying data for changes to be propagated to all forms of media in use. Additionally, when a different form of the content is needed, to support mobile web browsers for example, a new transformation is the only necessary action.
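
The single-source idea can be sketched in a few lines. The system described above used XSLT; the example below swaps in plain Python functions as a stand-in, purely to illustrate one XML source feeding two output channels. The article content, its element names (article, title, para), and the functions to_html and to_plain_text are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source article; element names are illustrative only.
ARTICLE = """<article>
  <title>Well-formed documents</title>
  <para>Every XML document has exactly one root element.</para>
  <para>Element names are case-sensitive.</para>
</article>"""


def to_html(article):
    """Render the article for a Web channel."""
    parts = ["<h1>" + escape(article.findtext("title")) + "</h1>"]
    parts += ["<p>" + escape(p.text) + "</p>" for p in article.findall("para")]
    return "\n".join(parts)


def to_plain_text(article):
    """Render the article for a print-style plain-text channel."""
    title = article.findtext("title")
    lines = [title.upper(), "=" * len(title)]
    lines += [p.text for p in article.findall("para")]
    return "\n".join(lines)


article = ET.fromstring(ARTICLE)
html_before = to_html(article)
text_before = to_plain_text(article)

# Change the source once; every channel picks up the change on re-render.
article.find("title").text = "Well-formed XML documents"
html_after = to_html(article)
text_after = to_plain_text(article)

print(html_after)
print(text_after)
```

Supporting a further channel, such as mobile browsers, would mean adding one more rendering function (or, in the original system, one more XSLT stylesheet) without touching the stored content.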

Discerning XML - Quiz

Click the Quiz link below to check your understanding of rules for XML documents.
Discerning XML - Quiz