XML - Extensible and Universal Data Format
XML is a standard, universal data format. It lets you reuse one presentation for different data, or apply different presentations to the same data.
XML, the Extensible Markup Language, descends from SGML, as HTML does, but is more generic: it carries data inside the tags themselves and places no limit on the tags that can be defined.
The display format is independent of the data and is given by another document, an XSLT stylesheet. The rules for creating tags are defined by yet another document, the DTD (Document Type Definition), which describes the grammar of the tags.
Most programming languages can process XML through the Document Object Model, as they do HTML.
XML features and code sample
- Significant tags based upon the content of data.
- Meaning of tags depends upon the tool which parses the XML document.
- Separated document used for the presentation.
An invoice in XML.
<?xml version="1.0" ?>
<!-- Invoice from Scriptol.com -->
<invoice>
  <order>000156</order>
  <date timezone="Greenwich">Jan 1, 2003 14:30:00</date>
  <address>
    <firstName>Sherlock</firstName>
    <lastName>Holmes</lastName>
    <street>5 Baker St.</street>
    <city>London</city>
    <state>England</state>
    <zip>75004</zip>
  </address>
  <amount>270</amount>
</invoice>
Tag names are chosen to make the document readable; their role depends entirely on the tools that will access it.
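To illustrate that the role of a tag comes from the tool reading it, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. The two functions, billing_tool and shipping_tool, are hypothetical names invented for this example; each one reads a shortened copy of the same invoice and pays attention to different tags.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the invoice (illustrative data only).
INVOICE = """<?xml version="1.0" ?>
<invoice>
  <order>000156</order>
  <date timezone="Greenwich">Jan 1, 2003 14:30:00</date>
  <address>
    <firstName>Sherlock</firstName>
    <lastName>Holmes</lastName>
    <city>London</city>
  </address>
  <amount> 270 </amount>
</invoice>"""

root = ET.fromstring(INVOICE)


def billing_tool(invoice):
    """Hypothetical tool: only the <amount> tag matters to it."""
    return int(invoice.findtext("amount").strip())


def shipping_tool(invoice):
    """Hypothetical tool: only the <address> tags matter to it."""
    addr = invoice.find("address")
    name = f"{addr.findtext('firstName')} {addr.findtext('lastName')}"
    return f"{name}, {addr.findtext('city')}"


print(billing_tool(root))
print(shipping_tool(root))
```

The document itself is unchanged between the two calls; only the interpretation differs.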
Beyond textual documents
XML is just a semantic language with a basic syntax; it is the tool that makes it "talk", i.e. converts words into actions. It is not meant only to contain text.
Start by looking at some applications of the SVG language. They are amazing, made of vector graphics that can even be animated. But SVG is nothing more than an application of XML associated with an API. Tags become rectangles or other shapes, and attributes become the parameters that vary to produce motion. SVG is a language understood by the browser (or another SVG rendering tool) to represent scalable images.
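As a rough sketch of "tags become shapes, attributes become parameters", the Python snippet below builds a tiny SVG document with the standard xml.etree.ElementTree module. The sizes, colour, and timing values are arbitrary assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements with a default namespace instead of an "ns0:" prefix.
ET.register_namespace("", SVG_NS)


def tag(name):
    """Return the namespace-qualified tag name ElementTree expects."""
    return f"{{{SVG_NS}}}{name}"


svg = ET.Element(tag("svg"), {"width": "200", "height": "100"})

# A tag becomes a shape: here, a square.
rect = ET.SubElement(
    svg, tag("rect"),
    {"x": "0", "y": "30", "width": "40", "height": "40", "fill": "teal"},
)

# An attribute becomes the varying parameter: x slides from 0 to 160.
ET.SubElement(
    rect, tag("animate"),
    {"attributeName": "x", "from": "0", "to": "160",
     "dur": "2s", "repeatCount": "indefinite"},
)

svg_text = ET.tostring(svg, encoding="unicode")
print(svg_text)
```

Saved to a file with an .svg extension, the output should display in a browser as a square moving from left to right.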
Another example is the RSS format. Once a role has been assigned to each tag, a list of links and descriptions becomes a press review.
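The following Python sketch shows how a reader might turn such a list into headlines. The feed content and the example.com URLs are made up for illustration, and the headlines function is a hypothetical helper, not part of any RSS library.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed used only for this example.
FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>https://example.com/</link>
    <description>Hypothetical news</description>
    <item>
      <title>First story</title>
      <link>https://example.com/1</link>
      <description>Summary of the first story.</description>
    </item>
    <item>
      <title>Second story</title>
      <link>https://example.com/2</link>
      <description>Summary of the second story.</description>
    </item>
  </channel>
</rss>"""


def headlines(feed_xml):
    """Return (title, link) pairs, one per <item> of the channel."""
    channel = ET.fromstring(feed_xml).find("channel")
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in channel.findall("item")
    ]


for title, link in headlines(FEED):
    print(f"{title}: {link}")
```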
In the XHTML dialect, every tag has a layout role. It is an application of XML, semantically equivalent to HTML, that tells the browser how to present multimedia content.
XML or JSON
We could also express Web pages as JSON files. This would probably reduce the file size, but it would likely have slowed the development of the Web, as HTML code is much more accessible to non-programmers.
For an application, the choice of format is detailed in the article "JSON or XML, which format to choose?" But do we really need to choose? These are two ways to present the same structured content, and converting one format to the other is not complicated. In fact, once the content is loaded into memory and translated into objects and attributes, serializing it as an XML or a JSON file is just a matter of personal convenience.
The purpose of that article is mainly to decide when one format or the other is best suited for storing data, depending on the language or system that uses it.
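A minimal Python sketch of that idea, assuming a flat record held in a dictionary: the same in-memory object is serialized either as JSON or as XML. The helper names to_json, to_xml, and from_xml are hypothetical, and the XML mapping (one child element per key) is only one of several possible conventions.

```python
import json
import xml.etree.ElementTree as ET

# In-memory content, independent of any file format.
record = {"order": "000156", "city": "London", "amount": 270}


def to_json(data):
    """Serialize the record as a JSON object."""
    return json.dumps(data)


def to_xml(data, root_tag="invoice"):
    """Serialize the record as XML, one child element per key."""
    root = ET.Element(root_tag)
    for key, value in data.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Read the flat XML form back into a dictionary of strings."""
    return {child.tag: child.text for child in ET.fromstring(text)}


print(to_json(record))
print(to_xml(record))
```

One difference worth noting: JSON keeps the number 270 as a number, while in this XML mapping every value comes back as text, so a reader has to know which fields to convert.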
You can access an XML document the DOM or the SAX way
Whether the goal is to access data, change the document, or convert it into another format, several classes of tools are used.
There are two types of parsers. A tree parser loads the whole XML document into memory, and you can then access the contents through the Document Object Model, specifically with methods such as getElementsByTagName.
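As an illustration of the tree approach, here is a short Python sketch using xml.dom.minidom from the standard library, which exposes getElementsByTagName. The document string and the texts_of helper are assumptions made for the example.

```python
from xml.dom.minidom import parseString

SOURCE = (
    "<invoice>"
    "<address><firstName>Sherlock</firstName><lastName>Holmes</lastName></address>"
    "<amount>270</amount>"
    "</invoice>"
)

# The whole document is parsed into an in-memory tree first.
doc = parseString(SOURCE)


def texts_of(document, tag_name):
    """Return the text of every element with the given tag name."""
    return [
        node.firstChild.data
        for node in document.getElementsByTagName(tag_name)
        if node.firstChild is not None
    ]


print(texts_of(doc, "lastName"))
print(texts_of(doc, "amount"))
```

Because the tree stays in memory, any element can be reached in any order, at the cost of holding the entire document at once.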
An event-driven parser, by contrast, follows the SAX API: it reads the content progressively and reports each element as an event, so the application can store all of the data or only the parts it asks for.
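The event-driven approach can be sketched with Python's standard xml.sax module. The AmountHandler class and the read_amounts function are hypothetical names for this example; the handler keeps only the text of amount elements and ignores everything else as it streams by.

```python
import xml.sax


class AmountHandler(xml.sax.ContentHandler):
    """Keeps the text of <amount> elements and discards the rest."""

    def __init__(self):
        super().__init__()
        self.amounts = []
        self._buffer = None  # None means "not inside <amount>"

    def startElement(self, name, attrs):
        if name == "amount":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "amount" and self._buffer is not None:
            self.amounts.append("".join(self._buffer).strip())
            self._buffer = None


def read_amounts(xml_text):
    """Stream through the document and return the amounts found."""
    handler = AmountHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.amounts


SAMPLE = (
    "<invoices>"
    "<invoice><order>000156</order><amount> 270 </amount></invoice>"
    "<invoice><order>000157</order><amount>15</amount></invoice>"
    "</invoices>"
)
print(read_amounts(SAMPLE))
```

No tree is ever built here, which is why this style suits documents too large to load entirely into memory.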
Xerces, an XML parser available in Java and C++, is distributed by the Apache group, along with several other XML tools.
C library using the DOM or SAX APIs.
Library to build an event-driven parser (used by Scriptol and Xcheck).
XQuery is a query language for XML data, whether held in a file or in a database whose tree structure is similar to that of XML, such as Apache's Xindice. It makes it possible to create an XML database and use it.
The XSL language is made of transformation rules. XSLT converts an XML document into another format such as HTML, and can also be used to access data.
Transforms an XML document into HTML. There are Java and C++ versions.