Saturday, March 31, 2012

XML


XML stands for eXtensible Markup Language. It was designed to store and transport data, and it does not replace HTML. XML does not DO anything by itself: it only structures, stores and transports data. The code below simply stores the information of a note. It does nothing with that information on its own.


<note>
  <to>Daniel</to>
  <from>Mark</from>
  <heading>Reminder</heading>
  <body>Don't forget me!</body>
</note>
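
To show that the note is only data until a program reads it, here is a minimal Python sketch. It assumes the note is held in a string and uses the standard `xml.etree.ElementTree` module; the names `NOTE_XML` and `describe_note` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The note from the listing above, held in a string for this example.
NOTE_XML = """<note>
  <to>Daniel</to>
  <from>Mark</from>
  <heading>Reminder</heading>
  <body>Don't forget me!</body>
</note>"""


def describe_note(xml_text: str) -> str:
    """Parse the note and combine its four fields into one readable line."""
    note = ET.fromstring(xml_text)
    return "{} -> {}: [{}] {}".format(
        note.findtext("from"),
        note.findtext("to"),
        note.findtext("heading"),
        note.findtext("body"),
    )


print(describe_note(NOTE_XML))
```

The XML itself never changes; what happens to the data depends entirely on the program that reads it.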

History of XML

Experienced SGML and WWW users realized that the technologies available at the time could pose some serious problems. SGML was added to the list of W3C (World Wide Web Consortium) activities in 1995, and work began on a new standard.

XML was built by a working group of 11 members, supported by an interest group of 150 members. The project had 3 editors, and XML was designed using emails and teleconferences. Twenty weeks later, the first working draft was released. XML 1.0 became a W3C Recommendation in early 1998. It achieved the following list of goals:

  • internet usability
  • SGML compatibility
  • general-purpose usability
  • formality
  • conciseness
  • legibility
  • ease of authoring
  • minimization of optional features


The fourth edition of XML 1.0 was released on the 16th of August 2006.
[http://www.totalxml.net]


Why XML?

Several reasons exist for using XML. Among these we find the following:

Self describing data
Business applications have several tasks apart from presenting content. XML suits them because its tags describe the data they enclose, so the same document can be both processed by software and presented to users. For this kind of data exchange, XML is often preferred over traditional database formats.

Integration of traditional databases and formats
XML documents support all types of data, such as text and numbers, multimedia such as sound and video, and also active formats such as Java applets and ActiveX components.

Data presentation modifications
XML style sheets (XSL) can be used in a similar way to CSS. They can modify the presentation of documents and websites while keeping the same data.

One Server View
An XML document can contain data from multiple servers and different databases. In effect, the whole WWW can be treated as one database.

Internationalization
XML supports documents written in more than one language, and it supports the Unicode standard. This is of utmost importance in electronic business applications.

Open and extensible
The XML structure can be modified to match a specific vocabulary, and users can add their own elements when they need them.

Used to create new Internet Languages
Several Internet languages were created with XML. These include XHTML, WSDL, WML (the markup language used by WAP), RSS, and more.

Syntax of XML

The first line of an XML document is usually the XML declaration, which defines the XML version and the encoding used. Next comes the root element. There can be only one root element in an XML document, and every other element is nested inside it. This is known as the tree structure of XML and can be seen in the example below:


<inventory>
  <drink>
    <lemonade>
      <price>$2.50</price>
      <amount>20</amount>
    </lemonade>
    <pop>
      <price>$1.50</price>
      <amount>10</amount>
    </pop>
  </drink>
  <snack>
    <chips>
      <price>$4.50</price>
      <amount>60</amount>
    </chips>
  </snack>
</inventory>

Figure 1: The tree structure represented in the XML code above
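
The tree can also be made visible in code. The following is a small sketch, assuming Python's standard `xml.etree.ElementTree` parser and a compacted copy of the inventory document; the helper name `outline` is invented for this example.

```python
import xml.etree.ElementTree as ET

# Compacted copy of the inventory document from the listing above.
INVENTORY_XML = """<inventory>
  <drink>
    <lemonade><price>$2.50</price><amount>20</amount></lemonade>
    <pop><price>$1.50</price><amount>10</amount></pop>
  </drink>
  <snack>
    <chips><price>$4.50</price><amount>60</amount></chips>
  </snack>
</inventory>"""


def outline(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    text = (element.text or "").strip()
    line = "  " * depth + element.tag + (": " + text if text else "")
    lines = [line]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(INVENTORY_XML))
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one level of nesting below the single root element.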

The rules of XML are simple and logical. All elements must have a closing tag (or be written as an empty-element tag such as <br/>). Unlike HTML, which browsers render even when a closing tag is missing, an XML document with a missing closing tag is not well formed and a parser will reject it. XML tags are also case sensitive, which means that <name> is not the same as <Name>. XML elements must be properly nested: an element which is opened within another element must be closed within that element. The XML document must have exactly one root (or parent) element. XML elements can have attributes, and the attribute values must be quoted. This is shown in the code snippet below:

<person birthdate="12/11/1998">
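
These rules can be checked with any XML parser. As a rough sketch, the Python snippet below feeds a few hand-written fragments to the standard `xml.etree.ElementTree` parser and records which ones it accepts. The fragments, the name "Ann" and the helper `is_well_formed` are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments, one per rule described above.
samples = {
    "well formed": '<person birthdate="12/11/1998"><name>Ann</name></person>',
    "missing closing tag": "<person><name>Ann</person>",
    "case mismatch": "<name>Ann</Name>",
    "improper nesting": "<b><i>text</b></i>",
    "unquoted attribute": "<person birthdate=12/11/1998></person>",
    "two root elements": "<a/><b/>",
}

results = {label: is_well_formed(xml) for label, xml in samples.items()}
for label, ok in results.items():
    print(label, "->", "accepted" if ok else "rejected")
```

Only the first fragment follows every rule, so it is the only one the parser accepts.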

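The < and & characters are illegal as literal text in XML, because the parser treats them as the start of a tag and of an entity reference. A comparison such as a < b therefore has to be written as a &lt; b, which makes comparisons look different from other languages. Although the > character is perfectly legal, it is still recommended to replace it with its predefined entity reference. The predefined entity references can be seen in the table below: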

Entity reference    Character    Description
&lt;                <            less than
&gt;                >            greater than
&amp;               &            ampersand
&apos;              '            apostrophe
&quot;              "            quotation mark
Table 1: Entity References in XML
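
Escaping can be left to a library. The sketch below assumes Python's standard `xml.sax.saxutils.escape` function and the `xml.etree.ElementTree` parser; the sample text and the element names `rule` and `q` are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if a < b & b > c"   # hypothetical text containing special characters
escaped = escape(raw)      # replaces &, < and > with their entity references
print(escaped)

# The parser turns the entity references back into the original characters.
round_trip = ET.fromstring("<rule>" + escaped + "</rule>").text
print(round_trip == raw)

# Without escaping, the same text is not well formed and the parser rejects it.
try:
    ET.fromstring("<rule>" + raw + "</rule>")
    raw_is_rejected = False
except ET.ParseError:
    raw_is_rejected = True
print(raw_is_rejected)

# &quot; and &apos; matter mostly inside quoted attribute values.
quoted = ET.fromstring("<q say='&quot;hi&quot; &apos;there&apos;'/>").get("say")
print(quoted)
```

The entity references exist only in the serialized document; once parsed, the program sees the plain characters again.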

Comments in XML use the same syntax as comments in HTML, as shown below:

<!-- This is a comment -->

Criticism of XML

XML, together with its extensions, has been regularly criticized for its complexity and verbosity. Mapping the basic tree model of XML to the type systems of programming languages or databases can be difficult. This is especially true when XML is used to exchange highly structured data between applications, since that was not its primary design goal. JSON and YAML are common alternatives, as they both focus on representing structured data instead of narrative documents.

XPath

XPath is a syntax for addressing parts of an XML document. It uses path expressions to navigate XML documents, and it is a major element of XSLT. It also contains a library of more than a hundred built-in standard functions.

In XPath, there are seven types of nodes: element, attribute, text, namespace, processing-instruction, comment and document nodes. XML documents are treated as trees of nodes, and the root element is the topmost element of the tree. XPath is used to query the XML document to obtain the needed information. It can select items with a particular name, or items that satisfy a comparison, and wildcards can also be used.
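
A few path expressions can be tried directly from Python. This is only a sketch: the standard `xml.etree.ElementTree` module implements a limited subset of XPath (no function library and no numeric comparisons), and the compacted inventory document from earlier in this post is reused as sample data.

```python
import xml.etree.ElementTree as ET

# Compacted copy of the inventory document used earlier in this post.
INVENTORY_XML = """<inventory>
  <drink>
    <lemonade><price>$2.50</price><amount>20</amount></lemonade>
    <pop><price>$1.50</price><amount>10</amount></pop>
  </drink>
  <snack>
    <chips><price>$4.50</price><amount>60</amount></chips>
  </snack>
</inventory>"""

root = ET.fromstring(INVENTORY_XML)

# Search by name: every <price> element anywhere below the root.
prices = [e.text for e in root.findall(".//price")]

# Wildcard: all direct children of <drink>, whatever they are called.
drinks = [e.tag for e in root.findall("./drink/*")]

# Comparison: any element that has an <amount> child whose text equals '10'.
ten_left = [e.tag for e in root.findall(".//*[amount='10']")]

print(prices)
print(drinks)
print(ten_left)
```

A full XPath processor accepts many more expressions than these three, but they cover the ideas mentioned above: selecting by name, by comparison, and with a wildcard.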

XSL and XSLT

XSL is made up of three parts: XSLT, XPath and XSL-FO. XSLT stands for eXtensible Stylesheet Language Transformations and is a language for transforming XML documents. XSL-FO is a language for formatting XML documents.

XSLT is used to transform an XML document into another XML document, or into another type of document such as HTML or XHTML. XSLT uses XPath to find information in an XML document, and then it transforms the matching parts into the result document.

A small example will incorporate XML, XSL and XPath.

CD Collection Example

First we start with information about a collection of CDs inside an XML document. For each CD, the document holds the title, the artist, the country, the company, the price and the year of issue. We also add a reference to the style sheet right after the XML declaration. A sample of the document can be seen below:


<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="s.xsl"?>
<catalog>
     <cd>
           <title>Empire Burlesque</title>
           <artist>Bob Dylan</artist>
           <country>USA</country>
           <company>Columbia</company>
           <price>10.90</price>
           <year>1985</year>
     </cd>
     <cd>
           <title>Hide your heart</title>
           <artist>Bonnie Tyler</artist>
           <country>UK</country>
           <company>CBS Records</company>
           <price>9.90</price>
           <year>1988</year>
     </cd>
</catalog>

Then we create the style sheet in a separate XSL document. In this style sheet we create a table with two columns to display the title and the artist of each CD. Using a for-each loop and XPath to find each instance of a CD, we add a new row and put the value of the title and of the artist in its columns.


<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <html>
  <body>
  <h2>My CD Collection</h2>
    <table border="1">
      <tr bgcolor="#9acd32">
        <th>Title</th>
        <th>Artist</th>
      </tr>
      <xsl:for-each select="catalog/cd">
      <tr>
        <td><xsl:value-of select="title"/></td>
        <td><xsl:value-of select="artist"/></td>
      </tr>
      </xsl:for-each>
    </table>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>

When the XML document is opened in a browser, the style sheet is applied and the data is shown as a formatted table, which makes the required information much easier to read. The final result can be seen below:


Figure 2: The final result of the XML document with XSL
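
To make the for-each step more concrete, the sketch below imitates by hand what the style sheet's table does. It is not an XSLT processor (Python's standard library does not include one); it only walks the catalog with `xml.etree.ElementTree` and builds the same table rows. The catalog is shortened to the two fields the table uses, and the function name `catalog_to_html` is invented for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the catalog: only the fields the table displays.
CATALOG_XML = """<catalog>
  <cd>
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
  </cd>
  <cd>
    <title>Hide your heart</title>
    <artist>Bonnie Tyler</artist>
  </cd>
</catalog>"""


def catalog_to_html(xml_text: str) -> str:
    """Build the Title/Artist table by hand, one row per <cd> element."""
    catalog = ET.fromstring(xml_text)
    table = ET.Element("table", border="1")
    header = ET.SubElement(table, "tr", bgcolor="#9acd32")
    for label in ("Title", "Artist"):
        ET.SubElement(header, "th").text = label
    # Plays the role of <xsl:for-each select="catalog/cd">.
    for cd in catalog.findall("cd"):
        row = ET.SubElement(table, "tr")
        # Plays the role of <xsl:value-of select="title"/> and "artist".
        for field in ("title", "artist"):
            ET.SubElement(row, "td").text = cd.findtext(field)
    return ET.tostring(table, encoding="unicode")


html_table = catalog_to_html(CATALOG_XML)
print(html_table)
```

The advantage of XSLT over a hand-written loop like this one is that the transformation is itself an XML document, which the browser can apply without any extra program.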
