XML stands for eXtensible Markup Language. It was designed to store and
transport data, and it does not replace HTML. XML does not DO anything by
itself: it only structures, stores and transports data. The code below just
stores the information of a note. Nothing happens to that information until
an application reads the document.
    <note>
      <to>Daniel</to>
      <from>Mark</from>
      <heading>Reminder</heading>
      <body>Don't forget me!</body>
    </note>
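Since the note does nothing on its own, a program has to read it before the
data becomes useful. The sketch below is one possible way to do that with
Python's standard xml.etree.ElementTree module. The names NOTE_XML and
read_note are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The note from the example above, held in a string for illustration.
NOTE_XML = """<note>
  <to>Daniel</to>
  <from>Mark</from>
  <heading>Reminder</heading>
  <body>Don't forget me!</body>
</note>"""


def read_note(xml_text):
    """Return the child elements of the root as a dict of tag -> text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


note = read_note(NOTE_XML)
print(f"{note['heading']}: {note['from']} -> {note['to']}: {note['body']}")
```

The XML only carries the four values. What to do with them (print them, send
an email, store them) is decided by the program that reads the document.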
History of XML
Experienced SGML and WWW users realized that the technologies then available
on the Web could pose serious problems as it grew, and that SGML offered
solutions to some of them. SGML was added to the W3C (World Wide Web
Consortium) list of activities in 1995, and work on a new standard began in
1996.
XML was built by a working group of 11 members, supported by an interest group
of about 150 members. The project had 3 editors, and the design was worked out
through emails and teleconferences. After about 20 weeks of work, the first
working draft was released. XML 1.0 became a W3C Recommendation in early 1998.
XML 1.0 was designed to meet the following goals:
- internet usability
- SGML compatibility
- general-purpose usability
- formality
- conciseness
- legibility
- ease of authoring
- minimization of optional features
The fourth edition of XML 1.0 was released on the 16th of August 2006.
[http://www.totalxml.net]
Why XML?
Several reasons exist for using
XML. Among these we find the following:
Self describing data
Business applications have several tasks apart from presenting content. XML is
well suited here because its tags describe the data they contain, so the same
document can be processed by programs and also presented to users. For
exchanging such data, XML is often preferred over formats tied to a particular
database system.
Integration of traditional
databases and formats
XML documents can describe all types of data, such as text and numbers,
multimedia, sound and video, and also active formats such as Java applets and
ActiveX components.
Data presentation modifications
XML style sheets can be used in a similar way to CSS. They can change how
documents and websites look while keeping the same data.
One Server View
An XML document can contain data from multiple servers and different
databases. In this sense, the whole WWW can be treated as one database.
Internationalization
XML supports documents written in more than one language, and the Unicode
standard is also supported. This is of utmost importance in electronic
business applications.
Open and extensible
The XML structure can be modified
to match specific vocabulary and users can opt to add elements if they need
them.
Used to create new Internet Languages
Several Internet languages were created with XML. These include XHTML, WSDL,
WML (the markup language used with WAP), RSS, and more.
Syntax of XML
An XML document usually starts with the XML declaration, which defines the XML
version and the encoding used. When the declaration is present, it must be the
first line. Next comes the root element. There can be only one root element in
an XML document, and every other element is nested inside it. This is known as
the tree structure of XML and can be seen in the example below:
    <inventory>
      <drink>
        <lemonade>
          <price>$2.50</price>
          <amount>20</amount>
        </lemonade>
        <pop>
          <price>$1.50</price>
          <amount>10</amount>
        </pop>
      </drink>
      <snack>
        <chips>
          <price>$4.50</price>
          <amount>60</amount>
        </chips>
      </snack>
    </inventory>
Figure 1: The tree structure represented in the XML code above
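The tree structure can also be made visible by walking the document with a
parser. The following is a minimal sketch using Python's standard
xml.etree.ElementTree module. The helper name outline and the constant
INVENTORY_XML are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

# The inventory document from above, written compactly.
INVENTORY_XML = """<inventory>
  <drink>
    <lemonade><price>$2.50</price><amount>20</amount></lemonade>
    <pop><price>$1.50</price><amount>10</amount></pop>
  </drink>
  <snack>
    <chips><price>$4.50</price><amount>60</amount></chips>
  </snack>
</inventory>"""


def outline(element, depth=0):
    """Return one indented line per element, visiting the tree depth-first."""
    text = (element.text or "").strip()
    line = "  " * depth + element.tag + (f": {text}" if text else "")
    lines = [line]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


# The single root element (inventory) is printed first, its children below it.
print("\n".join(outline(ET.fromstring(INVENTORY_XML))))
```

Each level of indentation in the output corresponds to one level of nesting in
the XML, with the root element at the top.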
Rules in XML are very simple and logical. All elements must have a closing
tag. Unlike an HTML browser, an XML parser rejects the document if a closing
tag is missing. XML tags are also case sensitive. This means that <name> is
not the same as <Name>, so an element opened with <name> cannot be closed with
</Name>.
XML elements must be properly nested. An element which is opened within another
element must be closed within that element.
The XML document must have exactly one root element, which is the parent of
all other elements. XML elements can have attributes, and the attribute values
must be quoted. This is shown in the code snippet below:
<person birthdate="12/11/1998">
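These rules can be checked with any XML parser. The sketch below uses Python's
standard xml.etree.ElementTree module, which raises ParseError for a document
that is not well formed. The function name is_well_formed and the sample
documents are invented for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per rule described above.
samples = {
    "closed and nested": "<person><name>Ann</name></person>",
    "missing closing tag": "<person><name>Ann</person>",
    "case mismatch": "<name>Ann</Name>",
    "badly nested": "<b><i>text</b></i>",
    "quoted attribute": '<person birthdate="12/11/1998"></person>',
    "unquoted attribute": "<person birthdate=12/11/1998></person>",
    "two root elements": "<a/><b/>",
}

for label, text in samples.items():
    print(f"{label}: {'ok' if is_well_formed(text) else 'rejected'}")
```

Only the first and the fifth sample are accepted. Every other sample breaks
one of the rules and is rejected by the parser.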
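The < and the & characters are illegal inside the content of an XML element,
because the parser treats them as the start of a tag or of an entity
reference. This means that comparisons such as "less than" cannot be written
literally as they are in other languages. Although the > character is
perfectly legal, it is still recommended to use the predefined entity
references of XML. These can be seen in the table below: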
Entity reference | Character | Name
&lt; | < | less than
&gt; | > | greater than
&amp; | & | ampersand
&apos; | ' | apostrophe
&quot; | " | quotation mark
Table 1: Entity References in XML
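In practice the replacement is rarely done by hand. As a small sketch, Python's
standard xml.sax.saxutils.escape function replaces &, < and > with their
entity references, and the parser turns them back into the original characters
when the document is read. The element name condition and the helper
make_condition are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def make_condition(text):
    """Wrap text in a <condition> element, escaping &, < and > first."""
    return f"<condition>{escape(text)}</condition>"


raw = "price < 5 & amount > 10"
xml_text = make_condition(raw)
print(xml_text)

# Parsing the escaped document gives back the original characters.
print(ET.fromstring(xml_text).text)

# Without escaping, the same text is not well-formed XML.
try:
    ET.fromstring(f"<condition>{raw}</condition>")
except ET.ParseError as error:
    print("rejected:", error)
```

By default escape handles only the first three rows of the table. Quotation
marks and apostrophes matter mainly inside attribute values.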
Comments in XML use the same syntax as comments in HTML, as shown below:
<!-- This is a comment -->
Criticism of XML
XML, together with its extensions, has been regularly criticized for
complexity and verbosity. It can be difficult to map the basic tree model of
XML to the type systems of programming languages or databases. This is even
more difficult when XML is used to exchange highly structured data between
applications, since this was not its primary design goal. Alternatives to XML
are JSON and YAML, as they both focus on representing structured data instead
of narrative documents.
XPath
XPath is a syntax for addressing parts of an XML document. It uses path
expressions to navigate in XML documents. It also contains a library of
standard functions and it is a major element in XSLT. From version 2.0 onwards
it includes more than a hundred built-in functions.
In XPath, there are seven types of nodes. These are element, attribute, text,
namespace, processing-instruction, comment and document nodes. XML documents
are considered to be trees of nodes. The root element is the topmost element
of the tree. XPath is used to query the XML document to obtain the needed
information. It can be used to search for items with a particular name, and
also by making comparisons. Wildcards can also be used.
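A few path expressions can be tried out with Python's standard
xml.etree.ElementTree module. Note that this module supports only a limited
subset of XPath, not the full language, so the numeric comparison at the end
is done in Python instead. The constant INVENTORY_XML repeats the inventory
document from the syntax section, and the variable names are invented for
this sketch.

```python
import xml.etree.ElementTree as ET

INVENTORY_XML = """<inventory>
  <drink>
    <lemonade><price>$2.50</price><amount>20</amount></lemonade>
    <pop><price>$1.50</price><amount>10</amount></pop>
  </drink>
  <snack>
    <chips><price>$4.50</price><amount>60</amount></chips>
  </snack>
</inventory>"""

root = ET.fromstring(INVENTORY_XML)

# Search by name: every <price> element anywhere below the root.
prices = [p.text for p in root.findall(".//price")]

# Wildcard: every child of <drink>, whatever its name is.
drinks = [item.tag for item in root.findall("./drink/*")]

# Comparison: any element that has an <amount> child equal to '20'.
twenty = [item.tag for item in root.findall(".//*[amount='20']")]

# ElementTree's subset has no numeric operators, so filter in Python.
well_stocked = [
    item.tag
    for item in root.findall("./*/*")
    if int(item.findtext("amount")) > 15
]

print(prices)        # ['$2.50', '$1.50', '$4.50']
print(drinks)        # ['lemonade', 'pop']
print(twenty)        # ['lemonade']
print(well_stocked)  # ['lemonade', 'chips']
```

A full XPath processor could express the last query directly with a predicate
on the amount, without the extra Python filter.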
XSL and XSLT
XSL is made up of three parts: XSLT, XPath and XSL-FO. XSLT stands for
eXtensible Stylesheet Language Transformations and is a language for
transforming XML documents. XSL-FO is a language for formatting XML documents.
XSLT is used to transform an XML document into another XML document, or into
another type of document such as HTML or XHTML. XSLT uses XPath to find
information in an XML document and then transforms the matching parts into the
result document.
The small example below brings XML, XSL and XPath together.
CD Collection Example
First we start with a collection of CD information inside an XML document.
For each CD the document holds the title, the artist, the country, the
company, the price and the year of issue. We also add the reference to the
style sheet right after the XML declaration. A sample of the document can be
seen below:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <?xml-stylesheet type="text/xsl" href="s.xsl"?>
    <catalog>
      <cd>
        <title>Empire Burlesque</title>
        <artist>Bob Dylan</artist>
        <country>USA</country>
        <company>Columbia</company>
        <price>10.90</price>
        <year>1985</year>
      </cd>
      <cd>
        <title>Hide your heart</title>
        <artist>Bonnie Tyler</artist>
        <country>UK</country>
        <company>CBS Records</company>
        <price>9.90</price>
        <year>1988</year>
      </cd>
    </catalog>
Then we create the style sheet in a separate XSL document. In this style sheet
we create a table with 2 columns to display the title and the artist of each
CD. Using a for-each loop and an XPath expression to find each instance of cd,
we put the value of the title and of the artist in the columns of a new row.
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/">
        <html>
          <body>
            <h2>My CD Collection</h2>
            <table border="1">
              <tr bgcolor="#9acd32">
                <th>Title</th>
                <th>Artist</th>
              </tr>
              <xsl:for-each select="catalog/cd">
                <tr>
                  <td><xsl:value-of select="title"/></td>
                  <td><xsl:value-of select="artist"/></td>
                </tr>
              </xsl:for-each>
            </table>
          </body>
        </html>
      </xsl:template>
    </xsl:stylesheet>
When the XML document is opened in a browser that applies the linked style
sheet, the data is displayed as a formatted table and it is much easier to
read the required information. The final result can be seen below:
Figure 2: The final result of the XML document with XSL
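To make clear what the style sheet does, the same transformation can be
imitated by hand. The sketch below is not XSLT: Python's standard library has
no XSLT processor, so it uses xml.etree.ElementTree to loop over the cd
elements the way xsl:for-each does and to build the same HTML table. The
function name catalog_to_html is an assumption, and the catalog is shortened
to the two fields that the style sheet displays.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the CD catalog, without the XML declaration.
CATALOG_XML = """<catalog>
  <cd><title>Empire Burlesque</title><artist>Bob Dylan</artist></cd>
  <cd><title>Hide your heart</title><artist>Bonnie Tyler</artist></cd>
</catalog>"""


def catalog_to_html(xml_text):
    """Build the HTML table that the style sheet would produce."""
    catalog = ET.fromstring(xml_text)

    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h2").text = "My CD Collection"
    table = ET.SubElement(body, "table", border="1")

    # Header row, as in the static part of the template.
    header = ET.SubElement(table, "tr", bgcolor="#9acd32")
    for label in ("Title", "Artist"):
        ET.SubElement(header, "th").text = label

    # Counterpart of <xsl:for-each select="catalog/cd">.
    for cd in catalog.findall("cd"):
        row = ET.SubElement(table, "tr")
        # Counterpart of <xsl:value-of select="title"/> and "artist".
        for field in ("title", "artist"):
            ET.SubElement(row, "td").text = cd.findtext(field)

    return ET.tostring(html, encoding="unicode")


print(catalog_to_html(CATALOG_XML))
```

The output is one table row per CD, which is the structure a browser renders
after applying the style sheet.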