XML Pocket Reference, 2nd Edition
Robert Eckstein & Michel Casabianca
Second Edition April 2001
ISBN: 0596001339
XML, the Extensible Markup Language, is the next-generation markup
language for the Web.
It provides a more structured (and therefore more powerful) medium
than HTML, allowing you to define new document types and
stylesheets as needed.
Although the generic tags of HTML are sufficient for everyday text, XML
gives you a way to add rich, well-defined markup to electronic documents.
The XML Pocket Reference is both a handy introduction to XML
terminology and syntax, and a quick reference to XML instructions,
attributes, entities, and datatypes.
Although XML itself is complex, its basic concepts are simple.
This small book combines a perfect tutorial for learning the basics of XML
with a reference to the XML and XSL specifications.
The new edition introduces information on XSLT (Extensible Stylesheet
Language Transformations) and XPath.
Contents
1.1 Introduction
1.2 XML Terminology
1.3 XML Reference
1.4 Entity and Character References
1.5 Document Type Definitions
1.6 The Extensible Stylesheet Language
1.7 XSLT Stylesheet Structure
1.8 Templates and Patterns
1.9 XSLT Elements
1.10 XPath
1.11 XPointer and XLink
1.1 Introduction
The Extensible Markup Language (XML) is a document-processing standard that is an official
recommendation of the World Wide Web Consortium (W3C), the same group responsible for
overseeing the HTML standard. Many expect XML and its sibling technologies to become the markup
language of choice for dynamically generated content, including nonstatic web pages. Many companies
are already integrating XML support into their products.
XML is actually a simplified form of Standard Generalized Markup Language (SGML), an
international documentation standard that has existed since the 1980s. However, SGML is extremely
complex, especially for the Web. Much of the credit for XML's creation can be attributed to Jon Bosak
of Sun Microsystems, Inc., who started the W3C working group responsible for scaling down SGML to
a form more suitable for the Internet.
Put succinctly, XML is a metalanguage that allows you to create and format your own document
markup. With HTML, existing markup is static: <HEAD> and <BODY>, for example, are tightly
integrated into the HTML standard and cannot be changed or extended. XML, on the other hand,
allows you to create your own markup tags and configure each to your liking, for example,
<HeadingA>, <Sidebar>, <Quote>, or <ReallyWildFont>. Each of these elements can be defined
through your own document type definitions and stylesheets and applied to one or more XML
documents. XML schemas provide another way to define elements. Thus, it is important to realize that
there are no "correct" tags for an XML document, except those you define yourself.
While many XML applications currently support Cascading Style Sheets (CSS), a more extensible
stylesheet specification exists, called the Extensible Stylesheet Language (XSL). With XSL, you ensure
that XML documents are formatted the same way no matter which application or platform they appear
on.
XSL consists of two parts: XSLT (transformations) and XSL-FO (formatting objects).
Transformations, as discussed in this book, let you use XSLT to convert XML documents
to other formats such as HTML. Formatting objects are described briefly in Section 1.6.1.
This book offers a quick overview of XML, as well as some sample applications that allow you to get
started in coding. We won't cover everything about XML. Some XML-related specifications are still in
flux as this book goes to print. However, after reading this book, we hope that the components that
make up XML will seem a little less foreign.
1.2 XML Terminology
Before we move further, we need to standardize some terminology. An XML document consists of one
or more elements. An element is marked with the following form:
<Body>
This is text formatted according to the Body element
</Body>
This element consists of two tags: an opening tag, which places the name of the element between a
less-than sign (<) and a greater-than sign (>), and a closing tag, which is identical except for the
forward slash (/) that appears before the element name. As in HTML, the text between the opening and
closing tags is considered part of the element and is processed according to the element's rules.
Elements can have attributes applied, such as the following:
<Price currency="Euro">25.43</Price>
Here, the attribute is specified inside of the opening tag and is called currency. It is given a value of
Euro, which is placed inside quotation marks. Attributes are often used to further refine or modify the
default meaning of an element.
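To see how a parser separates these pieces, here is a minimal sketch that reads the Price element above. It assumes Python and its standard-library xml.etree.ElementTree module, neither of which is covered in this book; any XML parser exposes the same three parts.

```python
import xml.etree.ElementTree as ET

# Parse the Price element from the text (the parser choice is an assumption).
price = ET.fromstring('<Price currency="Euro">25.43</Price>')

print(price.tag)                 # element name taken from the tags: Price
print(price.attrib["currency"])  # attribute value from the opening tag: Euro
print(price.text)                # text between the two tags: 25.43
```

Note that the attribute value and the element text both come back as plain strings; interpreting 25.43 as a number is left to the application.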
In addition to the standard elements, XML also supports empty elements. An empty element has no
text between the opening and closing tags. Hence, both tags can (optionally) be combined by placing a
forward slash before the closing marker. For example, these elements are identical:
<Picture src="blueball.gif"></Picture>
<Picture src="blueball.gif"/>
Empty elements are often used to add nontextual content to a document or provide additional
information to the application that parses the XML. Note that while the closing slash is not used
in single-tag HTML elements, it is mandatory for single-tag XML empty elements.
1.2.1 Unlearning Bad Habits
Whereas HTML browsers often ignore simple errors in documents, XML applications are not nearly as
forgiving. For the HTML reader, there are a few bad habits from which we should dissuade you:
XML is case-sensitive
Element names must be used exactly as they are defined. For example, <Paragraph> and
<paragraph> are not the same.
A non-empty element must have an opening and a closing tag
Each element that specifies an opening tag must have a closing tag that matches it. If it does
not, and it is not an empty element, the XML parser generates an error. In other words, you
cannot do the following:
<Paragraph>
This is a paragraph.
<Paragraph>
This is another paragraph.
Instead, you must have an opening and a closing tag for each paragraph element:
<Paragraph>This is a paragraph.</Paragraph>
<Paragraph>This is another paragraph.</Paragraph>
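The rules in this section can be checked against a real parser. The following is a minimal sketch, again assuming Python's standard-library xml.etree.ElementTree module; the is_well_formed helper and the <Doc> wrapper element are hypothetical names introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Both spellings of an empty element produce the same parsed element.
long_form = ET.fromstring('<Picture src="blueball.gif"></Picture>')
short_form = ET.fromstring('<Picture src="blueball.gif"/>')
same_element = (long_form.tag == short_form.tag
                and long_form.attrib == short_form.attrib
                and not long_form.text and not short_form.text)

# Names are case-sensitive, so </paragraph> does not close <Paragraph>.
case_mismatch_ok = is_well_formed("<Paragraph>This is a paragraph.</paragraph>")

# The unclosed paragraphs from the text, wrapped in a hypothetical <Doc> root.
unclosed_ok = is_well_formed(
    "<Doc><Paragraph>This is a paragraph."
    "<Paragraph>This is another paragraph.</Doc>"
)

# The corrected version, with a closing tag for each paragraph.
closed_ok = is_well_formed(
    "<Doc><Paragraph>This is a paragraph.</Paragraph>"
    "<Paragraph>This is another paragraph.</Paragraph></Doc>"
)

print(same_element, case_mismatch_ok, unclosed_ok, closed_ok)
# prints: True False False True
```

The parser rejects both the case mismatch and the unclosed paragraphs outright instead of guessing at the intended structure, which is the behavior this section warns HTML authors about.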