WWW7: a comprehensive introduction to XML
by james tauber presented at QUT
(Queensland University of Technology)
status of the various specifications related to XML:
- XML recommendation
- XML namespaces working draft
- XLL working drafts (XLink and XPointer)
- XSL note
limitations of HTML:
- extensibility (authors cannot define their own tags)
- structure
- validation (correct syntax not enforced)
benefits of separation of structure and presentation:
- multiple delivery formats
- information reuse
- multiple sources of information
- consistent presentation
- reduce time spent fiddling with presentation
- applications can make use of documents
how XML came about as a solution:
"some people" realized that a limited set of tags can't be the
solution. what is needed instead:
- a simple but extensible solution
- a subset of SGML
- simple to implement
- designed for the web
- powerful linking
- style sheets with scripting
- familiar to HTML authors
XML design goals:
- straightforwardly usable over the Internet
- support a wide variety of applications
- compatible with SGML
- easy to write programs which process XML documents
- the number of optional features in XML is to be kept to the absolute
minimum, ideally zero
- XML documents should be human legible and reasonably clear
- the XML design should be prepared quickly
- the design of XML is formal and concise
- XML documents must be easy to create
- terseness in XML markup is of minimal importance
the next version of Perl will have XML support built in.
the anatomy of an XML document:
- made up of entities
- parsed data (XML text) or unparsed data (not XML: text or binary,
has a notation associated with it)
entity structure:
- storage units (files or data from database)
- may contain parsed and non-parsed data
- piecing together a document with multiple parts
- reuse components (footer etc)
- all entities but document entity have names
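the reuse idea above can be sketched in python. this is a minimal illustration, assuming an internal entity called footer that is declared inside the document itself; the names footer, report and page are made up for the example. the standard library parser expands such internal entities while parsing:

```python
import xml.etree.ElementTree as ET

# The internal subset declares a named entity once; the body references it twice.
DOCUMENT = """<!DOCTYPE report [
  <!ENTITY footer "notes taken at WWW7">
]>
<report>
  <page>first page. &footer;</page>
  <page>second page. &footer;</page>
</report>"""

root = ET.fromstring(DOCUMENT)

# Each &footer; reference has been replaced by the entity's text.
pages = [page.text for page in root]

for text in pages:
    print(text)
```

changing the single declaration changes every place where the entity is referenced, which is the point of reusable components.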
types of markup
- start-tags: <NAME ATTR="value">
(single or double quotes required)
- end-tags: </NAME>
- empty elements: <NAME></NAME>
or <NAME/>
- entity references: &NAME;
(allows text or text fragments to be "included")
- character references: &#decimal-number;
or &#xhexadecimal-number;
- comments: <!-- any text but not two hyphens
-->
- CDATA: <![CDATA[this is not markup]]>
- processing instructions: <?target instructions?>,
where the target names the application the instructions are meant for, e.g. SQL
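to see how a parser reports the kinds of markup listed above, here is a small python sketch using the expat bindings from the standard library. the sample document and the event labels ("start-tag", "pi" and so on) are invented for this illustration:

```python
import xml.parsers.expat

SOURCE = """<?xml version="1.0"?>
<NOTE ATTR='value'>
  <!-- a comment -->
  <?sql SELECT 1?>
  <EMPTY/>
  <TEXT>&#65;&#x42; &lt; &amp;</TEXT>
  <RAW><![CDATA[<this is not markup>]]></RAW>
</NOTE>"""


def list_markup(source):
    """Return a list of (kind, ...) tuples, one per piece of markup or text."""
    events = []

    def record(kind):
        # Build a handler that stores its arguments under the given label.
        return lambda *args: events.append((kind, *args))

    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver adjacent text in larger chunks
    parser.StartElementHandler = record("start-tag")
    parser.EndElementHandler = record("end-tag")
    parser.CharacterDataHandler = record("text")
    parser.CommentHandler = record("comment")
    parser.ProcessingInstructionHandler = record("pi")
    parser.StartCdataSectionHandler = record("cdata-start")
    parser.EndCdataSectionHandler = record("cdata-end")
    parser.Parse(source, True)
    return events


for event in list_markup(SOURCE):
    print(event)
```

the character references and the predefined entities arrive as ordinary text, the empty element produces a start-tag immediately followed by an end-tag, and the content of the CDATA section is passed through without being read as markup.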
XML declaration:
document prolog: <?xml version="1.0"
encoding="UTF-8"?>
document type declarations:
<!DOCTYPE NAME SYSTEM "name.dtd">
("external subset")
can also be included into the document ("internal subset"),
which takes precedence over the external declarations.
XML documents have to be well-formed (syntactically
correct, e.g. no overlapping tags). they can additionally be valid (the
document has a document type definition and complies with it).
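a quick way to test well-formedness is to hand the text to a parser and see whether it complains. the python sketch below does only that; the helper name is_well_formed is made up, and the standard library parser used here does not validate against a document type definition:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b>nested correctly</b></a>"))  # properly nested
print(is_well_formed("<a><b>overlapping</a></b>"))       # tags overlap
print(is_well_formed("<a attr=value/>"))                  # attribute value not quoted
```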
document type definitions:
- markup declarations
- element declarations: <!ELEMENT NAME content-model>. the so-called
"content model" defines the allowed content of an element in terms of
(recursive) content particles. a particle is an element name, a sequence
(particles that must appear in the given order) or a choice between
particles, each optionally marked as zero or more, one or more, or zero
or one.
- attribute list declarations: <!ATTLIST NAME
declarations> (values can be required or optional,
default values can be provided)
- entity declarations: <!ENTITY NAME "value">
(there are a number of predefined entities such as &lt; etc)
- notation declarations: <!NOTATION NAME SYSTEM "specifications">
- processing instructions
- comments
- conditional sections, e.g. draft versus final version of the same
document
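content models behave much like regular expressions over the names of child elements. the python sketch below makes that explicit by translating a model into a regular expression and matching it against the children of an element. it is an illustration only, not how a validating parser is actually built: it handles element names, sequences, choices and the ?, * and + indicators, and ignores #PCDATA, EMPTY and ANY. the function names and the sample model are invented:

```python
import re
import xml.etree.ElementTree as ET

# A model is split into element names and the punctuation of the DTD syntax.
TOKEN = re.compile(r"[A-Za-z_][\w.\-]*|[(),|?*+]")


def content_model_to_regex(model):
    """Translate a DTD element-content model into a compiled regex."""
    parts = []
    for token in TOKEN.findall(model):
        if token == "(":
            parts.append("(?:")
        elif token == ",":
            continue  # a sequence is plain concatenation in a regex
        elif token in ")|?*+":
            parts.append(token)
        else:
            # Each child name is matched together with a trailing space.
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def children_match(element, model):
    """Check the direct children of an element against a content model."""
    names = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(names) is not None


MODEL = "(title, (para | list)+, footer?)"
good = ET.fromstring("<doc><title/><para/><list/><para/></doc>")
bad = ET.fromstring("<doc><para/><title/></doc>")

print(children_match(good, MODEL))
print(children_match(bad, MODEL))
```

the first document follows the model (a title, then one or more para or list elements, no footer), the second one has the title in the wrong place.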
character entities:
- character repertoire: unicode (16 bit)
- character encoding: UTF-8 (variable length: 8 bit for ASCII, 16 bit
for 2-byte unicode, more bytes if required) or UTF-16
- character classes: characters, name characters, NMTOKEN (name
token), NAME (must start with a letter or an underscore)
- white space is one or more SPACE, CR, LF or TAB characters
- end-of-line sequences (CR LF, or CR alone) are normalized to a single LF
- language identification: xml:lang="language code" (e.g.
ISO 639 language code)
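the variable length of UTF-8 and the treatment of line ends are easy to check in python. the sketch below is only a demonstration; the sample characters and the helper name encoded_lengths are chosen for the example:

```python
import xml.etree.ElementTree as ET

SAMPLES = {
    "latin letter A": "A",
    "u with umlaut": "\u00fc",
    "euro sign": "\u20ac",
    "musical G clef": "\U0001d11e",
}


def encoded_lengths(char):
    """Return the number of bytes a character needs in UTF-8 and in UTF-16."""
    return len(char.encode("utf-8")), len(char.encode("utf-16-be"))


for name, char in SAMPLES.items():
    utf8, utf16 = encoded_lengths(char)
    print(f"{name}: {utf8} byte(s) in UTF-8, {utf16} byte(s) in UTF-16")

# Decimal and hexadecimal character references name the same character.
same_char = ET.fromstring("<p>&#252;&#xFC;</p>").text

# The parser hands over CR LF and a lone CR as a single LF.
normalized = ET.fromstring("<p>one\r\ntwo\rthree</p>").text
print(repr(normalized))
```

characters beyond the first 65536 code points, such as the clef, need four bytes in both encodings.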
XML processors:
- not all implementations require valid documents
- most are implemented in Java
- run either on the client or on the server
XML namespaces:
how can two definitions with the same name be told apart? define a
namespace based on a unique URI: <?xml:namespace
URI PREFIX ?>. use qualified names to refer to such
definitions (PREFIX:NAME)
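the effect of qualified names can be tried out in python. note that the sketch uses xmlns attributes, the declaration syntax of the namespaces recommendation that was published later, and not the processing instruction of the working draft described here. the URIs and element names are placeholders:

```python
import xml.etree.ElementTree as ET

# Two different "title" elements, kept apart by their namespace prefixes.
DOCUMENT = (
    '<order xmlns:book="http://example.org/book" '
    'xmlns:person="http://example.org/person">'
    "<book:title>an introduction to XML</book:title>"
    "<person:title>Dr.</person:title>"
    "</order>"
)

root = ET.fromstring(DOCUMENT)

# The parser replaces each prefix with the URI it stands for.
tags = [child.tag for child in root]
for tag in tags:
    print(tag)
```

the prefix itself carries no meaning; only the URI it is bound to makes the two names different.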
XLL: the eXtensible Linking Language:
XLink: based on HTML, HyTime and TEI
in HTML:
- the link is expressed at one of its ends
- can only be travelled from that end
- the behaviour of a link is imposed by the browser
- a link goes in one direction only
types of links <A xml:link attributes>xxx</A>
- locators: URI (as in HTML)
- connectors: URI#XPointer (similar to HTML) or URI|XPointer
- queries: URI?XML-XPTR=XPointer
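the three forms above can be taken apart with a few lines of python. this is a rough sketch of the draft syntax as noted here, assuming that the second connector is the vertical bar used in the XLink working draft; the function name split_locator, the file name and the XPointer id(intro) are made up:

```python
import re

# URI, then one connector character, then the XPointer.
LOCATOR = re.compile(r"^(?P<uri>[^#|]*)(?P<connector>[#|])(?P<xpointer>.*)$")
QUERY_MARKER = "?XML-XPTR="


def split_locator(locator):
    """Split a locator into (uri, kind, xpointer); kind is None for a plain URI."""
    if QUERY_MARKER in locator:
        uri, xpointer = locator.split(QUERY_MARKER, 1)
        return uri, "query", xpointer
    match = LOCATOR.match(locator)
    if match:
        return match.group("uri"), match.group("connector"), match.group("xpointer")
    return locator, None, None


for example in (
    "notes.xml",
    "notes.xml#id(intro)",
    "notes.xml|id(intro)",
    "notes.xml?XML-XPTR=id(intro)",
):
    print(split_locator(example))
```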
link attributes:
- inline vs out-of-line
- multi-directional links
- links to multiple resources
the target document can replace the current document, be included in
the current document, or appear in a different window (SHOW attribute).
a link can be followed automatically or on user action (ACTUATE
attribute).
XSL: eXtensible Style sheet Language:
XSL is written in XML and is a subset of DSSSL. a stylesheet
associates formatting objects with elements in an XML document to produce
formatted output.
the XSL scripting language is based on ECMAScript (the
standardized version of JavaScript). a number of built-in functions
allow computations for things such as automatically numbered
lists.
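the idea of attaching formatting rules to element names, including automatic numbering, can be imitated in a few lines of python. this is an analogy only and not XSL syntax; the rule table, the element names and the function format_element are invented for the sketch, and text that follows a child element is ignored to keep it short:

```python
import xml.etree.ElementTree as ET

# One output template per element name; {n} is the position among siblings.
RULES = {
    "title": "== {body} ==\n",
    "item": "{n}. {body}\n",
}


def format_element(element, position=1):
    """Format an element by applying the rule registered for its name."""
    body = (element.text or "").strip() + "".join(
        format_element(child, n) for n, child in enumerate(element, start=1)
    )
    # Elements without a rule simply pass their formatted content through.
    rule = RULES.get(element.tag, "{body}")
    return rule.format(body=body, n=position)


DOCUMENT = (
    "<doc><title>agenda</title>"
    "<list><item>namespaces</item><item>linking</item><item>style sheets</item></list>"
    "</doc>"
)

output = format_element(ET.fromstring(DOCUMENT))
print(output)
```

the item numbers are computed from the position in the document, so inserting or removing an item renumbers the list without touching the source text.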
production note: these session notes were taken live
on a SHARP HC-4500A running Windows CE with Pocket Word. they were then
transferred to a notebook running Windows 95 and were slightly reformatted
using HoTMetaL V4.0. this document is supposed to be HTML V4.0 compliant.
tutorial_xml.html / 14-apr-1998 (ra) /
reto ambühler
!!! This document is stored in the
ETH Web archive and is no longer maintained !!!