The XML Standard
Overview of our XML Standards • Motivation: HTML vs XML • XML 101: syntax, elements, attributes, DTDs, … • XML 201: XML Schema, Namespaces • XSLT: Transforming and Rendering XML • XQuery: Search, Transform & Integrate
So what is XML (all about)? Executive Summary: • XML = HTML – idiosyncrasies (simplified syntax) + user-definable ("semantic") tags • Separation of data and its presentation => simple, very flexible data exchange format: semistructured data model => new applications: • Information exchange (B2B), sharing (diglib), integration ("mediation"), archival, ... • Web site management (XML+XSL stylesheets), ...
What’s Wrong with HTML?
Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina. “Object Fusion in Mediator Systems”. In VLDB 96.
HTML confuses presentation with content:
    <DT>
      <IMG SRC="greenball.gif">
      <A NAME="object-fusion"></A>
      Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina.
      <A HREF="http://www-cse.ucsd.edu/~yannis/papers/fusion.ps">"Object Fusion in Mediator Systems".</A>
      In <I>VLDB 96.</I>
    </DT>
...What’s Wrong with HTML...
No Explicit Structure, Semantics, or Object-Orientation
    <DT>
      <IMG SRC="greenball.gif">
      <A NAME="object-fusion"></A>
      Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina.
      <A HREF="http://www-cse.ucsd.edu/~yannis/papers/fusion.ps">"Object Fusion in Mediator Systems".</A>
      In <I>VLDB 96.</I>
    </DT>
Implicit only (not marked up): Author, Title, Conference
... And Some Repercussions • Lack of schema/semantics when querying the Web (HTML): • "find documents (books, papers, ...) where author = Michael Jackson" (... and learn how software engineering meets the moon walker ...) • "create a list of M. Jackson's books and (if available) their prices" => HTML is inappropriate for • data exchange • automation of information management (retrieval, manipulation, integration)
XML is Based on Markup
Markup indicates structure and semantics:
    <bibliography>
      <paper ID="object-fusion">
        <authors>
          <author>Y.Papakonstantinou</author>
          <author>S. Abiteboul</author>
          <author>H. Garcia-Molina</author>
        </authors>
        <fullPaper source="fusion"/>
        <title>Object Fusion in Mediator Systems</title>
        <booktitle>VLDB 96</booktitle>
      </paper>
    </bibliography>
Decoupled from presentation
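Because the markup names the author and title explicitly, a query such as "papers by a given author" becomes a few lines of code. The following is a minimal Python sketch using the standard library's ElementTree; the helper name titles_by_author is hypothetical and is not part of the slides.

```python
import xml.etree.ElementTree as ET

# The bibliography from the slide, unchanged.
BIBLIOGRAPHY = """\
<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>
"""

def titles_by_author(xml_text, author_name):
    """Return the titles of all papers whose author list contains author_name."""
    root = ET.fromstring(xml_text)
    titles = []
    for paper in root.findall("paper"):
        authors = [a.text.strip() for a in paper.findall("authors/author")]
        if author_name in authors:
            titles.append(paper.findtext("title"))
    return titles

print(titles_by_author(BIBLIOGRAPHY, "S. Abiteboul"))
# ['Object Fusion in Mediator Systems']
```

The same question asked of the HTML version would need guesswork about which text run is an author and which is a title.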
Elements and their Content
    <bibliography>
      <paper ID="object-fusion">
        <authors>
          <author>Y.Papakonstantinou</author>
          <author>S. Abiteboul</author>
          <author>H. Garcia-Molina</author>
        </authors>
        <fullPaper source="fusion"/>
        <title>Object Fusion in Mediator Systems</title>
        <booktitle>VLDB 96</booktitle>
      </paper>
    </bibliography>
Terms illustrated:
• element name (e.g. bibliography)
• element (start tag, content, end tag)
• element content (e.g. the author elements inside authors)
• empty element (<fullPaper source="fusion"/>)
• character content (e.g. VLDB 96)
Element Attributes
    <bibliography>
      <paper ID="object-fusion">
        <authors>
          <author>Y.Papakonstantinou</author>
          <author>S. Abiteboul</author>
          <author>H. Garcia-Molina</author>
        </authors>
        <fullPaper source="fusion"/>
        <title>Object Fusion in Mediator Systems</title>
        <booktitle>VLDB 96</booktitle>
      </paper>
    </bibliography>
Terms illustrated:
• attribute name (e.g. ID, source)
• attribute value (e.g. "object-fusion", "fusion")
XML = Labeled Ordered Trees
    <bibliography>
      <paper id="23" ...>
        <authors>
          <author>Yannis</author>
          <author>Serge</author>
          ...
        </authors>
        <title>Object Fusion</title>
        ...
      </paper>
    </bibliography>
Corresponding tree:
    bibliography
      paper
        @id 23
        authors
          author: Yannis
          author: Serge
          ...
        fullpaper
        title: Object Fusion
      paper
      ...
• semistructured data: labeled trees/graphs
• can also represent
  • relational and
  • object-oriented data
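The tree view can be made concrete with a short traversal. This Python sketch (standard library only) prints one line per node in document order; the outline helper and the compact sample document are assumptions made for illustration, with the elided parts of the slide's example left out.

```python
import xml.etree.ElementTree as ET

# Compact version of the slide's example; elided content is omitted.
DOC = (
    '<bibliography><paper id="23"><authors>'
    "<author>Yannis</author><author>Serge</author>"
    "</authors><title>Object Fusion</title></paper></bibliography>"
)

def outline(node, depth=0):
    """Yield one indented line per element: label, attributes, character content."""
    label = node.tag
    for name, value in node.attrib.items():
        label += f" @{name}={value}"
    text = (node.text or "").strip()
    if text:
        label += f": {text}"
    yield "  " * depth + label
    for child in node:  # children are visited in document order
        yield from outline(child, depth + 1)

lines = list(outline(ET.fromstring(DOC)))
print("\n".join(lines))
```

Swapping the two author elements in the input changes the output order, which is what "ordered" means here.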
In Search of the Lost Structure & Semantics How do I share structure and metadata/semantics with my community? How do I learn and use the element structure of a document? How to make all this automatable?
Adding Structure and Semantics • XML Document Type Definitions (DTDs): • define the structure of "allowed" documents (i.e., valid wrt. a DTD) • database schema => improve query formulation, execution, ... • XML Schema • defines structure and data types • XML Namespaces • identify your vocabulary • Resource Description Framework (RDF) • simple metadata model
XML DTDs as Extended CFGs
XML DTD:
    <!ELEMENT bibliography (paper*)>
    <!ELEMENT paper (authors, fullPaper?, title, booktitle)>
    <!ELEMENT authors (author+)>
Grammar:
    bibliography → paper*
    paper → authors fullPaper? title booktitle
    authors → author+
lhs = element (name); rhs = regular expression over elements + strings (PCDATA)
Document Type Definitions (DTDs)
Define and Constrain Element Names & Structure
Element Type Declarations:
    <!ELEMENT bibliography (paper*)>
    <!ELEMENT paper (authors, fullPaper?, title, booktitle)>
    <!ELEMENT authors (author+)>
    <!ELEMENT author (#PCDATA)>
    <!ELEMENT fullPaper EMPTY>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT booktitle (#PCDATA)>
Attribute List Declarations:
    <!ATTLIST fullPaper source ENTITY #REQUIRED>
    <!ATTLIST paper ID ID #IMPLIED>
Element Declarations
    <!ELEMENT bibliography (paper*)>
    <!ELEMENT paper (authors, fullPaper?, title, booktitle)>
    <!ELEMENT authors (author+)>
    <!ELEMENT author (#PCDATA)>
    <!ELEMENT fullPaper EMPTY>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT booktitle (#PCDATA)>
    <!ATTLIST fullPaper source ENTITY #REQUIRED>
    <!ATTLIST paper ID ID #IMPLIED>
• bibliography: sequence of 0 or more paper
• paper: authors, followed by optional fullPaper, followed by title, followed by booktitle
• authors: sequence of 1 or more author
• author, title, booktitle: character content (#PCDATA)
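Reading each content model as a regular expression over child element names can be sketched directly in Python. This is an illustration under stated assumptions, not a DTD validator: the CONTENT_MODELS table is a hand translation of three declarations from the slide, the conforms helper is hypothetical, and attributes, #PCDATA and EMPTY are not checked.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models: each child name is followed by one space,
# so "(author )+" means "one or more author children". (Assumed encoding.)
CONTENT_MODELS = {
    "bibliography": r"(paper )*",
    "paper": r"authors (fullPaper )?title booktitle ",
    "authors": r"(author )+",
}

def conforms(element):
    """True if the element and all its descendants match their content models."""
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, children) is None:
            return False
    return all(conforms(child) for child in element)

GOOD = (
    "<bibliography><paper><authors><author>A</author></authors>"
    "<title>T</title><booktitle>B</booktitle></paper></bibliography>"
)
# Same children, but title comes before authors.
BAD = (
    "<bibliography><paper><title>T</title><authors><author>A</author></authors>"
    "<booktitle>B</booktitle></paper></bibliography>"
)

print(conforms(ET.fromstring(GOOD)))  # True
print(conforms(ET.fromstring(BAD)))   # False
```

A validating parser does the same kind of matching for every declared element, plus attribute and entity checks.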
Attributes
    <person ID="yannis"> Yannis’ info </person>
    <bibliography>
      <paper ID="object-fusion" ROLE="publication">
        <authors>
          <author authorRef="yannis"> Y.Papakonstantinou</author>
        </authors>
        <fullPaper source="fusion"/>
        <title>Object Fusion in Mediator Systems</title>
        <related papers="semistructured-data mediators"/>
      </paper>
    </bibliography>
Attribute types shown:
• ID: object identity attribute (ID="yannis", ID="object-fusion")
• CDATA: character data (ROLE="publication")
• IDREF: intradocument reference (authorRef="yannis"); a space-separated list of references, as in papers, is of type IDREFS
• ENTITY: reference to external entity (source="fusion")
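The ID/IDREF discipline amounts to two checks: ID values are unique within the document, and every reference points at an existing ID. A rough Python sketch follows. It assumes the attribute names from the slide (ID, authorRef), wraps the slide's two fragments in a hypothetical root element so they form one document, and hard-codes which attributes are IDs instead of reading a DTD.

```python
import xml.etree.ElementTree as ET

# The <root> wrapper is an assumption; the slide shows the fragments side by side.
DOC = """\
<root>
  <person ID="yannis">Yannis' info</person>
  <bibliography>
    <paper ID="object-fusion" ROLE="publication">
      <authors>
        <author authorRef="yannis">Y.Papakonstantinou</author>
      </authors>
      <title>Object Fusion in Mediator Systems</title>
    </paper>
  </bibliography>
</root>
"""

def dangling_references(root, id_attr="ID", ref_attr="authorRef"):
    """Return reference values that do not match any ID in the document."""
    ids = [el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None]
    if len(ids) != len(set(ids)):
        raise ValueError("ID values must be unique within a document")
    known = set(ids)
    return [
        el.get(ref_attr)
        for el in root.iter()
        if el.get(ref_attr) is not None and el.get(ref_attr) not in known
    ]

print(dangling_references(ET.fromstring(DOC)))  # []
```

A validating parser performs these checks itself once the attributes are declared as ID and IDREF in the DTD.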
Uses of XML Entities • Physical partition • size, reuse, "modularity", … (both XML docs & DTDs) • Non-XML data • unparsed entities => binary data • Non-standard characters • character entities • Shorthand for phrases & markup
Types of Entities • Internal (to a doc) vs. External (use URI) • General (in XML doc) vs. Parameter (in DTD) • Parsed (XML) vs. Unparsed (non-XML)
Internal Text Entities
Internal text entity declaration:
    <!ENTITY WWW "World Wide Web">
Entity reference:
    <p>We all use the &WWW;.</p>
Logically equivalent to the replacement text actually appearing:
    <p>We all use the World Wide Web.</p>
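The equivalence can be observed with any XML parser. A minimal Python sketch with the standard library's ElementTree, assuming the entity is declared in the document's internal DTD subset (the slide does not show where the declaration lives):

```python
import xml.etree.ElementTree as ET

# The declaration sits in the internal subset of the DOCTYPE (an assumption).
DOC = """\
<!DOCTYPE p [
  <!ENTITY WWW "World Wide Web">
]>
<p>We all use the &WWW;.</p>
"""

# The parser substitutes the replacement text before the application sees it.
expanded = ET.fromstring(DOC).text
print(expanded)  # We all use the World Wide Web.
```

The application never sees the reference itself, only the replacement text.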
Unparsed (& "Binary") Entities
Declare external and unparsed entity:
    <!ENTITY fusion SYSTEM "fusion.ps" NDATA ps>
Declare attribute type to be ENTITY:
    <!ATTLIST fullPaper source ENTITY #REQUIRED>
Element with ENTITY attribute:
    <fullPaper source="fusion"/>
NOTATION declaration (helper app):
    <!NOTATION ps SYSTEM "ghostview.exe">
From Docs to Data: XML Schema • XML DTDs (part of the XML spec.) • flexible, semistructured data model (nesting, ANY, ?, *, |, ...) • but document-oriented (SGML heritage) • XML Schema (W3C working draft) • schema definition language in XML • data-oriented: data types • extends capabilities of DTD
Sample Data for Introduction to XML Schema
    <?xml version="1.0" encoding="utf-8"?>
    <book isbn="0836217462">
      <title>Being a Dog Is a Full-Time Job</title>
      <author>Charles M. Schulz</author>
      <character>
        <name>Snoopy</name>
        <friend-of>Peppermint Patty</friend-of>
        <since>1950-10-04</since>
        <qualification> extroverted beagle </qualification>
      </character>
      <character>
        <name>Peppermint Patty</name>
        <since>1966-08-22</since>
        <qualification>bold, brash and tomboyish</qualification>
      </character>
    </book>
The Simple “Russian Doll” Approach to XML Schema
    <?xml version="1.0" encoding="utf-8"?>
    <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
      <xsd:element name="book">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="title" type="xsd:string"/>
            <xsd:element name="author" type="xsd:string"/>
            <xsd:element name="character" minOccurs="0" maxOccurs="unbounded">
              <xsd:complexType>
                <xsd:sequence>
                  <xsd:element name="name" type="xsd:string"/>
                  <xsd:element name="friend-of" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
                  <xsd:element name="since" type="xsd:date"/>
                  <xsd:element name="qualification" type="xsd:string"/>
                </xsd:sequence>
                …
          <xsd:attribute name="isbn" type="xsd:string"/>
Points illustrated:
• Optional XML declaration
• Namespace definition (xmlns:xsd)
• Complex type content for book
• Sequence compositor (xsd:sequence)
• Simple type content for title and author
• character may appear any number of times (minOccurs="0" maxOccurs="unbounded")
• Basic types of XML Schema (xsd:string, xsd:date)
The Catalog Approach to XML Schema: Stand-Alone Declarations & References
Simple type elements:
    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="author" type="xsd:string"/>
    <xsd:element name="name" type="xsd:string"/>
    …
Attributes:
    <xsd:attribute name="isbn" type="xsd:string"/>
Complex type element character, built from references:
    <xsd:element name="character">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref="name"/>
          <xsd:element ref="friend-of" minOccurs="0" maxOccurs="unbounded"/>
          <xsd:element ref="since"/>
          <xsd:element ref="qualification"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
Catalog Approach Cont’d
    <xsd:element name="book">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref="title"/>
          <xsd:element ref="author"/>
          <xsd:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
        <xsd:attribute ref="isbn"/>
      </xsd:complexType>
    </xsd:element>
Named Types
• Write stand-alone named complex type or simple type declarations
• Primitive form of inheritance (called derivation) allows
  • Restriction
  • Extension
nameType is derived from xsd:string; the xsd:maxLength facet restricts the string to a maximum of 32 characters:
    <xsd:simpleType name="nameType">
      <xsd:restriction base="xsd:string">
        <xsd:maxLength value="32"/>
      </xsd:restriction>
    </xsd:simpleType>
nameType is used in the declaration of characterType:
    <xsd:complexType name="characterType">
      <xsd:sequence>
        <xsd:element name="name" type="nameType"/>
        <xsd:element name="friend-of" type="nameType" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element name="since" type="sinceType"/>
        <xsd:element name="qualification" type="descType"/>
      </xsd:sequence>
    </xsd:complexType>
Groups: Named containers of sets of Elements or Attributes
    <xsd:group name="mainBookElements">
      <xsd:sequence>
        <xsd:element name="title" type="nameType"/>
        <xsd:element name="author" type="nameType"/>
      </xsd:sequence>
    </xsd:group>
    <xsd:complexType name="bookType">
      <xsd:sequence>
        <xsd:group ref="mainBookElements"/>
        <xsd:element name="character" type="characterType" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
Compositors: Sequence, Choice, All
So far we have seen sequences.
The group nameTypes consists of one of
• the element “name”
• the sequence containing firstName, middleName, lastName
    <xsd:group name="nameTypes">
      <xsd:choice>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:sequence>
          <xsd:element name="firstName" type="xsd:string"/>
          <xsd:element name="middleName" type="xsd:string" minOccurs="0"/>
          <xsd:element name="lastName" type="xsd:string"/>
        </xsd:sequence>
      </xsd:choice>
    </xsd:group>
Compositors (cont’d)
The characterType consists of name, a list of friend-of, since, and qualification particles in no particular order. (Compare with the sequence compositor.) Note that XML Schema 1.0 limits maxOccurs to 1 for elements inside xsd:all; the unbounded friend-of shown here is only accepted by XML Schema 1.1 processors.
    <xsd:complexType name="characterType">
      <xsd:all>
        <xsd:element name="name" type="nameType"/>
        <xsd:element name="friend-of" type="nameType" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element name="since" type="sinceType"/>
        <xsd:element name="qualification" type="descType"/>
      </xsd:all>
    </xsd:complexType>
Derivation of Simple Types: Unions and Lists
So far we have seen restrictions and facets.
    <xsd:simpleType name="isbnType">
      <xsd:union>
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:pattern value="[0-9]{10}"/>
          </xsd:restriction>
        </xsd:simpleType>
        <xsd:simpleType>
          <xsd:restriction base="xsd:NMTOKEN">
            <xsd:enumeration value="TBD"/>
            <xsd:enumeration value="NA"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:union>
    </xsd:simpleType>
The simple type isbnType will be either
• a 10-digit string (notice the pattern)
• the token "TBD" or the token "NA"
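What the union accepts can be spelled out as a hand-written check. This Python sketch is only a stand-in for a schema processor: the function name is_isbn_type is hypothetical, and whitespace normalization of the lexical value is ignored.

```python
import re

def is_isbn_type(value):
    """Mimic the isbnType union: a value is valid if either member type accepts it."""
    # Member 1: xsd:string restricted by the pattern [0-9]{10}.
    # Schema patterns must match the whole value, hence fullmatch.
    ten_digits = re.fullmatch(r"[0-9]{10}", value) is not None
    # Member 2: xsd:NMTOKEN restricted to the enumeration TBD | NA.
    placeholder = value in {"TBD", "NA"}
    return ten_digits or placeholder

for candidate in ["0836217462", "TBD", "083621746X", "tbd"]:
    print(candidate, is_isbn_type(candidate))
```

The first two candidates are accepted and the last two rejected; enumeration values are case-sensitive.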
Constraints: Uniqueness
By inserting xsd:unique in the book element declaration we enforce that the character names in each book are unique.
    <xsd:element name="book">
      …
      <xsd:unique name="charNameMustBeUnique">
        <xsd:selector xpath="character"/>
        <xsd:field xpath="name"/>
      </xsd:unique>
      …
    </xsd:element>
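The selector/field pair reads as "for every character child of book, the value of its name child must not repeat". A small Python sketch of that check, with a hypothetical helper name and shortened test documents; a schema processor would enforce this as part of validation.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_character_names(book):
    """Return the name values that occur for more than one <character> of this book."""
    # selector xpath="character" -> the nodes the constraint ranges over
    # field xpath="name"         -> the value that must be unique among them
    names = [c.findtext("name") for c in book.findall("character")]
    counts = Counter(n for n in names if n is not None)
    return sorted(name for name, count in counts.items() if count > 1)

OK = ET.fromstring(
    "<book><character><name>Snoopy</name></character>"
    "<character><name>Peppermint Patty</name></character></book>"
)
CLASH = ET.fromstring(
    "<book><character><name>Snoopy</name></character>"
    "<character><name>Snoopy</name></character></book>"
)

print(duplicate_character_names(OK))     # []
print(duplicate_character_names(CLASH))  # ['Snoopy']
```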
Namespaces
    <xsd:schema
        xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
        xmlns="http://example.org/ns/books/"
        targetNamespace="http://example.org/ns/books/"
        elementFormDefault="qualified"
        attributeFormDefault="unqualified">
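On the instance side, elementFormDefault="qualified" means the elements live in the target namespace, while attributeFormDefault="unqualified" leaves attributes such as isbn without one. A Python sketch of how that looks to an application; the instance document and the bk prefix are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document in the schema's target namespace.
BOOK = """\
<book xmlns="http://example.org/ns/books/" isbn="0836217462">
  <title>Being a Dog Is a Full-Time Job</title>
</book>
"""

# The prefix is local to this code; only the namespace URI has to match.
NS = {"bk": "http://example.org/ns/books/"}

root = ET.fromstring(BOOK)
print(root.tag)                                  # {http://example.org/ns/books/}book
print(root.findtext("bk:title", namespaces=NS))  # Being a Dog Is a Full-Time Job
print(root.findtext("title"))                    # None: the unqualified name does not match
print(root.get("isbn"))                          # 0836217462: the attribute is unqualified
```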
Including Unknown Elements
    <xsd:complexType name="descType" mixed="true">
      <xsd:sequence>
        <xsd:any namespace="http://www.w3.org/1999/xhtml" minOccurs="0" maxOccurs="unbounded" processContents="skip"/>
      </xsd:sequence>
    </xsd:complexType>
Presenting XML: XSLT • Why Stylesheets? • separation of content (XML) from presentation (XSL) • Why not just CSS for XML? • XSL is far more powerful: • selecting elements • transforming the XML tree • content based display (result may depend on data)
XSLT Overview • XSLT stylesheets are denoted in XML syntax • XSL components: 1. a language for transforming XML documents (XSLT: integral part of the XSL specification) 2. an XML formatting vocabulary (Formatting Objects: >90% of the formatting properties inherited from CSS)
XSLT Processing Model: XML source tree + XSL stylesheet => Transformation => result tree (XML, HTML, …)
XSLT Processing Model • XSL stylesheet: collection of template rules • template rule: (pattern → template) • main steps: • match pattern against source tree • instantiate template (replace current node “.” by the template in the result tree) • select further nodes for processing • control can be • program-driven ("pull": <xsl:for-each> ...) • data/event-driven ("push": <xsl:apply-templates> ...)
Template Rule: Example
    <xsl:template match="product">
      <table>
        <xsl:apply-templates select="sales/domestic"/>
      </table>
      <table>
        <xsl:apply-templates select="sales/foreign"/>
      </table>
    </xsl:template>
pattern: the match attribute; template: the content of xsl:template
(i) match pattern: process <product> elements
(ii) instantiate template: replace each product with two HTML tables
(iii) select the <product> grandchildren (“sales/domestic”, “sales/foreign”) for further processing
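The push-style control flow behind this rule can be mimicked in a few lines of Python. This is a toy model of the processing loop, not an XSLT processor: patterns are reduced to plain element names, the default rule only approximates XSLT's built-in rules (text after a child element is ignored), and the sales_row rule and the sample input are hypothetical.

```python
import xml.etree.ElementTree as ET

TEMPLATES = {}  # pattern (here: just an element name) -> template function

def template(match):
    """Register a function as the template rule for elements named `match`."""
    def register(fn):
        TEMPLATES[match] = fn
        return fn
    return register

def default_rule(node):
    # Rough stand-in for the built-in rules: copy text, then process the children.
    return (node.text or "").strip() + apply_templates(list(node))

def apply_templates(nodes):
    """For each selected node, find the matching rule and instantiate it."""
    return "".join(TEMPLATES.get(node.tag, default_rule)(node) for node in nodes)

@template("product")
def product(node):
    # Mirrors the slide: one table per group of selected grandchildren.
    domestic = apply_templates(node.findall("sales/domestic"))
    foreign = apply_templates(node.findall("sales/foreign"))
    return f"<table>{domestic}</table><table>{foreign}</table>"

@template("domestic")
@template("foreign")
def sales_row(node):  # hypothetical rule, not shown on the slide
    return f"<tr><td>{node.text}</td></tr>"

SOURCE = (
    "<catalog><product><sales><domestic>10</domestic>"
    "<foreign>20</foreign></sales></product></catalog>"
)
result = apply_templates([ET.fromstring(SOURCE)])
print(result)
# <table><tr><td>10</td></tr></table><table><tr><td>20</td></tr></table>
```

The catalog element has no rule of its own, so the default rule passes control down to product, which is the data-driven ("push") behaviour.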
Match/Select Patterns • match patterns are a subset of select patterns, which are the expressions defined in http://w3.org/TR/xpath • Examples: • /mybook/chapter[2]/section/* • chapter|appendix • chapter//para • div[@class="appendix" and position() mod 2 = 1]//para • ../@lang
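Some of these paths can be tried out with Python's standard library, whose ElementTree module implements a limited subset of XPath. The sketch below uses a made-up document and evaluates paths relative to the mybook root element; the union operator and the position() mod predicate are outside that subset, so the union is emulated by concatenating two results.

```python
import xml.etree.ElementTree as ET

# Hypothetical document shaped to fit the example paths.
DOC = """\
<mybook>
  <chapter><section><para>c1s1</para></section></chapter>
  <chapter>
    <section><para>c2s1p1</para><note>c2s1n1</note></section>
    <para>c2p</para>
  </chapter>
  <appendix><para>a1</para></appendix>
</mybook>
"""

root = ET.fromstring(DOC)  # root is the <mybook> element

def texts(elements):
    return [el.text for el in elements]

# /mybook/chapter[2]/section/*  (written relative to the root element)
second_chapter = texts(root.findall("chapter[2]/section/*"))

# chapter//para: para descendants of chapter children, at any depth
chapter_paras = texts(root.findall("chapter//para"))

# chapter|appendix: no union operator in ElementTree, so concatenate two results
union_tags = [el.tag for el in root.findall("chapter") + root.findall("appendix")]

print(second_chapter)  # ['c2s1p1', 'c2s1n1']
print(chapter_paras)   # ['c1s1', 'c2s1p1', 'c2p']
print(union_tags)      # ['chapter', 'chapter', 'appendix']
```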
Creating the Result Tree...
• Literal result elements: non-XSL elements (e.g., HTML) appear “literally” in the result tree
• Constructing elements (similar for xsl:attribute, xsl:text, xsl:comment, …):
    <xsl:element name="…">
      attribute & children definition
    </xsl:element>
• Generating text:
    <xsl:template match="person">
      <p>
        <xsl:value-of select="@first-name"/>
        <xsl:text> </xsl:text>
        <xsl:value-of select="@surname"/>
      </p>
    </xsl:template>
Example of Turning XML into HTML
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="FitnessCenter.xsl"?>
    <FitnessCenter>
      <Member level="platinum">
        <Name>Jeff</Name>
        <Phone type="home">555-1234</Phone>
        <Phone type="work">555-4321</Phone>
        <FavoriteColor>lightgrey</FavoriteColor>
      </Member>
    </FitnessCenter>
HTML Document in an XSL Template
    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output method="html"/>
      <xsl:template match="/">
        <HTML>
          <HEAD>
            <TITLE>Welcome</TITLE>
          </HEAD>
          <BODY>
            Welcome!
          </BODY>
        </HTML>
      </xsl:template>
    </xsl:stylesheet>
Extracting the Member Name
    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output method="html"/>
      <xsl:template match="/">
        <HTML>
          <HEAD>
            <TITLE>Welcome</TITLE>
          </HEAD>
          <BODY>
            Welcome <xsl:value-of select="/FitnessCenter/Member/Name"/>!
          </BODY>
        </HTML>
      </xsl:template>
    </xsl:stylesheet>
Extracting a Value from an XML Document, Navigating the XML Document
• Extracting values:
  • use the <xsl:value-of select="…"/> XSL element
• Navigating:
  • The slash ("/") indicates parent/child relationship
  • A slash at the beginning of the path indicates that it is an absolute path, starting from the top of the XML document
    /FitnessCenter/Member/Name
"Start from the top of the XML document, go to the FitnessCenter element, from there go to the Member element, and from there go to the Name element."
Document tree for the FitnessCenter example:
    Document /
      PI <?xml version="1.0"?>
      Element FitnessCenter
        Element Member
          Element Name
            Text Jeff
          Element Phone
            Text 555-1234
          Element Phone
            Text 555-4321
          Element FavoriteColor
            Text lightgrey
Extract the FavoriteColor and use it as the bgcolor
    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output method="html"/>
      <xsl:template match="/">
        <HTML>
          <HEAD>
            <TITLE>Welcome</TITLE>
          </HEAD>
          <BODY bgcolor="{/FitnessCenter/Member/FavoriteColor}">
            Welcome <xsl:value-of select="/FitnessCenter/Member/Name"/>!
          </BODY>
        </HTML>
      </xsl:template>
    </xsl:stylesheet>
(see html-example03)
Note
Attribute values cannot contain "<". Consequently, the following is NOT well-formed:
    <BODY bgcolor="<xsl:value-of select='/FitnessCenter/Member/FavoriteColor'/>">
To extract the value of an XML element and use it as an attribute value you must use curly braces:
    <BODY bgcolor="{/FitnessCenter/Member/FavoriteColor}">
Evaluate the expression within the curly braces. Assign the value to the attribute.
Extract the Home Phone Number
    <?xml version="1.0"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output method="html"/>
      <xsl:template match="/">
        <HTML>
          <HEAD>
            <TITLE>Welcome</TITLE>
          </HEAD>
          <BODY bgcolor="{/FitnessCenter/Member/FavoriteColor}">
            Welcome <xsl:value-of select="/FitnessCenter/Member/Name"/>!
            <BR/>
            Your home phone number is:
            <xsl:value-of select="/FitnessCenter/Member/Phone[@type='home']"/>
          </BODY>
        </HTML>
      </xsl:template>
    </xsl:stylesheet>
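For comparison, the three lookups this stylesheet performs (member name, favorite color, home phone selected by an attribute predicate) can be written as a plain Python function. This is a hand-coded equivalent for illustration, not what an XSLT processor does internally; the welcome_page name and the exact output layout are assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

SOURCE = """\
<FitnessCenter>
  <Member level="platinum">
    <Name>Jeff</Name>
    <Phone type="home">555-1234</Phone>
    <Phone type="work">555-4321</Phone>
    <FavoriteColor>lightgrey</FavoriteColor>
  </Member>
</FitnessCenter>
"""

def welcome_page(xml_text):
    """Build the welcome page from the same three values the stylesheet selects."""
    root = ET.fromstring(xml_text)  # root is <FitnessCenter>, so paths are relative to it
    name = root.findtext("Member/Name")
    color = root.findtext("Member/FavoriteColor")
    home = root.findtext("Member/Phone[@type='home']")  # attribute predicate
    return (
        "<HTML><HEAD><TITLE>Welcome</TITLE></HEAD>"
        f'<BODY bgcolor="{escape(color)}">'
        f"Welcome {escape(name)}!<BR>"
        f"Your home phone number is: {escape(home)}"
        "</BODY></HTML>"
    )

page = welcome_page(SOURCE)
print(page)
```

The stylesheet states the same selections declaratively, which is why it can be swapped without touching the XML source.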