
The XML Standard


ppiper

Presentation Transcript


  1. The XML Standard

  2. Overview of our XML Standards
     • Motivation: HTML vs XML
     • XML 101: syntax, elements, attributes, DTDs, …
     • XML 201: XML Schema, Namespaces
     • XSLT: Transforming and Rendering XML
     • XQuery: Search, Transform & Integrate

  3. So what is XML (all about)? Executive Summary:
     • XML = HTML - idiosyncrasies (simplified syntax) + user-definable ("semantic") tags
     • Separation of data and its presentation
       => simple, very flexible data exchange format: semistructured data model
       => new applications:
          • Information exchange (B2B), sharing (diglib), integration ("mediation"), archival, ...
          • Web site management (XML + XSL stylesheets), ...

  4. What's Wrong with HTML?
     Y. Papakonstantinou, S. Abiteboul, H. Garcia-Molina. "Object Fusion in Mediator Systems". In VLDB 96.
     HTML confuses presentation with content:
       <DT> <IMG SRC="greenball.gif">&nbsp; <A NAME="object-fusion"></A>
       Y.Papakonstantinou, S. Abiteboul, H. Garcia-Molina.
       <A HREF="http://www-cse.ucsd.edu/~yannis/papers/fusion.ps">"Object Fusion in Mediator Systems".</A>
       In <I>VLDB 96.</I> </DT>

  5. ...What's Wrong with HTML...
     No Explicit Structure, Semantics, or Object-Orientation:
       <DT> <IMG SRC="greenball.gif">&nbsp; <A NAME="object-fusion"></A>
       Y.Papakonstantinou, S. Abiteboul, H. Garcia-Molina.
       <A HREF="http://www-cse.ucsd.edu/~yannis/papers/fusion.ps">"Object Fusion in Mediator Systems".</A>
       In <I>VLDB 96.</I> </DT>
     Labels: Author, Title, Conference

  6. ... And Some Repercussions
     • Lack of schema/semantics when querying the Web (HTML):
       • "find documents (books, papers, ...) where author = Michael Jackson" (... and learn how software engineering meets the moon walker ...)
       • "create a list of M. Jackson's books and (if available) their prices"
     => HTML is inappropriate for
       • data exchange
       • automation of information management (retrieval, manipulation, integration)

  7. XML is Based on Markup
     Markup indicates structure and semantics, decoupled from presentation:
       <bibliography>
         <paper ID="object-fusion">
           <authors>
             <author>Y.Papakonstantinou</author>
             <author>S. Abiteboul</author>
             <author>H. Garcia-Molina</author>
           </authors>
           <fullPaper source="fusion"/>
           <title>Object Fusion in Mediator Systems</title>
           <booktitle>VLDB 96</booktitle>
         </paper>
       </bibliography>

  8. Elements and their Content
       <bibliography>
         <paper ID="object-fusion">
           <authors>
             <author>Y.Papakonstantinou</author>
             <author>S. Abiteboul</author>
             <author>H. Garcia-Molina</author>
           </authors>
           <fullPaper source="fusion"/>
           <title>Object Fusion in Mediator Systems</title>
           <booktitle>VLDB 96</booktitle>
         </paper>
       </bibliography>
     Labels: element, element name, element content, empty element (fullPaper), character content

  9. Element Attributes
       <bibliography>
         <paper ID="object-fusion">
           <authors>
             <author>Y.Papakonstantinou</author>
             <author>S. Abiteboul</author>
             <author>H. Garcia-Molina</author>
           </authors>
           <fullPaper source="fusion"/>
           <title>Object Fusion in Mediator Systems</title>
           <booktitle>VLDB 96</booktitle>
         </paper>
       </bibliography>
     Labels: attribute name (e.g. ID), attribute value (e.g. "object-fusion")
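As a rough illustration of the terms on slides 7 to 9 (element name, character content, empty element, attribute), the sketch below reads the bibliography fragment with Python's standard xml.etree.ElementTree module. Python and the variable names are assumptions made for this example; the slides themselves do not prescribe any parser.

```python
import xml.etree.ElementTree as ET

# The bibliography fragment from the slides.
BIBLIOGRAPHY = """
<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y.Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>
    </authors>
    <fullPaper source="fusion"/>
    <title>Object Fusion in Mediator Systems</title>
    <booktitle>VLDB 96</booktitle>
  </paper>
</bibliography>
"""

root = ET.fromstring(BIBLIOGRAPHY)
paper = root.find("paper")

element_name = paper.tag                       # element name
attribute_value = paper.get("ID")              # value of the ID attribute
authors = [a.text for a in paper.find("authors")]  # character content of each author
full_paper = paper.find("fullPaper")
is_empty = full_paper.text is None and len(full_paper) == 0  # empty element
title = paper.findtext("title")
```

Because the markup names the parts explicitly, a query such as "all authors of this paper" becomes a lookup by element name rather than a guess based on fonts and punctuation.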

  10. XML = Labeled Ordered Trees
      • semistructured data, labeled trees/graphs
      • can also represent
        • relational and
        • object-oriented data
      Document:
        <bibliography>
          <paper id="23" ...>
            <authors>
              <author>Yannis</author>
              <author>Serge</author>
              ...
            </authors>
            <title>Object Fusion</title>
            ...
          </paper>
        </bibliography>
      Tree:
        bibliography
          paper
            @id: 23
            authors
              author: Yannis
              author: Serge
              ...
            fullPaper
            title: Object Fusion
          paper
          ...
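To make the "labeled ordered tree" view concrete, here is a minimal sketch that converts a parsed document into nested tuples of (label, attributes, children), keeping children in document order. The helper name to_tree and the tuple layout are assumptions chosen for this illustration.

```python
import xml.etree.ElementTree as ET

def to_tree(elem):
    """Return (label, attributes, children); children keep document order."""
    children = [to_tree(child) for child in elem]
    text = (elem.text or "").strip()
    if text:
        # Character content becomes a leaf in front of any child elements.
        children.insert(0, text)
    return (elem.tag, dict(elem.attrib), children)

doc = ET.fromstring(
    '<bibliography><paper id="23"><authors>'
    "<author>Yannis</author><author>Serge</author>"
    "</authors><title>Object Fusion</title></paper></bibliography>"
)
tree = to_tree(doc)
```

The order of the two author nodes is part of the data model: swapping them yields a different tree, which is one way XML differs from an unordered relational tuple.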

  11. In Search of the Lost Structure & Semantics
      • How do I share structure and metadata/semantics with my community?
      • How do I learn and use the element structure of a document?
      • How to make all this automatable?

  12. Adding Structure and Semantics
      • XML Document Type Definitions (DTDs):
        • define the structure of "allowed" documents (i.e., valid wrt. a DTD)
        • play a role similar to a database schema => improve query formulation, execution, ...
      • XML Schema
        • defines structure and data types
      • XML Namespaces
        • identify your vocabulary
      • Resource Description Framework (RDF)
        • simple metadata model

  13. XML DTDs as Extended CFGs
      Grammar:
        bibliography -> paper*
        paper -> authors fullPaper? title booktitle
        authors -> author+
      XML DTD:
        <!ELEMENT bibliography (paper*)>
        <!ELEMENT paper (authors, fullPaper?, title, booktitle)>
        <!ELEMENT authors (author+)>
      lhs = element (name); rhs = regular expression over elements + strings (PCDATA)
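The reading "rhs = regular expression over elements" can be tried out directly. The sketch below hand-translates the three content models into Python regular expressions over the sequence of child element names and checks a document against them. It is a toy check under stated assumptions (the names CONTENT_MODELS and conforms are made up here, and attributes, PCDATA and entities are ignored), not a DTD validator.

```python
import re
import xml.etree.ElementTree as ET

# lhs = element name, rhs = regular expression over child element names.
# Every child name is followed by one space so that names cannot run together.
CONTENT_MODELS = {
    "bibliography": r"(paper )*",
    "paper": r"authors (fullPaper )?title booktitle ",
    "authors": r"(author )+",
}

def conforms(elem):
    """Check elem and all descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is not None:
        child_names = "".join(child.tag + " " for child in elem)
        if re.fullmatch(model, child_names) is None:
            return False
    return all(conforms(child) for child in elem)

good = ET.fromstring(
    "<bibliography><paper><authors><author>A</author></authors>"
    "<title>T</title><booktitle>B</booktitle></paper></bibliography>"
)
# Wrong child order in paper, and authors has no author child.
bad = ET.fromstring(
    "<bibliography><paper><title>T</title><authors/>"
    "<booktitle>B</booktitle></paper></bibliography>"
)
```

Here conforms(good) holds because fullPaper is optional, while conforms(bad) fails on the paper element, whose children do not match the sequence the grammar requires.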

  14. Document Type Definitions (DTDs)
      Define and Constrain Element Names & Structure
      Element type declarations:
        <!ELEMENT bibliography (paper*)>
        <!ELEMENT paper (authors, fullPaper?, title, booktitle)>
        <!ELEMENT authors (author+)>
        <!ELEMENT author (#PCDATA)>
        <!ELEMENT fullPaper EMPTY>
        <!ELEMENT title (#PCDATA)>
        <!ELEMENT booktitle (#PCDATA)>
      Attribute list declarations:
        <!ATTLIST fullPaper source ENTITY #REQUIRED>
        <!ATTLIST paper ID ID #REQUIRED>

  15. Element Declarations
      Sequence of 0 or more paper:
        <!ELEMENT bibliography (paper*)>
      authors followed by optional fullPaper, followed by title, followed by booktitle:
        <!ELEMENT paper (authors, fullPaper?, title, booktitle)>
      Sequence of 1 or more author:
        <!ELEMENT authors (author+)>
      Character content:
        <!ELEMENT author (#PCDATA)>
        <!ELEMENT title (#PCDATA)>
        <!ELEMENT booktitle (#PCDATA)>
      Remaining declarations:
        <!ELEMENT fullPaper EMPTY>
        <!ATTLIST fullPaper source ENTITY #REQUIRED>
        <!ATTLIST paper ID ID #REQUIRED>

  16. Element Content Declarations

  17. Attributes
        <person ID="yannis"> Yannis' info </person>
        <bibliography>
          <paper ID="object-fusion" ROLE="publication">
            <authors>
              <author authorRef="yannis">Y.Papakonstantinou</author>
            </authors>
            <fullPaper source="fusion"/>
            <title>Object Fusion in Mediator Systems</title>
            <related papers="semistructured-data mediators"/>
          </paper>
        </bibliography>
      Labels:
        • ID="object-fusion": object identity attribute
        • ROLE="publication": CDATA (character data)
        • authorRef="yannis": IDREF, intradocument reference
        • source="fusion": reference to external ENTITY

  18. Attribute Types

  19. Uses of XML Entities
      • Physical partition
        • size, reuse, "modularity", … (both XML docs & DTDs)
      • Non-XML data
        • unparsed entities (binary data)
      • Non-standard characters
        • character entities
      • Shorthand for phrases & markup

  20. Types of Entities
      • Internal (to a doc) vs. External (use URI)
      • General (in XML doc) vs. Parameter (in DTD)
      • Parsed (XML) vs. Unparsed (non-XML)

  21. Internal Text Entities
      Internal text entity declaration:
        <!ENTITY WWW "World Wide Web">
      Entity reference:
        <p>We all use the &WWW;.</p>
      Logically equivalent to actually appearing:
        <p>We all use the World Wide Web.</p>
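The "logically equivalent" claim can be observed with a parser. In this small sketch, Python's standard ElementTree (backed by expat) expands the internal entity declared in the document's internal DTD subset. The DOCTYPE wrapper and the variable names are assumptions added so the fragment is a complete document.

```python
import xml.etree.ElementTree as ET

# The declaration and reference from the slide, wrapped in a DOCTYPE
# so the entity is declared in the internal subset.
DOC = """<!DOCTYPE p [
  <!ENTITY WWW "World Wide Web">
]>
<p>We all use the &WWW;.</p>"""

p = ET.fromstring(DOC)
expanded_text = p.text  # the reference has been replaced by the entity's text
```

Since parsers expand entities automatically, documents from untrusted sources are usually parsed with entity expansion limited or disabled.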

  22. Unparsed (& "Binary") Entities
      Declare external and unparsed entity:
        <!ENTITY fusion SYSTEM "fusion.ps" NDATA ps>
      Declare attribute type to be ENTITY:
        <!ATTLIST fullPaper source ENTITY #REQUIRED>
      Element with ENTITY attribute:
        <fullPaper source="fusion"/>
      NOTATION declaration (helper app):
        <!NOTATION ps SYSTEM "ghostview.exe">

  23. From Docs to Data: XML Schema
      • XML DTDs (part of the XML spec.)
        • flexible, semistructured data model (nesting, ANY, ?, *, |, ...)
        • but document-oriented (SGML heritage)
      • XML Schema (W3C working draft)
        • schema definition language in XML
        • data-oriented: data types
        • extends capabilities of DTD

  24. Sample Data for Introduction to XML Schema
        <?xml version="1.0" encoding="utf-8"?>
        <book isbn="0836217462">
          <title>Being a Dog Is a Full-Time Job</title>
          <author>Charles M. Schulz</author>
          <character>
            <name>Snoopy</name>
            <friend-of>Peppermint Patty</friend-of>
            <since>1950-10-04</since>
            <qualification>extroverted beagle</qualification>
          </character>
          <character>
            <name>Peppermint Patty</name>
            <since>1966-08-22</since>
            <qualification>bold, brash and tomboyish</qualification>
          </character>
        </book>

  25. The Simple "Russian Doll" Approach to XML Schema
        <?xml version="1.0" encoding="utf-8"?>
        <xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
          <xsd:element name="book">
            <xsd:complexType>
              <xsd:sequence>
                <xsd:element name="title" type="xsd:string"/>
                <xsd:element name="author" type="xsd:string"/>
                <xsd:element name="character" minOccurs="0" maxOccurs="unbounded">
                  <xsd:complexType>
                    <xsd:sequence>
                      <xsd:element name="name" type="xsd:string"/>
                      <xsd:element name="friend-of" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
                      <xsd:element name="since" type="xsd:date"/>
                      <xsd:element name="qualification" type="xsd:string"/>
                    </xsd:sequence>
              …
              <xsd:attribute name="isbn" type="xsd:string"/>
      Labels:
        • Namespace definition (xmlns:xsd)
        • Complex type content for book
        • Sequence compositor (xsd:sequence)
        • Simple type content for title and author
        • character may appear any number of times
        • Optional (minOccurs="0")
        • Basic type of XML Schema

  26. The Catalog Approach to XML Schema: Stand-Alone Declarations & References
      Simple type elements:
        <xsd:element name="title" type="xsd:string"/>
        <xsd:element name="author" type="xsd:string"/>
        <xsd:element name="name" type="xsd:string"/>
        …
      Attributes:
        <xsd:attribute name="isbn" type="xsd:string"/>
      Complex type element character, built from references:
        <xsd:element name="character">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element ref="name"/>
              <xsd:element ref="friend-of" minOccurs="0" maxOccurs="unbounded"/>
              <xsd:element ref="since"/>
              <xsd:element ref="qualification"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>

  27. Catalog Approach Cont'd
        <xsd:element name="book">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element ref="title"/>
              <xsd:element ref="author"/>
              <xsd:element ref="character" minOccurs="0" maxOccurs="unbounded"/>
            </xsd:sequence>
            <xsd:attribute ref="isbn"/>
          </xsd:complexType>
        </xsd:element>

  28. Named Types
      • Write stand-alone named complex type or simple type declarations
      • Primitive form of inheritance (called derivation) allows
        • Restriction
        • Extension
      nameType is derived from xsd:string; the xsd:maxLength facet restricts the string to a maximum of 32 characters:
        <xsd:simpleType name="nameType">
          <xsd:restriction base="xsd:string">
            <xsd:maxLength value="32"/>
          </xsd:restriction>
        </xsd:simpleType>
      nameType is used in the declaration of characterType:
        <xsd:complexType name="characterType">
          <xsd:sequence>
            <xsd:element name="name" type="nameType"/>
            <xsd:element name="friend-of" type="nameType" minOccurs="0" maxOccurs="unbounded"/>
            <xsd:element name="since" type="sinceType"/>
            <xsd:element name="qualification" type="descType"/>
          </xsd:sequence>
        </xsd:complexType>

  29. Groups: Named Containers of Sets of Elements or Attributes
        <xsd:group name="mainBookElements">
          <xsd:sequence>
            <xsd:element name="title" type="nameType"/>
            <xsd:element name="author" type="nameType"/>
          </xsd:sequence>
        </xsd:group>
        <xsd:complexType name="bookType">
          <xsd:sequence>
            <xsd:group ref="mainBookElements"/>
            <xsd:element name="character" type="characterType" minOccurs="0" maxOccurs="unbounded"/>
          </xsd:sequence>
        </xsd:complexType>

  30. Compositors: Sequence, Choice, All
      So far we have seen sequences.
      The group nameTypes consists of one of
        • the element "name"
        • the sequence containing firstName, middleName, lastName
        <xsd:group name="nameTypes">
          <xsd:choice>
            <xsd:element name="name" type="xsd:string"/>
            <xsd:sequence>
              <xsd:element name="firstName" type="xsd:string"/>
              <xsd:element name="middleName" type="xsd:string" minOccurs="0"/>
              <xsd:element name="lastName" type="xsd:string"/>
            </xsd:sequence>
          </xsd:choice>
        </xsd:group>

  31. Compositors (cont'd)
      The characterType consists of name, a list of friend-of, since, and qualification particles in no particular order. (Compare with the sequence compositor.)
        <xsd:complexType name="characterType">
          <xsd:all>
            <xsd:element name="name" type="nameType"/>
            <xsd:element name="friend-of" type="nameType" minOccurs="0" maxOccurs="unbounded"/>
            <xsd:element name="since" type="sinceType"/>
            <xsd:element name="qualification" type="descType"/>
          </xsd:all>
        </xsd:complexType>
      Note: XML Schema 1.0 as finally published allows each element inside xsd:all to occur at most once, so maxOccurs="unbounded" on friend-of is rejected by 1.0 processors; XML Schema 1.1 relaxes this.

  32. Derivation of Simple Types: Unions and Lists
      So far we have seen restrictions and facets.
        <xsd:simpleType name="isbnType">
          <xsd:union>
            <xsd:simpleType>
              <xsd:restriction base="xsd:string">
                <xsd:pattern value="[0-9]{10}"/>
              </xsd:restriction>
            </xsd:simpleType>
            <xsd:simpleType>
              <xsd:restriction base="xsd:NMTOKEN">
                <xsd:enumeration value="TBD"/>
                <xsd:enumeration value="NA"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:union>
        </xsd:simpleType>
      The simple type isbnType will be either
        • a 10-digit string (notice the pattern)
        • the token "TBD" or the token "NA"
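The meaning of the union can be paraphrased as a predicate: a value is accepted if at least one member type accepts it. The following sketch mirrors isbnType in Python; the function and constant names are invented for this illustration, and whitespace normalization of NMTOKEN values is deliberately left out.

```python
import re

# First member type: xsd:string restricted by the pattern [0-9]{10}.
ISBN_PATTERN = re.compile(r"[0-9]{10}")
# Second member type: xsd:NMTOKEN restricted to two enumerated values.
PLACEHOLDER_TOKENS = {"TBD", "NA"}

def is_valid_isbn_type(value):
    """Accept the value if either member type of the union accepts it."""
    matches_pattern = ISBN_PATTERN.fullmatch(value) is not None
    return matches_pattern or value in PLACEHOLDER_TOKENS
```

For example, the isbn value 0836217462 from the sample book passes through the first member type, while TBD passes through the second.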

  33. Constraints: Uniqueness
      By inserting xsd:unique in the book element declaration we enforce that the character names in each book are unique.
        <xsd:element name="book">
          …
          <xsd:unique name="charNameMustBeUnique">
            <xsd:selector xpath="character"/>
            <xsd:field xpath="name"/>
          </xsd:unique>
          …
        </xsd:element>
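What the constraint checks can be sketched procedurally: the selector picks the character children of a book, the field picks each character's name, and no two selected characters may share a field value. The helper below is a hypothetical stand-in for a schema processor, written with the standard library only.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_character_names(book):
    """Return the names that occur on more than one character of this book."""
    # selector xpath="character", field xpath="name"
    names = [c.findtext("name") for c in book.findall("character")]
    # Characters without a name are ignored, as xsd:unique skips absent fields.
    counts = Counter(n for n in names if n is not None)
    return sorted(n for n, k in counts.items() if k > 1)

ok_book = ET.fromstring(
    "<book><character><name>Snoopy</name></character>"
    "<character><name>Peppermint Patty</name></character></book>"
)
bad_book = ET.fromstring(
    "<book><character><name>Snoopy</name></character>"
    "<character><name>Snoopy</name></character></book>"
)
```

An empty result means the constraint holds; bad_book reports Snoopy as a violation.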

  34. Namespaces
        <xsd:schema
            xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
            xmlns="http://example.org/ns/books/"
            targetNamespace="http://example.org/ns/books/"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">
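For an instance document, elementFormDefault="qualified" means elements live in the target namespace, while attributeFormDefault="unqualified" leaves attributes such as isbn in no namespace. The sketch below shows how that looks to a namespace-aware parser; the prefix b and the variable names are arbitrary choices for this example.

```python
import xml.etree.ElementTree as ET

BOOKS_NS = "http://example.org/ns/books/"

doc = ET.fromstring(
    '<book xmlns="http://example.org/ns/books/" isbn="0836217462">'
    "<title>Being a Dog Is a Full-Time Job</title></book>"
)

qualified_tag = doc.tag                  # element name includes the namespace URI
title = doc.findtext("b:title", namespaces={"b": BOOKS_NS})
unqualified_lookup = doc.find("title")   # no namespace given, so nothing matches
isbn = doc.get("isbn")                   # attribute is unqualified
```

The prefix used in the query does not have to match the one in the document; only the namespace URI identifies the vocabulary.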

  35. Including Unknown Elements
        <xsd:complexType name="descType" mixed="true">
          <xsd:sequence>
            <xsd:any namespace="http://www.w3.org/1999/xhtml"
                     minOccurs="0" maxOccurs="unbounded"
                     processContents="skip"/>
          </xsd:sequence>
        </xsd:complexType>

  36. Presenting XML: XSLT
      • Why Stylesheets?
        • separation of content (XML) from presentation (XSL)
      • Why not just CSS for XML?
        • XSL is far more powerful:
          • selecting elements
          • transforming the XML tree
          • content based display (result may depend on data)

  37. XSLT Overview
      • XSLT stylesheets are written in XML syntax
      • XSL components:
        1. a language for transforming XML documents (XSLT: integral part of the XSL specification)
        2. an XML formatting vocabulary (Formatting Objects: >90% of the formatting properties inherited from CSS)

  38. XSLT Processing Model
      XML source tree + XSL stylesheet => Transformation => result tree (XML, HTML, …)

  39. XSLT Processing Model
      • XSL stylesheet: collection of template rules
      • template rule: (pattern => template)
      • main steps:
        • match pattern against source tree
        • instantiate template (replace current node "." by the template in the result tree)
        • select further nodes for processing
      • control can be
        • program-driven ("pull": <xsl:for-each> ...)
        • data/event-driven ("push": <xsl:apply-templates> ...)

  40. Template Rule: Example
        <xsl:template match="product">                        (pattern)
          <table>                                              (template)
            <xsl:apply-templates select="sales/domestic"/>
          </table>
          <table>
            <xsl:apply-templates select="sales/foreign"/>
          </table>
        </xsl:template>
      (i) match pattern: process <product> elements
      (ii) instantiate template: replace each product with two HTML tables
      (iii) select the <product> grandchildren ("sales/domestic", "sales/foreign") for further processing
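The three steps (match, instantiate, select further nodes) can be imitated in a few lines of ordinary code. The sketch below is a deliberately simplified, push-style imitation in Python, not an XSLT processor: rules are looked up by element name only, unmatched nodes are dropped instead of falling back to built-in rules, and region_rule as well as the catalog wrapper element are hypothetical additions, since the slide does not show how the sales figures are rendered.

```python
import xml.etree.ElementTree as ET

def product_rule(node, apply_templates):
    # (ii) instantiate: two tables; (iii) select grandchildren for processing.
    domestic = apply_templates(node.findall("sales/domestic"))
    foreign = apply_templates(node.findall("sales/foreign"))
    return f"<table>{domestic}</table><table>{foreign}</table>"

def region_rule(node, apply_templates):
    # Hypothetical rule: render one sales figure as a table row.
    return f"<tr><td>{node.text}</td></tr>"

# (i) match: a pattern is reduced to an element name in this sketch.
RULES = {"product": product_rule, "domestic": region_rule, "foreign": region_rule}

def apply_templates(nodes):
    """Find the rule for each node and concatenate the instantiated results."""
    out = []
    for node in nodes:
        rule = RULES.get(node.tag)
        if rule is not None:
            out.append(rule(node, apply_templates))
    return "".join(out)

source = ET.fromstring(
    "<catalog><product><sales><domestic>10</domestic>"
    "<foreign>7</foreign></sales></product></catalog>"
)
result = apply_templates(source.findall("product"))
```

Control flow here is data-driven in the "push" sense: product_rule never names the rule that handles its grandchildren, it only selects nodes and lets the rule table decide.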

  41. Match/Select Patterns
      • match patterns are a subset of select patterns, which are defined in http://w3.org/TR/xpath
      • Examples:
        • /mybook/chapter[2]/section/*
        • chapter|appendix
        • chapter//para
        • div[@class="appendix" and position() mod 2 = 1]//para
        • ../@lang
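Some of these patterns can be tried without an XSLT processor. Python's standard ElementTree supports only a limited subset of XPath (no union operator, no position() arithmetic, no absolute paths on elements), so the sketch below rewrites three of the examples to fit that subset. The sample document is invented for the illustration.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    "<mybook>"
    "<chapter><section><para>c1</para></section></chapter>"
    "<chapter><section><para>c2a</para><note>c2b</note></section></chapter>"
    "<appendix><para>a1</para></appendix>"
    "</mybook>"
)

# /mybook/chapter[2]/section/*  (written relative to the mybook root element)
second_chapter_children = [e.text for e in book.findall("chapter[2]/section/*")]

# chapter//para: para descendants of the chapter children
chapter_paras = [e.text for e in book.findall("chapter//para")]

# chapter|appendix: no union operator in ElementTree, so combine two queries
chapters_and_appendices = [
    e.tag for e in book.findall("chapter") + book.findall("appendix")
]
```

A full XPath 1.0 implementation returns a union in document order; concatenating two result lists only coincides with that when, as here, all chapters precede all appendices.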

  42. Creating the Result Tree...
      • Literal result elements: non-XSL elements (e.g., HTML) appear "literally" in the result tree
      • Constructing elements (similar for xsl:attribute, xsl:text, xsl:comment, …):
          <xsl:element name="…">
            attribute & children definition
          </xsl:element>
      • Generating text:
          <xsl:template match="person">
            <p>
              <xsl:value-of select="@first-name"/>
              <xsl:text> </xsl:text>
              <xsl:value-of select="@surname"/>
            </p>
          </xsl:template>

  43. Example of Turning XML into HTML
        <?xml version="1.0"?>
        <?xml-stylesheet type="text/xsl" href="FitnessCenter.xsl"?>
        <FitnessCenter>
          <Member level="platinum">
            <Name>Jeff</Name>
            <Phone type="home">555-1234</Phone>
            <Phone type="work">555-4321</Phone>
            <FavoriteColor>lightgrey</FavoriteColor>
          </Member>
        </FitnessCenter>

  44. HTML Document in an XSL Template
        <?xml version="1.0"?>
        <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
          <xsl:output method="html"/>
          <xsl:template match="/">
            <HTML>
              <HEAD> <TITLE>Welcome</TITLE> </HEAD>
              <BODY> Welcome! </BODY>
            </HTML>
          </xsl:template>
        </xsl:stylesheet>

  45. Extracting the Member Name
        <?xml version="1.0"?>
        <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
          <xsl:output method="html"/>
          <xsl:template match="/">
            <HTML>
              <HEAD> <TITLE>Welcome</TITLE> </HEAD>
              <BODY>
                Welcome <xsl:value-of select="/FitnessCenter/Member/Name"/>!
              </BODY>
            </HTML>
          </xsl:template>
        </xsl:stylesheet>

  46. Extracting a Value from an XML Document, Navigating the XML Document
      • Extracting values:
        • use the <xsl:value-of select="…"/> XSL element
      • Navigating:
        • The slash ("/") indicates parent/child relationship
        • A slash at the beginning of the path indicates that it is an absolute path, starting from the top of the XML document
      /FitnessCenter/Member/Name
      "Start from the top of the XML document, go to the FitnessCenter element, from there go to the Member element, and from there go to the Name element."

  47. Document /
        PI <?xml version="1.0"?>
        Element FitnessCenter
          Element Member
            Element Name
              Text Jeff
            Element Phone
              Text 555-1234
            Element Phone
              Text 555-4321
            Element FavoriteColor
              Text lightgrey

  48. Extract the FavoriteColor and use it as the bgcolor
        <?xml version="1.0"?>
        <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
          <xsl:output method="html"/>
          <xsl:template match="/">
            <HTML>
              <HEAD> <TITLE>Welcome</TITLE> </HEAD>
              <BODY bgcolor="{/FitnessCenter/Member/FavoriteColor}">
                Welcome <xsl:value-of select="/FitnessCenter/Member/Name"/>!
              </BODY>
            </HTML>
          </xsl:template>
        </xsl:stylesheet>
      (see html-example03)

  49. Note
      Attribute values cannot contain "<". Consequently, the following is NOT well-formed:
        <Body bgcolor="<xsl:value-of select='/FitnessCenter/Member/FavoriteColor'/>">
      To extract the value of an XML element and use it as an attribute value you must use curly braces:
        <Body bgcolor="{/FitnessCenter/Member/FavoriteColor}">
      The expression within the curly braces is evaluated and its value is assigned to the attribute.

  50. Extract the Home Phone Number
        <?xml version="1.0"?>
        <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
          <xsl:output method="html"/>
          <xsl:template match="/">
            <HTML>
              <HEAD> <TITLE>Welcome</TITLE> </HEAD>
              <BODY bgcolor="{/FitnessCenter/Member/FavoriteColor}">
                Welcome <xsl:value-of select="/FitnessCenter/Member/Name"/>!
                <BR/>
                Your home phone number is:
                <xsl:value-of select="/FitnessCenter/Member/Phone[@type='home']"/>
              </BODY>
            </HTML>
          </xsl:template>
        </xsl:stylesheet>
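To see which values the three path expressions pick out of the FitnessCenter document, the sketch below performs the same lookups with Python's standard ElementTree and assembles a comparable HTML string by hand. This is an analogy rather than an XSLT run: ElementTree starts from the root element, so the paths are written relative to FitnessCenter, and the variable names and the string assembly are assumptions of this example.

```python
import xml.etree.ElementTree as ET
from html import escape

SOURCE = """<FitnessCenter>
  <Member level="platinum">
    <Name>Jeff</Name>
    <Phone type="home">555-1234</Phone>
    <Phone type="work">555-4321</Phone>
    <FavoriteColor>lightgrey</FavoriteColor>
  </Member>
</FitnessCenter>"""

center = ET.fromstring(SOURCE)  # the FitnessCenter element

# Counterparts of the three select expressions in the stylesheet.
name = center.findtext("Member/Name")
color = center.findtext("Member/FavoriteColor")
home_phone = center.findtext("Member/Phone[@type='home']")

# Hand-built page; escape() guards against markup characters in the data.
page = (
    "<HTML><HEAD><TITLE>Welcome</TITLE></HEAD>"
    f'<BODY bgcolor="{escape(color, quote=True)}">'
    f"Welcome {escape(name)}!<BR>"
    f"Your home phone number is: {escape(home_phone)}"
    "</BODY></HTML>"
)
```

The predicate [@type='home'] is what distinguishes the two Phone elements; without it, both findtext here and xsl:value-of in XSLT 1.0 would return the first Phone in document order.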
