
Universal Database Systems




  1. Universal Database Systems Part 4: Databases and XML

  2. Overview • Introduction to XML • DTDs and Schemas for XML Documents • Languages for XML, in particular XSL • Querying and Storing XML • Summary and Outlook Running Example: e-shopper's_heaven.com UDBS Part 4 - Winter 2001/2

  3. Some Motivation • Data exchange between two partners • Data integration from different sources • Working with the Web • E-Commerce Scenario

  4. Data Exchange [Figure: Source and Destination; labels: Protocol, Transformation, Formats, Data]

  5. Data Integration [Figure: Source 1, Source 2, Source 3, and Destination; labels: Protocol, Transformation, Formats, Data]

  6. Working with the Web [Figure: Server 1, Server 2, Server 3; labels: HTTP, Search Engine, HTML]

  7. E-Commerce Scenario [Figure: Customer, Payment, Supplier, EC Portal]

  8. Problems • Heterogeneous data formats • Varying data quality (missing values, varying level of detail) • Missing distinction between contents and formatting ("markup") • Derivation of individual data collections difficult

  9. The Web Today and Tomorrow • HTML documents, all meant for human (not machine) consumption • More and more documents are automatically generated by computers or applications • Applications must be able to communicate directly • Companies need interoperability at an increasing pace • Data exchange must work across platform and company boundaries

  10. Running Example: e-shopper's_heaven.com • Internet-based store for books, movies, and music • Merchandise comes in the mail • Publishers, music producers, etc. are supposed to move their data directly into the heaven database • Web presentations are generated from that database for a variety of target platforms • Users can browse/search the database

  11. State of the Art: HTML • HyperText Markup Language (T. Berners-Lee 1990) • Basis for most Web pages • Document properties come from markups or tags, e.g., • point size, character set • text structure (title, paragraphs, etc.) • hyperlinks

  12. But ... • HTML does not separate markup from contents • Misuse of tags (e.g., <h1> for "bold face" instead of "header 1") • Web users want voice, E-Commerce, WAP, EDI; HTML is difficult to extend • No modularity • Weak internationalization

  13. Overview • Introduction to XML • DTDs and Schemas for XML Documents • Languages for XML, in particular XSL • Querying and Storing XML • Summary and Outlook

  14. In Detail • Fundamental language elements, mostly by way of examples: tags, elements, attributes • Well-formedness • Tree structure vs. serialization • IDs and referencing

  15. Extensible Markup Language (XML) • a W3C "Recommendation" (W3C: MIT, INRIA, Keio) • Meta language for the creation and formatting of document markups and for the specification of (other) languages

  16. What Does XML Do? • Documents are hierarchically decomposed into parts ("elements") • The parts are named • Names and contents are (Unicode) text • Rules can describe how parts fit together

  17. Analogy to Relational Databases

  18. Example (slide callouts: tag, element, text) <book> <author> Serge Abiteboul </author> <author> Rick Hull </author> <author> Victor Vianu </author> <title> Foundations of Databases </title> <year> 1995 </year> </book>

  19. XML Elements • Elements are corresponding pairs of begin and end tags with text included • Tags determine structure, text determines contents • Elements may be empty, e.g., <red></red>, abbreviated <red/> • Elements may be nested to form a tree, i.e., there is a single root element, and all tag pairs observe a strict nesting (no overlaps!)

  20. XML Documents • An XML document is an unranked, ordered tree and consists of elements • The ordering of elements in a document is significant; an XML document is ordered • Well-formed document: loosely speaking "matching tags" (one closing per opening tag at the same level of nesting)
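The well-formedness rules on slides 19 and 20 can be tried out with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the three sample strings are made up for illustration and are not part of the lecture.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Strictly nested tags under a single root element.
good = "<book><title>Foundations of Databases</title><year>1995</year></book>"
# Overlapping tags: <title> is closed after its parent <book>.
overlapping = "<book><title>Foundations of Databases</book></title>"
# Two top-level elements, i.e. no single root.
two_roots = "<book/><book/>"

for sample in (good, overlapping, two_roots):
    print(is_well_formed(sample), sample)
```

Only the first sample is accepted; the parser rejects the other two before any tree is built.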

  21. Non-equivalent Documents <book> <title> Foundations… </title> <author> Abiteboul </author> <author> Hull </author> <author> Vianu </author> <publisher> Addison Wesley </publisher> <year> 1995 </year> </book> <book> <author> Abiteboul </author> <author> Hull </author> <author> Vianu </author> <title> Foundations… </title> <publisher> Addison Wesley </publisher> <year> 1995 </year> </book> The corresponding trees are different!
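To make the point of slide 21 concrete, here is a small sketch that parses both documents and compares the sequence of child tags. It assumes Python's standard `xml.etree.ElementTree`; the variable and function names are illustrative only, and the titles are written out in full rather than abbreviated.

```python
import xml.etree.ElementTree as ET

doc_a = (
    "<book><title>Foundations of Databases</title>"
    "<author>Abiteboul</author><author>Hull</author><author>Vianu</author>"
    "<publisher>Addison Wesley</publisher><year>1995</year></book>"
)
doc_b = (
    "<book><author>Abiteboul</author><author>Hull</author><author>Vianu</author>"
    "<title>Foundations of Databases</title>"
    "<publisher>Addison Wesley</publisher><year>1995</year></book>"
)


def child_tags(text: str) -> list:
    """Tags of the root's children, in document order."""
    return [child.tag for child in ET.fromstring(text)]


tags_a = child_tags(doc_a)
tags_b = child_tags(doc_b)

# Same multiset of children, but a different order, hence different trees.
same_children = sorted(tags_a) == sorted(tags_b)
same_order = tags_a == tags_b
print(same_children, same_order)
```

An application that treats a book as an unordered record would have to ignore this difference explicitly; the XML data model itself does not.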

  22. e-shopper's Example (1) <CATALOG> <BOOKCATALOG> <BOOK category="technical" language="en"> <ISBN>0070310866</ISBN> <AUTHOR> <PERSON> <FIRSTNAME>Abraham</FIRSTNAME> <LASTNAME>Silberschatz</LASTNAME> </PERSON> <PERSON> <LASTNAME>Korth</LASTNAME> <FIRSTNAME>Henry F.</FIRSTNAME> </PERSON> </AUTHOR> <TITLE>Database System Concepts</TITLE> <PUBLISHER>McGraw Hill</PUBLISHER> <LOCATION/> <EDITION>3</EDITION> <YEAR>1998</YEAR> </BOOK>

  23. e-shopper's Example (2) <BOOK category="fiction" language="de"> <ISBN>342333052X</ISBN> <AUTHOR> <PERSON> <LASTNAME>Singh</LASTNAME> <FIRSTNAME>Simon</FIRSTNAME> </PERSON> </AUTHOR> <TITLE>Fermat's Last Theorem</TITLE> <PUBLISHER>DTV</PUBLISHER> <LOCATION>Munich</LOCATION> <EDITION>1</EDITION> <YEAR>2000</YEAR> </BOOK> </BOOKCATALOG> XML document: a tree of elements containing character data

  24. Tree View [Figure: tree with root bookcatalog; node labels: book, ISBN, author, person, title, publisher, edition, year]

  25. Alternative View: Serialization <BOOK category="technical" language="en"><ISBN>0070310866</ISBN><AUTHOR><PERSON><FIRSTNAME>Abraham</FIRSTNAME><LASTNAME>Silberschatz</LASTNAME></PERSON><PERSON><LASTNAME>Korth</LASTNAME><FIRSTNAME>Henry F.</FIRSTNAME></PERSON></AUTHOR><TITLE>Database System Concepts</TITLE><PUBLISHER>McGraw Hill</PUBLISHER><LOCATION/><EDITION>3</EDITION><YEAR>1998</YEAR></BOOK>
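The two views, tree and serialization, can be converted into each other mechanically. The sketch below builds a small part of the BOOK tree in memory, serializes it to a string, and parses that string back. It uses Python's standard `xml.etree.ElementTree`; the choice of library and the variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Tree view: build nodes and attach children.
book = ET.Element("BOOK", {"category": "technical", "language": "en"})
ET.SubElement(book, "ISBN").text = "0070310866"
ET.SubElement(book, "TITLE").text = "Database System Concepts"
ET.SubElement(book, "LOCATION")  # empty element

# Serialization: one string of tags and text.
serialized = ET.tostring(book, encoding="unicode")
print(serialized)

# Parsing the string yields an equivalent tree again.
reparsed = ET.fromstring(serialized)
reparsed_tags = [child.tag for child in reparsed]
print(reparsed_tags, reparsed.attrib)
```

The exact spelling of the serialized string (for instance how the empty element is abbreviated) is up to the serializer; what is preserved is the tree.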

  26. Attributes • Elements can have attributes (with name and value) • Attribute ordering is immaterial • Attributes are data modeling alternatives, i.e., information can be represented as elements or via attributes: <address> <street>N. Olive St.</street><city>Dallas</city> </address> vs. <address street="N. Olive St." city="Dallas"/> • Attribute values must appear in single or double quotes (' or ") and are user-defined • Each attribute occurs in a tag at most once
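The attribute rules of slide 26 can be checked against the address example. This is a minimal sketch with Python's standard `xml.etree.ElementTree`; all variable names are illustrative, and the duplicate-attribute sample with the second city value is invented for the test.

```python
import xml.etree.ElementTree as ET

# The same information modeled as sub-elements and as attributes.
as_elements = ET.fromstring(
    "<address><street>N. Olive St.</street><city>Dallas</city></address>"
)
as_attributes = ET.fromstring('<address street="N. Olive St." city="Dallas"/>')

city_from_element = as_elements.findtext("city")
city_from_attribute = as_attributes.get("city")

# Attribute ordering is immaterial: both tags yield the same name/value mapping.
reordered = ET.fromstring('<address city="Dallas" street="N. Olive St."/>')
order_is_immaterial = as_attributes.attrib == reordered.attrib

# An attribute may occur at most once per tag; a duplicate is not well-formed.
try:
    ET.fromstring('<address city="Dallas" city="Austin"/>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False

print(city_from_element, city_from_attribute, order_is_immaterial, duplicate_accepted)
```

Note the asymmetry: child elements are ordered and may repeat, whereas attributes form an unordered set of unique names.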

  27. e-shopper's Example (3) <MOVIECATALOG> <VIDEO language="de"> <TITLE>The Sixth Sense</TITLE> <DIRECTOR> <PERSON> <LASTNAME>Shyamalan</LASTNAME> <FIRSTNAME>M. Night</FIRSTNAME> </PERSON> </DIRECTOR> <CAST> <ACTOR> <PERSON> <LASTNAME>Willis</LASTNAME> <FIRSTNAME>Bruce</FIRSTNAME> </PERSON> </ACTOR> <ACTOR> <PERSON> <LASTNAME>Osment</LASTNAME> <FIRSTNAME>Haley Joel</FIRSTNAME> </PERSON> </ACTOR> </CAST> <RUNTIME>103</RUNTIME> <YEAR>2000</YEAR> </VIDEO>

  28. e-shopper's Example (4) <DVD RegionCode="2"> <TITLE>Matrix</TITLE> <DIRECTOR> <PERSON> <LASTNAME>Wachowski</LASTNAME> <FIRSTNAME>Andy</FIRSTNAME> </PERSON> <PERSON> <LASTNAME>Wachowski</LASTNAME> <FIRSTNAME>Larry</FIRSTNAME> </PERSON> </DIRECTOR> <CAST> <ACTOR> <PERSON> <LASTNAME>Reeves</LASTNAME> <FIRSTNAME>Keanu</FIRSTNAME> </PERSON> </ACTOR> <ACTOR> <PERSON> <LASTNAME>Fishburne</LASTNAME> <FIRSTNAME>Laurence</FIRSTNAME> </PERSON> </ACTOR> <ACTRESS> <PERSON> <LASTNAME>Moss</LASTNAME> <FIRSTNAME>Carrie-Anne</FIRSTNAME> </PERSON> </ACTRESS> </CAST> <RUNTIME>131</RUNTIME> <YEAR>1999</YEAR> <SOUND> <LANGUAGE>de</LANGUAGE> <SOUNDMIX>AC3/Dolby Digital 5.1</SOUNDMIX> </SOUND> <ANNOTATION>includes: Making-of, Comments of Director, Actor Biographies</ANNOTATION> </DVD> </MOVIECATALOG>

  29. Tree View (cont'd) [Figure: tree with root moviecatalog; node labels: video, dvd, title, director, cast, actor, actress, runtime, year, sound]

  30. Overall Tree View [Figure: tree with root catalog and subtrees bookcatalog, moviecatalog, musiccatalog; node labels: book, video, musicitem, ISBN, title, author, director, performer, cast, person, year, runtime, track]

  31. Attribute Types • String type • CDATA: character data, any Unicode character • Tokenized type • ID: unique element identifier • IDREF: the value of a unique ID attribute • IDREFS: multiple IDREFs of an element • ENTITY/ENTITIES: the name of an entity or a list of entity names • NMTOKEN/NMTOKENS: a (list of) name token(s) • Enumerated type • NOTATION: the name of a notation that allows for a specific interpretation of the value • ENUMERATION: a list of possible values

  32. IDs and IDREFs • An Id(entifier) attribute with a unique value can be associated with an element • This element can then be referenced from somewhere else using an Idref attribute • Note: Both IDs and references are just syntax in XML!

  33. Example <person id="o555"> <name> Jane </name> </person> <person id="o456"> <name> Mary </name> <children idrefs="o123 o555"/> </person> <person id="o123" idref="o456"><name>John</name> </person>
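Because IDs and references are "just syntax", it is up to the application to follow them. The sketch below resolves the `idrefs` of Mary by hand. It uses Python's standard `xml.etree.ElementTree`; the wrapping `<people>` root element is an assumption added so that the three persons form one well-formed document, and treating the attribute named `id` as the identifier is a convention of this sketch, since no DTD declares it as type ID here.

```python
import xml.etree.ElementTree as ET

# Slide 33 example, wrapped in a hypothetical <people> root element.
doc = """
<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idrefs="o123 o555"/></person>
  <person id="o123" idref="o456"><name>John</name></person>
</people>
"""

root = ET.fromstring(doc)

# Index every person by the value of its id attribute.
by_id = {person.get("id"): person for person in root.iter("person")}

# Follow the whitespace-separated references in Mary's idrefs attribute.
mary = by_id["o456"]
refs = mary.find("children").get("idrefs").split()
child_names = [by_id[ref].findtext("name") for ref in refs]
print(child_names)

# John's single idref points back to Mary.
john_refers_to = by_id[by_id["o123"].get("idref")].findtext("name")
print(john_refers_to)
```

Nothing in the parser checks that a referenced identifier exists; a dangling reference would only surface as a `KeyError` in the lookup above.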

  34. How to Edit XML Documents • A simple text editor is enough • Better are specific editors with (at least) well-formedness tests • Freeware and commercial tools include • GNU Emacs • Icon XML Spy • MS XML Notepad • ezDTD • XML Styler • Arbortext Epic • SoftQuad XMetaL

  35. Other Stuff • Comments: <!-- this is a comment --> • Optional Document Header: <?xml version="1.0" encoding="UTF-8"?> <!-- edited by Gottfried Vossen --> <!DOCTYPE CATALOG SYSTEM "catalog.dtd"> • Namespaces: "identify your vocabulary" We'll get to this shortly!

  36. Namespaces • Serve to avoid name clashes when documents are composed from parts that originate from different sources • Map names to URIs (Uniform Resource Identifiers) which identify a particular namespace • A combination of local name + namespace URI yields a unique name

  37. Sample Name Clash • Document 1: <book> Euro <price> 25.99 </price></book> • Document 2: <book><price currency="Euro"> 25.99 </price></book> • Document 3: <book><price currency="Euro" amount="25.99"/></book>

  38. XML Namespaces • name ::= [prefix:]localpart • syntactically: <number> , <isbn:number> • semantically: provide URL for schema <tag xmlns:mystyle="http://…"> … <mystyle:title> … </mystyle:title> <mystyle:number> … </tag> (slide callout: "defined here")

  39. Namespaces can be Mixed <h:html xmlns:a="http://www.article.com/article" xmlns:h="http://www.w3.org/TR/REC-html40"> <h:head><h:title>Articles about XML</h:title></h:head><h:body> <a:article> <a:title h:style="font-family:arial;">Namespaces</a:title> <h:table> <h:tr align="left"> <h:td><a:author>Fritz Schnapp</a:author></h:td> <h:td><a:journal>XML-News</a:journal></h:td> <h:td><a:pages>11</a:pages></h:td> . . . . . . . </h:table> </a:article></h:body> </h:html>
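How "local name + namespace URI yields a unique name" plays out can be seen by parsing a shortened version of the mixed document. This sketch uses Python's standard `xml.etree.ElementTree`, which exposes each qualified name as `{uri}localname`; the abbreviated document and the variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened variant of the slide 39 document (table omitted).
doc = """
<h:html xmlns:a="http://www.article.com/article"
        xmlns:h="http://www.w3.org/TR/REC-html40">
  <h:head><h:title>Articles about XML</h:title></h:head>
  <h:body>
    <a:article>
      <a:title>Namespaces</a:title>
    </a:article>
  </h:body>
</h:html>
"""

# Prefix-to-URI mapping used for lookups; the prefixes are local shorthand only.
ns = {
    "a": "http://www.article.com/article",
    "h": "http://www.w3.org/TR/REC-html40",
}

root = ET.fromstring(doc)

# The parser replaces the prefix by the namespace URI.
print(root.tag)

# Two elements with local name "title" stay distinct because their URIs differ.
html_title = root.findtext("h:head/h:title", namespaces=ns)
article_title = root.findtext(".//a:title", namespaces=ns)
print(html_title, "|", article_title)
```

The prefixes `a` and `h` could be renamed freely in either the document or the lookup mapping; only the URIs have to match.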

  40. Linking • Recall from HTML: URLs exclusively point to documents; links are always uni-directional; external link definitions are not allowed • In XML: a document can be linked internally or externally • Technical tools: • Attribute types such as ID and IDREF • XPointer, XLink, XPath

  41. Linking and Addressing • XPath (XML Path Language): language for addressing parts of an XML document via paths • XPointer (XML Pointer Language): language using XPath for addressing into the internal structures of an XML document • XLink (XML Linking Language): constructs for describing links between XML objects as well as resources

  42. Location Paths in XPath • General form: document(url)/step/step/.../step • Location steps have the form axis::nodetest[filter]* • Steps comprise an axis, a node test and a predicate • Example: child::AUTHOR[position()<3]/attribute::id (1st step: child::AUTHOR[position()<3], with axis child, node test AUTHOR, and predicate [position()<3]; 2nd step: attribute::id, with axis attribute and node test id)

  43. Path Syntax • Valid axis values include: child, attribute, parent, following-sibling • Node test: can be a tag, an attribute name or *; also allowed are functions like text() or comment() • More details in the context of XSLT later on
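Location paths can be tried out on the book catalog from the running example. The sketch below uses Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath in abbreviated syntax: explicit axes such as `child::` and functions such as `position()` are not supported, so the paths are written in the short form (`BOOK/TITLE`, `[@category='technical']`, `[1]`). The trimmed-down catalog and all variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the book catalog from slides 22 and 23.
catalog = ET.fromstring("""
<BOOKCATALOG>
  <BOOK category="technical" language="en">
    <AUTHOR>
      <PERSON><FIRSTNAME>Abraham</FIRSTNAME><LASTNAME>Silberschatz</LASTNAME></PERSON>
      <PERSON><LASTNAME>Korth</LASTNAME><FIRSTNAME>Henry F.</FIRSTNAME></PERSON>
    </AUTHOR>
    <TITLE>Database System Concepts</TITLE>
  </BOOK>
  <BOOK category="fiction" language="de">
    <AUTHOR>
      <PERSON><LASTNAME>Singh</LASTNAME><FIRSTNAME>Simon</FIRSTNAME></PERSON>
    </AUTHOR>
    <TITLE>Fermat's Last Theorem</TITLE>
  </BOOK>
</BOOKCATALOG>
""")

# Two child steps: BOOK children of the root, then their TITLE children.
titles = [t.text for t in catalog.findall("BOOK/TITLE")]

# Predicate on an attribute value.
technical = [b.findtext("TITLE") for b in catalog.findall("BOOK[@category='technical']")]

# Positional predicate: the first PERSON child of each AUTHOR.
first_authors = [p.findtext("LASTNAME") for p in catalog.findall("BOOK/AUTHOR/PERSON[1]")]

# Counterpart of an attribute step: read the attribute from each selected element.
languages = [b.get("language") for b in catalog.findall("BOOK")]

print(titles, technical, first_authors, languages, sep="\n")
```

A full XPath 1.0 engine would accept the unabbreviated form from slide 42 directly; the subset here is enough to see how steps, node tests, and predicates narrow down a node set.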

  44. Fragment Identifiers in XPointer • XPointer uses XPath for defining fragment identifiers • Such an identifier can be: the value of an ID-type attribute, a sequence of numbers, a sequence of XPointer expressions • Example: http://www.myserver.net/#xpointer(//BOOK/AUTHOR[position()=1])

  45. XLink • For linking documents • Links can be simple or extended • XLink uses its own namespace • Extended links connect more than 2 documents (and can determine via "arcs" how these links are traversed)

  46. Example of a Simple Link <AUTHOR xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://cs-faculty.stanford.edu/~knuth/" xlink:role="don~knuth_homepage" xlink:show="embed" xlink:actuate="onLoad"> Donald Knuth </AUTHOR> • xlink:show: What should happen with the document during loading? • xlink:actuate: When should the action specified by "show" occur?

  47. The XML Communication Problem How do I share structure and metadata with my community? How do I learn and use the element structure of a document? How to make all this automatable?

  48. Overview • Introduction to XML • DTDs and Schemas for XML Documents • Languages for XML, in particular XSL • Querying and Storing XML • Summary and Outlook

  49. e-shopper's_heaven.com [Figure: Publ. A, Publ. B, and e-shopper's heaven] • How can e-shopper's motivate a publisher to supply its information in a uniform format to the heaven database?

  50. Document Type Definition (DTD) • Syntax rules for one or more documents stating • what tags are allowed, • where these may occur, • how they fit together, • which attributes may be used. • An XML document is valid if it is well-formed and follows the rules of "its" DTD
