II. XML Data Management
A: XML refresher (using material from A. Silberschatz and M. Sapossnek)
B: XML data management (1): query languages XPath, XQuery, SQLX
C: Mapping XML data to databases; native XML data management

What is XML?
• Acronym for eXtensible Markup Language
• Syntax for structuring data and documents in human-readable form
• THE "syntax of the Web"
• Meta language for defining languages
• Basis of many extensions
  - Namespaces
  - Stylesheets
  - Hyperlinks
  - Schemata
• Standardized by the W3C: http://www.w3.org/TR/REC-xml
HS / DBSII-03-XML-1

What XML is not
• Not a protocol
  - Language for describing data
  - Used as data format in protocols
  - Protocols may be syntactically defined by XML
• Not a programming language, but
  - XML documents may contain code fragments
  - New languages allow XML code as part of the language (Xen, a Microsoft extension of C#)
  - Some XML extensions have superimposed programming-language semantics, e.g. rule semantics in XSLT
• No magic semantics
  - Interpretation by humans, applications, or standards derived from XML

Why XML?
• … not a question any more, since it is widely adopted
• Simple
• Extensible
• Easy to process
• Easy to generate
• Data interchange is critical for networked applications
"XML will be the ASCII of the Web: basic, essential, unexciting" (Tim Bray) ... it already is

XML example
• Pre-XML representation of data:
  "PO-1234","CUST001","X9876","2","14.53"
• XML representation of the same data:
  <?xml version="1.0"?>                 (prologue)
  <PURCHASE_ORDER>
    <PO_NUM> PO-1234 </PO_NUM>          (element)
    <CUST_ID> CUST001 </CUST_ID>
    <ITEM ItemNum="X9876">              (ItemNum is an attribute)
      <QUNTY> 2 </QUNTY>
      <PRICE> 14.53 </PRICE>
    </ITEM>
  </PURCHASE_ORDER>
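
To make the "same data, two representations" point concrete, here is a minimal Python sketch that reads the purchase order and flattens it back into the comma-separated record. It assumes the standard-library xml.etree.ElementTree parser; the function name to_flat_record is made up for illustration, and the item number X9876 is the one from the flat record.

```python
import xml.etree.ElementTree as ET

# Purchase order modelled on the slide; ItemNum is an attribute,
# everything else is element content.
PO_XML = """<?xml version="1.0"?>
<PURCHASE_ORDER>
  <PO_NUM> PO-1234 </PO_NUM>
  <CUST_ID> CUST001 </CUST_ID>
  <ITEM ItemNum="X9876">
    <QUNTY> 2 </QUNTY>
    <PRICE> 14.53 </PRICE>
  </ITEM>
</PURCHASE_ORDER>"""


def to_flat_record(xml_text):
    """Return the purchase order as a flat list of strings (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    item = root.find("ITEM")
    return [
        root.findtext("PO_NUM").strip(),   # element content
        root.findtext("CUST_ID").strip(),
        item.get("ItemNum"),               # attribute value
        item.findtext("QUNTY").strip(),
        item.findtext("PRICE").strip(),
    ]


record = to_flat_record(PO_XML)
print(",".join('"%s"' % value for value in record))
```

The flat record loses the names (metadata) that the XML version carries along with every value.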

XML example
• Graphical representation
  PURCHASE_ORDER
    PO_NUM: PO-1234
    CUST_ID: CUST001
    ITEM {ItemNum=X9876}
      QUNTY: 2
      PRICE: 14.53
• XML documents
  - are tree structured
  - hold data and metadata in the same document (as opposed to RDBS)

XML usage
• Two basic types of XML usage
• Document centric (document oriented)
  - Structuring a digital document, including logical layout
  - Primary focus of SGML, the predecessor of XML
• Data centric
  - Description of data in a self-describing form for later processing
• Distinction not totally clear
  - See the purchase order example: if typical document characteristics were included (company address, customer address, date, …, company logo), it would be a document-oriented usage of XML

Document centric XML documents: example
  <Product>
    <Name>Adjustable open-end wrench</Name>
    <Developer>Full Fabrication Labs, Inc.</Developer>
    <Summary>Large adjustable wrench</Summary>
    <Description>
      <Para>The adjustable wrench is made of first-class steel and has a rubberized handle. The jaw opening ranges from 0 to 32 mm.</Para>
      <Para>You can.....</Para>
      <List>
        <Item><Link URL="Order.html">Order</Link></Item>
        <Item><Link URL="Wrenches.htm">View other tools</Link></Item>
        <Item><Link URL="catalog.zip">Download the catalog</Link></Item>
      </List>
      <Para>The wrench costs 15.33 euros incl. VAT. If you order now, you will also receive our worthless hobbyist's primer.</Para>
    </Description>
  </Product>
• Typical: long text elements

Data centric XML documents: example
  <Orders>
    <SalesOrder SONumber="12345">
      <Customer CustNumber="543">
        <CustName>ABC Industries</CustName>
        <Street>123 Main St.</Street>
        <City>Chicago</City>
        ....
      </Customer>
      <Line LineNumber="1">
        <Part PartNumber="123">
          <Description>
            <p><b>Turkey wrench:</b><br /> Stainless steel, one-piece construction, lifetime guarantee.</p>
          </Description>
          <Price>9.95</Price>
        </Part>
        <Quantity>10</Quantity>
      </Line>
      .......
    </SalesOrder>
  </Orders>

XML syntax
• One, and only one, root element
• Sub-elements must be properly nested
  - A tag must end within the tag in which it was started
• Attributes are optional
  - Attribute values must be enclosed in "" or ''
  - No data type but 'string'
• Processing instructions are optional
• XML is case-sensitive
  - <tag> and <TAG> are not the same type of element
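
The syntax rules above can be tried out with any XML parser, since a parser must reject documents that break them. The following is a small sketch using Python's standard-library xml.etree.ElementTree; the helper name is_well_formed and the sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the parser accepts the text as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One sample per rule on the slide; only the first one obeys all rules.
samples = {
    "properly nested": "<a><b>x</b></a>",
    "improperly nested": "<a><b>x</a></b>",
    "two root elements": "<a/><b/>",
    "unquoted attribute value": "<a id=1/>",
    "case mismatch in tags": "<tag>x</TAG>",
}

results = {name: is_well_formed(doc) for name, doc in samples.items()}
for name, ok in results.items():
    print("%-26s %s" % (name, "well-formed" if ok else "rejected"))
```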

Why a hierarchical "data model"?
• Hierarchies (nesting) in databases? Why not?
  - REDUNDANCY! Items, customers, … occur multiple times in different orders
  - Normalization replaces redundancies by foreign keys
  - OO / OR databases??
• Nesting is useful in data transfer
  - An external application does not have access to the foreign keys or to the database

XML attributes vs. elements
• Distinction between subelement and attribute
  - In the context of documents:
    - attributes are part of the markup
    - subelement contents are part of the basic document contents
  - In the context of data representation: the difference is unclear and can be confusing
    - The same information can be represented in two ways
      <account account-number="A-101"> …. </account>
      <account> <account-number> A-101 </account-number> … </account>
• Suggestion: use attributes for identifiers of elements, use subelements for contents
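
For a receiving application, the two representations differ only in how the value is fetched. A minimal Python sketch, assuming xml.etree.ElementTree; the balance subelement stands in for the elided content and is purely hypothetical, as is the helper name account_number.

```python
import xml.etree.ElementTree as ET

# The two representations from the slide; <balance> is a made-up
# placeholder for the content elided there.
AS_ATTRIBUTE = '<account account-number="A-101"><balance>500</balance></account>'
AS_SUBELEMENT = ('<account><account-number> A-101 </account-number>'
                 '<balance>500</balance></account>')


def account_number(xml_text):
    """Read the account number from either representation."""
    account = ET.fromstring(xml_text)
    value = account.get("account-number")            # attribute form
    if value is None:
        value = account.findtext("account-number")   # subelement form
    return value.strip() if value is not None else None


print(account_number(AS_ATTRIBUTE), account_number(AS_SUBELEMENT))
```

An application that has to accept both forms needs this kind of double lookup, which is one reason to settle on a convention such as the one suggested above.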

How to use XML data?
• Basic idea
  [Figure: DBMS, application with XML generator, XML document, receiving application with XML parser (standard interfaces: DOM, SAX), DBMS]
• How does the application know about
  - syntactical correctness
  - data semantics?
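
The two standard parser interfaces named on the slide differ in style: DOM builds the whole tree in memory, SAX reports events while reading. A minimal sketch using Python's standard-library bindings (xml.dom.minidom and xml.sax); the order document and the class name ItemHandler are invented for illustration.

```python
import xml.dom.minidom
import xml.sax

DOC = "<order><item>wrench</item><item>hammer</item></order>"

# DOM: parse everything first, then navigate the tree.
dom = xml.dom.minidom.parseString(DOC)
dom_items = [node.firstChild.data for node in dom.getElementsByTagName("item")]


# SAX: the parser calls back into a handler for each start tag,
# text chunk, and end tag; no tree is kept.
class ItemHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._buffer = []

    def characters(self, content):
        if self._in_item:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._in_item = False


handler = ItemHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_items = handler.items

print(dom_items, sax_items)
```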

Different encodings
• Specified by the encoding attribute
• Correct or not correct?
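
One way to explore the "correct or not correct?" question is to let the declared encoding and the actual bytes disagree. The sketch below assumes Python's xml.etree.ElementTree (which uses the Expat parser); the name element and the helper document are made up for illustration.

```python
import xml.etree.ElementTree as ET

NAME = "M\u00fcller"  # "Müller", written with an escape so the source stays ASCII


def document(declared_encoding):
    """Build a tiny document with the given encoding declaration."""
    return ('<?xml version="1.0" encoding="%s"?><name>%s</name>'
            % (declared_encoding, NAME))


# Declaration and actual byte encoding agree: parsed as intended.
matching = ET.fromstring(document("ISO-8859-1").encode("iso-8859-1")).text

# Declared ISO-8859-1, but the bytes are UTF-8: the parser accepts the
# document, yet the text comes out with the wrong characters.
mismatched = ET.fromstring(document("ISO-8859-1").encode("utf-8")).text

# Declared UTF-8, but the bytes are ISO-8859-1: the byte sequence is not
# valid UTF-8, so the parser rejects the document.
try:
    ET.fromstring(document("UTF-8").encode("iso-8859-1"))
    rejected = False
except ET.ParseError:
    rejected = True

print(repr(matching), repr(mismatched), rejected)
```

The middle case is the dangerous one: nothing fails, but the data is silently wrong.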

Correctness of XML documents
• Syntactic correctness
  - Conformance to XML syntax
  - A document structured according to XML syntax is well-formed
  - Compare: syntax checker for a program
• Semantic correctness
  - Given a meta-level description of XML documents: Document Type Definition (DTD) or XML Schema
  - A document is valid with respect to a DTD (Schema) if all definitions and restrictions are fulfilled
  - Documents without a DTD are allowed; applications must then know what is meant
• What is semantics??
  - Interpretation of tags is a matter of humans and/or the application program: <xyz> could mean "book title" or "first name" or…

XML namespaces
• Namespace declaration: xmlns:bk="http://www.example.com/bookinfo/"
  - bk is the prefix
  - http://www.example.com/bookinfo/ is the URI (URL)
• Part of XML's extensibility
• Allow autonomous users to differentiate between tags of the same name (using a prefix)
  - Frees the author to focus on the data and decide how best to describe it
  - Allows multiple XML documents from multiple authors to be merged

Namespace
• Examples
  - With prefix:
    <BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
      <bk:TITLE>All About XML</bk:TITLE>
      <bk:AUTHOR>Joe Developer</bk:AUTHOR>
      <bk:PRICE currency='US Dollar'>19.99</bk:PRICE>
    </BOOK>
  - No prefix: all elements belong to the same namespace
    <BOOK xmlns="http://www.bookstuff.org/bookinfo">
      <TITLE>All About XML</TITLE>
      <AUTHOR>Joe Developer</AUTHOR>
    </BOOK>
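
A parser resolves prefixes to namespace URIs, so what an application sees are (URI, local name) pairs. The sketch below shows this with Python's xml.etree.ElementTree, which writes such pairs as {uri}name; the prefix b used in the lookup is an arbitrary choice to show that the prefix itself carries no meaning.

```python
import xml.etree.ElementTree as ET

NS = "http://www.bookstuff.org/bookinfo"

PREFIXED = """<BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
</BOOK>"""

DEFAULT = """<BOOK xmlns="http://www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
</BOOK>"""

prefixed_root = ET.fromstring(PREFIXED)
default_root = ET.fromstring(DEFAULT)

# Expanded names as seen by the application.
prefixed_tags = [prefixed_root.tag] + [child.tag for child in prefixed_root]
default_tags = [default_root.tag] + [child.tag for child in default_root]

# The lookup prefix is local to the query; only the URI has to match.
title = prefixed_root.findtext("b:TITLE", namespaces={"b": NS})

print(prefixed_tags)
print(default_tags)
print(title)
```

Note the one difference between the two documents: with the prefix declaration the unprefixed BOOK element is in no namespace, while the default declaration also covers BOOK.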

DTD and XML schema
• The type of an XML document is defined as
  - DTD: not expressed in XML syntax
  - XML schema
• Document Type Definition (DTD)
  - Does not constrain types: all values are strings in XML
  - Syntax
    <!ELEMENT elem (subelement-spec)>
    <!ATTLIST elem (attribute-specs)>

DTD: elements and attributes
• Example (element declaration)
  <!ELEMENT depositor (customer-name, account-number)>
  <!ELEMENT customer-name (#PCDATA)>
  <!ELEMENT account-number (#PCDATA)>
• Subelements
  - names of elements
  - #PCDATA (parsed character data), i.e., character strings
  - EMPTY (no subelements) or ANY (anything can be a subelement)
• Subelement specification may have regular expressions
  <!ELEMENT bank ((account | customer | depositor)+)>
• Notation:
  - "," : sequence
  - "|" : alternatives
  - "+" : 1 or more occurrences
  - "?" : 0 or 1 occurrence
  - "*" : 0 or more occurrences

DTD example
  <!DOCTYPE bank [
    <!ELEMENT bank ((account | customer | depositor)+)>
    <!ELEMENT account (account-number, branch-name, balance)>
    <!ELEMENT customer (customer-name, customer-street, customer-city)>
    <!ELEMENT depositor (customer-name, account-number)>
    <!ELEMENT account-number (#PCDATA)>
    <!ELEMENT branch-name (#PCDATA)>
    <!ELEMENT balance (#PCDATA)>
    <!ELEMENT customer-name (#PCDATA)>
    <!ELEMENT customer-street (#PCDATA)>
    <!ELEMENT customer-city (#PCDATA)>
  ]>
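
Because content models are regular expressions over child element names, the bank DTD can be imitated with ordinary regular expressions. The following Python sketch is only an illustration of that idea, not a DTD validator: the table CONTENT_MODELS is a hand translation of the declarations above, the sample documents and their values are made up, and text content is not checked.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models: each child name is followed by one space.
CONTENT_MODELS = {
    "bank": r"(?:(?:account|customer|depositor) )+",
    "account": r"account-number branch-name balance ",
    "customer": r"customer-name customer-street customer-city ",
    "depositor": r"customer-name account-number ",
}


def conforms(elem):
    """Check the sequence of child elements against the content model."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        # Remaining elements are #PCDATA: no child elements allowed.
        return len(elem) == 0
    children = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, children) is None:
        return False
    return all(conforms(child) for child in elem)


VALID = """<bank>
  <account><account-number>A-101</account-number>
    <branch-name>Downtown</branch-name><balance>500</balance></account>
  <depositor><customer-name>Johnson</customer-name>
    <account-number>A-101</account-number></depositor>
</bank>"""

WRONG_ORDER = """<bank>
  <account><account-number>A-101</account-number>
    <balance>500</balance><branch-name>Downtown</branch-name></account>
</bank>"""

print(conforms(ET.fromstring(VALID)), conforms(ET.fromstring(WRONG_ORDER)))
```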

DTD attributes
• Attribute specification: for each attribute
  - Name
  - Type of attribute
    - CDATA
    - ID (identifier), IDREF (ID reference) or IDREFS (more on this later)
  - Whether it
    - is mandatory (#REQUIRED),
    - has a default value (value),
    - or neither (#IMPLIED)
• Examples
  <!ATTLIST account acct-type CDATA "checking">
  <!ATTLIST customer customer-id ID #REQUIRED accounts IDREFS #REQUIRED>

DTD attribute ID
• At most one attribute of type ID per element
• The ID attribute value of each element in an XML document must be distinct
  - The ID attribute value is an object identifier
• An attribute of type IDREF must contain the ID value of an element in the same document
• An attribute of type IDREFS contains a set of ID values (one or more, separated by whitespace); each of them must be the ID value of an element in the same document
• ID, IDREF, IDREFS do not designate a particular domain (no type!)
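
The ID rules amount to two checks: all ID values are distinct, and every reference resolves. A minimal Python sketch of these checks follows; it is not a validating parser. The attribute tables are assumptions for the bank example (in particular, treating account-number as the ID attribute of account is assumed here), and the sample data is invented.

```python
import xml.etree.ElementTree as ET

# Assumed attribute types for the bank example (not read from a DTD).
ID_ATTRS = {"account": "account-number", "customer": "customer-id"}
IDREFS_ATTRS = {"account": "owners", "customer": "accounts"}

BANK = """<bank>
  <account account-number="A-401" owners="C100 C102"/>
  <customer customer-id="C100" accounts="A-401"/>
  <customer customer-id="C102" accounts="A-401"/>
</bank>"""


def check_ids(xml_text):
    """Return a list of violated ID / IDREFS rules (empty if none)."""
    root = ET.fromstring(xml_text)
    seen = set()
    errors = []
    # Pass 1: ID values must be distinct across the whole document.
    for elem in root.iter():
        id_attr = ID_ATTRS.get(elem.tag)
        if id_attr and id_attr in elem.attrib:
            value = elem.attrib[id_attr]
            if value in seen:
                errors.append("duplicate ID " + value)
            seen.add(value)
    # Pass 2: every referenced value must be some element's ID.
    for elem in root.iter():
        ref_attr = IDREFS_ATTRS.get(elem.tag)
        if ref_attr and ref_attr in elem.attrib:
            for ref in elem.attrib[ref_attr].split():
                if ref not in seen:
                    errors.append("dangling reference " + ref)
    return errors


DANGLING = BANK.replace('owners="C100 C102"', 'owners="C100 C999"')
UNTYPED = BANK.replace('owners="C100 C102"', 'owners="A-401"')

print(check_ids(BANK), check_ids(DANGLING), check_ids(UNTYPED))
```

The last document passes although an account now lists itself as owner: the references are untyped, so any ID in the document is acceptable.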

DTD declaration
• External DTD declaration
  <?xml version="1.0"?>
  <!DOCTYPE bank SYSTEM "http://www.x-ag.de/banks.dtd">
  <bank> ... </bank>
• Internal DTD declaration
  <!DOCTYPE custDesc [
    <!ELEMENT custDesc (#PCDATA)>
  ]>
  <custDesc> consumer rights protagonist </custDesc>
• Mixed usage
  <!DOCTYPE bank SYSTEM "http://www.x-ag.de/banks.dtd" [
    <!ATTLIST bank Descr CDATA #REQUIRED>
  ]>
  <bank Descr=" mostly private customers and ATM"> ... </bank>

DTD limits
• No typing of text elements and attributes
  - All values are strings, no integers, reals, etc.
• Difficult to specify unordered sets of subelements
  - Order is usually irrelevant in databases
  - (A | B)* allows specification of an unordered set, but
    - cannot ensure that each of A and B occurs only once
  - How to express b, c and d in arbitrary order?
    <!ELEMENT a ((b,c,d) | (c,b,d) | (b,d,c) | ...)>
• IDs and IDREFs are untyped
  - The owners attribute of an account may contain a reference to another account, which is meaningless
  - The owners attribute should ideally be constrained to refer to customer elements
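
How quickly the "arbitrary order" workaround grows can be seen by generating the declaration. A small Python sketch, assuming nothing beyond itertools from the standard library; the function name unordered_content_model is made up for illustration.

```python
from itertools import permutations


def unordered_content_model(parent, children):
    """Spell out every ordering of the children as one alternative."""
    alternatives = ["(" + ",".join(order) + ")"
                    for order in permutations(children)]
    return "<!ELEMENT %s (%s)>" % (parent, " | ".join(alternatives))


decl = unordered_content_model("a", ["b", "c", "d"])
print(decl)

# Number of alternatives for 3, 4 and 5 children: 6, 24, 120.
sizes = [len(list(permutations(range(n)))) for n in (3, 4, 5)]
print(sizes)
```

Three subelements already need six alternatives and five need 120. In addition, the XML specification asks content models to be deterministic, so alternatives that start with the same element would have to be factored, e.g. (b,((c,d)|(d,c))), which does not make the declaration much more readable.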

XML Schema
• XML Schema (XSD): much more expressive schema language than DTDs
  - Typing of values
    - E.g. integer, string, etc.
    - Constraints on min/max values
  - User-defined types
  - Specified in XML syntax, unlike DTDs
    - More standard representation, but verbose
  - Namespace support
  - Many more features
    - List types, uniqueness and foreign key constraints, inheritance, ability to map to RDB, …
• Significantly more complicated than DTD syntax
• Use of XSD recommended

XSD example (from Silberschatz)
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="bank" type="BankType"/>
    <xsd:element name="account">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="account-number" type="xsd:string"/>
          <xsd:element name="branch-name" type="xsd:string"/>
          <xsd:element name="balance" type="xsd:decimal"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    ….. definitions of customer and depositor ….
    <xsd:complexType name="BankType">
      <xsd:sequence>
        <xsd:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element ref="customer" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:schema>

Using XML
• Data exchange
• Data management:
  - Store, retrieve, query large document sets efficiently
  - Today's solutions:
    - Mapping to RDB / ORDB / OODB
    - "Native" XML data management (not necessarily very different from storing in a conventional DB)
• Standardized data description: different extensions and applications
  - Bioinformatic Sequence Markup Language (BSML)
  - MathML
  - Scalable Vector Graphics (SVG)
  - … and many, many more
• Resource description on the Web (RDF) …

Using XML: RDF with XML syntax
• RDF model
  [Figure: www.me.de/~fritz (homepage) --Creator--> Fritz Müller; fritz@web.de --emailOf--> Fritz Müller]
• Many of these triples form a graph
• Encoded in XML:
  <?xml version="1.0"?>
  <RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:s="http://description.org/schema/">
    <Description about="http://www.me.de/~fritz">
      <s:Creator>Fritz Müller</s:Creator>
    </Description>
    <Description about="fritz@web.de">
      <s:emailOf> Fritz Müller </s:emailOf>
    </Description>
  </RDF>
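
The triples behind the XML encoding can be recovered with a few lines of code. The sketch below is a simplified reading of the slide's document using Python's xml.etree.ElementTree, not an RDF parser: it handles only Description elements with an about attribute and literal property values, and it reports predicates by local name only.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:s="http://description.org/schema/">
  <Description about="http://www.me.de/~fritz">
    <s:Creator>Fritz M\u00fcller</s:Creator>
  </Description>
  <Description about="fritz@web.de">
    <s:emailOf> Fritz M\u00fcller </s:emailOf>
  </Description>
</RDF>"""


def triples(xml_text):
    """Return (subject, predicate, object) tuples, one per property element."""
    root = ET.fromstring(xml_text)
    result = []
    for desc in root.findall("{%s}Description" % RDF_NS):
        subject = desc.get("about")
        for prop in desc:
            predicate = prop.tag.split("}", 1)[1]   # drop the namespace part
            result.append((subject, predicate, prop.text.strip()))
    return result


for triple in triples(RDF_XML):
    print(triple)
```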

Using XML
• Layout of documents?
  - XML documents have a logical structure
  - A layout structure is needed for output
  - Use a transformation language to describe device-specific transformations
  [Figure: XML doc. (data) and XML doc. (layout transformation) go into standard software (XSL processor), which produces an XML doc. (device-specific layout) for standard software (HTML browser)]
• Transformation into all kinds of languages (HTML, PDF, …) on all kinds of devices

XML transformation
• XSLT: the language used for converting XML documents into other forms
  - Describes how the document is transformed
  - Expressed as an XML document (.xsl)
  - Template rules
    - Patterns match nodes in the source document
    - Templates are instantiated to form part of the result document
• XPath for querying, sorting, etc.
• XSL-FO: language for describing layout
XSL = XSLT + XPath + XSL-FO

XML transformation: example (1)
• Document
  <sales>
    <summary>
      <heading>Scootney Publishing</heading>
      <subhead>Regional Sales Report</subhead>
      <description>Sales Report</description>
    </summary>
    <data>
      <region>
        <name>West Coast</name>
        <quarter number="1" books_sold="24000" />
        <quarter number="2" books_sold="38600" />
        <quarter number="3" books_sold="44030" />
        <quarter number="4" books_sold="21000" />
      </region>
      ...
    </data>
  </sales>

XML transformation: example (2)
• XSL style sheet: mapping to HTML (fragment; the select and test attributes hold XPath expressions)
  <xsl:param name="low_sales" select="21000"/>
  <BODY>
    <h1><xsl:value-of select="//summary/heading"/></h1>
    ...
    <table>
      <tr><th>Region\Quarter</th>
        <xsl:for-each select="//data/region[1]/quarter">
          <th>Q<xsl:value-of select="@number"/></th>
        </xsl:for-each>
        ...
      <xsl:for-each select="//data/region">
        <tr><th><xsl:value-of select="name"/></th>
          <xsl:for-each select="quarter">
            <td>
              <xsl:choose>
                <xsl:when test="number(@books_sold) &lt;= $low_sales">color:red;</xsl:when>
                <xsl:otherwise>color:green;</xsl:otherwise>
              </xsl:choose>
              <xsl:value-of select="format-number(@books_sold,'###,###')"/>
            </td>
            ...
          <td><xsl:value-of select="format-number(sum(quarter/@books_sold), '###,###')"/>
• XPath: query language on document trees
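
The XPath expressions used in the style sheet can be tried outside XSLT as well. The sketch below uses Python's xml.etree.ElementTree, which supports only a subset of XPath: path steps and the positional predicate work, while sum() and format-number() are replaced by plain Python. The shortened sales document and the constant LOW_SALES mirror the example; everything else is illustrative.

```python
import xml.etree.ElementTree as ET

SALES = """<sales>
  <summary>
    <heading>Scootney Publishing</heading>
  </summary>
  <data>
    <region>
      <name>West Coast</name>
      <quarter number="1" books_sold="24000"/>
      <quarter number="2" books_sold="38600"/>
      <quarter number="3" books_sold="44030"/>
      <quarter number="4" books_sold="21000"/>
    </region>
  </data>
</sales>"""

LOW_SALES = 21000  # same threshold as the low_sales parameter

root = ET.fromstring(SALES)

# //summary/heading
heading = root.findtext(".//summary/heading")

# //data/region[1]/quarter
quarters = root.findall(".//data/region[1]/quarter")

# sum(quarter/@books_sold), computed in Python
total = sum(int(q.get("books_sold")) for q in quarters)

# quarters that the style sheet would mark red
low_quarters = [q.get("number") for q in quarters
                if int(q.get("books_sold")) <= LOW_SALES]

# rough stand-in for format-number(..., '###,###')
formatted_total = format(total, ",")

print(heading, formatted_total, low_quarters)
```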

XML transformation: example (2)
• The result
  [Figure: resulting HTML table, not reproduced]