XML: Extensible Markup Language FST-UMAC Gong Zhiguo
How the Web is Today • HTML documents • all intended for human consumption • many generated automatically by applications Easy to fetch any Web page, from any server, any platform Gong Z.G.
Limits of the Web Today • Applications cannot consume HTML • HTML wrapper technology (screen scraping) is brittle • OO technology (CORBA) requires a controlled environment • Companies merge and form partnerships; they need interoperability fast
Paradigm Shift on the Web • new Web standard: XML • XML generated by applications • XML consumed by applications • data exchange across platforms (enterprise interoperability) and across enterprises • the Web: from a collection of documents to data and documents
XML • a W3C standard to complement HTML • origins: structured text (SGML) • motivation: HTML describes presentation, XML describes content • http://www.w3.org/TR/REC-xml (2/98)
From HTML to XML: HTML describes the presentation
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu <br> Addison Wesley, 1995
<p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu <br> Morgan Kaufmann, 1999
XML
<bibliography>
  <book>
    <title> Foundations… </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  …
</bibliography>
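To illustrate how an application can consume this markup directly, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The sample document is an assumption: it follows the slide's structure, with the full title and the second book taken from the HTML slide for illustration.

```python
import xml.etree.ElementTree as ET

# Sample data modeled on the slide (assumption: second book and full
# titles are filled in from the HTML version of the bibliography).
BIB_XML = """
<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
  <book>
    <title>Data on the Web</title>
    <author>Abiteboul</author>
    <author>Buneman</author>
    <author>Suciu</author>
    <publisher>Morgan Kaufmann</publisher>
    <year>1999</year>
  </book>
</bibliography>
"""


def list_books(xml_text):
    """Return (title, authors, year) for every <book> under the root."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        title = book.findtext("title").strip()
        authors = [a.text.strip() for a in book.findall("author")]
        year = int(book.findtext("year"))
        books.append((title, authors, year))
    return books
```

No screen scraping is needed: the tags name the content, so the program asks for `title`, `author`, and `year` by name.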
XML Terminology • tags: book, title, author, … • start tag: <book>, end tag: </book> • elements: <book>…</book>, <author>…</author> • elements are nested • empty element: <red></red>, abbreviated <red/> • an XML document has a single root element • a well-formed XML document has matching, properly nested tags
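Well-formedness can be checked mechanically by any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the text parses as XML: a single root element and
    matching, properly nested start and end tags."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False
```

For example, `<book><title>T</title></book>` and `<red/>` pass, while `<book><title>T</book></title>` (badly nested) and `<a/><b/>` (two roots) do not.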
More XML: Attributes
<book price="55" currency="USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  …
  <year> 1995 </year>
</book>
Attributes are an alternative way to represent data.
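A short sketch of how the two representations look to a program, assuming Python's `xml.etree.ElementTree`. The document is the slide's example with the elided part left out; the `isbn` lookup is a hypothetical attribute used only to show what a missing attribute returns.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """
<book price="55" currency="USD">
  <title>Foundations of Databases</title>
  <author>Abiteboul</author>
  <year>1995</year>
</book>
"""


def describe(xml_text):
    """Read data held in attributes and in child elements."""
    book = ET.fromstring(xml_text)
    return {
        "price": int(book.get("price")),    # attribute: value is a string
        "currency": book.get("currency"),   # attribute
        "year": int(book.findtext("year")), # child element text
        "isbn": book.get("isbn"),           # absent attribute gives None
    }
```

Attributes are unordered and occur at most once per element, whereas child elements can repeat (several `author` elements) and carry nested structure.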
Query Languages: Motivation • granularity of the HTML Web: one file • granularity of Web data varies: • single data item: “get John’s salary” • entire database: “get all salaries” • aggregates: “get average salary” • need query language to define granularity
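The three granularities can be sketched over a small document. This is only an illustration in Python, not a query language; the payroll document and its element names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical payroll data; element names are assumptions.
PAYROLL = """
<employees>
  <employee><name>John</name><salary>50000</salary></employee>
  <employee><name>Mary</name><salary>70000</salary></employee>
</employees>
"""
root = ET.fromstring(PAYROLL)


def salary_of(name):
    """Single data item: one employee's salary, or None if not found."""
    for emp in root.findall("employee"):
        if emp.findtext("name") == name:
            return int(emp.findtext("salary"))
    return None


def all_salaries():
    """Entire database: every salary in document order."""
    return [int(s.text) for s in root.iter("salary")]


def average_salary():
    """Aggregate: the mean of all salaries."""
    salaries = all_salaries()
    return sum(salaries) / len(salaries)
```

A query language lets the client state which of these it wants, instead of fetching the whole file and computing the answer locally.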
XML-QL: A Query Language for XML • http://www.w3.org/TR/NOTE-xml-ql (8/98) • features: • regular path expressions • patterns, templates • Skolem functions • based on the OEM data model
Pattern Matching in XML-QL
where <book language="french">
        <publisher> <name> Morgan Kaufmann </name> </publisher>
        <author> $a </author>
      </book> in "www.a.b.c/bib.xml"
construct $a
Simple Constructors in XML-QL
where <book language=$l>
        <author> $a </>
      </> in "www.a.b.c/bib.xml"
construct <result> <author> $a </> <lang> $l </> </>
Note: </> abbreviates </book> or </result> or ...
Result:
<result> <author>Smith</author> <lang>English</lang> </result>
<result> <author>Smith</author> <lang>Mandarin</lang> </result>
<result> <author>Doe</author> <lang>English</lang> </result>
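The semantics of this query can be approximated in Python: bind `$l` and `$a` for every matching book and author, then build one `result` element per binding. This is a sketch of the idea, not an XML-QL implementation; the input document is an assumption chosen to reproduce the three results on the slide.

```python
import xml.etree.ElementTree as ET

# Assumed contents of bib.xml, chosen to yield the slide's results.
BIB = """
<bib>
  <book language="English"><author>Smith</author></book>
  <book language="Mandarin"><author>Smith</author></book>
  <book language="English"><author>Doe</author></book>
</bib>
"""


def author_language_results(xml_text):
    """Emulate: where <book language=$l><author>$a</></>
    construct <result><author>$a</><lang>$l</></>"""
    root = ET.fromstring(xml_text)
    results = []
    for book in root.iter("book"):
        lang = book.get("language")          # binds $l
        if lang is None:
            continue                         # pattern needs the attribute
        for author in book.findall("author"):  # binds $a, once per author
            result = ET.Element("result")
            ET.SubElement(result, "author").text = author.text
            ET.SubElement(result, "lang").text = lang
            results.append(ET.tostring(result, encoding="unicode"))
    return results
```

Each (book, author) pair that matches the pattern produces its own `result`, which is why Smith appears twice.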
Schemas in XML • Document Type Definition (DTD) • XML Schema • RDF Schema
Document Type Definition: DTD • part of the original XML specification • an XML document may have a DTD • terminology for XML: • well-formed: if tags are correctly closed • valid: if it has a DTD and conforms to it • validation is useful in data exchange
DTDs as Grammars
<!DOCTYPE paper [
  <!ELEMENT paper (section*)>
  <!ELEMENT section ((title,section*) | text)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>
<paper>
  <section> <text> </text> </section>
  <section>
    <title> </title>
    <section> … </section>
    <section> … </section>
  </section>
</paper>
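Because the DTD is a grammar, checking validity amounts to recursive matching of each element's children against its content model. Python's standard library does not validate against DTDs, so the sketch below hand-codes the four rules for this particular DTD; it is an illustration of the idea, not a general validator, and it ignores stray text between child elements.

```python
import xml.etree.ElementTree as ET


def is_pcdata(elem):
    """(#PCDATA): text only, no child elements."""
    return len(elem) == 0


def valid_section(elem):
    """section ((title, section*) | text)"""
    children = list(elem)
    tags = [child.tag for child in children]
    if tags == ["text"]:
        return is_pcdata(children[0])
    if tags and tags[0] == "title":
        return is_pcdata(children[0]) and all(
            child.tag == "section" and valid_section(child)
            for child in children[1:])
    return False


def valid_paper(xml_text):
    """paper (section*)"""
    root = ET.fromstring(xml_text)
    return root.tag == "paper" and all(
        child.tag == "section" and valid_section(child) for child in root)
```

The recursion in `valid_section` mirrors the recursion in the grammar: a section with a title may contain further sections to any depth.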
DTDs as Schemas Not so well suited: • impose unwanted constraints on order: <!ELEMENT person (name,phone)> • references cannot be constrained • can be too vague: <!ELEMENT person ((name|phone|email)*)>
XML Storage • text file (XML) • store in ternary relation • use DTD to derive schema • mine data to derive schema • build special purpose repository (Lore)
XML Storage: Text File • advantages: • simple • less space than one thinks • reasonable clustering • disadvantages: • no updates • requires a special-purpose query processor
Store XML in Ternary Relation [Florescu, Kossmann 1999] [Figure: a data graph with nodes &o1 to &o6, edges labeled paper, year, title, author, author, and leaf values “…”, “…”, “1986”, shown alongside two relations, Ref and Val.]
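One way to picture this storage scheme is to shred a document into edge triples and leaf values. The sketch below assumes that Ref holds (source, label, destination) triples and Val holds (node, value) pairs, and that &o1 is a root node above the paper element; the figure does not spell out the columns, so these are assumptions, as are the sample title and author values.

```python
import xml.etree.ElementTree as ET
from itertools import count


def shred(xml_text):
    """Return (ref, val). ref: (source, label, destination) triples,
    one per element. val: (node, value) pairs for leaf elements."""
    ids = count(1)
    ref, val = [], []

    def visit(elem, parent_oid):
        oid = f"&o{next(ids)}"
        ref.append((parent_oid, elem.tag, oid))
        if len(elem) == 0:                      # leaf: store its text
            val.append((oid, (elem.text or "").strip()))
        for child in elem:
            visit(child, oid)

    root_oid = f"&o{next(ids)}"                 # assumed root object &o1
    visit(ET.fromstring(xml_text), root_oid)
    return ref, val


def values_of(ref, val, label):
    """Join Ref with Val: leaf values reached over edges with this label."""
    values = dict(val)
    return [values[dest] for _, lbl, dest in ref
            if lbl == label and dest in values]
```

With the input `<paper><title>T</title><author>A1</author><author>A2</author><year>1986</year></paper>`, Ref starts with `("&o1", "paper", "&o2")` and `values_of(ref, val, "year")` returns `["1986"]`. Any document fits the same two relations, at the cost of one join per step of a path query.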
Use DTD to derive Schema [Christophides et al. 1994, Shanmugasundaram et al. 1999]
• DTD:
  <!ELEMENT employee (name, address, project*)>
  <!ELEMENT address (street, city, state, zip)>
• ODMG classes:
  class Employee public type tuple (name:string, address:Address, project:List(Project))
  class Address public type tuple (street:string, …)
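The same derivation can be sketched with Python dataclasses standing in for the ODMG classes. The `Address` fields follow the DTD; the content of `project` is not defined on the slide, so `Project` with a single `name` field is an assumption, as is the loader function.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Address:              # <!ELEMENT address (street, city, state, zip)>
    street: str
    city: str
    state: str
    zip: str


@dataclass
class Project:
    name: str               # assumption: project is treated as text-only


@dataclass
class Employee:             # <!ELEMENT employee (name, address, project*)>
    name: str
    address: Address
    project: List[Project] = field(default_factory=list)  # project* -> list


def employee_from_xml(xml_text):
    """Load one <employee> element into the derived classes."""
    elem = ET.fromstring(xml_text)
    addr = elem.find("address")
    address = Address(*(addr.findtext(tag)
                        for tag in ("street", "city", "state", "zip")))
    projects = [Project(p.text) for p in elem.findall("project")]
    return Employee(elem.findtext("name"), address, projects)
```

A sequence such as `(name, address, project*)` becomes a tuple of fields, and a starred child becomes a list.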
Mine Data to Derive Schema [Deutsch et al. 1999] [Figure: a data graph of several paper elements with year, title, author, fn, and ln labels, shown alongside the labels Paper1 and Paper2.]
XML and Databases (1) • “Is XML a database?” • In a strict sense, no. • In a more liberal sense, yes, but … • XML has: • Storage (the XML document) • A schema (DTD) • Query languages (XQL, XML-QL, …) • Programming interfaces (SAX, DOM) • XML lacks: • Efficient storage, indexes, security, transactions, multi-user access, triggers, queries across multiple documents
XML and Databases (2) • Data versus Documents • There are two ways to use XML in a database environment: • Use XML as a data transport, i.e., to get data in and out of the database • Data is stored in a relational or object-oriented database • Middleware converts between the database and XML • Use a “native XML” database, i.e., store data in document form • Use a content management system
XML and Databases (3) • Data-centric documents • Fairly regular structure • Fine-grained data • Little or no mixed content • Order of sibling elements often not significant • Document-centric documents • Irregular structure • Larger-grained data • Lots of mixed content • Order of sibling elements is significant
XML and Databases (4) • Data-centric storage and retrieval systems • Use a database • Add middleware to convert to/from XML • Use an XML server (specialized product for e-commerce) • Use an XML-enabled web server with a database backend • Document-centric storage and retrieval systems • Content management system • Persistent DOM implementation
XML and Databases (5) • Mapping document structure to database structure • Template-driven • No predefined mapping • Embedded commands process (retrieve) data • Currently only available from RDBMS to XML
<?xml version="1.0"?>
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>
XML and Databases (6) • Template-driven - Example result:
<?xml version="1.0"?>
<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <Flights>
    <Row>
      <Airline>ACME</Airline>
      <FltNumber>123</FltNumber>
      <Depart>Dec 12, 2000, 13:43</Depart>
      <Arrive>Dec 13, 2000, 01:21</Arrive>
    </Row>
  </Flights>
  <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>
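A template processor of this kind can be sketched in a few lines of Python with `sqlite3` and `xml.etree.ElementTree`. This is an illustration under assumptions, not any product's implementation: the function names, the in-memory demo database, and the choice to name the wrapper element via a `table_tag` parameter are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEMPLATE = """<FlightInfo>
  <Intro>The following flights have available seats:</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>"""


def make_demo_db():
    """Hypothetical in-memory database holding the slide's one flight."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Flights "
                 "(Airline TEXT, FltNumber TEXT, Depart TEXT, Arrive TEXT)")
    conn.execute("INSERT INTO Flights VALUES (?, ?, ?, ?)",
                 ("ACME", "123", "Dec 12, 2000, 13:43", "Dec 13, 2000, 01:21"))
    return conn


def expand_template(template, conn, table_tag="Flights"):
    """Replace each <SelectStmt> with <table_tag><Row>...</Row></table_tag>,
    one <Row> per result row and one child element per column."""
    root = ET.fromstring(template)
    for parent in list(root.iter()):            # snapshot before editing
        for index, child in enumerate(list(parent)):
            if child.tag != "SelectStmt":
                continue
            cursor = conn.execute(child.text)   # template must be trusted
            columns = [d[0] for d in cursor.description]
            table = ET.Element(table_tag)
            for row in cursor:
                row_elem = ET.SubElement(table, "Row")
                for column, value in zip(columns, row):
                    ET.SubElement(row_elem, column).text = str(value)
            table.tail = child.tail
            parent[index] = table
    return root
```

Calling `expand_template(TEMPLATE, make_demo_db())` yields a tree whose children are `Intro`, `Flights`, and `Conclude`. The template text around the query passes through untouched, which is why no predefined mapping is needed.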
XML and Databases (7) • Mapping document structure to database structure • Model-driven • A data model is imposed on the structure of the XML document • This model is mapped to the structures in the database • There are two common models: • Model the XML document as a single table or a set of tables • Model the XML document as a tree of data-specific objects (good for OODBMS mapping)
XML and Databases (8)
• Single table or set of tables:
<?xml version="1.0"?>
<database>
  <table>
    <row>
      <column1>...</column1>
      <column2>...</column2>
      ...
    </row>
  </table>
</database>
• Tree organization:
            Orders
              |
          SalesOrder
         /    |    \
  Customer   Item   Item
              |      |
             Part   Part
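Under the table model, loading a document into a relational database is a mechanical walk over tables, rows, and columns. The Python sketch below assumes that each child of `<database>` is named after its table and that every row of a table lists the same columns; the sample document and the function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document following the table model.
DOC = """<database>
  <Flights>
    <row><Airline>ACME</Airline><FltNumber>123</FltNumber></row>
    <row><Airline>ACME</Airline><FltNumber>456</FltNumber></row>
  </Flights>
</database>"""


def quote(name):
    """Quote an SQL identifier taken from an element name."""
    return '"' + name.replace('"', '""') + '"'


def load_tables(xml_text, conn):
    """Create one table per child of <database>, one SQL row per <row>."""
    root = ET.fromstring(xml_text)
    for table in root:
        for row in table.findall("row"):
            columns = ", ".join(quote(col.tag) for col in row)
            marks = ", ".join("?" for _ in row)
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {quote(table.tag)} ({columns})")
            conn.execute(
                f"INSERT INTO {quote(table.tag)} ({columns}) VALUES ({marks})",
                [col.text for col in row])
```

The reverse direction (database to XML) is the same walk in the other order, which is why this model suits data-centric documents with regular structure.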
XML and Databases (9) • Generating DTDs from a database schema and vice versa • An application's DTD often changes rarely, so it does not need to be generated automatically. • Some simple conversions are possible • Example: DTD from relational schema: • For each table, create an ELEMENT. • For each column in a table, create an attribute or a PCDATA-only child ELEMENT. • For each primary key/foreign key relationship in which a column of the table contributes the primary key, create a child ELEMENT.
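The three rules can be sketched as a small generator. This Python version makes several assumptions beyond the slide: columns always become PCDATA-only child elements (not attributes), a referencing table becomes a repeatable child (`*`) of the table that holds the primary key, and table and column names do not collide. The schema used in the example is hypothetical.

```python
def dtd_from_schema(tables, foreign_keys):
    """tables: {table name: [column names]} (each list non-empty).
    foreign_keys: [(referencing table, table holding the primary key)].
    Returns DTD element declarations as text."""
    lines = []
    declared = set()
    for table, columns in tables.items():
        # Rule 2: each column becomes a child element.
        children = list(columns)
        # Rule 3: each table referencing this one becomes a child element.
        children += [child + "*" for child, parent in foreign_keys
                     if parent == table]
        # Rule 1: one ELEMENT per table.
        lines.append(f"<!ELEMENT {table} ({', '.join(children)})>")
        for column in columns:
            if column not in declared:
                lines.append(f"<!ELEMENT {column} (#PCDATA)>")
                declared.add(column)
    return "\n".join(lines)


# Hypothetical schema: Item rows reference SalesOrder by foreign key.
EXAMPLE_DTD = dtd_from_schema(
    {"SalesOrder": ["Number", "Date"], "Item": ["ItemNumber", "Quantity"]},
    [("Item", "SalesOrder")],
)
```

For this schema the first declaration is `<!ELEMENT SalesOrder (Number, Date, Item*)>`, followed by PCDATA declarations for the columns and the `Item` element.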
XML and Databases (10) • Document-centric storage and retrieval systems • Content management system • Allows the storage of discrete content fragments, such as examples, procedures, chapters, as well as metadata such as author names, revision dates, etc. • Many content management systems are built on top of relational or object-oriented database systems. • Examples: • BladeRunner (Interleaf), SigmaLink (STEP), Parlance Content Manager (XyEnterprise), Target 2000 (Progressive Information Technology) • Persistent DOM implementation
Further Readings • www.w3.org/XML • www-db.stanford.edu/~widom • www-rocq.inria.fr/~abiteboul • db.cis.upenn.edu • www.research.att.com/~suciu • Abiteboul, Buneman, Suciu: Data on the Web: From Relations to Semistructured Data and XML, Morgan Kaufmann, 1999 (appears in October)