XML-Based Information Systems National Cheng Kung University Department of Electrical Engineering DSLab Shang-Rong Tsai
Outline • Background • XML-based databases and information systems • An XML-based Information Server
Background • What is XML? • Is XML a Database? • What is an XML Database? • What is the goal of XML Database? • What is the difference between RDB and XDB? • XML in the Web
What is XML? • XML stands for eXtensible Markup Language • XML is a textual encoding system for describing structured documents • HTML documents are SGML documents which conform to the HTML DTDs • DTDs (Document Type Definitions) are the syntax defined in SGML to describe the tag structure for a particular type of document.
What is XML? (cont.) • XML is a subset of Standard Generalized Markup Language (SGML) defined by the World Wide Web Consortium (it uses only about 10% of SGML to express 90% of SGML's power) • HTML is for presentation only • XML allows developers to define their own markup languages to express their information more meaningfully • XML lets developers describe, deliver and exchange structured data between applications, including Web servers and browsers.
The features of XML • Extensible • Self-describing • Separates data from presentation • Text based, platform neutral • Unified when documents conform to the schema of a specific domain • Integration
XML Technologies • XML/DTD • XML Namespaces • XSL/XSLT • XLink/XPointer/XPath • XML Schema • XML data query • XHTML
XML and Database • XML is basically a data format; a persistent store is still needed • Much of the information on the Web comes from databases • Data model of XML and RDBMS / OODBMS • The XML data model does not match the relational data model
XML and Database (cont.) • Schema mapping between XML documents and RDBMS • data unit as XML document/element/attribute • keys for relational tables • data type mapping • relationship between the stored tables
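The mapping decisions listed above (data unit, keys, data types, relationships between tables) can be illustrated with a small sketch. It assumes a hypothetical, simplified invoice document and two hypothetical tables, `invoice` and `billing_address`; the table layout and the function name `shred_invoice` are illustrative choices, not a mapping prescribed by these slides.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document (assumption, for illustration only).
INVOICE_XML = """
<invoice>
  <orderDate>1999-01-21</orderDate>
  <shipDate>1999-01-25</shipDate>
  <billingAddress>
    <name>Ashok Malhotra</name>
    <city>Hawthorne</city>
  </billingAddress>
</invoice>
"""

def make_db():
    """Create an in-memory database with two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE invoice (
            id INTEGER PRIMARY KEY,
            order_date TEXT,
            ship_date TEXT
        );
        CREATE TABLE billing_address (
            invoice_id INTEGER REFERENCES invoice(id),
            name TEXT,
            city TEXT
        );
    """)
    return conn

def shred_invoice(conn, xml_text):
    """Map one <invoice> document onto rows in the two tables."""
    root = ET.fromstring(xml_text)
    cur = conn.cursor()
    # Data unit: one row per <invoice> element; dates are kept as ISO text.
    cur.execute(
        "INSERT INTO invoice (order_date, ship_date) VALUES (?, ?)",
        (root.findtext("orderDate"), root.findtext("shipDate")),
    )
    invoice_id = cur.lastrowid  # generated key; the XML itself carries none
    addr = root.find("billingAddress")
    # Element nesting becomes a foreign-key relationship between tables.
    cur.execute(
        "INSERT INTO billing_address (invoice_id, name, city) VALUES (?, ?, ?)",
        (invoice_id, addr.findtext("name"), addr.findtext("city")),
    )
    conn.commit()
    return invoice_id

conn = make_db()
new_id = shred_invoice(conn, INVOICE_XML)
row = conn.execute(
    "SELECT i.order_date, a.name FROM invoice i "
    "JOIN billing_address a ON a.invoice_id = i.id WHERE i.id = ?",
    (new_id,),
).fetchone()
```

Reassembling the original document requires a join across both tables, which hints at why deeply nested or irregular XML is awkward to store relationally.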
XML and Database (cont.) • Query/update languages • Indexing and search • A new database system for XML? • XML-enabled database • Native XML database (the data is actually stored as XML internally)
Is XML a Database? • Something similar • data storage (XML documents) • DTD/Schema • Query languages (XQuery, XPath, XQL, XML-QL, QUILT, etc.) • Programming interface (DOM/SAX)
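The two programming interfaces named above, DOM and SAX, can be contrasted with a minimal sketch using Python's standard library. The sample document and the `TitleHandler` class are assumptions made for illustration. DOM builds the whole tree in memory, while SAX delivers parsing events one at a time.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document (assumption, for illustration only).
DOC = (
    "<bib>"
    "<book><title>Data on the Web</title></book>"
    "<book><title>TCP/IP Illustrated</title></book>"
    "</bib>"
)

# DOM: parse everything into a tree, then navigate it.
dom = minidom.parseString(DOC)
dom_titles = [t.firstChild.data for t in dom.getElementsByTagName("title")]

# SAX: react to start/end/character events as the parser streams through.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._buf = None  # collects text only while inside a <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buf = []

    def characters(self, content):
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buf))
            self._buf = None

handler = TitleHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_titles = handler.titles
```

Both approaches yield the same list of titles; SAX tends to suit large documents because it never holds the full tree in memory.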
Is XML a Database? (cont.) • Something it lacks • transaction • security • indexing • concurrent access • query from multiple data objects • data integrity
What is an XML Database? • Databases that store XML documents and provide a view of operational data, generally either as indexed text or as some variant of the DOM mapped to an underlying data store.
The Goal of XML Database • Resolve the mismatch between XML-structured data and the data models that RDB products support • Provide a complete solution for storing, accessing and manipulating XML documents • Make data integration and exchange easier • Support the original goal of the Web • Human communication through shared knowledge • The universe of network-accessible information • Represent data more meaningfully and clearly than HTML
Difference between RDB and XDB • Data • Table vs. XML documents • Modeling • Logical Model • Entity-Relationship vs. XML model • Physical Model • Interface • SQL vs. XQuery • Application • Transaction-based vs. Document-based
Storing and Retrieving XML Documents • File System • BLOB (Binary Large OBject) • Native XML Databases • Persistent DOMs (PDOMs) • Content Management Systems • Systems for managing fragments of human-readable documents and include support for editing, version control, and building new documents from existing fragments.
Data oriented vs. Document oriented • Data oriented • Documents that use XML as a data transport • Designed for machine consumption • Regular structure, fine-grained data, little or no mixed content • Document oriented • Designed for human consumption • Irregular structure, larger-grained data, lots of mixed content
Data oriented:
<invoice>
  <orderDate>1999-01-21</orderDate>
  <shipDate>1999-01-25</shipDate>
  <billingAddress>
    <name>Ashok Malhotra</name>
    <street>123 Microsoft Ave.</street>
    <city>Hawthorne</city>
    <state>NY</state>
    <zip>10532-0000</zip>
  </billingAddress>
  <voice>555-1234</voice>
  <fax>555-4321</fax>
</invoice>
Document oriented:
<memo importance='high' date='1999-03-23'>
  <from>Paul V. Biron</from>
  <to>Ashok Malhotra</to>
  <subject>Latest draft</subject>
  <body>
    We need to discuss the latest draft <emph>immediately</emph>.
    Either email me at <email>mailto:paul.v.biron@kp.org</email>
    or call <phone>555-9876</phone>
  </body>
</memo>
Two typical examples of XML instances
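The practical difference between the two instances is mixed content: text interleaved with child elements. As a rough sketch, the check below flags a document as containing mixed content; the function name `has_mixed_content` and the shortened sample documents are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def has_mixed_content(root):
    """Return True if any element holds both child elements and real text."""
    for elem in root.iter():
        children = list(elem)
        if not children:
            continue  # leaf elements carry plain text, not mixed content
        # Text before the first child element.
        if elem.text and elem.text.strip():
            return True
        # Text following any child element, inside the same parent.
        if any(child.tail and child.tail.strip() for child in children):
            return True
    return False

# Shortened versions of the two example instances (assumption).
invoice = ET.fromstring(
    "<invoice><orderDate>1999-01-21</orderDate><fax>555-4321</fax></invoice>"
)
memo = ET.fromstring(
    "<memo><body>We need to discuss the latest draft "
    "<emph>immediately</emph>.</body></memo>"
)

invoice_is_mixed = has_mixed_content(invoice)
memo_is_mixed = has_mixed_content(memo)
```

A heuristic like this could help decide whether a document is a better fit for table-style storage or for document-style storage.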
Taxonomy of XML Database • Native XML Database (NXD) • A database fundamentally designed to store and manipulate XML data. • Defines a (logical) model for an XML document and stores and retrieves documents according to that model. • Has an XML document as its fundamental unit of (logical) storage, just as a relational database has a row in a table as its fundamental unit of (logical) storage. • It is NOT required to have any particular underlying physical storage model. • XML Enabled Database (XEDB) • A database that has an added XML mapping layer provided either by the database vendor or a third party
Applications of XML Database • Corporate information portals • Membership databases • Product catalogs • Parts databases • Patient information tracking • Business to business document exchange
Some related standards • W3C • XML Schema • XPath • XQuery • XML:DB Initiative • XML:DB API • XUpdate
XML Schema • The purpose of a schema is to define a class of XML documents, and so the term "instance document" is often used to describe an XML document that conforms to a particular schema.
XML Schema • XML Schema is to define and describe a class of XML documents by using [schema] constructs to constrain and document the meaning, usage and relationships of their constituent parts. • Structure • Data type
<?xml version="1.0" encoding="Big5"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
            targetNamespace="http://chip.ee.ncku.edu.tw/buy">
  <xsd:element name="Message">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="request" type="xsd:string"/>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="telephone" type="phoneType" maxOccurs="1"/>
        <xsd:element name="buyitem" type="buyitemType" minOccurs="1" maxOccurs="unbounded"/>
        <xsd:element name="rule" type="xsd:string" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <!-- Definition of buy items -->
  <xsd:complexType name="buyitemType">
    <xsd:simpleContent>
      <xsd:restriction base="xsd:string">
        <xsd:attribute name="num" type="xsd:positiveInteger" use="required"/>
      </xsd:restriction>
    </xsd:simpleContent>
  </xsd:complexType>
  <!-- Definition of telephone type -->
  <xsd:simpleType name="phoneType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{2}-\d{7}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
An Example of XML Schema
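To make the constraints in this schema concrete, the sketch below hand-checks three of them against an instance document: the `\d{2}-\d{7}` telephone pattern, the requirement of at least one `buyitem`, and the required positive-integer `num` attribute. It is not an XML Schema validator. The function name `check_message` and the namespace-free sample documents are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET

PHONE_PATTERN = re.compile(r"\d{2}-\d{7}")
POSITIVE_INT = re.compile(r"\+?[0-9]+")

def check_message(xml_text):
    """Return a list of constraint violations (empty if none were found)."""
    root = ET.fromstring(xml_text)
    problems = []

    phone = root.findtext("telephone")
    # fullmatch mirrors schema pattern facets, which match the whole value.
    if phone is not None and not PHONE_PATTERN.fullmatch(phone):
        problems.append("telephone does not match the two-digit/seven-digit pattern")

    items = root.findall("buyitem")
    if not items:
        problems.append("at least one buyitem is required")
    for item in items:
        num = item.get("num")
        if num is None or not POSITIVE_INT.fullmatch(num) or int(num) < 1:
            problems.append("buyitem needs a positive integer num attribute")
    return problems

# Hypothetical instance documents, written without namespaces for brevity.
GOOD = (
    "<Message><request>buy</request><name>Lin</name>"
    "<telephone>06-2757575</telephone>"
    '<buyitem num="2">pen</buyitem></Message>'
)
BAD = (
    "<Message><request>buy</request><name>Lin</name>"
    "<telephone>6-27575</telephone>"
    '<buyitem num="0">pen</buyitem></Message>'
)

good_problems = check_message(GOOD)
bad_problems = check_message(BAD)
```

A real deployment would hand the schema to a validating parser instead; the point here is only to show what "structure" and "data type" constraints mean for an instance document.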
XPath • The primary purpose of XPath is to address parts of an XML document. • XPath is also designed so that it has a natural subset that can be used for matching. • XPath models an XML document as a tree of nodes. • Element nodes • Attribute nodes • Text nodes
Examples of XPath • Collections –‘element’ and ‘.’ • ./first-name • Selecting children and descendants –‘/’ and ‘//’ • author/first-name • bookstore//title • Collecting element children –‘*’ • author/* • book/*/last-name • Finding an attribute –‘@’ • @style • price/@exchange
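Several of these path patterns can be tried with the limited XPath subset in Python's `xml.etree.ElementTree`. The bookstore document below is a hypothetical sample built to fit the example paths. ElementTree does not return attribute nodes from a path, so the `@` examples are approximated with `get()` and an attribute predicate.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (assumption, for illustration only).
BOOKSTORE = """
<bookstore>
  <book style="novel">
    <title>Example Title</title>
    <author>
      <first-name>Ann</first-name>
      <last-name>Lee</last-name>
    </author>
    <price exchange="0.7">12</price>
  </book>
  <magazine>
    <title>Weekly</title>
  </magazine>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE)   # context node: <bookstore>
book = root.find("book")          # context node: <book>

# author/first-name : children selected step by step with '/'
first_names = [e.text for e in book.findall("author/first-name")]

# bookstore//title : all descendants, written relative to the root as .//title
all_titles = [e.text for e in root.findall(".//title")]

# author/* : every element child of <author>
author_children = [e.tag for e in book.findall("author/*")]

# book/*/last-name : <last-name> grandchildren of <book>, any parent name
last_names = [e.text for e in root.findall("book/*/last-name")]

# @style and price/@exchange : attribute values read with get()
style = book.get("style")
exchange = book.find("price").get("exchange")

# book[@style] : elements that carry a given attribute
styled_books = root.findall("book[@style]")
```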
XQuery • A query language that uses the structure of XML intelligently can express queries across all these kinds of data, whether physically stored in XML or viewed as XML via middleware.
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
  <book year="1999">
    <title>The Economics of Technology and Content for Digital TV</title>
    <editor>
      <last>Gerbarg</last><first>Darcy</first>
      <affiliation>CITI</affiliation>
    </editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>
</bib>
The XML data used in the XQuery example
for $p in distinct-values(doc("bib.xml")//publisher)
let $a := avg(doc("bib.xml")//book[publisher = $p]/price)
where $a > 100
return
  <publisher>
    <name>{ $p }</name>
    <avgprice>{ $a }</avgprice>
  </publisher>
An example of XQuery: list each publisher whose books have an average price greater than 100, together with that average price
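For readers without an XQuery processor at hand, the same query can be approximated procedurally. The sketch below is not XQuery; it groups prices by publisher and keeps the publishers whose average exceeds a threshold. The function name `expensive_publishers` and the abbreviated copy of the bibliography data (titles and authors omitted) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated copy of the bibliography data: publisher and price only.
BIB = """
<bib>
  <book year="1994"><publisher>Addison-Wesley</publisher><price>65.95</price></book>
  <book year="1992"><publisher>Addison-Wesley</publisher><price>65.95</price></book>
  <book year="2000"><publisher>Morgan Kaufmann Publishers</publisher><price>39.95</price></book>
  <book year="1999"><publisher>Kluwer Academic Publishers</publisher><price>129.95</price></book>
</bib>
"""

def expensive_publishers(xml_text, threshold=100.0):
    """Map each publisher to its average book price, if above the threshold."""
    root = ET.fromstring(xml_text)
    prices = defaultdict(list)
    # Group prices by publisher (the FOR/LET part of the query).
    for book in root.iter("book"):
        prices[book.findtext("publisher")].append(float(book.findtext("price")))
    result = {}
    for publisher, values in prices.items():
        average = sum(values) / len(values)
        if average > threshold:  # the WHERE clause
            result[publisher] = average
    return result

publishers = expensive_publishers(BIB)
```

On this data only one publisher qualifies, since the other averages fall below 100.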
XML:DB API • XML:DB API is being developed by the XML:DB Initiative to facilitate the development of applications that function with minimal change on more than one XML database. • This is roughly equivalent to the functionality provided by JDBC or ODBC for providing access to relational databases.
XUpdate • XUpdate is a specification under development by the XML:DB Initiative to enable simpler updating of XML documents. • XUpdate gives you a declarative method to insert nodes, remove nodes, and change nodes within an XML document.
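The three kinds of change mentioned above (insert, remove, change) are shown here imperatively with Python's ElementTree. This is not XUpdate syntax, which expresses such modifications declaratively in an XML document; the sample data is a hypothetical assumption used only to show what each operation does to the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (assumption, for illustration only).
root = ET.fromstring(
    "<addresses>"
    "<address id='1'><name>Ann Lee</name><town>Tainan</town></address>"
    "</addresses>"
)

# Select the target node by attribute value.
address = root.find("address[@id='1']")

# Insert a node: append a new <country> child.
country = ET.SubElement(address, "country")
country.text = "Taiwan"

# Change a node: replace the text of <town>.
address.find("town").text = "Taipei"

# Remove a node: drop the <name> child.
address.remove(address.find("name"))

remaining_tags = [child.tag for child in address]
serialized = ET.tostring(root, encoding="unicode")
```

A declarative update language lets the same three steps be shipped to the database as data, rather than coded against a tree API on the client.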
Some XML database products • Commercial • Tamino • X-Hive • Excelon • Open Source (All Java based) • Xindice (dbXML Core) • eXist • Ozone
The Original Goal of the Web • Human communication through shared knowledge. Working together: • Social efficiency, understanding and scaling • The Universe of network-accessible information
Problems of the Current Web • HTML is for presentation only • Not agent and search engine friendly • Web automation is difficult • Enter, search and click… • Integration is difficult • Data formats are neither unified nor extensible
Features of the System • A large-scale Information Server based on XML technologies • Tools for data input, query and presentation • Information/document sharing, exchange and integration • Multimedia content support • Document-oriented • A systematic way to retrieve useful and precise information • Serves as an XML data store for specific XML-based applications
The Input form generated by the Data Capture Template Processor
Epilogue • XML makes the Web easier to automate. • More and more Internet applications use XML technology. • Information sharing using XML would be more effective than the HTML approach. • XML can describe some data more appropriately than the relational model. • XML plays an important role in the database area, and more effort is being devoted to developing XML-based databases.