Digital Library: XML System Applications Jian-hua Yeh (葉建華), Assistant Professor, Department of Computer Science, Aletheia University au4290@email.au.edu.tw
Outline • XML language introduction • XML server architecture • XML query language design issues
XML Introduction • What is XML? • Why XML? • The XML power • XML and the enterprise
What is XML? • Proposed by the W3C at the end of 1996 • SGML-derived • A meta-language for defining new markup languages • XML 1.0 Recommendation released in Feb. 1998 • Supported by Sun, Microsoft, Netscape, Adobe, ArborText, etc.
What is XML? (2) • eXtensible Markup Language • Tag-based • Open and cross-platform • Structural data representation • As data and as document • Suitable for data exchange
<?xml version="1.0"?>
<invoicecollection>
  <invoice>
    <customer> Wile E. Coyote, Death Valley, CA </customer>
    <annotation> Customer asked that we guarantee return rights if these items should fail in desert conditions. This was approved by Marty Melliore, general manager. </annotation>
    <entries n="2">
      <entry quantity="2" total_price="134.00">
        <product maker="ACME" prod_name="screwdriver" price="80.00"/>
      </entry>
      <entry quantity="1" total_price="20.00">
        <product maker="ACME" prod_name="power wrench" price="20.00"/>
      </entry>
    </entries>
  </invoice>
  <invoice>
    <customer> Camp Mertz </customer>
    <entries n="2">
      <entry quantity="2" total_price="32.00">
        <product maker="BSA" prod_name="left-handed smoke shifter" price="16.00"/>
      </entry>
      <entry quantity="1" total_price="13.00">
        <product maker="BSA" prod_name="snipe call" price="13.00"/>
      </entry>
    </entries>
  </invoice>
</invoicecollection>
Why XML? • HTML is not enough: it has no capability for handling structured data • Recommended by the W3C as an open standard • The push for enterprise integration • Breaking up stovepipe systems, from vertical to horizontal • The need for B2B and B2C integration • Platform independent
Traditional Data Exchange Handling • Private protocols for stovepipe systems • Open standards for data exchange: RPC, RMI, CORBA, COM
New Strategy of Data Exchange • Text-based • Tag-oriented • Self-descriptive • Document Type Definition (DTD)
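To make the role of the DTD concrete, here is a minimal Java sketch in which the receiver of an exchanged document checks it against a DTD. It assumes the JDK's built-in JAXP parser; the `note` vocabulary and the class name `DtdCheck` are invented for illustration and do not come from the slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; only standard JAXP calls are used.
class DtdCheck {
    // Hypothetical DTD: a note must contain one <to> followed by one <body>.
    static final String DOCTYPE =
        "<!DOCTYPE note ["
      + "<!ELEMENT note (to, body)>"
      + "<!ELEMENT to (#PCDATA)>"
      + "<!ELEMENT body (#PCDATA)>"
      + "]>";

    // Returns true only if the content is well-formed and valid against the DTD.
    static boolean isValid(String content) {
        final boolean[] valid = {true};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false; // validity errors are reported here
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + content)));
        } catch (Exception e) {
            return false; // not well-formed, or could not be parsed at all
        }
        return valid[0];
    }

    public static void main(String[] args) {
        System.out.println(isValid("<note><to>A</to><body>B</body></note>"));
        System.out.println(isValid("<note><body>B</body></note>"));
    }
}
```

The second call omits the required `<to>` element, so the parser reports a validity error even though the text is well-formed XML.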
XML Details • Components: DTD, XML content • Processing models • Event-driven model (SAX): a document is treated as a sequence of events • Structural model (DOM): a document is represented as a tree structure
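The two processing models can be contrasted with a short Java sketch. It assumes the SAX and DOM parsers bundled with the JDK; the two-item `order` document and the class name `ProcessingModels` are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class contrasting SAX and DOM on the same input.
class ProcessingModels {
    // Hypothetical document used only for this illustration.
    static final String XML = "<order><item>pen</item><item>ink</item></order>";

    // SAX: the parser pushes events to a handler while reading; no tree is kept.
    static List<String> saxEvents() {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                events.add("start:" + qName);
            }
            @Override
            public void endElement(String uri, String local, String qName) {
                events.add("end:" + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(XML)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    // DOM: the whole document is loaded into a tree that can be navigated freely.
    static String domSecondItem() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return doc.getDocumentElement()
                .getElementsByTagName("item").item(1).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(saxEvents());
        System.out.println(domSecondItem());
    }
}
```

SAX suits large documents read once from start to end, while DOM trades memory for random access to any node.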
XML Server Introduction • Why an XML server? • Complies with the enterprise service model: client/middle tier/EIS structure • Common components can come from third-party software vendors: XML parser, XSL processor, etc.
XML Server Architecture (2) • Key aspects • Client: PDA, browser, Web server, other XML server, etc. • Communication protocol: Email, HTTP, FTP, EJB, RMI, IIOP, COM, etc. • Key services • Data object: relational database, object data source, etc.
XML Server Components • Client • Communication service • Document handler • Data object access module • XML core service
XML support in Java technology • XML processing • Data binding • Remote communication • Service registry • Messaging
Java for XML Processing • JAXP (Java API for XML Processing) • SAX (Simple API for XML) parser: event-based XML parsing • DOM (Document Object Model) parser: model-based XML parsing • XSLT (XSL Transformations) processor • Supports SAX, DOM, and stream-specific processing
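As an illustration of the XSLT part of JAXP, the following Java sketch runs a stylesheet over a small document using stream sources. It assumes the XSLT processor bundled with the JDK; the stylesheet, the input, and the class name `XsltSketch` are invented for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class; the javax.xml.transform calls are standard JAXP.
class XsltSketch {
    // Hypothetical input document.
    static final String XML =
        "<catalog><cd><title>Empire Burlesque</title></cd>"
      + "<cd><title>Greatest Hits</title></cd></catalog>";

    // Hypothetical stylesheet: emit each title followed by a semicolon, as text.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:for-each select=\"catalog/cd\"><xsl:value-of select=\"title\"/>;</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform() {
        try {
            // Compile the stylesheet, then apply it to the input stream.
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform());
    }
}
```

The same `Transformer` also accepts `DOMSource` and `SAXSource` inputs, which is what the slide means by SAX, DOM, and stream-specific processing.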
Java for XML Data Binding • JAXB (Java Architecture for XML Binding) • Schema-based • Validation • Representing XML content
Java for XML Communication • JAX-RPC (Java API for XML-based RPC) • RPC-based Web services • SOAP-based (Simple Object Access Protocol) • Discoverable through JAXR (covered later)
Java for XML Registries • JAXR (Java API for XML Registries) • Service registration • Service lookup
Java for XML Messaging • JAXM (Java API for XML Messaging) • Message provider • SAAJ (SOAP with Attachments API for Java) • Message population with attachment
XML Processing, How? • Locating: XPath • Querying: XQL, XQuery • Storage: XMLDB
What is XPath? • W3C standard • A syntax for defining parts of an XML document • Uses paths to define XML elements • Defines a library of standard functions • A major element in XSLT
Sample XML • Path: /catalog/cd/price • Filter (predicate): /catalog/cd[price>10.80]
<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
  <cd country="USA">
    <title>Empire Burlesque</title>
    <artist>Bob Dylan</artist>
    <price>10.90</price>
  </cd>
  <cd country="UK">
    <title>Hide your heart</title>
    <artist>Bonnie Tyler</artist>
    <price>9.90</price>
  </cd>
  <cd country="USA">
    <title>Greatest Hits</title>
    <artist>Dolly Parton</artist>
    <price>9.90</price>
  </cd>
</catalog>
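The two expressions on this slide can be tried with the XPath engine that ships with the JDK. This is a minimal sketch: the class name `CatalogXPath` and the helper `select` are hypothetical, and the catalog is embedded without indentation or the encoding declaration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; only javax.xml.xpath from the JDK is used.
class CatalogXPath {
    // The sample catalog from the slide, written on one logical line.
    static final String CATALOG =
        "<catalog>"
      + "<cd country=\"USA\"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>"
      + "<cd country=\"UK\"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>"
      + "<cd country=\"USA\"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>"
      + "</catalog>";

    // Evaluates an XPath expression and returns the text of each selected node.
    static List<String> select(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expr, new InputSource(new StringReader(CATALOG)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Plain path: every price of every cd.
        System.out.println(select("/catalog/cd/price"));
        // Predicate: titles of the cds whose price is above 10.80.
        System.out.println(select("/catalog/cd[price>10.80]/title"));
    }
}
```

Only the first `cd` costs more than 10.80, so the second query selects a single title.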
Path Syntax: Locating Nodes • /catalog/cd/price • //cd • /catalog/cd/* • /catalog/*/price • /*/*/price • //*
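A quick way to see what each location path matches is to count the selected nodes. The sketch below assumes the JDK's `javax.xml.xpath` engine and the three-CD sample catalog; the class name `LocatingNodes` and the helper `count` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class for the location paths listed on the slide.
class LocatingNodes {
    // The sample catalog (three cd elements, each with title, artist, price).
    static final String CATALOG =
        "<catalog>"
      + "<cd country=\"USA\"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>"
      + "<cd country=\"UK\"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>"
      + "<cd country=\"USA\"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>"
      + "</catalog>";

    // Returns how many nodes the expression selects.
    static int count(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expr, new InputSource(new StringReader(CATALOG)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("/catalog/cd/price")); // absolute path to every price
        System.out.println(count("//cd"));              // cd elements at any depth
        System.out.println(count("/catalog/cd/*"));     // all child elements of each cd
        System.out.println(count("/catalog/*/price"));  // price under any child of catalog
        System.out.println(count("/*/*/price"));        // price exactly two levels below the root
        System.out.println(count("//*"));               // every element in the document
    }
}
```

For this catalog, `//*` counts the root, the three `cd` elements, and their nine children.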
Path Syntax: Selecting Branches • /catalog/cd[1] • /catalog/cd[last()] • /catalog/cd[price] • /catalog/cd[price=10.90] • /catalog/cd[price=10.90]/price
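The branch-selecting predicates can be checked the same way. In this sketch, `/title` is appended to the `cd` expressions so that each selected branch is identified by its title; the class name `SelectingBranches` and the helper `select` are hypothetical, and the JDK's XPath engine is assumed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class for positional and existence predicates.
class SelectingBranches {
    static final String CATALOG =
        "<catalog>"
      + "<cd country=\"USA\"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>"
      + "<cd country=\"UK\"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>"
      + "<cd country=\"USA\"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>"
      + "</catalog>";

    // Returns the text of each node selected by the expression.
    static List<String> select(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expr, new InputSource(new StringReader(CATALOG)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("/catalog/cd[1]/title"));           // first cd (positions start at 1)
        System.out.println(select("/catalog/cd[last()]/title"));      // last cd
        System.out.println(select("/catalog/cd[price]/title"));       // cds that have a price child
        System.out.println(select("/catalog/cd[price=10.90]/title")); // cds priced at exactly 10.90
        System.out.println(select("/catalog/cd[price=10.90]/price")); // the price of those cds
    }
}
```

Note that XPath positions are 1-based, so `cd[1]` is the first `cd`, not the second.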
Path Syntax: Selecting Several Paths • /catalog/cd/title | /catalog/cd/artist • //title | //artist • //title | //artist | //price
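The `|` operator forms the union of several node sets. A minimal Java sketch, assuming the JDK's XPath engine and the sample catalog, lists the element names each union selects; the class name `SelectingSeveralPaths` and the helper `names` are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class for the union operator.
class SelectingSeveralPaths {
    static final String CATALOG =
        "<catalog>"
      + "<cd country=\"USA\"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>"
      + "<cd country=\"UK\"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>"
      + "<cd country=\"USA\"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>"
      + "</catalog>";

    // Returns the element name of each node selected by the expression.
    static List<String> names(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expr, new InputSource(new StringReader(CATALOG)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeName());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(names("/catalog/cd/title | /catalog/cd/artist"));
        System.out.println(names("//title | //artist"));
        System.out.println(names("//title | //artist | //price"));
    }
}
```

The first two expressions select the same six nodes; the third adds the three `price` elements. A node set has no duplicates, so a node matched by more than one branch appears only once.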
Path Syntax: Selecting Attributes • //@country • //cd[@country] • //cd[@*] • //cd[@country='UK']
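Attribute selection works on the same catalog. The sketch below, which assumes the JDK's XPath engine, shows that `@country` can be selected as a node in its own right or used inside a predicate; the class name `SelectingAttributes` and the helper `select` are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class for attribute paths and attribute predicates.
class SelectingAttributes {
    static final String CATALOG =
        "<catalog>"
      + "<cd country=\"USA\"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>"
      + "<cd country=\"UK\"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>"
      + "<cd country=\"USA\"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>"
      + "</catalog>";

    // Returns the text of each selected node (for an attribute, its value).
    static List<String> select(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expr, new InputSource(new StringReader(CATALOG)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("//@country"));                 // the attribute nodes themselves
        System.out.println(select("//cd[@country]/title"));       // cds that have a country attribute
        System.out.println(select("//cd[@*]/title"));             // cds that have any attribute
        System.out.println(select("//cd[@country='UK']/title"));  // cds whose country is UK
    }
}
```

Every `cd` in the sample carries a `country` attribute, so the second and third queries both match all three.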
Formal Syntax • axisname::nodetest[predicate] • child::cd[price=9.90]
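The abbreviated paths used so far are shorthand for this axis-based form. The following Java sketch, assuming the JDK's XPath engine and the sample catalog, evaluates the same query in both notations; the class name `FormalSyntax` and the helper `select` are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class comparing unabbreviated and abbreviated syntax.
class FormalSyntax {
    static final String CATALOG =
        "<catalog>"
      + "<cd country=\"USA\"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>"
      + "<cd country=\"UK\"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>"
      + "<cd country=\"USA\"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>"
      + "</catalog>";

    // Returns the text of each node selected by the expression.
    static List<String> select(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expr, new InputSource(new StringReader(CATALOG)), XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Each step is axis::nodetest[predicate]; child:: is the default axis.
        System.out.println(select("/child::catalog/child::cd[child::price=9.90]/child::title"));
        System.out.println(select("/catalog/cd[price=9.90]/title"));
        // @country abbreviates attribute::country; descendant:: searches all depths.
        System.out.println(select("/descendant::cd[attribute::country='UK']/child::artist"));
    }
}
```

The first two calls select the same titles, which shows that a bare element name in a path is just `child::name`.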
Expression Types • Numerical expressions • Equality expressions • Relational expressions • Boolean expressions
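One example of each expression type, evaluated against the sample catalog, is sketched below. It assumes the JDK's XPath engine; the class name `ExpressionTypes` and the helpers `number` and `bool` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical example class; each helper asks the engine for a typed result.
class ExpressionTypes {
    static final String CATALOG =
        "<catalog>"
      + "<cd country=\"USA\"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>"
      + "<cd country=\"UK\"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>"
      + "<cd country=\"USA\"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>"
      + "</catalog>";

    static Object eval(String expr, QName type) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(
                expr, new InputSource(new StringReader(CATALOG)), type);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static double number(String expr) {
        return (Double) eval(expr, XPathConstants.NUMBER);
    }

    static boolean bool(String expr) {
        return (Boolean) eval(expr, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) {
        // Numerical: +, -, *, div, mod
        System.out.println(number("10 div 4"));
        System.out.println(number("7 mod 3"));
        // Equality: =, !=
        System.out.println(bool("/catalog/cd[1]/price = 10.90"));
        // Relational: <, <=, >, >=
        System.out.println(bool("/catalog/cd[2]/price < 10"));
        // Boolean: and, or
        System.out.println(bool("1 = 1 and 2 > 3"));
        System.out.println(bool("1 = 1 or 2 > 3"));
    }
}
```

When an element is compared with a number, as in the equality and relational examples, XPath converts the element's text to a number first.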
XPath Function Library • Node Set Functions • String Functions • Number Functions • Boolean Functions
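The sketch below calls a few XPath 1.0 core functions from each group against the sample catalog. It assumes the JDK's XPath engine; the class name `FunctionLibrary` and the helpers `number`, `string`, and `bool` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical example class exercising XPath 1.0 core functions.
class FunctionLibrary {
    static final String CATALOG =
        "<catalog>"
      + "<cd country=\"USA\"><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>"
      + "<cd country=\"UK\"><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>"
      + "<cd country=\"USA\"><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>"
      + "</catalog>";

    static Object eval(String expr, QName type) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(
                expr, new InputSource(new StringReader(CATALOG)), type);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static double number(String expr) {
        return (Double) eval(expr, XPathConstants.NUMBER);
    }

    static String string(String expr) {
        return (String) eval(expr, XPathConstants.STRING);
    }

    static boolean bool(String expr) {
        return (Boolean) eval(expr, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) {
        // Node set functions
        System.out.println(number("count(//cd)"));
        // String functions
        System.out.println(string("concat(/catalog/cd[1]/artist, ': ', /catalog/cd[1]/title)"));
        System.out.println(number("string-length(/catalog/cd[1]/title)"));
        System.out.println(bool("contains(/catalog/cd[3]/title, 'Hits')"));
        // Number functions
        System.out.println(number("round(sum(//price))"));
        // Boolean functions
        System.out.println(bool("not(//cd[price > 20])"));
    }
}
```

The sum of the prices is rounded here because XPath numbers are floating-point values, so the raw sum may not print as an exact decimal.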
XQL: XML Query Language • XQL problem domains • Queries, search contexts, and result sets • Result sets vs. result documents
XQL Introduction • Developers: Texcel, webMethods, Microsoft • Traditional query processing • Features of XML documents
Traditional Query Processing • Structured query: for relational databases and object-oriented databases • Unstructured full-text query: for text documents
Features of XML Documents • As documents • As data sources • With structural features