What can ONE instruction do? Enabling your XML applications to efficiently process gigabyte-level XML documents! XML Evolution: Two-phase XML Processing Model Using XML Prefiltering Techniques. VLDB 2006, September 12-15, 2006
What can ONE instruction do? Enabling your XML applications to efficiently process gigabyte-level XML documents!
XML Evolution: Two-phase XML Processing Model Using XML Prefiltering Techniques. VLDB 2006, September 12-15, 2006. Chia-Hsin Huang, Tyng-Ruey Chuang, James J. Lu, and Hahn-Ming Lee
How much do you know about DOM and SAX in XML processing? They cannot process large XML documents efficiently!
DOM Processing Model XPath expression: /html/body/ul/li/text Source: http://www.cee.hw.ac.uk/~alison/netapp/dom/sld006.htm
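The DOM model can be illustrated with a minimal sketch: the whole document is parsed into an in-memory tree before any path step is evaluated. The sample document and the `select` helper are assumptions made for illustration; Python's `xml.dom.minidom` has no XPath engine, so the child-axis steps of /html/body/ul/li are walked by hand.

```python
from xml.dom import minidom

# Hypothetical sample document; only the <li> elements are of interest.
DOC = "<html><body><ul><li>one</li><li>two</li></ul><p>skip</p></body></html>"

def select(node, steps):
    """Follow child-axis steps from node and return the matching elements."""
    current = [node]
    for name in steps:
        current = [child
                   for parent in current
                   for child in parent.childNodes
                   if child.nodeType == child.ELEMENT_NODE and child.tagName == name]
    return current

# The entire document is turned into a tree first, including the
# uninteresting <p> fragment that the query never touches.
dom = minidom.parseString(DOC)
items = select(dom, ["html", "body", "ul", "li"])
texts = [li.firstChild.data for li in items]
print(texts)
```

Because the tree is built before the query runs, memory use grows with the size of the whole document rather than with the size of the result.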
SAX Processing Model XPath expression: //entry[@id="a2"] Source: http://www.informatik.hu-berlin.de/~obecker/Lehre/SS2002/XML/images/sax.t.gif
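A minimal sketch of the SAX model, assuming a small sample document: the parser pushes events to a handler in document order, and the handler keeps only the text of the entry whose id is "a2". The `EntryHandler` class and the `events_seen` counter are illustrative names, not part of the original slides.

```python
import xml.sax

# Hypothetical sample document with two entries.
DOC = b'<list><entry id="a1">first</entry><entry id="a2">second</entry></list>'

class EntryHandler(xml.sax.ContentHandler):
    """Collects the text of <entry> elements whose id matches wanted_id."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.inside = False      # True while inside a matching <entry>
        self.events_seen = 0     # counts every start-element event delivered
        self.text = []

    def startElement(self, name, attrs):
        self.events_seen += 1
        if name == "entry" and attrs.get("id") == self.wanted_id:
            self.inside = True

    def characters(self, content):
        if self.inside:
            self.text.append(content)

    def endElement(self, name):
        if name == "entry":
            self.inside = False

handler = EntryHandler("a2")
xml.sax.parseString(DOC, handler)
result = "".join(handler.text)
print(result, handler.events_seen)
```

The handler still receives a start-element event for every element in the document, matching or not, and it cannot go back to an earlier event once the parser has moved on.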
Problems in Standard DOM and SAX Processing Models
• Both DOM and SAX processing models waste a large amount of computational resources processing uninteresting fragments.
• They may not be able to query large XML documents efficiently.
• (Size of a DOM tree) : (Size of the XML document) = 5 : 1
• SAX cannot parse a document in a random-access manner.
• No backtracking mechanism (forward-only parsing)
• Lack of interactive mechanisms
XML Processing Enhancements
XML Applications
• Unchangeable?
• Or a few modifications!
Requirements?
XML Standards
• Unchangeable!?
Issues in Existing XML Processing Enhancements
• Consume a large amount of disk/memory space and CPU time (Cost: $)
• Large-scale (Cost: $)
• Integrate with a relational database (Cost: $$$)
• Complicated index/query algorithms (Cost: $$$$)
• Intrusive (considerable modifications) (Cost: $$$$$)
• Non-transparent (apps. need to be aware of the mechanics) (Cost: $$$$$)
The Simplest Solution: Two-phase XML Processing Models using XML Prefiltering Techniques
XML Prefiltering Technique: The Solution
[Diagram labels: XPath expression (issued by users' apps.); prefiltering techniques (a tiny search engine); candidate-set XML document; XML parsers (DOM/SAX); XML document]
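The two-phase idea can be sketched as follows. This is a deliberately naive stand-in, not the authors' prefilter: phase 1 here is a plain regular-expression scan that assumes non-nested, non-self-closing elements, whereas the slides describe an index-based "tiny search engine". The names `prefilter`, `query`, and the `<candidates>` wrapper element are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def prefilter(raw_xml, tag):
    """Phase 1 (hypothetical): gather <tag>...</tag> fragments by text search
    and wrap them in one small candidate-set document."""
    name = re.escape(tag)
    pattern = re.compile(r"<%s\b[^>]*>.*?</%s>" % (name, name), re.DOTALL)
    fragments = pattern.findall(raw_xml)
    return "<candidates>" + "".join(fragments) + "</candidates>"

def query(raw_xml, tag, attr, value):
    """Phase 2: hand the candidate-set document to a standard parser."""
    root = ET.fromstring(prefilter(raw_xml, tag))
    return [e.text for e in root.iter(tag) if e.get(attr) == value]

# Hypothetical document: a large uninteresting fragment plus two entries.
RAW = ("<list><noise>" + "x" * 1000 + "</noise>"
       '<entry id="a1">first</entry><entry id="a2">second</entry></list>')

hits = query(RAW, "entry", "id", "a2")   # roughly //entry[@id="a2"]
print(hits, len(prefilter(RAW, "entry")), len(RAW))
```

The parser in phase 2 only ever sees the candidate-set document, so the large `<noise>` fragment is never turned into tree nodes or parser events.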
Two-phase XML Processing Model – Enhanced DOM-based Applications
Two-phase XML Processing Model – Prefiltering XPath Processor
Two-phase XML Processing Model – Enhanced SAX-based Applications
Two-phase XML Processing Model – Stream-based XPath Processor
Characteristics of the XML Prefiltering Technique
• Correct
• Small-scale
• Lightweight
• Efficient
• Transparent
• Non-intrusive?
• User applications or XML processors need only a few (one or two) added instructions
Demonstrations http://www.iis.sinica.edu.tw/~jashing/prefiltering
Demo Items
• System modules (http://www.iis.sinica.edu.tw/~jashing/prefiltering/Download.htm)
  • Indexer
  • Query Simplifier
  • Fast Lightweight Steps-Axes Analyzer
  • Fragment Gatherer
  • Micro XML Streaming Parser (an interactive streaming parser)
• An Application (http://www.iis.sinica.edu.tw/~jashing/prefiltering/Applications.htm)
  • GML-based Web GIS (Chia-Hsin Huang, Tyng-Ruey Chuang, Dong-Po Deng, and Hahn-Ming Lee, "Efficient GML-native Processors for Web-based GIS: Techniques and Tools," to appear in the Proc. of ACM-GIS'06)