Streaming XPath Engine
Oleg Slezberg, Amruta Joshi
Overview
• Motivation
  • Querying Streaming XML
  • XPath Challenges (predicates, //, nesting…)
• Basic Objective
  • Comparative Analysis of Algorithms
• Implementation
  • Implemented engine in Java using JDK 1.4.2
  • Apache Xerces 2.6.2 for parsing (both XML and XPath)
  • Used existing XSQ Java implementation
  • Benchmark for evaluation: XPathMark
XStream
• Builds a parse tree for the input query
• Maintains an event stack
• Keeps matching the streaming input document against each query node
Our Contributions
• Correction
• Verification
• Performance Figures
• Recursive Query Handling
• Query Evaluation Support
Performance
• Benchmark: XPathMark, a set of 23 queries (mostly predicate queries)
• Criterion: queries per second (QPS)
• Test Setup: run on elaine2, a 2-CPU 900 MHz machine
• Results:
  • XSQ: QPS 4.39, Coverage 17%
  • TurboXPath: QPS 5.75, Coverage 21%+
  • Time = XML Parsing + Processing
• QPS: XStream 30% faster + better coverage on the given benchmark
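The 30% figure can be sanity-checked against the two QPS values in the results. The sketch below assumes the 5.75 QPS row belongs to the engine being compared against XSQ; the variable names are illustrative only.

```python
# QPS figures copied from the results above.
xsq_qps = 4.39
other_qps = 5.75  # assumption: the engine compared against XSQ

# Relative speedup in queries per second, as a percentage.
speedup_percent = (other_qps - xsq_qps) / xsq_qps * 100
print(f"{speedup_percent:.0f}% more queries per second than XSQ")
```

Under that assumption the ratio comes out close to the quoted 30%.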
Recursive Query Handling
• Recursion arises when, for query node n and elements e1, e2 in document d:
  • Both e1 and e2 match n
  • e1 contains e2
• Example:
  • Document <a><a><b/></a><b></b></a>
  • Query //a/b
  • Both a elements match the query node a, and the outer a contains the inner a
• FA-based algorithms
  • Exponential number of states
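The example can be traced with a small stack-based pass over the document. This is a minimal sketch hard-wired to the query //a/b, not the engine's algorithm; the function name and the path strings it returns are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

def match_a_slash_b(xml_text):
    """Report every b element whose parent is an a element."""
    stack = []    # tags of the currently open elements
    matches = []  # path of open tags leading to each matching b
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # A b matches //a/b when the innermost open element is an a.
            if elem.tag == "b" and stack and stack[-1] == "a":
                matches.append("/".join(stack + ["b"]))
            stack.append(elem.tag)
        else:
            stack.pop()
    return matches

matches = match_a_slash_b("<a><a><b/></a><b></b></a>")
print(matches)
```

Both b elements are reported: one under the inner a and one under the outer a. The stack depth grows only with document depth, which is the property a stack-based approach relies on when nested elements match the same query node.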
Query Evaluation Support
• 2 Questions:
  • Filtering
    • Does this document match the query?
    • F1: XML => boolean
  • Evaluation
    • What parts of the document match the query?
    • F2: XML => XML
• Modifications:
  • Output buffers for predicate owner
  • Predicate node buffers
  • Predicate evaluation
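The difference between filtering and evaluation, and the need for output buffers, can be illustrated with a hypothetical query //a[c]/b. A b child cannot be emitted until its parent a (the predicate owner) is known to have a c child, so candidates are buffered per open a. The sketch below assumes this query and uses illustrative names throughout; it is not the engine's implementation.

```python
import io
import xml.etree.ElementTree as ET

def evaluate(xml_text):
    """Evaluation (XML => XML): return the b elements selected by //a[c]/b."""
    frames = []  # one frame per open element
    output = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            frames.append({"tag": elem.tag, "has_c": False, "buffer": []})
            continue
        frame = frames.pop()
        parent = frames[-1] if frames else None
        if parent is not None and parent["tag"] == "a":
            if frame["tag"] == "c":
                parent["has_c"] = True  # predicate [c] holds for the parent a
            elif frame["tag"] == "b":
                # Candidate output: hold it until the parent's predicate is decided.
                parent["buffer"].append(ET.tostring(elem, encoding="unicode"))
        if frame["tag"] == "a" and frame["has_c"]:
            output.extend(frame["buffer"])  # flush when the owner closes
    return output

def matches(xml_text):
    """Filtering (XML => boolean): does the document match the query at all?"""
    return bool(evaluate(xml_text))

doc = "<r><a><b>1</b><c/></a><a><b>2</b></a></r>"
print(evaluate(doc), matches(doc))
```

In this sketch the buffered b of the second a is discarded because that a never sees a c child. Results are emitted as each a closes, so with nested a elements the output order may differ from document order.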
Multiple Simultaneous Queries
• Combine the queries by OR-ing them together:
  • q = (q1) | (q2) | … | (qn);
• The resulting query has multiple output nodes
• Associate a query-id with each output node
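The combination step can be sketched as simple string construction plus a query-id table. The helper name and the choice of 1-based integer ids are assumptions for illustration; how the engine attaches ids to output nodes in its parse tree is not shown here.

```python
def combine_queries(queries):
    """Return the OR-ed query string and a table mapping query-id to query."""
    query_ids = {qid: q for qid, q in enumerate(queries, start=1)}
    combined = " | ".join(f"({q})" for q in queries)
    return combined, query_ids

combined, query_ids = combine_queries(["//a/b", "//c[d]"])
print(combined)
print(query_ids)
```

Each output node of the combined query would carry the id of the query it came from, so a match can be routed back to its originating query.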
Conclusion
• Streaming XPath Engine
  • All Objectives met! (XPath Stream Evaluator implemented, Performance Analysis)
  • Algorithm correction and enhancements
• Future Directions
  • Backward Axis Support
  • Function Support: reuse predicate evaluation model
  • Extended expression type support
  • Predicate Pipelining