1 / 59

MonetDB/XQuery Technology Preview 1

Stefan Manegold CWI Amsterdam http://monetdb.cwi.nl/ - http://pathfinder-xquery.org/. MonetDB/XQuery Technology Preview 1. Stefan Manegold. MonetDB/XQuery. HollandOpen, Amsterdam 31-5-2005. European Pathfinder Team. University of Konstanz (Germany)

laasya
Download Presentation

MonetDB/XQuery Technology Preview 1

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Stefan Manegold CWI Amsterdam http://monetdb.cwi.nl/ - http://pathfinder-xquery.org/ MonetDB/XQueryTechnology Preview 1

  2. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 European Pathfinder Team • University of Konstanz (Germany) • Torsten Grust, Jens Teubner, Jan Rittinger • University of Twente (Netherlands) • Maurice van Keulen, Jan Flokstra • CWI, Amsterdam (Netherlands) • Peter Boncz, Stefan Manegold, Sjoerd Mullender

  3. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005

  4. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Results: Performance (1)

  5. did not finish Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Results: Performance (2)

  6. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Story • XQuery Example • Relational XQuery • System Architecture • XML Encoding • Science & Reseach • Scalability • Outlook • Conclusions • Roadmaps • Release & References

  7. XQuery Example Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 • For each author, return number of books and receipts • for books published in the past 2 years, ordered by name let $cat := fn:doc(“www.bn.com/catalog.xml”), (:Documents:) $sales := fn:doc(“www.publishersweekly.com/sales.xml”) for $author in distinct-values($cat//author) (:Grouping:) let $books := $cat//book[@year >= 2003 and author = $a], (:Sel.:) $receipts := $sales/book[@isbn = $books/@isbn]/receipts (:Join:) order by $author (:Ordering:) return <sales> (:XML Construction:) { $author } <count> { fn:count($books) } </count> (:Aggregation:) <total> { fn:sum($receipts) } </total> </sales>

  8. XQuery Example Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 • For each author, return number of books and receipts • for books published in the past 2 years, ordered by name let $cat := fn:doc(“www.bn.com/catalog.xml”), Documents $sales := fn:doc(“www.publishersweekly.com/sales.xml”) for $author in distinct-values($cat//author) Grouping let $books := $cat//book[@year >= 2003 and author = $a], Sel. $receipts := $sales/book[@isbn = $books/@isbn]/receipts Join order by $author Ordering return <sales> XML Construction { $author } <count> { fn:count($books) } </count> Aggregation <total> { fn:sum($receipts) } </total> </sales>

  9. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 XQuery Systems: 2 Approaches • Existing “native” XML/XQuery systems are built from scratch • Galax, Saxon, … • X-Hive, Tamino, … • (Still have to) re-invent optimization technology • Our approach: • Build XQuery system on top of an RDBMS • Leverage mature relational technology to achieve efficient XQuery processing

  10. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Architecture

  11. f/descendant: SELECT * FROM pre_post WHERE pre > f.pre AND post < f.post f/ancester: SELECT * FROM pre_post WHERE pre < f.pre AND post > f.post f/preceeding: SELECT * FROM pre_post WHERE pre < f.pre AND post < f.post f/following: SELECT * FROM pre_post WHERE pre > f.pre AND post > f.post Node-based relational encoding of XQuery's data model Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 XML in an RDBMS: XPath Accelerator

  12. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Science & Research • More research lead to more optimization • Join Recognition • Embedded XPath processing • Order Awareness • Various scientific publications

  13. Unsurpassed scalability • Standard Opteron PC, 8GB RAM, 64-bit Linux • Can process 11GB documents! Mostly linear scaling with document size Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Results: Scalability (3)

  14. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Conclusions • Relational approach • Works • Is fast • Is scalable • Crucial Optimizations • Join recognition • Embedded XPath processing • Order awareness

  15. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Roadmap • 30-05-05: MonetDB/XQuery 4.8/0.8 “Mercurius” • Developers Release / Technology Preview 1 • 30-09-05: MonetDB/XQuery 4.10/0.10 “Venus” • Student Release / Technology Preview 2 • XUpdate, Algebraic Query Optimization • 30-12-05: MonetDB/XQuery 4.12/1.12 “Mars” • Final Release • Application Programming Interfaces • End-User Front-Ends

  16. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Open Source Release • MonetDB + Pathfinder on SourceForge • Mozilla-like License • MonetDB homepage • http://monetdb.cwi.nl/ • Pathfinder homepage • http://pathfinder-xquery.org/ • Developers website • http://sf.net/projects/monetdb/

  17. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005

  18. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005

  19. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005

  20. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005

  21. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005

  22. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005

  23. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Outline • Basic XML / XQuery • Introduction of Pathfinder and MonetDB projects • Relational XQuery • XPath steps in the pre/post plane • Translating for-loops, and beyond • Optimizations • Order prevention • Loop-Lifted Staircase join • Join recognition • Outlook • Conclusions • Roadmaps

  24. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Outline • Basic XML / XQuery • Introduction of Pathfinder and MonetDB projects • Relational XQuery • XPath steps in the pre/post plane • Translating for-loops, and beyond • Optimizations • Order prevention • Loop-Lifted Staircase join • Join recognition • Outlook • Conclusions • Roadmaps

  25. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 XML • Standard, flexible syntax for data exchange • Regular, structured data Database content of all kinds: Inventory, billing, orders, … “Small” typed values • Irregular, unstructured text Documents of all kinds: Transcripts, books, legal briefs, … “Large” untyped values • Lingua franca of B2B Applications… • Increase access to products & services • Integrate disparate data sources • Automate business processes • … and numerous other application domains • Bio-informatics, library science, …

  26. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 XML : A First Look • XML document describing catalog of books <?xml version="1.0" encoding="ISO-8859-1" ?> <catalog> <book isbn="ISBN 1565114302"> <title>No Such Thing as a Bad Day</title> <author>Hamilton Jordan</author> <publisher>Longstreet Press, Inc.</publisher> <price currency="USD">17.60</price> <review> <reviewer>Publisher</reviewer>: This book is the moving account of one man's successful battles against three cancers ...<title>No Such Thing as a Bad Day</title> is warmly recommended. </review> </book> <!-- more books and specifications --> </catalog>

  27. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 XQuery 1.0 • Functional, strongly-typed query language • XQuery 1.0 = XPath 2.0 for navigation, selection, extraction + A few more expressions For-Let-Where-Order By-Return (FLWOR) XML construction Operators on types + User-defined functions & modules + Strong typing

  28. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 XSLT vs. XQuery • XSLT 1.0: XML  XML, HTML, Text • Loosely-typed scripting language • Format XML in HTML for display in browser • Must be highly tolerant of variability/errors in data • XQuery 1.0: XML  XML • Strongly-typed query language • Large-scale database access • Must guarantee safety/correctness of operations on data • Over time, XSLT & XQuery may both serve needs of many application domains • XQuery will become a hidden, commodity language

  29. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 XQuery Example • For each author, return number of books and receipts books published in past 2 years, ordered by name let $cat := fn:doc(“www.bn.com/catalog.xml“), Join $sales := fn:doc(“www.publishersweekly.com/sales.xml“) for $author in distinct-values($cat//author) Grouping let $books := $cat//book[@year >= 2000 and author = $a], S.J. $receipts := $sales/book[@isbn = $books/@isbn]/receipts order by $author Ordering return <sales> XML Construction { $author } <count> { fn:count($books) } </count> Aggregation <total> { fn:sum($receipts) } </total> </sales>

  30. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Outline • Basic XML / XQuery • Introduction of Pathfinder and MonetDB projects • Relational XQuery • XPath steps in the pre/post plane • Translating for-loops, and beyond • Optimizations • Order prevention • Loop-Lifted Staircase join • Join recognition • Outlook • Conclusions • Roadmaps

  31. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 XQuery Systems: 2 Approaches • Tree-based • Tree is basic data structure • Also on disk (if an XQuery DBMS) • Navigational Approach • Galax [Simeon..], Flux [Koch..], X-Hive • Tree Algebra Approach • TIMBER [Jagadish..] • Relational • Data shredded in relational tables • XQuery translated into database query (e.g. SQL)

  32. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 The Pathfinder Project • Challenge / Goal: • Turn RDBMSs into efficient XQuery engines • People: • Torsten Grust, Jens Teubner • University of Konstanz (June 2005: Technical University of Munich) • Maurice van Keulen • University of Twente • Jan Rittinger • University of Konstanz & CWI • Task: generate code for MonetDB

  33. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 The Pathfinder Project • Challenge / Goal: • Turn RDBMSs into efficient XQuery engines • People: • Torsten Grust, Jens Teubner, ... • University of Konstanz (June 2005: Technical University of Munich) • Maurice van Keulen, Jan Flokstra, ... • University of Twente • Jan Rittinger • University of Konstanz & CWI • Task: generate code for MonetDB • Peter Boncz, Stefan Manegold, Sjoerd Mullender, ... • CWI, Amsterdam

  34. MonetDB: Applied CS Research at CWI Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 • a decade of “query-intensive” application experience • image retrieval: Peter Bosch  ImageSpotter • audio/video retrieval: Alex van Ballegooij RAM • XML text retrieval: de Vries / Hiemstra TIJAH • biological sequences: Arno Siebes  BRICKS • XML databases: Albrecht Schmidt  XMark Grust / vKeulen  Pathfinder • GIS: Wilco Quak  MAGNUM • data warehousing / OLAP / data mining SPSS  DataDistilleries Univ. Massachussetts  PROXIMITY CWI research group successfully spun off DataDistilleries (now SPSS)

  35. Pathfinder Parser Parser Sem. Analysis Sem. Analysis Core Translation Core Translation SQL Typechecking Typechecking MIL MIL Relational Algebra Database Database MonetDB Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Pathfinder — MonetDB

  36. Pathfinder Parser Parser Sem. Analysis Sem. Analysis Core Translation Core Translation SQL Core to MILTranslation Typechecking Typechecking MIL MIL Relational Algebra Database Database MonetDB Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Pathfinder — MonetDB

  37. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Open Source • MonetDB + Pathfinder on Sourceforge • Mozilla License • MonetDB Homepage • http://monetdb.cwi.nl/ • Pathfinder Homepage • http://pathfinder-xquery.org/ • Developers website: • http://sf.net/projects/monetdb/ RoadMap • 14-apr-04: initial Beta release MonetDB/SQL • 30-sep-04: first official release MonetDB/SQL • 30-may-05: Developer release of MonetDB/XQuery (i.e. Pathfinder) • 30-sep-05: Student release of MonetDB/XQuery (incl. XUpdate) • 30-dec-05: Users release of MonetDB/XQuery (?)

  38. MonetDB: extensible architecture Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Front-end/back-end: • support multiple data models • support multiple end-user languages • support diverse application domains

  39. MonetDB: extensible architecture Pathfinder XQuery Frontend Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Front-end/back-end: • support multiple data models • support multiple end-user languages • support diverse application domains

  40. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Architecture

  41. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 The Architecture

  42. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Outline • Basic XML / XQuery • Introduction of Pathfinder and MonetDB projects • Relational XQuery • XPath steps in the pre/post plane • Translating for-loops, and beyond • Optimizations • Order prevention • Loop-Lifted Staircase join • Join recognition • Outlook • Conclusions • Roadmaps

  43. Node-based relational encoding of XQuery's data model Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 XPath on an RDBMS

  44. done for better skipping and updates Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Pre/Post  Pre/Level/Size

  45. Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Outline • Basic XML / XQuery • Introduction of Pathfinder and MonetDB projects • Relational XQuery • XPath steps in the pre/post plane • Translating for-loops, and beyond • Optimizations • Order prevention • Loop-Lifted Staircase join • Join recognition • Outlook • Conclusions • Roadmaps

  46. [VLDB03 Wang&Cherniack] define: • Order properties of relations • Order propagation rules for relational operators Decoration of physical plans with order properties  eliminate sort Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Order Prevention

  47. for $p in $auction/site/people/person for $t in $auction/site/closed_auctions/closed_auction where $t/buyer/@person = $p/@id return $t • For loop  map with all combinations  O(N*N) • If `simple’ condition exist on two loop variables  join • Only make a map with the matching combinations • E.g. with Hash-Table  O(N) Performed on the XCore tree Recognize if-then expressions Open question: where to optimize best?? Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Join Recognition

  48. Staircase join [VLDB03]: • Single-pass for a *set* of context nodes Loop-lifting multiple iters  multiple sets of context nodes • elaborate skipping! • Loop-Lifted Staircase Join In a single pass: process multiple input context node lists • Use a stack • Exploit axis properties for pruning Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Loop-lifted staircase join

  49. Test platform • Opteron 1.6GHz, 8GB RAM, 64-bit Linux (Fedora Core 3) • Can process 11GB document! Mostly linear scaling with document size Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Scalability

  50. Test platform • Opteron 1.6GHz, 8GB RAM, 64-bit Linux (Fedora Core 3) • Can process 11GB document! Mostly linear scaling with document size • Some swapping in the join queries Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005 Scalability

More Related