MonetDB/XQuery Technology Preview 1
Stefan Manegold, CWI Amsterdam
http://monetdb.cwi.nl/ - http://pathfinder-xquery.org/
Stefan Manegold MonetDB/XQuery HollandOpen, Amsterdam 31-5-2005
European Pathfinder Team
• University of Konstanz (Germany)
  • Torsten Grust, Jens Teubner, Jan Rittinger
• University of Twente (Netherlands)
  • Maurice van Keulen, Jan Flokstra
• CWI, Amsterdam (Netherlands)
  • Peter Boncz, Stefan Manegold, Sjoerd Mullender
Results: Performance (1) [chart]
Results: Performance (2) [chart; annotation: "did not finish"]
Story
• XQuery Example
• Relational XQuery
• System Architecture
• XML Encoding
• Science & Research
• Scalability
• Outlook
• Conclusions
• Roadmaps
• Release & References
XQuery Example
• For each author, return the number of books and the receipts for books published in the past 2 years, ordered by name
    let $cat := fn:doc("www.bn.com/catalog.xml"),                      (: Documents :)
        $sales := fn:doc("www.publishersweekly.com/sales.xml")
    for $author in distinct-values($cat//author)                       (: Grouping :)
    let $books := $cat//book[@year >= 2003 and author = $author],      (: Selection :)
        $receipts := $sales/book[@isbn = $books/@isbn]/receipts        (: Join :)
    order by $author                                                   (: Ordering :)
    return                                                             (: XML Construction :)
      <sales>
        { $author }
        <count> { (: Aggregation :) fn:count($books) } </count>
        <total> { fn:sum($receipts) } </total>
      </sales>
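The query above combines grouping, selection, a join, ordering and aggregation in one expression. As a rough illustration of what each step computes, here is a minimal Python sketch over two tiny inline documents. The sample data, the function name `sales_per_author` and the `since` parameter are hypothetical, and the result is returned as plain tuples instead of constructed XML; it is not how MonetDB/XQuery evaluates the query.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents standing in for catalog.xml and sales.xml.
CATALOG = """<catalog>
  <book isbn="1" year="2004"><author>Ann</author></book>
  <book isbn="2" year="2001"><author>Ann</author></book>
  <book isbn="3" year="2003"><author>Bob</author></book>
</catalog>"""

SALES = """<sales>
  <book isbn="1"><receipts>10</receipts></book>
  <book isbn="2"><receipts>99</receipts></book>
  <book isbn="3"><receipts>5</receipts><receipts>7</receipts></book>
</sales>"""


def sales_per_author(catalog_xml, sales_xml, since=2003):
    """Return (author, book count, receipt total) tuples ordered by author."""
    cat = ET.fromstring(catalog_xml)
    sales = ET.fromstring(sales_xml)

    # Grouping + ordering: distinct author names, sorted.
    authors = sorted({a.text for a in cat.iter("author")})

    report = []
    for author in authors:
        # Selection: recent books written by this author.
        books = [
            b for b in cat.iter("book")
            if int(b.get("year")) >= since
            and any(a.text == author for a in b.findall("author"))
        ]
        # Join: sales entries whose isbn matches one of the selected books.
        isbns = {b.get("isbn") for b in books}
        receipts = [
            float(r.text)
            for s in sales.iter("book") if s.get("isbn") in isbns
            for r in s.findall("receipts")
        ]
        # Aggregation: count and sum per author.
        report.append((author, len(books), sum(receipts)))
    return report


print(sales_per_author(CATALOG, SALES))
```

The inner list comprehension rescans the sales document for every author, which is exactly the kind of nested-loop evaluation a relational back-end would replace with a join.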
XQuery Systems: 2 Approaches
• Existing “native” XML/XQuery systems are built from scratch
  • Galax, Saxon, …
  • X-Hive, Tamino, …
  • (Still have to) re-invent optimization technology
• Our approach:
  • Build XQuery system on top of an RDBMS
  • Leverage mature relational technology to achieve efficient XQuery processing
Architecture [diagram]
XML in an RDBMS: XPath Accelerator
• Node-based relational encoding of XQuery's data model
• XPath axes of a context node f become range conditions on the pre_post table:
    f/descendant: SELECT * FROM pre_post WHERE pre > f.pre AND post < f.post
    f/ancestor:   SELECT * FROM pre_post WHERE pre < f.pre AND post > f.post
    f/preceding:  SELECT * FROM pre_post WHERE pre < f.pre AND post < f.post
    f/following:  SELECT * FROM pre_post WHERE pre > f.pre AND post > f.post
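The four range conditions above can be tried out on a small document. The following Python sketch shreds an XML string into a pre/post table and evaluates each axis with the same predicates as the SQL. The `Node` class, the `encode` and `axis` helpers and the sample document are assumptions made for illustration (elements only, no attributes or text nodes); they are not part of MonetDB or Pathfinder.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    tag: str
    pre: int   # rank in a pre-order (document order) traversal
    post: int  # rank in a post-order traversal


def encode(xml_text):
    """Shred an XML string into a list of Node rows, sorted by pre rank."""
    rows = []
    next_pre = 0

    def visit(elem):
        nonlocal next_pre
        pre = next_pre
        next_pre += 1
        for child in elem:
            visit(child)
        # Rows are appended in post-order, so len(rows) is the post rank.
        rows.append(Node(elem.tag, pre, len(rows)))

    visit(ET.fromstring(xml_text))
    return sorted(rows, key=lambda n: n.pre)


def axis(table, f, name):
    """Evaluate one XPath axis for context node f with the slide's predicates."""
    tests = {
        "descendant": lambda n: n.pre > f.pre and n.post < f.post,
        "ancestor": lambda n: n.pre < f.pre and n.post > f.post,
        "preceding": lambda n: n.pre < f.pre and n.post < f.post,
        "following": lambda n: n.pre > f.pre and n.post > f.post,
    }
    return [n.tag for n in table if tests[name](n)]


table = encode("<a><b><c/><d/></b><e><f/></e></a>")
by_tag = {n.tag: n for n in table}

print(axis(table, by_tag["b"], "descendant"))  # ['c', 'd']
print(axis(table, by_tag["d"], "ancestor"))    # ['a', 'b']
print(axis(table, by_tag["e"], "preceding"))   # ['b', 'c', 'd']
print(axis(table, by_tag["b"], "following"))   # ['e', 'f']
```

Each axis is a plain rectangular region in the pre/post plane, which is why an ordinary relational engine can answer it with range predicates.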
Science & Research
• More research led to more optimization
  • Join Recognition
  • Embedded XPath processing
  • Order Awareness
• Various scientific publications
Results: Scalability (3)
• Unsurpassed scalability
• Standard Opteron PC, 8GB RAM, 64-bit Linux
• Can process 11GB documents!
• Mostly linear scaling with document size
Conclusions
• Relational approach
  • Works
  • Is fast
  • Is scalable
• Crucial Optimizations
  • Join recognition
  • Embedded XPath processing
  • Order awareness
Roadmap
• 30-05-05: MonetDB/XQuery 4.8/0.8 “Mercurius”
  • Developers Release / Technology Preview 1
• 30-09-05: MonetDB/XQuery 4.10/0.10 “Venus”
  • Student Release / Technology Preview 2
  • XUpdate, Algebraic Query Optimization
• 30-12-05: MonetDB/XQuery 4.12/1.12 “Mars”
  • Final Release
  • Application Programming Interfaces
  • End-User Front-Ends
Open Source Release
• MonetDB + Pathfinder on SourceForge
• Mozilla-like License
• MonetDB homepage
  • http://monetdb.cwi.nl/
• Pathfinder homepage
  • http://pathfinder-xquery.org/
• Developers website
  • http://sf.net/projects/monetdb/
Outline
• Basic XML / XQuery
• Introduction of Pathfinder and MonetDB projects
• Relational XQuery
  • XPath steps in the pre/post plane
  • Translating for-loops, and beyond
• Optimizations
  • Order prevention
  • Loop-Lifted Staircase join
  • Join recognition
• Outlook
• Conclusions
• Roadmaps
XML
• Standard, flexible syntax for data exchange
  • Regular, structured data
    • Database content of all kinds: inventory, billing, orders, …
    • “Small” typed values
  • Irregular, unstructured text
    • Documents of all kinds: transcripts, books, legal briefs, …
    • “Large” untyped values
• Lingua franca of B2B applications…
  • Increase access to products & services
  • Integrate disparate data sources
  • Automate business processes
• … and numerous other application domains
  • Bio-informatics, library science, …
XML: A First Look
• XML document describing a catalog of books
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <catalog>
      <book isbn="ISBN 1565114302">
        <title>No Such Thing as a Bad Day</title>
        <author>Hamilton Jordan</author>
        <publisher>Longstreet Press, Inc.</publisher>
        <price currency="USD">17.60</price>
        <review>
          <reviewer>Publisher</reviewer>: This book is the moving account of
          one man's successful battles against three cancers ...
          <title>No Such Thing as a Bad Day</title> is warmly recommended.
        </review>
      </book>
      <!-- more books and specifications -->
    </catalog>
XQuery 1.0
• Functional, strongly-typed query language
• XQuery 1.0 = XPath 2.0 for navigation, selection, extraction
  + A few more expressions
    • For-Let-Where-Order By-Return (FLWOR)
    • XML construction
    • Operators on types
  + User-defined functions & modules
  + Strong typing
XSLT vs. XQuery
• XSLT 1.0: XML → XML, HTML, Text
  • Loosely-typed scripting language
  • Format XML in HTML for display in browser
  • Must be highly tolerant of variability/errors in data
• XQuery 1.0: XML → XML
  • Strongly-typed query language
  • Large-scale database access
  • Must guarantee safety/correctness of operations on data
• Over time, XSLT & XQuery may both serve the needs of many application domains
• XQuery will become a hidden, commodity language
XQuery Example
• For each author, return the number of books and the receipts for books published in the past 2 years, ordered by name
    let $cat := fn:doc("www.bn.com/catalog.xml"),                      (: Join :)
        $sales := fn:doc("www.publishersweekly.com/sales.xml")
    for $author in distinct-values($cat//author)                       (: Grouping :)
    let $books := $cat//book[@year >= 2000 and author = $author],      (: S.J. :)
        $receipts := $sales/book[@isbn = $books/@isbn]/receipts
    order by $author                                                   (: Ordering :)
    return                                                             (: XML Construction :)
      <sales>
        { $author }
        <count> { (: Aggregation :) fn:count($books) } </count>
        <total> { fn:sum($receipts) } </total>
      </sales>
XQuery Systems: 2 Approaches
• Tree-based
  • Tree is basic data structure
  • Also on disk (if an XQuery DBMS)
  • Navigational Approach
    • Galax [Simeon..], Flux [Koch..], X-Hive
  • Tree Algebra Approach
    • TIMBER [Jagadish..]
• Relational
  • Data shredded in relational tables
  • XQuery translated into database query (e.g. SQL)
The Pathfinder Project
• Challenge / Goal:
  • Turn RDBMSs into efficient XQuery engines
• People:
  • Torsten Grust, Jens Teubner, ...
    • University of Konstanz (June 2005: Technical University of Munich)
  • Maurice van Keulen, Jan Flokstra, ...
    • University of Twente
  • Jan Rittinger
    • University of Konstanz & CWI
    • Task: generate code for MonetDB
  • Peter Boncz, Stefan Manegold, Sjoerd Mullender, ...
    • CWI, Amsterdam
MonetDB: Applied CS Research at CWI
• A decade of “query-intensive” application experience
  • image retrieval: Peter Bosch (ImageSpotter)
  • audio/video retrieval: Alex van Ballegooij (RAM)
  • XML text retrieval: de Vries / Hiemstra (TIJAH)
  • biological sequences: Arno Siebes (BRICKS)
  • XML databases: Albrecht Schmidt (XMark); Grust / van Keulen (Pathfinder)
  • GIS: Wilco Quak (MAGNUM)
  • data warehousing / OLAP / data mining: SPSS (DataDistilleries); Univ. Massachusetts (PROXIMITY)
• CWI research group successfully spun off DataDistilleries (now SPSS)
Pathfinder — MonetDB
[Diagram labels: Pathfinder: Parser, Sem. Analysis, Core Translation, Typechecking, Core to MIL Translation, Relational Algebra; SQL; MIL; Database; MonetDB]
Open Source
• MonetDB + Pathfinder on SourceForge
• Mozilla License
• MonetDB homepage
  • http://monetdb.cwi.nl/
• Pathfinder homepage
  • http://pathfinder-xquery.org/
• Developers website
  • http://sf.net/projects/monetdb/
RoadMap
• 14-apr-04: initial Beta release of MonetDB/SQL
• 30-sep-04: first official release of MonetDB/SQL
• 30-may-05: Developer release of MonetDB/XQuery (i.e. Pathfinder)
• 30-sep-05: Student release of MonetDB/XQuery (incl. XUpdate)
• 30-dec-05: Users release of MonetDB/XQuery (?)
MonetDB: extensible architecture
• Front-end/back-end:
  • support multiple data models
  • support multiple end-user languages
  • support diverse application domains
[Diagram label: Pathfinder XQuery Frontend]
Architecture [diagram]
The Architecture [diagram]
XPath on an RDBMS
• Node-based relational encoding of XQuery's data model
Pre/Post → Pre/Level/Size
• Done for better skipping and updates
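To make the benefit for skipping concrete, here is a small Python sketch of a pre/level/size table. It assumes the usual reading of the column names, which the slide does not spell out: `level` is the depth of a node and `size` is the number of nodes in its subtree, excluding the node itself. The function names and the sample document are hypothetical. Under these assumptions the descendants of a node form one contiguous pre range, children can be found by hopping over whole subtrees, and the post rank can be recomputed as pre + size - level.

```python
import xml.etree.ElementTree as ET


def encode_pre_level_size(xml_text):
    """Return rows (tag, level, size); a row's list index is its pre rank."""
    rows = []

    def visit(elem, level):
        pre = len(rows)
        rows.append(None)            # placeholder, filled once size is known
        for child in elem:
            visit(child, level + 1)
        size = len(rows) - pre - 1   # number of descendants
        rows[pre] = (elem.tag, level, size)

    visit(ET.fromstring(xml_text), 0)
    return rows


def descendants(rows, pre):
    """Descendants are exactly the pre ranks in (pre, pre + size]."""
    size = rows[pre][2]
    return [rows[p][0] for p in range(pre + 1, pre + size + 1)]


def children(rows, pre):
    """Children are found by skipping each child's subtree via its size."""
    size = rows[pre][2]
    result, p, end = [], pre + 1, pre + size
    while p <= end:
        result.append(rows[p][0])
        p += rows[p][2] + 1          # jump past the child's whole subtree
    return result


def post_rank(rows, pre):
    """Recover the post rank from the pre/level/size columns."""
    _, level, size = rows[pre]
    return pre + size - level


rows = encode_pre_level_size("<a><b><c/><d/></b><e><f/></e></a>")
print(descendants(rows, 1))                          # ['c', 'd']
print(children(rows, 0))                             # ['b', 'e']
print([post_rank(rows, p) for p in range(len(rows))])  # [5, 2, 0, 1, 4, 3]
```

Because `size` bounds the subtree directly, no comparison against post ranks is needed to know where a subtree ends.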
Order Prevention
• [VLDB03 Wang & Cherniack] define:
  • Order properties of relations
  • Order propagation rules for relational operators
• Decoration of physical plans with order properties → eliminate sorts
Join Recognition
    for $p in $auction/site/people/person
    for $t in $auction/site/closed_auctions/closed_auction
    where $t/buyer/@person = $p/@id
    return $t
• For-loop → map with all combinations: O(N*N)
• If a ‘simple’ condition exists on two loop variables → join
  • Only make a map with the matching combinations
  • E.g. with a hash table: O(N)
• Performed on the XCore tree
  • Recognize if-then expressions
• Open question: where is the best place to optimize?
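The difference between the two evaluation strategies can be sketched in a few lines of Python. The records below are hypothetical stand-ins for the person and closed_auction elements, and both functions are illustrations of the general technique (nested loop versus hash join), not Pathfinder's actual rewrite on the XCore tree.

```python
from collections import defaultdict

# Hypothetical stand-ins for person/@id and closed_auction/buyer/@person.
people = [{"id": "p1"}, {"id": "p2"}, {"id": "p3"}]
auctions = [
    {"auction": "a1", "buyer": "p2"},
    {"auction": "a2", "buyer": "p1"},
    {"auction": "a3", "buyer": "p2"},
]


def nested_loop(people, auctions):
    """Literal reading of the two for-loops: test every (p, t) combination."""
    return [t for p in people for t in auctions if t["buyer"] == p["id"]]


def hash_join(people, auctions):
    """Recognized join: build a hash table once, then probe it per person."""
    by_buyer = defaultdict(list)
    for t in auctions:
        by_buyer[t["buyer"]].append(t)
    return [t for p in people for t in by_buyer.get(p["id"], [])]


print([t["auction"] for t in nested_loop(people, auctions)])  # ['a2', 'a1', 'a3']
print(hash_join(people, auctions) == nested_loop(people, auctions))  # True
```

Both versions return the matches in the same order (outer loop over persons, auctions in document order within each person), so the rewrite preserves the sequence semantics of the original loops while touching each input only once.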
Loop-lifted staircase join
• Staircase join [VLDB03]:
  • Single pass for a *set* of context nodes
  • Elaborate skipping!
• Loop-lifting → multiple iterations → multiple sets of context nodes
• Loop-Lifted Staircase Join
  • In a single pass: process multiple input context node lists
  • Use a stack
  • Exploit axis properties for pruning
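The pruning and skipping ideas can be illustrated with a much simplified Python sketch for the descendant axis only. It handles a single set of context nodes, so it corresponds to the basic staircase join idea and not to the loop-lifted variant, and it is not MonetDB's implementation. The input format (a list of post ranks indexed by pre rank) and the function names are assumptions made for this example.

```python
def staircase_descendant(post, context):
    """Descendants of a set of context nodes, in document order, no duplicates.

    post:    list where post[pre] is the post rank of the node with that pre rank
    context: iterable of pre ranks of the context nodes
    """
    # Pruning: a context node inside the subtree of an already kept context
    # node cannot contribute new results, so it is dropped.
    pruned = []
    for c in sorted(set(context)):
        if pruned and post[c] < post[pruned[-1]]:
            continue
        pruned.append(c)

    # Single forward scan per partition. In pre order, the descendants of c
    # directly follow c; the first node with a larger post rank is a
    # following node, so the rest of the partition can be skipped.
    result = []
    for c in pruned:
        for pre in range(c + 1, len(post)):
            if post[pre] > post[c]:
                break
            result.append(pre)
    return result


def naive_descendant(post, context):
    """Reference version: test every node against every context node."""
    hits = {
        pre
        for c in context
        for pre in range(len(post))
        if pre > c and post[pre] < post[c]
    }
    return sorted(hits)


# Post ranks for <a><b><c/><d/></b><e><f/></e></a>, indexed by pre rank.
post = [5, 2, 0, 1, 4, 3]
print(staircase_descendant(post, [1, 2, 4]))  # [2, 3, 5]
```

The naive version may produce the same node for several context nodes and needs duplicate elimination and sorting afterwards; the scan over the pruned context produces each result once and already in document order.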
Scalability
• Test platform: Opteron 1.6GHz, 8GB RAM, 64-bit Linux (Fedora Core 3)
• Can process an 11GB document!
• Mostly linear scaling with document size
• Some swapping in the join queries