1 / 117

MonetDB, a Column-Store in Midflight

MonetDB, a Column-Store in Midflight. Martin Kersten CWI Amsterdam. Where is the field heading to. Mike Stonebraker, VLDB 2007: One size fits all: A concept whose time has come and gone Martin Kersten, ICDE 2008 : Mike is wrong….. Every size fits him always… Even a Euro solution.

sabin
Download Presentation

MonetDB, a Column-Store in Midflight

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. MonetDB, a Column-Store in Midflight Martin Kersten CWI Amsterdam

  2. Where is the field heading to Mike Stonebraker, VLDB 2007: One size fits all: A concept whose time has come and gone Martin Kersten, ICDE 2008: Mike is wrong….. Every size fits him always… Even a Euro solution.

  3. If you want to play with a generic column-store, we recommend you download the academic version of Vertica, the commercialization of the C-Store project, or download MonetDB, an open-source column-store.

  4. Present Cracking Potency Paste Columns Chaos

  5. Present Potency PAX stores Paste N-ary stores Columnstores

  6. John Mary 32 31 Houston Houston • Try to keep things simple Early 80s: tuple storage structures for PCs were simple OK OK Easy to access at the cost of wasted space

  7. Try to keep things simple Slotted pages Logical pages equated physical pages 32 John Houston 31 Mary Houston

  8. Try to keep things simple Slotted pages Logical pages equated multiple physical pages 32 John Houston 31 Mary Houston

  9. Avoid things you don’t always need Not all attributes are equally important

  10. Avoid moving too much around A column orientation is as simple and acts like an array Attributes of a tuple are correlated by offset

  11. Try to keep things simple • MonetDB Binary Association Tables

  12. Try to avoid doing things twice Physical data organization • Binary Association Tables Dense sequence Memory mapped files Bat Unit fixed size

  13. Hash-based access • Try to avoid doing things twice • Binary Association Tables accelerators Column properties: key-ness non-null dense ordered

  14. Type remappings are used to squeeze space A VID datatype can be used to represent dense enumerations • Try to avoid doing things twice • Binary Association Tables storage control 100 A BAT can be used as an encoding table

  15. Mantra: Try to keep things simple • Column orientation benefits datawarehousing • Brings a much tighter packaging and improves transport through the memory hierarchy • Each column can be more easily optimized for storage using compression schemes • Each column can be replicated for read-only access

  16. Try to maximize performance Present Materialize All Model Potency Paste Volcanomodel Vectorizedmodel

  17. Try to maximize performance Volcano Refresher Query SELECT name, salary*.19 AS tax FROM employee WHERE age > 25

  18. Try to maximize performance Volcano Refresher • Operators • Iterator interface • open() • next(): tuple • close()

  19. Try to maximize performance Volcano paradigm • The Volcano model is based on a simple pull-based iterator model for programming relational operators. • The Volcano model minimizes the amount of intermediate store • The Volcano model is CPU intensive and inefficient

  20. Try to use simple a software pattern MonetDB paradigm • The MonetDB kernel is a programmable relational algebra machine • Relational operators operate on ‘array’-like structures • Based on experiences in database machines 25 years ago, RAP, CASSM,…ICL RAP… PRISMA… IDIOMS (J. Kerridge)…

  21. select count(*) from photoobjall; SQL XQuery function user.s3_1():void; X1:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",0); X6:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",1); X9:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",2); X13:bat[:oid,:oid] := sql.bind_dbat("sys","photoobjall",1); X8 := algebra.kunion(X1,X6); X11 := algebra.kdifference(X8,X9); X12 := algebra.kunion(X11,X9); X14 := bat.reverse(X13); X15 := algebra.kdifference(X12,X14); X16 := calc.oid(0@0); X18 := algebra.markT(X15,X16); X19 := bat.reverse(X18); X20 := aggr.count(X19); sql.exportValue(1,"sys.","count_","int",32,0,6,X20,""); end s3_1; MonetDB Kernel • Try to use simple a software pattern MAL MonetDB Server

  22. Try to use simple a software pattern Operator implementation • All algebraic operators materialize their result • Local optimization decisions • Heavy use of code expansion to reduce cost • 55 selection routines • 149 unary operations • 335 join/group operations • 134 multi-join operations • 72 aggregate operations

  23. select * into tmp from tapestry where attr1>=0 and attr1 <=@range create table tmp( attr0 int, attr1 int); insert into tmp select * from tapestry where attr1>=0 and attr1 <=@range; Micro-benchmark • Keeping the query result in a new table is often too expensive In milliseconds/10K Fixed cost in ms

  24. commercial Multi-column tapestry ms MonetDB/SQL #joins Experiments ran on Athlon 1.4, Linux

  25. A column store should be designed from scratch to benefit from its characteristics • Simulation of a column store on top of an n-ary system using the Volcano model does not work

  26. Try to maximize performance Present Queryoptimizer Potency Paste Execution Paradigm DatabaseStructures

  27. Try to avoid the search space trap • Applications have different characteristics • Platforms have different characteristics • The actual state of computation is crucial • A generic all-encompassing optimizer cost-model does not work

  28. SQL XQuery MonetDB Kernel • Try to disambiguate decisions MAL MAL • Operational optimizer: • Exploit everything you know at runtime • Re-organize if necessary MonetDB Server

  29. SQL XQuery MonetDB Kernel • Try to disambiguate decisions • Strategic optimizer: • Exploit the semantics of the language • Rely on heuristics MAL MAL • Operational optimizer: • Exploit everything you know at runtime • Re-organize if necessary MonetDB Server

  30. SQL XQuery x1:bat[:oid,:dbl]:= sql.bind("sys","photoobjall","ra",0); x14:= algebra.uselect(x1,A0,A1); y1:bat[:oid,:dbl]:= bpm.take("sys_photoobjall_ra"); y2 := bpm.new(:oid,:oid); barrier rs:= bpm.newIterator(y1,A0,A1); t1:= algebra.uselect(rs,A0,A1); bpm.addSegment(y2,t1); redo rs:= bpm.hasMoreElements(y1,A0,A1); exit rs; MonetDB Kernel • Try to disambiguate decisions • Tactical MAL optimizer: • No changes in front-ends and no direct human guidance • Minimal changes in the engine MAL Tactical Optimizer MAL MonetDB Server

  31. Code Inliner. Constant Expression Evaluator. Accumulator Evaluations. Strength Reduction. Common Term Optimizer. Join Path Optimizer. Ranges Propagation. Operator Cost Reduction. Foreign Key handling. Aggregate Groups. Code Parallizer. Replication Manager. Result Recycler. MAL Compiler. Dynamic Query Scheduler. Memo-based Execution. Vector Execution. Alias Removal. Dead Code Removal. Garbage Collector. • Try to disambiguate decisions

  32. Try to maximize performance Present Queryoptimizer Potency Paste Execution Paradigm DatabaseStructures

  33. Execution paradigms • No data from persistent store to the memory trash • The MonetDB kernel is set up to accommodate different execution engines • The MonetDB assembler program is • Interpreted in the order presented • Interpreted in a dataflow driven manner • Compiled into a C program • Vectorised processing • X100 project

  34. MonetDB/x100 Combine Volcano model withvector processing. All vectors together should fit the CPU cache Vectors are compressed Optimizer should tune this, given the query characteristics. X100 query engine CPU cache ColumnBM (buffer manager) RAM networked ColumnBM-s

  35. No data from persistent store to the memory trash • Varying the vector size on TPC-H query 1 mysql, oracle, db2 low IPC, overhead MonetDB RAM bandwidth bound X100

  36. No data from persistent store to the memory trash • Vectorized-Volcano processing can be used for both multi-core and distributed processing • The architecture and the parameters are influenced heavily by • Hardware characteristics • Data distribution to compress columns

  37. The proof of the pudding is in the eating Does MonetDB stand a ‘real’ test? Is the main memory orientation a bottleneck? Is it functionally complete?

  38. TPC-H TPC-H 60K rows line_item table Comfortably fit in memory Performance in milliseconds ATHLON X2 3800+ (2000mhz) 2 disks in raid 0, 2G main memory

  39. TPC-H Scale-factor 1 6M row line-item table Out of the box performance Queries produce emptyor erroneous results ATHLON X2 3800+ (2000mhz) 2 disks in raid 0, 2G main memory

  40. TPC-H ATHLON X2 3800+ (2000mhz) 2 disks in raid 0, 2G main memory

  41. TPC-H ATHLON X2 3800+ (2000mhz) 2 disks in raid 0, 2G main memory

  42. Code base for MonetDB/SQL is 1.2M lines of C • Nightly regression testing on 17 platforms

  43. Try to maximize performance Present Materialized Views Potency Paste Cracking B-tree, Hash Indices

  44. Find a trusted fortune teller • Indices in database systems focus on: • All tuples are equally important for fast retrieval • There are ample resources to maintain indices • MonetDB cracks the database into pieces based on actual query load

  45. Cracking algorithms Physical reorganization happens per column based on selection predicates. Split a piece of a column intwonew pieces A<10 A<10 A>=10

  46. Cracking algorithms Physical reorganization happens per column Split a piece of a column intwonew pieces Split a piece of a column inthree new pieces A<5 A<10 A<10 5<A<10 5<A<10 A>=10 A>=10

  47. Cracking example select A>5 and A<10 17 3 8 6 2 12 13 4 15

  48. Cracking example select A>5 and A<10 17 17 3 3 8 8 6 6 2 2 15 15 13 13 4 4 12 12

  49. Cracking example select A>5 and A<10 >=10 17 17 3 3 8 8 6 6 2 2 15 15 13 13 4 4 12 12 >=10

  50. Cracking example select A>5 and A<10 >=10 17 17 3 3 8 8 6 6 2 2 15 15 13 13 4 4 12 12

More Related