1.2k likes | 1.37k Views
MonetDB, a Column-Store in Midflight. Martin Kersten CWI Amsterdam. Where is the field heading to. Mike Stonebraker, VLDB 2007: One size fits all: A concept whose time has come and gone Martin Kersten, ICDE 2008 : Mike is wrong….. Every size fits him always… Even a Euro solution.
E N D
MonetDB, a Column-Store in Midflight Martin Kersten CWI Amsterdam
Where is the field heading to Mike Stonebraker, VLDB 2007: One size fits all: A concept whose time has come and gone Martin Kersten, ICDE 2008: Mike is wrong….. Every size fits him always… Even a Euro solution.
If you want to play with a generic column-store, we recommend you download the academic version of Vertica, the commercialization of the C-Store project, or download MonetDB, an open-source column-store.
Present Cracking Potency Paste Columns Chaos
Present Potency PAX stores Paste N-ary stores Columnstores
John Mary 32 31 Houston Houston • Try to keep things simple Early 80s: tuple storage structures for PCs were simple OK OK Easy to access at the cost of wasted space
Try to keep things simple Slotted pages Logical pages equated physical pages 32 John Houston 31 Mary Houston
Try to keep things simple Slotted pages Logical pages equated multiple physical pages 32 John Houston 31 Mary Houston
Avoid things you don’t always need Not all attributes are equally important
Avoid moving too much around A column orientation is as simple and acts like an array Attributes of a tuple are correlated by offset
Try to keep things simple • MonetDB Binary Association Tables
Try to avoid doing things twice Physical data organization • Binary Association Tables Dense sequence Memory mapped files Bat Unit fixed size
Hash-based access • Try to avoid doing things twice • Binary Association Tables accelerators Column properties: key-ness non-null dense ordered
Type remappings are used to squeeze space A VID datatype can be used to represent dense enumerations • Try to avoid doing things twice • Binary Association Tables storage control 100 A BAT can be used as an encoding table
Mantra: Try to keep things simple • Column orientation benefits datawarehousing • Brings a much tighter packaging and improves transport through the memory hierarchy • Each column can be more easily optimized for storage using compression schemes • Each column can be replicated for read-only access
Try to maximize performance Present Materialize All Model Potency Paste Volcanomodel Vectorizedmodel
Try to maximize performance Volcano Refresher Query SELECT name, salary*.19 AS tax FROM employee WHERE age > 25
Try to maximize performance Volcano Refresher • Operators • Iterator interface • open() • next(): tuple • close()
Try to maximize performance Volcano paradigm • The Volcano model is based on a simple pull-based iterator model for programming relational operators. • The Volcano model minimizes the amount of intermediate store • The Volcano model is CPU intensive and inefficient
Try to use simple a software pattern MonetDB paradigm • The MonetDB kernel is a programmable relational algebra machine • Relational operators operate on ‘array’-like structures • Based on experiences in database machines 25 years ago, RAP, CASSM,…ICL RAP… PRISMA… IDIOMS (J. Kerridge)…
select count(*) from photoobjall; SQL XQuery function user.s3_1():void; X1:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",0); X6:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",1); X9:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",2); X13:bat[:oid,:oid] := sql.bind_dbat("sys","photoobjall",1); X8 := algebra.kunion(X1,X6); X11 := algebra.kdifference(X8,X9); X12 := algebra.kunion(X11,X9); X14 := bat.reverse(X13); X15 := algebra.kdifference(X12,X14); X16 := calc.oid(0@0); X18 := algebra.markT(X15,X16); X19 := bat.reverse(X18); X20 := aggr.count(X19); sql.exportValue(1,"sys.","count_","int",32,0,6,X20,""); end s3_1; MonetDB Kernel • Try to use simple a software pattern MAL MonetDB Server
Try to use simple a software pattern Operator implementation • All algebraic operators materialize their result • Local optimization decisions • Heavy use of code expansion to reduce cost • 55 selection routines • 149 unary operations • 335 join/group operations • 134 multi-join operations • 72 aggregate operations
select * into tmp from tapestry where attr1>=0 and attr1 <=@range create table tmp( attr0 int, attr1 int); insert into tmp select * from tapestry where attr1>=0 and attr1 <=@range; Micro-benchmark • Keeping the query result in a new table is often too expensive In milliseconds/10K Fixed cost in ms
commercial Multi-column tapestry ms MonetDB/SQL #joins Experiments ran on Athlon 1.4, Linux
A column store should be designed from scratch to benefit from its characteristics • Simulation of a column store on top of an n-ary system using the Volcano model does not work
Try to maximize performance Present Queryoptimizer Potency Paste Execution Paradigm DatabaseStructures
Try to avoid the search space trap • Applications have different characteristics • Platforms have different characteristics • The actual state of computation is crucial • A generic all-encompassing optimizer cost-model does not work
SQL XQuery MonetDB Kernel • Try to disambiguate decisions MAL MAL • Operational optimizer: • Exploit everything you know at runtime • Re-organize if necessary MonetDB Server
SQL XQuery MonetDB Kernel • Try to disambiguate decisions • Strategic optimizer: • Exploit the semantics of the language • Rely on heuristics MAL MAL • Operational optimizer: • Exploit everything you know at runtime • Re-organize if necessary MonetDB Server
SQL XQuery x1:bat[:oid,:dbl]:= sql.bind("sys","photoobjall","ra",0); x14:= algebra.uselect(x1,A0,A1); y1:bat[:oid,:dbl]:= bpm.take("sys_photoobjall_ra"); y2 := bpm.new(:oid,:oid); barrier rs:= bpm.newIterator(y1,A0,A1); t1:= algebra.uselect(rs,A0,A1); bpm.addSegment(y2,t1); redo rs:= bpm.hasMoreElements(y1,A0,A1); exit rs; MonetDB Kernel • Try to disambiguate decisions • Tactical MAL optimizer: • No changes in front-ends and no direct human guidance • Minimal changes in the engine MAL Tactical Optimizer MAL MonetDB Server
Code Inliner. Constant Expression Evaluator. Accumulator Evaluations. Strength Reduction. Common Term Optimizer. Join Path Optimizer. Ranges Propagation. Operator Cost Reduction. Foreign Key handling. Aggregate Groups. Code Parallizer. Replication Manager. Result Recycler. MAL Compiler. Dynamic Query Scheduler. Memo-based Execution. Vector Execution. Alias Removal. Dead Code Removal. Garbage Collector. • Try to disambiguate decisions
Try to maximize performance Present Queryoptimizer Potency Paste Execution Paradigm DatabaseStructures
Execution paradigms • No data from persistent store to the memory trash • The MonetDB kernel is set up to accommodate different execution engines • The MonetDB assembler program is • Interpreted in the order presented • Interpreted in a dataflow driven manner • Compiled into a C program • Vectorised processing • X100 project
MonetDB/x100 Combine Volcano model withvector processing. All vectors together should fit the CPU cache Vectors are compressed Optimizer should tune this, given the query characteristics. X100 query engine CPU cache ColumnBM (buffer manager) RAM networked ColumnBM-s
No data from persistent store to the memory trash • Varying the vector size on TPC-H query 1 mysql, oracle, db2 low IPC, overhead MonetDB RAM bandwidth bound X100
No data from persistent store to the memory trash • Vectorized-Volcano processing can be used for both multi-core and distributed processing • The architecture and the parameters are influenced heavily by • Hardware characteristics • Data distribution to compress columns
The proof of the pudding is in the eating Does MonetDB stand a ‘real’ test? Is the main memory orientation a bottleneck? Is it functionally complete?
TPC-H TPC-H 60K rows line_item table Comfortably fit in memory Performance in milliseconds ATHLON X2 3800+ (2000mhz) 2 disks in raid 0, 2G main memory
TPC-H Scale-factor 1 6M row line-item table Out of the box performance Queries produce emptyor erroneous results ATHLON X2 3800+ (2000mhz) 2 disks in raid 0, 2G main memory
TPC-H ATHLON X2 3800+ (2000mhz) 2 disks in raid 0, 2G main memory
TPC-H ATHLON X2 3800+ (2000mhz) 2 disks in raid 0, 2G main memory
Code base for MonetDB/SQL is 1.2M lines of C • Nightly regression testing on 17 platforms
Try to maximize performance Present Materialized Views Potency Paste Cracking B-tree, Hash Indices
Find a trusted fortune teller • Indices in database systems focus on: • All tuples are equally important for fast retrieval • There are ample resources to maintain indices • MonetDB cracks the database into pieces based on actual query load
Cracking algorithms Physical reorganization happens per column based on selection predicates. Split a piece of a column intwonew pieces A<10 A<10 A>=10
Cracking algorithms Physical reorganization happens per column Split a piece of a column intwonew pieces Split a piece of a column inthree new pieces A<5 A<10 A<10 5<A<10 5<A<10 A>=10 A>=10
Cracking example select A>5 and A<10 17 3 8 6 2 12 13 4 15
Cracking example select A>5 and A<10 17 17 3 3 8 8 6 6 2 2 15 15 13 13 4 4 12 12
Cracking example select A>5 and A<10 >=10 17 17 3 3 8 8 6 6 2 2 15 15 13 13 4 4 12 12 >=10
Cracking example select A>5 and A<10 >=10 17 17 3 3 8 8 6 6 2 2 15 15 13 13 4 4 12 12