C-Store: A Column-oriented DBMS

C-Store: A Column-oriented DBMS By New England Database Group

Current DBMS Gold Standard • Store fields in one record contiguously on disk • Use B-tree indexing • Use small (e.g. 4K) disk blocks • Align fields on byte or word boundaries • Conventional (row-oriented) query optimizer and executor (technology from 1979) • Aries-style transactions

Terminology -- “Row Store” Record 1 Record 2 Record 3 Record 4 E.g. DB2, Oracle, Sybase, SQLServer, …

Row Stores are Write Optimized • Can insert and delete a record in one physical write • Good for on-line transaction processing (OLTP) • But not for read mostly applications • Data warehouses • CRM

Elephants Have Extended Row Stores • With Bitmap indices • Better sequential read • Integration of “datacube” products • Materialized views But there may be a better idea…….

Column Stores

At 100K Feet…. • Ad-hoc queries read 2 columns out of 20 • In a very large warehouse, Fact table is rarely clustered correctly • Column store reads 10% of what a row store reads

C-Store (Column Store) Project • Brandeis/Brown/MIT/UMass-Boston project • Usual suspects participating • Enough coded to get performance numbers for some queries • Complete status later

We Build on Previous Pioneering Work…. • Sybase IQ (early ’90s) • Monet (see CIDR ’05 for the most recent description)

C-Store Technical Ideas • Code the columns to save space • No alignment • Big disk blocks • Only materialized views (perhaps many) • Focus on Sorting not indexing • Automatic physical DBMS design

C-store (Column Store) Technical Ideas • Optimize for grid computing • Innovative redundancy • Xacts – but no need for Mohan • Data ordered on anything, Not just time • Column optimizer and executor

How to Evaluate This Paper…. • None of the ideas in isolation merit publication • Judge the complete system by its (hopefully intelligent) choice of • Small collection of inter-related powerful ideas • That together put performance in a new sandbox

Code the Columns • Work hard to shrink space • Use extra space for multiple orders • Fundamentally easier than in a row store • E.g. RLE works well

No Alignment • Densepack columns • E.g. a 5 bit field takes 5 bits • Current CPU speed going up faster than disk bandwidth • Faster to shift data in CPU than to waste disk bandwidth

Big Disk Blocks • Tunable • Big (minimum size is 64K)

Only Materialized Views • Projection (materializedview) is some number of columns from a fact table • Plus columns in a dimension table – with a 1-n join between Fact and Dimension table • Stored in order of a storage key(s) • Several may be stored!!!!! • With a permutation, if necessary, to map between them

Only Materialized Views • Table (as the user specified it and sees it) is not stored! • No secondary indexes (they are a one column sorted MV plus a permutation, if you really want one)

Example User view: EMP (name, age, salary, dept) Dept (dname, floor) Possible set of MVs: MV-1 (name, dept, floor) in floor order MV-2 (salary, age) in age order MV-3 (dname, salary, name) in salary order

Different Indexing

Automatic Physical DBMS Design • Not enough 4-star wizards to go around • Accept a “training set” of queries and a space budget • Choose the MVs auto-magically • Re-optimize periodically based on a log of the interactions

Optimize for Grid Computing • I.e. shared-nothing • Dewitt (Gamma) was right • Horizontal partitioning and intra-query parallelism as in Gamma

Innovative Redundancy • Hardly any warehouse is recovered by a redo from the log • Takes too long! • Store enough MVs at enough places to ensure K-safety • Rebuild dead objects from elsewhere in the network • K-safety is a DBMS-design problem!

XACTS – No Mohan • Undo from a log (that does not need to be persistent) • Redo by rebuild from elsewhere in the network

XACTS – No Mohan • Snapshot isolation (run queries as of a tunable time in the recent past) • To solve read-write conflicts • Distributed Xacts • Without a prepare message (no 2 phase commit)

Storage (sort) Key(s) is not Necessarily Time • That would be too limiting • So how to do fast updates to densepack column storage that is not in entry sequence?

Solution – a Hybrid Store Write-optimized Column store (Much like Monet) (Batch rebuilder) Tuple mover Read-optimized Column store (What we have been talking about so far)

Column Executor • Column operations – not row operations • Columns remain coded – if possible • Late materialization of columns

Column Optimizer • Chooses MVs on which to run the query • Most important task • Build in snowflake schemas • Which are simple to optimize without exhaustive search • Looking at extensions

Current Performance • 100X popular row store in 40% of the space • 10X popular column store in 70% of the space • 7X popular row store in 1/6th of the space • Code available with BSD license

Structure Going Forward • Vertica • Very well financed start-up to commercialize C-store • Doing the heavy lifting • University Research • Funded by Vertica

Vertica • Complete alpha system in December ‘05 • Everything, including DBMS designer • With current performance! • Looking for early customers to work with (see me if you are interested)

University Research • Extension of algorithms to non-snowflake schemas • Study of L2 cache performance • Study of coding strategies • Study of executor options • Study of recovery tactics • Non-cursor interface • Study of optimizer primitives

C-Store: A Column-oriented DBMS