C-Store: Tuple Reconstruction

Jianlin Feng, School of Software, Sun Yat-sen University, Mar 27, 2009

Motivation • In a column-oriented DBMS, columns are stored separately. • The separate column values of the same logical tuple must be stitched together before the tuple is returned to the user.

How to Identify Column Values of the Same Logical Tuple? • Attach a tuple ID or position, either physically stored or virtual, to each column value. • In the Read Store of C-Store, a Storage Key is equal to a position in a column. • In the Write Store of C-Store, a Storage Key is physically stored as a tuple ID. • Tuple reconstruction is easy if the columns are sorted in the same order: • join on the positions instead of on a physical tuple ID.
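The two kinds of storage key can be illustrated with a minimal Python sketch. The sample data and the helper names `stitch_by_position` and `stitch_by_key` are hypothetical and are not part of C-Store; the sketch only shows the difference between joining on an implicit position and joining on a stored tuple ID.

```python
# Read Store style: the storage key is implicit, it is the index in the column.
rs_a = [10, 20, 30]
rs_b = ["x", "y", "z"]

def stitch_by_position(*columns):
    """Join columns on position: the i-th values form the i-th tuple."""
    return list(zip(*columns))

# Write Store style: every value carries an explicitly stored key (tuple ID).
ws_a = {101: 40, 102: 50}
ws_b = {102: "v", 101: "w"}

def stitch_by_key(*columns):
    """Join columns on the stored key; needs a lookup per column."""
    keys = set(columns[0]).intersection(*columns[1:])
    return [tuple(col[k] for col in columns) for k in sorted(keys)]
```

The position join needs no key comparison at all, which is why identically sorted columns make reconstruction cheap.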

Two Strategies of Tuple Reconstruction • Early Materialization (EM) • Whenever a column C1 is accessed, add C1 (concrete column values) to an intermediate tuple representation • if C1 is needed by some later operator, • or if C1 is one of the output columns. • Late Materialization (LM) • Construct tuples as late as possible.

Tuple Reconstruction: An Example (1) • Assume a relation R has 3 columns • R.a, R.b, R.c • All the 3 columns are sorted in the same order, • and are stored in separate files. • Suppose a query consists of 3 selection predicates • σ1, σ2, σ3 over R.a, R.b, R.c respectively • σ1 is the most selective predicate • σ3 is the least selective predicate

Tuple Reconstruction: An Example (2) • An early materialization strategy could process the query as follows: • Read in a block of R.a, a block of R.b, and a block of R.c from disk. • Stitch them together into block(s) of triples (R.a, R.b, R.c). • Apply σ1, σ2, σ3 in turn, allowing tuples that match the predicates to pass through.
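The early materialization steps above can be sketched as follows. This is an illustration only: columns are modeled as in-memory Python lists rather than disk blocks, and the function name and sample predicates are assumptions.

```python
def early_materialization(col_a, col_b, col_c, pred_a, pred_b, pred_c):
    """Stitch first, filter afterwards."""
    # Step 1-2: read all three columns and stitch them into triples.
    triples = zip(col_a, col_b, col_c)
    # Step 3: apply the predicates in turn (most selective first);
    # `and` short-circuits, so later predicates see only surviving tuples.
    return [t for t in triples
            if pred_a(t[0]) and pred_b(t[1]) and pred_c(t[2])]

# Hypothetical sample data for R.a, R.b, R.c (same sort order).
R_a = [1, 5, 3, 5, 2]
R_b = [10, 20, 30, 40, 50]
R_c = [7, 8, 9, 7, 8]

result = early_materialization(
    R_a, R_b, R_c,
    lambda a: a == 5,    # sigma1, most selective
    lambda b: b >= 20,   # sigma2
    lambda c: c < 9,     # sigma3, least selective
)
```

Every triple is constructed here, including those that are later discarded by a predicate.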

Tuple Reconstruction: An Example (3) • A late materialization strategy could process the query as follows: • First, scan R.a and output the positions in R.a that satisfy σ1. • Second, scan R.b and output the positions in R.b that satisfy σ2. • Third, scan R.c and output the positions in R.c that satisfy σ3. • Fourth, use a position-wise AND to find the intersection of the 3 position lists. • Finally, re-scan R.a, R.b, and R.c, extract the values of the records whose positions are in the intersection, and stitch these values together into output tuples.
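The five late materialization steps above can be sketched as follows. As an assumption for brevity, columns are in-memory lists and the position-wise AND is done with a set intersection; the names are hypothetical.

```python
def scan_positions(column, predicate):
    """Steps 1-3: scan one column and emit qualifying positions only."""
    return [pos for pos, value in enumerate(column) if predicate(value)]

def late_materialization(columns, predicates):
    pos_lists = [scan_positions(c, p) for c, p in zip(columns, predicates)]
    # Step 4: position-wise AND (here a set intersection, re-sorted).
    positions = sorted(set(pos_lists[0]).intersection(*pos_lists[1:]))
    # Step 5: re-scan the columns and stitch values at surviving positions.
    return [tuple(c[pos] for c in columns) for pos in positions]

# Hypothetical sample data for R.a, R.b, R.c (same sort order).
R_a = [1, 5, 3, 5, 2]
R_b = [10, 20, 30, 40, 50]
R_c = [7, 8, 9, 7, 8]

result = late_materialization(
    [R_a, R_b, R_c],
    [lambda a: a == 5, lambda b: b >= 20, lambda c: c < 9],
)
```

Only two tuples are ever constructed for this data, at the price of touching each column a second time in step 5.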

Late Materialization: Potential Pros and Cons + Operating directly on position lists. + Constructing only relevant tuples. - Re-scanning the base columns to form tuples.

Early Materialization Advantages • No need to re-scan a column. • If the cost of re-scanning at tuple reconstruction time is high, early materialization has the advantage.

An Analytical Model for Comparing the Two Materialization Strategies • The model is composed of 3 types of operators: • Data Source (DS) operator • AND operator • Tuple Construction operator • These operators are enough for expressing simple queries using each materialization strategy.

Data Source (DS) operator: Case 1 • Input • A column Ci of |Ci| blocks from disk. • A predicate with selectivity SF. • Output • A column of positions of the tuples that satisfy the predicate. • Used by late materialization.

Data Source (DS) operator: Case 2 • Input • A column Ci of |Ci| blocks from disk. • A predicate with selectivity SF. • Output • A column of (position, value) pairs of the tuples that satisfy the predicate. • Used by early materialization.
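DS Case 1 and Case 2 take the same inputs and differ only in what they emit. A minimal sketch, assuming the column is an in-memory list and using the hypothetical names `ds_case1` and `ds_case2`:

```python
def ds_case1(column, predicate):
    """Emit only the positions of qualifying values (late materialization)."""
    return [pos for pos, value in enumerate(column) if predicate(value)]

def ds_case2(column, predicate):
    """Emit (position, value) pairs of qualifying values (early materialization)."""
    return [(pos, value) for pos, value in enumerate(column) if predicate(value)]

column = [4, 9, 2, 9]
positions = ds_case1(column, lambda v: v == 9)
pairs = ds_case2(column, lambda v: v == 9)
```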

Data Source (DS) operator: Case 3 • Input • A column Ci of |Ci| blocks from disk or memory. • A list of positions, i.e., POSLIST. • Output • A column of the values corresponding to the positions in POSLIST. • Used by late materialization.
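DS Case 3 can be sketched as a lookup by position. The block layout (fixed-size lists), the `block_size` parameter, and the returned count of touched blocks are assumptions added for illustration; they are not taken from the slides.

```python
def ds_case3(blocks, block_size, poslist):
    """Return the values at the given positions and the number of blocks touched."""
    values = []
    blocks_touched = set()
    for pos in poslist:
        # Map a global position to (block number, offset within the block).
        block_no, offset = divmod(pos, block_size)
        blocks_touched.add(block_no)
        values.append(blocks[block_no][offset])
    return values, len(blocks_touched)

# A column of 6 values stored as 3 blocks of 2 values each.
blocks = [[4, 9], [2, 9], [7, 1]]
values, touched = ds_case3(blocks, 2, [1, 3])
```

When the position list is short, only a few of the |Ci| blocks need to be visited.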

Data Source (DS) operator: Case 4 • Input • A column Ci of |Ci| blocks from disk. • A predicate with selectivity SF. • A set of intermediate tuples of the form (pos, <a1, ..., an>). • Output • A set of intermediate tuples of the new form (pos, <a1, ..., an, an+1>), i.e., with column Ci added to the tuples. • Used by early materialization.
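DS Case 4 can be sketched as widening each intermediate tuple with the value found at its position. This sketch assumes that a tuple is dropped when the new value fails the predicate; the function name and the tuple encoding `(pos, values)` are hypothetical.

```python
def ds_case4(column, predicate, tuples):
    """Extend (pos, <a1..an>) to (pos, <a1..an, an+1>) where the predicate holds."""
    out = []
    for pos, values in tuples:
        new_value = column[pos]           # jump to the tuple's position in Ci
        if predicate(new_value):          # assumed: failing tuples are dropped
            out.append((pos, values + (new_value,)))
    return out

R_c = [7, 8, 9, 7]
intermediate = [(1, (5, 20)), (3, (5, 40))]
widened = ds_case4(R_c, lambda c: c < 8, intermediate)
```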

The AND Operator • Input: • k position lists, inpos1, ..., inposk. • Output: • outpos: a new list of positions representing the intersection of the input lists. • Operating on positions is fast.
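One way to realize the AND operator is a merge-style intersection over sorted position lists, sketched below. The slides do not fix a representation for position lists, so sorted Python lists are an assumption here, as is the function name.

```python
def and_positions(*poslists):
    """Intersect k sorted position lists by repeated two-way merging."""
    if not poslists:
        return []
    result = list(poslists[0])
    for other in poslists[1:]:
        merged, i, j = [], 0, 0
        while i < len(result) and j < len(other):
            if result[i] == other[j]:
                merged.append(result[i])
                i += 1
                j += 1
            elif result[i] < other[j]:
                i += 1
            else:
                j += 1
        result = merged
    return result

outpos = and_positions([1, 3], [1, 2, 3, 4], [0, 1, 3, 4])
```

Each two-way merge is linear in the lengths of its inputs and never looks at column values.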

Tuple Construction Operators • The MERGE operator • input: k sets of values VAL1,...,VALk. • output: a set of k-ary tuples. • This operator is used to construct tuples at the top of a late materialization plan. • The SPC(Scan, Predicate, and Construct) operator • input: • k columns VAL1,...,VALk from disk; • a set of predicates. • output: a set of tuples that pass all predicates. • This operator can sit at the bottom of an early materialization plan.
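The two tuple construction operators can be sketched as follows. The sketch assumes in-memory lists and one predicate per column, applied to the column at the same index; the function names `merge` and `spc` mirror the operator names but are otherwise hypothetical.

```python
def merge(*value_lists):
    """MERGE: combine k aligned value lists into k-ary tuples."""
    return list(zip(*value_lists))

def spc(columns, predicates):
    """SPC: scan k columns, apply the predicates, construct passing tuples."""
    out = []
    for row in zip(*columns):
        # predicates[i] is assumed to apply to columns[i].
        if all(pred(value) for pred, value in zip(predicates, row)):
            out.append(row)
    return out

merged = merge([5, 5], [20, 40], [8, 7])
scanned = spc(
    [[1, 5, 3, 5, 2], [10, 20, 30, 40, 50], [7, 8, 9, 7, 8]],
    [lambda a: a == 5, lambda b: b >= 20, lambda c: c < 9],
)
```

MERGE receives values that already passed every predicate, while SPC does the filtering itself.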

Example Query Plans: EM

Example Query Plans: LM

Optimization in Late Materialization • Data Source Case 3: produce values from positions • Input • A column Ci of |Ci| blocks from disk or memory. • A list of positions, i.e., POSLIST. • Output • A column of the values corresponding to the positions in POSLIST. • Optimization • If the column is already in memory, do not read it from disk again, • i.e., reduce the cost of re-scanning a column.

LM Optimization: Multi-Columns • A multi-column is a specialized data structure that • allows blocks of column data to remain in memory after the first scan, so that those blocks can be easily scanned again later on; • contains a memory-resident, horizontal partition of some subset of columns from a logical relation.

Components of a Multi-Column • A covering position range: • Indicates the virtual start position and end position of the horizontal partition • An array of mini-columns: • A mini-column is the set of corresponding values for a specified position range of a column. • Each mini-column is kept compressed the same way as it was on disk. • A position descriptor: • Indicates which positions in the position range remain valid.
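The three components can be pictured with a small Python data class. This is a sketch under assumptions: mini-columns are plain lists instead of compressed pages, the position descriptor is a list of booleans, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class MultiColumn:
    start: int                                         # covering range: first position
    end: int                                           # covering range: last position
    mini_columns: dict = field(default_factory=dict)   # column name -> values
    valid: list = field(default_factory=list)          # position descriptor

    @classmethod
    def from_page(cls, name, start, values):
        """Wrap one page of one column; all positions start out valid."""
        return cls(start, start + len(values) - 1,
                   {name: values}, [True] * len(values))

    def valid_positions(self):
        """Positions in the covering range that are still valid."""
        return [self.start + i for i, ok in enumerate(self.valid) if ok]

mc = MultiColumn.from_page("a", 100, [1, 5, 3])
```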

Construction of a Multi-Column • Initially a multi-column contains only one mini-column. • When a page of a column is read from disk, a mini-column is created with a position descriptor indicating that all positions are valid. • Each mini-column can be just a pointer to the page in the buffer. • A modified AND operator is used to merge two multi-columns into a wider multi-column.

The Use of a Multi-Column • If a DS Case 3 operator takes as input a multi-column rather than just a position list, • then it has no need to re-scan the column (from disk).
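The modified AND and the resulting DS Case 3 shortcut can be sketched with plain dictionaries. Several assumptions are made: both inputs cover the same position range, mini-columns are uncompressed lists, the descriptor is a list of booleans, and all function names are hypothetical.

```python
def new_multi_column(name, start, values):
    """One mini-column, all positions valid (as when a page is first read)."""
    return {"start": start, "end": start + len(values) - 1,
            "minis": {name: list(values)}, "valid": [True] * len(values)}

def apply_predicate(mc, name, predicate):
    """Clear the descriptor bit of every position whose value fails."""
    values = mc["minis"][name]
    mc["valid"] = [ok and predicate(v) for ok, v in zip(mc["valid"], values)]
    return mc

def and_multi_columns(left, right):
    """Modified AND: intersect descriptors and keep the mini-columns of both."""
    if (left["start"], left["end"]) != (right["start"], right["end"]):
        raise ValueError("position ranges must match in this sketch")
    return {"start": left["start"], "end": left["end"],
            "minis": {**left["minis"], **right["minis"]},
            "valid": [a and b for a, b in zip(left["valid"], right["valid"])]}

def ds_case3(mc, name):
    """Produce values from the in-memory mini-column; no disk re-scan."""
    return [v for v, ok in zip(mc["minis"][name], mc["valid"]) if ok]

a = apply_predicate(new_multi_column("a", 0, [1, 5, 3, 5]), "a", lambda v: v == 5)
b = apply_predicate(new_multi_column("b", 0, [10, 20, 30, 40]), "b", lambda v: v >= 30)
wide = and_multi_columns(a, b)
```

Because `wide` still holds both mini-columns, extracting R.a and R.b values for the surviving position needs no second read of either column.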

Predicted vs. Actual Behavior

Heuristic for Choosing Materialization Strategy • Use Late Materialization • if a query contains aggregation, • or if the predicates in the query are highly selective (i.e., the selectivity factor is small, so few tuples qualify). • Use Early Materialization • otherwise, i.e., when neither condition for late materialization holds.
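The heuristic can be written as a small decision function. The slides give no numeric cut-off for a "small" selectivity, so the `threshold` parameter and its default value are purely hypothetical placeholders.

```python
def choose_strategy(has_aggregation, selectivity, threshold=0.1):
    """Return "late" or "early" following the heuristic above.

    selectivity: fraction of tuples that pass the predicates (0.0 to 1.0).
    threshold:   hypothetical cut-off for "small"; not taken from the slides.
    """
    if has_aggregation or selectivity < threshold:
        return "late"
    return "early"
```

In practice the cut-off would have to be calibrated, for example against the analytical model's cost estimates.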

References • Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran, and Stan Zdonik. C-Store: A Column Oriented DBMS. VLDB, 2005. • Daniel J. Abadi, Daniel S. Myers, David J. DeWitt, and Samuel R. Madden. Materialization Strategies in a Column-Oriented DBMS. Proceedings of ICDE, April 2007, Istanbul, Turkey.
