C-Store: Column-Oriented Data Warehousing Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY May 17, 2010
C-Store’s Father: Michael Stonebraker • A former Professor at Berkeley, • an Adjunct Professor at M.I.T. • ACM Software System Award, 1988 • INGRES, developed by undergraduates • POSTGRES, Mariposa, C-Store • ACM SIGMOD Innovation Award, 1994 • National Academy of Engineering, 1998
C-Store: The Home Page http://db.lcs.mit.edu/projects/cstore/ • C-Store: A Column-Oriented DBMS • download: Source code • overview: Project description • papers: Publications • people: Who are we? • The C-Store project is a collaboration between MIT, Yale, Brandeis University, Brown University, and UMass Boston. • Commercialized C-Store: Vertica
The Starting Point • C-Store: A Column Oriented DBMS • Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran and Stan Zdonik. • VLDB, pages 553-564, 2005.
C-Store: the Column Store Project • Row Store or Column Store? • [Figure: a relation (table) drawn as a grid, with Column 1, Column 2, Column 3 across the top and Record 1, Record 2, Record 3 down the side]
The History: Relational Model • Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387. • Physical Data Independence • Row Store vs. Column Store on the same Conceptual Model: Relation
Row Store: Why? • OLTP (On-Line Transaction Processing) • ATM, POS in supermarkets • Characteristics of OLTP applications: • Transactions that involve small numbers of records (or tuples) • Frequent updates (as well as queries) • Many users • Fast response times • OLTP needs a write-optimized row store. • A record can be inserted or deleted in one physical write.
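Row Store: Columns Stored Together • Record id = <page id, slot #> • [Figure: slotted page i. Records with Rid = (i,1), Rid = (i,2), ..., Rid = (i,N) sit in the data area; the slot directory holds the slot array (entries for slots 1, 2, ..., N), the number of slots (N), and a pointer to the start of free space]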
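The record id scheme on this slide can be sketched in a few lines of Python. This is only a toy illustration, not code from C-Store or any particular DBMS: the class name `SlottedPage`, its methods, and the 4096-byte default size are assumptions made for the example, and the slot directory is kept in a Python list rather than packed into the page bytes.

```python
from typing import List, Optional, Tuple

Rid = Tuple[int, int]  # (page id, slot number); slots are numbered from 1, as in the figure


class SlottedPage:
    """Toy slotted page (hypothetical): record bytes fill the data area front to back."""

    def __init__(self, page_id: int, size: int = 4096) -> None:
        self.page_id = page_id
        self.data = bytearray(size)
        self.free_ptr = 0  # pointer to the start of free space
        # Slot array: one (offset, length) entry per record, None once deleted.
        self.slots: List[Optional[Tuple[int, int]]] = []

    def insert(self, record: bytes) -> Rid:
        """Append the whole record (all of its columns together) and return its rid."""
        if self.free_ptr + len(record) > len(self.data):
            raise ValueError("page full")
        offset = self.free_ptr
        self.data[offset:offset + len(record)] = record
        self.free_ptr += len(record)
        self.slots.append((offset, len(record)))
        return (self.page_id, len(self.slots))

    def get(self, rid: Rid) -> bytes:
        """Follow <page id, slot #> to the record bytes."""
        page_id, slot_no = rid
        if page_id != self.page_id or not 1 <= slot_no <= len(self.slots):
            raise KeyError(rid)
        entry = self.slots[slot_no - 1]
        if entry is None:
            raise KeyError(rid)
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, rid: Rid) -> None:
        """Clear the slot entry only, so the rids of other records stay valid."""
        page_id, slot_no = rid
        if page_id != self.page_id or not 1 <= slot_no <= len(self.slots):
            raise KeyError(rid)
        self.slots[slot_no - 1] = None
```

Because a record's columns are stored contiguously, `insert` touches one page, which is the property the OLTP slide relies on. The extra level of indirection through the slot array is what lets a record be deleted without changing the rids of its neighbours.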
Current DBMS Gold Standard • Store the columns of one record contiguously on disk • Use B-tree indexing • Use small (e.g. 4K) disk blocks • Align fields on byte or word boundaries • Conventional (row-oriented) query optimizer and executor (technology from 1979) • ARIES-style transactions
From OLTP to OLAP and Data Warehouse • OLAP (On-Line Analytical Processing, Codd, 1993) • Flexible reporting for Business Intelligence • Characteristics of OLAP applications: • Transactions that involve large numbers of records • Frequent ad-hoc queries and infrequent updates • A few decision-making users • Fast response times • Data warehouses are designed to facilitate reporting and analysis. • Read-Mostly
Other Read-Mostly Applications • CRM (Customer Relationship Management) • Siebel (Oracle) • Catalog Search in Electronic Commerce • Amazon.com • Shopping.com
Column Store: Why? • The Intuition: Only read relevant columns • Say, Ad-hoc queries read 2 columns out of 20 • Column Store is not a new idea • Sybase IQ (early ’90s, bitmap index) • Addamark (i.e., SenSage, for Event Log data warehouse) • MonetDB (Hyper-Pipelining Query Execution, CIDR’05)
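The "2 columns out of 20" intuition can be made concrete with a small Python sketch. Everything here is hypothetical: the table, the column names `c0`..`c19`, and the two functions are invented for illustration, and counting "values read" stands in for disk I/O in a very simplified way.

```python
from typing import Dict, List, Tuple

NUM_COLS = 20
NUM_ROWS = 1000

# Row layout: each record keeps all 20 of its fields together.
rows: List[List[int]] = [
    [r * NUM_COLS + c for c in range(NUM_COLS)] for r in range(NUM_ROWS)
]

# Column layout: the same data, stored as one list per column.
columns: Dict[str, List[int]] = {
    f"c{c}": [row[c] for row in rows] for c in range(NUM_COLS)
}


def row_store_sum(col_a: int, col_b: int) -> Tuple[int, int]:
    """Sum two columns by scanning whole records; every field gets read."""
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)  # the full record comes off storage
        total += row[col_a] + row[col_b]
    return total, values_read


def column_store_sum(col_a: int, col_b: int) -> Tuple[int, int]:
    """Sum the same two columns by reading only those two column lists."""
    total, values_read = 0, 0
    for name in (f"c{col_a}", f"c{col_b}"):
        col = columns[name]
        values_read += len(col)
        total += sum(col)
    return total, values_read
```

Under these assumptions both functions return the same sum, but the row scan reads 20 values per record while the column scan reads 2, a tenfold difference for this query shape.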
C-Store Technical Ideas • Logical Data Model: Relational Model • Column Store • Only Materialized Views on Each Relation (perhaps many) • Active Data Compression • Column-Oriented Query Executor and Optimizer • Shared Nothing Architecture • Replication-Based Concurrency Control and Recovery
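One reason compression pairs well with a column store is that a single column holds values of one type, often with long runs of repeats. The slide does not say which encodings are meant, so the run-length encoding below is only an illustrative choice, and the function names are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(column: List[str]) -> List[Tuple[str, int]]:
    """Collapse each run of equal adjacent values into (value, run length)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(runs: List[Tuple[str, int]]) -> List[str]:
    """Expand (value, run length) pairs back into the original column."""
    out: List[str] = []
    for value, count in runs:
        out.extend([value] * count)
    return out
```

For example, a sorted state column `["CA", "CA", "CA", "NY", "NY", "TX"]` encodes to three pairs instead of six values. The same trick is far less effective on a row store, where adjacent bytes belong to different fields.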
How to Evaluate the C-Store Paper • None of the ideas in isolation merits publication • Judge the complete system by its (hopefully intelligent) choice of • a small collection of inter-related powerful ideas • that together put performance in a new sandbox
C-Store code base version 0.2 • http://db.lcs.mit.edu/projects/cstore/cstore0.2.tar.gz • Runs on Linux x86 computers • Tested on RedHat Linux • This code compiles on old versions of BerkeleyDB and gcc. • BerkeleyDB.4.2 • LZO version 1 (http://www.oberhumer.com/opensource/lzo/)
References • Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran and Stan Zdonik. C-Store: A Column Oriented DBMS. VLDB, pages 553-564, 2005. • Vertica Database Technical Overview White Paper. http://www.vertica.com/php/pdfgateway?file=VerticaArchitectureWhitePaper.pdf • SenSage Event Data Warehouse. http://www.sensage.com/English/Products/Event_Data_Warehouse.html