C-Store: Column-Oriented Data Warehousing

C-Store: Column-Oriented Data Warehousing. Jianlin Feng, School of Software, Sun Yat-sen University, May 17, 2010. C-Store’s Father: Michael Stonebraker. A former Professor at Berkeley, an Adjunct Professor at M.I.T. ACM Software System Award, 1988. INGRES, developed by undergraduates



  1. C-Store: Column-Oriented Data Warehousing. Jianlin Feng, School of Software, Sun Yat-sen University. May 17, 2010.

  2. C-Store’s Father: Michael Stonebraker
     • A former Professor at Berkeley
     • An Adjunct Professor at M.I.T.
     • ACM Software System Award, 1988
     • INGRES, developed by undergraduates
     • POSTGRES, Mariposa, C-Store
     • ACM SIGMOD Innovation Award, 1994
     • National Academy of Engineering, 1998

  3. C-Store: The Home Page
     • http://db.lcs.mit.edu/projects/cstore/
     • C-Store: A Column-Oriented DBMS
     • download: Source code
     • overview: Project description
     • papers: Publications
     • people: Who are we?
     • The C-Store project is a collaboration between MIT, Yale, Brandeis University, Brown University, and UMass Boston.
     • Commercialized C-Store: Vertica

  4. The Starting Point
     • C-Store: A Column Oriented DBMS
     • Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran and Stan Zdonik.
     • VLDB, pages 553-564, 2005.

  5. C-Store: the Column Store Project
     • Row Store or Column Store?
     [Figure: a relation (table) drawn as a grid with columns Column 1, Column 2, Column 3 and rows Record 1, Record 2, Record 3]

  6. Example of a Relation

  7. The History: Relational Model
     • Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387.
     • Physical Data Independence
     • Row Store vs. Column Store on the same Conceptual Model: Relation

  8. Row Store: Why?
     • OLTP (On-Line Transaction Processing)
     • ATM, POS in supermarkets
     • Characteristics of OLTP applications:
     • Transactions that involve small numbers of records (or tuples)
     • Frequent updates (including queries)
     • Many users
     • Fast response times
     • OLTP needs a Write-Optimized Row Store.
     • Insert and delete a record in one physical write.

  9. Row Store: Columns Stored Together
     • Record id = <page id, slot #>
     [Figure: slotted page i. Data records are labeled Rid = (i,1), Rid = (i,2), ..., Rid = (i,N). The slot directory holds the slot array (sample entries 16, 24, 20), the number of slots N, and a pointer to the start of free space.]
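The record id scheme on this slide can be illustrated with a minimal sketch. It assumes a 4 KB page and keeps the slot directory in a Python list rather than inside the page bytes; the class name `SlottedPage` and its methods are hypothetical, not taken from the slides or from any DBMS.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

PAGE_SIZE = 4096  # assumed page size in bytes


@dataclass
class SlottedPage:
    """Toy slotted page: each record is stored contiguously, slots point at records."""
    page_id: int
    data: bytearray = field(default_factory=lambda: bytearray(PAGE_SIZE))
    free_start: int = 0  # pointer to the start of free space
    # Slot directory: one (offset, length) entry per record, None once deleted.
    slots: List[Optional[Tuple[int, int]]] = field(default_factory=list)

    def insert(self, record: bytes) -> Tuple[int, int]:
        """Write the whole record in one place and return rid = (page id, slot #)."""
        if self.free_start + len(record) > PAGE_SIZE:
            raise ValueError("page full")
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return (self.page_id, len(self.slots))  # slots are numbered from 1

    def read(self, rid: Tuple[int, int]) -> bytes:
        """Follow the slot entry to the record bytes."""
        page_id, slot_no = rid
        if page_id != self.page_id:
            raise KeyError("rid belongs to another page")
        entry = self.slots[slot_no - 1]
        if entry is None:
            raise KeyError("record was deleted")
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, rid: Tuple[int, int]) -> None:
        """Mark the slot empty; other rids stay valid because slots never move."""
        self.slots[rid[1] - 1] = None


page = SlottedPage(page_id=7)
rid = page.insert(b"alice|30")
print(rid, page.read(rid))
```

Because all fields of a record sit next to each other, one insert touches one place on the page, which is the write-optimized behavior the previous slide attributes to row stores.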

  10. Current DBMS Gold Standard
     • Store columns in one record contiguously on disk
     • Use B-tree indexing
     • Use small (e.g. 4K) disk blocks
     • Align fields on byte or word boundaries
     • Conventional (row-oriented) query optimizer and executor (technology from 1979)
     • ARIES-style transactions

  11. From OLTP to OLAP and Data Warehouse
     • OLAP (On-Line Analytical Processing, Codd, 1993)
     • Flexible Reporting for Business Intelligence
     • Characteristics of OLAP applications:
     • Transactions that involve large numbers of records
     • Frequent ad-hoc queries and infrequent updates
     • A few decision-making users
     • Fast response times
     • Data warehouses are designed to facilitate reporting and analysis.
     • Read-Mostly

  12. Other Read-Mostly Applications
     • CRM (Customer Relationship Management)
     • Siebel (Oracle)
     • Catalog Search in Electronic Commerce
     • Amazon.com
     • Shopping.com

  13. Column Store: Why?
     • The Intuition: Only read relevant columns
     • Say, ad-hoc queries read 2 columns out of 20
     • Column Store is not a new idea
     • Sybase IQ (early ’90s, bitmap index)
     • Addamark (i.e., SenSage, for Event Log data warehouse)
     • MonetDB (Hyper-Pipelining Query Execution, CIDR’05)
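The "2 columns out of 20" intuition can be made concrete with a small in-memory sketch. It counts how many stored values each layout has to touch for the same query; the table contents, sizes, and function names are illustrative assumptions, and real systems count disk pages rather than Python list elements.

```python
from typing import Dict, List, Sequence, Tuple

NUM_COLUMNS = 20    # matches the slide's example
NUM_ROWS = 1000     # assumed table size

# Row store: one list per record, all 20 fields together.
rows: List[List[int]] = [
    [r * NUM_COLUMNS + c for c in range(NUM_COLUMNS)] for r in range(NUM_ROWS)
]
# Column store: one list per column, holding that field for every record.
columns: Dict[int, List[int]] = {
    c: [row[c] for row in rows] for c in range(NUM_COLUMNS)
}


def scan_row_store(table: List[List[int]], wanted: Sequence[int]) -> Tuple[int, int]:
    """Sum the wanted columns; every record is fetched whole."""
    total = 0
    touched = 0
    for row in table:
        touched += len(row)  # all 20 fields come along with the record
        total += sum(row[c] for c in wanted)
    return total, touched


def scan_column_store(table: Dict[int, List[int]], wanted: Sequence[int]) -> Tuple[int, int]:
    """Sum the wanted columns; only those columns are fetched."""
    total = 0
    touched = 0
    for c in wanted:
        column = table[c]
        touched += len(column)
        total += sum(column)
    return total, touched


row_total, row_touched = scan_row_store(rows, [3, 7])
col_total, col_touched = scan_column_store(columns, [3, 7])
print(row_total == col_total, row_touched, col_touched)
```

Both scans return the same sum, but the row layout touches 20,000 values while the column layout touches 2,000, a tenfold difference that follows directly from reading 2 of 20 columns.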

  14. C-Store Technical Ideas
     • Logical Data Model: Relational Model
     • Column Store
     • Only Materialized Views on Each Relation (perhaps many)
     • Active Data Compression
     • Column-Oriented Query Executor and Optimizer
     • Shared Nothing Architecture
     • Replication-Based Concurrency Control and Recovery
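The slide does not name a compression scheme. As an assumption for illustration, the sketch below uses run-length encoding, which suits a sorted column with few distinct values, and shows a query answered on the compressed form without decompressing it. The function names are hypothetical.

```python
from itertools import groupby
from typing import Hashable, List, Tuple

Run = Tuple[Hashable, int]  # (value, number of consecutive repetitions)


def rle_encode(column: List[Hashable]) -> List[Run]:
    """Collapse each run of equal adjacent values into one (value, count) pair."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]


def rle_decode(runs: List[Run]) -> List[Hashable]:
    """Expand the runs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


def count_equal(runs: List[Run], target: Hashable) -> int:
    """Count rows matching target by adding run lengths, with no decoding."""
    return sum(count for value, count in runs if value == target)


state_column = ["CA"] * 3 + ["MA"] * 2 + ["NY"] * 4  # a sorted column
runs = rle_encode(state_column)
print(runs)
print(count_equal(runs, "NY"))
```

Storing one column at a time is what makes this effective: values of the same attribute sit next to each other, so sorting the column produces long runs that a row layout would interleave with other fields.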

  15. How to Evaluate The C-Store Paper
     • None of the ideas in isolation merits publication
     • Judge the complete system by its (hopefully intelligent) choice of
     • Small collection of inter-related powerful ideas
     • That together put performance in a new sandbox

  16. Architecture of C-Store (Vertica) On a Single Node

  17. C-Store code base version 0.2
     • http://db.lcs.mit.edu/projects/cstore/cstore0.2.tar.gz
     • Runs on Linux x86 computers
     • Tested on RedHat Linux
     • This code compiles on old versions of BerkeleyDB and gcc.
     • BerkeleyDB.4.2
     • LZO version 1 (http://www.oberhumer.com/opensource/lzo/)

  18. References
     • Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran and Stan Zdonik. C-Store: A Column Oriented DBMS. VLDB, pages 553-564, 2005.
     • Vertica Database Technical Overview White Paper. http://www.vertica.com/php/pdfgateway?file=VerticaArchitectureWhitePaper.pdf
     • http://www.sensage.com/English/Products/Event_Data_Warehouse.html
