An Analysis of the Publication "An Overview of Data Warehousing and OLAP Technology" by Surajit Chaudhuri and Umeshwar Dayal
Michael Goshey
University of Minnesota, Fall 2006
CSci 8701: Overview of Database Research
Outline
• Introduction
• Problem Addressed
• Major Contributions
• Key Concepts
• Validation Methodology
• Assumptions
• 2006 Rewrite
Michael Goshey: 9/19/2006
Introduction
• Selected paper
  • S. Chaudhuri and U. Dayal, "An Overview of Data Warehousing and OLAP Technology," SIGMOD Record 26(1): 65-74 (1997).
• Motivation
  • Personal interest
Problem Addressed
• Problem statement
  • Survey: organizing the data warehousing space
  • Differing requirements between OLTP and OLAP
• Significance
  • Growth area
  • Reference work establishing consensus on terms, architectures and issues
Major Contributions
• Bridging the gulf between industry and academia
• OLTP vs. OLAP: clarifying the differences
• Concise survey of relevant issues, architectures and tools
• Concrete list of data warehouse design and build steps
Key Concepts
• Data warehouses and data marts
• OLTP, OLAP (ROLAP vs. MOLAP)
• Relational and dimensional data models
• Bitmap indices
• ETL
• Metadata
• Managed query vs. ad hoc environments
• Materialized views
• SQL extensions (cube, rollup, rank, percentile, etc.)
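To make the cube and rollup extensions in the list above concrete, here is a minimal Python sketch of what those operators compute. The `sales` rows, the two dimensions, and the use of `None` as the "ALL" marker are assumptions made for illustration; they do not come from the paper or the slides, and a real DBMS would evaluate these groupings far more efficiently.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales facts: (year, region, amount).
sales = [
    (2005, "east", 100),
    (2005, "west", 150),
    (2006, "east", 120),
    (2006, "west", 80),
]
DIMENSIONS = ("year", "region")


def aggregate(rows, grouping):
    """SUM(amount) grouped by the dimension positions in `grouping`.

    Dimensions that are not grouped on are replaced by None, which stands
    in for the "ALL" value (an assumption of this sketch).
    """
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] if i in grouping else None
                    for i in range(len(DIMENSIONS)))
        totals[key] += row[-1]
    return dict(totals)


def rollup(rows):
    """Roughly GROUP BY ROLLUP(year, region): (year, region), (year), ()."""
    result = {}
    for n in range(len(DIMENSIONS), -1, -1):
        result.update(aggregate(rows, set(range(n))))
    return result


def cube(rows):
    """Roughly GROUP BY CUBE(year, region): every subset of the dimensions."""
    result = {}
    for n in range(len(DIMENSIONS) + 1):
        for grouping in combinations(range(len(DIMENSIONS)), n):
            result.update(aggregate(rows, set(grouping)))
    return result


rollup_result = rollup(sales)
cube_result = cube(sales)
```

With these four rows, the rollup yields the per-year subtotals and the grand total but no per-region totals, while the cube adds the per-region totals as well. Using `None` for "ALL" would be ambiguous if a dimension could itself be NULL; SQL handles that case with a separate grouping indicator.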
Data Warehouse, Data Mart (figure slide; image not included)
Relational or Dimensional? (two figure slides; second image from http://www.laynetworks.com)
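Bitmap Indices
• Cardinality: unique values / total rows
• B-tree vs. bitmap: as a rule of thumb, bitmap indices suit columns whose cardinality is below about 1%; B-trees suit unique or near-unique columns
• Boolean algebra (AND, OR, NOT) can be applied directly to the indices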
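The points above can be illustrated with a small Python sketch that stores one bitmap per distinct column value and answers predicates with bitwise operations. The table contents and the helper names `build_bitmap_index` and `matching_rows` are hypothetical, and Python integers stand in for the compressed bit vectors a real system would use.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)


def matching_rows(bitmap):
    """Decode a bitmap back into a sorted list of row ids."""
    rows = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


# Hypothetical six-row table with two low-cardinality columns.
region = ["east", "west", "east", "north", "west", "east"]
product = ["tv", "tv", "radio", "tv", "radio", "tv"]

region_idx = build_bitmap_index(region)
product_idx = build_bitmap_index(product)

# WHERE region = 'east' AND product = 'tv'
east_tv_rows = matching_rows(region_idx["east"] & product_idx["tv"])

# WHERE region = 'west' OR region = 'north'
west_or_north_rows = matching_rows(region_idx["west"] | region_idx["north"])

# Cardinality as defined on the slide: distinct values / total rows.
region_cardinality = len(region_idx) / len(region)
```

The AND and OR predicates are resolved entirely on the index bitmaps; the base rows are only touched once the final bitmap is decoded. The toy table has a cardinality of 0.5 for `region`, far above the 1% guideline, so it shows the mechanics rather than a realistic use case.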
Validation Methodology
• Survey paper goals
• Academic and industry citations
• Referencing tools, vendors
• Case studies
Assumptions
• Read-only environments
• Shortcomings
  • (occasional) transactional commitments
  • the data revision problem
2006 Rewrite
• Changes in terminology, tools, vendors
  • Fact constellations -> conformed dimensions
  • Decision support -> BI
  • Vendors and tools in BI, ETL, OLAP
• Multiple user constituencies
• Data history difficulties
  • petabyte databases -> very large warehouses common
  • data expiry challenges
  • slowly changing dimensions
Slowly Changing Dimensions
• Before
• After: Type 1
• After: Type 2
• After: Type 3
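The before/after tables from this slide are not reproduced in the text, so the following Python sketch shows one plausible reading of the three types, using their commonly described definitions: Type 1 overwrites, Type 2 adds a new row, and Type 3 adds a column for the prior value. The customer dimension, its column names (`customer_key`, `valid_from`, `valid_to`, `previous_city`), and the example change are all assumptions for illustration.

```python
import copy
from datetime import date

# "Before": a hypothetical customer dimension with one row.
before = [
    {"customer_key": 1, "customer_id": "C42", "city": "Minneapolis"},
]


def scd_type1(dim, customer_id, new_city):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    dim = copy.deepcopy(dim)
    for row in dim:
        if row["customer_id"] == customer_id:
            row["city"] = new_city
    return dim


def scd_type2(dim, customer_id, new_city, change_date):
    """Type 2: expire the current row and append a row with a new surrogate key."""
    dim = copy.deepcopy(dim)
    for row in dim:
        # Validity columns are added on the fly; None means "unknown/open".
        row.setdefault("valid_from", None)
        row.setdefault("valid_to", None)
    next_key = max(row["customer_key"] for row in dim) + 1
    for row in dim:
        if row["customer_id"] == customer_id and row["valid_to"] is None:
            row["valid_to"] = change_date
    dim.append({
        "customer_key": next_key,
        "customer_id": customer_id,
        "city": new_city,
        "valid_from": change_date,
        "valid_to": None,
    })
    return dim


def scd_type3(dim, customer_id, new_city):
    """Type 3: keep exactly one prior value in an extra column on the same row."""
    dim = copy.deepcopy(dim)
    for row in dim:
        row.setdefault("previous_city", None)
        if row["customer_id"] == customer_id:
            row["previous_city"] = row["city"]
            row["city"] = new_city
    return dim


change_date = date(2006, 9, 19)
after_type1 = scd_type1(before, "C42", "St. Paul")
after_type2 = scd_type2(before, "C42", "St. Paul", change_date)
after_type3 = scd_type3(before, "C42", "St. Paul")
```

The trade-off is how much history survives: Type 1 keeps none, Type 3 keeps a single prior value, and Type 2 keeps every version at the cost of extra rows and a surrogate key that facts must join on.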
Questions?