320 likes | 570 Views
MIS 5346 Data warehousing. Concepts and Components Chapters 1 , 2, 7, 14, 15. Agenda. Background Data Warehouse vs Operational Data Store Characteristics of a Data Warehouse Improvements in Data Warehousing Relationship to Business Intelligence.
E N D
MIS 5346 Data warehousing Concepts and Components Chapters 1, 2, 7, 14, 15
Agenda • Background • Data Warehouse vs Operational Data Store • Characteristics of a Data Warehouse • Improvements in Data Warehousing • Relationship to Business Intelligence
Evolution of Decision Support Technologies • Business people need information to make plans, decisions, and assess results • 60's • Batch reports • 70’s • DSSs • 80’s • Info Centers • 90’s • Early DWs • 2000's • Business Intelligence • Issues: • Dependency on IT resources • Based on OLTP or extracts • Functionality often pre-programmed • "Big Data" Analytics
DW vs. Business Intelligence • Short: • DW = populating structures with data • BI = using DW data • Long: • DW = body of historical data, separate from the operations of the organization, used to create BI • BI = the delivery of timely, accurate, and useful information to decision-makers • Broad: • BI = a broad category of applications, technologies, and organizational processes for gathering, storing, accessing, and analyzing data to help business users make better decisions
Need for Decision-Optimized Data Storage • Business people need information to make plans, decisions, and assess results • What were sales volumes by region and product category for the last 3 years? • Which of two new medications will result in the best outcomes (higher recovery rate and shorter hospital stay)? • Data captured by complex operational systems (OLTPs) optimized to support well-defined transaction requirements • Difficult to get needed information from data grounded in OLTPs
Data Warehouse “… a subject-oriented, integrated, nonvolatile, and time variant collection of data in support of management decisions.” • Managing the Data Warehouse, W. H. Inmon, John Wiley & Sons, December, 1996. • “… a copy of transaction data specifically structured for query and analysis.” • The Data Warehouse Toolkit, R. Kimball, John Wiley & Sons, February, 1996. • Enterprise data, transformed, integrated, accumulated over time, optimized for decision-making, and accessible via analytical tools
Characteristics of a DW (ala Inmon) • Subject-Oriented • As opposed to business-process oriented • Integrated • Multiple sources, internal and external • Critical part of DW implementation • Time-Variant • History, time periods important • Non-Volatile • DW data not changed once stored
Characteristics of a DW, cont… • Subject-Oriented • Needs are business subject-focused • Integrated • Multiple sources, internal and external • Time-Variant • History, time periods important • Non-Volatile • DW data not changed once stored • Data Granularity
Data Granularity • Level of detail stored in database • Operational focus • Analytical focus • Examples: • Life Insurance Policy vs. Life Insurance Coverage • Product Category vs. Product Sales • High granularity (eg, transactional grain) is most flexible
Challenges in Early DW Implementation • Improper or infeasible architectures, approaches • Insufficient attention to organizational strategy and culture • Early information delivery tools too complex for business users • Storage technology made it difficult to store much detail or history, and slow to process
Improved Technology • User-friendly tools for analysis, visualization • Excel • Tableau • Reporting Services, … • Improved technology for accessing, aggregating, partitioning data • Advances in processing technology • Parallel processing • Advances in storage technology • RAID • Solid State
Improved Architectures • Based on • Data Marts • Conformed dimensions • BI-emphasis
Data Warehouse vs. Data Marts • Enterprise Data Warehouse • Information about ALL subjects important to the organization
Data Warehouse vs. Data Marts, cont… • Data Marts • Subsets of data warehouse that focus on a selected subject area; typically departmental in nature
BI Architecture Source: Chaudhuri et. al. , An Overview of Business Intelligence Technology, Communications of the ACM, 54(8), August 2011, pp. 88-98.
BI Architecture, cont… Source: Oracle Corporation. Information Management and Big Data: A Reference Architecture, Oracle White Paper, February 2013, p. 12.
Architecture Components • Data Sources • Data Staging (Movement) • Data Storage (Warehouse) • Data Analysis/Discovery (Mid-tier) • Information Delivery (Front-end Presentation)
1. Data Sources • Identifying required business data from • Production • Internal, Personal • Archived • External
2. Data Staging • Extract • From source systems • Transform • Cleanse • Supplement • Convert • Combine… • Load • Populate data warehouse/mart tables
3. Data Storage • Data Warehouse / Data Mart • Relational database for structured data • Non-relational (e.g., Hadoop) data store for "loosely-structured" data • Metadata • Relational database • Catalog • Extended properties • Custom tables • External products/tools • Spreadsheets…
4. Data Analysis: Supporting Knowledge Discovery • Layout-Led Discovery • Pre-Designed Reports • Data-Led Discovery • OLAP Analysis • Model-Led Discovery • Data Mining
Pre-Defined Reports • Information pushed to user • Content and layout pre-determined • Can be parameter-driven • Can support some drill-down • May also include basic report development
OLAP • Online Analytical Processing • Providing On-Line Analytical Processing to User Analysts, E. F. Codd, Codd & Date, Inc 1993. • Short Definition: • Class of applications or tools that support ad-hoc analysis of multidimensional data • Longer Definition: • “…technology that enables [users]… to gain insight into data through…fast, consistent, interactive access [to]…information that has been transformed…to reflect the real dimensionality of the enterprise…” • OLAP Council (www.olapcouncil.org)
Data Mining • Search for patterns in large amounts of data • Making connections/associations with data • Predicting future outcomes • OLAP vs. Data Mining • “Report on the past” vs. “Predict the future” • Part of Knowledge Discovery…
Next Time… • Data Warehouse Design (Dimensional Modeling) • *** Assignment 1 Due 1/28 ***