1 / 30

MIS 5346 Data warehousing

MIS 5346 Data warehousing. Concepts and Components Chapters 1 , 2, 7, 14, 15. Agenda. Background Data Warehouse vs Operational Data Store Characteristics of a Data Warehouse Improvements in Data Warehousing Relationship to Business Intelligence.

fagan
Download Presentation

MIS 5346 Data warehousing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. MIS 5346 Data warehousing Concepts and Components Chapters 1, 2, 7, 14, 15

  2. Agenda • Background • Data Warehouse vs Operational Data Store • Characteristics of a Data Warehouse • Improvements in Data Warehousing • Relationship to Business Intelligence

  3. Evolution of Decision Support Technologies • Business people need information to make plans, decisions, and assess results • 60's • Batch reports • 70’s • DSSs • 80’s • Info Centers • 90’s • Early DWs • 2000's • Business Intelligence • Issues: • Dependency on IT resources • Based on OLTP or extracts • Functionality often pre-programmed • "Big Data" Analytics

  4. DW vs. Business Intelligence • Short: • DW = populating structures with data • BI = using DW data • Long: • DW = body of historical data, separate from the operations of the organization, used to create BI • BI = the delivery of timely, accurate, and useful information to decision-makers • Broad: • BI = a broad category of applications, technologies, and organizational processes for gathering, storing, accessing, and analyzing data to help business users make better decisions

  5. Need for Decision-Optimized Data Storage • Business people need information to make plans, decisions, and assess results • What were sales volumes by region and product category for the last 3 years? • Which of two new medications will result in the best outcomes (higher recovery rate and shorter hospital stay)? • Data captured by complex operational systems (OLTPs) optimized to support well-defined transaction requirements • Difficult to get needed information from data grounded in OLTPs

  6. Operational vs. Informational Data

  7. Data Warehouse “… a subject-oriented, integrated, nonvolatile, and time variant collection of data in support of management decisions.” • Managing the Data Warehouse, W. H. Inmon, John Wiley & Sons, December, 1996. • “… a copy of transaction data specifically structured for query and analysis.” • The Data Warehouse Toolkit, R. Kimball, John Wiley & Sons, February, 1996. • Enterprise data, transformed, integrated, accumulated over time, optimized for decision-making, and accessible via analytical tools

  8. Characteristics of a DW (ala Inmon) • Subject-Oriented • As opposed to business-process oriented • Integrated • Multiple sources, internal and external • Critical part of DW implementation • Time-Variant • History, time periods important • Non-Volatile • DW data not changed once stored

  9. Characteristics of a DW, cont… • Subject-Oriented • Needs are business subject-focused • Integrated • Multiple sources, internal and external • Time-Variant • History, time periods important • Non-Volatile • DW data not changed once stored • Data Granularity

  10. Data Granularity • Level of detail stored in database • Operational focus • Analytical focus • Examples: • Life Insurance Policy vs. Life Insurance Coverage • Product Category vs. Product Sales • High granularity (eg, transactional grain) is most flexible

  11. Challenges in Early DW Implementation • Improper or infeasible architectures, approaches • Insufficient attention to organizational strategy and culture • Early information delivery tools too complex for business users • Storage technology made it difficult to store much detail or history, and slow to process

  12. Improved Technology • User-friendly tools for analysis, visualization • Excel • Tableau • Reporting Services, … • Improved technology for accessing, aggregating, partitioning data • Advances in processing technology • Parallel processing • Advances in storage technology • RAID • Solid State

  13. Improved Architectures • Based on • Data Marts • Conformed dimensions • BI-emphasis

  14. Data Warehouse vs. Data Marts • Enterprise Data Warehouse • Information about ALL subjects important to the organization

  15. Data Warehouse vs. Data Marts, cont… • Data Marts • Subsets of data warehouse that focus on a selected subject area; typically departmental in nature

  16. Data Warehouse Architecture: Basic

  17. Data Warehouse Architecture: Types

  18. BI Architecture Source: Chaudhuri et. al. , An Overview of Business Intelligence Technology, Communications of the ACM, 54(8), August 2011, pp. 88-98.

  19. BI Architecture, cont… Source: Oracle Corporation. Information Management and Big Data: A Reference Architecture, Oracle White Paper, February 2013, p. 12.

  20. Architecture Components • Data Sources • Data Staging (Movement) • Data Storage (Warehouse) • Data Analysis/Discovery (Mid-tier) • Information Delivery (Front-end Presentation)

  21. 1. Data Sources • Identifying required business data from • Production • Internal, Personal • Archived • External

  22. 2. Data Staging • Extract • From source systems • Transform • Cleanse • Supplement • Convert • Combine… • Load • Populate data warehouse/mart tables

  23. 3. Data Storage • Data Warehouse / Data Mart • Relational database for structured data • Non-relational (e.g., Hadoop) data store for "loosely-structured" data • Metadata • Relational database • Catalog • Extended properties • Custom tables • External products/tools • Spreadsheets…

  24. 4. Data Analysis: Supporting Knowledge Discovery • Layout-Led Discovery • Pre-Designed Reports • Data-Led Discovery • OLAP Analysis • Model-Led Discovery • Data Mining

  25. Pre-Defined Reports • Information pushed to user • Content and layout pre-determined • Can be parameter-driven • Can support some drill-down • May also include basic report development

  26. OLAP • Online Analytical Processing • Providing On-Line Analytical Processing to User Analysts, E. F. Codd, Codd & Date, Inc 1993. • Short Definition: • Class of applications or tools that support ad-hoc analysis of multidimensional data • Longer Definition: • “…technology that enables [users]… to gain insight into data through…fast, consistent, interactive access [to]…information that has been transformed…to reflect the real dimensionality of the enterprise…” • OLAP Council (www.olapcouncil.org)

  27. Data Mining • Search for patterns in large amounts of data • Making connections/associations with data • Predicting future outcomes • OLAP vs. Data Mining • “Report on the past” vs. “Predict the future” • Part of Knowledge Discovery…

  28. 5. Information Delivery

  29. Examples of Uses of BI/DW

  30. Next Time… • Data Warehouse Design (Dimensional Modeling) • *** Assignment 1 Due 1/28 ***

More Related