280 likes | 411 Views
ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis). Introduction to Data Warehouse. Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential Chair of Business. Outline. Why Data Warehouse? Problems, causes and data warehouse solutions What is Data Warehouse?
E N D
ACCTG 6910Building Enterprise & Business Intelligence Systems(e.bis) Introduction to Data Warehouse Olivia R. Liu Sheng, Ph.D.Emma Eccles Jones Presidential Chair of Business
Outline • Why Data Warehouse? • Problems, causes and data warehouse solutions • What is Data Warehouse? • Characteristics and components • Current Practices of Data Warehouse
Why Data Warehouse? • Knowledge Management Problems (Drowning in data, starving for knowledge) • Can’t access data (easily) E.g., data from different branches, years, functional areas, etc. • Give me only what’s important (knowledge) E.g., Regions and products that have upward sales trends over the last five years. • I need to reduce data to what’s important by slicing and dicing. E.g., by branch, product, year, etc.
Why Data Warehouse? • Data inconsistency and poor data quality E.g., the 2001 PC sales amount in SLC from the CFO and the SLC Account Manager are not the same. • Need to improve the practices of making informed decisions. E.g., Did the VP for Marketing decide on the advertising budgets for branches in the SW region based on their sales performances over the last five years? • Hard and slow to query the database? E.g., VP for Marketing, CFO and Account Manager had to wait for the MIS Department to generate sales performance reports and analyses.
Why Data Warehouse? • ROI Problems • Can I get more value out of my data? Ans: Make informed, potent decisions using knowledge extracted from integrated and consistent data over a long period of time. • Can I do this cost-effectively? Options: federated (interoperable) databases vs. a data warehouse • Can I easily scale up or change how I get knowledge out of my data? E.g., Add more regions, functional areas or years in sales performance analyses.
Causes for the Problems Cause 1: Isolated databases distributed in an enterprise A Root cause for problems 1, 4, 5, 6, 7, 8 and 9 CRM Sales Inventory
Why Data Warehouse • Cause 1: Isolated databases distributed in an enterprise Ad hoc access solutions cannot alleviate the problems Sales CRM Inventory
Why Data Warehouse • Cause 2: Historical data is archived in offline storage systems Another Root cause for problems 1, 4, 5, 6, 7, 8 and 9 Historical Sales Data Sales Archive
Why Data Warehouse • Cause 2: Historical data is archived in offline storage systems Ad hoc accesses are slow and inconvenient Historical Sales Data Sales Archive
Cause 3: Metadata for Transaction DB systems is Not User Friendly SSN Name Instructor Dependent Has SSN 1 M Rank Name 1 Name Address Course Student Take M M Sex Phone Relation Grade IS-A C-No C-Name Under- graduate Graduate Minor Major Major
Why Data Warehouse • Cause 4: Query and programming languages are even less user friendly • DESB students’ academic grades and GPAs since the freshman year • Sales amount distribution by product category, customer state and year • Slicing and dicing • SQL statements??? • Report/screen interface codes???
Why Data Warehouse • Cause 5: Transaction databases are optimized (normalized) to process transactions but not to answer decision support queries • Bad query performance to join the normalized tables • Heavy transaction processing workload
What is Data Warehouse Designed to solve problems associated with current database practices: • Isolated, distributed databases Extract, replicate, integrate, cleanse & load Sales CRM Data Warehouse Inventory
Why Data Warehouse • Historical data is archived in offline storage systems Integrate Historical Data with Current Data Data Warehouse Historical Sales Data Sales Archive
What is Data Warehouse • Causes 3, 4 and 5: Hard-to-understand metadata, and query and programming languages; poor decision support query performances • Solution: In data warehouse, organize data in subject –oriented way rather than process-oriented way – dimensional modeling.
Dimensional Modeling (Star Schema) Course Instructor . Number . Title Academic Performance . Name . Rank . Grade Semester Student . Year . Length . Start date . Name . UG/PG . Major
Dimensional Modeling (Star Schema) Customer Branch . Name . State . City . Name . State . City Sales . Qty . Amt Time Product . Year . Quarter . Month . Name . Category
Application Program Application Program One System for Multiple Uses Interactive Queries/ Transactions Database Database Management System (DBMS) Metadata Database System
Two Worlds -> Two Systems Executive Information System Operational Application Operational Application Decision Support System (DSS) Operational Application OLTP DBs Data warehouse Reporting DSS Operational
What is Data Warehouse • Data Warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management’s decision making process. • 1. Subject-oriented means the data warehouse focuses on the high-level entities of business such as sales, products, and customers. This is in contrast to database systems, which deals with processes such as placing an order.
What is Data Warehouse • 2. Integrated means the data is integrated from distributed data sources and historical data sources and stored in a consistent format. 3. Time-variant means the data associates with a point in time (i.e., semester, fiscal year and pay period) 4. Non-volatile means the data doesn’t change once it gets into the warehouse.
Data Warehouse and Data Mart • Data warehouse – defined by its decision support purpose and other characteristics • Other characteristics: subject-oriented, integrated • Data mart – a data warehouse for a more limited business scope (e.g., a department, etc.) • A data warehouse may be built from several data marts
Basic Elements of a Data Warehouse System The Data Warehouse Presentation Servers Source System (Legacy) Data Staging Area End User Data Access Data Mart #1: OLAP ( ROLAP and/or MOLAP) query services; dimensional! Subject oriented; locally implemented; user group driven; may store atomic data; may be frequently refreshed; conform to DW Bus Populate, replicate, recover Ad Hoc Query Tools Storage: Flat files (fastest); RDBMS; Other Processing: Clean; Prune; Combine; Remove duplicates; households; standardize; conform dimensions; store awaiting replications; archive; export to data marts No user query services feed Relational extract Report Writers feed Flat files Spreadsheets extract Populate, replicate, recover End User Applications Conformed dimensions and facts feed DW BUS Data Mart #2 Populate, replicate, recover feed Models: forecasting; scoring; allocating; data mining; other downstream Systems; other parameters; special UI DW BUS Conformed dimensions and facts ERP Legacy extract Data Mart #3 Uploaded cleaned dimensions Uploaded model results
Current Practice of DW* • Expected DW market value in 2002 was projected to have grown to $113.5 billion. • Average DW development cost is $1.5 million and average maintenance cost is $0.5 million. • DW development time ranges from 1 to 3 yrs. * Source: H.J. Watson, “ Current Practicing in Data Warehousing”, I.S. Management, 2001
Current Practice of DW* • Sponsorship for the DW project Sponsor Percentage VP of a business unit 39.8 CIO 26.9 Business unit manager 16.7 CEO 11.1 Other 25.0 * Source: H.J. Watson, “ Current Practicing in Data Warehousing”, I.S. Management, 2001
Current Practice of DW* • DW Benefits • Less effort to produce better information • Better decisions • Improvement of business processes • Support for accomplishments of strategic business objectives • Return on Investments and Cost of Ownership? * Source: H.J. Watson, “ Current Practicing in Data Warehousing”, I.S. Management, 2001