1 / 49

Data Warehouse and Business Intelligence Dr. Minder Chen Minder.Chen@CSUCI.EDU Fall 2009

Data Warehouse and Business Intelligence Dr. Minder Chen Minder.Chen@CSUCI.EDU Fall 2009. Online Resources. Additional resources: Teradata Student Network. The Premier Learning Resource for Data Warehousing, DSS/BI, and Database.  The URL is http://www.teradatastudentnetwork.com

quinto
Download Presentation

Data Warehouse and Business Intelligence Dr. Minder Chen Minder.Chen@CSUCI.EDU Fall 2009

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehouse and Business Intelligence Dr. Minder Chen Minder.Chen@CSUCI.EDU Fall 2009

  2. Online Resources • Additional resources: • Teradata Student Network. • The Premier Learning Resource for Data Warehousing, DSS/BI, and Database.  The URL is http://www.teradatastudentnetwork.com • PSW: smartdecisions

  3. BI “The key in business is to know something that nobody else knows.” -- Aristotle Onassis PHOTO: HULTON-DEUTSCH COLL Business Intelligence (BI) is the process of gathering meaningful information to answer questions and identify significant trends or patterns, giving key stakeholders the ability to make better business decisions. “To understand is to perceive patterns.” — Sir Isaiah Berlin "The manager asks how and when, the leader asks what and why." — “On Becoming a Leader” by Warren Bennis

  4. BI Questions • What happened? • What were our total sales this month? • What’s happening? • Are our sales going up or down, trend analysis • Why? • Why have sales gone down? • What will happen? • Forecasting & “What If” Analysis • What do I want to happen? • Planning & Targets Source: Bill Baker, Microsoft

  5. Business Intelligence Increasing potential to support business decisions (MIS) End User Making Decisions Business Analyst Data Presentation Visualization Techniques Data Analyst Data Mining Information Discovery Data Exploration OLAP, MDA, Statistical Analysis, Querying and Reporting Data Warehouses / Data Marts DBA Data Sources (Paper, Files, Information Providers, Database Systems, OLTP)

  6. Where is Business Intelligence applied? ERP Reporting KPI Tracking Product Profitability Risk Management Balanced Scorecard Activity Based Costing Global Sourcing Logistics Sales Analysis Sales Forecasting Segmentation Cross-selling CRM Analytics Campaign Planning Customer Profitability Operational Efficiency Customer Interaction

  7. Inmon's Definition of Data Warehouse – Data View • A warehouse is a • subject-oriented, • integrated, • time-variant and • non-volatile collection of data in support of management's decision making process. • Bill Inmon in 1990 Source: http://www.intranetjournal.com/features/datawarehousing.html

  8. Inmon's Definition Explain • Subject-oriented: They are organized around major subjects such as customer, supplier, product, and sales. Data warehouses focus on modeling and analysis to support planning and management decisions vs. operations and transaction processing. • Integrated: Data warehouses involve an integration of sources such as relational databases, flat files, and on-line transaction records. Processes such as data cleansing and data scrubbing achieve data consistency in naming conventions, encoding structures, and attribute measures. • Time-variant: Data contained in the warehouse provide information from an historical perspective. • Nonvolatile: Data contained in the warehouse are physically separate from data present in the operational environment.

  9. Kimball's Definition – Process View • A data warehouse is a system that extracts, cleans, conforms, and delivers source data into a dimensional data store and then supports and implements querying and analysis for the purpose of decision making. • Ralph Kimball

  10. The Data Warehouse Process Data Marts and cubes Source Systems Clients Data Warehouse Query Tools Reporting Analysis Data Mining 1 2 3 4 Design the Populate Create Query Data Warehouse Data Warehouse OLAP Cubes Data

  11. Key Concepts in BI Development Lifecycle

  12. Business Valuation Models for BI

  13. Performance Dashboards for Information Delivery

  14. Scorecards for Information Delivery

  15. OLTP Normalized Design Ware- house Ordering Process Chain Retailer Store Retailer Payments Retailer Returns Product POS Process Brand Account GL Retail Cust Retail Promo Clerk Cash Register

  16. OLTP Versus Business Intelligence: Who asks what? OLTP Questions When did that order ship? How many units are in inventory? Does this customer haveunpaid bills? Are any of customer X’s line items on backorder? Analysis Questions What factors affect order processing time? How did each product line (or product) contribute to profit last quarter? Which products have the lowest Gross Margin? What is the value of items on backorder, and is it trending up or downover time?

  17. OLTP vs. OLAP Source: http://www.rainmakerworks.com/pdfdocs/OLTP_vs_OLAP.pdf#search=%22OLTP%20vs.%20OLAP%22

  18. Dimensional Design Process • Select the business process to model • Declare the grain of the business process/data in the fact table • Choose the dimensions that apply to each fact table row • Identify the numeric facts that will populate each fact table row Business Requirements Data Realities

  19. Select a business process to model • Not business departments or business functions • Cross-functional business processes • Business events • Examples: • Raw materials purchasing • Order fulfillment process • Shipments • Invoicing • Inventory • General ledger

  20. Requirements

  21. Identifying Measures and Dimensions Performance Measures for KPI Performance Drivers Measures Dimensions • The attribute varies • continuously: • Balance • Unit Sold • Cost • Sales • The attribute is perceived as • a constant or discrete value: • Description • Location • Color • Size

  22. A Dimensional Model for a Grocery Store Sales

  23. Product Dimension • SKU: Stock Keeping Unit • Hierarchy: • Department  Category  Subcategory  Brand  Product

  24. Creating Dimensional Model • Identify fact tables • Translate business measures into fact tables • Analyze source system information for additional measures • Identify base and derived measures • Document additivity of measures • Identify dimension tables • Link fact tables to the dimension tables • Create views for users

  25. Transaction Level Order Item Fact Table

  26. Inside a Dimension Table • Dimension table key: Uniquely identify each row. Use surrogate key (integer). • Table is wide: A table may have many attributes (columns). • Textual attributes. Descriptive attributes in string format. No numerical values for calculation. • Attributes not directly related: E.g., product color and product package size. No transitive dependency. • Not normalized (star schemar). • Drilling down and rolling up along a dimension. • One or more hierarchy within a dimension. • Fewer number of records.

  27. Fact Tables Fact tables have the following characteristics: • Contain numeric measures (metric) of the business • May contain summarized (aggregated) data • May contain date-stamped data • Are typically additive • Have key value that is typically a concatenated key composed of the primary keys of the dimensions • Joined to dimension tables through foreign keys that reference primary keys in the dimension tables

  28. Facts Table Measurements of business events. Dimensions Measures The Fact Table contains keys and units of measure

  29. Snowflake Schema Brands Products Channels Dates Sales Customers Promotions

  30. Hierarchy

  31. OLAP Solutions • Data Warehouse/Data Mart • Dimensions • Measures • Cubes • Cells Europe Asia US Gadgets Gizmos Thingies Widgets 130 135 140 142 205 390 350 475 175 230 190 250 310 340 410 450 Q1 Q2 Q3 Q4

  32. Operations in Multidimensional Data Model • Aggregation (roll-up) • dimension reduction: e.g., total sales by city • summarization over aggregate hierarchy: e.g., total sales by city and year  total sales by region and by year • Selection (slice) defines a subcube • e.g., sales where city = Palo Alto and date = 1/15/96 • Navigation to detailed data (drill-down) • e.g., (sales - expense) by city, top 3% of cities by average income • Visualization Operations (e.g., Pivot)

  33. A Visual Operation: Pivot (Rotate) NY LA SF Month Juice Cola Milk Cream 10 47 Region 30 12 Product 3/1 3/2 3/3 3/4 Date

  34. Date Dimension of the Retail Sales Model

  35. Store Dimension • It is not uncommon to represent multiple hierarchies in a dimension table. Ideally, the attribute names and values should be unique across the multiple hierarchies.

  36. Multidimensional Query Techniques Product Time Why? Slicing Geography What? Why? Dicing Why? Drilling down

  37. ETL ETL = Extract, Transform, Load • Moving data from production systems to DW • Checking data integrity • Assigning surrogate key values • Collecting data from disparate systems • Reorganizing data

  38. Pivot Table in Excel

  39. Data Quality Issues • No common time basis • Different calculation algorithms • Different levels of extraction • Different levels of granularity • Different data field names • Different data field meanings • Missing information • No data correction rules • No drill-down capability

  40. Building The Warehouse Transforming Data

  41. The Anomalies Nightmare NAME ADDRESS TYPE CUST # 90328574 187 N. PARK St. Salem NH 01458 OEM Digital Equipment 187 N. Pk. St. Salem NH 01458 OEM 90328575 DEC 187 N. Park St Salem NH 01458 $#% 90238475 Digital Digital Corp 187 N. Park Ave. Salem NH 01458 Comp 90233479 Digital Consulting 15 Main Street Andover MA 02341 Consult 90233489 Digital Info Service PO Box 9 Boston MA 02210 Mail List 90234889 Park Blvd. Boston MA 04106 SYS INT 90345672 Digital Integration Noise in Blank Fields Spelling No Unique Key Anomalies No Standardization How does one correctly identify and consolidate anomalies from millions of records?

  42. OLAP and Data Mining Address Different Types of Questions While reporting and OLAP are informative about past facts, only data mining can help you predict the future of your business. Source: http://www.dmreview.com/editorial/dmreview/print_action.cfm?articleId=2367

  43. Use of Data Mining • Customer profiling • Market segmentation • Buying pattern affinities • Database marketing • Credit scoring and risk analysis

  44. Associates Which items are purchased in a retail store at the same time?

  45. Sequential Patterns What is the likelihood that a customer will buy a product next month, if he buys a related item today?

  46. Classifications Determine customers’ buying patterns and then find other customers with similar attributes that may be targeted for a marketing campaign.

  47. Modeling Use factors, such as location, number of bedrooms, and square footage, to Determine the market value of a property

More Related