1 / 30

Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics

Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics. Importance of data. What organizations do with data? Transaction Processing (E)commerce: Amazon.com; PNC Bank B2B systems: Supply Chain Management Web Search Decision Making Financial reporting Inventory management

raja-mcgee
Download Presentation

Introduction to Data Warehouse and Data Mining MIS 2502 Data Analytics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Introduction to Data Warehouse and Data MiningMIS 2502Data Analytics

  2. Importance of data What organizations do with data? • Transaction Processing • (E)commerce: Amazon.com; PNC Bank • B2B systems: Supply Chain Management • Web Search • Decision Making • Financial reporting • Inventory management • Budget allocation • Customer Relationship Management • Target Marketing • Product Design and Promotions • Fraud Detection

  3. Unnormalized data set

  4. Normalized db - before

  5. Normalized db - after

  6. Decision Making with Databases • Databases are used for transaction processing • Data from transaction processing is used for tactical decision making • Database provides basic reporting function • But…

  7. The Need for Data Analysis • Different managers require different data and data may come from other part of the organization or outside the organization • External and internal forces require tactical and strategic decisions • Search for competitive advantage • Business environments are dynamic • Decision-making cycle time is reduced

  8. Some Questions Analysts Need to Answers • Sales analysis: • What are the sales by quarter and geography? • How do sales compare in two different stores in the same state? • Profitability analysis: • Which is the most profitable store in the state CA? • Which product lines are the highest revenue producers this year? • Which products and product lines are the most profitable this quarter? • Sale force analysis • Which salesperson is the best revenue producer this year?Do salesperson X meet his sale target this quarter?

  9. From transaction processing to supporting decision making

  10. Operational vs. Decision Support Data • Operational data • Relational, normalized database • Optimized to support transactions • Real time updates • DSS • Snapshot of operational data • Summarized • Large amounts of data • Data analyst viewpoint • Timespan • Granularity • Dimensionality

  11. Data Warehouse • Integrated • Centralized • Holds data retrieved from entire organization • Subject-Oriented • Optimized to give answers to diverse questions • Used by all functional areas • Time Variant • Flow of data through time • Projected data • Non-Volatile • Data never removed • Always growing

  12. Data Warehouse • Extraction, transformation, and loading (ETL) – a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse • Data mart – contains a subset of data warehouse information

  13. ETL – Extraction, Transformation, Load Extract data from source systems • Transform: cleanse data for consistency and output exceptions • Apply business rules • Selecting certain columns to load (not null records) • Translating coded values (1, M, male = 0 ) • Derive new calculated value (sale_amount = qty * unit_price) • Join data from multiple sources (lookup, merge) • Aggregate (rollup/summarize data) • Transpose/pivot (turning columns into rows) • Data validation. Load: data into repository

  14. Data in a Data Warehouse • Data for a data warehouse is obtained from a variety of databases • E.g. customer database, transaction database, accounts database • Data in data warehouse is multidimensional

  15. Multidimensional Analysis • Cube – common term for the representation of multidimensional information

  16. Star Schema • Data-modeling technique • Maps multidimensional decision support into relational database • Yield model for multidimensional data analysis while preserving relational structure of operational DB • Four Components: • Facts • Dimensions • Attributes • Attribute hierarchies

  17. Simple Star Schema Figure 13.12

  18. Slice and Dice View of Sales Figure 13.14

  19. Star Schema Representation • Facts and dimensions represented by physical tables in data warehouse DB • Fact table related to each dimension table (M:1) • Fact and dimension tables related by foreign keys • Subject to the primary/foreign key constraints

  20. Star Schema for Sales Sales fact table and its four dimensions: location, time, product, and customer . Allows sales to be aggregated by time, geographic location, product, and by customer.

  21. Data Warehouse to Data Marts • Given the large size of a data warehouse, organizations create data marts • Subject oriented data • Subset of data in a data warehouse • Used for focused decision-making

  22. Online Analytical Processing (OLAP) • Advanced data analysis environment • Supports decision making, business modeling, and operations research activities • Characteristics of OLAP • Use multidimensional data analysis techniques • Provide advanced database support • Provide easy-to-use end-user interfaces • Support client/server architecture

  23. OLAP Client/Server Architecture Figure 13.6

  24. Data Mining • Seeks to discover patterns or relationships within the data • Data mining tools automatically search data for patterns and relationships • Data mining tools • Analyze data • Uncover problems or opportunities • Form computer models based on findings • Predict business behavior with models • Require minimal end-user intervention

  25. What Are Data-Mining Tools?

  26. Data Mining Process

  27. Business Intelligence AB113 - Information Technology

  28. MS SQL 2008 Architecture Relational Model Dimensional Model/Star Schema

  29. Back Room—Data prepared from many sources Front Room—Information presented

  30. Multidimensional Analysis and Data Mining • Differences between databases and data warehouse/data mart? • Data mining – the process of analyzing data to extract information not offered by the raw data alone • To perform data mining users need data-mining tools • Data-mining tool – uses a variety of techniques to find patterns and relationships in large volumes of information and infers rules that predict future behavior and guide decision making • Business intelligence – taking data from multiple sources and turn it into useful and easy to understand information to support decision-making efforts for various kinds of people.

More Related