1 / 24

Data Warehouse & Mining

Data Warehouse & Mining. Dr. Abdul Basit Siddiqui Assistant Professor FUIEMS (Lecture Slides Week # 2). Why a Data Warehouse (DWH)?. Data recording and storage is growing: Almost every industry has huge amount of operational data.

cricket
Download Presentation

Data Warehouse & Mining

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehouse & Mining Dr. Abdul Basit Siddiqui Assistant Professor FUIEMS (Lecture Slides Week # 2)

  2. Why a Data Warehouse (DWH)? • Data recording and storage is growing: • Almost every industry has huge amount of operational data. • Careful use/analysis of historic information may result in excellent prediction for the future: • Knowledge worker wants to turn available data into useful information. • This information is used by them to support strategic decision making. • Gives total view of the organization: • It is a platform for consolidated historical data for analysis. • It stores data of good quality so that knowledge worker can make correct decisions. • Intelligent decision-support is required for decision-making. Data Warehouse & Mining- Spring 2012

  3. Why a Data Warehouse? (Contd.) • From business perspective: • It is latest marketing weapon. • Helps to keep customers by learning more about their needs. • Valuable tool in today’s competitive fast evolving world. Data Warehouse & Mining- Spring 2012

  4. Reason-I: Why a Data Warehouse (DWH)? • Data sets are growing: Data Warehouse & Mining- Spring 2012

  5. Reason-I: Why a Data Warehouse (DWH)? • Size of Data Sets are going up. • Cost of Data Storage is coming down. • The amount of data average business collects and stores is doubling every year. • Total hardware and software cost to store and manage 1 MB of data: • 1990: $ 15 • 2002: ¢ 15 (down 100 times) • 2010: < ¢ 1 (down 150 times) • A few examples: • Wall Mart: 24+ TB • Finance Telecom: 100+ TB • CERN: Upto 20 PB by 2006 • Stanford Linear Accelerator Center (SLAC): 500 TB • Telenor, Ufone, Mobilink, Warid, Zong ??? Data Warehouse & Mining- Spring 2012

  6. Caution! A Warehouse of Data is NOT a Data Warehouse. Data Warehouse & Mining- Spring 2012

  7. Caution! Size is NOT Everything. Data Warehouse & Mining- Spring 2012

  8. Reason-2: Why a Data Warehouse (DWH)? • Businesses demand Intelligence (BI). • Complex questions from integrated data. • “Intelligent Enterprise” DBMS Approach • List of all items that were sold last month? • List of all makeup items purchased by Sassi? • The total sales of the last month grouped by branch? • How many sales transactions occurred during the month of January? Intelligent Enterprise • Which items sell together? Which items to stock? • Where and how to place the items? What discounts to offer? • How best to target customers to increase sales at a branch? • Which customers are most likely to respond to my next promotional campaign, and why? Data Warehouse & Mining- Spring 2012

  9. Reason-3: Why a Data Warehouse (DWH)? • Businesses want much more … • What happened? • Why it happened? • What will happen? • What is happening? • What do you want to happen? Data Warehouse & Mining- Spring 2012

  10. What is a Data Warehouse? A complete repository of historical corporate data extracted from transaction systems that is available for ad-hoc access by knowledge workers. Data Warehouse & Mining- Spring 2012

  11. What is a Data Warehouse? • Transaction System: • Management Information System (MIS) • Could be typed sheets (NOT transaction system) • Ad-Hoc Access: • Does not have a certain access pattern • Queries not known in advance • Difficult to write SQL in advance • Knowledge Workers: • Typically NOT IT literate (Executives, Analysts, Managers) • NOT clerical workers • Decision makers Data Warehouse & Mining- Spring 2012

  12. What is a Data Warehouse? • Inmons’s Definition: • A Data Warehouse is: • Subject-oriented • Integrated • Time-variant • Nonvolatile • Collection of data in support of management’s decision making process. Data Warehouse & Mining- Spring 2012

  13. Another View of a DWH Subject Oriented Integrated Time Variant Non Volatile Data Warehouse & Mining- Spring 2012

  14. Subject-oriented • Data Warehouse is organized around subjects such as sales, product, customer. • It focuses on modeling and analysis of data for decision makers. • Excludes data not useful in decision support process. Data Warehouse & Mining- Spring 2012

  15. Integration • Data Warehouse is constructed by integrating multiple heterogeneous sources. • Data Preprocessing are applied to ensure consistency. RDBMS Data Warehouse Legacy System Flat File Data Processing Data Transformation Data Warehouse & Mining- Spring 2012

  16. Time-variant • Provides information from historical perspective e.g. past 5-10 years. • Every key structure contains either implicitly or explicitly an element of time. Data Warehouse & Mining- Spring 2012

  17. Nonvolatile • Data once recorded cannot be updated. • Data Warehouse requires two operations in data accessing • Initial loading of data • Access of data load access Data Warehouse & Mining- Spring 2012

  18. Summary:What is a Data Warehouse? • It is a blend of many technologies, the basic concept being: • Take all data from different operational systems • If necessary, add relevant data from industry • Transform all data and bring into a uniform format • Integrate all data as a single entity • Store data in a format supporting easy access for decision support • Create performance enhancing indices • Implement performance enhancement joins • Run ad-hoc queries with slow selectivity Data Warehouse & Mining- Spring 2012

  19. Benefits of Data Warehouse • High returns on investment. • Substantial competitive advantage. • Increased productivity of corporate decision-makers. • Fast reporting for decision making process. • Reduced reporting load on transactional systems. • Making institutional data more user-friendly and accessible for knowledge workers. • Integrated data from different source systems. • Enabled ‘point-in-time’ analysis and trending over time. • Helps in identifying and resolving data integrity issues, either in the warehouse itself or in the source systems that collect the data. Data Warehouse & Mining- Spring 2012

  20. Data Warehouse: How is it Different? 1. Decision making is Ad-Hoc Data Warehouse & Mining- Spring 2012

  21. Data Warehouse: How is it Different? 2. Different patterns of hardware utilization Bus Service vs. Train Data Warehouse & Mining- Spring 2012

  22. Data Warehouse: How is it Different? 3. Combines operational and historic data • Don’t do data entry into a DWH. OLTP or ERP are the source systems. • OLTP systems don’t keep history, cannot get balance statement more than a year old. • DWH keep historical data, even of bygone customers. Why? • In the context of bank, want to know why the customer left? • What are the events that led to his/her leaving? Why? • Customer retention Data Warehouse & Mining- Spring 2012

  23. Data Warehouse: How is it Different? How much history? • Depends on: • Industry • Cost of storing historical data • Economic value of historical data • Industry and history • Telecom calls are much much more as compared to bank transactions • 18 months • Retailers interested in analyzing yearly seasonal patterns • 65 weeks, why? • Insurance companies want to do actuary analysis, use the historical data in order to predict risk • 7 years Hence NOT a complete repository of data. Data Warehouse & Mining- Spring 2012

  24. Data Warehouse: How is it Different? How much history? Economic value of data vs. storage cost Data Warehouse a complete repository of data? Data Warehouse & Mining- Spring 2012

More Related