240 likes | 469 Views
Data Warehouse & Mining. Dr. Abdul Basit Siddiqui Assistant Professor FUIEMS (Lecture Slides Week # 2). Why a Data Warehouse (DWH)?. Data recording and storage is growing: Almost every industry has huge amount of operational data.
E N D
Data Warehouse & Mining Dr. Abdul Basit Siddiqui Assistant Professor FUIEMS (Lecture Slides Week # 2)
Why a Data Warehouse (DWH)? • Data recording and storage is growing: • Almost every industry has huge amount of operational data. • Careful use/analysis of historic information may result in excellent prediction for the future: • Knowledge worker wants to turn available data into useful information. • This information is used by them to support strategic decision making. • Gives total view of the organization: • It is a platform for consolidated historical data for analysis. • It stores data of good quality so that knowledge worker can make correct decisions. • Intelligent decision-support is required for decision-making. Data Warehouse & Mining- Spring 2012
Why a Data Warehouse? (Contd.) • From business perspective: • It is latest marketing weapon. • Helps to keep customers by learning more about their needs. • Valuable tool in today’s competitive fast evolving world. Data Warehouse & Mining- Spring 2012
Reason-I: Why a Data Warehouse (DWH)? • Data sets are growing: Data Warehouse & Mining- Spring 2012
Reason-I: Why a Data Warehouse (DWH)? • Size of Data Sets are going up. • Cost of Data Storage is coming down. • The amount of data average business collects and stores is doubling every year. • Total hardware and software cost to store and manage 1 MB of data: • 1990: $ 15 • 2002: ¢ 15 (down 100 times) • 2010: < ¢ 1 (down 150 times) • A few examples: • Wall Mart: 24+ TB • Finance Telecom: 100+ TB • CERN: Upto 20 PB by 2006 • Stanford Linear Accelerator Center (SLAC): 500 TB • Telenor, Ufone, Mobilink, Warid, Zong ??? Data Warehouse & Mining- Spring 2012
Caution! A Warehouse of Data is NOT a Data Warehouse. Data Warehouse & Mining- Spring 2012
Caution! Size is NOT Everything. Data Warehouse & Mining- Spring 2012
Reason-2: Why a Data Warehouse (DWH)? • Businesses demand Intelligence (BI). • Complex questions from integrated data. • “Intelligent Enterprise” DBMS Approach • List of all items that were sold last month? • List of all makeup items purchased by Sassi? • The total sales of the last month grouped by branch? • How many sales transactions occurred during the month of January? Intelligent Enterprise • Which items sell together? Which items to stock? • Where and how to place the items? What discounts to offer? • How best to target customers to increase sales at a branch? • Which customers are most likely to respond to my next promotional campaign, and why? Data Warehouse & Mining- Spring 2012
Reason-3: Why a Data Warehouse (DWH)? • Businesses want much more … • What happened? • Why it happened? • What will happen? • What is happening? • What do you want to happen? Data Warehouse & Mining- Spring 2012
What is a Data Warehouse? A complete repository of historical corporate data extracted from transaction systems that is available for ad-hoc access by knowledge workers. Data Warehouse & Mining- Spring 2012
What is a Data Warehouse? • Transaction System: • Management Information System (MIS) • Could be typed sheets (NOT transaction system) • Ad-Hoc Access: • Does not have a certain access pattern • Queries not known in advance • Difficult to write SQL in advance • Knowledge Workers: • Typically NOT IT literate (Executives, Analysts, Managers) • NOT clerical workers • Decision makers Data Warehouse & Mining- Spring 2012
What is a Data Warehouse? • Inmons’s Definition: • A Data Warehouse is: • Subject-oriented • Integrated • Time-variant • Nonvolatile • Collection of data in support of management’s decision making process. Data Warehouse & Mining- Spring 2012
Another View of a DWH Subject Oriented Integrated Time Variant Non Volatile Data Warehouse & Mining- Spring 2012
Subject-oriented • Data Warehouse is organized around subjects such as sales, product, customer. • It focuses on modeling and analysis of data for decision makers. • Excludes data not useful in decision support process. Data Warehouse & Mining- Spring 2012
Integration • Data Warehouse is constructed by integrating multiple heterogeneous sources. • Data Preprocessing are applied to ensure consistency. RDBMS Data Warehouse Legacy System Flat File Data Processing Data Transformation Data Warehouse & Mining- Spring 2012
Time-variant • Provides information from historical perspective e.g. past 5-10 years. • Every key structure contains either implicitly or explicitly an element of time. Data Warehouse & Mining- Spring 2012
Nonvolatile • Data once recorded cannot be updated. • Data Warehouse requires two operations in data accessing • Initial loading of data • Access of data load access Data Warehouse & Mining- Spring 2012
Summary:What is a Data Warehouse? • It is a blend of many technologies, the basic concept being: • Take all data from different operational systems • If necessary, add relevant data from industry • Transform all data and bring into a uniform format • Integrate all data as a single entity • Store data in a format supporting easy access for decision support • Create performance enhancing indices • Implement performance enhancement joins • Run ad-hoc queries with slow selectivity Data Warehouse & Mining- Spring 2012
Benefits of Data Warehouse • High returns on investment. • Substantial competitive advantage. • Increased productivity of corporate decision-makers. • Fast reporting for decision making process. • Reduced reporting load on transactional systems. • Making institutional data more user-friendly and accessible for knowledge workers. • Integrated data from different source systems. • Enabled ‘point-in-time’ analysis and trending over time. • Helps in identifying and resolving data integrity issues, either in the warehouse itself or in the source systems that collect the data. Data Warehouse & Mining- Spring 2012
Data Warehouse: How is it Different? 1. Decision making is Ad-Hoc Data Warehouse & Mining- Spring 2012
Data Warehouse: How is it Different? 2. Different patterns of hardware utilization Bus Service vs. Train Data Warehouse & Mining- Spring 2012
Data Warehouse: How is it Different? 3. Combines operational and historic data • Don’t do data entry into a DWH. OLTP or ERP are the source systems. • OLTP systems don’t keep history, cannot get balance statement more than a year old. • DWH keep historical data, even of bygone customers. Why? • In the context of bank, want to know why the customer left? • What are the events that led to his/her leaving? Why? • Customer retention Data Warehouse & Mining- Spring 2012
Data Warehouse: How is it Different? How much history? • Depends on: • Industry • Cost of storing historical data • Economic value of historical data • Industry and history • Telecom calls are much much more as compared to bank transactions • 18 months • Retailers interested in analyzing yearly seasonal patterns • 65 weeks, why? • Insurance companies want to do actuary analysis, use the historical data in order to predict risk • 7 years Hence NOT a complete repository of data. Data Warehouse & Mining- Spring 2012
Data Warehouse: How is it Different? How much history? Economic value of data vs. storage cost Data Warehouse a complete repository of data? Data Warehouse & Mining- Spring 2012