420 likes | 772 Views
TEACHING THE DATA WAREHOUSE COURSE . Paul Gray ISECON 2001. Outline. Overview of what is Data Warehousing The 5 and 10 week courses The Indiana University of Pennsylvania course. ORIGINS. Data warehouses are the results of two software solutions needing and finding one another:
E N D
TEACHING THE DATA WAREHOUSE COURSE Paul Gray ISECON 2001
Outline • Overview of what is Data Warehousing • The 5 and 10 week courses • The Indiana University of Pennsylvania course
ORIGINS • Data warehouses are the results of two software solutions needing and finding one another: • Data base firms developed data warehouses and were looking for applications • EIS and DSS software developers and vendors needed to deal with ever-increasing data bases • About 10 years ago, the two groups started interacting with the results described here.
ORIGINS • Database developers long understood that their software was required for both transactional and analytic processing • However, their principal developments were directed to ever-larger transactional data bases. This process occurred even through operational and analytic data are separate with different requirements and different user communities.
ORIGINS • Once these differences were understood, new data bases were created specifically for analysis use. • Today, data warehouses have 3 major applications • On-line analytic processing for business intelligence • Data Mining • Customer Relationship Management
WHAT IS DATA WAREHOUSING • A data warehouse is typically a dedicated data base system for decision making that is separate from the production data base(s) used operationally. It differs from production system in that: • it covers a much longer time horizon than transaction systems • it includes multiple data bases that have been processed so that the warehouse’s data are defined uniformly (i.e., ‘clean’ data) • it is optimized for answering complex queries from managers and analysts.
WHAT IS DATA WAREHOUSING? • In the last 5 years, data warehousing has become a major industry within computing which has brought together the ideas of data bases and decision support. It has also been the foundation for efforts in data mining and in CRM • Data mining refers to finding answers about an organization from the information in the data warehouse that the executive or the analyst had not thought to ask. Data mining is made possible by the very presence of large databases in the data warehouse. It provides techniques that allow managers to obtain managerial information from their legacy systems. Its objective is to identify valid, novel, potentially useful, and understandable patterns in data.
WHAT IS DATA WAREHOUSING? • The objective of a data warehouse is to create a “single truth” • Data warehousing is a major new application area. It rates extremely high salaries (up to $100,000 for specialists, $300,000 for consultants).
DEFINITION • A data warehouse is a: • Subject oriented • Integrated • Time-variant • Non-volatile Collection of data in support of management decision processes
NOTE: • Data warehouse is physically separated from operational systems and operational data bases • Data warehouses hold both aggregated and detailed data for management separate from the databases used for On-Line Transaction Processing (OLTP)
SUBJECT ORIENTATION • Data is organized around major subjects of the enterprise • Example: • OPERATIONAL DATA WAREHOUSE • Loans Customer • Savings Vendor • Bank card Product • Trust Activity • An application A subject • orientation orientation
USING THE WAREHOUSE • The higher the level of summarization, the more the data is used • The more summarized the data, the quicker it is to retrieve • However, the higher the level of summarization, the lower the level of detail
DATA MARTS • Data Mart: A scaled-down version of the data warehouse • A data mart is a small warehouse designed for the SBU or department level. • It is often a way to gain entry and provide an opportunity to learn • Major problem: if they differ from department to department, they can be difficult to integrate enterprise-wide
COST • Data warehouses are not cheap • Median cost to create (does not include operating cost) = $2.2M • Multimillion dollar costs are common • Their design and implementation is still an art and they require considerable time to create
SIZE • Being designed for the enterprise so that everyone has a common data set, they are large and increase in size with time. • Typical storage sizes run from 50 Gigabytes to several Terabytes
SIZE OF INDUSTRY • Data warehouses are a major industry within information systems. 6B$/year • Estimates vary but it is clear that many more than 90% of Fortune 1000 have data warehouse projects • Major players include: • Oracle IBM+Informix Sybase NCR • + BI companies (Brio, Cognos, Pilot,….)
MARKET COMPONENTS • High end business intelligence (OLAP) • Low end query tools • Data cleansing • Data marts • Data mining • Customer relationship management (CRM)
APPLICATION – OLAP/BI • OLAP = On Line analytic processing • Basic idea of OLAP: managers should be able to manipulate enterprise data models across many dimensions to understand changes that are occurring • Vendors claim they are OLAP compliant even if they are not
APPLICATION - DATA MINING • Also known as Knowledge Data Discovery (KDD) • Mining terminology refers to finding answers about a business from the data warehouse that the executive or analyst had not thought to ask
APPLICATION - DATA MINING • KDD applies techniques mostly from artificial intelligence and statistics to discover new information. • It is designed to find information that queries and reports don’t reveal effectively • KDD uses AI and statistics to find pattern in data and to infer rules.
APPLICATION - DATA MINING • Some successes: • People who buy scuba gear take Australian vacations • Fraud detection, consumer loan analysis • Optimizing production lines • IBM’s SCOUT • (apocryphal?) Men who buy diapers buy beer
CUSTOMER RELATIONSHIP MANAGEMENT (CRM) • Successor to data base marketing • Implies marketing to customers on a 1:1 basis • Requires data granularity at the level of the individual customer large amounts of data • Data warehouse is only a part of the CRM concept.
History of Course I • First given as PhD Seminar on Management of Information Systems 1996 • Overview lecture by instructor • Students present topics each week – typically 2 or 3 • Great source of material!
History of Course II • Attend Data Warehouse Institute conference in San Diego. Agree to write book with Watson. January 1997 • Complete book in late 1997. Desk-top published. Used sabbatical at UCI to do the writing. • Move course to regular MS course in Spring 1997.
Two Versions: • 14 week for Claremont Graduate University • 5 Week for UC Irvine • Taught 1/year for last 5 years • WHY DO STUDENTS TAKE COURSE? • High salaries (100K specialist, $300K for consultant)
Both Versions • Text: Decision Support in the Data Warehouse by P. Gray and H.J. Watson Prentice Hall 1998 • PLUS readings • PLUS hands-on exercises • PLUS term paper
5 WEEK VERSION Wk Part 1 Part 2 • Overview of DW OLAP, applications • Demo Startracker Strategic use, software Framework • Data sources, Planning and cleansing,metadata operating the DW • Data marts,ODS Industry, economics • Mining, BI CRM, Developments
Using Software—Star Tracker • Simple DW • The Data Warehouse Toolkit : Practical Techniques for Building Dimensional Data Warehouses by Ralph Kimball (Wiley 1996) • Commercial version “Synchrony” from If… no longer available
Using Software – Commercial • Red Brick (now part of Informix which is now owned by IBM) • Gift from Red Brick • Mostly used with student projects
Data Sources • Journal of Data Warehousing • DM Review • On-line (dmreview.com) • Hard copy • Lots and lots on Internet e.g., • www.Dw-institute.com • Pwp.starnetinc.com/larryg • www.datawarehouse.org
Assignments • Find 10 articles on Internet • teaches search techniques, nomenclature, rapid course immersion • Exercise with software • The DW industry
Term Projects for DW • Next chart shows 22 of the 30 suggested topics. Last topic allows student to choose topic subject to instructor approval.
DW Architecture Data sources, loading, cleaning, summarizing, granularity Metadata in a DW Data Modeling and the DW Multiple Dimensions in DWs via ROLAP and MOLAP Indexing in DWs DW Interfaces –conventional, window, browser Design principles for DWs Strategic use of the DW Justification of the DW including C/B analysis DW Security Data Marts Operational Data Stores Maintaining the DW Economics of DWs Querying and reporting in a DW OLAP and the DW Decision Support Systems and the DW Database marketing and the DW Data Mining and the DW Personnel considerations for building the DW and for maintaining the DW Organizing the DW Project (Project management etc.)
Elizabeth Pierce’s Course at Indiana University of Penn. • Paper published in Communications of AIS (CAIS) Sept. 1999 (Vol. 2 Article 16) • Developing and Delivering a Data Warehousing and Mining Course • “Introduces students to the strategies, technologies, and techniques”
Elizabeth Pierce’s Course at Indiana University of Penn. • “Students learn what is involved in planning, designing, building, using, and managing a data warehouse. Students also learn about how a data warehouse must fit into an over-all corporate data architecture that may include legacy systems, operational data stores, enterprise data warehouses, and data marts. In addition, students are exposed to the different data mining techniques used by organizations to derive information from the data warehouse for strategic and long-term business decision making.” (abstract)
U. Of Indiana Course • 14 weeks –7 weeks on DW • Challenges: • Evolving nature of the subject • Hands on experience for students • Textbook selection—mostly tradebooks
Useful Tables in Pierce • List of Internet sources • Available software demos • Some on-line data warehouses with public access • Schools that offered data mining courses in 1999
Conclusions • Students come because it is a job source • A good 2nd course for data base (can be taken w/o database course) • Allows covering the new topics of the 90’s and 00’s—OLAP,CRM,Mining, in more depth than Intro to IS course. • It’s a fun topic and you can even do research on it.