240 likes | 494 Views
Data Warehouse Prerequisites. Familiarity with Microsoft SQL Server System Administration for Microsoft SQL Server 7.0 and Implementing a Database on Microsoft SQL Server 7.0 Knowledge of Transact-SQL Usage in Developing OLTP Systems
E N D
Data Warehouse Prerequisites • Familiarity with Microsoft SQL Server • System Administration for Microsoft SQL Server 7.0 and Implementing a Database on Microsoft SQL Server 7.0 • Knowledge of Transact-SQL Usage in Developing OLTP Systems • Basic Understanding of Programming Principles and Experience with a Scripting Language • Understanding of Basic Database Design, Administration, and Implementation Concepts
What is Data Warehousing? Peter Rawsthorne Special Thanks to Bill Inmon, the “grandfather” of data warehousing.
OLTP vs. DSS • Online Transaction Processing (OLTP) • Decision Support System (DSS) • OLTP • ATM, Bank Teller, Ticket Master, POS… • DSS • Marketing, What if?, Inventory, Health (Walmart)… • Click-through analysis
In-class Exercise • Break into teams of three • Think of a business or subject area • Determine three OLTP systems required to support business or subject • Determine two DSS systems required to support business or subject • Provide one example of how the DSS system could be used to predict the future
Exercise Example • Business: Yacht Club • OLTP • Membership System • Accounting System • Yacht Racing Results System • DSS • Quarterly and Yearly Expenses • Race Results • How much beer will we need for next years regatta for the male non-members who are crew on yachts over 40 feet?
What then is a data warehouse? A data warehouse is a: • subject oriented, • integrated, • time variant, • non volatile collection of data in support of management's decision making process.
Subject Orientation • Data is organized via subject rather than process or business function. • The application world is concerned both with data base design and process design. • The data warehouse world focuses on data modeling and database design exclusively.
Integration • Easily the most important aspect of the data warehouse environment is that data found within the data warehouse is integrated. ALWAYS. WITH NO EXCEPTIONS. • consistent naming conventions, • consistent measurement of variables, • consistent encoding structures, • consistent physical attributes of data, • and so forth.
OPERATIONAL Current valued data Time horizon: 60 – 90 days Key fields may or may not have an element of time Data can be updated DATA WAREHOUSE Snapshot data Time horizon: 5 – 10 days Keys do not have an element of time Once snapshot is made, records cannot be updated Time Variant
non volatile • Inserts, deletes, and changes - are done regularly to the operational environment on a record by record basis. • There are only two kinds of operations that occur in the data warehouse - the initial loading of data, and the access of data.
The structure of the warehouse The different components of the data warehouse are: • metadata, • current detail data, • old detail data, • lightly summarized data, and • highly summarized data.
Current Detail Data • Most recent happenings • Voluminous • Lowest level of granularity • Almost always stored on disk storage • Fast to access • Expensive and complex to manage
Older detail data • Stored on some form of mass storage • Infrequently accessed • Stored at a level of detail consistent with current detailed data • Often stored on an alternate storage medium • Anticipated large volume
Lightly summarized data • Distilled from the low level of detail • Almost always stored on disk • Design issues facing the data architect are; • what unit of time • what contents – attributes • Frequently mined data, a lot of “what if?”
Highly summarized data • Compact and easily accessible • Sometimes found in the data warehouse • Sometimes found outside the data warehouse • In any case, the highly summarized data is part of the data warehouse • Yearly or multi year summaries
Metadata • Sits in a different dimension • Contains no data directly taken from the operational environment • Special and very important role • Metadata is used as: • a directory to locate the contents • a guide to the mapping of data • a guide to the algorithms used for summarization
Flow of Data • data enters from the operational environment, it is transformed • data goes into the current detail level of detail • It resides there and is used there until one of three events occurs: • it is purged, • it is summarized, and/or • it is archived.
Summary • A data warehouse is a subject oriented, integrated, time variant, non volatile collection of data in support of management's decision needs. • Four levels of data warehouse data: old detail, current detail, lightly summarized data, and highly summarized data. • Metadata is a very important part
Lab deliverables • W2KS Install • SQL7.0 Install • SQL7.0 OLAP Services Install • MSPress install • Complete MSPress Chapter 1
Contact Information Peter Rawsthorne, B.Tech, MCSD, MCT, CCR President, Eclectic Endeavours Inc. 559A Artisan Lane PO Box 281 Bowen Island, BC CANADA V0N 1G0 Phone: 604-947-2760 Fax: 604-947-2715 email: peterr@endeavours.com web: http://www.endeavours.com