250 likes | 258 Views
This guide provides an overview of data warehousing, including its definition, characteristics, motivations, architecture, and design approaches.
E N D
Data Warehouse IMS5024 – presented by Eder Tsang
Data Warehouse • A data warehouse is a system consisting of processes and databases used to provide the “data infrastructure” for EIS and DSS • “… a subject-oriented, integrated, timevariant, and non-volatile collection of data in support of management’s decisions” Inmon and Hackathorn (1994)
Data warehouse - subject oriented • The data warehouse is organised by “data subjects” that are relevant to the organisation. – Customer, claim, shipment, product • This may be contrasted with the process orientation of many OLTP systems
Data warehouse - integrated • Data in the warehouse is structured based on a corporate-wide model, spanning the functional boundaries of legacy systems • This includes naming standards, units of measurement and periodicity
Data warehouse - time variant • Data is the data warehouse is characterised by the time-series nature of historical data • The data consists of a series of “snapshots” which are time-stamped and record values at a moment in time • This supports trend analysis of the data
Data warehouse - non volatile • The data warehouse is not continuously updated (inserts, eletes and changes) like data in an OLTP system • Data in a data warehouse is periodically up-loaded at a scheduled time intervals (say daily)
Motivations for data warehousing • Demands on OLTP data bases for query processing would be too great • Data warehousing is designed for efficient retrieval • Data in legacy systems is frequently inconsistent, of poor quality and stored in different formats • Reduce costs in providing data for decision making
Motivations for data warehousing • Support for focus on complete business processes (BPR) • Support for new initiatives – CRM, Balanced Scorecard • Industry sources quote ROI’s averaging 401% over 3 years • Remain competitive
Data warehouse development • Requirements identification • Logical design, data modelling • Data extract, transform and load (ETL) • Warehouse architecture , technology and tools • Physical database design • Delivery systems • Operational policies
Designing a data warehouse – data design There are two main approaches to data modelling or data warehouse design – entity relationship modelling and normalisation – dimensional modelling
The design of databases using a traditional E-R approach • Entities and relationships • Normalisation 3NF
Why do we normalise data? • Normalisation is a process for converting complex data structures into simple, stable data structures • Normalisation protects integrity of database by avoiding anomalies (update, delete, create) • Normalised data models are: • robust and stable • have minimum redundancy
Dimensional Modeling (star schema) Components of dimensional model: – Fact Tables : contain measurements of business eg. Sales, purchase order, shipment – Dimension Tables : store the descriptions of the dimensions of the business eg. Product, customer, vendor, store
Dimensional Modeling (star schema) • Each dimension table has a single primary key that corresponds exactly to one of the components of the multipart key in the fact table. • A fact table always expresses a many to many relationship (the key is composed of foreign keys • The most useful facts in a fact table are numeric and additive ( typically values are added up)
Snowflake schema • Snowflake schema –all the tables are normalised • Star schemas are preferable to snowflake – fewer joins for information retrieval
Dimensional Modelling vs E-R modelling • the purpose of dimensional modelling structure data for easy and efficient analysis • E-R modelling creates a single required to support organisation’s Whereas • DM creates individual models for business/decision interest eg. • model for sales info • model for Inventory info
Dimensional Modelling vs E-R modelling (Con’t) • OLTP and DW have different purpose: – operational vs informational • Normalisation protects integrity of database by avoiding anomalies (update, delete, create) • Data models for data warehouse do not have to be normalised – In contrast, data in DW does not change often – periodic additions of new data
DM vs. E-R modeling debate (Kimball’s view) • OLTP systems are volatile – high rates of update transactions • In normalised models the goal is to reduce data redundancy and prevent update anomalies • Data in a data warehouse does not need to be normalised because it is periodically refreshed not updated by user transactions