250 likes | 881 Views
Databases Management Systems. Data Warehousing Lectures based on material from Phil Trinder (HW) Monica Farrow G30 email : monica@macs.hw.ac.uk. Data Warehouse. A data warehouse is a specialised database to support strategic decision making Decision making Analyse the problem, e.g.
E N D
Databases Management Systems Data Warehousing Lectures based on material from Phil Trinder (HW) Monica Farrow G30 email : monica@macs.hw.ac.uk DBIS:Data Warehousing
DBIS:Data Warehousing Data Warehouse • A data warehouse is a specialised database to support strategic decision making • Decision making • Analyse the problem, e.g. • Why are my sales not meeting my targets? • What products are not meeting their targets? • What are the trends for the failing products? • Generate alternative solutions, evaluate them, and choose the best
Decision Support Systems • These are used by management to make strategic or policy decisions • They have existed for a long time • Characteristics • Aimed at loosely specified problems • Combine models and analytical approaches with data retrieval • Good usability for non-specialist use • Flexible: to support multiple decision-making approaches DBIS:Data Warehousing
A wine club example • 100,000 members, 2000 wines, 150 suppliers, 750,000 orders per year • Systems : storage technology • Member administration : indexed sequential files • Stock control: relational database • Order processing: relational database • Despatch: proprietary database DBIS:Data Warehousing
Wine Club Operational Schema Member places supplies Supplier Wine MemberOrder in On OrderItem Stock Is for DBIS:Data Warehousing
Wine Club Situation • Competitors have moved in. Is our market share falling? • What products are increasing/decreasing in popularity? • Which products are seasonal? • Which members place regular orders? • Are some products more popular in certain parts of the country? • Which members concentrate on particular products? DBIS:Data Warehousing
Strategic vs Operational Issues • Strategic: planning and policy making, long term and broad brush, higher levels of management, e.g. • When to launch a new product? • Should a supermarket open on a Sunday? • Operational: day-to-day running of business. Details and immediate, lower levels of management • Which items are out of stock? • What is the status of order 34522? DBIS:Data Warehousing
Motivation for data warehousing • Operational data is not suitable to guide strategic decisions • Need to examine trends • What is happening over time? • Solution: record sales daily and analyse them • This is the start of a warehouse • Operational data is usually archived periodically DBIS:Data Warehousing
Data warehouse characteristics • Subject-oriented e.g. sales • Non-volatile e.g. once data is stored, it isn’t changed • Integrated, data from multiple (operational) sources is accumulated in an integrated format • Time variant: data is recorded against time to allow trend analysis • Some data duplication • Less joins, more indexes • More data, derived data and attributes DBIS:Data Warehousing
Data Warehouse design: dimensional analysis • Used to identify the requirements of the warehouse • What are the aspects of the data that are strategically important e.g. • Member • Product • Time always • We don’t know in advance exactly what the queries will be DBIS:Data Warehousing
Smith Jones Bloggs MEMBER Q1 2008 Q4 2007 TIME Q3 2007 Macon Chablis Merlot Chardonnay PRODUCT 3D analysis DBIS:Data Warehousing
Star Schema SALES Centralfacttable Wine Area Time Member DBIS:Data Warehousing
Warehouse construction Source1 Extraction Presentation Source n Aggregate Navigators Integration DBMS DBIS:Data Warehousing
Extraction • Retrieve data from all data sources: files, databases etc • Must be add-on to existing operational system. For example, • Day-end extraction run • Trigger to record sale DBIS:Data Warehousing
Integration • Format Integration, similar to type mismatch • Examples: address or gender • ‘male’, ‘female’ • ‘M’, ‘F’ • 0 and 1 • Semantic integration: does a datum mean the same in the data being integrated? • Example – a sale • order processing: order received • stock control: extracted items from physical warehouse • despatch: goods shipped DBIS:Data Warehousing
Star Schema Wine winecode, winename, vintage, description, price Area areacode, description SALES Centralfacttable winecode, membercode, areacode, quantity, itemcost Time timecode, date, periodno, quarterno, year Member membercode, membername, memberAddress DBIS:Data Warehousing
Warehouse Database • Centre of star schema becomes a relation: the fact table – numeric facts and foreign keys • Sales(membercode, winecode, timecode, areacode, qty, itemcost) • Each dimension becomes a relation: a dimension table • Member(membercode, membername, memberaddress) • Wine(winecode, name, vintage, description, price) • Area(areacode, description) • There is ALWAYS a time dimension table • Time(timecode, date, periodno, quarterno, year) DBIS:Data Warehousing
Using the Warehouse • The strategic questions can now be investigated using data extracted by SQL queries • For example, to discover which wines have increasing and decreasing sales, we can retrieve a table giving the total sales for each wine against time: • SELECT w.winename, t.period_number,SUM(s.qty)FROM sales s, wine w, time tWHERE s.winecode = w.winecode AND s.timecode = t.timecodeGROUP BY w.winename, t.periodnoORDER BY w.winename, t.periodno DBIS:Data Warehousing
Using the Warehouse • The resulting table might be visualised as a graph, with a different coloured line for each product • Usually involves a lot of indexes • Bitmap indexes are widely used • May use materialised views • Commonly used sums, counts, joins DBIS:Data Warehousing
Aggregation • Data warehouses grow continuously, and may become very large indeed • Problems: the time to compute a query and the size of the result can be very large indeed • Solution: aggregation DBIS:Data Warehousing
Aggregates • Basic idea: sacrifice detail to reduce the size of the data • Store precomputed tables at a useful level of detail. Similar to stored joins. • Must be carefully selected, e.g. • Sales to each member of each wine summer for each quarter • Sales of each wine summed for each quarter for each month • Sales of each wine summed for each area for each quarter • Levels of aggregation • None(i.e. detail) • Light (e.g. monthly) • Highly (e.g. quarterly) DBIS:Data Warehousing
Aggregate navigator • An aggregate navigator uses information about available aggregates to automatically rewrite queries to use them • It also records aggregates usage, so that unnecessary aggregates can be removed • It can suggest useful new aggregates DBIS:Data Warehousing
Presentation requirements • Must be easy to use • Visualise the results of queries in many ways e.g. charts, graphs, scatter diagrams etc • Make good use of colour and dimensions 2D, 2.5D, 3D, animation Example of 2.5D graph • Have analysis tools: statistical and curve fitting • For example the product sales trend table would be plotted as a graph DBIS:Data Warehousing
Discussion/Exercise • A company sells garden trees from several store locations around the country. People visit the store, and buy trees. The names of the customers are always recorded, and many customers place repeat orders. • The company would like to set up a data warehouse so that they can analyse details such as • Frequency of sales per customer • Which location has the best sales, ranked by season • Top selling tree by month • Etc etc • Create a suitable star schema, inventing appropriate attributes DBIS:Data Warehousing
Answer This is a general answer – replace product by trees etc Anything sensible along these lines is ok DBIS:Data Warehousing