160 likes | 303 Views
Adv. DBMS Data Warehouse CSC5301. Review Hachim Haddouti. Do You Remember?. DSS. Data cube. MD. OLTP. RollUp. drill down. Slice/dice. MD. ROLAP. Star schema. MOLAP. Data mining. Data extraction. Fact table. Data Warehouses.
E N D
Adv. DBMSData WarehouseCSC5301 Review Hachim Haddouti
Do You Remember? DSS Data cube MD OLTP RollUp drill down Slice/dice MD ROLAP Star schema MOLAP Data mining Data extraction Fact table
Data Warehouses • “Subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process” Inmon (AP = analytical processing is missing) • Used for analysis of existing data • Resolves performance issues suffered by operational RDBMSs and OLTPs
Sizing DW? Mining of mobile phone calls: (Caller, Callee, Time, Duration, Geogr. Location) ~ 100 B/tuple In Germany 107 users * 10 calls/(day*user) * 100 B/call = = 1010 B/day ~ 3*1012 B/year = 3 TB/year Scanning data at 107 B/s takes 3*1012/107 = 3*105 s > 3 days
ER Model a disaster for querying a huge amount of data (time) not understandable for users and they can not be navigated usefully by DBMS software. hard to visualize; many possible connections between tables, To avoid redundancy MD Model better performance Better data organisation Better visualization Business queries (why, what if) Data model
Typical DWH Analyses/Queries • What are the consequences of new orders for production capacity w.r. to investment, personnel, maintenance, extra hours, ... • Seasonal adaptions, e.g. when to produce how many skis, bikinis, convertibles, ... • Influence of external financing on profits
Operations: • aggregation • slice • dice (cube) • rollup to coarser level • drill down to more detailed level • grouping • sorting
Steps to build a DWH • Acquisition of data • Data cleansing • Storage • Processing: AP • Maintenance, ... Not possible with classical DB-technology alone
On-Line Analytical Processing • OLTP (online transaction processing) for operational data of enterprise, e.g. in relational DBMS, IMS, SAP/R3, ... • DSS:Decision Support System to store data/information for strategic management decisions: aggregations, summaries, etc. • Optimized to work with data warehouses • Used to answer questions • Allows users to perceive data as a multidimensional data cube • Data mining
OLTP versus OLAP Thematic focus • OLTP: many small transactions (microscopic view of business processes, individual steps at lowest level, single order, delivery) • OLAP: finances in general, personnel in general, ... • OLAP requires integration and unification of many detailed data into big picture • Time orientation • Durability: data extracted once, no updates
Technical Comparison OLTP vs OLAP • OLTP: high rate of updates, several thousand t/s • OLAP: read only transactions, very complex, DWH is loaded at certain time intervals, e.g. after the end of the month, quarter • Compute intensive • Special systems with new access methods, e.g. multidimensional data organization and access methods • Special OLAP systems necessary to offload OLTP systems
ROLAP and MOLAP Solution 1: ROLAP relational online analytical processing, built on top of relational DBS, additional middleware or client front end (star schema) Solution 2: MOLAP: multidimensional online analytical processing • new model • new data organizations • new algorithms • new query languages • new optimization techniques
DW Review • degenerate dimension • big dimensions • hierarchies • snow falcking • Slowly changing dimensions • dirty dimensions • Hetegrogeneous prodcuts (core and custom) • Factless Fact table