640 likes | 749 Views
From Transaction Processing to Support for Decision Making. CIS 671. Computerized Information Systems. Used to “run the business”. OSU Examples Personnel & Payroll (ARMS) Course Offerings Students, including course enrollments and grades (estimated $30M to replace) Inventory
E N D
From Transaction ProcessingtoSupport for Decision Making CIS 671 Decision Support Systems
Computerized Information Systems • Used to “run the business”. • OSU Examples • Personnel & Payroll (ARMS) • Course Offerings • Students, including course enrollments and grades • (estimated $30M to replace) • Inventory • Transaction Processing Decision Support Systems
1st Generation DBMS • Designed for Transaction Processing • Hierarchical – IBM – IMS • Network • Management Information Systems • Added later • Mostly standard summary reports • Produced on a regular basis Decision Support Systems
Relational DBMS • Codd – particularly designed for “ad hoc” queries • First uses for Transaction Processing • Transaction Data now available on-line • Use it to help Decision Making • Ad Hoc Decision Support Systems
Decision Support Systems (DSS) • Use comprehensive view of all aspects of business. • Different business units • Historical data • Summary information • Classes of analysis tools: • Complex “traditional” SQL queries • Many “group-by” and “aggregation” queries (On Line Analytical Processing) • Exploratory data analysis - Data Mining Decision Support Systems
Data Warehousing • Properties • Consolidated data from many sources • Spanning long time periods • Augmented with summary information • Size: several gigabytes to terabytes Decision Support Systems
Data Warehouse Creation • Integrate schemas from different groups • Semantic mismatches • Different currencies • Different names for same attributes • Different structures for similar tables Decision Support Systems
Data Warehouse Creation, cont. • Extract data from different operational databases and other external sources • Clean data - correct errors, fill in missing data • Transform data to match integrated schema • Load data into warehouse • Refresh data in a timely fashion • Purge very old data • Create metadata repository • May be so large that it is in a separate database Decision Support Systems
Data Warehouse - Provide Variety of Analytical Tools • Complex “traditional” SQL queries • OLAP query engine • Data mining algorithm • Information visualization tools • Statistical packages • Report generators Decision Support Systems
Data Mart • Departmental subset of a data warehouse • Top-down approach • Derive from the organization’s data warehouse • May be too hard to do all at once • Bottom-up approach • Initially create departmental data marts • Integrate data marts into organizational data warehouse • If not done carefully, may be hard to integrate Decision Support Systems
OLTP Transaction oriented Thousands of users Small (MB to several GB) Current data Normalized data (many tables, few columns per table) Continuous update Simple to complex queries Data Warehouse Subject oriented Few users ( 100) Large (hundreds of GB to several TB) Historical data Denormalized data (few tables, many columns per table) Batch updates Usually very complex queries OLTP vs. Data Warehouse DBs(from Toby J. Teorey, Database Modeling & Design, Morgan Kaufmann, 1999, p. 212) Decision Support Systems
Complex “traditional” SQL queries • Relational DBMS optimized for decision support • in contrast to a DBMS optimized for transaction processing • Example: • Teradata machine from NCR Decision Support Systems
On Line Analytical Processing (OLAP)Multidimensional Databases (MDD) Decision Support Systems
Example from Finkelstein [Fink95]: • Note that Branch, ProdID, Date Sales, Returns • Note the multidimensionality of the SALES_INFO table. Decision Support Systems
Dimension Hierarchies LOCATION Territory Region Branch TIME Year Quarter Week Month Date PRODUCT Category ProdID Decision Support Systems
Possible queries: 1. How did product Widget sell in the last month, and how does this figure compare with sales over the last five years? How about by branch, region and territory? 2. Did this product sell better in different regions, and are there any regional trends? 3. Were there more returns of Widgets over the last year? Were these returns caused by defects? Were they manufactured in any particular plants? Decision Support Systems
Additional Possible query: 4. Do commissions and pricing affect how sales persons sell the product? Do particular salespersons do a better job of selling the product? Note that a "multidimensional" spreadsheet would be useful. Codd called this type of problem On Line Analytical Processing (OLAP) in contrast to On Line Transaction Processing (TP). Decision Support Systems
Codd's rules for OLAP: [Codd93] 1. Multi-Dimensional Concept View The user should be able to see the data as being multidimensional insofar as it should be easy to 'pivot' or 'slice and dice’. (See later.) 2. Transparency The OLAP functionality should be provided behind the user's existing software without adversely affecting the functionality of the 'host'. 3. Accessibility OLAP should allow the user to access diverse data stores but see the data within a common 'schema' provided by the OLAP tool. Decision Support Systems
OLAP Rules, cont. 4. Consistent Reporting Performance There should not be significant degradation in performance with large numbers of dimensions or large quantities of data. 5. Client-Server Architecture Since much of the data is on mainframes, and the users work on PCs, the OLAP tool must be able to bring the two together! 6. Generic Dimensionality Data dimensions must all be treated equally. Functions available for one dimension must be available for others. Decision Support Systems
OLAP Rules, cont. 7. Dynamic Sparse Matrix Handling The OLAP tool should be able to work out for itself the most efficient way to store sparse matrix data. 8. Multi User Support This is self-evident. 9. Unrestricted Cross-Dimensional Operations e.g., individual office overheads are allocated according to total corporate overheads divided in proportion to individual office sales. Decision Support Systems
OLAP Rules, cont. 10. Intuitive Data Manipulation Navigation should be done by operations on individual cells rather than menus. 11. Flexible Reporting Row and column headings must be capable of more than one dimension each, and of displaying subsets of any dimension. 12. Unlimited Dimensions and Aggregation Levels At least 15 dimensions may be required, and within each there may be many hierarchical levels. Decision Support Systems
Example from Finkelstein [Fink95]: • Note that Branch, ProdID, Date Sales, Returns • Note the multidimensionality of the SALES_INFO table. Decision Support Systems
“Pivoting”Cross TabulationSales by Date and Region Decision Support Systems
“Drill Down”(narrower category)Replace Region by Branch. “Rollup” (more general category) Replace Region by Territory. Decision Support Systems
OLAP Questions 1. Query language - how to say what's wanted. 2. Processing language - how to specify calculations: ratios, variances, . . . . 3. Data visualization - how to see the data. 4. Performance - time to process the query (5 second rule). Decision Support Systems
OLAP References [Codd93] E. F. Codd, S. B. Codd, and C.T. Salley, "Providing OLAP to User Analysts: An IT Mandate," Codd & Date Inc., 1993. [Fink95] Richard Finkelstein, "MDD: Database Reaches the Next Dimension," DATABASE Programming and Design, 8(4), April 1995. Decision Support Systems
Exploratory Data Analysis Data Mining • Find interesting trends or patterns in large data sets. • Statistics - Exploratory Data Analysis • Artificial Intelligence - Knowledge Discovery and Machine Learning • Much larger data sets Decision Support Systems
Mining for Association Rules • Classic example • Market basket analysis • Record each customer transaction at a grocery store. • Try and identify sets of items purchased together. Decision Support Systems
Association Rule: {coke} {chips} People who buy coke usually buy chips. • Measures for Association Rule • {LHS} {RHS} • Support: % of transactions containing this set of items. (2/5=40%) • Confidence: given all transactions containing LHS items, the % that also contain the RHS (2/3=67%) • Want both to be “reasonably” large. Decision Support Systems
On-Line Analytical Processing (OLAP)Part II: CIS 671 Elmasri & Navathe §26.1 Decision Support Systems
Multi-dimensional View of Data • Fact Table (also called cubes) • Dimension attributes • Dependent attributes (functions of the dimension attributes) • Dimension Tables, potentially one for each dimension Decision Support Systems
OLAP Operations • Roll-up – increase the level of aggregation • Drill-down - decrease the level of aggregation • Slice-and-dice - selection and projection, i.e., reduce dimensionality of the data • Pivot – re-orient the dimensional view Decision Support Systems
Implementation Approaches • Relational OLAP (ROLAP) Servers • Data stored in a relational • system • SQL extended • To allow easy OLAP query expression • To provide efficient OLAP query execution. • Multidimensional OLAP (MOLAP) • Systems directly store multidimensional data in special data structures • OLAP operations implemented directly on these data structures. • Hybrid OLAP (HOLAP) • Combines ROLAP and MOLAP. • Detail records (largest volume) in relational database. • Aggregations in separate, but connected”, MOLAP store. Decision Support Systems
Product Example a Star Schema ProdNo ProdName ProdDescr Category CategoryDescr UnitPrice QOH Order OrderNo OrderDate Sales (Fact) table OrderNo SalespersonID CustomerNo ProdNo DateKey CityName Quantity TotalPrice Customer CustomerNo CustomerName CustomerAddress City Date DateKey Date Month Year Salesperson SalespersonID SalespersonName City Quota City CityName State Region Decision Support Systems
Product Snowflake Schema Category ProdNo ProdName ProdDescr Category UnitPrice QOH Order CategoryName CategoryDescr OrderNo OrderDate Sales (Fact) table Customer OrderNo SalespersonID CustomerNo ProdNo DateKey CityName Quantity TotalPrice CustomerNo CustomerName CustomerAddress City Date Year Month DateKey Date Month Year Month Year Salesperson SalespersonID SalespersonName City Quota Region City State CityName State Region State Region Decision Support Systems
Data Cubes • Precompute all possible aggregations. • Required extra storage is tolerable. • Little penalty to keep aggregate up-to-date if data does not change. • Normally some aggregation of raw data is done before it is entered into the data cube. Decision Support Systems
Product Data Cube with Orders Accumulated Category ProdNo ProdName ProdDescr Category UnitPrice QOH CategoryName CategoryDescr Sales table Customer CustomerNo CustomerName CustomerAddress City SalespersonID CustomerNo ProdNo DateKey CityName Quantity TotalValue Date Year Month DateKey Date Month Salesperson Month Year SalespersonID SalespersonName City Quota Region City State CityName State Note that average for any aggregate can be calculated from TotalValue and Quantity. State Region Decision Support Systems
Sample of Aggregates in the CUBE Sales (SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalValue) 22 11 100 2 ‘Columbus’ 3 300 CUBE(Sales) (SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalValue) 22 11 100 2 ‘Columbus’ 3 300 22 * 100 2 ‘Columbus’ 6 2222 22 * * 2 ‘Columbus’ 25 33000 * * * 2 ‘Columbus’ 75 90000 * * * * ‘Columbus’ 200 503444 Decision Support Systems
How to answer query given the relation CUBE(Sales) Choose tuples in CUBE(Sales) with the following properties: • Query specifies valuev for attribute a • tuple t has v in its component for a. • Query groupsby attribute a • tuple t has any non-* value in its component for a. • Query has neithergroups by attribute a nor specifies value for a • tuple t has * value in its component for a. Decision Support Systems
How to answer query given the relation CUBE(Sales) Cube(Sales) (SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalValue) 22 11 100 2 ‘Columbus’ 3 300 22 * 100 2 ‘Columbus’ 6 2222 22 * * 2 ‘Columbus’ 25 33000 * * * 2 ‘Columbus’ 75 90000 * * * * ‘Columbus’ 200 503444 select CustomerNo, avg(Price) from Sales where SalespersonID = 22 Group by CustomerNo Result(c, v/n) Cube(Sales) (SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalValue) 22 c * * * nv Decision Support Systems
Cube Implementation by Materialized Views • Dimensions may have hierarchies. • Product, Category • City, State, Region Decision Support Systems
Example: Materialized Views Cube(Sales) (SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalValue) City (CityName, State, Region) insert into SalesV1 select SalespersonID, CustomerNo, Month, State sum(Quantity) as Quantity, sum(TotalValue) as TotalValue from Sales join City on Sales.CityName = City.CityName group by SalespersonID, CustomerNo, Month, State; insert into SalesV2 select SalespersonID, CustomerNo, Month, Region sum(Quantity) as Quantity, sum(TotalValue) as TotalValue from Sales join City on Sales.CityName = City.CityName group by SalespersonID, CustomerNo, Month, Region; Decision Support Systems
Example: Query 1 select SalespersonID, sum(TotalValue) from Sales group by SalespersonID; Answer by select SalespersonID, sum(TotalValue) from SalesV1 group by SalespersonID; or by select SalespersonID, sum(TotalValue) from SalesV2 group by SalespersonID; Decision Support Systems
Example: Query 2 select SalespersonID, State, sum(TotalValue) from Sales group by SalespersonID, State; Answer only by select SalespersonID, State, sum(TotalValue) from SalesV1 group by SalespersonID, State; Decision Support Systems
Example: Query 3 select SalespersonID, State, date, sum(TotalValue) from Sales group by SalespersonID, State, Date; Cannot be answered by either SalesV1 or SalesV2. Thus must use Sales itself. Decision Support Systems
Lattice of Views All All Years Region Quarters State Weeks Months Days City Decision Support Systems
Lattice of Materialized Views and Queries Q1 Q2 Q3 SalesV1 SalesV2 Sales Decision Support Systems
OLAP ExampleGarcia-Molina, Ullman & Widom, Database System Implementation, Prentice Hall, 2000 • Automobile Sales Company: analyze sales of cars • Sales(serialNo, date, dealer, price) • Autos(serialNo, model, color) • Dealers(name, city, state) • Days(day, week, month, year) • ( 5, 27, 7, 2000) Fact Table Dimension Tables Time Dimension Table, probably not stored Decision Support Systems
Assume a particular car model, say ‘Gobi’, is not selling as well as anticipated.How to analyze? • Maybe it’s the color. Slice for ‘Gobi. Dice for color. • select color, sum(price) • from Sales natural join Autos • where model = ‘Gobi’ • group by color; • Doesn’t show anything interesting. Decision Support Systems
Gobi analysis, continuing • What about time? • Drill downfor month. • select color, month, sum(price) • from Sales natural join Autos • join Days on date = day • where model = ‘Gobi’ • group by color, month; • Suppose we discover red Gobis have not sold well recently. Decision Support Systems