1 / 17

Extending Practical Pre-Aggregation for On-Line Analytical Processing

Extending Practical Pre-Aggregation for On-Line Analytical Processing. T. B. Pedersen 1,2 , C. S. Jensen 2 , and C. E. Dyreson 2 1 Center for Health Information Services Kommunedata, www.kmd.dk 2 Nykredit Center for Database Research Department of Computer Science

mabyn
Download Presentation

Extending Practical Pre-Aggregation for On-Line Analytical Processing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Extending Practical Pre-Aggregation for On-Line Analytical Processing T. B. Pedersen1,2, C. S. Jensen2, and C. E. Dyreson2 1 Center for Health Information Services Kommunedata, www.kmd.dk 2 Nykredit Center for Database Research Department of Computer Science Aalborg University, www.cs.auc.dk/NDB 25th VLDB, Edinburgh, Scotland, September 7-10, 1999

  2. Motivation • On-Line Analytical Processing (OLAP) systems are popular. • Based on a dimensional view of data. • Require fast query response times even for large amounts of data. • Pre-aggregation is used to speed up queries. • Full pre-aggregation (all combinations of aggregate levels are materialized) is infeasible. • Takes up too much space (~200 times the size of the raw data). • Takes too long to update when data changes. • Practical (or partial) pre-aggregation techniques • Store only select combinations of aggregates. • Re-use these to compute higher-level results. • Provides a good trade-off between storage space/update time and response time. 25th VLDB, Edinburgh, Scotland, September 7-10, 1999

  3. Motivation • Practical pre-aggregation requires well-behaved hierarchies and fact-dimension relationships. • Hierarchies must be • strict • covering • onto • Facts and dimensions must be • many-to-one • have uniform granularity. • Often, these requirements are not satisfied in practice. • We present techniques and algorithms for applying practical pre-aggregation to general, non-summarizable hierarchies and fact-dimension relationships. 25th VLDB, Edinburgh, Scotland, September 7-10, 1999

  4. Patient Case Study • Patients are the facts in multidimensional terms. • Each patient has zero or more diagnoses (many-to-many). • The diagnoses may be registered at any level (Low-level, Family, or Group), i.e., the granularity is varying. • A diagnosis may have no children (non-onto) or several parents (non-strict). • Each patient has one address, in a city or a rural area (non-covering). Cities are located in counties. 25th VLDB, Edinburgh, Scotland, September 7-10, 1999

  5. Talk Overview • Motivation • Data model context • Hierarchy properties • Hierarchy transformations • Integration in current systems • Conclusion and future work 25th VLDB, Edinburgh, Scotland, September 7-10, 1999

  6. Data Model - Schema • Fact type: Patient • Dimension types: Diagnosis and Residence • There are no “measures”, all data are dimensions. • Category types: Low-level Diagnosis, Diagnosis Family, Diagnosis Group, Address, City, County • Top category types: corresponds to ALL of the dimension. • Bottom category types: the lowest level in each dimension • The category types of a dimension type form a lattice. Residence Dimension Type Diagnosis Dimension Type TDiagnosis TResidence Diagnosis Group County Diagnosis Family City LL Diagnosis Address Patient 25th VLDB, Edinburgh, Scotland, September 7-10, 1999

  7. Data Model - Instances • Categories: instances of category types, consist of dimension values. • Top categories contain only one “T” value. • Dimensions = categories + partial order on category values, hierarchy may be non-strict. • Facts: instances of fact type with separate identity. • Fact-dimension relations: links facts to dimensions, may map to several values of any granularity in each dimension. • Multidimensional object (MO) = schema + dimension + facts + fact-dimension relations Residence Dimension Diagnosis Dimension T T Cancer PregRel Diabetes Sydney Outback Lung Canc. Ins. Dep. Diab. Diab. Preg. Sydney Ins. Dep. Diab. Preg. 1 Central 2 Rural John Jim Jane 25th VLDB, Edinburgh, Scotland, September 7-10, 1999

  8. Hierarchy Properties • Summarizability requires that the dimension hierarchies are covering, onto, and strict. • Non-covering hierarchies occur when links between dimension values “skip” levels. • Non-stricthierarchies occur when one lower-level item has several parents. • Non-onto (into) hierarchies occur when the height of the hierarchy is varying. • These properties cause problems when re-using stored counts of patients to compute new values. Residence Dimension Diagnosis Dimension T T Cancer PregRel Diabetes Sydney Outback 1 0 2? ? 1 Lung Canc. Ins. Dep. Diab. Diab. Preg. Sydney 1 1 1 0 ? Ins. Dep. Diab. Preg. 1 Central 2 Rural 1 No Low-level Diagnosis 25th VLDB, Edinburgh, Scotland, September 7-10, 1999

  9. Hierarchy Transformations • The overall task is to transform non-summarizable hierarchies to summarizable hierarchies automatically. • A hierarchy is transformed in three steps: • 1) Transform the hierarchy to be covering. • 2) Transform the result from 1) to be onto. • 3) Transform the result from 2) to be strict. • We give an algorithm for each transformation. Each algorithm assumes that the previous algorithm(s) has been applied. 25th VLDB, Edinburgh, Scotland, September 7-10, 1999

  10. Non-covering Hierarchies • The MakeCovering algorithm starts with the bottom category C(Address). • For each parent category P of C (City and County) it looks for parent categories H that are “higher than” P (City < County). • The algorithms finds all the links L=(h,c) from H to C that are not covered by going through P. • For each link L an intermediate value is inserted into P and linked to h and c. • Finally, the algorithm is applied recursively to P (nothing changes in this case). Residence Dimension TResidence T Sydney Outback County H L Sydney CityOutback City P Address C 1 Central 2 Rural 25th VLDB, Edinburgh, Scotland, September 7-10, 1999

  11. Non-onto Hierarchies • The MakeOnto algorithm starts with the top category P(Tdiagnosis). • For each child category C of P it finds the values n in P with no children (none are found in the first two calls). • For each childless n in P, the algorithms inserts a placeholder value cn in C and links it to n. • Finally, the algorithm is called recursively on C. Diagnosis Dimension TDiagnosis T P Cancer PregRel Diabetes Diag. Group P C Lung Canc. Ins. Dep. Diab. Diab. Preg. P Diag. Family C n Ins. Dep. Diab. Preg. LL Diagnosis C LLLC 25th VLDB, Edinburgh, Scotland, September 7-10, 1999

  12. Non-strict Hierarchies • Schema changes shown first • Idea: “fuse” sets into one value. • The MakeStrict algorithm starts with a child category C. • For each parent category P of C, it looks for non-strictness between C and PifPhas parents. • If non-strictness occurs, a new category N (holding setsof P) is inserted between C and P. • Grandparents G of C are linked to the new category. • “Unsafe” links are removed. • Finally, the algorithm is called recursively on N. TDiagnosis G G G Diag. Group P Set-of-DG P P Diag. Family Set-of-DF C LL Diagnosis C C 25th VLDB, Edinburgh, Scotland, September 7-10, 1999

  13. Non-strict Hierarchies • Instance changes shown now. • The MakeStrict algorithm starts with a child category C. • For each parent category P of C, it looks for non-strictness between C and PifPhas parents. • If non-strictness occurs, new “fused” values (setsof P) are inserted into N. • Values in grandparents G of C are linked to the new values. • “Unsafe” links are removed. • Finally, the algorithm is called recursively on N. Diagnosis Dimension T Cancer PregRel Diabetes {PR,D} {C} Lung Canc. Diab. Preg. Ins. Dep. Diab. {DP,IDD} {LC} Ins. Dep. Diab. Preg. LLLC 25th VLDB, Edinburgh, Scotland, September 7-10, 1999

  14. Did We Solve The Problem ? • Now we can re-use stored values to correctly compute new higher-level values. Diagnosis Dimension Residence Dimension T 1 T 3 Cancer PregRel Diabetes 1 1 0 Sydney Outback 1 2 {PR,D} {C} 1 0 Diab. Preg. Ins. Dep. Diab. Lung Canc. Sydney CityOutback 1 2 1 1 0 1 Central 2 Rural {DP,IDD} {LC} 1 0 Ins. Dep. Diab. Preg. LLLC 1 0 25th VLDB, Edinburgh, Scotland, September 7-10, 1999

  15. Integration in Current Systems • The transformations should be transparent to the user. • Modern OLAP systems have client and server components, communicating using, e.g., OLE DB for OLAP or MD-API. • OLAP queries consist of 80% navigation and 20% aggregation queries [Kimball]. • Integration is achieved using a Query Handler that sends navigation queries to the original DB and aggregation queries to the transformed DB. • Small space overhead (the hierarchies only are stored twice). 25th VLDB, Edinburgh, Scotland, September 7-10, 1999

  16. Conclusion and Future Work • We have presented algorithms that automatically transform non-summarizable (non-covering, non-onto, non-strict) hierarchies to become summarizable. • The algorithms have a practical, low complexity and can be applied incrementally to minimize the cost of updates. • The algorithms can be applied to non-summarizable relationships between facts and dimensions. • The presented technique can be implemented in current OLAP systems transparently to the user. • Future work: • Transform only the part of the hierarchy that has been selected for materialization. • Take the aggregate function into account, i.e., MAX and MIN are insensitive to duplicates. 25th VLDB, Edinburgh, Scotland, September 7-10, 1999

  17. Previous Work • Much work has focused on selecting the optimal subset of aggregate levels for practical pre-aggregation, given space and/or time constraints. • This work assumes summarizability. Thus, it cannot be applied to general hierarchies and fact-dim. relationships. • One paper has shown how to manually, and not transparently to the user, achieve summarizability for non-covering hierarchies. • We give algorithms and techniques that automatically achieves summarizability for non-covering, non-onto, and non-strict hierarchies, transparently to the user. • The algorithms and techniques can also be applied to non-summarizable relationships between facts and dimensions. 25th VLDB, Edinburgh, Scotland, September 7-10, 1999

More Related