
Advanced Database Aggregation Query Processing

Donghui Zhang, Computer Science Department, University of California, Riverside.


Presentation Transcript


  1. Advanced Database Aggregation Query Processing • Donghui Zhang, Computer Science Department, University of California, Riverside • EDBT Ph.D. Workshop 2002

  2. Aggregation Problem • Maintain a set of objects, each having a value. Given a condition that holds for a subset of the objects, compute the total value of the objects in that subset. • E.g. “find the total salary of employees who joined the company less than a year ago”.

  3. Aggregation over Objects with Extent • Objects with extent, as opposed to point objects. • Real-life applications: temporal, spatial, etc. • An employee works for the company during a certain period of time; “find the total salary of employees who worked for the company during 1999”. • A rainfall record covers a spatial region; “find the total volume of rainfall in Los Angeles”.

  4. Functional Box-Sum • Maintain a set of objects, each having a box and a value function; • given a query box q, compute the total value of the objects intersecting q, where • the contribution of an object is the integral of its value function over its intersection with q.

  5. Functional Box-Sum • functional box-sum: 4*50 + 3*12 = 236.
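The definition on slides 4 and 5 can be illustrated with a brute-force scan. This is a minimal sketch that assumes every object has a constant value per unit area, so the integral reduces to value times intersection area. The names `BoxObject`, `overlap` and `functional_box_sum` are hypothetical, and the coordinates are invented so that the two intersection areas come out to 50 and 12, matching the arithmetic on slide 5; the slide's actual figure is not part of this transcript.

```python
from typing import NamedTuple, Sequence, Tuple


class BoxObject(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float
    value: float  # constant value per unit area (an assumption of this sketch)


def overlap(lo_a: float, hi_a: float, lo_b: float, hi_b: float) -> float:
    """Length of the intersection of intervals [lo_a, hi_a] and [lo_b, hi_b]."""
    return max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))


def functional_box_sum(objects: Sequence[BoxObject],
                       query: Tuple[float, float, float, float]) -> float:
    """Scan all objects and add value * (area of intersection with the query box)."""
    qx1, qy1, qx2, qy2 = query
    total = 0.0
    for obj in objects:
        area = overlap(obj.x1, obj.x2, qx1, qx2) * overlap(obj.y1, obj.y2, qy1, qy2)
        total += obj.value * area
    return total


# Invented data: intersection areas with the query box are 50 and 12.
objects = [BoxObject(10, 0, 30, 10, 4), BoxObject(0, 11, 8, 20, 3)]
print(functional_box_sum(objects, (5, 5, 20, 15)))  # 4*50 + 3*12 = 236.0
```

The scan touches every object on every query, which is the cost the indexed approach in the following slides aims to avoid.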

  6. Functional Box-Sum • Moreover, an object's value can be a function; • FBS = $\int_{15}^{20} (11 - 7)(x - 2)\,dx = 310$.
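Reading the slide's formula as the integral of (11 − 7)(x − 2) over x from 15 to 20, the stated value 310 can be checked by hand. One plausible interpretation, which the transcript cannot confirm because the figure is missing, is that 11 − 7 is the height of the intersection and x − 2 is a value function that varies along x.

```latex
\begin{align*}
\int_{15}^{20} (11 - 7)(x - 2)\,dx
  &= 4\left[\frac{x^{2}}{2} - 2x\right]_{15}^{20} \\
  &= 4\bigl[(200 - 40) - (112.5 - 30)\bigr] \\
  &= 4 \times 77.5 = 310.
\end{align*}
```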

  7. Straightforward Approaches • No index: for each query, scan through all records. Not efficient. • Maintain the objects in an R-tree (which speeds up the selection query). To compute an aggregate, select the objects and aggregate their values on the fly. Query time: O(n) in the worst case.

  8. Our Solution • We reduce the functional box-sum problem to a simpler problem (the dominance-sum problem) and build an index specialized for computing dominance-sums. • Instead of storing the original data, the specialized index stores aggregated information, which leads to O(log2n) query time.

  9. Functional Box-Sum → OIFBS • A special case of the functional box-sum is the OIFBS (Origin-Involved Functional Box-Sum), where the query box contains the origin of the space. • A functional box-sum query can be reduced to OIFBS queries: we compute the OIFBS from the origin to the upper-right corner of the query box, then subtract the parts to the left of and below the query box (which are also OIFBS queries).
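The reduction on slide 9 can be sketched in two dimensions as an inclusion-exclusion over four origin-anchored queries. This is an illustration under stated assumptions, not the authors' code: objects are tuples `(x1, y1, x2, y2, v)` with a constant value `v` per unit area, all coordinates are non-negative, and the OIFBS itself is computed by brute force. The helper names `oifbs` and `box_sum_via_oifbs` are hypothetical. The final add-back term is the standard inclusion-exclusion correction for the region that the "left" and "below" queries both cover.

```python
def oifbs(objects, px, py):
    """Brute-force OIFBS for the box from the origin to (px, py)."""
    total = 0.0
    for x1, y1, x2, y2, v in objects:
        width = max(0.0, min(px, x2) - max(0.0, x1))
        height = max(0.0, min(py, y2) - max(0.0, y1))
        total += v * width * height
    return total


def box_sum_via_oifbs(objects, qx1, qy1, qx2, qy2):
    """Functional box-sum of [qx1, qx2] x [qy1, qy2] from four OIFBS queries."""
    return (oifbs(objects, qx2, qy2)    # origin to upper-right corner
            - oifbs(objects, qx1, qy2)  # part to the left of the query box
            - oifbs(objects, qx2, qy1)  # part below the query box
            + oifbs(objects, qx1, qy1)) # lower-left part was subtracted twice


# Invented data: the same query evaluated directly gives 4*50 + 3*12 = 236.
objects = [(10, 0, 30, 10, 4), (0, 11, 8, 20, 3)]
print(box_sum_via_oifbs(objects, 5, 5, 20, 15))  # 236.0
```

Because every term is an origin-involved query, an index that answers only OIFBS queries is enough to answer arbitrary box queries.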

  10. Dominance-Sum • Maintain a set of weighted points; • given a query point p, compute the total weight of the points dominated by p (i.e. to the lower left of p). • Example: dominance-sum = 18.
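A brute-force version of the dominance-sum makes the definition concrete. In this sketch the function name `dominance_sum` is hypothetical, the comparison is assumed to be inclusive (a point on the boundary counts as dominated), and the sample points are invented so that the answer matches the slide's value of 18; the points in the original figure are not available in this transcript.

```python
def dominance_sum(points, px, py):
    """Total weight of points (x, y, w) with x <= px and y <= py."""
    return sum(w for x, y, w in points if x <= px and y <= py)


# Invented points: three of them lie to the lower left of (4, 4).
points = [(1, 1, 5), (2, 3, 6), (3, 2, 7), (5, 5, 4), (6, 1, 3)]
print(dominance_sum(points, 4, 4))  # 5 + 6 + 7 = 18
```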

  11. OIFBS → Dominance-Sum • Idea: to insert an object (with a rectangular region), insert its corner points, associating a function with each corner. • To compute an OIFBS for the box [origin, p], compute the dominance-sum for p, i.e. the sum of the functions associated with the points dominated by p.
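One way to realize the corner idea on slide 11, shown here only for the simplified case of a constant value per unit area, is to attach a signed bilinear function s·(x − cx)(y − cy) to each corner (cx, cy) of an object and store it as four polynomial coefficients. Summing the coefficients of all dominated corners and evaluating the result at p gives the OIFBS. The choice of functions and the names `corner_entries`, `oifbs_via_dominance` and `oifbs_direct` are assumptions of this sketch; the slides do not spell out which functions the authors associate with the corners, and the dominance test is a linear scan rather than an index.

```python
def corner_entries(x1, y1, x2, y2, v):
    """Four (corner, coefficients) pairs for one rectangle with constant value v.

    Each corner (cx, cy) carries s*(x - cx)*(y - cy), expanded into
    coefficients (a, b, c, d) of a*x*y + b*x + c*y + d.
    """
    entries = []
    for cx, cy, sign in ((x1, y1, +1), (x2, y1, -1), (x1, y2, -1), (x2, y2, +1)):
        s = sign * v
        entries.append(((cx, cy), (s, -s * cy, -s * cx, s * cx * cy)))
    return entries


def oifbs_via_dominance(entries, px, py):
    """Sum the coefficient vectors of corners dominated by p, then evaluate at p."""
    a = b = c = d = 0.0
    for (cx, cy), (ea, eb, ec, ed) in entries:
        if cx <= px and cy <= py:
            a, b, c, d = a + ea, b + eb, c + ec, d + ed
    return a * px * py + b * px + c * py + d


def oifbs_direct(objects, px, py):
    """Reference answer: value * area of intersection with [origin, p]."""
    total = 0.0
    for x1, y1, x2, y2, v in objects:
        total += v * max(0.0, min(px, x2) - x1) * max(0.0, min(py, y2) - y1)
    return total


# Invented rectangles in the first quadrant: (x1, y1, x2, y2, value).
objects = [(1, 1, 4, 3, 2.0), (2, 0, 6, 5, 1.0)]
entries = [e for obj in objects for e in corner_entries(*obj)]
for p in [(3, 2), (10, 10), (0.5, 0.5)]:
    print(p, oifbs_via_dominance(entries, *p), oifbs_direct(objects, *p))
```

The useful property is that the per-corner data are plain coefficient vectors, so they can be added together in advance, which is what lets a dominance-sum index store aggregates instead of individual objects.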

  12. New Dominance-Sum Index • For the dominance-sum problem, we propose the BA-tree: • a k-d-B-tree augmented with additional information at index records. • O(log2n) query time, when balanced.
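The slides do not describe the BA-tree's internals, so the following is not the BA-tree. It is a deliberately simpler, static, one-dimensional analog that shows why storing pre-aggregated information speeds up dominance-sum queries: with sorted keys and running totals, a query is one binary search instead of a scan. The class name `PrefixSumIndex` is hypothetical.

```python
import bisect
from itertools import accumulate


class PrefixSumIndex:
    """Static 1-D dominance-sum index: sorted keys plus running weight totals."""

    def __init__(self, weighted_keys):
        pairs = sorted(weighted_keys)              # sort (key, weight) pairs by key
        self.keys = [k for k, _ in pairs]
        self.prefix = list(accumulate(w for _, w in pairs))

    def dominance_sum(self, p):
        """Total weight of keys <= p, found with a single binary search."""
        i = bisect.bisect_right(self.keys, p)
        return self.prefix[i - 1] if i else 0


index = PrefixSumIndex([(5, 2), (1, 3), (3, 4)])
print(index.dominance_sum(3))  # weights of keys 1 and 3: 3 + 4 = 7
```

Unlike this sketch, the structure proposed on the slide is multi-dimensional and is built on a k-d-B-tree.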

  13. Performance • [Figure: functional box-sum query cost]

  14. Summary of Our Aggregation Work • The functional box-sum solution described here is to appear in [PODS’02]. • Also in [PODS’02], we solve a variation: the simple box-sum aggregation problem, which is to find the total value of the objects intersecting the query rectangle. • We have also solved some other aggregation problems...

  15. Range-Temporal Aggregation • Maintain a set of temporal records, each having a key, a value and a time interval. Given a key range r and time interval i, compute the total value of records whose keys are in r and whose intervals intersect i. • Appeared in [PODS’01].
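The query semantics on slide 15 can be written down as a brute-force baseline. This sketch assumes closed key ranges and closed time intervals, which the slide does not specify, and the name `range_temporal_sum` and the sample records are invented for illustration. It defines what the query should return; it is not the indexed method from the cited paper.

```python
def range_temporal_sum(records, key_lo, key_hi, t_start, t_end):
    """Sum the values of records (key, value, start, end) whose key lies in
    [key_lo, key_hi] and whose interval [start, end] intersects [t_start, t_end]."""
    return sum(value
               for key, value, start, end in records
               if key_lo <= key <= key_hi and start <= t_end and t_start <= end)


# Invented records: (key, value, interval start, interval end).
records = [(10, 100, 1, 5), (20, 200, 4, 8), (30, 300, 9, 12), (15, 50, 6, 7)]
print(range_temporal_sum(records, 10, 20, 5, 6))  # 100 + 200 + 50 = 350
```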

  16. Temporal Aggregation over Data Streams • Temporal aggregation when records accumulate in a streaming manner. Storage is limited, but we want to answer aggregation queries over both recent and older data. • To appear in [EDBT’02].

  17. Box-Max Aggregation • Maintain a set of spatial objects, each having a spatial region and a value. Given a query region r, find the min/max value over all objects intersecting r. • Appeared in [GIS’01].

  18. Conclusions • We have proposed specialized index structures for various complex aggregation problems. • In all cases, our proposed methods have much better query performance than the existing approaches, sometimes over 100 times faster. • We recommend implementing these indices in commercial DBMSs for cases where aggregates must be computed very fast.
