Efficient Computation of Temporal Aggregates with Range Predicates
D. Zhang*, A. Markowetz**, V. J. Tsotras*, D. Gunopulos* and B. Seeger**
* University of California, Riverside
** Philipps Universität Marburg, Germany

Outline
• Introduction & Motivation
• Problem Decomposition
• The MVSB-tree
• Performance Results
• Conclusions
Introduction & Motivation
• Consider a collection of temporal records.
• Each record: key k, value v, time interval [t1, t2].
• E.g.: employees and their salaries over time.
• Temporal Aggregation: aggregate values over time.
• Focus on SUM/COUNT/AVG.
Previous Work
• ‘Given time t, aggregate over all records that contain t’ [Tum92, KS95, YK97, GHR+, MLI00]. E.g. the sum at t2 is 13.
• ‘Given interval [t1, t2], aggregate over all records that intersect [t1, t2]’ (SB-tree [YW01]). E.g. the sum over [t1, t2] is 28.
Range-Temporal Aggregation (RTA)
• ‘Aggregate over all records intersecting interval [t1, t2] with keys in range [k1, k2]’. E.g. the RTA-sum over [k1, k2] x [t1, t2] is 19.
• Example query: find the AVG salary over the past ten years of all employees whose last names start with ‘B’.
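The RTA definition above can be made concrete with a brute-force scan. This is only an illustrative sketch, not the method proposed in the slides: the `Record` type, the `rta` function, and the sample data are hypothetical, and record intervals are assumed to be half-open [start, end) while the query rectangle is closed on both axes.

```python
from typing import List, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    key: int      # record key k
    start: int    # interval start (inclusive)
    end: int      # interval end (exclusive, an assumption of this sketch)
    value: float  # value v to aggregate


def rta(records: List[Record], k1: int, k2: int,
        t1: int, t2: int) -> Tuple[float, int, Optional[float]]:
    """Scan every record and return (SUM, COUNT, AVG) over the query rectangle."""
    hits = [
        r.value
        for r in records
        # key in [k1, k2] and interval [start, end) intersects [t1, t2]
        if k1 <= r.key <= k2 and r.start <= t2 and r.end > t1
    ]
    total = sum(hits)
    count = len(hits)
    avg = total / count if count else None
    return total, count, avg


# Made-up sample data, unrelated to the figures in the slides.
sample = [
    Record(10, 1, 5, 3.0),
    Record(20, 2, 8, 4.0),
    Record(30, 6, 9, 5.0),
    Record(40, 1, 9, 7.0),
]
result = rta(sample, k1=15, k2=35, t1=4, t2=6)
```

Every query touches all n records, which is the linear cost that an index for RTA is meant to avoid.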
• Previous approaches would need a separate index for each possible key range (inefficient).
• Alternative:
  • index the records;
  • selection query: ‘find all records intersecting [k1, k2] x [t1, t2]’.
  • Query time is O(n).
• Our solution: O(log_b n).
Problem Decomposition
• Decompose RTA into LKST and LKLT queries.
• LKST query: given k, t, aggregate over all records with keys less than k and intervals containing t. E.g. LKST(k2, t2) = 11.
• LKLT query: given k, t, aggregate over all records with keys less than k and intervals ending before t. E.g. LKLT(k2, t2) = 20.
RTA([k1, k2] x [t1, t2]) = LKST(k2, t2) - LKST(k1, t2)
                         + LKLT(k2, t2) - LKLT(k1, t2)
                         - LKLT(k2, t1) + LKLT(k1, t1)
• The RTA query is decomposed into LKST and LKLT queries.
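The identity above can be checked numerically against a direct scan. The sketch below rests on stated assumptions that are not in the slides: record intervals are half-open [start, end), "contains t" means start <= t < end, "ends before t" means end <= t, and, because both sub-queries use "keys less than k", the key range that results is [k1, k2). All names (`Record`, `lkst`, `lklt`, `rta_direct`, `rta_decomposed`, `random_records`) are hypothetical.

```python
import random
from typing import List, NamedTuple


class Record(NamedTuple):
    key: int
    start: int  # inclusive
    end: int    # exclusive (assumption of this sketch)
    value: int


def lkst(records: List[Record], k: int, t: int) -> int:
    """Sum over records with key < k whose interval contains t."""
    return sum(r.value for r in records if r.key < k and r.start <= t < r.end)


def lklt(records: List[Record], k: int, t: int) -> int:
    """Sum over records with key < k whose interval ended at or before t."""
    return sum(r.value for r in records if r.key < k and r.end <= t)


def rta_direct(records: List[Record], k1: int, k2: int, t1: int, t2: int) -> int:
    """Direct scan: key in [k1, k2) and [start, end) intersects [t1, t2]."""
    return sum(
        r.value
        for r in records
        if k1 <= r.key < k2 and r.start <= t2 and r.end > t1
    )


def rta_decomposed(records: List[Record], k1: int, k2: int, t1: int, t2: int) -> int:
    """The six-term combination of LKST and LKLT point queries."""
    return (
        lkst(records, k2, t2) - lkst(records, k1, t2)
        + lklt(records, k2, t2) - lklt(records, k1, t2)
        - lklt(records, k2, t1) + lklt(records, k1, t1)
    )


def random_records(n: int, seed: int) -> List[Record]:
    """Generate made-up records with non-empty intervals for testing."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        start = rng.randint(0, 50)
        out.append(Record(rng.randint(0, 20), start,
                          start + rng.randint(1, 20), rng.randint(1, 9)))
    return out
```

The reasoning behind the terms: a record intersects [t1, t2] either if it is still alive at t2 (the LKST part) or if it ended by t2 but not by t1 (the difference of the LKLT parts); subtracting the k1 terms restricts the keys.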
Index Design
• Both LKST and LKLT are point queries: ‘given k, t, return value’.
• An index for LKST and LKLT should:
  • store points in key-time space;
  • maintain a value for each point;
  • support point queries.
Model
• Assume updates come in increasing time order (transaction-time model).
[Figure: a record as inserted at t1 and as updated at t2.]

The LKST index
• The effect of inserting record (k, [t1, t2], v):
[Figure: updates at t1 and at t2.]

The LKLT index
• The effect of inserting record (k, [t1, t2], v): no update at t1.
[Figure: update at t2.]
Update Operation
• Common update operation for both: insert (k, t):v.
• That is: add v to all points in [k, t] x [kmax, tmax].
• Conclusion: an index supporting point queries and the above update can be used for both LKLT and LKST.
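The common update can be illustrated with a naive stand-in that stores the updates in a list and answers a point query by scanning them. This is not the MVSB-tree: `DominanceSum` and `AggregateIndexes` are hypothetical names. The signs fed to the LKST index (+v at t1, -v at t2) are an assumption inferred from the "intervals containing t" definition, since the figures are not available; the LKLT update at t2 follows the slide. A point query here includes the query key itself, so a strict "keys less than k" query would be issued at the predecessor of k.

```python
from typing import List, Tuple


class DominanceSum:
    """Naive index: insert (k, t):v adds v to every point (k', t') with k' >= k, t' >= t."""

    def __init__(self) -> None:
        self.updates: List[Tuple[int, int, int]] = []

    def insert(self, k: int, t: int, v: int) -> None:
        # Record the update instead of touching every dominated point.
        self.updates.append((k, t, v))

    def point_query(self, k: int, t: int) -> int:
        # The value at (k, t) is the sum of all updates it dominates.
        return sum(v for (uk, ut, v) in self.updates if uk <= k and ut <= t)


class AggregateIndexes:
    """Maintains one DominanceSum for LKST and one for LKLT."""

    def __init__(self) -> None:
        self.lkst = DominanceSum()
        self.lklt = DominanceSum()

    def start_record(self, k: int, t1: int, v: int) -> None:
        # At t1 the record becomes alive: only the LKST index changes.
        self.lkst.insert(k, t1, v)

    def end_record(self, k: int, t2: int, v: int) -> None:
        # At t2 the record stops being alive and counts as ended from then on.
        self.lkst.insert(k, t2, -v)
        self.lklt.insert(k, t2, v)


# Made-up usage, with updates arriving in increasing time order.
idx = AggregateIndexes()
idx.start_record(k=5, t1=1, v=10)
idx.start_record(k=8, t1=2, v=4)
idx.end_record(k=5, t2=3, v=10)
alive_at_2 = idx.lkst.point_query(9, 2)
ended_by_3 = idx.lklt.point_query(9, 3)
```

A scan per query costs O(n); the sketch only shows that a single insert operation and a single point query are enough to maintain both aggregates.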
The MVSB-tree
• A partially persistent SB-tree. It inherits features from both the SB-tree [YW01] and the MVBT [BGO+96].

Insertion
Insertion (cont.)
• To handle overflow, copy records with end=tmax to a new page.
• Strong overflow: limit the number of records in a new page.
[Figure: after the copy, root1 covers [1, 4) and root2 covers [4, tmax).]
Point Query (k, t)
• Follows a single path: the nodes containing (k, t).
• Aggregates the values found along this path.
• E.g.: PointQuery(23, 7) = 5 + 2 = 7.
Efficiency
• Theorem: with 2 MVSB-tree indices, we achieve:
  • RTA query: O(log_b n);
  • Update: O(log_b K);
  • Space: O( * log_b K).
• n = number of updates;
• K = number of different keys;
• b = page capacity (in records).
Performance Results
• Sun Enterprise 250 Server; two 300 MHz UltraSPARC-II processors; Solaris 2.8; GNU C++.
• Datasets: created using the TimeIT [KS98] software and transformed to add record keys.
• Each dataset has a million records (10k unique keys; on average 100 intervals per key).
• Compared against the straightforward approach using the MVBT [BGO+96] as the temporal index.
Index Sizes

Query Speedup
• Query time is averaged over 100 queries of the same query rectangle size.
Conclusions
• We addressed the range-temporal aggregation (RTA) problem.
• New index structure (MVSB-tree) for incrementally maintaining and efficiently computing RTAs.
• Query time reduced from O(n) to O(log_b n) with small space overhead.
• Open problems:
  • Min/Max range-temporal aggregation;
  • Valid-time environment;
  • Multi-dimensional aggregation over objects with extents.