1 / 39

A Thin Monitoring Layer for Top- k Aggregation Queries over a Database

A Thin Monitoring Layer for Top- k Aggregation Queries over a Database. Data Cube. s um(price *quantity). Brand. Country. Product Type. Data Cube. What are the top-2 product types with the highest revenue of each brand in each country?

winter
Download Presentation

A Thin Monitoring Layer for Top- k Aggregation Queries over a Database

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Thin Monitoring Layer for Top-k Aggregation Queries over a Database

  2. Data Cube sum(price*quantity) Brand Country Product Type

  3. Data Cube What are the top-2 product types with the highest revenue of each brand in each country? What are the top-2 brands with the highest revenue in each country?

  4. Top-k Queries • Primary Attribute: The attribute/dimension over which the selection is performed (e.g. product type) • Secondary Attributes: Used to filter specific results (e.g. brand, country) • Aggregated Attributes: Used to compute an aggregated score (e.g. price, quantity) • Aggregate Function: e.g. sum One top-k query for each combination of secondary attribute instances (filtering condition)

  5. Filtering Conditions: Example (1) brand={X} - country = {Y, W} brand=X AND country=Y brand=X AND country=W SELECT type, SUM(price*quantity) FROM relation WHERE brand=X AND country=Y GROUP BY type ORDER BY SUM(price*quantity) LIMIT K SELECT type, SUM(price*quantity) FROM relation WHERE brand=X AND country=W GROUP BY type ORDER BY SUM(price*quantity) LIMIT K

  6. Filtering Conditions: Example (2) country = {Y, W} - brand={X} country=Y country=W brand=X SELECT type, SUM(price*quantity) FROM relation WHERE country=Y GROUP BY type ORDER BY SUM(price*quantity) LIMIT K SELECT type, SUM(price*quantity) FROM relation WHERE country=W GROUP BY type ORDER BY SUM(price*quantity) LIMIT K SELECT type, SUM(price*quantity) FROM relation WHERE brand=X GROUP BY type ORDER BY SUM(price*quantity) LIMIT K

  7. Filtering Conditions: Example (3) country = {Y, W} - brand={X} SELECT type, SUM(price*quantity) FROM relation GROUP BY type ORDER BY SUM(price*quantity) LIMIT K

  8. Updates Insertions to the underlying database that contain all information related to the top-k queries INSERT INTO relation (type, brand, country, price, quantity) VALUES (T, X, Y, 100, 3)

  9. Problem How to maintain all these queries in the presence of fast updates?

  10. Outline • Setting/Problem • Algorithms • Naïve Approach • Estimates Approach • Groups Approach • Experimental Results • Conclusions

  11. Example SELECT type, SUM(price*quantity) FROM relation WHERE brand=X AND country=Y GROUP BY type ORDER BY SUM(price*quantity) LIMIT 2 Update: (type, X, Y, 300)

  12. Naïve Approach Case 1: type in the top-2, e.g. (B,X,Y,300) +300 Case 2: type NOT in the top-2, e.g. (K,X,Y,300) Verification Query: SELECT type, SUM(price*quantity) FROM relation WHERE brand=X AND country=Y AND type=K GROUP BY type

  13. Estimates ApproachIn-memory Structures • top-(k+N) instances with exact aggregated scores • B instances with estimated aggregated scores • best possible score (basic score) + inserted values Buffer top-2 top-5

  14. Estimates Approach Case 1.1: type in the top-2, e.g. (B,X,Y,300) +300 top-2 top-2 top-5 top-5

  15. Estimates Approach Case 1.2: type in the top-5, e.g. (D,X,Y,300) top-2 top-2 +300 top-5 top-5

  16. Estimates Approach Case 2: type in the Buffer, e.g. (P,X,Y,300) Buffer Buffer top-2 +300 top-5 Verification Query:SELECT type, SUM(price*quantity) FROM relation WHERE brand=X AND country=Y AND type=P GROUP BY type

  17. Estimates Approach Sub-case 2.1: score(P) < score(E), e.g. score(P) = 756 Buffer top-2 top-5

  18. Estimates Approach Sub-case 2.2: score(P) > score(E), e.g. score(P) = 2178 Buffer top-2 top-5

  19. Estimates Approach Sub-case 2.3: score(P) > score(B), e.g. score(P) = 2407 Buffer top-2 top-5

  20. Estimates Approach Case 3: type NOT in in-memory structures, e.g. (T,X,Y,300) Estimated Score(T) = basic score + 300 = 2287 Buffer Full Reset Query SELECT type, SUM(price*quantity) FROM relation WHERE brand=X AND country=Y AND type IN (O,P,Q,R) GROUP BY type

  21. Estimates Approach Case 3: type NOT in in-memory structures, e.g. (T,X,Y,300) score(O)=1254, score(P)=432, score(Q)=2050, score(R)=1990 Buffer top-2 top-5

  22. Queries Characteristics • SAME primary attribute • SAME aggregate attributes • SAME aggregate function • SAME top-k condition • DIFFERENT filtering condition

  23. Lattice organisation

  24. Groups Approach • The updates are forwarded from top to bottom in the lattice • Each ranking forwards the queried results to the rankings lying in lower levels in the lattice

  25. Groups Approach: Example Update: (type, X, Y, 300) Ranking: brand=X, country=ANY SELECT type, SUM(price*quantity) FROM relation WHERE brand=X GROUP BY type ORDER BY SUM(price*quantity) LIMIT 2

  26. Groups Approach Case 2: type in the Buffer, e.g. (P,X,Y,300) Verification Query: SELECT type, brand, country, price*quantity FROM relation WHERE brand=X AND type=P Case 4: type NOT in in-memory structures, e.g. (T,X,Y,300) Buffer Reset Query: SELECT type, brand, country, price*quantity FROM relation WHERE brand=X AND type IN (O,P,Q,R)

  27. Groups Approach • Tuples(type, brand, country, price*quantity) limited to those satisfying its filtering condition • Uses them to compute the scores. • Forwards them to the rankings lower in lattice • Rankings receiving tuples use those qualifying to their filtering condition to compute the scores

  28. Outline • Problem • Algorithms • Naïve Approach • Estimates Approach • Groups Approach • Experimental Results • Conclusions

  29. Experiments (1) • TPC-H data • Select on part.p_partKey (200,000 unique values) • Filter on customer.c_mktsegment, orders.o_orderpriority and region.r_name • Aggregation sum on lineitem.l_quantity • 216 total rankings • 30,000 updates/insertions

  30. Experiments (2) Updates • Random: inserts quantity between 1 and 50 for a random part.p_partKey. • 80-20: inserts quantity between 1 and 50 for a part.p_partKey selected according to the 80-20 rule N-extra Gap • Difference between top-k and top-(k+N) scores 100% (1*50) and 200% (2*50)

  31. 80-20 Updates: Queries

  32. 80-20 Updates: Time

  33. Random Updates: Queries

  34. Random Updates: Time

  35. Naïve Approach • 80-20 updates: 239,985 Verification Queries, 4 secs/update • Random updates: 239,977 Verification Queries, 4 secs/update

  36. Outline • Problem • Algorithms • Naïve Approach • Estimates Approach • Groups Approach • Experimental Results • Conclusions

  37. Conclusion • Twoalgorithms to maintain top-k rankings in the presence of fast updates arriving in an underlying database • Exact top-kresults • Faster than a Naïve approach while Groups Approach limits further the communication with the database • Preliminary results which provide insights on the impact of the various parameters in the effectiveness of our methods

  38. Thank you!

  39. Additional Instances

More Related