
SPACE EFFICIENCY OF SYNOPSIS CONSTRUCTION ALGORITHMS

Explore the importance and techniques of achieving space efficiency in constructing algorithms for synopsis generation in various data representations. Examples include histograms, wavelets, and extended wavelets.


Presentation Transcript


  1. SPACE EFFICIENCY OF SYNOPSIS CONSTRUCTION ALGORITHMS Sudipto Guha UPENN

  2. Synopses • Given n input numbers, summarize the input using B numbers, minimizing some error. • Examples • Histograms – piecewise constant representation • Wavelets – uses the wavelet basis • Fourier, Bessel, SVD, what have you… Space efficiency in synopsis construction algorithms

  3. Why space efficiency • “Interestingly, according to modern astronomers, space is finite. This is a very comforting thought – particularly for people who can never remember where they left things.” Woody Allen. • From a computational viewpoint however…

  4. Space is the cruelest resource • Resources • Time: twiddle thumbs • Access (stream): make more passes • Space: the program simply will not run – or, if data is shifted to disk, will run quite slow(er). • Further, if we had more space, maybe we could compute a better (more accurate) synopsis

  5. Examples - I • Histograms • Many error measures • V-OPT, Jagadish et al., 1998 • O(n^2 B) time, O(nB) space • Only O(n) space at a time (working space) • O(n^2 B^2) time and O(n) space • Is that the best? • Here: O(n^2 B) time, O(n) space.

  6. Example - II • (Haar) Wavelets • Orthonormal systems • For l2 error, store the largest B coefficients of the input • Does not work for non-l2 errors • Find the best B coefficients to retain (note, restricted). • Garofalakis & Kumar, 04: O(n^2 B log B) time, O(n^2 B) space, but O(nB) needed at a time (for l1) • Here: O(n) space and O(n^2) time

  7. Example - III • Extended Wavelets • Multiple measures • Optimization is similar to Knapsack with choices. • Previous best: • Deligiannakis and Roussopoulos, 04: O(Mn(B + log n)) time and O(MnB) space, but needing O(nM + MB) at a time • Guha, Kim, Shim, 04: reduced space to O(BM + min{nM, B^2}) • Here: O(BM) space

  8. What we will not talk about • Approximation algorithms for histograms • Range Query Histograms • Basically an improvement of a factor B in space across the board. • B is not always small, especially when n is large

  9. The main idea • Can we solve using a non-DP paradigm? • Well, divide & conquer … • Small details – how do we divide? • Interaction • Does a small interaction partitioning exist? • How (much size) to represent it? • Ease of finding it (in the given representation)?

  10. A case study - Histograms • Formally, given a signal X, find a piecewise constant representation H with at most B pieces minimizing ||X-H||_2 • Consider one bucket. • The mean is the best value. • A natural DP …
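
The single-bucket claim can be checked directly. The short derivation below is a sketch in notation introduced here (a bucket covering x_{j+1},…,x_i, represented by a value h); it also gives a closed form for the per-bucket error that a DP over buckets needs.

```latex
\[
\frac{d}{dh}\sum_{k=j+1}^{i}(x_k-h)^2
  = -2\sum_{k=j+1}^{i}(x_k-h) = 0
  \;\Longrightarrow\;
  h = \frac{1}{i-j}\sum_{k=j+1}^{i} x_k ,
\]
\[
\mathrm{error}(j+1,i)
  = \sum_{k=j+1}^{i} x_k^2
  - \frac{1}{i-j}\Bigl(\sum_{k=j+1}^{i} x_k\Bigr)^{2} .
\]
```

With prefix sums of x_k and x_k^2 kept in O(n) space, each error(j+1,i) can be evaluated in O(1) time.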

  11. The DP for histograms • Err[i,b] = error of approximating x_1,…,x_i using b buckets
      For i = 1 to n do
        For b = 2 to B do
          For j = 1 to i-1 do
            Err[i,b] = min(Err[i,b], Err[j,b-1] + error(j+1,i))
      [Figure: the Err table, with axes labeled B and n.]
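
As a concrete reference point, the DP above might be written as follows. This is a minimal sketch, not the presenter's code: the function name `vopt_histogram`, the 0-based half-open bucket convention, and the back-pointer table `back` are assumptions made for illustration. It keeps the full table, so it uses the O(nB) space that the talk sets out to avoid.

```python
from itertools import accumulate


def vopt_histogram(xs, B):
    """Return (error, buckets) for the best histogram with at most B buckets.

    Buckets are half-open index pairs (start, end) into xs.
    Illustrative sketch only: O(n^2 B) time and O(n B) space.
    """
    n = len(xs)
    if n == 0:
        return 0.0, []
    s1 = [0.0] + list(accumulate(xs))                 # prefix sums of x
    s2 = [0.0] + list(accumulate(x * x for x in xs))  # prefix sums of x^2

    def sse(i, j):
        """Squared error of approximating xs[i:j] (i < j) by its mean."""
        t = s1[j] - s1[i]
        return (s2[j] - s2[i]) - t * t / (j - i)

    b_max = min(B, n)  # more than n buckets never helps
    inf = float("inf")
    # err[b][i]: least error for xs[:i] using exactly b non-empty buckets.
    err = [[inf] * (n + 1) for _ in range(b_max + 1)]
    back = [[0] * (n + 1) for _ in range(b_max + 1)]
    err[0][0] = 0.0
    for b in range(1, b_max + 1):
        for i in range(b, n + 1):
            for j in range(b - 1, i):  # last bucket is xs[j:i]
                c = err[b - 1][j] + sse(j, i)
                if c < err[b][i]:
                    err[b][i] = c
                    back[b][i] = j

    # Walk the back-pointers to recover the bucket boundaries.
    buckets, i = [], n
    for b in range(b_max, 0, -1):
        j = back[b][i]
        buckets.append((j, i))
        i = j
    buckets.reverse()
    return err[b_max][n], buckets
```

For example, `vopt_histogram([1, 1, 1, 5, 5, 5, 9, 9, 9], 3)` splits the input into its three constant runs with zero error. The `back` table is what forces O(nB) space here; the error values alone could be kept in two rows.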

  12. What if • We could figure out what was the story at the midpoint! • Two questions • So what? • How? (use a DP)

  13. Wait a minute … • We just replaced one DP by another and claimed something …!!! • Exactly. The second DP needs only O(n) space. • Since the conquer steps re-use/share the same space, the total space is O(n) too. • The idea is to use divide and conquer, and to use a (small) DP to find the divide step. • Is it really that simple?
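
The divide-and-conquer idea can be sketched in code. This is a reconstruction under stated assumptions, not the code from the talk: it takes "the story at the midpoint" to mean the bucket that contains the middle element together with the number of buckets allowed before it. The names `dc_histogram`, `solve`, and `cross` are hypothetical. The small DP keeps only two rows, each of O(n) entries, and the two sides are then solved recursively.

```python
from itertools import accumulate


def dc_histogram(xs, B):
    """Return optimal buckets (half-open index pairs) using at most B buckets.

    Illustrative divide-and-conquer sketch with O(n) working space.
    """
    if B < 1:
        raise ValueError("B must be at least 1")
    s1 = [0.0] + list(accumulate(xs))                 # prefix sums of x
    s2 = [0.0] + list(accumulate(x * x for x in xs))  # prefix sums of x^2

    def sse(i, j):
        """Squared error of approximating xs[i:j] (i < j) by its mean."""
        t = s1[j] - s1[i]
        return (s2[j] - s2[i]) - t * t / (j - i)

    out = []

    def solve(lo, hi, b):
        """Append optimal buckets for xs[lo:hi] with at most b buckets."""
        n = hi - lo
        if n == 0:
            return
        if b == 1 or n == 1:
            out.append((lo, hi))
            return
        mid = lo + n // 2  # we track the bucket containing xs[mid]
        # err[i - lo]: best error for xs[lo:i] with at most `layer` buckets.
        err = [0.0] + [sse(lo, i) for i in range(lo + 1, hi + 1)]
        # cross[i - lo] = (s, e, k), kept only for i > mid: in a best
        # solution for xs[lo:i], xs[mid] lies in bucket xs[s:e], and at
        # most k buckets cover xs[lo:s].
        cross = [(lo, i, 0) if i > mid else None for i in range(lo, hi + 1)]
        for layer in range(2, b + 1):
            new_err, new_cross = err[:], cross[:]
            for i in range(lo + 2, hi + 1):
                for j in range(lo + 1, i):  # last bucket is xs[j:i]
                    c = err[j - lo] + sse(j, i)
                    if c < new_err[i - lo]:
                        new_err[i - lo] = c
                        if i > mid:
                            # Either the last bucket itself covers xs[mid],
                            # or we inherit the crossing from the prefix.
                            new_cross[i - lo] = (
                                (j, i, layer - 1) if j <= mid else cross[j - lo]
                            )
            err, cross = new_err, new_cross
        s, e, k = cross[hi - lo]
        del err, cross  # release the O(n) rows before recursing
        solve(lo, s, k)          # left part: fewer than n/2 + 1 points
        out.append((s, e))
        solve(e, hi, b - 1 - k)  # right part: fewer than n/2 points

    solve(0, len(xs), B)
    return out
```

For example, `dc_histogram([1, 2, 3, 4], 2)` returns `[(0, 2), (2, 4)]`. Both recursive calls work on at most half of the points, so even if the rows of an outer call were kept alive, the total would still be bounded by a geometric sum in n.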

  14. The code

  15. The end of working space • If you can partition a problem using the working space – you can recompute the solution of the parts at a little extra cost. • Working space = total space.

  16. How much is little ?
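
One way to estimate the extra cost is sketched below. The assumptions are not taken from the slide: solving a subproblem on m points with b buckets costs at most c m^2 b, each divide step leaves subproblems with at most half the points, and the bucket budgets b_{k,i} of the subproblems at recursion depth k sum to at most B.

```latex
\[
T(n,B) \;\le\; \sum_{k\ge 0}\;\sum_{i} c\left(\frac{n}{2^{k}}\right)^{2} b_{k,i}
       \;\le\; c\,n^{2}B \sum_{k\ge 0} 4^{-k}
       \;=\; \frac{4}{3}\,c\,n^{2}B .
\]
```

Under these assumptions the recomputation adds only a constant factor to the O(n^2 B) running time.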

  17. Wavelets • A set of vectors • {1,-1,0,0,0,0…}, {0,0,1,-1,0,0,…}, {0,0,0,0,1,-1,0,0}, {0,0,0,0,0,0,1,-1} • {1,1,-1,-1,0,0,0,0}, {0,0,0,0,1,1,-1,-1} • {1,1,1,1,-1,-1,-1,-1}, {1,1,1,1,1,1,1,1} A natural multi-resolution

  18. Wavelet Synopsis Construction • Formally, given a signal X and the Haar basis {ψ_i}, find a representation F = Σ_i z_i ψ_i with at most B non-zero z_i, minimizing some error which is a function of X-F • Restriction: z_i is either 0 or ⟨X, ψ_i⟩ • Debate: unrestricted or restricted. Omit.
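
For the l2 case, the restricted synopsis is the standard "keep the B largest coefficients" rule, which can be sketched as follows. This is an illustration, not code from the talk: it assumes the input length is a power of two and uses the orthonormal scaling of the Haar vectors (each vector divided by its norm). The function names `haar_transform`, `inverse_haar`, and `top_b_synopsis` are hypothetical.

```python
import math


def haar_transform(xs):
    """Orthonormal Haar coefficients of xs (length must be a power of two).

    Output order: overall-average coefficient, then details from the
    coarsest level to the finest.
    """
    n = len(xs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    r2 = math.sqrt(2.0)
    cur = [float(x) for x in xs]
    details = []
    while len(cur) > 1:
        half = len(cur) // 2
        avg = [(cur[2 * i] + cur[2 * i + 1]) / r2 for i in range(half)]
        det = [(cur[2 * i] - cur[2 * i + 1]) / r2 for i in range(half)]
        details = det + details  # coarser levels end up in front
        cur = avg
    return cur + details


def inverse_haar(coeffs):
    """Invert haar_transform."""
    r2 = math.sqrt(2.0)
    cur, pos = list(coeffs[:1]), 1
    while len(cur) < len(coeffs):
        det = coeffs[pos:pos + len(cur)]
        pos += len(cur)
        nxt = []
        for a, d in zip(cur, det):
            nxt.extend(((a + d) / r2, (a - d) / r2))
        cur = nxt
    return cur


def top_b_synopsis(xs, B):
    """Keep the B largest-magnitude coefficients; zero out the rest."""
    c = haar_transform(xs)
    keep = sorted(range(len(c)), key=lambda i: abs(c[i]), reverse=True)[:B]
    z = [0.0] * len(c)
    for i in keep:
        z[i] = c[i]  # restricted: z_i is either 0 or <X, psi_i>
    return z
```

Because the basis is orthonormal, the squared l2 error of `inverse_haar(top_b_synopsis(xs, B))` equals the sum of squares of the dropped coefficients, which is why the greedy choice is optimal for l2 and why other error measures need a different algorithm.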

  19. Wavelets • ||X-F||_1 • Long history • Matias, Vitter, Wang ’98 • Garofalakis, Gibbons, ’02 • Garofalakis, Kumar, ’04 • State of the Art • O(n^2 B log B) time • O(n^2 B) space • O(nB) working space • Here: O(n^2 log B) time, O(n) space • SEE ALSO NEXT TALK …

  20. What happens to wavelets [GK04] ?

  21. Extensions • Approximation Algorithms • Range Query Histograms • Extended Wavelets

  22. Histograms • Saves space across all algorithms, except those that extend to general error measures over streams

  23. Range Query • Same story • Open Q: • faster algorithm obeying synopsis size

  24. That’s all folks
