Explore the importance and techniques of achieving space efficiency in constructing algorithms for synopsis generation in various data representations. Examples include histograms, wavelets, and extended wavelets.
SPACE EFFICIENCY OF SYNOPSIS CONSTRUCTION ALGORITHMS Sudipto Guha UPENN
Synopses
• Given n input numbers, summarize the input using B numbers, minimizing some error.
• Examples
• Histograms – piecewise constant representation.
• Wavelets – uses the wavelet basis
• Fourier, Bessel, SVD, what have you…
Space efficiency in synopsis construction algorithms
Why space efficiency
• “Interestingly, according to modern astronomers, space is finite. This is a very comforting thought – particularly for people who can never remember where they left things.” Woody Allen.
• From a computational viewpoint however…
Space is the cruelest resource
• Resources
• Time: twiddle thumbs
• Access (stream): make more passes
• Space: the program simply will not run – or, if data is shifted to disk, will run quite slow(er).
• Further, if we had more space, maybe we could compute a better (more accurate) synopsis
Examples - I
• Histograms
• Many error measures
• V-OPT, Jagadish et al., 1998
• O(n^2 B) time, O(nB) space
• Only O(n) space at a time (working space)
• O(n^2 B^2) time and O(n) space
• Is that the best?
• Here: O(n^2 B) time, O(n) space.
Example - II
• (Haar) Wavelets
• Orthonormal systems
• For l2 error, store the largest B coeffs of the input
• Does not work for non-l2 error
• Find the best B coeffs to retain (note, restricted).
• Garofalakis & Kumar, 04: O(n^2 B log B) time, O(n^2 B) space, but O(nB) needed at a time (for l1)
• Here: O(n) space and O(n^2) time
Example - III
• Extended Wavelets
• Multiple measures
• Optimization is similar to Knapsack with choices.
• Previous best –
• Deligiannakis and Roussopoulos, 04: O(Mn(B + log n)) time and O(MnB) space, but needing O(nM + MB) at a time
• Guha, Kim, Shim, 04: reduced space to O(BM + min{nM, B^2})
• Here: O(BM) space
What we will not talk about
• Approximation algorithms for histograms
• Range Query Histograms
• Basically an improvement of a factor B in space across the board.
• B is not always small, especially when n is large
The main idea
• Can we solve using a non-DP paradigm?
• Well, divide & conquer …
• Small details – how do we divide?
• Interaction
• Does a small interaction partitioning exist?
• How (much size) to represent it?
• Ease of finding it (in the given representation)?
A case study - Histograms
• Formally, given a signal X, find a piecewise constant representation H with at most B pieces minimizing ||X-H||_2
• Consider one bucket.
• The mean is the best value.
• A natural DP …
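The single-bucket observation can be made concrete with a small sketch. It assumes the error is measured as the sum of squared differences (minimizing this also minimizes ||X-H||_2) and uses prefix sums so that the error of any bucket is available in O(1) time after O(n) preprocessing. The names `make_bucket_error` and `error` are illustrative and do not come from the slides.

```python
from itertools import accumulate

def make_bucket_error(x):
    """Return error(i, j): squared error of x[i..j] (0-based, inclusive)
    when the whole bucket is represented by its mean."""
    s = [0.0] + list(accumulate(x))                  # prefix sums
    q = [0.0] + list(accumulate(v * v for v in x))   # prefix sums of squares

    def error(i, j):
        total = s[j + 1] - s[i]
        squares = q[j + 1] - q[i]
        # sum (x_k - mean)^2 = sum x_k^2 - (sum x_k)^2 / length
        return squares - total * total / (j - i + 1)

    return error

error = make_bucket_error([1, 1, 5, 5])
flat_run = error(0, 1)      # a constant run costs nothing
whole_signal = error(0, 3)  # one bucket with mean 3 over the whole signal
```

With this helper each call to error(j+1,i) inside a DP costs constant time, which is what the time bounds quoted for the histogram DP rely on.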
The DP for histograms
Err[i,b] = error of approximating x1,…,xi using b buckets
For i = 1 to n do
  For b = 2 to B do
    For j = 1 to i-1 do
      Err[i,b] = min( Err[i,b], Err[j,b-1] + error(j+1,i) )
[Figure: the DP table, with dimensions B and n]
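A runnable version of this table-filling DP might look like the sketch below. It is an illustration under stated assumptions, not the author's code: indices are 0-based and half-open, the base case for one bucket is filled in explicitly, "b buckets" is read as "at most b buckets", and the function name `vopt_error` is hypothetical. It keeps the full table, so it uses O(nB) space and O(n^2 B) time.

```python
from itertools import accumulate

def vopt_error(x, B):
    """Minimum squared error of a histogram of x with at most B buckets."""
    n = len(x)
    s = [0.0] + list(accumulate(x))
    q = [0.0] + list(accumulate(v * v for v in x))

    def error(j, i):
        # squared error of x[j:i] (half-open) around its mean
        t = s[i] - s[j]
        return (q[i] - q[j]) - t * t / (i - j)

    INF = float("inf")
    # err[b][i]: best error for the first i values using at most b buckets
    err = [[INF] * (n + 1) for _ in range(B + 1)]
    for i in range(1, n + 1):
        err[1][i] = error(0, i)          # base case: a single bucket
    for b in range(2, B + 1):
        for i in range(1, n + 1):
            best = err[b - 1][i]         # using fewer buckets is allowed
            for j in range(1, i):        # last bucket is x[j:i]
                best = min(best, err[b - 1][j] + error(j, i))
            err[b][i] = best
    return err[B][n]
```

Row b only reads row b-1, which is why the working space is O(n) even though the table has nB entries. The full table is kept only to recover the bucket boundaries afterwards.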
What if
• We could figure out what was the story at the middle point!
• Two questions
• So what?
• How? (use a DP)
Wait a minute …
• We just replaced a DP by another and claimed something … !!!
Exactly. The second DP needs only O(n) space. Since the conquer steps re-use/share the same space, the total space is O(n) too.
The idea is to use divide and conquer, and to use a (small) DP to find the divide step.
Is it really that simple?
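One way to read the idea is sketched below, in the spirit of Hirschberg's linear-space technique. A rolling two-row DP carries, next to each error value, the bucket that covers the middle position in the corresponding optimal solution. The recursion then solves the parts to the left and right of that bucket, reusing the same O(n) working space. This is a sketch under assumptions, not necessarily the author's exact algorithm: the names `vopt_boundaries`, `middle_bucket` and `solve` are hypothetical, indices are half-open, and "b buckets" means "at most b".

```python
from itertools import accumulate

def vopt_boundaries(x, B):
    """Return (error, cuts); cuts lists the end position of every bucket."""
    assert B >= 1 and len(x) >= 1
    s = [0.0] + list(accumulate(x))
    q = [0.0] + list(accumulate(v * v for v in x))

    def error(j, i):                     # squared error of x[j:i]
        t = s[i] - s[j]
        return (q[i] - q[j]) - t * t / (i - j)

    def middle_bucket(lo, hi, b):
        """Rolling DP on x[lo:hi]. Returns (best error, (p, r, b_left)):
        x[p:r] is the bucket covering the middle position and b_left is the
        number of buckets available to the left of p."""
        m, size, INF = (lo + hi) // 2, hi - lo, float("inf")
        prev = [0.0] + [INF] * size      # row for zero buckets
        prev_mid = [None] * (size + 1)
        for k in range(1, b + 1):
            cur = [0.0] + [INF] * size
            cur_mid = [None] * (size + 1)
            for i in range(1, size + 1):
                best, best_mid = prev[i], prev_mid[i]
                for j in range(i):       # last bucket is x[lo+j : lo+i]
                    c = prev[j] + error(lo + j, lo + i)
                    if c < best:
                        best = c
                        if lo + j <= m < lo + i:
                            best_mid = (lo + j, lo + i, k - 1)
                        else:
                            best_mid = prev_mid[j]
                cur[i], cur_mid[i] = best, best_mid
            prev, prev_mid = cur, cur_mid    # only two rows are ever alive
        return prev[size], prev_mid[size]

    cuts = []

    def solve(lo, hi, b):
        if lo == hi:
            return 0.0
        total, (p, r, b_left) = middle_bucket(lo, hi, b)
        solve(lo, p, b_left)             # left part, fewer buckets
        cuts.append(r)                   # the bucket x[p:r] ends at r
        solve(r, hi, b - b_left - 1)     # right part, remaining buckets
        return total

    return solve(0, len(x), B), cuts
```

Both recursive calls work on at most half of the range and split the remaining buckets between them, so the total work of a level shrinks geometrically while the DP rows are released before recursing.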
The code
The end of working space
• If you can partition a problem using the working space, you can recompute the solution of the parts at a little extra cost.
• Working space = total space.
How much is little?
Wavelets
• A set of vectors
• {1,-1,0,0,0,0,0,0}, {0,0,1,-1,0,0,0,0}, {0,0,0,0,1,-1,0,0}, {0,0,0,0,0,0,1,-1}
• {1,1,-1,-1,0,0,0,0}, {0,0,0,0,1,1,-1,-1}
• {1,1,1,1,-1,-1,-1,-1}, {1,1,1,1,1,1,1,1}
A natural multi-resolution
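The vectors listed above follow a simple pattern that can be generated for any power-of-two length. The sketch below produces the unnormalized vectors in the same order as the slide (finest scale first, the all-ones vector last). The function name `haar_basis` is illustrative only.

```python
def haar_basis(n):
    """Unnormalized Haar vectors for n a power of two, finest scale first."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    vectors = []
    width = 2
    while width <= n:
        half = width // 2
        for start in range(0, n, width):
            v = [0] * n
            v[start:start + half] = [1] * half            # +1 on the left half
            v[start + half:start + width] = [-1] * half   # -1 on the right half
            vectors.append(v)
        width *= 2
    vectors.append([1] * n)   # the overall average
    return vectors

basis = haar_basis(8)
```

Any two distinct vectors have inner product zero, so dividing each vector by its Euclidean norm gives an orthonormal system.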
Wavelet Synopsis Construction
• Formally, given a signal X and the Haar basis {ψ_i}, find a representation F = Σ_i z_i ψ_i with at most B non-zero z_i, minimizing some error which is a function of X-F
• Restriction: z_i is either 0 or ⟨X, ψ_i⟩
• Debate: unrestricted or restricted. Omit.
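For the l2 error the restricted problem is easy: because the normalized Haar basis is orthonormal, the squared error equals the sum of the squares of the dropped coefficients, so keeping the B largest coefficients in magnitude is optimal. The sketch below illustrates only this l2 case, not the non-l2 algorithms discussed in the talk. The function names and the coefficient ordering (overall average first, then coarse to fine) are choices made for this example.

```python
import math

R2 = math.sqrt(2.0)

def haar_transform(x):
    """Orthonormal Haar coefficients <X, psi_i>; len(x) must be a power of two."""
    cur, details = [float(v) for v in x], []
    while len(cur) > 1:
        pairs = [(cur[2 * k], cur[2 * k + 1]) for k in range(len(cur) // 2)]
        details = [(a - b) / R2 for a, b in pairs] + details  # coarser first
        cur = [(a + b) / R2 for a, b in pairs]
    return cur + details

def inverse_haar(c):
    """Rebuild the signal from coefficients ordered as in haar_transform."""
    cur, pos = list(c[:1]), 1
    while pos < len(c):
        det = c[pos:pos + len(cur)]
        pos += len(cur)
        nxt = []
        for a, d in zip(cur, det):
            nxt += [(a + d) / R2, (a - d) / R2]
        cur = nxt
    return cur

def top_b_synopsis(x, B):
    """Keep the B largest-magnitude coefficients (restricted, l2 error)."""
    c = haar_transform(x)
    order = sorted(range(len(c)), key=lambda i: abs(c[i]), reverse=True)
    keep = set(order[:B])
    z = [c[i] if i in keep else 0.0 for i in range(len(c))]
    return z, inverse_haar(z)

z, approx = top_b_synopsis([2, 2, 2, 2, 6, 6, 6, 6], 2)
```

In this example two coefficients are enough to reproduce the signal exactly, since it is constant on each half.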
Wavelets
• ||X-F||_1
• Long history
• Matias, Vitter, Wang ’98
• Garofalakis, Gibbons, ’02
• Garofalakis, Kumar, ’04
• State of the Art
• O(n^2 B log B) time
• O(n^2 B) space
• O(nB) working space
• Here: O(n^2 log B) time, O(n) space
• SEE ALSO NEXT TALK …
What happens to wavelets [GK04]?
Extensions
• Approximation Algorithms
• Range Query Histograms
• Extended Wavelets
Histograms
• Saves space across all algorithms, except those which extend to general error measures over streams
Range Query
• Same story
• Open Q:
• faster algorithm obeying synopsis size
That’s all folks