240 likes | 249 Views
Explore the importance and methods of achieving space efficiency in the construction of algorithm synopses, with examples in histograms and wavelets. Learn the significance of resource management and computational strategies to optimize space usage effectively.
E N D
SPACE EFFICENCY OF SYNOPSIS CONSTRUCTION ALGORITHMS Sudipto Guha UPENN
Synopses • Given n input numbers, summarize the input using B numbers, minimizing some error. • Examples • Histograms – piecewise constant repn. • Wavelets – uses the wavelet basis • Fourier, Bessel, SVD, what have you… Space efficiency in synopsis construction algorithms
Why space efficiency • “Interestingly, according to modern astronomers, space is finite. This is a very comforting thought – particularly for people who can never remember where they left things.” Woody Allen. • From a computational viewpoint however… Space efficiency in synopsis construction algorithms
Space is the cruelest resource • Resources • Time : tweedle thumbs • Access (stream): make more passes • Program simply will not run – or if data is shifted to disk, will run quite slow(er). • Further, if we had more space, maybe we can compute a better (more accurate) synopsis Space efficiency in synopsis construction algorithms
Examples - I • Histograms • Many error measures • V-OPT, Jagadish etal, 1998 • O(n2B) time O(nB) space • Only O(n) space at a time (working space) • O(n2B2) time and O(n) space • Is that the best ? • Here: O(n2B) time O(n) space. Space efficiency in synopsis construction algorithms
Example - II • (Haar) Wavelets • Orthonormal systems • For l2error store the largest B coeffs of input • Does not work for non l2 • Find the best B coeffs to retain (note, restricted). • Garofalakis & Kumar, 04 O(n2B log B) time O(n2B) space, but O(nB) needed at a time (for l1 ) • Here O(n) space, and O(n2) time Space efficiency in synopsis construction algorithms
Example - III • Extended Wavelets • Multiple measures • Optimization is similar to Knapsack with choices. • Previous best – • Deligiannakis and Rossopoulos, 04, O(Mn(B+ log n)) time and space O(MnB), but needing O(nM+MB) at a time • Guha, Kim, Shim, 04, reduced space to O(BM+min {nM,B2}) • Here, O(BM) space Space efficiency in synopsis construction algorithms
What we will not talk about • Approximation algorithms for histograms • Range Query Histograms • Basically improvement of a factor B in space across the board. • B is not always small, specially when n is large Space efficiency in synopsis construction algorithms
The main idea • Can we solve using a non DP paradigm ? • Well, divide & conquer … • Small details – how do we divide ? • Interaction • Does a small interaction partitioning exist ? • How (much size) to represent it ? • Ease of finding it (in the given representation) ? Space efficiency in synopsis construction algorithms
A case study - Histograms • Formally, given a signal X find a piecewise constant representation H with at most B pieces minimizing ||X-H||2 • Consider one bucket. • The mean is the best value. • A natural DP … Space efficiency in synopsis construction algorithms
The DP for histograms Err[i,b] = Error of approximating x1,…,xi using b buckets For i=1 to n do For 2 to B do For j=1 to i-1 do Err[i,b] = min Err[i,b], Err[j,b-1] + error(j+1,i) B n Space efficiency in synopsis construction algorithms
What if • We could figure out what was the story at the middlepoint ! • Two questions • So what ? • How ? (use a DP) Space efficiency in synopsis construction algorithms
Wait a minute … • We just replaced a DP by another and claimed something … !!! Exactly. The second DP needs only O(n) space. So as the conquer steps re-use/share the same space; the total space is O(n) too. The idea is to use divide and conquer; and use a (small) DP to find the divide step. Is it really that simple ? Space efficiency in synopsis construction algorithms
The code Space efficiency in synopsis construction algorithms
The end of working space • If you can partition a problem using the working space – you can recompute the solution of the parts at a little extra cost. • Working space = total space. Space efficiency in synopsis construction algorithms
How much is little ? Space efficiency in synopsis construction algorithms
Wavelets • A set of vectors • {1,-1,0,0,0,0…}, {0,0,1,-1,0,0,…},{0,0,0,0,1,-1,0,0},{0,0,0,0,0,0,1-1} • {1,1,-1,-1,0,0,0,0},{0,0,0,0,1,1,-1,-1} • {1,1,1,1,-1,-1,-1,-1},{1,1,1,1,1,1,1,1} A natural multi-resolution Space efficiency in synopsis construction algorithms
Wavelet Synopsis Construction • Formally, given a signal X and the Haar basis {i} find a representation F=i zii with at most B non-zero zi minimizing some error which a fn of X-F • Restriction. Zi is either 0 or h X,ii • Debate. Unrestricted or restricted. Omit. Space efficiency in synopsis construction algorithms
Wavelets • ||X-F||1 • Long history • Matias, Vitter Wang ’98 • Garofalakis, Gibbons, ’02 • Garofalakis, Kumar, ’04 • State of the Art • O(n2B log B) time • O(n2B) space • O(nB) working space • Here O(n2log B) time O(n) space • SEE ALSO NEXT TALK … Space efficiency in synopsis construction algorithms
What happens to wavelets [GK04] ? Space efficiency in synopsis construction algorithms
Extensions • Approximation Algorithms • Range Query Histograms • Extended Wavelets Space efficiency in synopsis construction algorithms
Histograms • Saves space across all algorithms except algorithms which extend to general error measure over streams Space efficiency in synopsis construction algorithms
Range Query • Same story • Open Q: • faster algorithm obeying synopsis size Space efficiency in synopsis construction algorithms
That’s all folks Space efficiency in synopsis construction algorithms