Algorithms for data streams: Lecture 2. Foundations of Data Science 2014, Indian Institute of Science. Navin Goyal
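Estimating $F_2$ using the AMS sketch • Given a turnstile stream, estimate $F_2 = \sum_{i=1}^{n} x_i^2$ (where $x \in \mathbb{Z}^n$ is the frequency vector) within multiplicative error $1 \pm \epsilon$ with probability at least $1 - \delta$ • Obvious solution takes $O(n)$ space (maintain the frequency vector $x$). Can't do better deterministically • Randomized algorithm [Alon–Matias–Szegedy '96]: • Sample a random vector $\sigma \in \{-1,1\}^n$ with each coordinate chosen uniformly at random from $\{-1,1\}$ independently • Then $\mathbb{E}[(\sigma \cdot x)^2] = \sum_i x_i^2 = F_2$ • So if we could compute $\sigma \cdot x$, then we could estimate $F_2$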
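Basic AMS algorithm for $F_2$ • Given a turnstile stream, estimate $F_2$ within multiplicative error $1 \pm \epsilon$ with probability at least $1 - \delta$. Basic AMS estimator: • Choose a random vector $\sigma \in \{-1,1\}^n$ • Initialize $Z \leftarrow 0$ • Until the end of the stream do • On arrival of element $(i, \Delta)$: $Z \leftarrow Z + \sigma_i \Delta$ • At the end of the stream $Z = \sigma \cdot x$ • $Z^2$ is an estimator of $F_2$ • Problem: storing $\sigma$ requires $n$ bits of space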
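The basic estimator above can be sketched in Python as follows. This is a minimal illustration, assuming updates arrive as pairs `(i, delta)` with `0 <= i < n` (0-based indexing is an assumption); the function name `basic_ams_f2` is hypothetical. It stores $\sigma$ explicitly, which is exactly the $n$-bit cost flagged as the problem.

```python
import random
from typing import Iterable, Tuple


def basic_ams_f2(stream: Iterable[Tuple[int, int]], n: int, seed: int = 0) -> int:
    """Basic AMS estimator for F_2 with a fully independent sign vector.

    stream: turnstile updates (i, delta), meaning x_i += delta, with 0 <= i < n.
    Returns Z^2, where Z = sigma . x at the end of the stream.
    """
    rng = random.Random(seed)
    # Storing sigma explicitly costs n bits: the space problem of the basic scheme.
    sigma = [rng.choice((-1, 1)) for _ in range(n)]
    z = 0
    for i, delta in stream:
        z += sigma[i] * delta  # Z <- Z + sigma_i * delta
    return z * z
```

For the stream `[(0, 2), (1, 3)]` we have $F_2 = 13$, and the estimate is either $1$ or $25$ depending on whether $\sigma_0 = \sigma_1$; averaged over $\sigma$ it equals $13$.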
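$Z^2$ is a reasonable estimator of $F_2$: $\mathbb{E}[Z^2] = F_2$ and $\mathrm{Var}[Z^2] \le 2F_2^2$ • (proof on the board; also in the book) • Application of Chebyshev: $\Pr[|Z^2 - F_2| \ge \epsilon F_2] \le \frac{\mathrm{Var}[Z^2]}{\epsilon^2 F_2^2} \le \frac{2}{\epsilon^2}$, which is too weak on its own • Can improve by the median of the means estimator: run $kt$ independent copies with $k = 8/\epsilon^2$ and $t = O(\log(1/\delta))$, and let $Y_1, \ldots, Y_t$ be the means of $t$ groups of $k$ estimates each • Output the median of $Y_1, \ldots, Y_t$ • This gives an $(\epsilon, \delta)$-approximation of $F_2$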
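The expectation computation and the Chebyshev step for one group mean can be written out as follows. This sketch takes the variance bound $\mathrm{Var}[Z^2] \le 2F_2^2$ as given (its proof needs 4-wise independence of $\sigma$) and uses the group size $k = 8/\epsilon^2$; only $\mathbb{E}[\sigma_i] = 0$, $\sigma_i^2 = 1$ and pairwise independence enter the first line.

```latex
\begin{align*}
\mathbb{E}[Z^2] &= \mathbb{E}\Big[\big(\textstyle\sum_i \sigma_i x_i\big)^2\Big]
  = \sum_i x_i^2\,\mathbb{E}[\sigma_i^2] + \sum_{i \neq j} x_i x_j\,\mathbb{E}[\sigma_i \sigma_j]
  = \sum_i x_i^2 = F_2, \\
Y &= \frac{1}{k} \sum_{l=1}^{k} Z_l^2, \qquad
\mathrm{Var}[Y] = \frac{\mathrm{Var}[Z^2]}{k} \le \frac{2F_2^2}{k}, \\
\Pr\big[|Y - F_2| \ge \epsilon F_2\big] &\le \frac{2F_2^2 / k}{\epsilon^2 F_2^2}
  = \frac{2}{k\epsilon^2} = \frac{1}{4} \quad \text{for } k = 8/\epsilon^2.
\end{align*}
```

The median of $t$ such group means is off by more than $\epsilon F_2$ only if at least $t/2$ of them fail, which by a Chernoff bound happens with probability $e^{-\Omega(t)}$; hence $t = O(\log(1/\delta))$ suffices.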
The AMS sketch • How much space does the basic AMS sketch take (without the median of the means trick)? • $|Z| \le \sum_i |x_i| = O(n)$ (assuming the $|x_i|$ are bounded by a constant) • So $O(\log n)$ space is sufficient? • No! • We also need to remember the random vector $\sigma$ • And this requires $n$ bits • What essential property of the random vector $\sigma$ did we use?
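The AMS sketch • What essential property of the random vector $\sigma$ did we use? • For $\mathbb{E}[Z^2] = F_2$, we used $\mathbb{E}[\sigma_i \sigma_j] = \mathbb{E}[\sigma_i]\,\mathbb{E}[\sigma_j] = 0$ for all $i \neq j$ • For $\mathrm{Var}[Z^2]$, we used $\mathbb{E}[\sigma_i \sigma_j \sigma_k \sigma_l] = \mathbb{E}[\sigma_i]\,\mathbb{E}[\sigma_j]\,\mathbb{E}[\sigma_k]\,\mathbb{E}[\sigma_l] = 0$ for all pairwise distinct $i, j, k, l$ • This is satisfied if the $\sigma_i$ are 4-wise independent: for any pairwise distinct $i, j, k, l$, the random variables $\sigma_i, \sigma_j, \sigma_k, \sigma_l$ are mutually independent • For our situation, this means that for any pairwise distinct $i, j, k, l$ and any $b_1, b_2, b_3, b_4 \in \{-1,1\}$ we have $\Pr[\sigma_i = b_1, \sigma_j = b_2, \sigma_k = b_3, \sigma_l = b_4] = 1/16$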
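Constructing pairwise independent random bit vectors • Given a uniformly random vector $b \in \{0,1\}^r$ ($r$ bits of perfect randomness) • We use $b$ to construct a pairwise independent random vector $y \in \{0,1\}^{2^r - 1}$ ($2^r - 1$ bits of useful randomness) • We index $y$ by the nonempty subsets $S$ of $\{1, \ldots, r\}$ • For each such $S$ define $y_S = \bigoplus_{i \in S} b_i$ • Claim: For distinct nonempty $S$ and $T$, $y_S$ and $y_T$ are independent and uniformly distributed • Proof: On the board • The $y_S$ are not 3-wise independent (e.g., $y_{\{1\}} \oplus y_{\{2\}} = y_{\{1,2\}}$ always)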
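The subset-parity construction can be sketched in Python as below; the function name `subset_parity_vector` is hypothetical. It expands $r$ bits into the $2^r - 1$ bits $y_S$, keyed by `frozenset` subsets of $\{1, \ldots, r\}$.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence


def subset_parity_vector(b: Sequence[int]) -> Dict[FrozenSet[int], int]:
    """Expand r uniformly random bits b into 2^r - 1 pairwise independent bits.

    y_S = XOR of b_i over i in S, for every nonempty S subset of {1, ..., r}.
    """
    r = len(b)
    y: Dict[FrozenSet[int], int] = {}
    for size in range(1, r + 1):
        for s in combinations(range(1, r + 1), size):
            bit = 0
            for i in s:
                bit ^= b[i - 1]  # b is 0-indexed; subsets use 1..r as on the slide
            y[frozenset(s)] = bit
    return y
```

Enumerating all $2^3 = 8$ choices of $b$ for $r = 3$, every pair $(y_S, y_T)$ with $S \neq T$ takes each of its four values exactly twice, while $y_{\{1\}} \oplus y_{\{2\}} = y_{\{1,2\}}$ holds every time, matching the failure of 3-wise independence.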
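2-wise independent hash function families • Very useful concept both in theory and practice • Let $[N] = \{0, 1, \ldots, N-1\}$ and $[M] = \{0, 1, \ldots, M-1\}$ • A family $\mathcal{H}$ of functions $h: [N] \to [M]$ is called 2-wise independent if for any distinct $x_1, x_2 \in [N]$, any $y_1, y_2 \in [M]$, and $h$ chosen uniformly at random from $\mathcal{H}$, we have $\Pr[h(x_1) = y_1 \wedge h(x_2) = y_2] = 1/M^2$ (also called a 2-universal family) • The set of all functions $h: [N] \to [M]$ is 2-universal • It's very large: there are $M^N$ such functions, so describing one function takes $N \log_2 M$ bits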
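To make the definition concrete, here is a brute-force checker, a sketch assuming the family is small enough to enumerate and is given as a list of Python callables; the name `is_2wise_independent` is hypothetical. It tests the definition directly: for each pair of distinct inputs, every output pair must be hit by exactly $|\mathcal{H}|/M^2$ functions.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple


def is_2wise_independent(family: Sequence[Callable[[int], int]], N: int, M: int) -> bool:
    """Exhaustively check 2-wise independence of a family of maps [N] -> [M]."""
    size = len(family)
    if size % (M * M) != 0:
        return False  # a uniform distribution over M^2 outcomes is impossible
    target = size // (M * M)
    for x1, x2 in product(range(N), repeat=2):
        if x1 == x2:
            continue
        counts: Dict[Tuple[int, int], int] = {}
        for h in family:
            key = (h(x1), h(x2))
            counts[key] = counts.get(key, 0) + 1
        # Every (y1, y2) in [M]^2 must occur exactly `target` times.
        if len(counts) != M * M or any(c != target for c in counts.values()):
            return False
    return True
```

For example, the family of all $2^3 = 8$ functions $[3] \to [2]$ passes the check, while the two constant functions $[3] \to [2]$ do not.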
Pairwise independent random vectors and 2-wise independent hash functions • We say that a random vector $y \in [M]^N$ is pairwise independent if for any distinct $i, j \in [N]$, $y_i$ and $y_j$ are independent • A random hash function $h$ from a 2-wise independent family of functions $[N] \to [M]$ gives us a pairwise independent random vector: $y = (y_0, \ldots, y_{N-1})$ with $y_i = h(i)$ • The hash function language is slightly more convenient in some situations • A non-streaming example of the utility of 2-wise independence: MAX CUT
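A minimal sketch of the MAX CUT example, assuming the standard argument: put each vertex on a random side; an edge is cut exactly when its endpoints get different labels, which happens with probability $1/2$ under pairwise independence alone, so the expected cut is $|E|/2$ and some seed achieves at least that. Using the subset-parity labels from the earlier slide, only $2^r = O(n)$ seeds need to be tried, which gives a deterministic algorithm. The function name `maxcut_pairwise` and 0-based vertex labels are assumptions.

```python
from itertools import combinations
from typing import List, Tuple


def maxcut_pairwise(n: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], int]:
    """Deterministically find a cut with >= |E|/2 edges using pairwise independent labels.

    Vertex v (0 <= v < n) gets the label XOR_{i in S_v} b_i for a distinct nonempty
    subset S_v of {0, ..., r-1}; we try every seed b in {0,1}^r and keep the best cut.
    """
    r = max(1, n.bit_length())  # guarantees 2^r - 1 >= n distinct nonempty subsets
    subsets = [s for size in range(1, r + 1) for s in combinations(range(r), size)][:n]
    best_side: List[int] = []
    best_cut = -1
    for seed in range(2 ** r):  # 2^r <= 2n seeds instead of 2^n assignments
        bits = [(seed >> i) & 1 for i in range(r)]
        side = [sum(bits[i] for i in s) % 2 for s in subsets]
        cut = sum(1 for u, v in edges if side[u] != side[v])
        if cut > best_cut:
            best_side, best_cut = side, cut
    return best_side, best_cut
```

On a triangle this returns a cut of size 2, and on the complete graph $K_6$ it returns at least $15/2$, i.e. at least 8 edges.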
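Constructing 2-wise independent hash function families • There are much smaller 2-wise independent families than the family of all functions • Suppose $N = M = p$, a prime number • For $a, b \in \mathbb{Z}_p$ define $h_{a,b}: \mathbb{Z}_p \to \mathbb{Z}_p$ by $h_{a,b}(x) = (ax + b) \bmod p$ • Intuition: Determining a line in the plane requires two distinct points on the line • This gives a family $\mathcal{H} = \{h_{a,b} : a, b \in \mathbb{Z}_p\}$ of size $p^2$ • $\mathcal{H}$ is 2-wise independent • Need $2\lceil \log_2 p \rceil$ bits to store a function in $\mathcal{H}$ • Evaluation of $h_{a,b}$ is constant time on RAM (or certainly …)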
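The family $h_{a,b}(x) = (ax + b) \bmod p$ can be sketched in Python as follows; the names `h_ab` and `make_linear_hash` are hypothetical, and the caller is assumed to pass a prime `p`. Drawing $h$ amounts to drawing the two numbers $a, b$, which is the $2\lceil \log_2 p \rceil$-bit description.

```python
import random
from typing import Callable


def h_ab(a: int, b: int, p: int, x: int) -> int:
    """h_{a,b}(x) = (a*x + b) mod p; p is assumed to be prime."""
    return (a * x + b) % p


def make_linear_hash(p: int, rng: random.Random) -> Callable[[int], int]:
    """Draw h_{a,b} uniformly from the family by choosing a, b uniformly from Z_p."""
    a = rng.randrange(p)
    b = rng.randrange(p)
    return lambda x: h_ab(a, b, p, x)
```

The line intuition shows up directly: for distinct $x_1, x_2$ the map $(a, b) \mapsto (h_{a,b}(x_1), h_{a,b}(x_2))$ is a bijection on $\mathbb{Z}_p^2$, which is exactly 2-wise independence.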
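Constructing 2-wise independent hash function families using finite fields • More generally, we could take $N = M = 2^\ell$ for some positive integer $\ell$ • $\mathbb{F}_{2^\ell}$: the finite field with $2^\ell$ elements • The elements of $\mathbb{F}_{2^\ell}$ can be represented as bit vectors of length $\ell$ • The field provides a way to add and multiply the elements in $\mathrm{poly}(\ell)$ time • For $a, b \in \mathbb{F}_{2^\ell}$ define $h_{a,b}: \mathbb{F}_{2^\ell} \to \mathbb{F}_{2^\ell}$ by $h_{a,b}(x) = ax + b$ • Need $2\ell$ bits to represent $h_{a,b}$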
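A minimal sketch of arithmetic in $\mathbb{F}_{2^\ell}$, assuming elements are encoded as integers in $[0, 2^\ell)$ (bit $j$ is the coefficient of $x^j$) and that the caller supplies an irreducible polynomial of degree $\ell$ as a bit mask, e.g. `0b10011` ($x^4 + x + 1$) for $\ell = 4$ or `0x11B` ($x^8 + x^4 + x^3 + x + 1$, the AES polynomial) for $\ell = 8$. Addition is XOR; multiplication is shift-and-add followed by reduction. The names `gf2_mul` and `h_ab` are hypothetical.

```python
def gf2_mul(u: int, v: int, ell: int, modulus: int) -> int:
    """Multiply u and v in GF(2^ell), reducing by the irreducible polynomial `modulus`."""
    result = 0
    while v:
        if v & 1:
            result ^= u  # add (XOR) the current shifted copy of u
        v >>= 1
        u <<= 1
        if u >> ell:  # degree reached ell: subtract (XOR) the modulus
            u ^= modulus
    return result


def h_ab(a: int, b: int, x: int, ell: int, modulus: int) -> int:
    """h_{a,b}(x) = a*x + b computed in GF(2^ell)."""
    return gf2_mul(a, x, ell, modulus) ^ b
```

With the AES polynomial this reproduces the textbook product $\{57\} \cdot \{83\} = \{C1\}$ from the AES specification; the loop runs at most $\ell$ times, in line with $\mathrm{poly}(\ell)$-time multiplication.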
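2-wise independent hash function families • Can achieve $N = 2^\ell$ and $M = 2^m$ for any $m \le \ell$: • Elements of $\mathbb{F}_{2^\ell}$ can be represented as $\ell$-tuples of bits • Represent $h_{a,b}(x)$ in this way: $h_{a,b}(x) = (z_1, \ldots, z_\ell)$ • And define the new hash function by keeping just the first $m$ coordinates: $h'_{a,b}(x) = (z_1, \ldots, z_m)$ • Claim: The functions $h'_{a,b}$ above form a 2-wise independent hash function family • Proof: On the board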
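The truncation step can be sketched as follows, under the same integer encoding of $\mathbb{F}_{2^\ell}$ as before. Which $m$ coordinates count as "first" is an assumption; this sketch keeps the $m$ low-order bits, and any fixed choice of $m$ coordinates works, since projecting a uniform pair in $\mathbb{F}_{2^\ell}^2$ onto fixed coordinates gives a uniform pair in $(\{0,1\}^m)^2$. The names `gf2_mul` and `h_trunc` are hypothetical.

```python
def gf2_mul(u: int, v: int, ell: int, modulus: int) -> int:
    """Multiply in GF(2^ell); `modulus` is an irreducible degree-ell polynomial bit mask."""
    result = 0
    while v:
        if v & 1:
            result ^= u
        v >>= 1
        u <<= 1
        if u >> ell:
            u ^= modulus
    return result


def h_trunc(a: int, b: int, x: int, ell: int, m: int, modulus: int) -> int:
    """h'_{a,b}(x): keep m of the ell output bits of a*x + b (here the m low-order bits)."""
    return (gf2_mul(a, x, ell, modulus) ^ b) & ((1 << m) - 1)
```

For $\ell = 4$ and $m = 2$ (modulus $x^4 + x + 1$), each of the $16$ output pairs is hit by exactly $256/16 = 16$ of the $(a, b)$ pairs, for every pair of distinct inputs.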
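$k$-wise independent hash function families • A family $\mathcal{H}$ of functions $h: [N] \to [M]$ is called $k$-wise independent if for all distinct $x_1, \ldots, x_k \in [N]$, any $y_1, \ldots, y_k \in [M]$, and $h$ chosen uniformly at random from $\mathcal{H}$, we have $\Pr[h(x_1) = y_1 \wedge \cdots \wedge h(x_k) = y_k] = 1/M^k$ • The family of all functions $h: [N] \to [M]$ is $k$-wise independent • There exist much smaller families obtained by generalizing the construction for pairwise independent hash families: • $\mathbb{F} = \mathbb{F}_{2^\ell}$ or $\mathbb{F} = \mathbb{Z}_p$ ($p$ a prime number). For a $k$-tuple $a = (a_0, \ldots, a_{k-1}) \in \mathbb{F}^k$ define $h_a: \mathbb{F} \to \mathbb{F}$ by $h_a(x) = \sum_{i=0}^{k-1} a_i x^i$ • The above family is a $k$-wise independent family of size $|\mathbb{F}|^k$ • Intuition: A polynomial of degree $k-1$ is fully specified by its values at $k$ points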
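The polynomial family over $\mathbb{Z}_p$ can be sketched in Python as below, evaluating $h_a$ with Horner's rule; `p` is assumed prime, and the names `poly_hash` and `random_kwise_hash` are hypothetical. Storing $h_a$ means storing its $k$ coefficients, i.e. $k \lceil \log_2 p \rceil$ bits.

```python
import random
from typing import List, Sequence


def poly_hash(coeffs: Sequence[int], x: int, p: int) -> int:
    """h_a(x) = a_0 + a_1 x + ... + a_{k-1} x^{k-1} mod p, evaluated by Horner's rule."""
    acc = 0
    for a in reversed(coeffs):
        acc = (acc * x + a) % p
    return acc


def random_kwise_hash(k: int, p: int, rng: random.Random) -> List[int]:
    """Draw a uniform member of the k-wise independent family: k coefficients in Z_p."""
    return [rng.randrange(p) for _ in range(k)]
```

For example, `poly_hash([1, 2, 3], 2, 101)` evaluates $1 + 2 \cdot 2 + 3 \cdot 2^2 = 17$. The interpolation intuition is what makes the family $k$-wise independent: for any $k$ distinct points, the $p^k$ coefficient vectors map one-to-one onto the $p^k$ possible value tuples.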
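Constructing a 4-wise independent random $\pm 1$-vector • Choose $\ell$ sufficiently large so that $2^\ell \ge n$ • Construct a 4-wise independent hash function family $\mathcal{H}$ mapping $\mathbb{F}_{2^\ell} \to \mathbb{F}_{2^\ell}$ • For $h \in \mathcal{H}$ define $h': \mathbb{F}_{2^\ell} \to \{0,1\}$ by $h'(x) =$ the first bit of $h(x)$ • The functions $h'$ form a 4-wise independent family • To generate a 4-wise independent random vector, first choose a random $h \in \mathcal{H}$ • The random vector is $(h'(x_1), \ldots, h'(x_n))$, where $x_1, \ldots, x_n$ are distinct elements of $\mathbb{F}_{2^\ell}$ • This is a $\{0,1\}$-vector • To construct a $\{-1,1\}$-vector, map $0$ to $-1$ in the above vector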
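A minimal Python sketch of this construction, assuming field elements are encoded as integers in $[0, 2^\ell)$, coordinate $i$ (0-based, an assumption) is identified with the field element $i$, and "first bit" means the lowest-order bit. The 4-wise independent family is the degree-3 polynomial family over $\mathbb{F}_{2^\ell}$; the caller supplies an irreducible modulus, e.g. `0x11B` for $\ell = 8$. The names `gf2_mul`, `sign_vector` and `random_sign_vector` are hypothetical.

```python
import random
from typing import List, Sequence


def gf2_mul(u: int, v: int, ell: int, modulus: int) -> int:
    """Multiply in GF(2^ell); `modulus` is an irreducible degree-ell polynomial bit mask."""
    result = 0
    while v:
        if v & 1:
            result ^= u
        v >>= 1
        u <<= 1
        if u >> ell:
            u ^= modulus
    return result


def sign_vector(coeffs: Sequence[int], n: int, ell: int, modulus: int) -> List[int]:
    """sigma_i for i in [0, n): one bit of h(i) = c0 + c1 i + c2 i^2 + c3 i^3, mapped 0 -> -1."""
    assert n <= 2 ** ell, "need n distinct field elements"
    out = []
    for i in range(n):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule in GF(2^ell)
            acc = gf2_mul(acc, i, ell, modulus) ^ c
        out.append(1 if acc & 1 else -1)  # keep the lowest bit; 0 -> -1, 1 -> +1
    return out


def random_sign_vector(n: int, ell: int, modulus: int, rng: random.Random) -> List[int]:
    """Choose a random h (4 coefficients, 4*ell bits) and expand it to a +-1 vector."""
    coeffs = [rng.randrange(2 ** ell) for _ in range(4)]
    return sign_vector(coeffs, n, ell, modulus)
```

Only the four coefficients need to be stored; any $\sigma_i$ can be recomputed on demand. In $\mathbb{F}_4$ (modulus $x^2 + x + 1$), enumerating all $4^4 = 256$ polynomials shows each of the $16$ sign patterns on the $4$ field elements exactly $16$ times.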
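Basic AMS algorithm for $F_2$ • Basic AMS estimator with fully independent random vector: • Choose a random vector $\sigma \in \{-1,1\}^n$ • Initialize $Z \leftarrow 0$ • Until the end of the stream do • On arrival of element $(i, \Delta)$: $Z \leftarrow Z + \sigma_i \Delta$ • Basic AMS estimator with 4-wise independent random vector: • Choose a random $h \in \mathcal{H}$, which determines the vector $\sigma_h = (\sigma_h(1), \ldots, \sigma_h(n))$ • Initialize $Z \leftarrow 0$ • Until the end of the stream do • On arrival of element $(i, \Delta)$: $Z \leftarrow Z + \sigma_h(i) \Delta$ • $\sigma_h(i)$ can be evaluated in $\mathrm{poly}(\log n)$ time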
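The 4-wise independent variant can be sketched as a small streaming class. This is a minimal illustration, assuming $\ell = 8$ with the AES polynomial $x^8 + x^4 + x^3 + x + 1$, so it only handles indices $0 \le i < 256$; a larger $n$ needs a larger $\ell$ and a matching irreducible polynomial. The class name `AMSSketch` is hypothetical. Only the four coefficients of $h$ and the counter $Z$ are stored.

```python
import random

ELL = 8              # assumption: indices lie in [0, 2^ELL) = [0, 256)
AES_MODULUS = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)


def gf2_mul(u: int, v: int) -> int:
    """Multiply in GF(2^ELL) modulo AES_MODULUS."""
    result = 0
    while v:
        if v & 1:
            result ^= u
        v >>= 1
        u <<= 1
        if u >> ELL:
            u ^= AES_MODULUS
    return result


class AMSSketch:
    """One basic AMS counter Z = sum_i sigma_h(i) * x_i with 4-wise independent signs."""

    def __init__(self, rng: random.Random) -> None:
        # A random h from the degree-3 polynomial family: 4 * ELL bits, not n bits.
        self.coeffs = [rng.randrange(1 << ELL) for _ in range(4)]
        self.z = 0

    def sign(self, i: int) -> int:
        acc = 0
        for c in reversed(self.coeffs):  # Horner's rule in GF(2^ELL)
            acc = gf2_mul(acc, i) ^ c
        return 1 if acc & 1 else -1      # lowest bit of h(i); 0 -> -1

    def update(self, i: int, delta: int) -> None:
        assert 0 <= i < (1 << ELL)
        self.z += self.sign(i) * delta   # Z <- Z + sigma_h(i) * delta

    def estimate(self) -> int:
        return self.z * self.z           # Z^2 estimates F_2
```

Each update costs $O(\ell^2)$ bit operations here, in line with the $\mathrm{poly}(\log n)$ evaluation time. Deletions cancel insertions exactly, so the sketch works in the turnstile model.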
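Back to the AMS sketch • Generate $\sigma$ using a 4-wise independent family of hash functions from $[n]$ to $\{-1,1\}$ • Requires $O(\log n)$ space • Total space for the basic AMS sketch: $O(\log n)$ • Improve by the median of the means estimator: $k = O(1/\epsilon^2)$ and $t = O(\log(1/\delta))$, group means $Y_1, \ldots, Y_t$ • Output the median of $Y_1, \ldots, Y_t$ • Total space used: $O(\epsilon^{-2} \log(1/\delta) \log n)$ • (an $(\epsilon, \delta)$-approximation)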
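The median-of-means combination step can be sketched as below, assuming the $kt$ estimates come from independent copies of the basic sketch, each with its own hash function. The choice $k = \lceil 8/\epsilon^2 \rceil$ makes each group mean fail with probability at most $1/4$ by Chebyshev, given $\mathrm{Var}[Z^2] \le 2F_2^2$; $t$ is left as a parameter because only $t = O(\log(1/\delta))$ is specified. The names `k_for_eps` and `median_of_means` are hypothetical.

```python
import math
import statistics
from typing import Sequence


def k_for_eps(eps: float) -> int:
    """Group size k = ceil(8 / eps^2): each group mean fails w.p. <= 1/4 by Chebyshev."""
    return math.ceil(8 / eps ** 2)


def median_of_means(estimates: Sequence[float], k: int, t: int) -> float:
    """Split k*t independent estimates into t groups of k, average each group,
    and return the median of the t group means Y_1, ..., Y_t."""
    if len(estimates) != k * t:
        raise ValueError("need exactly k * t estimates")
    means = [sum(estimates[j * k:(j + 1) * k]) / k for j in range(t)]
    return statistics.median(means)
```

For example, with $k = 2$ and $t = 3$, the estimates `[1, 3, 10, 10, 100, 0]` give group means $2, 10, 50$, and the output is their median, $10$; the outlier group with mean $50$ does not affect the result.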
AMS sketch is linear • The algorithm maintains $Z = \sigma \cdot x$, a linear function of the frequency vector $x$ • Corollary: Given two streams $S_1$ and $S_2$, we can get the sketch of their concatenation from their sketches by adding them: $Z(S_1 S_2) = Z(S_1) + Z(S_2)$ (provided both sketches use the same $\sigma$) • Geometric interpretation of the AMS sketch: Similar to the Johnson–Lindenstrauss projection trick that preserves the length $\|x\|_2$ • Works in the turnstile model because of the linearity of the AMS sketch
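Linearity can be illustrated by viewing several AMS counters as a random sign matrix $A$ applied to $x$. This is a minimal sketch that, for brevity, stores fully random rows of $A$ instead of 4-wise independent hash functions; that swap, the 0-based indices, and the names `sketch`, `merge` and `estimate_f2` are assumptions. Merging two sketches is coordinate-wise addition, and $\frac{1}{k}\|Ax\|_2^2$ (the mean of the $Z_r^2$) is the averaged $F_2$ estimate, the Johnson–Lindenstrauss-style view.

```python
import random
from typing import List, Sequence, Tuple

Stream = Sequence[Tuple[int, int]]


def sketch(stream: Stream, signs: List[List[int]]) -> List[int]:
    """Compute A x for the stream's frequency vector x; row r of A is signs[r].

    Each coordinate Z_r = sum_i A[r][i] * x_i is one basic AMS counter.
    """
    z = [0] * len(signs)
    for i, delta in stream:
        for r, row in enumerate(signs):
            z[r] += row[i] * delta
    return z


def merge(z1: List[int], z2: List[int]) -> List[int]:
    """Sketch of the concatenated stream: add coordinate-wise (same A required)."""
    return [a + b for a, b in zip(z1, z2)]


def estimate_f2(z: List[int]) -> float:
    """Average of the Z_r^2, i.e. ||A x||^2 / k."""
    return sum(v * v for v in z) / len(z)
```

Usage: with `A = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]`, the sketches of two streams built with the same `A` merge by `merge(sketch(s1, A), sketch(s2, A))`, which equals `sketch(list(s1) + list(s2), A)`.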
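Other $F_p$'s • For $0 < p \le 2$, algorithms with $\mathrm{poly}(\epsilon^{-1} \log n)$ space [Indyk 2000] and later improvements (nearly tight) • For $p > 2$ the problem becomes hard: $\tilde{\Theta}(n^{1 - 2/p})$ space (nearly tight)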