Beyond Set Disjointness: The Communication Complexity of Finding the Intersection
Grigory Yaroslavtsev
http://grigory.us
Joint with Brody, Chakrabarti, Kondapally and Woodruff
Communication Complexity [Yao’79]
Shared randomness
Bob: Alice: …
• = min. communication (error )
• min. -round communication (error )
Set Intersection
= ?
(k-Intersection) = ?
This talk
Let
• (k-Intersection) = [Brody, Chakrabarti, Kondapally, Woodruff, Y.; PODC’14]
• (k-Intersection) = [Saglam-Tardos FOCS’13; Brody, Chakrabarti, Kondapally, Woodruff, Y.’13]
{ times
(k-Intersection) = for
k-Disjointness
• , iff
• [Razborov’92; Hastad-Wigderson’96]
• [Folklore + Dasgupta, Kumar, Sivakumar; Buhrman, Garcia-Soriano, Matsliah, De Wolf’12]
• [Saglam, Tardos’13]
• [Braverman, Garg, Pankratov, Weinstein’13]
Applications
• : exact Jaccard index (for -approximate use MinHash [Broder’98; Li-Konig’11; Pagh-Stöckel-Woodruff’14])
• Rarity, distinct elements, joins, …
• Multi-party set intersection (later)
• Contrast:
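The Jaccard application above can be illustrated with a minimal Python sketch. It assumes sets of non-negative integers and a simple linear hash family; the function names, the prime modulus and the number of hash functions are illustrative choices, not taken from the talk. The first function computes the exact index from the intersection size, the second estimates it MinHash-style.

```python
import random

PRIME = (1 << 61) - 1  # assumed modulus for the illustrative hash family


def exact_jaccard(s, t):
    """Exact Jaccard index |S ∩ T| / |S ∪ T|, computed from the intersection size."""
    inter = len(s & t)
    union = len(s) + len(t) - inter
    return inter / union if union else 1.0


def minhash_jaccard(s, t, num_hashes=256, seed=0):
    """MinHash-style estimate: fraction of shared hash functions whose
    minimum over S equals the minimum over T (both sets must be non-empty)."""
    rng = random.Random(seed)  # stands in for shared randomness
    matches = 0
    for _ in range(num_hashes):
        a, b = rng.randrange(1, PRIME), rng.randrange(PRIME)
        min_s = min((a * x + b) % PRIME for x in s)
        min_t = min((a * x + b) % PRIME for x in t)
        matches += (min_s == min_t)
    return matches / num_hashes
```

Once the intersection itself is known, the exact index needs only the two set sizes in addition; the MinHash estimate trades exactness for a fixed-size summary.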
Hashing
Expected # of elements = # of buckets
Secondary Hashing
where = # of hash functions
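The two hashing slides can be sketched roughly as follows. This is a hedged simulation, not the protocol from the talk: the number of buckets `k`, the fingerprint width, the linear hash family and all function names are assumptions made for illustration. Both parties use the same seeded generator as a stand-in for shared randomness; Bob's short fingerprints per bucket are what he would send, and Alice keeps the elements whose fingerprints match.

```python
import random

PRIME = (1 << 61) - 1  # assumed modulus for the illustrative hash family


def make_hash(rng, out_range):
    """Draw a random linear hash x -> ((a*x + b) mod PRIME) mod out_range."""
    a, b = rng.randrange(1, PRIME), rng.randrange(PRIME)
    return lambda x: ((a * x + b) % PRIME) % out_range


def bucketize(items, bucket_hash, num_buckets):
    """Split a set of non-negative ints into buckets by the shared hash."""
    buckets = [set() for _ in range(num_buckets)]
    for x in items:
        buckets[bucket_hash(x)].add(x)
    return buckets


def candidate_intersection(alice, bob, k, fingerprint_bits=16, seed=0):
    """Per bucket, return Alice's elements whose short fingerprint also
    appears among Bob's fingerprints for that bucket."""
    rng = random.Random(seed)  # stands in for shared randomness
    bucket_hash = make_hash(rng, k)
    fingerprint = make_hash(rng, 1 << fingerprint_bits)
    alice_buckets = bucketize(alice, bucket_hash, k)
    bob_buckets = bucketize(bob, bucket_hash, k)
    candidates = []
    for a_bucket, b_bucket in zip(alice_buckets, bob_buckets):
        bob_prints = {fingerprint(y) for y in b_bucket}  # Bob's message
        candidates.append({x for x in a_bucket if fingerprint(x) in bob_prints})
    return candidates
```

In this sketch a common element is never missed, because equal elements land in the same bucket with the same fingerprint; the only possible error is a false positive from a fingerprint collision, which is what a later verification step has to catch.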
2-Round -protocol
Total communication = = O()
Collisions
• Second round:
• For each bucket send -bit equality check (total -communication)
• Correct intersection computed in buckets where
• Expected # of items in incorrect buckets
• Use 1-round protocol for incorrect buckets
• Total communication
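The second round described above can be sketched as a per-bucket equality check followed by a repair step. The sketch below is an assumption-laden illustration: the polynomial fingerprint, its bit width, and the exact-intersection fallback (a stand-in for the 1-round protocol named on the slide) are not taken from the talk, and all function names are hypothetical.

```python
import random

PRIME = (1 << 61) - 1  # assumed modulus for the fingerprint


def set_fingerprint(items, point, bits):
    """Polynomial fingerprint of a set of non-negative ints, cut to `bits` bits."""
    acc = 0
    for x in sorted(items):
        acc = (acc * point + x + 1) % PRIME
    return acc % (1 << bits)


def check_buckets(alice_candidates, bob_candidates, bits=32, seed=0):
    """One flag per bucket: True when the two parties' fingerprints agree."""
    rng = random.Random(seed)  # stands in for shared randomness
    flags = []
    for a, b in zip(alice_candidates, bob_candidates):
        point = rng.randrange(1, PRIME)
        flags.append(set_fingerprint(a, point, bits) == set_fingerprint(b, point, bits))
    return flags


def repair(alice_buckets, bob_buckets, flags, candidates):
    """Keep candidates in buckets that passed; redo the others with an exact
    intersection (a stand-in for the fallback protocol)."""
    return [c if ok else (a & b)
            for a, b, ok, c in zip(alice_buckets, bob_buckets, flags, candidates)]
```

Equal candidate sets always pass this check; unequal sets pass only when their fingerprints collide, so the check has one-sided error.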
Main protocol
Expected # of elements = # of buckets
Verification tree
-degree …
buckets = leaves of the verification tree
Verification bottom-up
[Figure: verification tree checked bottom-up; equality checks EQ() mark nodes as Correct or Incorrect]
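A rough sketch of bottom-up verification over a tree whose leaves are buckets is given below. It is only an illustration under stated assumptions: a direct comparison of the two parties' answers stands in for the randomized equality check EQ(), `recompute` is a hypothetical callback that reruns a leaf and returns a correct answer, and the fixed `degree` and the restart rule are simplifications rather than the scheme analysed in the talk.

```python
def verify_bottom_up(alice_leaves, bob_leaves, degree, recompute):
    """Check blocks of leaves of growing size (1, degree, degree^2, ...).

    alice_leaves[i] and bob_leaves[i] hold each party's current answer for
    bucket i. Whenever a block disagrees, every leaf in it is recomputed.
    Returns the total number of leaf restarts.
    """
    n = len(alice_leaves)
    span = 1
    restarts = 0
    while True:
        for start in range(0, n, span):
            block = range(start, min(start + span, n))
            alice_view = [alice_leaves[i] for i in block]
            bob_view = [bob_leaves[i] for i in block]
            if alice_view != bob_view:  # stand-in for a failed EQ() check
                for i in block:
                    alice_leaves[i] = bob_leaves[i] = recompute(i)
                    restarts += 1
        if span >= n:
            break
        span *= degree
    return restarts
```

For example, with four buckets where only the third disagrees, the leaf-level pass restarts that single bucket and the higher levels then pass without further work.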
Analysis of Stage
• = [node at stage computed correctly]
• Set =
• Run equality checks and basic intersection protocols with success probability
• Key lemma: [# of restarts per leaf]
• Cost of Equality =
• Cost of Intersection in leaves =
• [protocol succeeds] =
Lower Bound
• (k-Intersection) = [Brody, Chakrabarti, Kondapally, Woodruff, Y.’13]
• iff, where
• = solving independent instances of
• reduces to k-Intersection:
• Given and
• Construct sets with elements and
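The reduction on this slide, from several independent equality instances to one set-intersection instance, can be sketched as below. The tagging of each input with its index is the natural construction and is assumed here, since the slide's own symbols did not survive; the function names are hypothetical.

```python
def equality_instances_to_sets(xs, ys):
    """Map inputs x_1..x_m and y_1..y_m to two sets of (index, value) pairs."""
    s = {(i, x) for i, x in enumerate(xs)}
    t = {(i, y) for i, y in enumerate(ys)}
    return s, t


def equalities_from_intersection(intersection, m):
    """Read off the answers: instance i is an equality iff some pair tagged
    with index i lies in the intersection."""
    hit = {i for i, _ in intersection}
    return [i in hit for i in range(m)]
```

Usage: with `xs = ["a", "b", "c"]` and `ys = ["a", "x", "c"]`, intersecting the two constructed sets and decoding yields `[True, False, True]`, so any protocol that finds the intersection also solves all the equality instances at once.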
Communication Direct Sums
“Solving m copies of a communication problem requires m times more communication”:
• For arbitrary [… Braverman, Rao’10; Barak, Braverman, Chen, Rao’11, …]
• In general, can’t go beyond
Specialized Communication Direct Sums
Information cost Communication complexity
• [Bar-Yossef, Jayram, Kumar, Sivakumar’01] Disjointness
• Stronger direct sum for bounded-round complexity of Equality-type problems (a.k.a. “union bound is optimal”) [Molinaro, Woodruff, Y.’13]
Extensions
• Multi-party: players, , where
• Boost error probability to
• Average per player (using coordinator): in rounds
• Worst-case per player (using a tournament) in rounds
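The tournament idea for the multi-party setting can be sketched as pairwise intersections arranged in a binary tree. This is an assumed reading of the slide, not the talk's exact scheme: `pairwise` defaults to a plain set intersection as a stand-in for a two-party protocol, and the function name is hypothetical.

```python
def tournament_intersection(sets, pairwise=lambda a, b: a & b):
    """Intersect the players' sets pairwise, level by level.

    Returns the common intersection and the number of tournament levels.
    An unpaired set at the end of a level advances unchanged.
    """
    level = list(sets)
    depth = 0
    while len(level) > 1:
        nxt = [pairwise(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
        depth += 1
    return level[0], depth
```

In this arrangement each set takes part in at most one pairwise run per level, and the number of levels grows logarithmically with the number of players.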
Open Problems
• (k-Intersection) =
• Better protocols for the multi-party setting