Beyond Set Disjointness: The Communication Complexity of Finding the Intersection. Grigory Yaroslavtsev, http://grigory.us. Joint with Brody, Chakrabarti, Kondapally and Woodruff.
Communication Complexity [Yao’79] Shared randomness Bob: Alice: … • = min. communication (error ) • min. -round communication (error )
Set Intersection = ? (k-Intersection) = ? k is big, n is huge, where huge ≫ big
Our results Let • (k-Intersection) = [Brody, Chakrabarti, Kondapally, Woodruff, Y.; PODC’14] • (k-Intersection) = [Saglam-Tardos FOCS’13; Brody, Chakrabarti, Kondapally, Woodruff, Y.; RANDOM’14] { times (k-Intersection) = for
Applications • Exact Jaccard index (for -approximate use MinHash [Broder’98; Li-König’11; Pagh-Stöckel-Woodruff’14]) • Rarity, distinct elements, joins, … • Multi-party set intersection (later) • Contrast:
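The first application can be made concrete: once the exact intersection size is known, the exact Jaccard index follows from the two set sizes alone. A minimal sketch; the function name is hypothetical, and returning 1 for two empty sets is an assumed convention, not something stated on the slide.

```python
from fractions import Fraction


def jaccard_from_intersection(size_x: int, size_y: int, size_common: int) -> Fraction:
    """Exact Jaccard index |X ∩ Y| / |X ∪ Y| from the three sizes."""
    union = size_x + size_y - size_common  # inclusion-exclusion
    if union == 0:
        return Fraction(1)  # assumed convention for two empty sets
    return Fraction(size_common, union)


x = {1, 2, 3, 4}
y = {3, 4, 5}
j = jaccard_from_intersection(len(x), len(y), len(x & y))
print(j)  # 2/5
```

Using Fraction keeps the result exact, which is the point of contrast with an approximate MinHash estimate.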
Hashing Expected # of elements = # of buckets
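The hashing step can be sketched as follows: both parties use shared randomness to split their sets into the same buckets, so the intersection can be found bucket by bucket. This is an illustration only; the seed, the use of BLAKE2b as the shared hash function, and the choice of 30 buckets are assumptions, not parameters from the talk.

```python
import hashlib


def bucket_of(x: int, num_buckets: int, seed: int) -> int:
    """Shared hash: both parties get the same bucket for the same element and seed."""
    data = f"{seed}|{x}".encode()
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_buckets


def bucketize(items, num_buckets: int, seed: int):
    """Split a set into num_buckets disjoint buckets."""
    buckets = [set() for _ in range(num_buckets)]
    for x in items:
        buckets[bucket_of(x, num_buckets, seed)].add(x)
    return buckets


alice = set(range(0, 60, 2))
bob = set(range(0, 60, 3))
num_buckets = 30  # assumption: roughly one bucket per element of a set

a_buckets = bucketize(alice, num_buckets, seed=7)
b_buckets = bucketize(bob, num_buckets, seed=7)

# A common element lands in the same bucket on both sides, so the full
# intersection is the union of the per-bucket intersections.
per_bucket = set().union(*(a & b for a, b in zip(a_buckets, b_buckets)))
print(per_bucket == alice & bob)  # True
```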
Secondary Hashing where = # of hash functions
2-Round -protocol Total communication = = O()
Collisions Key fact: If then also =
Collisions • Second round: • For each bucket send -bit equality check (total -communication) • Correct intersection computed in buckets where • Expected # items in incorrect buckets • Use 1-round protocol for incorrect buckets • Total communication
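The two rounds described above can be simulated in a single process. The sketch below is a simplified model, not the protocol with the talk's parameters: the hash lengths (short_bits, eq_bits), the BLAKE2b-based shared hash, and all function names are assumptions, and the exact set intersection in the fallback branch merely stands in for the 1-round protocol. It relies on the key fact that if both parties' candidate sets for a bucket are equal, they equal the true intersection of that bucket.

```python
import hashlib


def shared_hash(seed: int, tag: str, payload: str, bits: int) -> int:
    """Hash derived from shared randomness (the seed), truncated to `bits` bits."""
    data = f"{seed}|{tag}|{payload}".encode()
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def candidates(mine, their_short, seed, tag, bits):
    """Keep my elements whose short hash also appears on the other side."""
    return {x for x in mine if shared_hash(seed, tag, str(x), bits) in their_short}


def two_round_sketch(alice, bob, num_buckets, short_bits=8, eq_bits=32, seed=0):
    result, fallback_buckets = set(), 0
    for b in range(num_buckets):
        a_items = {x for x in alice
                   if shared_hash(seed, "bucket", str(x), 32) % num_buckets == b}
        b_items = {y for y in bob
                   if shared_hash(seed, "bucket", str(y), 32) % num_buckets == b}
        tag = f"short-{b}"
        # Round 1: exchange short hashes of the bucket contents.
        a_short = {shared_hash(seed, tag, str(x), short_bits) for x in a_items}
        b_short = {shared_hash(seed, tag, str(y), short_bits) for y in b_items}
        a_cand = candidates(a_items, b_short, seed, tag, short_bits)
        b_cand = candidates(b_items, a_short, seed, tag, short_bits)
        # Round 2: equality check on the two candidate sets.
        a_fp = shared_hash(seed, f"eq-{b}", str(sorted(a_cand)), eq_bits)
        b_fp = shared_hash(seed, f"eq-{b}", str(sorted(b_cand)), eq_bits)
        if a_fp == b_fp:
            result |= a_cand  # equal candidate sets imply a correct bucket
        else:
            fallback_buckets += 1
            result |= a_items & b_items  # stand-in for the 1-round protocol
    return result, fallback_buckets


alice = set(range(0, 200, 2))
bob = set(range(0, 200, 3))
found, fallbacks = two_round_sketch(alice, bob, num_buckets=100, eq_bits=64)
print(found == alice & bob, fallbacks)
```

The output is always a superset of the true intersection; it can contain extra elements only if an equality fingerprint collides, which is why the equality check needs more bits than the per-element hashes.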
Main protocol Expected # of elements = # of buckets
Verification tree -degree … buckets = leaves of the verification tree
Verification bottom-up Incorrect Incorrect Correct EQUALITY CHECK Incorrect Correct
Verification bottom-up EQUALITY CHECK FAILS => RESTART THE SUBTREE Incorrect Correct Correct Correct Incorrect Correct
Verification bottom-up … … …
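The bottom-up verification with subtree restarts can be illustrated with a toy model. Everything quantitative here is an assumption, since the tree degree and success probabilities are not recoverable from the slide text: leaves hold integer "answers", a leaf solver is right with probability success_prob, and an equality check on an incorrect subtree wrongly passes with probability false_pass divided by the subtree size. All names are hypothetical.

```python
import random


def make_solver(truth, success_prob, rng):
    """Toy leaf protocol: returns the right answer with probability success_prob."""
    def solve(i):
        return truth[i] if rng.random() < success_prob else truth[i] + 1
    return solve


def verify(lo, hi, truth, answers, solve, degree, false_pass, rng, stats):
    """Verify leaves lo..hi-1 bottom-up; restart the whole subtree on a failed check."""
    while True:
        if hi - lo > 1:
            step = -(-(hi - lo) // degree)  # ceiling division: child subtree size
            for start in range(lo, hi, step):
                verify(start, min(start + step, hi), truth, answers,
                       solve, degree, false_pass, rng, stats)
        stats["checks"] += 1
        correct = answers[lo:hi] == truth[lo:hi]
        # Toy equality check: never rejects a correct subtree, may accept a wrong one.
        if correct or rng.random() < false_pass / (hi - lo):
            return
        stats["restarts"] += 1
        for i in range(lo, hi):  # restart: rerun every leaf in this subtree
            answers[i] = solve(i)


rng = random.Random(1)
truth = list(range(100, 116))  # 16 leaves
solve = make_solver(truth, success_prob=0.8, rng=rng)
answers = [solve(i) for i in range(len(truth))]
stats = {"checks": 0, "restarts": 0}
verify(0, len(truth), truth, answers, solve,
       degree=4, false_pass=0.2, rng=rng, stats=stats)
print(stats, answers == truth)
```

With false_pass set to 0 the checks are exact and the final answers are always correct; with a positive value, errors that slip past a leaf check can still be caught by the checks higher in the tree.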
Analysis of Stage • = [node at stage computed correctly] • Set = • Run equality checks and basic intersection protocols with success probability • Key lemma: [# of restarts per leaf => Cost of Intersection in leaves = • Cost of Equality = • [protocol succeeds] =
Multi-party extensions players: , where • Boost error probability of 2-player protocol to • Average per player (using coordinator): in rounds • Worst-case per player (using a tournament) in rounds
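The tournament idea can be sketched as repeated pairwise intersection: in each round the remaining players pair up, each pair computes the intersection of its two sets, and one player of the pair carries it forward. In this sketch the Python `&` operator stands in for a run of the 2-player protocol, and the function name and the handling of an unpaired player are assumptions.

```python
def tournament_intersection(sets):
    """Intersect all sets via pairwise rounds; returns (intersection, rounds)."""
    if not sets:
        raise ValueError("need at least one player")
    current = [set(s) for s in sets]
    rounds = 0
    while len(current) > 1:
        nxt = []
        for i in range(0, len(current) - 1, 2):
            # Stand-in for one run of the 2-player intersection protocol.
            nxt.append(current[i] & current[i + 1])
        if len(current) % 2 == 1:
            nxt.append(current[-1])  # unpaired player advances unchanged
        current = nxt
        rounds += 1
    return current[0], rounds


players = [{1, 2, 3, 4, 5}, {2, 3, 4, 5, 6}, {3, 4, 5, 7}, {0, 3, 4, 5}, {3, 5, 9}]
common, rounds = tournament_intersection(players)
print(common, rounds)  # {3, 5} 3
```

Each player takes part in at most one 2-player run per tournament round, and the number of tournament rounds is logarithmic in the number of players.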
Open Problems • (k-Intersection) = ? • Better protocols for the multi-party setting?
k-Disjointness • , iff • [Razborov’92; Hastad-Wigderson’96] [Folklore + Dasgupta, Kumar, Sivakumar; Buhrman, Garcia-Soriano, Matsliah, De Wolf’12] • [Saglam, Tardos’13] • [Braverman, Garg, Pankratov, Weinstein’13]