Optimality of Randomized Algorithms for the Intersection Problem Presenters : 李宜益 范家豪 王紹中 Advisor : 呂學一
Outline • Introduction • Definitions • Randomized Algorithm • Randomized Complexity Lower Bound
Introduction • Conjunctive query • Comparison model • Redundancy analysis • More natural assumptions • More precise
Definition (1) • Signature: (k, n1,…,nk) is the signature of an instance, where k is the number of arrays and n1,…,nk are the sizes of the k sorted arrays A1,…,Ak, whose elements come from a totally ordered space U.
Example • This instance has signature (7,1,4,4,4,4,4,4) • A = 9 • B = 1 2 9 11 • C = 3 9 12 13 • D = 9 14 15 16 • E = 4 10 17 18 • F = 5 6 7 10 • G = 8 10 19 20
Definition (2) • Intersection: the intersection of an instance is the set A1 ∩ … ∩ Ak composed of the elements that are present in all k distinct arrays
Example • This instance has signature (7,1,4,4,4,4,4,4) • Its intersection is {9} • A = 9 • B = 1 2 9 11 • C = 3 9 12 13 • D = 9 14 15 16 • E = 9 10 17 18 • F = 5 6 9 10 • G = 9 10 19 20
Definition (3) • Partition-Certificate: a partition-certificate is a partition (Ij)j≤δ of U into intervals (δ is the minimal number of intervals in such a partition of the instance) such that any singleton {x} corresponds to an element x of ∩i Ai, and each other interval I has an empty intersection I ∩ Ai with at least one array Ai
Example • This instance has signature (7,1,4,4,4,4,4,4) • δ = 3; partition-certificate: (−∞, 9), [9, 10), [10, +∞) • A = 9 • B = 1 2 9 11 • C = 3 9 12 13 • D = 9 14 15 16 • E = 4 10 17 18 • F = 5 6 7 10 • G = 8 10 19 20
Difficulty of Partition-Certificate • For each singleton {x} of the partition, any algorithm must find the position of x in all arrays Ai, which takes k searches
[Figure: the arrays A to G of the example]
Difficulty of Partition-Certificate • For each interval Ij of the partition, any algorithm must find an array, or a set of arrays, such that the intersection of Ij with this array, or with the intersection of those arrays, is empty
[Figure: the arrays A to G of the example]
Definition (4) • Redundancy: let A1,…,Ak be k sorted arrays, and let (Ij)j≤δ be a partition-certificate for this instance • The redundancy ρ(I) of an interval or singleton I is defined as equal to 1 if I is a singleton, and equal to 1/#{i, Ai ∩ I = ∅} otherwise.
Redundancy • The redundancy ρ((Ij)j≤δ) of a partition-certificate (Ij)j≤δ is the sum Σj ρ(Ij) of the redundancies of the intervals composing it. • The redundancy ρ((Ai)i≤k) of an instance of the intersection problem is the minimal redundancy of a partition-certificate of the instance, min{ρ((Ij)j≤δ) : (Ij)j≤δ is a partition-certificate}
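The redundancy of one given partition-certificate can be computed directly from the definition. The sketch below is only an illustration: the function names and the encoding of the pieces are assumptions, non-singleton pieces are taken to be half-open intervals [lo, hi), and the partition of the earlier example is read as (−∞, 9), [9, 10), [10, +∞). It evaluates a single certificate and does not search for the minimum over all certificates.

```python
from bisect import bisect_left
from fractions import Fraction
import math


def interval_is_empty(arr, lo, hi):
    """Return True if the sorted array has no element x with lo <= x < hi."""
    i = bisect_left(arr, lo)          # first position holding a value >= lo
    return i == len(arr) or arr[i] >= hi


def certificate_redundancy(arrays, pieces):
    """Sum the redundancies of the pieces of one partition-certificate.

    pieces: list of ("singleton", x) or ("interval", lo, hi) with [lo, hi).
    """
    total = Fraction(0)
    for piece in pieces:
        if piece[0] == "singleton":
            total += 1                # a singleton has redundancy 1
        else:
            _, lo, hi = piece
            empty = sum(interval_is_empty(a, lo, hi) for a in arrays)
            if empty == 0:
                raise ValueError("interval meets every array: not a certificate")
            total += Fraction(1, empty)   # 1 / #{i : A_i ∩ I = ∅}
    return total


# Arrays A..G of the example (the instance whose intersection is empty).
ARRAYS = [[9], [1, 2, 9, 11], [3, 9, 12, 13], [9, 14, 15, 16],
          [4, 10, 17, 18], [5, 6, 7, 10], [8, 10, 19, 20]]
CERTIFICATE = [("interval", -math.inf, 9),
               ("interval", 9, 10),
               ("interval", 10, math.inf)]
print(certificate_redundancy(ARRAYS, CERTIFICATE))   # 1/2 + 1/3 + 1 = 11/6
```

Here (−∞, 9) misses A and D, [9, 10) misses E, F and G, and [10, +∞) misses only A, which gives 1/2 + 1/3 + 1.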
Unbounded search • Looks for an element x in a sorted array A of unknown size • Starts at position init • Returns the value p such that A[p−1] < x ≤ A[p], the insertion point of x in A.
Unbounded search • It can be implemented using: • Doubling search • Binary search • Complexity: about 2 log2(p − init) comparisons
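A minimal sketch of unbounded search as doubling search followed by binary search. The function name, the 0-based indexing, and the convention of returning len(A) when every remaining element is smaller than x are assumptions made for the illustration; the caller is also assumed to guarantee that all elements before init are smaller than x.

```python
def unbounded_search(x, A, init=0):
    """Return the smallest p >= init with A[p] >= x (len(A) if there is none).

    Assumes A is sorted and every element before position init is < x.
    """
    n = len(A)
    lo = init      # every position before lo is known to hold a value < x
    hi = init      # next position probed by the doubling phase
    step = 1
    # Doubling phase: probe init, init+1, init+2, init+4, init+8, ...
    while hi < n and A[hi] < x:
        lo = hi + 1
        hi = init + step
        step *= 2
    hi = min(hi, n)
    # Binary search phase: the insertion point now lies in [lo, hi].
    while lo < hi:
        mid = (lo + hi) // 2
        if A[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo


B = [1, 2, 9, 11]
print(unbounded_search(9, B))      # 2, since B[1] < 9 <= B[2]
print(unbounded_search(10, B))     # 3, since B[2] < 10 <= B[3]
```

Both phases use a number of comparisons logarithmic in the distance between init and the returned position, which is what makes the search cheap when the answer is close to the starting point.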
The algorithm
for all i do pi ← 1 end for
I ← ∅; s ← 1
repeat
  m ← As[ps]
  #NO ← 0; #YES ← 1
  while #YES < k and #NO = 0 do
    Let As be a random array s.t. As[ps] ≠ m
    ps ← Unbounded Search(m, As, ps)
    if As[ps] ≠ m then #NO ← 1 else #YES ← #YES + 1 end if
  end while
  if #YES = k then I ← I ∪ {m} end if
  for all i such that Ai[pi] = m do pi ← pi + 1 end for
until m = +∞
return I
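The pseudocode can be turned into a short runnable sketch. Several choices below are assumptions rather than part of the slides: indices are 0-based, each array is sorted without duplicates, an exhausted array reads as +∞, and `bisect_left` with a lower bound stands in for Unbounded Search (same result, different comparison count). Instead of a #YES counter, the sketch checks whether any array still has a current element different from m, which also covers the case where an array already points at m before being searched.

```python
import random
from bisect import bisect_left

INF = float("inf")


def rand_intersection(arrays, rng=random):
    """Intersect k >= 1 sorted, duplicate-free arrays; returns a sorted list."""
    k = len(arrays)
    pos = [0] * k                     # pos[i] plays the role of p_i

    def current(i):
        # A_i[p_i], read as +infinity once array i is exhausted
        return arrays[i][pos[i]] if pos[i] < len(arrays[i]) else INF

    result = []
    s = 0
    while True:
        m = current(s)
        if m == INF:                  # "until m = +infinity"
            break
        in_all = True                 # becomes False on the first NO answer
        while True:
            # Arrays whose current element is not m still have to be searched.
            candidates = [i for i in range(k) if current(i) != m]
            if not candidates:
                break                 # every array answered YES
            s = rng.choice(candidates)
            # Stand-in for Unbounded Search(m, A_s, p_s).
            pos[s] = bisect_left(arrays[s], m, pos[s])
            if current(s) != m:
                in_all = False
                break
        if in_all:
            result.append(m)
        for i in range(k):            # skip m in every array that holds it
            if current(i) == m:
                pos[i] += 1
    return result


with_nine = [[9], [1, 2, 9, 11], [3, 9, 12, 13], [9, 14, 15, 16],
             [9, 10, 17, 18], [5, 6, 9, 10], [9, 10, 19, 20]]
print(rand_intersection(with_nine))   # [9]
```

After a NO answer, the array that produced it supplies the next candidate m, so m strictly increases from one round to the next and the loop terminates.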
Theorem 1 • Thm 1: Algorithm rand intersection performs on average O(ρΣlog(ni/ρ)) comparisons on an instance of signature (k, n1,…,nk) and of redundancy ρ
Proof (1) • … : #(binary searches) during phase i in array Aj • … : #(binary searches) in array Aj over the whole execution • … = 1 if Ii is a singleton, #{i, Ai ∩ Ii = ∅} otherwise • If m … I then … = 1 • If m … I then … is a random variable
Randomized Complexity Lower Bound • Yao’s Minimax Principle
Lemma 1 • For any k ≥ 2 and 0 < n1 ≤ … ≤ nk, there is a distribution on instances of the Intersection problem with signature at most (k,n1,…,nk), and redundancy at most 4, such that any deterministic algorithm performs at least […] comparisons on average
At most ρ = 4, and output size at most 1 • [Figure: the instances P and N over the arrays A1, …, Aw, …, Ak]
Input distribution • Let Fi = log2(2ni + 1) • F = Σi=2..k Fi • ∀i, each pi is chosen uniformly at random in {1,…,ni}. • An index w ∈ {2,…,k} is chosen, equal to i with probability Fi/F.
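The random parameters of this distribution can be sampled in a few lines. The sketch below rests on assumptions that the slide does not spell out: F is taken to be the sum of the Fi for i = 2..k, w = i is drawn with probability Fi/F, and positions are drawn for i = 2..k. The function name is hypothetical, and the construction of the instances P and N from (p, w) is not covered.

```python
import math
import random


def sample_parameters(sizes, rng=random):
    """sizes = [n1, ..., nk]. Returns (p, w, probs) with 1-based array indices.

    p[i] is uniform in {1, ..., n_i}; w is drawn in {2, ..., k} with
    probability F_i / F, where F_i = log2(2 n_i + 1) (assumed reading).
    """
    k = len(sizes)
    indices = list(range(2, k + 1))
    F_i = {i: math.log2(2 * sizes[i - 1] + 1) for i in indices}
    F = sum(F_i.values())
    p = {i: rng.randint(1, sizes[i - 1]) for i in indices}
    w = rng.choices(indices, weights=[F_i[i] for i in indices], k=1)[0]
    probs = {i: F_i[i] / F for i in indices}
    return p, w, probs


p, w, probs = sample_parameters([1, 4, 4, 4], random.Random(0))
print(p, w, probs)
```

Larger arrays get a larger weight Fi, so the special index w falls more often on the arrays where locating a position costs more comparisons.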
Case P: Aw[pw] = A1[1] • The redundancy of such instances is no more than 4.
Case N: Aw[pw] > A1[1] • The redundancy of such instances is no more than 4.
x-comps vs. comps between any element & x • Any algorithm performing C comparisons between arbitrary elements can be expressed as an algorithm performing no more than 2C x-comparisons.
x-comps vs. comps between any element & x • Any lower bound L on the complexity of algorithms using only x-comparisons is an L/2 lower bound on the complexity of algorithms using comparisons between arbitrary elements
Random variable Xi • Let Xi be the number of x-comparisons performed by the algorithm in array Ai, on either P or N
Random variable Yi • Let Yi be the number of x-comparisons performed by the algorithm in array Ai on N
ζi • Let ζi be the indicator variable which equals 1 exactly if pi has been determined by the algorithm on instance P.
C = Σi Xi • C ≥ Σi Yiζi • E(Yiζi) = Σh≥1 Pr{Yiζi ≥ h} • E(Yiζi) ≥ Σh=1..Fi Pr{Yiζi ≥ h}
Pr{a ∨ b} ≤ Pr{a} + Pr{b} • Pr{Yiζi ≥ h} = Pr{Yi ≥ h ∧ ζi = 1} = 1 − Pr{Yi < h ∨ ζi = 0} ≥ 1 − Pr{Yi < h} − Pr{ζi = 0} = Pr{ζi = 1} − Pr{Yi < h}
Pr{Yi < h} ≤ 2^h / (2ni + 1) = 2^(h − Fi) • Pr{Yi < h} = Searchi(h) / Presenti, where Searchi(h) is the number of positions that can be investigated with h binary-search comparisons and Presenti is the number of positions at which x could lie relative to the array Ai
Searchi(h) • Searchi(h) ≤ 2^h
Presenti • Presenti = 2ni + 1
E(C) • E(C) ≥ Σi E(Yiζi) ≥ Σi [Fi Pr{ζi = 1} − 2(1 − 2^(−Fi))] ≥ Σi Fi Pr{ζi = 1} + 2 Σi 2^(−Fi) − 2(k − 2)
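The term 2(1 − 2^(−Fi)) comes from a geometric sum. As a worked step, assuming the tail sum is truncated at h = Fi, that Fi is treated as an integer, and that the bound Pr{Yi < h} ≤ 2^(h − Fi) is used (since 2^Fi = 2ni + 1):

```latex
\begin{align*}
\mathbb{E}(Y_i\zeta_i)
  &\ge \sum_{h=1}^{F_i} \Pr\{Y_i\zeta_i \ge h\}
   \ge \sum_{h=1}^{F_i} \bigl(\Pr\{\zeta_i = 1\} - \Pr\{Y_i < h\}\bigr) \\
  &\ge F_i \Pr\{\zeta_i = 1\} - \sum_{h=1}^{F_i} 2^{\,h - F_i}
   = F_i \Pr\{\zeta_i = 1\} - 2^{-F_i}\bigl(2^{F_i + 1} - 2\bigr) \\
  &= F_i \Pr\{\zeta_i = 1\} - 2\bigl(1 - 2^{-F_i}\bigr).
\end{align*}
```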
Pr{ζi = 1 | p} = Σj : j ≽ i Fj/F • Let's fix p = (p2,…,pk). There are only k − 1 possible choices for w. Algorithm A can only differentiate between P and N when it finds w. Let ≼ denote the order in which these instances are dealt with by A for p fixed. Then ζi = 1 iff i ≼ w.
Pr{ζi = 1} • Pr{ζi = 1} = Σp Pr{ζi = 1 | p} Pr{p} = Σp Σj : j ≽ i (Fj/F) Pr{p} • Σi Fi Pr{ζi = 1} = Σp Σi Σj : j ≽ i (FiFj/F) Pr{p} = (1/F) Σp Pr{p} Σj ≽ i FiFj
(Σi Fi)² • (Σi Fi)² = 2 Σj ≽ i FiFj − Σi Fi², • Σj ≽ i FiFj = ((Σi Fi)² + Σi Fi²) / 2
E(C) • Lemma 1: proved
Lemma 2 • For any k ≥ 2, 0 < n1 ≤ … ≤ nk and ρ ∈ {4,…,4n1}, there is a distribution on instances of the Intersection problem of signature at most (k,n1,…,nk), and redundancy at most ρ, such that any deterministic algorithm performs on average Ω(ρ Σi log(ni/ρ)) comparisons.
p sub-instances • For p = ⌊ρ/4⌋ • p sub-instances (Pj, Nj), j ∈ {1,…,p}, of signature (k, ⌊n1/p⌋,…,⌊nk/p⌋), from the distribution of Lemma 1 • Since ρ ≤ 4n1, p ≤ n1 and ⌊ni/p⌋ > 0 • All the arrays have positive size
Random choosing • Let’s choose uniformly at random each sub-instance Ij between the positive sub-instance Pj and the negative sub-instance Nj. • They form a larger instance I by unifying the arrays of same index from each sub-instance. • The elements from two different sub-instances never interleave.
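The unification step can be sketched as follows. This is an illustration under stated assumptions: elements are integers, every sub-instance has the same number k of arrays, and the helper name `unite` is hypothetical. Shifting each sub-instance into its own value range is one simple way to guarantee that elements of different sub-instances never interleave; the slides do not say how this is achieved.

```python
def unite(sub_instances):
    """Concatenate arrays of the same index from each sub-instance.

    Each sub-instance is a list of k sorted integer arrays. Sub-instance j
    is shifted so that all its values lie above those of sub-instance j-1.
    """
    k = len(sub_instances[0])
    united = [[] for _ in range(k)]
    offset = 0                         # smallest value still unused
    for inst in sub_instances:
        values = [x for arr in inst for x in arr]
        lo, hi = min(values), max(values)
        shift = offset - lo            # move the sub-instance to start at offset
        for i, arr in enumerate(inst):
            united[i].extend(x + shift for x in arr)
        offset += hi - lo + 1          # next sub-instance starts strictly above
    return united


positive = [[1], [1, 2]]               # intersection {1}
negative = [[5], [3, 7]]               # empty intersection
print(unite([positive, negative]))     # [[0, 4], [0, 1, 2, 6]]
```

Because the value ranges are disjoint, the intersection of the united instance is the union of the (shifted) intersections of the sub-instances, which is why solving the large instance solves every sub-instance.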
p elementary instances unified to form a single large instance
United Instances • Redundancy at most 4p ≤ ρ • Signature at most (k,n1,…,nk) • Solving this instance requires solving all p sub-instances. • From Lemma 1
p sub problems • A lower bound of p times the bound of Lemma 1 on each sub-instance • which is Ω(ρ Σi log(ni/ρ))