510 likes | 653 Views
A Randomized Linear-Time Algorithm to Find Minimum Spaning Trees. 黃則翰 R96922141 蘇承祖 R96922077 張紘睿 R96922136 許智程 D95922022 戴于晉 R96922171. David R. Karger Philip N. Klein Robert E. Tarjan. Outline. Introduction Basic Property & Definition Algorithm Analysis. Outline. Introduction
E N D
A Randomized Linear-Time Algorithm to Find Minimum Spaning Trees 黃則翰 R96922141蘇承祖 R96922077張紘睿 R96922136許智程 D95922022戴于晉 R96922171 David R. Karger Philip N. KleinRobert E. Tarjan
Outline • Introduction • Basic Property & Definition • Algorithm • Analysis
Outline • Introduction • Basic Property & Definition • Algorithm • Analysis
Introduction • [Borůvka 1962] O(m log n) • Gabow et al.[1984] O(m log β(m,n) ) • β(m,n)= min { i |log(i)n <= m/n} • Verification algorithm • King[1993] O(m) • A randomize algorithm runs in O(m) time with high probability
Outline • Introduction • Basic Property & Definition • Algorithm • Analysis
Cycle property • For any cycle C in a graph, the heaviest edge in C dose not appear in the minimum spanning forest. 3 3 2 2 5 5 6 6
Cut Property • For any proper nonempty subset X of the vertices, the lightest edge with exactly one endpoint in X belongs to the minimum spanning tree X 5 2 3 6
Definition • Let G be a graph with weighted edges. • w(x,y) • The weight of edge {x,y} • If F is a forest of a subgraph in G • F(x, y) • the path (if any) connecting x and y in F • wF(x, y) • the maximum weight of an edge on F(x, y) • wF(x, y)=∞ • If x and y are not connected in F
F-heavy & F-light • An edge {x,y} is F-heavy if w(x,y) > wF(x,y) and F-light otherwise • Edge of F are all F-light 3 3 E A C G W(C,D)=5WF(C,D)=5F-light 2 2 5 5 D H B F 6 6 W(B,D)=6WF(B,D)=max{2,3,5}F-heavy W(F,H)=6WF(F,H)= ∞ F-light
Observation • No F-heavy edge can be in the minimum spanning forest of G (cycle property) • Discard edge that cannot be in the minimum spanning tree • F-light edge can be the candidate edge for the minimum spanning tree of G
Outline • Introduction • Basic Property & Definition • Algorithm • Analysis
Boruvka Algorithm • For each vertex, select the minimum-weight edge incident to the vertex. • Replace by a single vertex each connected component defined by the selected edges. • Delete all resulting isolated vertices, loops, and all but the lowest-weight edge among each set of multiple edges.
Algorithm Step1 • Apply two successive Boruvka steps to the graph, thereby reducing the number of vertices by at least a factor of four.
Algorithm Step2 • Choose a subgraph H by selecting each edge independently with probability ½. • Apply the algorithm recursively to H, producing a minimum spanning forest F of H. • Find all the F-heavy edges and delete them from the contracted graph.
Algorithm Step3 • Apply the algorithm recursively to the remaining graph to compute a spanning forest F’. Return those edges contracted in Step1 together with the edges of F’.
Original Problem G Boruvka × 2 G* F’ Sample with p=0.5 Left Sub-problem H G’ Right Sub-problem Return minimum forest F of H Delete F-heavy edges from G*
Correctness • By the cut property, every edge contracted during Step1 is in the MSF. • By the cycle property, the edges deleted in Step2 do NOT belong to the MSF. • By the induction hypothesis, the MSF of the remaining graph is correctly determined in the recursive call of Step3.
Candidate Edge of MST • The expected number of F-light edges in G is at most n/p (negative binomial) • For every sample graph H, the expected candidate edge for MST in G is at most n/p (F-light edge)
Random-sampling • To help discard some edge that cannot be in the minimum spanning tree • Construct the sample graph H • Process the edges in increasing order • To process an edge e • 1. Test whether both endpoints of e in same component • 2. Include the edge in H with probability p • 3. If e is in H and is F-light, add e to the Forest F
Random-sampling W(E,G)=14WF(E,G)=max{5,6,9,13}F-heavy W(E,F)=11WF(E,F)=max{5,6,9}F-heavy G H 4 4 14 14 10 10 5 5 7 C E C E 13 13 3 3 7 B 6 6 A G A G 11 11 B W(D,F)=9WF(D,F)=9F-light F F F D D 9 9 W(A,B)=7WF(A,B)=∞F-light
Random-sampling • Increasing Order • If F-light • Throw • If • Select • Else • Throw • Don’t select G 5 C E 4 14 6 10 A G 11 13 3 F D 9 F 7 B Random select edges to H Find F of H G 5 C E 4 14 6 10 A G 11 13 3 F D 9 7 B
Observation • No F-heavy edge can be in the minimum spanning forest of G (cycle property) • F-light edge can be the candidate edge for the minimum spanning tree of G • The forest F produced is the forest that would be produced by Kruskal and inlcude all possible MSF of G
Observation • The size of F is at most n-1 • The expected number of F-light edges in G is at most n/p (negative binomial) Mean k = Expected n =
Outline • Introduction • Basic Property & Definition • Algorithm • Analysis
Analysis of the Algorithm • The worst case. • The expectations running time. • The probability of the expectations running time.
Running time Analysis • Total running time= running time in each steps. • Step(1): 2 steps Boruvka’s algorithm • Step(2):Dixon-Rauch-Tarjan verification algorithm. • All takes linear time to the number of edges. • Estimate the total number of edges.
Observe the recursion tree • G=(V,E) |V| = n, |E|=m . • m≧n/2 since there is no isolate vertices. • Each problem generates at most 2 subproblems. • At depth d, there is at most 2d nodes. • Each node in depth d has at most n/4d vertices. • The depth d is at most log4n. • There are at most vertices in all subproblems
The worst case • Theorem 4.1 The worst-case running time of the minimum-spanning-forest algorithm is O(min{n2,m log n}), the same as the bound for Boruvka’s algorithm. • Proof: There is two different estimate ways. • A subproblem at depth contains at most (n/4d)2/2 edges. • Total edges in all subproblems is:
The worst case 2. Consider a subprolbem G=(V,E) after step(1), we have a G’=(V’ ,E’),|E’|≦|E| - |V|/2, |V’| ≦|V|/4 Edges in left-child = |H| Edges in right-child ≦ |E’| - |H| + |F| so edges in two subproblem is less then: (|H|) + (|E’| - |H| + |F|) =|E’| +|F|≦|E|-|V|/2 + |V|/4≦|E| The two sub problem at most contains |E| edges.
The worst case m edges
The worst case • The depth is at most log4n and each level has at most m edges, so there are at most (m log n) edges. • The worst-case running time of the minimum-spanning-forest algorithm is O(min{n2,m log n}).
Analysis of the Algorithm • The worst case. • The expectations running time. • The probability of the expectations running time.
Analysis – Average Case (1/8) • Theorem: the expected running time of the minimum spanning forest algorithm is O(m) • Calculating the expected total number of edges for all left path problems Original Problem Right Sub-problem Left Sub-problem Left Subsub-problem Right Subsub-problem
Analysis – Average Case (2/8) • Calculating the expected total edge number for one left path started at one problem with m’ edges • Evaluating the total edge number for all right sub-problems # of edges = m’ Expected total edge number ≤2m’
Analysis – Average Case (3/8) • Calculating the expected total edge number for one left path started at one problem with m’ edges 1. E[edge number of H] = 0.5 × edge number of G* Original Problem G 2. ∵ Boruvka × 2 ∴ edge number of G* ≤ edge number of G Boruvka × 2 G* Sample with p=0.5 E[edge number of H] ≤ 0.5 × edge number of G H G’ Right Sub-problem Left Sub-problem
Analysis – Average Case (4/8) • Calculating the expected total edge number for one left path started at one problem with m’ edges Original Problem G Boruvka × 2 G* # of edges = m’ Sample with p=0.5 E[edge number of H] ≤ 0.5 × edge number of G H G’ # of edges ≤ 0.5 × m’ Right Sub-problem Left Sub-problem Expected total edge number ≤ = 2m’
Analysis – Average Case (5/8) • Calculating the expected total edge number for one left pathL started at one problem with m’ edges • Expected total edge number on L≤2m’ K.O. • Evaluating the total edge number of all right sub-problems • E[total edges of all right sub-problem] ≤ n
Analysis – Average Case (6/8) • Evaluating the total edge number for all right sub-problems • To prove :E[total edges of all right sub-problem] ≤ n 1. ∵ Boruvka × 2 ∴ vertex number of G* ≤ 0.25 × vertex number of G Original Problem G Boruvka × 2 2. Based on lemma 2.1: E[edge number of G’]≤2 × vertex number of G* G* Sample with p=0.5 E[edge number of G’]≤ 0.5×vertex number of G H G’ Right Sub-problem Left Sub-problem Return minimum forest F of H Delete F-heavy edges from G*
Analysis – Average Case (7/8) • Evaluating the total edge number for all right sub-problems • To prove :E[total edges of all right sub-problem] ≤ n Original Problem = n G Boruvka × 2 G* # of vertices of original-problems=n Sample with p=0.5 # of vertices of sub-problems ≤ 2×n/4 # of edges of right sub-problems≤n/2 E[edge number of G’]≤ 0.5×vertex number of G H G’ # of vertices of sub-problems ≤ 4×n/42 # of edges of right sub-problems≤2×n/8 Right Sub-problem Left Sub-problem # of edges of right sub-problems≤4×n/(42×2) # of vertices of sub-problems ≤ 8×n/43 # of edges of right sub-problems≤8×n/(43×2) # of vertices of sub-problems ≤ 16×n/44
Analysis – Average Case (8/8) • Calculating the expected total edge number for one left path started at one problem with m’ edges • Expected total edge number for one left path ≤2m’ • Evaluating the total edge number for all right sub-problems • E[total edges of all right sub-problem] ≤ n # of edges = m’ E[processed edges in the original problem and all sub-problems] =2×(m+n) Expected total edge number ≤2m’
Analysis of the Algorithm • The worst case. • The expectations running time. • The probability of the expectations running time.
The Probability of Linearity • Theorem 4.3 • The minimum spanning forest algorithm runs in Ο(m) time with probability 1 – exp(-Ω(m))
The Probability of Linearity Chernoff Bound: Given xi as i.d.d. random variables and 0< i n, and X is the sum of all xi, for t > 0, we have Thus, the probability that less than s successes (each with chance p) within k trials is
The Probability of Linearity • Right Subproblems • At most the number of vertices in all right subproblems: n/2 ( proved by theorem 4.2 ) • n/2 is the upper bound on the total number of heads in nickel-flips
Right Subproblems • The probability • It occurs fewer than n/2 heads in a sequence of 3m nickel-tosses • m + n ≦ 3m since n/2 ≦ m • The probability is exp (-Ω(m)) by a Chernoff bound
The Probability of Linearity • Left Subproblem • Sequence: every sequence ends up with a tail, that is, HH…HHT • The number of occurrences of tails is at most the number of sequences • Assume that there are at most m’ edges in the root problem and in all right subproblems
Left Subproblems The probability • It occurs m’ tails in a sequence of more than 3m’ coin-tosses • The probability is exp (-Ω(m)) by a Chernoff bound
The Probability of Linearity • Combining Right & Left Subproblems • The total number of edges is Ο(m) with a high-probability bound 1 – exp(-Ω(m))