Introduction to Algorithms

Elementary Graph Algorithms
My T. Thai @ UF

Outline
• Graph representation
• Graph-searching algorithms
  • Breadth-first search
  • Depth-first search
• Topological sort
• Strongly connected components
My T. Thai mythai@cise.ufl.edu

Graphs
• Given a graph G = (V, E)
  • V: set of vertices
  • E: set of edges
• Types of graphs:
  • Undirected
  • Directed
  • Weighted: each edge e has a weight w(e)
  • Dense: |E| is close to |V|^2
  • Sparse: |E| = o(|V|^2)
• In general, |E| = O(|V|^2)

Representations of graphs
• Two common ways to represent a graph for algorithms:
  • Adjacency lists
  • Adjacency matrix

Adjacency lists
• Array Adj of |V| lists, one per vertex
• The list of vertex u, Adj[u], includes all vertices v such that (u, v) ∈ E
• If edges have weights, the weights can be stored in the lists

Adjacency matrix
• Store a |V| × |V| adjacency matrix A = (a_ij), where a_ij = 1 if (i, j) ∈ E and a_ij = 0 otherwise
• If edges have weights, a_ij is the weight of edge (i, j)

Adjacency lists vs. adjacency matrix
                                          Adjacency lists    Adjacency matrix
Space                                     Θ(V + E)           Θ(V^2)
Time to list all vertices adjacent to u   Θ(degree(u))       Θ(V)
Time to determine if (u, v) ∈ E           O(degree(u))       Θ(1)
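The two representations can be sketched in a few lines of Python. This is only an illustration: it assumes vertices are numbered 0..n-1 and the graph is directed and unweighted, and the function names and the sample edge list are made up for this example.

```python
def to_adjacency_lists(n, edges):
    """Build Adj as a list of n lists; Adj[u] holds every v with (u, v) in E."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    return adj


def to_adjacency_matrix(n, edges):
    """Build an n x n matrix A with A[u][v] = 1 if (u, v) in E, else 0."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = 1
    return a


# Hypothetical 4-vertex directed graph, used only for illustration.
EDGES = [(0, 1), (0, 2), (1, 2), (3, 0)]
ADJ = to_adjacency_lists(4, EDGES)   # [[1, 2], [2], [], [0]]
MAT = to_adjacency_matrix(4, EDGES)

# Edge test: a scan of Adj[u] for lists, a single lookup for the matrix.
has_edge_lists = 2 in ADJ[1]
has_edge_matrix = MAT[1][2] == 1
```

The lists store one entry per edge, while the matrix always stores n^2 entries, which is the space trade-off in the comparison above.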

Graph searching algorithms
• Visit all the nodes in a graph in a particular manner, following the edges of the graph
• Used to discover the structure of the graph
• Two common searching algorithms:
  • Breadth-first search (BFS): visit the sibling nodes before the child nodes (visit nodes layer by layer, by distance from the root)
  • Depth-first search (DFS): visit the child nodes before the sibling nodes (follow the "depth" of the graph)

Breadth-first search
• Input: graph G = (V, E), either directed or undirected, and source vertex s ∈ V
• Output:
  • d[v] = distance (smallest number of edges on any path) from s to v, for all v ∈ V
  • π[v] = u such that (u, v) is the last edge on a shortest path from s to v
    • u is v's predecessor
• Builds a breadth-first tree with root s that contains all reachable vertices

Breadth-first search
• Discovers vertices in waves spreading out from the source s
  • First hits all vertices 1 edge from s, then all vertices 2 edges from s, and so on
• A vertex is discovered the first time it is encountered during the search
• A vertex is finished once all vertices adjacent to it have been discovered
• Colors the vertices to keep track of progress:
  • White: undiscovered
  • Gray: discovered but not finished
  • Black: finished

Queue Q contains all discovered but not finished vertices, that is, the gray vertices
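A minimal Python sketch of breadth-first search follows. It is not the slide's pseudocode: it assumes adjacency lists over vertices 0..n-1, and it folds the colors into the d array (None for white, any vertex in the queue is gray, any dequeued vertex is black). The name bfs and the sample graph are illustrative only.

```python
from collections import deque


def bfs(adj, s):
    """Return (d, pi): edge-count distances from s and BFS-tree predecessors."""
    n = len(adj)
    d = [None] * n        # None plays the role of "white" (undiscovered)
    pi = [None] * n
    d[s] = 0
    q = deque([s])        # the queue holds exactly the gray vertices
    while q:
        u = q.popleft()
        for v in adj[u]:
            if d[v] is None:      # first time v is encountered
                d[v] = d[u] + 1
                pi[v] = u
                q.append(v)
        # all neighbors of u are discovered, so u is now black
    return d, pi


# Hypothetical graph: vertex 4 has an edge into 0 but is not reachable from 0.
dist, pred = bfs([[1, 2], [3], [3], [], [0]], 0)
# dist == [0, 1, 1, 2, None], pred == [None, 0, 0, 1, None]
```

Vertices that are unreachable from s keep d[v] = None, which corresponds to an infinite distance.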

Example: breadth-first search (figure not reproduced)

Analysis of BFS
• Initialization takes O(V) time
• In the while loop:
  • Each vertex is enqueued and dequeued at most once ⇒ total time for queue operations is O(V)
  • The adjacency list of each vertex is scanned at most once, and the sum of the lengths of all adjacency lists is Θ(E)
• ⇒ Total running time: O(V + E), linear in the size of the adjacency-list representation of the graph

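Correctness of BFS
• Claim: on termination, d[v] = δ(s, v) for all v ∈ V, where δ(s, v) is the shortest distance from s to v
• Proof (sketched):
  • d[v] ≥ δ(s, v) for all v ∈ V
  • If v_1, ..., v_t is the sequence of vertices in the queue, then d[v_1] ≤ d[v_2] ≤ ... ≤ d[v_t] ≤ d[v_1] + 1
  • By induction on distance: if v is first reached from u, where d[u] = δ(s, u) ⇒ d[v] = d[u] + 1 = δ(s, v)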

Depth-first search
• Input: G = (V, E), directed or undirected. No source vertex is given.
• Output:
  • Two timestamps on each vertex:
    • d[v] = discovery time
    • f[v] = finishing time
  • π[v] = u: the predecessor of v, such that v was discovered during the scan of u's adjacency list
• Builds a depth-first forest comprising several depth-first trees

Depth-first search
• From a vertex v:
  • Follow an outgoing edge to explore
  • Keep searching as deep as possible
  • When all vertices reachable through that edge are explored, follow another outgoing edge of v
• When all vertices reachable from the original source are discovered, choose an undiscovered vertex as a new source and repeat the process

Depth-first search
• White: undiscovered
• Gray: discovered
• Black: finished

Analysis of DFS
• In DFS: the loops on lines 1-2 and 5-7 take Θ(V) time, excluding the time to execute DFS-Visit
• DFS-Visit is called once for each vertex v ∈ V, when v is white and is colored gray for the first time. Lines 3-6 of DFS-Visit are executed |Adj[v]| times, so the total cost of executing all DFS-Visit calls is Σ_{v ∈ V} |Adj[v]| = Θ(E)
• Total running time of DFS is Θ(V + E)
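The timestamps and the Θ(V + E) bound can be seen in a short Python sketch. It is an illustration rather than the slide's pseudocode, so its line numbers do not match the ones cited above. It assumes adjacency lists over vertices 0..n-1 and uses a discovery time of 0 to mean "white"; the name dfs and the sample graph are made up. The recursive form is limited by Python's recursion depth on very deep graphs.

```python
def dfs(adj):
    """Return (d, f, pi): discovery times, finishing times, predecessors."""
    n = len(adj)
    d = [0] * n           # 0 means white; real timestamps start at 1
    f = [0] * n
    pi = [None] * n
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time       # u is discovered (turns gray)
        for v in adj[u]:  # each adjacency list is scanned exactly once
            if d[v] == 0:
                pi[v] = u
                visit(v)
        time += 1
        f[u] = time       # u is finished (turns black)

    for u in range(n):    # restart from every vertex that is still white
        if d[u] == 0:
            visit(u)
    return d, f, pi


# Hypothetical graph with two depth-first trees (roots 0 and 3).
disc, fin, pred = dfs([[1, 2], [2], [], [0]])
# disc == [1, 2, 3, 7], fin == [6, 5, 4, 8], pred == [None, 0, 1, None]
```

In this run the interval of vertex 2, [3, 4], nests inside that of vertex 1, [2, 5], which nests inside that of vertex 0, [1, 6].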

Parenthesis Theorem
Theorem 22.7 For all u, v, exactly one of the following holds:
1. d[u] < f[u] < d[v] < f[v] or d[v] < f[v] < d[u] < f[u], and neither u nor v is a descendant of the other.
2. d[u] < d[v] < f[v] < f[u], and v is a descendant of u.
3. d[v] < d[u] < f[u] < f[v], and u is a descendant of v.
• d[u] < d[v] < f[u] < f[v] cannot happen
• Like parentheses:
  • OK: ( ) [ ]   ( [ ] )   [ ( ) ]
  • Not OK: ( [ ) ]   [ ( ] )
Corollary 22.8 v is a proper descendant of u if and only if d[u] < d[v] < f[v] < f[u]

Example of parenthesis theorem (figure not reproduced)

Proof of Parenthesis Theorem
• W.l.o.g. suppose d[u] < d[v]. There are two cases:
  • d[v] < f[u]:
    • v is discovered while u is gray ⇒ v is a descendant of u
    • v is discovered more recently than u ⇒ v is finished before the algorithm comes back and finishes u
    • The interval [d[v], f[v]] is contained in [d[u], f[u]]
  • d[v] > f[u]:
    • Since d[u] < f[u] and d[v] < f[v], we get d[u] < f[u] < d[v] < f[v]
    • The two intervals [d[u], f[u]] and [d[v], f[v]] are disjoint

White-path theorem
Theorem 22.9 In a depth-first forest of a (directed or undirected) graph G = (V, E), vertex v is a descendant of vertex u if and only if at the time d[u] that the search discovers u, there is a path from u to v consisting entirely of white vertices.
Proof:
• ⇒: Suppose v is a descendant of u. All vertices on the tree path from u to v, including v, are descendants of u
  • By Corollary 22.8, d[u] < d[w] for each such vertex w ≠ u, so these vertices are white at time d[u]
• ⇐: Suppose there is a white path from u to v but v is not a descendant of u
  • Let v' be the nearest vertex to u on the path such that v' is not a descendant of u
  • Let w be the vertex just before v' on the path ⇒ w is a descendant of u (or w = u) ⇒ f[w] ≤ f[u]
  • v' is white at time d[u] and must be discovered before w finishes ⇒ d[u] < d[v'] < f[w] ≤ f[u] ⇒ by the parenthesis theorem, v' is a descendant of u, a contradiction

Classification of edges
• Tree edge: an edge in the depth-first forest, found by exploring (u, v)
• Back edge: (u, v), where u is a descendant of v
• Forward edge: (u, v), where v is a descendant of u, but (u, v) is not a tree edge
• Cross edge: any other edge. Can go between vertices in the same depth-first tree or in different depth-first trees
Theorem 22.10 In a depth-first search of an undirected graph G, every edge of G is either a tree edge or a back edge.
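For a directed graph, the four edge types can be told apart from the color of v at the moment edge (u, v) is explored: white gives a tree edge, gray a back edge, and black either a forward edge (if d[u] < d[v]) or a cross edge. The Python sketch below applies that rule. It assumes adjacency lists over vertices 0..n-1 and no parallel edges; the name classify_edges and the sample graph are illustrative only.

```python
def classify_edges(adj):
    """Map each directed edge (u, v) to 'tree', 'back', 'forward' or 'cross'."""
    WHITE, GRAY, BLACK = 0, 1, 2
    n = len(adj)
    color = [WHITE] * n
    d = [0] * n
    kinds = {}
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == WHITE:
                kinds[(u, v)] = "tree"
                visit(v)
            elif color[v] == GRAY:        # v is an ancestor of u
                kinds[(u, v)] = "back"
            elif d[u] < d[v]:             # v is a finished descendant of u
                kinds[(u, v)] = "forward"
            else:                         # v finished before u was discovered
                kinds[(u, v)] = "cross"
        color[u] = BLACK

    for u in range(n):
        if color[u] == WHITE:
            visit(u)
    return kinds


# Hypothetical graph that contains one edge of each kind.
KINDS = classify_edges([[1, 2], [2], [0], [2]])
```

In this example (0, 1) and (1, 2) are tree edges, (2, 0) is a back edge, (0, 2) is a forward edge, and (3, 2) is a cross edge between two depth-first trees.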

Topological sort
• Directed acyclic graph (DAG)
  • A directed graph with no cycles
  • Good for modeling processes and structures that have a partial order; an edge (u, v) means u comes before v
• Topological sort: produce a total order of the vertices of a given DAG such that u appears before v whenever there is an edge (u, v)

Example: order for getting dressed (figure not reproduced)

Characterizing DAGs
Lemma 22.11 A directed graph G is acyclic if and only if a depth-first search of G yields no back edges.
Proof:
• ⇒: Show that back edge ⇒ cycle
  • Suppose there is a back edge (u, v) ⇒ v is an ancestor of u in the depth-first forest ⇒ there is a path from v to u ⇒ that path plus the edge (u, v) forms a cycle
• ⇐: Show that cycle ⇒ back edge
  • Suppose G contains a cycle c. Let v be the first vertex discovered in c, and let (u, v) be the preceding edge in c
  • At time d[v], the vertices of c form a white path from v to u
  • By the white-path theorem, u is a descendant of v in the depth-first forest ⇒ (u, v) is a back edge

TOPOLOGICAL-SORT
• Running time: Θ(V + E)
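One standard way to obtain a topological order, sketched here in Python, is to run a depth-first search and list the vertices in decreasing order of finishing time. The sketch assumes the input is a DAG given as adjacency lists over vertices 0..n-1 and does no cycle detection; the name topological_sort and the sample DAG are illustrative only.

```python
def topological_sort(adj):
    """Return the vertices of a DAG so that u precedes v for every edge (u, v)."""
    n = len(adj)
    visited = [False] * n
    order = []            # vertices in increasing order of finishing time

    def visit(u):
        visited[u] = True
        for v in adj[u]:
            if not visited[v]:
                visit(v)
        order.append(u)   # u finishes after everything reachable from it

    for u in range(n):
        if not visited[u]:
            visit(u)
    order.reverse()       # decreasing finishing time
    return order


# Hypothetical DAG with edges 0->1, 0->2, 1->3, 2->3, 4->2.
DAG = [[1, 2], [3], [3], [], [2]]
ORDER = topological_sort(DAG)   # [4, 0, 2, 1, 3]
```

Each vertex and each edge is handled once, which matches the Θ(V + E) running time stated above.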

Strongly connected components
• Given a directed graph G = (V, E), a strongly connected component (SCC) of G is a maximal set of vertices C ⊆ V such that for all u, v ∈ C, there are paths both from u to v and from v to u

Component graph
• G^SCC = (V^SCC, E^SCC)
• V^SCC has one vertex for each SCC in G
• E^SCC has an edge if there is an edge between the corresponding SCCs in G

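G^SCC is a DAG
• Lemma: let C and C' be distinct SCCs in G, with u, v ∈ C and u', v' ∈ C'. If there is a path u ⇝ u' in G, then there cannot also be a path v' ⇝ v in G
Proof:
• Suppose there is a path v' ⇝ v in G
• Then there are paths u ⇝ u' ⇝ v' and v' ⇝ v ⇝ u in G
• So u and v' are reachable from each other, and they are not in separate SCCs, a contradiction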

Transpose of a directed graph
• G^T = transpose of directed graph G
  • G^T = (V, E^T), E^T = {(u, v) : (v, u) ∈ E}
  • G^T is G with all edges reversed
• Can create G^T in Θ(V + E) time if using adjacency lists
• Observation: G and G^T have the same SCCs (u and v are reachable from each other in G if and only if they are reachable from each other in G^T)
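Building G^T from adjacency lists takes a single pass over every list, as this small Python sketch suggests. It assumes vertices are numbered 0..n-1; the name transpose and the sample graph are made up for the example.

```python
def transpose(adj):
    """Return adjacency lists of G^T: every edge (u, v) of G becomes (v, u)."""
    n = len(adj)
    adj_t = [[] for _ in range(n)]   # Θ(V) to allocate
    for u in range(n):
        for v in adj[u]:             # Θ(E) over all lists
            adj_t[v].append(u)
    return adj_t


# Hypothetical graph with edges 0->1, 0->2, 1->2, 2->0, 3->2.
G = [[1, 2], [2], [0], [2]]
GT = transpose(G)   # [[2], [0], [0, 1, 3], []]
```

Transposing twice gives back the original edge set.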

Algorithm to compute SCCs
• Time: Θ(V + E)
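The two-pass approach (a DFS on G to get finishing times, then a DFS on G^T that picks roots in decreasing finishing time) can be sketched in Python as follows. This is a hedged illustration, not the slide's pseudocode: it assumes adjacency lists over vertices 0..n-1, and the name strongly_connected_components and the sample graph are hypothetical.

```python
def strongly_connected_components(adj):
    """Return the SCCs of a directed graph as a list of vertex lists."""
    n = len(adj)

    # Pass 1: DFS on G, recording vertices in increasing finishing time.
    visited = [False] * n
    finish_order = []

    def visit(u):
        visited[u] = True
        for v in adj[u]:
            if not visited[v]:
                visit(v)
        finish_order.append(u)

    for u in range(n):
        if not visited[u]:
            visit(u)

    # Build G^T by reversing every edge.
    adj_t = [[] for _ in range(n)]
    for u in range(n):
        for v in adj[u]:
            adj_t[v].append(u)

    # Pass 2: DFS on G^T, choosing roots in decreasing finishing time.
    assigned = [False] * n
    components = []

    def collect(u, comp):
        assigned[u] = True
        comp.append(u)
        for v in adj_t[u]:
            if not assigned[v]:
                collect(v, comp)

    for u in reversed(finish_order):
        if not assigned[u]:
            comp = []
            collect(u, comp)      # one depth-first tree = one SCC
            components.append(comp)
    return components


# Hypothetical graph: cycle 0->1->2->0, edge 2->3, and cycle 3->4->3.
SCCS = strongly_connected_components([[1], [2], [0, 3], [4], [3]])
```

Here the result contains two components, {0, 1, 2} followed by {3, 4}; each pass touches every vertex and edge once, so the total time is Θ(V + E).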

How does it work?
• Idea:
  • By considering vertices in the second DFS in decreasing order of finishing times from the first DFS, we visit the vertices of the component graph in topologically sorted order
  • Because we run the second DFS on G^T, we will not visit any v from u where v and u are in different components
• Notation:
  • d[u] and f[u] always refer to the first DFS
  • Extend the notation for d and f to sets of vertices U ⊆ V:
    • d(U) = min_{u ∈ U} {d[u]} (earliest discovery time)
    • f(U) = max_{u ∈ U} {f[u]} (latest finishing time)

SCCs and DFS finishing times
Lemma 22.14 Let C and C' be distinct SCCs in G = (V, E). Suppose there is an edge (u, v) ∈ E such that u ∈ C and v ∈ C'. Then f(C) > f(C').
Proof:
• Case 1: d(C) < d(C')
  • Let x be the first vertex discovered in C
  • At time d[x], all vertices in C and C' are white ⇒ there exist paths of white vertices from x to all vertices in C and C'
  • By the white-path theorem, all vertices in C and C' are descendants of x in the depth-first tree
  • By the parenthesis theorem, f[x] = f(C) > f(C')

SCCs and DFS finishing times
Proof:
• Case 2: d(C) > d(C')
  • Let y be the first vertex discovered in C'
  • At time d[y], all vertices in C' are white and there is a white path from y to each vertex in C' ⇒ all vertices in C' become descendants of y. Again, f[y] = f(C')
  • At time d[y], all vertices in C are also white
  • By the earlier lemma, since there is an edge (u, v) from C to C', we cannot have a path from C' to C ⇒ no vertex in C is reachable from y ⇒ at time f[y], all vertices in C are still white
  • ⇒ for all w ∈ C, f[w] > f[y] ⇒ f(C) > f(C')

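SCCs and DFS finishing times
Corollary 22.15 Let C and C' be distinct SCCs in G = (V, E). Suppose there is an edge (u, v) ∈ E^T such that u ∈ C and v ∈ C'. Then f(C) < f(C').
Proof:
• (u, v) ∈ E^T ⇒ (v, u) ∈ E
• Since the SCCs of G and G^T are the same, f(C') > f(C) by Lemma 22.14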

Correctness
• When we do the second DFS, on G^T, we start with the SCC C such that f(C) is maximum
  • The second DFS starts from some x ∈ C, and it visits all vertices in C
  • Corollary 22.15 says that since f(C) > f(C') for all C' ≠ C, there are no edges from C to C' in G^T ⇒ the DFS will visit only vertices in C ⇒ the depth-first tree rooted at x contains exactly the vertices of C

Correctness
• The next root chosen in the second DFS is in the SCC C' such that f(C') is maximum over all SCCs other than C
  • The DFS visits all vertices in C', but the only edges out of C' go to C, which we have already visited
  • Therefore, the only tree edges will be to vertices in C'
• Each time we choose a root for the second DFS, it can reach only:
  • vertices in its own SCC: we get tree edges to these
  • vertices in SCCs already visited in the second DFS: we get no tree edges to these

Summary
• Two common ways to represent a graph: adjacency lists and adjacency matrix. The choice of representation depends on:
  • Type of graph: dense or sparse
  • Type of operation that is used frequently in the problem
  • Adjacency lists use less space than an adjacency matrix when the graph is sparse
• BFS starts from a given source and computes the shortest paths (fewest edges) from the source to the other vertices. Some vertices may not be discovered
• DFS starts from an arbitrary vertex and discovers the whole graph to capture the graph's structure

• Both BFS and DFS can provide the set of vertices reachable from a given vertex
• Topological sort forms a total order from a partial order
• The strongly connected components algorithm lists all SCCs in topological order of the component graph
• BFS, DFS, topological sort, and the strongly connected components algorithm all run in time linear in the size of the graph
