360 likes | 458 Views
Graphs and Finding your way in the wilderness. Chapter 14 in DS&PS Chapter 9 in DS&AA. General Problems. What is the shortest path from A to B? What is the shortest path from A to all nodes? What is the shortest/cheapest path between any two nodes?.
E N D
Graphs andFinding your way in the wilderness Chapter 14 in DS&PS Chapter 9 in DS&AA
General Problems • What is the shortest path from A to B? • What is the shortest path from A to all nodes? • What is the shortest/cheapest path between any two nodes?. • Search for Goal node, i.e. a node with specific properties, like a win in chess. • What is shortest tour? (visit all vertices) • no known polynomial algorithm in number of edges • What is longest path from A to B
Some Applications • Route Finding • metacrawler, on net, claims to find shortest path between two points • Game-Playing • great increase in chess/checkers end-game play occurred when recognized as graph search, not tree search • Critical Path Analysis • multiperson/task job schedule analysis • answers what are the key tasks that can’t slip • Travel arrangement • cheapest cost to meet constraints
Definitions • Graph: set of edges E and vertices V. • Edge is a pair of vertices (v,w) • edge may be directed or undirected • edge may have a cost • v and w are said to be adjacent • Digraph or directed graph: directed edges • Path: sequence of vertices v1,…vn where each <vi,vi+1> is an edge. • Cycle: path where v1=vn • Tour: cycle that contains every vertex • DAG: directed acyclic graph • graph with no cycles • Tree algorithms usually work with DAGs just fine
Adjacency Matrix Representation • Matrix A = new Boolean(|V|,|V|). • O(|V|^2) memory costs: acceptable only if dense • If <vi,vj> is an edge, set A[i][j] = true, else false • Special matrix multiple operator: • row[i] @ col[j] = row[i][1]&col[1][j] or row[i][2]&col[2][j]….. • In A@A, if entry [i][j] is true, what does that mean? • There is some k so that vi->vk and vk->vj or a length 2 path from vi to vk. • Similarly, A^k indicates where any two vertices are connected by a length k path. • Cost: O(k*n^3).
Matrix Review • If A has n rows and k columns and • B has k rows and m columns then A*B = C • Where C has n rows and m columns and (standard) • C[i][j] = A[i][1]*B[1][j]+….+A[i][k]*B[k][j] • or i-j entry of C is dot product of row i of A and column J of B. Totaltime cost is O(n*k*m). • Example: matrix of size 3-3 represents a linear transformation from R^3 into R^3. • Points in R^3 represented as a 3 by 1 vector (column) • Essentially matrices stretch/shrink or rotate points. • Determinant defines amount of stretch/shrink. • Theorem: Locally every differentiable function can be approximated by a matrix (linear transformation).
Cost Matrix Representation • Now A[i][j] = cost of edge from vi to vj. • If no edge, either set cost to Infinity or add Boolean attribute to indicate no edge. • New multiplication operation • row[i]@col[j] = min { row[i][1]+col[1][j], row[i][2]+col[2][j],… row[i][n] +col[n][j] } • Now A^2 contains minimum cost path of length 2 between any 2 vertices. • A^n has complexity O(n^4). Not good. • If add A[i][i]= 0, then A^k records minimum cost path of length = k. (how to change to allow all paths <= k)
Adjacency List Representation • Here each vertex is the head of linked list which stores all the adjacent vertices. The cost to a node, if appropriate is also stored. • The linked lists are usually stored in an array, since you probably know how many vertices there are. • Memory cost = O(|E|) so if |E| << |V|^2, use list representation. • Graph is sparse if |E| is O(|V|). • To simplify the discussion, we will assume that each vertices has a method *sons* which returns all adjacent vertices. • Now, will redo same problems with list representation.
General Graph Search AlgorithmLooking for a node with a property Set Store equal to the some node while ( Store is non-empty) do choose a node n in Store if n is solution, stop • Decisions: else add SOME sons of n to store • What should store be? • How do we choose a node? • what does add mean? • How do pick which sons to store. • Cycles are a problem
Problem: Is Graph connected? (matrix rep) Note: n by n boolean matrix where n is number of vertices. Set A[i][i] to true Set A[i][j] to true if there is an edge between i and j. Let B= A^2, using boolean arithmetic Note B[i][j] is true iff there is a k such that B[i][k] is true and B[k][j] is true, i.e if there is a 2-path from i to j. A^k represents whether a k-path exists between any vertices. Let C = boolean sum of A^i where i= 1…n-1. (why?) Graph connected if C is all ones. Time complexity: O(N^4)! How about directed graphs? Basically the same algorithm. For directed graphs, strongly connected means directed path between any two vertices.
Is Undirected Graph G Connected? (adjacency list representation) • Suppose G is an undirected graph with N vertices. Let S be any node Do a (depth/breadth) first search of G, counting the number of nodes. Be careful not to double count. If number of nodes does not equal N, disconnected. • Searching a Graph is like searching a tree, except that nodes may be revisited. • Need to keep track of revisits, else infinite loop.
Depth First Search Pseudo-Code • Store = Stack • Choose = pop • Initial node: any node • Add = push only new sons (unvisited ones) • keep a boolean field visited, initialized to false. • When a node is “popped”, mark it as visitied. • Graph connected if all nodes visited. • Properties • Memory cost: number of nodes • Guarantee to find a solution, if one exists (not shortest solution) How could we guarantee that? • Time: number of nodes (exponential for k-ary trees)
Breadth First Search • G is a undirected Graph • As before each node has a boolean visited field. • Initial node is arbitrary • Store = Queue • Choose = dequeue and mark as visited • Add = enqueue only those sons that have not been visited • Properties: • Time: Number of nodes to solution • Space: Number of nodes • Guaranteed to find shortest solution
Is Directed Graph Acyclic? (array represntation) • Let A[i][j] be true if there is a directed edge from i to j. • Similar to previous case, if B= A^2 with boolean multiplication, then B[i][j] is true iff there is a directed 2-path from i to j. • Algorithm: For i = 1 to n-1 (why?) Compute A^i. If some diagonal element is true, exit with true end for Exit with false.
Is Directed Graph Acyclic? (adjacency list rep) • With care, breadth first search works. • Define the indegree of a node v as the number of edges of the form (u,v), with u arbitrary. • Define the outdegree of a node v as the number of edges of the form (v, u) with u arbitrary. • A node with indegree 0 is like the root of a tree. • A node with outdegree 0 is a terminal node.
Breadth First Search Pseudo-code • Algorithm Idea (has numerous variations/implementations) • Store = Queue Compute indegree of all nodes Enqueue all nodes of indegree 0 While Queue is not empty Dequeue node n and lower indegrees of nodes of form (n,v) Enqueue any node whose indegree is 0. If any node still has positive indegree, then cyclic. Why does algorithm terminate? • Properties: • Time & Space: Number of nodes • find node closest to “roots”.
Best-First Search • Goal: find least cost solution • Here edges have a cost (positive) • Store = priority queue • Add = enqueue(), which puts in right order • Choose = dequeue(), chooses element of least cost • Properties: • Find cheapest solution • Time and Memory: exponential in… • depth of tree.
Best First Pseudo-Code Set distance from S to S to 0 Priority Queue PQ <- S while (PQ is not empty) vertex <- PQ.deque() sons <- vertex.sons() for each son in sons PQ.enqueue(son, cost to son)
Topological Sort • Given: a directed acyclic graph • Produce: a linear ordering of the vertices such that if a path exist from v1 to v2, then v1 is before v2. • If v1 is before v2, is there a path from v1 to v2? • NO • Note: there may be multiple correct topological sorts • Algorithm Idea: • any vertex with indegree 0 can be first • Output and delete that vertex • update indegree’s of its sons • Repeat until empty • So we need to compute and keep track of indegree’s
Algorithm Implementation • HashTable of (vertex, indegree, sons) • Queue of vertices • Step 1: read each edge (v,w) and add 1 to indegree of w • linear • Step 2: Add all vertices with indegree 0 to queue Q. • Step 3: Process Q by: • dequeue vertex • update indegrees of its sons (constant by hashing) • enqueue any son whose indegree become 0. • Time complexity: linear • Space: linear • Proof: Does everything get enqueued?
UnWeighted Single-Source Shortest path algorihtm • Input: Directed graph and start node S • Output: the minimum cost, in terms of number of edges traversed, from start node to all other nodes. • Idea: do a level order search (breadth-first search) • We’ll use a hashtable to mark elements as “seen”, • i.e. we’ll track vertices that we’ve visited • use hashtable to hold this information by “marking” vertices that have been visited • We’ll use a queue to store the vertices to be “opened” • to open a node means to consider its sons • Each vertex will have a field for distance to start node.
Shortest Path Algorithm Set distance from S to S to 0 queue <- S while (queue is not empty) vertex <- queue.dequeue() mark vertex as visited (enter in hashtable) record cost to vertex sons <- vertex.sons() newSons <- sons that are unmarked fill in distance measure to newSons queue.enqueue(newSons) Essentially, breadth-first search
Discussion • Will this terminate? • Will we ever revisit a node? • Computational cost? • O(|E|) • What are the memory requirements? • Suppose we don’t count graph (virtual graphs) • Can we bound queue? • Only O(|E|) and if m-ary tree, this is exponential. • Did we need the hashtable? • This avoids a linear search of the constructed graph • remove a factor of O(|G|).
Positive-Weighted Single-source Cheapest Path • Suppose we have positive costs associated with every edge in a directed graph. • Problem: Find the shortest path(total cost) from given vertex S to every vertex. • Solution: Dijstra’s algorithm • BFS idea still works, with slight modifications • Replace Queue by Priority queue. • As before, replace newSons by betterSons. • As before, replace add entry to update entry (which may be add) • Note: may reopen an old son( if return with better path) • Dense graphs: O(|V|^2), sparse graphs: O(|E|log|V|)
Weighted Shortest Path Pseudo-Code Set distance from S to S to 0 (on node) Priority Queue PQ <- S while (PQ is not empty) vertex <- PQ.dequeue() … remove min mark vertex as visited (enter in hashtable) record cost to vertex sons <- vertex.sons() goodSons <- new sons OR old sons with better costs estimates queue.enqueue(goodSons)… enqueue puts in proper order.
Graphs with negative edge costs • Dijsktra doesn’t work (since we may have cycles which lower the cost) • Input: Directed graph with arbitrary edge costs and vertex v. • Output: Minimum cost from S to every vertex OR graph has a negative cost cycle. • Note: If no negative cost cycles, then a vertex can be visited (expanded) at most |V| times. • Algorithm: Add counter to each vertex so each time it is visited with lower cost, counter goes up. If counter exceeds |V|, then graph has negative cost cycle and we exit. Otherwise queue will be empty.
Weighted Single-Source shortest-path problems for Acyclic graphs • Easy since no cycles • Edge costs may be positive or negative • Best-first search works • Node may be reentrant • Reentrant node require may required updating cost. • Or apply topological sorting algorithm. (text) 2 4 7 3 6 1
Algorithm Display • Idea: • Iterative use of breadth first search • addition of edges to effect other choices • See Diagrams provided • Analysis: (requires augmented path be cheapest) • Runs in linear time
Minimum Spanning Tree • Given: an undirected connected graph with edge costs • Output: a subtree of graph such that • contains all vertices • sum of costs of edges is minimum • If costs not given, assume 1. What then? • Note all spanning trees have same number of edges • Application: • Is undirected Graph with n vertices connected? • IFF minimal spanning tree has n-1 edges.
Prim’s Algorithm • Let G be given as (V,E) where V has n vertices • Let T = empty • Algorithm Idea: grow cheapest tree • Choose a random v to start and add to T • Repeat (until T has n vertices) • select edge of minimum length that does not form a cycle and that attaches to current tree (how to check?) • add edge to T • The proof is more difficult than the code. • Complexity depends on G and code • O(V^2) for dense graphs • O(E*log(V)) for sparse graphs (use binary heap)
Kruskal’s Algorithm • Given graph G = (V,E) • Sort edges on the basis of cost. • Add least cost edge to Forest, as long as no cycle is formed. • Cost of cycle checking is? • If implement as adjacency list, O(E^2) • If implement as hash table O(1) • Proof more difficult. • Time complexity: O(E log E)
Finding the least cost between pairs of points • Idea: Dynamic programming • Let c[i][j] be the edge cost between vi and vj. • Define C[i][j] as minimum cost for going from vi to vj. • Finding the subproblems • Suppose P is the path from vi to vj which realizes the minimum cost and vk is an intermediary node. • Then the subpaths from i to k and from k to j must be optimal, otherwise P would not be optimal. • Now define D[i][k][j] as the minimum cost for going from vi to vj using any of v1,v2..,vk as an intermediary. • Define D[i][j] as the minimum cost for going from i to j. • D[i][j] = min over k of D[i][k][j] and c[i][j].
All-Pairs (Floyd’s)Pseudo-Code • Initialization: • D[i][0][j] =cost(i,j) for all vertices i, j … O(|V|^2) • D[i][k+1][j] = • min(D[i][k][j], D[i][k][k+1]+D[k+1][k][j]) • This last statement is true since any path from the shortest path from vi to vj using {v1,…vk+1} either doesn’t use vk+1, or the path divides into a path from vi to vk+1 and one from vk+1 to vj. • The cost of this is O(|V|^3) - i.e. single loop over all vertices with |V|^2 per loop.
NP vs P • Multiple ways to define • Define new computational model (Imaginary) • add to programming language • choose S1, S2,….Sn; where Si are statements • Semantics: algorithm always chooses best Si to execute. • This is the Non-Deterministic model • If problem can be solve in polynomial time with non-deterministic it is in the class NP. • If problem can be solved in polynomial time on standard computer (deterministic) then in class P. • Unsolved (and possibly unsolvable) does NP = P?
NP-Completeness • A problem is in NP or NP-hard if it can be solved in polynomial time on a non-deterministic machine. • A problem p* is NP complete if any problem in NP can be polynomial reduced to p*. • A problem P1 can be polynomial reduced to P2 if P1 can be solved in polynomially time assuming that P2 can be solved in polynomially time. • Alternatively, if P1 can be transformed into P2 and solutions of P2 mapped back to P1 and all the transformations take polynomial time. • This is a way of forming a taxonomy of difficulty of various problems
NP-Complete problems • Boolean Satisfiability • Traveling Salesmen • Bin Packing: given packages of size a[1]…a[n] and bins of size k, what is the fewest numbers of bins needed to store all the packages. • Scheduling: Given tasks whose time take t[1]…t[n] and k processors, what is minimum completion time? • Graph: Given a graph find the clique of maximum size. • A clique is a completely connected subgraph. • Subset-sum: Given a finite set S of n numbers and a target number t, does some subset of S sum to t. • Vertex Cover: A vertex cover is a subset of vertices which hits every edge. The problem is to find a cover with the fewest number of vertices.