Intro to Computation & AI

Intro to Computation & AI Dr. Jill Fain Lehman School of Computer Science Lecture 4: November 13, 1997

Graph Basics
• In general a graph consists of a set of nodes/vertices V and a set of edges E.
• Note: a tree is a special type of graph.
• Either v ∈ V or e ∈ E may be a complex structure with additional information associated with it.
[Figure: example graph with vertices v1 to v6 and edges e1 to e8]

Examples
[Figures: a graph of cities (pgh, nyc, bos, la, sf, no) with weighted edges; a graph labeled with word categories (article, adj, noun, verb, 's)]

Graph Formalism
• G = (V, E) where G is a graph, V a set of vertices and E a set of edges, such that every e ∈ E has the form e = (v1, v2) with v1, v2 ∈ V.
• If G is undirected, then e = (v1, v2) implies e = (v2, v1), i.e. the vertices are unordered.
• If G is directed (a digraph), then (v1, v2) is ordered: v1 is the origin, v2 is the terminus or destination.
[Figure: an undirected edge and a directed edge between v1 and v2]

Paths, Adjacency, Cycles
• Two vertices vi and vj are adjacent if there exists an edge e ∈ E such that e = (vi, vj).
• A path p is a sequence of vertices of V of the form p = v1 v2 ... vn (n >= 2) in which each vertex vi is adjacent to vi+1 (for 1 <= i <= n-1).
• A cycle is a path p = v1 v2 ... vn such that v1 = vn.
[Figure: example graph on vertices A, B, C, D]
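The three definitions above can be checked mechanically. The following is a minimal sketch in Python, assuming an undirected graph stored as a set of vertex pairs; the function names `adjacent`, `is_path`, and `is_cycle` and the example edge set are illustrative and do not come from the lecture.

```python
def adjacent(edges, u, v):
    """True if an undirected edge joins u and v."""
    return (u, v) in edges or (v, u) in edges

def is_path(edges, p):
    """p = v1 v2 ... vn is a path if n >= 2 and each vi is adjacent to vi+1."""
    return len(p) >= 2 and all(adjacent(edges, a, b) for a, b in zip(p, p[1:]))

def is_cycle(edges, p):
    """A cycle is a path whose first and last vertices coincide."""
    return is_path(edges, p) and p[0] == p[-1]

# Hypothetical example graph on the vertices A, B, C, D.
E = {("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")}
print(is_path(E, ["A", "B", "C", "D"]))   # True
print(is_cycle(E, ["A", "B", "C", "A"]))  # True
print(is_path(E, ["A", "D"]))             # False
```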

Connectivity
• If x ∈ V and y ∈ V, x ≠ y, then x and y are connected if there exists a path p = v1…vn such that x = v1 and y = vn.
• For G undirected, a subset S of V is a connected component if for any two distinct vertices x ∈ S, y ∈ S, x is connected to y.
• For G directed, a subset S of V is strongly connected if for each pair of distinct vertices vi, vj ∈ S, vi is connected to vj and vj is connected to vi. S is weakly connected if for each such pair either vi is connected to vj or vj is connected to vi.

Connectivity Examples
[Figures: a strongly connected digraph and a weakly connected digraph]

Adjacency Sets and Degrees
• Let the adjacency set of x be Vx = {y | (x, y) ∈ E}. Then G = (V, A) where A = {Vx | x ∈ V}.
• For G undirected, the degree of a vertex x is the number of edges e in which x is one of the endpoints of e.
[Figure: undirected graph with 2 components; vertices labeled with degrees d=4, d=3, d=1, d=4, d=0, d=2]

Degrees for Directed Graphs
• If x is a vertex in a digraph G, we can define two sets Pred(x) and Succ(x), the predecessors and successors of x respectively.
• Pred(x) = {y | y ∈ V and (y, x) ∈ E}; the size of Pred(x) is called the in-degree of x.
• Succ(x) = {y | y ∈ V and (x, y) ∈ E}; the size of Succ(x) is called the out-degree of x.
[Figure: digraph with vertices labeled in=0, out=2; in=2, out=0; in=1, out=1]
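Pred(x) and Succ(x) translate almost directly into set comprehensions. The sketch below is in Python and assumes the digraph is given as a set of ordered pairs; the three-vertex edge set is a hypothetical example, and the names `pred` and `succ` are illustrative.

```python
def pred(edges, x):
    """Predecessors of x: all y with an edge (y, x)."""
    return {y for (y, z) in edges if z == x}

def succ(edges, x):
    """Successors of x: all y with an edge (x, y)."""
    return {z for (y, z) in edges if y == x}

# Hypothetical digraph: 1 -> 2, 1 -> 3, 2 -> 3.
E = {(1, 2), (1, 3), (2, 3)}
for v in (1, 2, 3):
    print(v, "in =", len(pred(E, v)), "out =", len(succ(E, v)))
# 1 in = 0 out = 2
# 2 in = 1 out = 1
# 3 in = 2 out = 0
```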

Graph Representations: The Adjacency Matrix
• Given G = (V, E), V = v1…vn. Let T[i,j] be a table with n rows and n columns such that row i corresponds to vi and column j to vj (1 <= i,j <= n). Then T[i,j] = 1 iff there exists e ∈ E such that e = (vi, vj), and T[i,j] = 0 iff there exists no e ∈ E such that e = (vi, vj).

Adjacency matrix for G:

      1  2  3  4
  1   0  1  0  0
  2   0  0  1  1
  3   1  0  0  1
  4   1  0  0  0

[Figure: digraph G on vertices 1, 2, 3, 4]
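Building the table T from an edge list takes one pass over the edges. This is a minimal sketch in Python, assuming vertices are numbered 1 to n as on the slide; the function name `adjacency_matrix` and the edge list are illustrative assumptions.

```python
def adjacency_matrix(n, edges):
    """Return the n x n table T with T[i][j] = 1 iff (vi, vj) is an edge.

    Vertices are numbered 1..n; Python lists are 0-based, hence the -1.
    """
    T = [[0] * n for _ in range(n)]
    for i, j in edges:
        T[i - 1][j - 1] = 1
    return T

# Illustrative directed edge list on four vertices.
edges = [(1, 2), (2, 3), (2, 4), (3, 1), (3, 4), (4, 1)]
for row in adjacency_matrix(4, edges):
    print(row)
# [0, 1, 0, 0]
# [0, 0, 1, 1]
# [1, 0, 0, 1]
# [1, 0, 0, 0]
```

The matrix costs O(|V|^2) space regardless of how many edges exist, which is the usual reason to prefer edge lists for sparse graphs.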

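Graph Representations: Edge lists
• Vector of linked adjacency lists for G
• List of linked adjacency lists for G (basic graphnode := name, nextv, edgelist)
[Figures: digraph G on vertices 1, 2, 3, 4, drawn once as a vector indexed 1: to 4: with a linked adjacency list per entry, and once as a linked list of graphnodes each holding its own edge list]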

Graph Insertion
Given
  G: a list of graphnodes
  v: a graphnode
  edge: a pair of graphnodes, x and y
and assume listinsert inserts only if not there.

InsertEdge(edge, graph)
  listinsert(edge.x, graph)
  listinsert(edge.y, graph)
  listinsert(edge.y.name, edge.x.edgelist)
  return graph

Complexity???

Complexity of Simple Graph Insertion
InsertEdge(edge, graph)
  listinsert(edge.x, graph)                   O(|V|)
  listinsert(edge.y, graph)                   O(|V|)
  listinsert(edge.y.name, edge.x.edgelist)    O(|V|)
  return graph

Complexity: O(|V|) on each call
How many calls? At most |V|^2
So, O(|V|^3) to build a graph

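Example
Main() {
  G := null
  For e in ((ny pgh)(ny bos)(bos pgh)) do
    G := InsertEdge(e, G)
}
[Figure: the resulting list G holds the graphnodes ny, pgh, bos; the edge list of ny is (pgh bos) and the edge list of bos is (pgh)]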
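The insertion procedure and the example run can be tried out directly. What follows is a rough Python sketch, not the lecture's implementation: edges are given as pairs of names rather than pairs of graphnodes, a Python list stands in for the linked list, and new items are appended at the end (the slides do not say where listinsert puts them). The helper `find_or_add` is a hypothetical name.

```python
class GraphNode:
    def __init__(self, name):
        self.name = name
        self.edgelist = []  # names of the vertices this node points to

def find_or_add(name, graph):
    """Linear scan of the vertex list, O(|V|); adds the node only if missing."""
    for node in graph:
        if node.name == name:
            return node
    node = GraphNode(name)
    graph.append(node)
    return node

def insert_edge(edge, graph):
    x_name, y_name = edge
    x = find_or_add(x_name, graph)   # listinsert(edge.x, graph)
    find_or_add(y_name, graph)       # listinsert(edge.y, graph)
    if y_name not in x.edgelist:     # O(|V|) scan of x's edge list
        x.edgelist.append(y_name)
    return graph

G = []
for e in [("ny", "pgh"), ("ny", "bos"), ("bos", "pgh")]:
    G = insert_edge(e, G)
print([(n.name, n.edgelist) for n in G])
# [('ny', ['pgh', 'bos']), ('pgh', []), ('bos', ['pgh'])]
```

Each call does three linear scans, which matches the O(|V|) per-call bound; replacing the vertex list with a dictionary keyed by name would make the lookups constant time on average.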

Graph Search
• Basic idea: to search a graph G, we want to visit all G’s vertices in a systematic order (we’ll use the adjacency list).
• Will need to designate some v ∈ V as the start vertex.
• Will need to mark each vertex we’ve visited as seen in order to detect cycles; so we add the field visited (boolean) to the basic graphnode definition.

Recursive DFS
ExhaustiveDFS(v) {
  v.visited := true
  for w in v.edgelist do
    if w.visited = false then ExhaustiveDFS(w)
}

main() {
  ExhaustiveDFS(v0)
}

What if G has multiple components, or G has one component but is weakly connected?
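One common answer to the closing question is to restart the search from every vertex that is still unvisited. The Python sketch below assumes the graph is a dictionary mapping each vertex to its adjacency list and uses a `visited` set in place of the visited field on each graphnode; `dfs_all` is a hypothetical wrapper name.

```python
def exhaustive_dfs(graph, v, visited, order):
    """Recursive DFS from v; records vertices in the order they are visited."""
    visited.add(v)
    order.append(v)
    for w in graph[v]:
        if w not in visited:
            exhaustive_dfs(graph, w, visited, order)

def dfs_all(graph):
    """Restart DFS from each unvisited vertex so no component is missed."""
    visited, order = set(), []
    for v in graph:
        if v not in visited:
            exhaustive_dfs(graph, v, visited, order)
    return order

# Hypothetical graph with three components: {A, B}, {C, D}, {E}.
graph = {"A": ["B"], "B": ["A"], "C": ["D"], "D": ["C"], "E": []}
print(dfs_all(graph))  # ['A', 'B', 'C', 'D', 'E']
```

A single call starting at A would stop after A and B; the outer loop is what reaches the remaining vertices. The same loop covers a weakly connected digraph where the chosen start vertex cannot reach everything.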

Example
[Figure: graph on vertices A, B, C, D]
EDFS(A)
  A.visited := true
  for unvisited w in (B C D) do
    EDFS(B)

Example
EDFS(A)
  A.visited := true
  for unvisited w in (B C D) do
    EDFS(B)
      B.visited := true
      for unvisited w in (A C D) do
        EDFS(C)

Example
EDFS(A)
  A.visited := true
  for unvisited w in (B C D) do
    EDFS(B)
      B.visited := true
      for unvisited w in (A C D) do
        EDFS(C)
          C.visited := true
          for unvisited w in (A B D) do
            EDFS(D)

Example
EDFS(A)
  A.visited := true
  for unvisited w in (B C D) do
    EDFS(B)
      B.visited := true
      for unvisited w in (A C D) do
        EDFS(C)
          C.visited := true
          for unvisited w in (A B D) do
            EDFS(D)
              D.visited := true
              No unvisited w in (B C A) so function returns

Example
EDFS(A)
  A.visited := true
  for unvisited w in (B C D) do
    EDFS(B)
      B.visited := true
      for unvisited w in (A C D) do
        EDFS(C)
          C.visited := true
          No unvisited w so return

Example
EDFS(A)
  A.visited := true
  for unvisited w in (B C D) do
    EDFS(B)
      B.visited := true
      No unvisited w in (D) so return

Example
EDFS(A)
  A.visited := true
  No unvisited w in (C D) so return

How would you change EDFS to visit nodes “breadth first”?

ExhaustiveDFS(v) {
  v.visited := true
  for w in v.edgelist do
    if w.visited = false then ExhaustiveDFS(w)
}
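One standard answer to the breadth-first question is to replace the recursion (an implicit stack) with an explicit first-in, first-out queue. The sketch below is in Python and assumes a dictionary of adjacency lists; the small digraph is a hypothetical example chosen so that the two visiting orders differ.

```python
from collections import deque

def bfs(graph, start):
    """Visit vertices in order of distance from start, using a FIFO queue."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in visited:
                visited.add(w)   # mark when enqueued so no vertex is queued twice
                queue.append(w)
    return order

# Hypothetical digraph: A -> B, A -> C, B -> D.
graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
print(bfs(graph, "A"))  # ['A', 'B', 'C', 'D']
```

On this graph a depth-first search from A would visit D before C, because it follows B's edge list to the end before returning to A's remaining neighbors.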

Shortest Path
• For many problems the best representation is a directed graph with weighted edges representing, e.g., distance, time, cost.
• Dijkstra’s shortest path algorithm finds the lowest cost path in O(n^2) for a graph of n vertices, provided the edge weights are nonnegative.
[Figure: weighted digraph of homework tasks: Read assignment, Simulate by hand, Write pseudocode, Write/debug Java, Go to TA hours]
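The O(n^2) bound corresponds to the simple version of Dijkstra's algorithm that finds the next closest vertex by a linear scan. The following Python sketch assumes nonnegative weights and a graph stored as a dictionary of dictionaries; the function name, vertex names, and weights are hypothetical.

```python
import math

def dijkstra(weights, source):
    """Lowest cost from source to every vertex.

    weights[u][v] is the nonnegative cost of edge (u, v).
    Every vertex must appear as a key, even if it has no outgoing edges.
    """
    dist = {v: math.inf for v in weights}
    dist[source] = 0
    done = set()
    while len(done) < len(weights):
        # Linear scan for the closest unfinished vertex: n scans of O(n) each.
        u = min((v for v in weights if v not in done), key=dist.get)
        done.add(u)
        for w, cost in weights[u].items():
            if dist[u] + cost < dist[w]:
                dist[w] = dist[u] + cost
    return dist

g = {"s": {"a": 2, "b": 5}, "a": {"b": 1, "t": 6}, "b": {"t": 2}, "t": {}}
print(dijkstra(g, "s"))  # {'s': 0, 'a': 2, 'b': 3, 't': 5}
```

Replacing the linear scan with a priority queue is the usual way to improve the running time on sparse graphs.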

PERT/CPM
• Program Evaluation and Review Technique (PERT) charts use a graph to encode:
  • tasks as vertices
  • dependencies among tasks as edges
  • duration of task as weight on edge
• A critical path on a PERT chart is a path p from a start vertex to an end vertex such that if the completion time of any task along p slips by ΔT then the project also slips by ΔT.
• PERT/CPM uses a DAG and topological ordering.
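The critical path is the longest weighted path through the DAG, and a topological ordering lets it be computed in one pass. This Python sketch assumes the chart is a dictionary mapping each task to its successors and the duration on each edge; the chart, the task names, and both function names are hypothetical.

```python
def topological_order(succ):
    """Order the vertices of a DAG so every edge goes from earlier to later."""
    indeg = {v: 0 for v in succ}
    for v in succ:
        for w in succ[v]:
            indeg[w] += 1
    ready = [v for v in succ if indeg[v] == 0]
    order = []
    while ready:
        v = ready.pop()
        order.append(v)
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return order

def longest_from(succ, start):
    """Longest weighted path length from start to each vertex of the DAG."""
    longest = {v: float("-inf") for v in succ}
    longest[start] = 0
    for v in topological_order(succ):
        for w, duration in succ[v].items():
            longest[w] = max(longest[w], longest[v] + duration)
    return longest

# Hypothetical chart: two routes from start to end, of total length 7 and 3.
chart = {"start": {"a": 3, "b": 2}, "a": {"end": 4}, "b": {"end": 1}, "end": {}}
print(longest_from(chart, "start")["end"])  # 7
```

Here the route through a is critical: any slip on it delays the project, while the route through b has 4 units of slack.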

The Travelling Salesman Problem (TSP)
• Given G, a directed graph with weighted edges, where vertices represent cities, and weights on edges connecting cities give the distance/cost of traveling between those cities.
• Problem: Find the minimum cost cycle that visits all the cities in the graph exactly once before returning to the starting point.
• The number of possible paths is exponential; can we do better than exhaustively trying all paths?
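To see what exhaustively trying all paths means in practice, the Python sketch below enumerates every tour that starts and ends at city 0. It assumes a complete graph given as a cost matrix; the matrix and the function name are hypothetical, and the approach is only usable for a handful of cities because it examines (n-1)! tours.

```python
from itertools import permutations

def tsp_brute_force(dist):
    """Try every tour from city 0; dist[i][j] is the cost of going from i to j."""
    n = len(dist)
    best_cost, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        cost = sum(dist[a][b] for a, b in zip(tour, tour[1:]))
        if cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost, best_tour

dist = [[0, 1, 4, 3],
        [1, 0, 2, 5],
        [4, 2, 0, 1],
        [3, 5, 1, 0]]
print(tsp_brute_force(dist))  # (7, (0, 1, 2, 3, 0))
```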

The Class P
• P is the class of all problems that can be solved in polynomial time on a deterministic computer.
• Polynomial means O(n^k) for some integer k given a problem of size n.
• A deterministic computer makes exactly one choice at any choice point.
• All single processor machines and machines with fixed parallelism are deterministic.

The Class NP
• NP is the class of all problems that can be solved in polynomial time on a nondeterministic computer.
• A nondeterministic computer always makes the correct choice at a choice point (one choice but never backs up).
• Alternatively: a nondeterministic computer makes k copies of itself to run in parallel at a k-wise choice point, for all values of k.
• Alternatively: a nondeterministic computer can explore a tree of depth d in O(d) time.

Instant Ph.D.
Just answer the question: Does P = NP? (Nobody knows)

NP-Completeness
• An NP-complete problem is one that can be solved in O(n^k) on a nondeterministic machine, and for which it can be shown that every problem in NP can be reduced to the NP-complete problem using a polynomial time transformation.
• Such proofs rely on the definition of a Turing machine.
• The concept of NP-completeness is important because:
  • Showing a polynomial deterministic solution for any NP-complete problem means P = NP.
  • Proving something is NP-complete (or NP-hard) means you’re not likely to find a polynomial algorithm.

Proving a Problem is NP-Complete
• Another way to show your problem is NP-complete is to show that it is in NP and that a known NP-complete problem can be reduced to it in polynomial time.
• E.g. the Hamiltonian Circuit problem is known to be NP-complete (find a cycle in a directed graph of n vertices that travels through each vertex exactly once and returns to the start).
• Let’s prove (very informally) that TSP is NP-complete.

Step 1: Show TSP is in NP
• Show TSP is in NP by giving a nondeterministic solution:
• Nondeterministically guess an ordering of the |V| vertices, check in polynomial time that it forms a cycle and total its cost, and choose the one with minimum cost.
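The part of this step that a deterministic machine can do quickly is checking one guessed tour. The Python sketch below assumes the decision form of the problem (is there a tour of cost at most some bound?) and a digraph stored as a dictionary from ordered pairs to weights; the name `verify_tour`, the bound parameter, and the example graph are illustrative assumptions.

```python
def verify_tour(weights, tour, bound):
    """Check in polynomial time that tour is a valid cycle of cost <= bound.

    weights maps each directed edge (u, v) to its cost.
    tour lists each vertex once; the return to tour[0] is implied.
    """
    vertices = {u for u, _ in weights} | {v for _, v in weights}
    if len(tour) != len(vertices) or set(tour) != vertices:
        return False  # some vertex is missing or repeated
    total = 0
    for u, v in zip(tour, tour[1:] + tour[:1]):
        if (u, v) not in weights:
            return False  # the guessed tour uses an edge that does not exist
        total += weights[(u, v)]
    return total <= bound

w = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1, ("d", "a"): 1, ("a", "c"): 1}
print(verify_tour(w, ["a", "b", "c", "d"], 4))  # True
print(verify_tour(w, ["a", "c", "b", "d"], 4))  # False
```

The check runs in time linear in the length of the tour, so all the difficulty lies in producing the right guess.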

Step 2: Reduce Hamiltonian Circuit to TSP
• Given a graph G = (V, E) we turn it into G_TSP by assigning a weight of 1 to each edge.
• Run our nondeterministic TSP algorithm seeking a path of cost |V|.
• G_TSP has a solution iff G has a Hamiltonian Circuit.
[Figure: graph G on vertices a, b, c, d and the corresponding G_TSP with weight 1 on every edge]
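The reduction itself is a single pass over the edges. In the Python sketch below, a brute-force search stands in for the nondeterministic TSP algorithm, so only the weighting step is polynomial; the function name and the example graphs are hypothetical.

```python
from itertools import permutations

def has_hamiltonian_circuit(vertices, edges):
    """Decide HC by converting to TSP and seeking a tour of cost |V|."""
    weights = {e: 1 for e in edges}   # the reduction: weight 1 on every edge
    target = len(vertices)
    first, rest = vertices[0], vertices[1:]
    # Brute-force stand-in for the nondeterministic TSP solver (exponential).
    for perm in permutations(rest):
        tour = [first, *perm, first]
        steps = list(zip(tour, tour[1:]))
        if all(s in weights for s in steps) and sum(weights[s] for s in steps) == target:
            return True
    return False

V = ["a", "b", "c", "d"]
ring = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")}
print(has_hamiltonian_circuit(V, ring))                 # True
print(has_hamiltonian_circuit(V, ring - {("d", "a")}))  # False
```

Because every edge costs 1, a tour through all |V| vertices costs exactly |V|, which is why the cost target and the existence of a Hamiltonian circuit coincide.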

Step 3: Proof by Contradiction
• Now assume TSP is not NP-complete, i.e. (informally) that TSP has a polynomial time solution.
• Then we can solve any instance of HC in polynomial time (just convert to TSP, run and read off answer). So HC is in P.
• But we know that HC is NP-complete, so HC is not in P unless P = NP (contradiction).
• Thus our assumption must be wrong and TSP is NP-complete.

Who Cares?
• Just because you can only think of an exponential solution to a problem doesn’t mean that there isn’t a polynomial time solution (remember the mutilated checkerboard?).
• If a problem is in P it is also in NP by definition (similarly, if it’s O(n^2) it’s also O(n^3), etc.)
• Reduction of a known NP-complete problem guarantees that there is no polytime solution unless P = NP.

What do you do with an NP-complete problem?
• Don’t bother looking for a polynomial time solution; go directly to heuristic search…
