Data Structures and Algorithms

Data Structures and Algorithms • Graphs I: Representation and Search • Gal A. Kaminka • Computer Science Department

Outline • Reminder: Graphs • Directed and undirected • Matrix representation of graphs • Directed and undirected • Sparse matrices and sparse graphs • Adjacency list representation

Graphs • A graph is a tuple <V, E> • V is a set of vertices • E is a binary relation on V • Each edge is a tuple <v1, v2>, where v1, v2 in V • |E| ≤ |V|^2

Directed and Undirected Graphs • Directed graph: • <v1, v2> in E is ordered, i.e., an ordered pair (v1, v2) • Undirected graph: • <v1, v2> in E is unordered, i.e., a set {v1, v2} • Degree of a node X: • Out-degree: number of edges <X, v2> • In-degree: number of edges <v1, X> • Degree: in-degree + out-degree • In an undirected graph: number of edges {X, v2}
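The degree definitions can be sketched in a few lines, assuming a directed graph stored as a plain list of (source, target) pairs. The function name `degrees` and the sample edges are illustrative choices, not part of the slides.

```python
from collections import Counter

def degrees(edges):
    """Return (in_degree, out_degree) counters for a directed edge list."""
    out_deg = Counter(u for u, _ in edges)   # edges <X, v2> leaving X
    in_deg = Counter(v for _, v in edges)    # edges <v1, X> entering X
    return in_deg, out_deg

# Illustrative graph (an assumption): 1 -> 3, 3 -> 1, 3 -> 2, self-loop 2 -> 2.
edges = [(1, 3), (3, 1), (3, 2), (2, 2)]
in_deg, out_deg = degrees(edges)

# Degree = in-degree + out-degree; a self-loop counts once in each.
total = {x: in_deg[x] + out_deg[x] for x in (1, 2, 3)}
print(total)
```

A `Counter` returns 0 for vertices with no incoming or outgoing edges, so isolated vertices need no special handling here.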

Examples of graphs

Paths • Path from vertex v_0 to vertex v_k: • A sequence of vertices <v_0, v_1, ..., v_k> • For all 0 ≤ i < k, the edge <v_i, v_{i+1}> exists • The path has length k • Two vertices x, y are adjacent if <x, y> is an edge • A path is simple if the vertices in the sequence are distinct • Cycle: a path with v_0 = v_k • <v, v> is a cycle of length 1
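As a rough illustration of these definitions, the checks below assume a directed graph given as a set of (source, target) pairs. All function names and the sample edges are hypothetical.

```python
def is_path(edges, seq):
    """True if every consecutive pair <v_i, v_{i+1}> in seq is an edge."""
    return all((a, b) in edges for a, b in zip(seq, seq[1:]))

def path_length(seq):
    """A path <v_0, ..., v_k> has length k: the number of edges it uses."""
    return len(seq) - 1

def is_simple(seq):
    """True if no vertex repeats in the sequence."""
    return len(set(seq)) == len(seq)

def is_cycle(edges, seq):
    """True if seq is a path with at least one edge that ends where it starts."""
    return len(seq) >= 2 and is_path(edges, seq) and seq[0] == seq[-1]

# Illustrative directed edges (an assumption, not a graph from the slides).
edges = {("a", "b"), ("b", "c"), ("c", "a"), ("d", "d")}
print(is_path(edges, ["a", "b", "c"]), path_length(["a", "b", "c"]))
print(is_cycle(edges, ["a", "b", "c", "a"]), is_cycle(edges, ["d", "d"]))
```

The self-loop `("d", "d")` gives the cycle of length 1 mentioned in the last bullet.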

Connected graphs • Undirected connected graph: • For any vertices x, y there exists a path xy (= yx) • Directed connected graph: • If the underlying undirected graph is connected • Strongly connected directed graph: • If for any two vertices x, y there exist a path xy and a path yx • Clique: a set of vertices in which every two vertices are adjacent • For a connected graph: |V| - 1 ≤ |E| ≤ |V|^2

Cycles and trees • Graph with no cycles: acyclic • Directed Acyclic Graph: DAG • Undirected forest: • Acyclic undirected graph • Tree: undirected acyclic connected graph • one connected component
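One possible way to tell these cases apart in code is sketched below. It uses a union-find (disjoint-set) structure, which is a technique swapped in for illustration and not something the slides introduce. It assumes a simple undirected graph (no repeated edges), and the name `classify` is hypothetical.

```python
def classify(vertices, edges):
    """Classify a simple undirected graph as 'tree', 'forest', or 'cyclic'."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links to the representative of v's component.
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps chains short
            v = parent[v]
        return v

    components = len(parent)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            # u and v were already connected, so this edge closes a cycle.
            return "cyclic"
        parent[ru] = rv          # merge the two components
        components -= 1
    # Acyclic: a tree if there is exactly one connected component.
    return "tree" if components == 1 else "forest"

print(classify([1, 2, 3], [(1, 2), (2, 3)]))
print(classify([1, 2, 3, 4], [(1, 2), (3, 4)]))
print(classify([1, 2, 3], [(1, 2), (2, 3), (3, 1)]))
```

Note that a tree is also a forest; the function simply reports the more specific label.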

Representing graphs • Adjacency matrix: • When the graph is dense • |E| close to |V|^2 • Adjacency lists: • When the graph is sparse • |E| << |V|^2

Adjacency Matrix • Matrix of size |V| x |V| • Each row (and column) j corresponds to a distinct vertex j • “1” in cell <i, j> if there exists an edge <i, j> • Otherwise, “0” • In an undirected graph, “1” in <i, j> implies “1” in <j, i> • “1” in <j, j> means there is a self-loop at vertex j

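Examples
[Figure: drawings of two graphs, one on vertices 1, 2, 3 and one on vertices 1, 2, 3, 4]

Graph 1 (vertices 1, 2, 3):
      1 2 3
    1 0 0 1
    2 0 1 0
    3 1 1 0

Graph 2 (vertices 1, 2, 3, 4):
      1 2 3 4
    1 0 1 1 0
    2 1 0 0 0
    3 1 0 0 0
    4 0 0 0 0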
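To make the construction concrete, here is a minimal sketch that builds such matrices from an edge list. The edge lists are read off the two example matrices above. Treating the second graph as undirected is an assumption that its symmetric matrix permits, and `adjacency_matrix` is a hypothetical name.

```python
def adjacency_matrix(n, edges, directed=True):
    """Build an n x n 0/1 matrix for vertices numbered 1..n."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i - 1][j - 1] = 1          # "1" in cell <i, j>
        if not directed:
            a[j - 1][i - 1] = 1      # undirected: mirror across the diagonal
    return a

# Graph 1: edges 1->3, 2->2 (self-loop), 3->1, 3->2.
graph1 = adjacency_matrix(3, [(1, 3), (2, 2), (3, 1), (3, 2)])
# Graph 2: undirected edges {1,2} and {1,3}; vertex 4 is isolated.
graph2 = adjacency_matrix(4, [(1, 2), (1, 3)], directed=False)

for row in graph1:
    print(row)
```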

Adjacency matrix features • Storage complexity: O(|V|^2) • But can use a bit-vector representation • Undirected graph: symmetric along the main diagonal • A^T is the transpose of A • Undirected: A = A^T • In-degree of X: sum along column X, O(|V|) • Out-degree of X: sum along row X, O(|V|) • Very simple, good for small graphs • Edge existence query: O(1)
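These operations might look as follows on a matrix stored as a list of rows. This is a sketch with 0-based indices and hypothetical function names; the sample matrix is the 3-vertex example from the previous slide.

```python
def out_degree(a, x):
    return sum(a[x])                      # sum along row x: O(|V|)

def in_degree(a, x):
    return sum(row[x] for row in a)       # sum down column x: O(|V|)

def is_symmetric(a):
    """True if A equals its transpose, as it must for an undirected graph."""
    n = len(a)
    return all(a[i][j] == a[j][i] for i in range(n) for j in range(i + 1, n))

def has_edge(a, i, j):
    return a[i][j] == 1                   # single cell lookup: O(1)

a = [[0, 0, 1],
     [0, 1, 0],
     [1, 1, 0]]

print(out_degree(a, 2), in_degree(a, 1), is_symmetric(a), has_edge(a, 0, 2))
```

Index 2 here is vertex 3 of the slide, since the slides number vertices from 1.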

But... • Many graphs in practical problems are sparse • Not many edges: not all pairs x, y have an edge xy • Matrix representation demands too much memory • We want to reduce the memory footprint • Use sparse matrix techniques

Adjacency List • An array Adj[ ] of size |V| • Each cell holds a list for the associated vertex • Adj[u] is the list of all vertices adjacent to u • The list does not have to be sorted • Undirected graphs: • Each edge is represented twice

Examples 1 1 2 1 3 2 2 3 1  2 3 2 3 4 1 2  3 2 1 3 1 4

Adjacency list features • Storage complexity: • O(|V| + |E|) • In an undirected graph: O(|V| + 2|E|) = O(|V| + |E|) • Edge existence query: • O(|V|) in the worst case • Degree of node X: • Out-degree: length of Adj[X], an O(|V|) calculation • In-degree: check all Adj[ ] lists, O(|V| + |E|) • Can be done in O(1) with some auxiliary information!
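The last bullet can be illustrated with a small sketch in which the "auxiliary information" is assumed to be a per-vertex in-degree counter updated on every insertion. The class `Graph` and its method names are hypothetical.

```python
class Graph:
    """Directed graph on adjacency lists, with cached in-degree counts."""

    def __init__(self, vertices):
        self.adj = {v: [] for v in vertices}
        self.in_count = {v: 0 for v in vertices}   # auxiliary information

    def add_edge(self, u, v):
        self.adj[u].append(v)
        self.in_count[v] += 1

    def has_edge(self, u, v):
        return v in self.adj[u]          # scans Adj[u]: O(|V|) worst case

    def out_degree(self, u):
        return len(self.adj[u])          # length of Adj[u]

    def in_degree_scan(self, v):
        # Without auxiliary data: check every list, O(|V| + |E|).
        return sum(lst.count(v) for lst in self.adj.values())

    def in_degree(self, v):
        return self.in_count[v]          # O(1) lookup

g = Graph([1, 2, 3])
for u, v in [(1, 3), (2, 2), (3, 1), (3, 2)]:
    g.add_edge(u, v)
print(g.has_edge(3, 2), g.out_degree(3), g.in_degree_scan(2), g.in_degree(2))
```

Python lists store their length, so `len` is already O(1) here; the O(|V|) figure on the slide corresponds to walking a linked list to count its entries.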

Questions?

Graph Traversals (Search) • We have covered some of these with binary trees • Breadth-first search (BFS) • Depth-first search (DFS) • A traversal (search): • An algorithm for systematically exploring a graph • Visiting (all) vertices • Until finding a goal vertex or until there are no more vertices • Note: only for connected graphs

Breadth-first search • One of the simplest algorithms • Also one of the most important • It forms the basis for MANY graph algorithms

BFS: Level-by-level traversal • Given a starting vertex s • Visit all vertices at increasing distance from s • Visit all vertices at distance k from s • Then visit all vertices at distance k+1 from s • And so on

BFS in a binary tree (reminder) • BFS: visit all siblings before their descendants • [Figure: binary tree with nodes 5, 2, 1, 3, 8, 6, 10, 7, 9] • BFS visiting order: 5 2 8 1 3 6 10 7 9

BFS(tree t)
    q ← new queue
    enqueue(q, t)
    while (not empty(q))
        curr ← dequeue(q)
        visit curr                              // e.g., print curr.datum
        if curr.left exists: enqueue(q, curr.left)
        if curr.right exists: enqueue(q, curr.right)
This version is for binary trees only!
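A runnable sketch of the tree version is given below. The `Node` class and the shape of the sample tree are assumptions: the tree is built so that its level order matches the visiting order 5 2 8 1 3 6 10 7 9 from the reminder slide, but the original figure is not available to confirm the exact shape.

```python
from collections import deque

class Node:
    def __init__(self, datum, left=None, right=None):
        self.datum = datum
        self.left = left
        self.right = right

def bfs_tree(root):
    """Return the node data of a binary tree in level order."""
    order = []
    q = deque()
    if root is not None:
        q.append(root)
    while q:
        curr = q.popleft()                # dequeue from the front
        order.append(curr.datum)          # "visit" the node
        for child in (curr.left, curr.right):
            if child is not None:         # skip missing children
                q.append(child)
    return order

# Assumed tree shape (hypothetical reconstruction of the figure).
root = Node(5,
            Node(2, Node(1), Node(3)),
            Node(8, Node(6, None, Node(7)), Node(10, Node(9))))
print(bfs_tree(root))
```

`collections.deque` is used because removing from the front of a plain list takes time proportional to its length.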

BFS for general graphs • This version assumes vertices have two children • left, right • This is trivial to fix • But it is still no good for general graphs • It does not handle cycles • Example:

[Figure, repeated on each step: example graph with vertices A, B, C, D, E, F, G] • Queue: A • Start with A. Put it in the queue (marked red in the figure).

Queue: A B E • B and E are next.

Queue: A B E C G D F • When we go to B, we put G and C in the queue. When we go to E, we put D and F in the queue.

Queue: A B E C G D F F • Suppose we now want to expand C. We put F in the queue again!

Generalizing BFS • Cycles: • We need to save auxiliary information • Each node needs to be marked • Visited: no need to put it on the queue • Not visited: put it on the queue when found • What about assuming only two child vertices? • Need to put all adjacent vertices in the queue

BFS(graph g, vertex s)
    unmark all vertices in g
    q ← new queue
    mark s
    enqueue(q, s)
    while (not empty(q))
        curr ← dequeue(q)
        visit curr                  // e.g., print its data
        for each edge <curr, V>
            if V is unmarked
                mark V
                enqueue(q, V)
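The pseudocode translates almost line for line into Python. In the sketch below the graph is a dict of adjacency lists. Its edges and neighbor order are an assumption, reconstructed from the step-by-step example later in these slides, since the figure itself is not available.

```python
from collections import deque

def bfs(adj, s):
    """Return the vertices reachable from s in breadth-first order."""
    marked = {s}                  # all other vertices start unmarked
    q = deque([s])
    order = []
    while q:
        curr = q.popleft()
        order.append(curr)        # "visit" curr
        for v in adj[curr]:       # each edge <curr, v>
            if v not in marked:
                marked.add(v)     # mark when enqueued, so nothing is queued twice
                q.append(v)
    return order

# Assumed example graph (undirected, so each edge appears in two lists).
graph = {
    "A": ["B", "E"],
    "B": ["A", "C", "G"],
    "C": ["B", "F"],
    "D": ["F", "E"],
    "E": ["A", "D", "F"],
    "F": ["E", "D", "C"],
    "G": ["B"],
}
print(bfs(graph, "A"))
```

Marking a vertex at the moment it is enqueued, and not when it is dequeued, is what prevents F from entering the queue a second time.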

The general BFS algorithm • Each vertex can be in one of three states: • Unmarked and not on queue • Marked and on queue • Marked and off queue • The algorithm moves vertices between these states

Handling vertices • Unmarked and not on queue: • Not reached yet • Marked and on queue: • Known, but adjacent vertices not visited yet (possibly) • Marked and off queue: • Known, all adjacent vertices on queue or done with

[Figure, repeated on each step: example graph with vertices A, B, C, D, E, F, G] • Queue: A • Start with A. Mark it.

Dequeued: A • Queue: B E • Expand A’s adjacent vertices. Mark them and put them in the queue.

Dequeued: A B • Queue: E C G • Now take B off the queue, and queue its neighbors.

Dequeued: A B E • Queue: C G D F • Do the same with E.

Dequeued: A B E C • Queue: G D F • Visit C. Its neighbor F is already marked, so it is not queued.

Dequeued: A B E C G • Queue: D F • Visit G.

Dequeued: A B E C G D • Queue: F • Visit D. F and E are marked, so they are not queued.

Dequeued: A B E C G D F • Queue: empty • Visit F. E, D and C are marked, so they are not queued again.

Dequeued: A B E C G D F • Queue: empty • Done. We have explored the graph in order: A B E C G D F.

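Interesting features of BFS • Complexity: O(|V| + |E|) • Every vertex reachable from s is put on the queue exactly once • For each vertex taken off the queue, we expand its edges • In other words, we examine each adjacency-list entry once (so each undirected edge is seen twice) • BFS finds a shortest path from s to each vertex • Shortest in terms of number of edges • Why does this work? The queue is first-in, first-out, so vertices leave it in nondecreasing distance from s, and each vertex is first marked from a vertex one edge closer to s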
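The shortest-path property can be made concrete by recording, for each vertex, its distance and the vertex it was discovered from. The following is a sketch with hypothetical names (`bfs_shortest_paths`, `path_to`); the sample graph is an assumed reconstruction of the slides' example graph.

```python
from collections import deque

def bfs_shortest_paths(adj, s):
    """Return (dist, parent) maps for all vertices reachable from s."""
    dist = {s: 0}                 # dist doubles as the "marked" set
    parent = {s: None}
    q = deque([s])
    while q:
        curr = q.popleft()
        for v in adj[curr]:
            if v not in dist:
                dist[v] = dist[curr] + 1   # one edge further than curr
                parent[v] = curr
                q.append(v)
    return dist, parent

def path_to(parent, t):
    """Follow parent links back from t; assumes t was reached by the search."""
    path = []
    while t is not None:
        path.append(t)
        t = parent[t]
    return path[::-1]

# Assumed example graph (undirected adjacency lists).
graph = {
    "A": ["B", "E"], "B": ["A", "C", "G"], "C": ["B", "F"],
    "D": ["F", "E"], "E": ["A", "D", "F"], "F": ["E", "D", "C"],
    "G": ["B"],
}
dist, parent = bfs_shortest_paths(graph, "A")
print(dist["F"], path_to(parent, "F"))
```

The running time stays O(|V| + |E|), since the only extra work is two dictionary writes per discovered vertex.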

Depth-first search • Again, a simple and powerful algorithm • Given a starting vertex s • Pick an adjacent vertex, visit it • Then visit one of its adjacent vertices • And so on • Until this is impossible; then backtrack and visit another adjacent vertex

DFS(graph g, vertex s)
Assume all vertices are initially unmarked
    mark s
    visit s                     // e.g., print its data
    for each edge <s, V>
        if V is not marked
            DFS(g, V)
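A direct Python rendering of the recursive pseudocode is sketched below. The graph's edges and neighbor order are an assumption, chosen to be consistent with the step-by-step example on the following slides. The extra `marked` and `order` parameters are an implementation convenience, not part of the slides.

```python
def dfs(adj, s, marked=None, order=None):
    """Return the vertices reachable from s in depth-first order."""
    if marked is None:            # first call: all vertices start unmarked
        marked, order = set(), []
    marked.add(s)
    order.append(s)               # "visit" s
    for v in adj[s]:              # each edge <s, v>
        if v not in marked:
            dfs(adj, v, marked, order)   # go deeper before trying siblings
    return order

# Assumed example graph (undirected adjacency lists).
graph = {
    "A": ["B", "E"],
    "B": ["A", "C", "G"],
    "C": ["B", "F"],
    "D": ["F", "E"],
    "E": ["A", "D", "F"],
    "F": ["E", "D", "C"],
    "G": ["B"],
}
print(dfs(graph, "A"))
```

On this graph the sketch visits A, B, C, F, E, D and then, after backtracking to B, reaches G. For very deep graphs, Python's default recursion limit may require an explicit stack in place of recursion.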

[Figure, repeated on each step: example graph with vertices A, B, C, D, E, F, G] • Current vertex: A • Start with A. Mark it.

Current vertex: B • Expand A’s adjacent vertices. Pick one (B). Mark it and visit it recursively.

Current vertex: C • Now expand B, and visit its neighbor, C.

Current vertex: F • Visit F. Pick one of its neighbors, E.

Current vertex: E • E’s adjacent vertices are A, D and F. A and F are marked, so pick D.

Current vertex: D • Visit D. No new vertices available. Backtrack to E. Backtrack to F. Backtrack to C. Backtrack to B.
