ITEC 2620M Introduction to Data Structures Instructor: Prof. Z. Yang Course Website: http://people.math.yorku.ca/~zyang/itec2620m.htm Office: DB 3049
Key Points • Graph Algorithms • Definitions, representations, analysis • Shortest paths • Minimum-cost spanning tree
Basic Definitions • A graph G = ( V, E ) consists of a set of vertices V and a set of edges E – each edge in E connects a pair of vertices in V. • Graphs can be directed or undirected – in a directed graph, each edge is drawn as an arrow, and the first vertex is the source. • Graphs may be weighted – each edge carries a numeric weight, and weights combine with the definitions above. • A vertex vi is adjacent to another vertex vj if they are connected by an edge in E. These vertices are neighbors. • A path is a sequence of vertices in which each vertex is adjacent to its predecessor and successor. • The length of a path is the number of edges in it. • The cost of a path is the sum of the edge weights in the path.
Basic Definitions (Cont’d) • A cycle is a path of length greater than one that begins and ends at the same vertex. • A simple cycle is a cycle of length three or more that does not visit any vertex (except the start/finish) more than once. • Two vertices are connected if there is a path between them. • A subset of vertices S is a connected component of G if there is a path from each vertex vi to every other distinct vertex vj in S. • The degree of a vertex is the number of edges incident to it – for a simple graph, the number of vertices it is connected to. • A graph is acyclic if it has no cycles (e.g. a tree). • A directed graph is called a digraph; a directed acyclic graph is called a DAG.
Representations • The adjacency matrix of graph G = ( V, E ) for vertices numbered 0 to n-1 is an n x n matrix M where M[i][j] is 1 if there is an edge from vi to vj, and 0 otherwise. • The adjacency list of graph G = ( V, E ) for vertices numbered 0 to n-1 consists of an array of n linked lists. The ith linked list includes the node j if there is an edge from vi to vj. • Example
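The two representations above can be sketched as follows – a minimal Python example for an assumed four-vertex directed graph (the vertex count and edge set are chosen purely for illustration):

```python
# Vertices are numbered 0..3; each edge is a directed pair (i, j).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency matrix: an n x n matrix M where M[i][j] is 1
# if there is an edge from vi to vj, and 0 otherwise.
matrix = [[0] * n for _ in range(n)]
for i, j in edges:
    matrix[i][j] = 1

# Adjacency list: an array of n lists; list i holds j
# for every edge from vi to vj.
adj_list = [[] for _ in range(n)]
for i, j in edges:
    adj_list[i].append(j)

print(matrix[0][2])   # 1: there is an edge from v0 to v2
print(adj_list[2])    # [3]: v2's only outgoing edge goes to v3
```

For an undirected graph, each edge would be recorded in both directions (M[i][j] = M[j][i] = 1, and j appended to list i as well as i to list j).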
Comparisons and Analysis • Space • adjacency matrix uses O(|V|²) space • adjacency list uses O(|V| + |E|) space (note: pointer overhead) • better for sparse graphs (graphs with few edges) • Access Time • Is there an edge connecting vi to vj? • adjacency matrix – O(1) • adjacency list – O(d), where d is the degree of vi • Visit all edges incident to vi • adjacency matrix – O(|V|) • adjacency list – O(d) • The primary operation of the algorithm and the density of the graph determine the more efficient data structure. • complete graphs should use an adjacency matrix • traversals of sparse graphs should use an adjacency list
Spanning Tree and Shortest Paths • Minimum-Cost Spanning Tree • assume weighted (undirected) connected graph • use Prim’s algorithm (a greedy algorithm) • from visited vertices, pick least-cost edge to an unvisited vertex • Shortest Paths • assume weighted (undirected) connected graph • use Dijkstra’s algorithm (a greedy algorithm) • build paths from unvisited vertex with least current cost
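The greedy path-building step of Dijkstra's algorithm can be sketched as below – a minimal Python version using a priority queue; the example graph and its weights are assumed for illustration:

```python
import heapq

def dijkstra(adj, source):
    """Least path cost from source to every reachable vertex.
    adj[v] is a list of (neighbor, weight) pairs."""
    dist = {source: 0}
    heap = [(0, source)]              # (current path cost, vertex)
    while heap:
        d, v = heapq.heappop(heap)    # unvisited vertex with least cost
        if d > dist.get(v, float("inf")):
            continue                  # stale queue entry; skip
        for u, w in adj[v]:
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist

# Undirected example graph: each edge appears in both adjacency lists.
adj = {
    0: [(1, 4), (2, 1)],
    1: [(0, 4), (2, 2), (3, 5)],
    2: [(0, 1), (1, 2), (3, 8)],
    3: [(1, 5), (2, 8)],
}
print(dijkstra(adj, 0))  # {0: 0, 2: 1, 1: 3, 3: 8}
```

Prim's algorithm has the same greedy shape; the difference is that it prioritizes by the single edge weight out of the visited set rather than by total path cost from the source.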
Key Points • Hash tables • Hash functions • Collision resolution and clustering • Deletions
Indices vs. Keys • Each key/record is associated with an array slot. • We could map each key to a slot. • e.g. last name to apartment number • We could then search either the array (unsorted?) or a look-up table (sorted?). • However, what if the look-up is actually a calculated function? • eliminate the look-up!
Hash Functions • A hash function h() converts a key (integer, string, float, etc) into a table index. • Example
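A minimal sketch of such a conversion – a simple folding hash for strings that sums character codes and takes the remainder modulo the table size (the function and table size here are illustrative, not a recommended production hash):

```python
def hash_string(key: str, table_size: int) -> int:
    """Illustrative folding hash: sum the character codes of the key,
    then reduce modulo the table size to get a valid index."""
    return sum(ord(c) for c in key) % table_size

# ord('y') + ord('a') + ord('n') + ord('g') = 431; 431 % 101 = 27
print(hash_string("yang", 101))  # 27
```

For integer keys the simplest analogue is `key % table_size`; either way the result is always a valid index into a table of the given size.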
Hash Tables • Records are stored in slots specified by a hash function. • Look-up/store • Convert the key into a table index with hash function h() • h(key) = index • Find the record/empty slot starting at index = h(key) (use the resolution policy if necessary)
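The look-up/store scheme above can be sketched as a minimal open-addressing table with linear probing as the resolution policy (class name and fixed table size are assumptions for illustration; Python's built-in `hash()` stands in for h()):

```python
class HashTable:
    """Minimal open-addressing hash table with linear probing."""

    def __init__(self, size=11):
        self.size = size
        self.slots = [None] * size   # each slot holds a (key, record) pair

    def _probe(self, key):
        # Start at the home position h(key); the ith probe is i slots away.
        home = hash(key) % self.size
        for step in range(self.size):
            yield (home + step) % self.size

    def store(self, key, record):
        for i in self._probe(key):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, record)
                return
        raise RuntimeError("table full")

    def lookup(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:
                return None          # empty slot ends the probe sequence
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None

t = HashTable()
t.store("ada", 1)
t.store("bob", 2)
print(t.lookup("ada"))  # 1
```

Note that an empty slot terminates a look-up early – which is exactly why deletions (later in the key points) need special handling, such as tombstone markers.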
Comments • Hash function should evenly distribute keys across table. • not easy given unspecified input data distribution • Hash table should be about half full. • note: time-space tradeoff • more space -> less time (and already twice as much space as a sorted array) • if half full, 50% chance of one collision • 25% chance of two collisions • etc... • 2 accesses on average (approaches n as table fills)
How to do better • What to do with collisions? • linear probing (“classic hashing”) • on a collision, search subsequent slots sequentially • To eliminate clustering, we would like each remaining slot to have equal probability. • Can’t use true randomness – the probe sequence needs to be reproducible. • Pseudo-random probing (see text) • Goal of random probing? --> cause divergence • Probe sequences should not all follow the same path.
Quadratic Probing • Simple divergence method • Linear probing – ith probe is i slots away • Quadratic probing – ith probe is i² slots away
Secondary Clustering • If multiple keys are hashed to the same index/home position, quadratic probing still follows the same path each time. • This is secondary clustering. • Use a second hash function to determine the probe sequence (double hashing).
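The three probe sequences can be compared side by side – a minimal sketch with an assumed table size of 11 (in double hashing, `step` would come from a second hash function h2(key) and must be nonzero so the sequence actually moves):

```python
def linear_probe(home, i, size):
    # ith probe is i slots from the home position
    return (home + i) % size

def quadratic_probe(home, i, size):
    # ith probe is i**2 slots from the home position
    return (home + i * i) % size

def double_hash_probe(home, i, step, size):
    # step = h2(key): keys sharing a home position but with
    # different h2 values follow different probe paths
    return (home + i * step) % size

size = 11
print([quadratic_probe(3, i, size) for i in range(5)])       # [3, 4, 7, 1, 8]
print([double_hash_probe(3, i, 5, size) for i in range(5)])  # step 5
print([double_hash_probe(3, i, 7, size) for i in range(5)])  # step 7
```

The two double-hashing sequences share the home position 3 but then diverge, which is exactly what quadratic probing fails to achieve for keys with the same home position.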