
ITEC 2620M Introduction to Data Structures


Presentation Transcript


  1. ITEC 2620M Introduction to Data Structures Instructor: Prof. Z. Yang Course Website: http://people.math.yorku.ca/~zyang/itec2620m.htm Office: DB 3049

  2. Graphs

  3. Key Points • Graph Algorithms • Definitions, representations, analysis • Shortest paths • Minimum-cost spanning tree

  4. Basic Definitions • A graph G = ( V, E ) consists of a set of vertices V and a set of edges E – each edge in E connects a pair of vertices in V. • Graphs can be directed or undirected. • redraw above with arrows – first vertex is source • Graphs may be weighted. • redraw above with weights, combine definitions • A vertex vi is adjacent to another vertex vj if they are connected by an edge in E. These vertices are neighbors. • A path is a sequence of vertices in which each vertex is adjacent to its predecessor and successor. • The length of a path is the number of edges in it. • The cost of a path is the sum of the edge weights in the path.

  5. Basic Definitions (Cont’d) • A cycle is a path of length greater than one that begins and ends at the same vertex. • A simple cycle is a cycle of length at least three that does not visit any vertex (except the start/finish) more than once. • Two vertices are connected if there is a path between them. • A subset of vertices S is a connected component of G if there is a path from each vertex vi to every other distinct vertex vj in S. • The degree of a vertex is the number of edges incident to it – the number of vertices that it is connected to. • A graph is acyclic if it has no cycles (e.g., a tree). • A directed acyclic graph is called a DAG. (A digraph is simply a directed graph, which need not be acyclic.)

  6. Representations • The adjacency matrix of graph G = ( V, E ) for vertices numbered 0 to n-1 is an n x n matrix M where M[i][j] is 1 if there is an edge from vi to vj, and 0 otherwise. • The adjacency list of graph G = ( V, E ) for vertices numbered 0 to n-1 consists of an array of n linked lists. The ith linked list includes the node j if there is an edge from vi to vj. • Example
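
To make the two representations concrete, here is a minimal Java sketch that builds both structures for an undirected graph with vertices numbered 0 to n-1. The class and method names are illustrative assumptions, not code from the course materials.

```java
import java.util.ArrayList;
import java.util.List;

class GraphRepresentations {

    // Adjacency matrix: matrix[i][j] == 1 iff there is an edge from vi to vj.
    static int[][] buildMatrix(int n, int[][] edges) {
        int[][] matrix = new int[n][n];
        for (int[] e : edges) {
            matrix[e[0]][e[1]] = 1;
            matrix[e[1]][e[0]] = 1;   // drop this line for a directed graph
        }
        return matrix;
    }

    // Adjacency list: adj.get(i) holds the neighbours of vertex vi.
    static List<List<Integer>> buildList(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);  // drop this line for a directed graph
        }
        return adj;
    }
}
```

The matrix allocates a slot for every possible edge, while the list allocates one node per actual edge endpoint; that is the space trade-off analysed on the next slide.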

  7. Comparisons and Analysis • Space • adjacency matrix uses O(|V|²) space (fixed regardless of the number of edges) • adjacency list uses O(|V| + |E|) space (note: pointer overhead) • better for sparse graphs (graphs with few edges) • Access Time • Is there an edge connecting vi to vj? • adjacency matrix – O(1) • adjacency list – O(d), where d is the degree of vi • Visit all edges incident to vi • adjacency matrix – O(n) • adjacency list – O(d) • The primary operation of the algorithm and the density of the graph determine the more efficient data structure. • complete graphs should use an adjacency matrix • traversals of sparse graphs should use an adjacency list

  8. Spanning Tree and Shortest Paths • Minimum-Cost Spanning Tree • assume weighted (undirected) connected graph • use Prim’s algorithm (a greedy algorithm) • from visited vertices, pick least-cost edge to an unvisited vertex • Shortest Paths • assume weighted (undirected) connected graph • use Dijkstra’s algorithm (a greedy algorithm) • build paths from unvisited vertex with least current cost
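
As an illustration of the greedy approach, here is a hedged Java sketch of Dijkstra's algorithm over a weighted adjacency matrix; it assumes non-negative weights, treats a weight of 0 as "no edge", and the names (INF, dist, visited) are illustrative only.

```java
import java.util.Arrays;

class Dijkstra {
    static final int INF = Integer.MAX_VALUE;

    // Returns the least path cost from source to every vertex.
    static int[] shortestPaths(int[][] weight, int source) {
        int n = weight.length;
        int[] dist = new int[n];
        boolean[] visited = new boolean[n];
        Arrays.fill(dist, INF);
        dist[source] = 0;

        for (int iter = 0; iter < n; iter++) {
            // Greedy step: take the unvisited vertex with the least current cost.
            int u = -1;
            for (int v = 0; v < n; v++)
                if (!visited[v] && dist[v] != INF && (u == -1 || dist[v] < dist[u]))
                    u = v;
            if (u == -1) break;          // remaining vertices are unreachable
            visited[u] = true;

            // Update the cost of every neighbour of u through u.
            for (int v = 0; v < n; v++)
                if (weight[u][v] != 0 && !visited[v] && dist[u] + weight[u][v] < dist[v])
                    dist[v] = dist[u] + weight[u][v];
        }
        return dist;
    }
}
```

Prim's algorithm has the same shape; the only change is the greedy key, which becomes the weight of the single cheapest edge from the visited set rather than the total path cost from the source.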

  9. HASHING

  10. Key Points • Hash tables • Hash functions • Collision resolution and clustering • Deletions

  11. Indices vs. Keys • Each key/record is associated with an array slot. • We could map each key to each slot. • e.g. last name to apartment number • We could then search either the array (unsorted?) or a look-up table (sorted?). • However, what if the look-up is actually a calculated function? • eliminate look-up!

  12. Hash Functions • A hash function h() converts a key (integer, string, float, etc.) into a table index. • Example
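
For illustration, a minimal Java sketch of two simple hash functions, one for integer keys and one for string keys. The table size M and the folding constant 31 are assumptions for the example, not functions prescribed by the course text.

```java
class HashFunctions {

    // Integer key: division (modulo) method; M is the table size, ideally prime.
    static int hash(int key, int M) {
        return Math.floorMod(key, M);
    }

    // String key: fold the characters into one integer, then take it mod M.
    static int hash(String key, int M) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++)
            sum = sum * 31 + key.charAt(i);
        return Math.floorMod(sum, M);
    }
}
```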

  13. Hash Tables • Records are stored in slots specified by a hash function. • Look-up/store • Convert key into a table index with hash function h() • h(key) = index • Find record/empty slot starting at index = h(key) (use a resolution policy if necessary)

  14. Comments • Hash function should evenly distribute keys across table. • not easy given unspecified input data distribution • Hash table should be about half full. • note: time-space tradeoff • more space -> less time (and already twice as much space as a sorted array) • if half full, 50% chance of one collision • 25% chance of two collisions • etc... • 2 accesses on average (approaches n as the table fills)

  15. How to do better • What to do with collisions? • linear probing (“classic hashing”) • if collision, search slots sequentially • To eliminate clustering, we would like each remaining slot to have equal probability. • Can’t use true randomness – the probe sequence needs to be reproducible. • Pseudo-random probing (see text) • Goal of random probing? --> cause divergence • Probe sequences should not all follow the same path.
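
To tie the lookup/store procedure of slide 13 to the linear-probing policy above, here is a minimal Java sketch; the table size, the integer key type, and the use of null for empty slots are assumptions for the example.

```java
class LinearProbingTable {
    private static final int M = 101;              // table size (assumed prime)
    private final Integer[] keys = new Integer[M]; // null marks an empty slot

    private int h(int key) { return Math.floorMod(key, M); }

    void insert(int key) {
        int home = h(key);
        // Linear probing: if the home slot is taken, search slots sequentially.
        for (int i = 0; i < M; i++) {
            int slot = (home + i) % M;
            if (keys[slot] == null || keys[slot] == key) {
                keys[slot] = key;
                return;
            }
        }
        throw new IllegalStateException("table is full");
    }

    boolean search(int key) {
        int home = h(key);
        for (int i = 0; i < M; i++) {
            int slot = (home + i) % M;
            if (keys[slot] == null) return false;  // reached an empty slot: key absent
            if (keys[slot] == key) return true;
        }
        return false;
    }
}
```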

  16. Quadratic Probing • Simple divergence method • Linear probing – ith probe is i slots away • Quadratic probing – ith probe is i² slots away from the home position
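
For comparison, a small sketch of the two probe functions, assuming a home position home and table size M; each probe is taken mod M so the sequence wraps around the table.

```java
class ProbeFunctions {
    // Linear probing: the ith probe is i slots past the home position.
    static int linearProbe(int home, int i, int M) {
        return (home + i) % M;
    }

    // Quadratic probing: the ith probe is i*i slots past the home position.
    static int quadraticProbe(int home, int i, int M) {
        return (home + i * i) % M;
    }
}
```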

  17. Secondary Clustering • If multiple keys are hashed to the same index/home position, quadratic probing still follows the same path each time. • This is secondary clustering. • Use a second hash function to determine the probe sequence (double hashing).
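
A minimal sketch of double hashing under that idea: a second hash function h2 sets the step size, so two keys with the same home position still follow different probe sequences. h2 must never return 0, and the particular h1 and h2 below are illustrative assumptions.

```java
class DoubleHashing {
    static int h1(int key, int M) { return Math.floorMod(key, M); }

    // Second hash function: always returns a step in the range 1 .. M-1.
    static int h2(int key, int M) { return 1 + Math.floorMod(key, M - 1); }

    // Slot examined on the ith probe for the given key.
    static int probe(int key, int i, int M) {
        return (h1(key, M) + i * h2(key, M)) % M;
    }
}
```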
