91.102 - Computing II
Graphs. These are generalizations of trees: they have vertices, V, and edges, E, but they don’t have a “distinguished vertex” called the “root”.
[Figure: a tree with its root marked, beside two graphs.]
Vertices: these are just nodes (structures) where some information may be kept. They are usually objects all of the same type. Edges: these are “pairs of vertices”: unordered ({vi, vj}, i ≠ j, with {vi, vj} = {vj, vi}) in the case of an undirected graph; ordered ((vi, vj), i ≠ j, with (vi, vj) ≠ (vj, vi)) in the case of a directed graph.
What kind of operations must we specify?
1) Create an empty graph.
2) Check if a graph is empty.
3) Insert a vertex.
4) Insert an edge between two existing vertices.
5) Delete an edge, if it exists.
6) Delete a vertex and all the edges (pairs) it belongs to.
With trees, the only place we stored information was in the tree nodes: the edges contained only the parent-child relationship and nothing else. With graphs, we choose to store information BOTH in the vertices AND in the edges. Example: you own the garbage collection company that won the contract for collecting the garbage in town Xyz. How do you decide on the least expensive route (thus providing you with the most profit), while collecting garbage 5 days a week, with a given minimum collected each day? First of all, how do we represent the town so that we can even ask the question in a way that might be answerable?
One possibility: every street intersection and every dead end of a cul-de-sac is represented by a VERTEX; every section of street between two intersections is represented by an EDGE.
[Figure: part of the town drawn as a graph; single arrows mark one-way streets.]
What’s the relevant information?
A) The length of each street section.
B) The number (and type) of houses on the section (to estimate the amount of garbage generated).
Do the vertices matter, or should they contain some information? Probably not… at least not for now, at this level of representation.
[Figure: the street graph, with edges labeled by lengths such as 500 ft, 1050 ft, 650 ft, 600 ft and 450 ft.]
Problem: find a route that covers every street of the town at least once and has the shortest total distance. You would usually have a particular location (maybe more than one) from which your “tour” must start: the street (or streets) by which your trucks enter and exit the town.
What’s at stake? The contractor with the most accurate estimate of costs can come up with the best bid that is still likely to make money… at least in a perfect world… Other uses: networks of tasks, task precedence, time-to-completion, etc. These are very useful for production schedules and for identifying which tasks are on “critical paths”: the tasks for which any delay will delay the delivery of the final product. We will see more applications as we proceed.
Thought: what does all this imply about the other functions that will make up the Graph ADT? There may be quite a few… and they may be fairly complicated, although they will be directed towards answering questions about graphs, rather than just “graph maintenance”.
Some definitions: G = (V, E). A graph G is an ordered pair of sets: a set V of vertices and a set E of edges. E is a set of pairs of vertices: ordered pairs in the case of a directed graph, unordered pairs otherwise. Two vertices vi and vj, i ≠ j, in a graph G = (V, E) are adjacent if there exists an edge e ∈ E such that e = (vi, vj) or e = (vj, vi) (or e = {vi, vj} in an undirected graph). A path p in a graph G = (V, E) is a sequence v1v2…vn of vertices from V, with n ≥ 2, such that each vertex vi is adjacent to the next vertex vi+1 of the sequence. A cycle is a path p = v1v2…vn such that v1 = vn.
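The path and cycle definitions can be checked mechanically. The sketch below assumes the graph is stored as a 0/1 adjacency matrix (introduced later in these notes) with at most a hypothetical MAXV vertices; the names Adjacent, IsPath and IsCycle are illustrative. It takes the definition of adjacency above literally, so an edge in either direction makes two vertices adjacent; for directed paths one would usually test only adj[seq[i]][seq[i + 1]].

```c
#include <stdbool.h>

#define MAXV 10   /* hypothetical maximum number of vertices */

/* Adjacent as defined above: i != j and an edge exists in either direction. */
static bool Adjacent(int adj[MAXV][MAXV], int a, int b)
{
    return a != b && (adj[a][b] || adj[b][a]);
}

/* True when seq[0..n-1], n >= 2, is a path: each vertex is adjacent
   to the next vertex of the sequence. */
bool IsPath(int adj[MAXV][MAXV], const int seq[], int n)
{
    if (n < 2) return false;
    for (int i = 0; i + 1 < n; i++)
        if (!Adjacent(adj, seq[i], seq[i + 1])) return false;
    return true;
}

/* A cycle is a path whose first and last vertices coincide. */
bool IsCycle(int adj[MAXV][MAXV], const int seq[], int n)
{
    return IsPath(adj, seq, n) && seq[0] == seq[n - 1];
}
```

For example, with edges 0→1, 1→2 and 2→0, the sequence 0, 1, 2 is a path and 0, 1, 2, 0 is a cycle.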
[Figure: examples of a path on vertices 1 to 4, a simple cycle 1, 2, 3, 1, and a non-simple cycle 1, 2, 1.]
Connectivity: in a tree you can reach any node as long as you start from the root; starting anywhere else, you can reach only descendants. In a graph there is no such distinguished vertex, and it is quite possible that no path connects two given vertices. In a tree, two such vertices would belong to different subtrees of some vertex, but both would still be reachable from their common ancestor. A question that did NOT arise for trees arises for graphs: what do we mean by connectivity, and how do we determine whether a given graph satisfies a given definition of “connected”? The answer differs depending on whether the graph is undirected or directed.
Undirected graph: a graph G = (V, E) is connected if, given any two vertices v and w in V, there exists a path of edges in E starting in v and ending in w: v, a1, a2, a3, …, an, w. In general V can be decomposed into connected subsets; the maximal ones, the connected components, are disjoint and together cover V.
[Figure: an undirected graph on vertices 1 to 9.]
Some connected subsets: {3}, {1, 2}, {4, 5, 6}, {7, 8, 9}, ... Some maximal ones: {3}, {1, 2, 4, 5, 6}.
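The maximal connected subsets can be found by repeatedly searching from a vertex that has not been reached yet. A minimal sketch, assuming a symmetric 0/1 adjacency matrix, 0-based vertices and a hypothetical bound MAXV; ConnectedComponents and Label are illustrative names. Each search labels one whole component.

```c
#define MAXV 16   /* hypothetical maximum number of vertices */

/* Recursively give every vertex reachable from v the label comp. */
static void Label(int n, int adj[MAXV][MAXV], int label[], int v, int comp)
{
    label[v] = comp;
    for (int w = 0; w < n; w++)
        if (adj[v][w] && label[w] < 0)
            Label(n, adj, label, w, comp);
}

/* Fills label[v] with the index of v's connected component and returns
   the number of components. adj must be symmetric (undirected graph). */
int ConnectedComponents(int n, int adj[MAXV][MAXV], int label[])
{
    int count = 0;
    for (int v = 0; v < n; v++) label[v] = -1;   /* -1 = not yet labeled */
    for (int v = 0; v < n; v++)
        if (label[v] < 0)
            Label(n, adj, label, v, count++);    /* new component found */
    return count;
}
```

Two vertices lie in the same component exactly when they receive the same label.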
For a directed graph, it is possible to have a path from v to w but no return path from w to v.
Strongly connected component: both a path and a return path exist for every pair of vertices in the component.
Weakly connected component: the vertices are connected when edge directions are ignored; for some pairs a directed path may exist in only one direction, or in neither.
[Figure: a strongly connected digraph and a weakly connected digraph.]
Adjacency Set of a vertex: all those vertices that can be reached from it through paths of length 1 - or, by just following ONE edge. In-degree of a vertex: the number of edges ending at the vertex; Out-degree of a vertex: the number of edges starting at the vertex; Degree of a vertex: this concept is more useful for an undirected graph and is the number of edges containing (as two-element sets) the current vertex. For a directed graph this is the sum of the in-degree and the out-degree.
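With a 0/1 adjacency matrix (described later in these notes), the out-degree of v is the number of 1s in row v and the in-degree is the number of 1s in column v. A sketch assuming 0-based vertices and a hypothetical bound MAXV; the function names are illustrative.

```c
#define MAXV 10   /* hypothetical maximum number of vertices */

/* Out-degree of v: edges starting at v = 1s in row v. */
int OutDegree(int n, int adj[MAXV][MAXV], int v)
{
    int d = 0;
    for (int w = 0; w < n; w++) d += adj[v][w];
    return d;
}

/* In-degree of v: edges ending at v = 1s in column v. */
int InDegree(int n, int adj[MAXV][MAXV], int v)
{
    int d = 0;
    for (int u = 0; u < n; u++) d += adj[u][v];
    return d;
}

/* Degree of a vertex in a directed graph: in-degree + out-degree. */
int Degree(int n, int adj[MAXV][MAXV], int v)
{
    return InDegree(n, adj, v) + OutDegree(n, adj, v);
}
```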
How do we represent graphs? By now it should be clear that this is not really a good question. The question should be: which graph representation, among the several we will be able to come up with, is most suitable for a particular set of operations? First of all, given a graph with n vertices, how many edges can it have?
Every vertex could have (n - 1) edges, connecting it to each of the remaining (n - 1) vertices. The total, then (for n vertices), is n*(n - 1) edges for a directed graph (one edge to and another from each other vertex) and n*(n - 1)/2 edges for an undirected graph, since a single edge serves both directions. An undirected graph with 100 vertices could have as many as 4950 edges; a directed one, 9900. For example, an undirected graph on 6 vertices can have at most 6*5/2 = 15 edges.
On the other hand, the graph could have NO edges at all, or very few. It is quite likely that different representations will behave best at different sizes of the edge set. We first look at a representation that is quite space efficient for a graph with many edges: the adjacency matrix representation. It simply consists of a square array, n*n in size, with a 0 in the (i, j)-location when there is no edge from vertex vi to vertex vj, and a 1 in the (i, j)-location when there is an edge from vertex vi to vertex vj.
A directed graph and its adjacency matrix (row = from vertex, column = to vertex):

        0 1 2 3 4 5 6 7 8 9
    0   0 1 0 0 0 0 0 0 0 0
    1   0 0 0 1 0 0 0 0 0 0
    2   0 0 0 1 0 1 0 0 0 0
    3   0 0 0 0 0 0 1 0 0 0
    4   0 0 0 0 0 1 0 0 0 0
    5   0 0 0 0 1 0 0 0 1 0
    6   1 0 0 0 1 0 0 0 1 0
    7   0 0 0 0 0 0 0 0 0 0
    8   0 0 0 0 0 0 0 0 0 1
    9   0 0 0 0 0 0 1 0 0 0

[Figure: the corresponding directed graph on vertices 0 to 9.]
13% of the entries (13 out of 100) are used to indicate an edge.
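To make the 13% figure concrete, the sketch below rebuilds the example matrix from its edge list (read off the rows of the matrix) and counts the 1 entries. The names example_edges, BuildMatrix and CountEdges are illustrative, not from the text; vertices are numbered 0 to 9 as in the matrix.

```c
#define N 10   /* vertices 0..9, as in the example */

/* The edges read off the example adjacency matrix (from, to). */
static const int example_edges[][2] = {
    {0,1}, {1,3}, {2,3}, {2,5}, {3,6}, {4,5}, {5,4},
    {5,8}, {6,0}, {6,4}, {6,8}, {8,9}, {9,6}
};

/* Builds the N*N adjacency matrix for a list of m directed edges. */
void BuildMatrix(int adj[N][N], const int edges[][2], int m)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            adj[i][j] = 0;                  /* no edge by default */
    for (int k = 0; k < m; k++)
        adj[edges[k][0]][edges[k][1]] = 1;  /* edge from -> to */
}

/* Counts the entries that record an edge. */
int CountEdges(int adj[N][N])
{
    int count = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            count += adj[i][j];
    return count;
}
```

Calling BuildMatrix(adj, example_edges, 13) and then CountEdges(adj) gives 13 of the 100 entries.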
This can be made fairly “space efficient” if we observe that a single bit is enough to represent the values 0 and 1: each row of the adjacency matrix is an array of n bits. Total space: n^2 bits. Unfortunately, efficient manipulation of such bit-oriented structures depends on language and hardware features. C, originally a fairly low-level systems programming language, does possess bit manipulation operators (&, |, ^, ~, <<, >>) that map to the underlying machine language bit manipulation instructions. Without such operators, the manipulation of bit quantities could be rather inefficient.
For example, reading a single bit from memory costs as much as reading a whole word, and you can’t simply check whether the word holding the bit is 0: you have to isolate the right bit first. This can be done by "and-ing" the word with the right mask (the word that is all zeros except for the bit you want to test) and THEN checking whether the result is 0. The usual trade of time for space...
Let’s examine, for a moment, the code provided by the text.

    #define ByteSize 8
    #define SetSize (6*ByteSize)   // 48 bits: max # of vertices
    // unsigned: shifting a byte whose high bit is set stays well-defined
    typedef unsigned char Set[SetSize/ByteSize];   // a bit-vector
    // pack the bits into bytes - notice this is
    // NOT an integer number of 32-bit words!
    // AND we are using the C "confusion" between characters
    // and "8-bit" patterns...
    Set AdjacencyMatrix[SetSize];   // 48 48-bit vectors
To check for the existence of an edge (Member is defined first, so that IsEdge can call it):

    #include <stdbool.h>

    bool Member(int i, Set S)
    {   // is bit i of the bit-vector S set?
        return (S[i/ByteSize] >> (i % ByteSize)) & 1;
    }

    bool IsEdge(int from, int to, Set adj[SetSize])
    {   // extract the row (bit-vector), then check the column
        return Member(to, adj[from]);
    }
Notice that:
S[i/ByteSize] gives the byte of the array where the bit is located: i/ByteSize is an integer division, returning an integer.
(i % ByteSize) gives the bit position within that byte: it is the remainder of the same division.
>> is the “shift right” operator: the left operand is “what” gets shifted and the right operand is “by how much”. The bits shifted out on the right are lost (unlike a rotate, which would bring them back in on the left). Ex: 01010110 >> 3 = 00001010.
& is the “bitwise-and” operator: & 1 keeps only the rightmost bit, so the result is 1 exactly when the bit we shifted into that position is 1.
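The text's code only tests membership. The sketch below adds the complementary operations under stated assumptions: Insert, Delete and AddEdge are hypothetical helpers, not from the text. A bit is set by or-ing in a one-bit mask and cleared by and-ing with the inverted mask; the declarations are repeated so the block compiles on its own.

```c
#include <stdbool.h>

#define ByteSize 8
#define SetSize (6 * ByteSize)                  /* 48 bits: max # of vertices */

typedef unsigned char Set[SetSize / ByteSize];  /* a 48-bit bit-vector */

/* Is element i in S? Shift its byte right and keep the lowest bit. */
bool Member(int i, const Set S)
{
    return (S[i / ByteSize] >> (i % ByteSize)) & 1;
}

/* Hypothetical helper: add i to S (or with a one-bit mask). */
void Insert(int i, Set S)
{
    S[i / ByteSize] |= (unsigned char)(1u << (i % ByteSize));
}

/* Hypothetical helper: remove i from S (and with the inverted mask). */
void Delete(int i, Set S)
{
    S[i / ByteSize] &= (unsigned char)~(1u << (i % ByteSize));
}

/* Hypothetical helper: record the edge from -> to in the bit matrix. */
void AddEdge(int from, int to, Set adj[SetSize])
{
    Insert(to, adj[from]);
}

bool IsEdge(int from, int to, Set adj[SetSize])
{
    return Member(to, adj[from]);
}
```

For instance, vertex 10 lands in byte 10/8 = 1 at bit 10%8 = 2, so after AddEdge(3, 10, adj) the byte adj[3][1] holds the pattern 00000100.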
One thing you may note is that the order of the vertices does not match the left-to-right order of the bits as we write them: within each byte, the lowest-numbered vertex is the rightmost (least significant) bit. The pattern 10000000 means that the current vertex has an edge to the seventh (counting from 0) vertex of the group of eight covered by that byte, rather than the zeroth. Furthermore C, in its wisdom, lets you write integer constants in DECIMAL, OCTAL or HEXADECIMAL, but (before C23) not in binary… The representation may be compact, but it will require a certain amount of mental gymnastics to set up and use correctly.
The representation chosen for the Programming Problems is much less space-efficient, but allows easy access to more complex information.

    #include <stdbool.h>

    typedef struct vertex {
        int Number;          /* vertex number, starting from 1 */
        bool Visited;
    } Vertex;

    typedef struct edge {
        Vertex From;
        Vertex To;
    } Edge;

    typedef struct graph {
        int N_Vertices;      /* maximum of 20 */
        int N_Edges;         /* maximum of 30 */
        Vertex Vertices[20];
        Edge Edges[30];
    } Graph;
From this we can create the adjacency matrix:

    void Create_Adjacency_Matrix(Graph graph, int adj[20][20])
    {
        int i, j;
        /* initialize all adjacency matrix elements to 0 */
        for (i = 0; i < graph.N_Vertices; i++) {
            for (j = 0; j < graph.N_Vertices; j++) {
                adj[i][j] = 0;
            }
        }
        /* set vertices joined by edges to 1 (Number starts at 1, indices at 0) */
        for (i = 0; i < graph.N_Edges; i++) {
            adj[graph.Edges[i].From.Number - 1][graph.Edges[i].To.Number - 1] = 1;
        }
    }
For this representation:

    bool IsEdge(int from, int to, int adj[20][20])
    {   // from and to are 0-based indices (vertex Number - 1)
        return adj[from][to] == 1;
    }

No “extract the correct byte”, no “shift right”, no “bitwise-and”; but more space: each entry grows from 1 bit to a whole int. If the graph has many vertices and few edges, the adjacency matrix could be an extremely wasteful representation.
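The pieces above can be exercised together. This sketch repeats the text's types, Create_Adjacency_Matrix and IsEdge so that it compiles on its own, and adds a hypothetical helper, Add_Edge (not part of the text), that appends an edge given 1-based vertex numbers and refuses to overflow the fixed Edges array.

```c
#include <stdbool.h>

typedef struct vertex { int Number; bool Visited; } Vertex;
typedef struct edge { Vertex From; Vertex To; } Edge;
typedef struct graph {
    int N_Vertices;             /* maximum of 20 */
    int N_Edges;                /* maximum of 30 */
    Vertex Vertices[20];
    Edge Edges[30];
} Graph;

/* Hypothetical helper: append an edge between 1-based vertex numbers.
   Returns false when the Edges array is already full. */
bool Add_Edge(Graph *g, int from, int to)
{
    if (g->N_Edges >= 30) return false;
    g->Edges[g->N_Edges].From = g->Vertices[from - 1];
    g->Edges[g->N_Edges].To = g->Vertices[to - 1];
    g->N_Edges++;
    return true;
}

void Create_Adjacency_Matrix(Graph graph, int adj[20][20])
{
    for (int i = 0; i < graph.N_Vertices; i++)      /* clear the matrix */
        for (int j = 0; j < graph.N_Vertices; j++)
            adj[i][j] = 0;
    for (int i = 0; i < graph.N_Edges; i++)         /* 1-based -> 0-based */
        adj[graph.Edges[i].From.Number - 1][graph.Edges[i].To.Number - 1] = 1;
}

bool IsEdge(int from, int to, int adj[20][20])      /* 0-based indices */
{
    return adj[from][to] == 1;
}
```

With vertices numbered 1 to 3 and edges 1→2 and 2→3, IsEdge(0, 1, adj) and IsEdge(1, 2, adj) are true, while IsEdge(1, 0, adj) is false.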
Matrix Multiplication and Transitive Closure. One of the questions that can be asked has to do with connectivity: does there exist a path from vertex x to vertex y? Can we find the answer to this question quickly? The answers can be obtained via matrix multiplication, where a, b and c are n by n matrices and c = a * b:

    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            c[i][j] = 0;
            for (k = 0; k < n; k++) {
                c[i][j] = c[i][j] + a[i][k] * b[k][j];
            }
        }
    }
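If A is an adjacency matrix, the product A * A gives the paths of length 2: entry (i, j) of A * A counts the paths of length 2 from vertex i to vertex j, so it is nonzero exactly when such a path exists.
[Figure: a four-vertex example graph, vertices 0 to 3.]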
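The triple loop can be written directly in C. A sketch assuming a small hypothetical graph of V = 4 vertices and 0/1 adjacency matrices; MatMul is an illustrative name, and c must be a different array from a and b. Each term a[i][k] * b[k][j] is 1 only when there is an edge i→k followed by an edge k→j, so squaring A counts the middle vertices k of the 2-step paths.

```c
#define V 4   /* vertices in this hypothetical example */

/* c = a * b, the ordinary integer matrix product (c must not alias a or b). */
void MatMul(int a[V][V], int b[V][V], int c[V][V])
{
    for (int i = 0; i < V; i++)
        for (int j = 0; j < V; j++) {
            c[i][j] = 0;
            for (int k = 0; k < V; k++)
                c[i][j] += a[i][k] * b[k][j];   /* one term per middle vertex k */
        }
}
```

With edges 0→1, 0→2, 1→2, 1→3 and 2→3, squaring gives c[0][3] == 2 (through 1 and through 2) and c[0][1] == 0.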
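The transitive closure matrix (this is Warshall's algorithm: the outer index i plays the role of an intermediate vertex, and when the loops finish adj[j][k] == 1 exactly when there is a path from j to k):

    for (i = 0; i < V; i++) {
        for (j = 0; j < V; j++) {
            if (adj[j][i] == 1) {
                for (k = 0; k < V; k++) {
                    if (adj[i][k] == 1)
                        adj[j][k] = 1;
                }
            }
        }
    }

[Figure: the four-vertex example graph, vertices 0 to 3.]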
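The closure loop above, as a self-contained C function. A sketch assuming V = 4 vertices and a 0/1 matrix updated in place; TransitiveClosure is an illustrative name. The loop order matters: the intermediate vertex i must be the outermost index.

```c
#define V 4   /* vertices in this hypothetical example */

/* Warshall's algorithm: afterwards adj[j][k] == 1 exactly when there is
   a path of length >= 1 from j to k. Outer index i = intermediate vertex. */
void TransitiveClosure(int adj[V][V])
{
    for (int i = 0; i < V; i++)
        for (int j = 0; j < V; j++)
            if (adj[j][i])                  /* j reaches i ... */
                for (int k = 0; k < V; k++)
                    if (adj[i][k])          /* ... and i reaches k */
                        adj[j][k] = 1;      /* so j reaches k */
}
```

For the chain 0→1→2→3, the closure adds 0→2, 0→3 and 1→3, but no edge back to 0, and adj[0][0] stays 0 because there is no cycle.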
Some other representations: a table giving, for each vertex, its number, its (out-)degree and its adjacency list.
[Figure: such a table for a five-vertex graph.]
A possible problem here is the actual representation of the adjacency lists: a fixed-length array for each list may not be very good, since it must be long enough for the vertex with the most outgoing edges (possibly n - 1), leaving most of the other arrays largely empty...
[Figure: the same graph stored as an array of vertices 1 to 5, each with a linked list of its adjacent vertices.]
Each piece of information takes more space, but there is no wasted space. This representation works well for “sparse graphs”, where the actual number of edges is “small”. It accommodates large variations in out-degree well.
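A minimal linked adjacency-list sketch, assuming 0-based vertices and a hypothetical bound MAXV on the array of list heads; ListGraph, AddListEdge and the other names are illustrative. Each edge costs one node (a vertex number plus a pointer), and a vertex's list is exactly as long as its out-degree.

```c
#include <stdlib.h>

#define MAXV 20   /* hypothetical maximum number of vertices */

typedef struct node {          /* one entry of an adjacency list */
    int vertex;
    struct node *next;
} Node;

typedef struct {
    int n;                     /* number of vertices, numbered 0..n-1 */
    int outdegree[MAXV];
    Node *list[MAXV];          /* head of each vertex's adjacency list */
} ListGraph;

void InitListGraph(ListGraph *g, int n)
{
    g->n = n;
    for (int v = 0; v < n; v++) { g->outdegree[v] = 0; g->list[v] = NULL; }
}

/* Adds from -> to at the front of from's list in O(1); returns 0 if out of memory. */
int AddListEdge(ListGraph *g, int from, int to)
{
    Node *p = malloc(sizeof *p);
    if (p == NULL) return 0;
    p->vertex = to;
    p->next = g->list[from];
    g->list[from] = p;
    g->outdegree[from]++;
    return 1;
}

/* Walks from's list: the cost grows with the out-degree, not with n. */
int IsListEdge(const ListGraph *g, int from, int to)
{
    for (const Node *p = g->list[from]; p != NULL; p = p->next)
        if (p->vertex == to) return 1;
    return 0;
}

/* Releases every node and leaves all lists empty. */
void FreeListGraph(ListGraph *g)
{
    for (int v = 0; v < g->n; v++)
        while (g->list[v] != NULL) {
            Node *p = g->list[v];
            g->list[v] = p->next;
            free(p);
        }
}
```

The trade-off mirrors the text: testing one edge is slower than indexing a matrix, but space is proportional to n plus the number of edges.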
Graph Searching: all the representations have one goal: to make searching for information stored in a graph feasible (easy?…). Graphs (unlike trees) can have cycles… this means that simply chasing pointers until a NULL is found (the usual termination condition for a tree) will get us into trouble: we could go around a cycle forever. We get OUT of trouble by adding some information to each node: the Visited boolean field. This provides us with some “memory” of past actions. The termination condition becomes: there is no way out of here (NULL pointer out), or we have been here before (Visited == true).
Before starting a GraphSearch: make sure all Visited fields are set to false. There are two general search strategies:
1) Depth-first: try to reach the “end” of the graph by following “descendant pointers”, and only then worry about spreading your search out.
2) Breadth-first: visit the graph “in layers”: first the start node, then all its immediate descendants, then all their immediate descendants, etc. You will reach the “fringe”, or graph boundary, later.
The two types of search are used for different applications: if you are looking for ONE possible solution to a problem, and don’t care about any others, then depth-first is the preferred search; if you want ALL solutions (or the one "nearest" the start node), then breadth-first will work better. They can both be implemented via the "pseudo-code" that follows: the Container is a STACK for depth-first search, and a QUEUE for breadth-first search.
    void GraphSearch(G, v)              // search graph G from vertex v
    {
        (Let G = (V, E) be a graph.)
        (Let C be an empty container.)
        for (each vertex x in V)
            x.Visited = false;          // each vertex x marked unvisited
        // use vertex v in V as a starting point: put v in C
        (Put v into C);
        while (C is non-empty) {
            (Remove a vertex x from container C);
            if (!(x.Visited)) {         // if vertex not visited already
                Visit(x);               // visit x, and then
                x.Visited = true;       // mark x as having been visited
                for (each vertex w in Vx) {   // Vx = adjacency set of x
                    if (!(w.Visited))         // put all unvisited
                        (Put w into C);       // vertices of Vx into C
                }
            }
        }
    }
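A C rendering of the GraphSearch pseudo-code, as a sketch under stated assumptions: the graph is a 0/1 adjacency matrix with 0-based vertices and a hypothetical bound MAXV; the parameter useStack (an illustrative name) chooses the container; and Visit(x) is replaced by recording x in order[]. One array serves as both container: a stack removes at tail - 1, a queue removes at head.

```c
#include <stdbool.h>

#define MAXV 16            /* hypothetical maximum number of vertices */
#define CAP (MAXV * MAXV)  /* each visited vertex puts at most n vertices into C */

/* Searches from v; useStack = true gives depth-first, false breadth-first.
   Visiting order goes to order[]; returns the number of vertices visited. */
int GraphSearch(int n, int adj[MAXV][MAXV], int v, bool useStack, int order[])
{
    bool visited[MAXV];
    int C[CAP + 1];            /* room for the start vertex plus n*n puts */
    int head = 0, tail = 0;    /* C holds the entries C[head..tail-1] */
    int count = 0;

    for (int x = 0; x < n; x++) visited[x] = false;
    C[tail++] = v;                                /* put v into C */
    while (head < tail) {                         /* C is non-empty */
        int x = useStack ? C[--tail] : C[head++]; /* remove a vertex x */
        if (!visited[x]) {
            order[count++] = x;                   /* Visit(x) */
            visited[x] = true;
            for (int w = 0; w < n; w++)           /* w in the adjacency set of x */
                if (adj[x][w] && !visited[w])
                    C[tail++] = w;
        }
    }
    return count;
}
```

With edges 0→1, 0→2, 1→3 and 2→3, the stack version visits 0, 2, 3, 1 (the last vertex pushed is explored first) and the queue version visits 0, 1, 2, 3.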
Depth-First Search, starting with node 1 and pushing each vertex's neighbors counterclockwise from noon. The vertices are visited (Visited set to t, true) in the order 1, 4, 8, 5, 7, 6, 3, 2.
[Figure: an 8-vertex graph and the successive contents of the stack.]
Breadth-First Search, starting with node 1. The vertices are visited (Visited set to t, true) in the order 1, 2, 3, 4, 5, 6, 7, 8.
[Figure: an 8-vertex graph and the successive contents of the queue.]
A slightly different depth-first search: using the activation stack rather than “rolling your own”...

    void GraphSearch(G, v)              // search graph G from vertex v
    {
        (Let G = (V, E) be a graph.)
        for (each vertex x in V)
            x.Visited = false;          // each x in V marked unvisited
        AuxGraphSearch(G, v);
    }

    void AuxGraphSearch(G, v)
    {
        if (v != NULL) {
            v.Visited = true;           // mark it visited
            for (each vertex w in Vv) { // vertices adjacent to v
                if (!(w.Visited))
                    AuxGraphSearch(G, w);
            }
        }
    }
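The recursive version in C, as a sketch: the call stack replaces the explicit container. It assumes the same hypothetical 0/1 adjacency matrix with 0-based vertices and bound MAXV, and records each vertex in order[] when it is marked (standing in for Visit, which the pseudo-code leaves implicit). RecursiveGraphSearch is an illustrative name.

```c
#include <stdbool.h>

#define MAXV 16   /* hypothetical maximum number of vertices */

static void AuxGraphSearch(int n, int adj[MAXV][MAXV], int v,
                           bool visited[], int order[], int *count)
{
    visited[v] = true;               /* mark it visited */
    order[(*count)++] = v;           /* Visit(v) */
    for (int w = 0; w < n; w++)      /* every w adjacent to v */
        if (adj[v][w] && !visited[w])
            AuxGraphSearch(n, adj, w, visited, order, count);
}

/* Returns how many vertices are reachable from v (v included);
   order[] lists them in depth-first visiting order. */
int RecursiveGraphSearch(int n, int adj[MAXV][MAXV], int v, int order[])
{
    bool visited[MAXV];
    int count = 0;
    for (int x = 0; x < n; x++) visited[x] = false;   /* all unvisited */
    AuxGraphSearch(n, adj, v, visited, order, &count);
    return count;
}
```

Unlike the explicit-stack version, neighbors are explored in increasing order: with edges 0→1, 0→2, 1→3 and 2→3, the visiting order is 0, 1, 3, 2.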
Topological Ordering: graphs are usually drawn in two (or three) dimensions (the third if you want to avoid having edges cross one another). Question: are there conditions under which we could represent a graph with its nodes in a LINEAR sequence, where the sequence order corresponds to some desired relationship among the graph nodes? (Some edges will have to "move out of the line"… so we are NOT talking about changing the GRAPH into a LIST: the list will be a new structure, which will just use the graph vertices.)
An application: course prerequisites. Let each course be represented by a vertex, and let the “immediate prerequisite” relation be a directed edge between two vertices: 91.101 is a prerequisite for 91.102, etc.
[Figure: prerequisite graph over the courses 91.101, 91.102, 91.201, 91.204, 91.304, 91.301, 91.404, 91.203, 16.265, 91.305, 91.308, 92.131, 92.132, 92.231, 92.386, 92.321, 92.322, 95.141 and 95.144.]
This graph is not easily graspable. What we will try to do is make a LIST out of this spaghetti bowl in such a way that the “prerequisite relation” is satisfied going from left to right: each course that precedes another, and has a link to it, is a prerequisite. These graphs have one important property: they have NO cycles, unless somebody has made a mistake and written down, for example, that CS1 is a prerequisite for CS2 and CS2 is a prerequisite for CS1. This would lead to an interesting, but somewhat unsatisfactory, curriculum. Let’s collect some observations.
1) Any vertex with in-degree = 0 denotes a course with no prerequisites (for simplicity, we assume some co-requisites - Calculus I, for example - to be pre-requisites - for Physics I, for example). We will have to keep track of such vertices, lest we “forget” some “basic prerequisites”. Let D[v] denote the number of predecessor vertices (the in-degree) of vertex v in the graph G. Let L be the list of those vertices w with no predecessors: D[w] = 0. As we construct the promised list, D[v] will keep track of the number of predecessor vertices of v NOT YET ON L.
2) In a directed acyclic graph there is at least one vertex with in-degree = 0, so the initial L is NOT empty. Why? This needs proof, but is not hard. The proof will be by contradiction: assume the conclusion (in-degree = 0 for at least one vertex) is false, and show that this lets you construct a cycle; but the original graph was supposed to be acyclic: contradiction. Proof: assume all vertices have positive in-degree. Pick one and step back to one of its predecessors; that vertex also has a predecessor, so you can step back again, and so on. Since the graph is finite (there is a finite number of vertices), you can step back only a finite number of times before you reach a vertex you have already stepped back from, and you have found your cycle...
3) Start with an empty list L and a queue Q containing those vertices v of the graph with D[v] = 0; their order in the queue doesn’t matter much.
4) Dequeue an element v from Q and add it to the end of the list L. For every w ∈ successors(v), decrement D[w]; if D[w] = 0, enqueue w into Q.
5) Keep on going with 4) until Q is empty.
6) The list L is the “topological sort” of the graph G.
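Steps 3) to 6) describe what is commonly called Kahn's algorithm. A sketch in C, assuming a 0/1 adjacency matrix with 0-based vertices and a hypothetical bound MAXV; TopologicalSort is an illustrative name. Each vertex enters Q exactly once (when its D value reaches 0), so an array of MAXV entries is enough for the queue. If the graph has a cycle, the vertices on it never reach D = 0, so L comes out shorter than n.

```c
#define MAXV 16   /* hypothetical maximum number of vertices */

/* Writes a topological order of the n vertices to L[] and returns its
   length. A result shorter than n means the graph has a cycle. */
int TopologicalSort(int n, int adj[MAXV][MAXV], int L[])
{
    int D[MAXV], Q[MAXV];
    int head = 0, tail = 0, len = 0;

    for (int v = 0; v < n; v++) {            /* D[v] = in-degree of v */
        D[v] = 0;
        for (int u = 0; u < n; u++) D[v] += adj[u][v];
    }
    for (int v = 0; v < n; v++)              /* 3) enqueue every v with D[v] == 0 */
        if (D[v] == 0) Q[tail++] = v;
    while (head < tail) {                    /* 5) until Q is empty */
        int v = Q[head++];                   /* 4) dequeue v, append it to L */
        L[len++] = v;
        for (int w = 0; w < n; w++)          /* one fewer predecessor off L */
            if (adj[v][w] && --D[w] == 0)
                Q[tail++] = w;
    }
    return len;                              /* 6) L[0..len-1] is the order */
}
```

With edges 0→2, 1→2, 1→3 and 2→3, vertices 0 and 1 start in Q and the result is L = 0, 1, 2, 3.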
An example, traced pass by pass:
[Figure: a six-vertex directed acyclic graph, with the values D[v], the queue Q and the list L shown for the initial configuration and after each of six passes.]
Initially D[1] = D[6] = 0, so vertices 1 and 6 are in Q. The six passes append 1, 6, 3, 2, 4 and 5 to L in turn, so the final topological sort is L = 1, 6, 3, 2, 4, 5.
Shortest Paths. Let’s say that the edges of the graph contain information: the distance (in hours, miles, gallons of fuel, stops for a quick hamburger, highway tolls, or whatever) between the two vertices connected by the edge. Let’s say that you have a road map with this information encoded. A question you may want to ask is: What’s the shortest route from town A to town B?