250 likes | 340 Views
Problem solving with graph search. Graphs. G = (V, E) V is a set of vertices, {v 1 , …, v N } E is a set of edges between vertices, {e 1 , …, e M } Each edge e i is an ordered pair of vertices in V: e i = ( v j , v k ) for some j and k between 1 and N.
E N D
Graphs G = (V, E) V is a set of vertices, {v1, …, vN} E is a set of edges between vertices, {e1, …, eM} Each edge ei is an ordered pair of vertices in V: ei = (vj, vk) for some j and k between 1 and N. Sometimes, edges will also have weights, or real numbers, associated with them: weight(ei) = wi. A path in a graph is a sequence of edges (e1, …, ek) such that for every pair of edges (ei, ei+1) in the path, the second vertex of ei is the first vertex of ei+1. A (directed) graph. A path with 7 edges (8 vertices) is highlighted in red.
Graphs as AI Representations Graphs are often used in AI to represent a set of entities, and some relationship between them. Entities are represented as vertices. An edge between vi and vj represents the fact that there is a relationship between vi and vj. For example, we could use a graph to represent: - a map of cities and roads (how?) - a computer network (how?) - a social network (how?)
Graph problems • Is there a path connecting vi and vj? • What path between vi and vj has the fewest “hops” (number of intervening vertices)? • What path between vi and vj has the shortest length (sum of weights on the edges in the path)?
Quiz Suppose I want a program that can find the route from Incheon Airport to Seodaemun station, and has the fewest number of stops along the way. 1. Formulate this as a graph search problem. 2. Using your knowledge of graph algorithms from prior classes, what algorithm(s) can solve this problem for me?
Quiz Now suppose I want a program that can tell me the route between Incheon Airport and Seodaemun station that takes the shortest amount of time. (Assume the avg. time between connected stations is fixed and known.) • Formulate this as a graph search problem. • Name an algorithm or algorithms (from prior classes) that can solve this problem.
Recall: Breadth-first search Breadth-first search(Graph G, start vertex v, goal vertex g): create a queue Q, marked-set M enqueuev onto Q add vto M Parent(v) Nil whileQ is not empty: t← Q.dequeue() ift= g: return CreatePath(t) for all edges e in G.adjacentEdges(t) do u← G.adjacentVertex(t,e) ifu is not in M: Parent(u) t add uto M enqueueu onto Q return fail
Uniform-cost Search (Dijkstra’s algorithm, but with a single goal node) Uniform-cost search(Graph G, start vertex v, goal vertex g): create a priorityqueueQ (priorities given by Cost) create an empty set of explored (or “marked”) vertices M Cost(v) 0, for all other v’, Cost(v’) infinity enqueuev onto Q Parent(v) Nil // Note: v is not marked during the initialization phase whileQ is not empty: t← Q.dequeue() ift= g: return CreatePath(t) add t to M for all edges e in G.adjacentEdges(t) do u← G.adjacentVertex(t,e) ifu is not in M: if Cost(t)+Cost(e) < Cost(u): Cost(u) Cost(t)+Cost(e) Parent(u) t enqueue (or re-queue) uonto Q return fail
Quiz Let’s say I wanted to find the route from Incheon Airport to Seodaemun that involved the fewest transfers. Formulate this as a graph search problem that can be solved with UCS.
UCS vs. Breadth-first • UCS guarantees that (if all edge weights are non-negative) the shortest path to a particular node is found • Breadth-first search guarantees that a path with the fewest number of hops is found. • Breadth-first search is a special case of uniform-cost search when all edge costs are positive and identical
Heuristics A heuristic is a nice name for a “hack”. More technically – a heuristic is a technique designed for solving a problem approximately, but quickly. For example, figuring out the exact distance between two vertices in a graph might require running a graph search algorithm. This gives an exact, but perhaps slow, solution. If geometric coordinates of the two vertices are known, then a heuristic solution could be to compute the Euclidean distance between the two vertices. This is usually much faster, but it could under-estimate the true distance in the graph.
Best-first (or Greedy, or Heuristic) Search Best-first search(Graph G, start vertex v, goal vertex g, heuristic function h): create a priority queue Q (priorities given by Cost) create an empty set of explored (or “marked”) vertices M Cost(v) h(v) enqueuev onto Q Parent(v) Nil whileQ is not empty: t← Q.dequeue() ift= g: return CreatePath(t) add t to M for all edges e in G.adjacentEdges(t) do u← G.adjacentVertex(t,e) ifu is not in M: Cost(u) h(u) Parent(u) t enqueueuonto Q return fail
A* Search A* search(Graph G, start vertex v, goal vertex g, heuristic function h): create a priority queue Q (priorities given by Cost) create an empty set of explored (or “marked”) vertices M Cost(v) h(v), for all other v’, Cost(v’) infinity enqueuev onto Q Parent(v) Nil whileQ is not empty: t← Q.dequeue() ift= g: return CreatePath(t) add t to M for all edges e in G.adjacentEdges(t) do u← G.adjacentVertex(t,e) ifu is not in M: temp Cost(t) + Cost(e) + h(u) – h(t) if temp < Cost(u): Cost(u) temp Parent(u) t enqueueuonto Q return fail
Comparison of Algorithms Completeness: If a solution (path from start to goal) exists, will the algorithm find it? A*, breadth-first, and uniform-cost are complete (under certain assumptions about the minimum weight of an edge). Best-first and depth-first are NOT complete. They may get side-tracked by an infinitely-long chain of nodes that never reach the goal, even if a different chain of nodes does reach the goal.
Comparison of Algorithms Optimality: If the algorithm returns a path, does that path have the lowest cost of all possible paths? Uniform-cost is optimal for finding lowest-cost (for any definition of cost where edge weights are non-negative) paths. A* is also optimal (for non-negative edge weights), if the heuristic is admissible and consistent. (Will define these soon.) Breadth-first is optimal for finding shortest-hop paths, but not for any other cost function. Best-first is NOT guaranteed to be optimal.
Comparison of Algorithms Time complexity: How many states will the search explore? In the worst case, A* and uniform-cost will explore a number of nodes that is exponential in the length of the shortest path.
Comparison of Algorithms A* is optimal in another sense: for a given heuristic, out of all complete search algorithms that use that heuristic, it will explore the fewest number of nodes. This also assumes that the heuristic is admissible.
Comparison of Algorithms A* is a generalization of Uniform-cost/Dijkstra’s. If the heuristic is 0 for all nodes (this is an admissible heuristic), then A* is the same as Uniform-cost. A* provides added flexibility: it can incorporate additional information into the search, in the form of the heuristic function.
Admissible Heuristic A heuristic function is admissible if it never over-estimates the true cost of getting from the current node to the goal. For example, if the cost function is the overall distance traveled, then the Euclidean distance between the current state and the goal state is admissible (assuming a flat surface), since it’s the distance of a straight line between the current state and goal state, which is the shortest possible distance between two points. If the cost function is the time traveled, and we use a heuristic of Euclidean distance / 55 mph, then this heuristic is not necessarily admissible (even if it’s a good approximation). There may be a straight road with a 65mph speed limit between the current point and the goal node, in which case the heuristic would over-estimate the time it takes to get to the goal.
Quiz Let’s say we’re doing A* search on a computer network, to try to find the best route to send a packet from computer B to computer C. The cost function is the time it takes to get the packet from B to C. • Is a heuristic of [the distance between current location and C, divided by the speed of light] an admissible heuristic? • What about a heuristic that stores, on each router in the network, the average time it takes to get a packet to every other router, based on previous packets. The heuristic uses the average time to get from the current router to computer C, as stored on the current router, as the heuristic time?