A* Search
A* (pronounced "A star") is a best-first graph search algorithm that finds the least-cost path from a given initial node to one goal node out of one or more possible goals.
Definitions
• A* uses a distance-plus-estimate heuristic function, denoted f(x), to determine the order in which the search visits nodes in the tree induced by the search.
• The distance-plus-estimate heuristic is a sum of two functions, f(x) = g(x) + h(x):
  • the path-cost function g(x), the cost from the start node to the current node, and
  • an admissible "heuristic estimate" h(x) of the distance to the goal.
• An admissible h(x) must not overestimate the distance to the goal. For an application like routing, h(x) might represent the straight-line distance to the goal, since that is physically the smallest possible distance between any two points (or nodes, for that matter).
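The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own implementation: the function name `a_star`, the callback parameters, and the toy graph with its heuristic values are all assumptions made for the example.

```python
import heapq
from itertools import count

def a_star(start, is_goal, successors, h):
    """Return (cost, path) for a least-cost path, or None if no goal is reachable.

    successors(node) yields (neighbor, edge_cost) pairs.
    h(node) must be admissible: it never overestimates the remaining cost.
    """
    tie = count()  # tie-breaker so the heap never has to compare nodes or paths
    frontier = [(h(start), next(tie), 0, start, [start])]  # ordered by f = g + h
    best_g = {start: 0}
    while frontier:
        _, _, g, node, path = heapq.heappop(frontier)
        if is_goal(node):
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was found later
        for nxt, cost in successors(node):
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(
                    frontier, (new_g + h(nxt), next(tie), new_g, nxt, path + [nxt])
                )
    return None

# Hypothetical toy graph and admissible estimates (illustration only).
graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 5)], "B": [("G", 1)], "G": []}
estimate = {"S": 3, "A": 2, "B": 1, "G": 0}

result = a_star("S", lambda n: n == "G", lambda n: graph[n], lambda n: estimate[n])
print(result)  # (4, ['S', 'A', 'B', 'G'])
```

Because the frontier is ordered by f = g + h and h never overestimates, the first goal popped from the queue is reached by a least-cost path.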
An A* algorithm for Edit Distance
Edit Distance DE(X,Y) measures how close string X is to string Y. DE(X,Y) is the cost of the minimum-cost transformation t : X → Y, where t is a sequence of operations (insertion, equal substitution, unequal substitution, and deletion). The cost of t is the sum of the operation costs, where each operation costs 1 except for equal substitution, which costs 0. The cost of the example transformation shown on this slide is 3, which happens to be minimal.
Dynamic Programming Solution (an O(mn) solution)
• Decomposition: last operation is Delete, Substitute, or Insert
• Atomic Problems: X prefix or Y prefix empty
• Table: rows 0 .. M for X prefix characters, columns 0 .. N for Y prefix characters
• Table Entry: $D_E(X_i, Y_j)$
• Composition: let $\delta_{ij}$ = cost(Substitution) = 1 if $x_i \neq y_j$ and 0 otherwise. Then
$$D_E(X_i, Y_j) = \min\{\, D_E(X_{i-1}, Y_j) + 1,\; D_E(X_{i-1}, Y_{j-1}) + \delta_{ij},\; D_E(X_i, Y_{j-1}) + 1 \,\}$$
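The table-filling scheme above can be written out directly. The following Python sketch assumes unit costs as stated on the slide; the function name `edit_distance` and the table name `d` are illustrative choices.

```python
def edit_distance(x, y):
    """Edit distance by dynamic programming, O(m*n) time and space."""
    m, n = len(x), len(y)
    # d[i][j] holds the distance between the i-length prefix of x
    # and the j-length prefix of y.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # atomic problem: Y prefix empty, delete i characters
    for j in range(n + 1):
        d[0][j] = j  # atomic problem: X prefix empty, insert j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 1  # substitution cost
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i - 1][j - 1] + sub,  # equal or unequal substitution
                d[i][j - 1] + 1,        # insertion
            )
    return d[m][n]

print(edit_distance("abbab", "bbaba"))  # 2
```

The strings in the usage line are the X and Y used in the transformation graph example later in the slides.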
Edit Distance as a Shortest Path Problem
• Define a transformation graph G_XY = (V,E) as follows:
• The set V of nodes (vertices) = {0 .. M} × {0 .. N}, where node n_{p,q} represents the state of transforming a p-length prefix of X into a q-length prefix of Y.
• The set E of edges represents the operations of
  • deletion, connecting node n_{p,q} to n_{p+1,q} with length 1
  • substitution, connecting node n_{p,q} to n_{p+1,q+1} with length 0 or 1 depending on whether X_{p+1} = Y_{q+1} or not
  • insertion, connecting node n_{p,q} to n_{p,q+1} with length 1
• The start and goal nodes are n_{0,0} and n_{M,N}.
Introduction: Edit Distance Based on Single-Character Edit Operations (ε denotes the empty string)
• Insertion: ε → a
  • Inserts an "a" into the target without affecting the source; cost = 1
• Equal Substitution: a → a
  • Substitutes an "a" into the target for an "a" in the source; cost = 0
• Unequal Substitution: a → b
  • Substitutes a "b" into the target for an "a" in the source; cost = 1
• Deletion: a → ε
  • Deletes an "a" from the source without affecting the target; cost = 1
Example of a Transformation Graph
The vertices of T correspond to prefix pairs of X and Y. The edges of T are directed and correspond to the single-character edit operations that transform one prefix pair into another.
• X = abbab
• Y = bbaba
DE(X,Y) = cost of the shortest path from the start vertex to the goal vertex = 2
A Frequency-Based Lower Bound Function h
• Let Xi be the suffix of X beginning with the ith character, and let Yj be similarly defined.
• If X = abbab and Y = bbaba:
  • X2 = bbab and Y2 = baba
  • Excess(X2,Y2,a) = 0
  • Def(X2,Y2,a) = 1
  • Excess(X2,Y2,b) = 1 (X2 contains three b's, Y2 only two)
  • Def(X2,Y2,b) = 0
• Excess(X2,Y2) is the sum of the excesses over the alphabet, and Def(X2,Y2) is the sum of the deficiencies.
• h(X2,Y2) = max{Excess(X2,Y2), Def(X2,Y2)} is a lower bound to the length of the shortest path from the vertex to the goal. Here h(X2,Y2) = max{1, 1} = 1.
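The frequency-based bound is short to compute with Python's `collections.Counter`, whose subtraction keeps only positive counts. This is a sketch under the definitions above; the function name `freq_lower_bound` is an assumption, and the slides' 1-based suffix X2 corresponds to the Python slice `x[1:]`.

```python
from collections import Counter

def freq_lower_bound(x_suffix, y_suffix):
    """max(total excess, total deficiency) of character counts in x relative to y."""
    fx, fy = Counter(x_suffix), Counter(y_suffix)
    excess = sum((fx - fy).values())      # characters x has too many of
    deficiency = sum((fy - fx).values())  # characters x has too few of
    return max(excess, deficiency)

x, y = "abbab", "bbaba"
print(freq_lower_bound(x[1:], y[1:]))  # 1  (X2 = bbab, Y2 = baba)
print(freq_lower_bound(x, y))          # 0  (same character counts)
```

Each excess character needs at least one deletion or substitution, and each deficient character needs at least one insertion or substitution; a single substitution can repair one of each, which is why the bound takes the maximum rather than the sum.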
Applications of Edit Distance
• DNA analysis
• Classification of heart beats
• Handwriting recognition
• Spelling correction
• Error correction of variable-length codes
• Speech recognition
Classification as Path Problem
• LB(Start,Goal-1) = 0
• LB(Start,Goal-2) = 3
Lower Bounds to Edit Distance
Lower Bound Based on Frequency
• Let $f_a(X)$ and $f_a(Y)$ be the frequencies of a in X and Y.
• Define $Ex(a,X,Y) = f_a(X) - f_a(Y)$ if $f_a(X) > f_a(Y)$, else 0.
• Define $Def(a,X,Y) = f_a(Y) - f_a(X)$ if $f_a(Y) > f_a(X)$, else 0.
• For any a, both $Ex(a,X,Y) \le D(X,Y)$ and $Def(a,X,Y) \le D(X,Y)$.
• For distinct characters a and b, $Ex(a,X,Y) + Ex(b,X,Y) \le D(X,Y)$.
• $\max\{\, \sum_a Ex(a,X,Y),\ \sum_a Def(a,X,Y) \,\} \le D(X,Y)$
• LB(i,j,X,Y), computed for the ith suffix of X and the jth suffix of Y, is a lower bound to the remaining distance after having computed the edit distance for the ith and jth prefixes of X and Y.
Lower Bounds to Edit Distance
Lower Bound Based on Frequency
• Since X has a deficiency of 1 b with Y1 as a target, 1 is a lower bound to D(X,Y1).
• Since X has a deficiency of 2 a's with Y2 as a target and an excess of 1 b, 2 is a lower bound to D(X,Y2).
• Since X has a deficiency of 3 b's with Y3 as a target and an excess of 2 a's, 3 is a lower bound to D(X,Y3).
• Consequently, the initial vertices of the 3 transformation graphs are organized into a priority queue, as shown to the left.
A* Search for Closest Target
• f = h + g
• Keep track of the last operation, since an insertion cannot be followed by a deletion and vice versa.
A* Search for Closest Target
• Finds a distance of 1 to Y1 in 3 steps.
• Y1 must be a closest goal since bnd + dist is minimized.
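The closest-target search can be sketched by placing the start vertex of every transformation graph into one shared priority queue ordered by f = h + g, with the frequency bound as h. The Python below is an illustrative sketch, not the slides' implementation: the names `closest_target` and `h` and the example strings are assumptions, and for brevity it omits the last-operation bookkeeping (pruning an insertion directly followed by a deletion), which affects speed but not the result.

```python
import heapq
from collections import Counter

def h(x_suffix, y_suffix):
    """Frequency-based lower bound on the remaining edit distance."""
    fx, fy = Counter(x_suffix), Counter(y_suffix)
    return max(sum((fx - fy).values()), sum((fy - fx).values()))

def closest_target(x, targets):
    """Return (index, distance) of a target closest to x, or None if there are no targets."""
    # One start vertex per transformation graph; entries are (f, g, k, p, q),
    # where k is the target index and (p, q) are the prefix lengths consumed.
    frontier = [(h(x, y), 0, k, 0, 0) for k, y in enumerate(targets)]
    heapq.heapify(frontier)
    best_g = {(k, 0, 0): 0 for k in range(len(targets))}
    while frontier:
        _, g, k, p, q = heapq.heappop(frontier)
        y = targets[k]
        if p == len(x) and q == len(y):
            return k, g  # first goal popped has minimal bound + distance
        if g > best_g[(k, p, q)]:
            continue  # stale queue entry
        moves = []
        if p < len(x):
            moves.append((p + 1, q, 1))  # deletion
        if q < len(y):
            moves.append((p, q + 1, 1))  # insertion
        if p < len(x) and q < len(y):
            moves.append((p + 1, q + 1, 0 if x[p] == y[q] else 1))  # substitution
        for np_, nq, cost in moves:
            ng = g + cost
            if ng < best_g.get((k, np_, nq), float("inf")):
                best_g[(k, np_, nq)] = ng
                heapq.heappush(frontier, (ng + h(x[np_:], y[nq:]), ng, k, np_, nq))
    return None

# Hypothetical source and targets (not the X, Y1, Y2, Y3 of the slides).
print(closest_target("abbab", ["bbbbb", "abbaa", "aaaaa"]))  # (1, 1)
```

Targets whose lower bound already exceeds the best distance found are never expanded, which is the saving over computing the full dynamic programming table for every target.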