Towards a Quadratic Time Approximation of Graph Edit Distance
Fischer, A., Suen, C., Frinken, V., Riesen, K., Bunke, H.
Contents
• Introduction
• Graph edit distance
• Hausdorff distance
• Approximating the GED with Hausdorff distance
• Application, experimental evaluation, and results
• Conclusions
Fischer, A., Suen, C., Frinken, V., Riesen, K., Bunke, H.: A fast matching algorithm for graph-based handwriting recognition, submitted
Introduction
• Graph edit distance (GED) is a well-established concept for measuring the dissimilarity of graphs
• It can be used for nearest-neighbor classification, clustering, and various kernel methods based on dissimilarity
• However, in its original form, its complexity is exponential in the number of nodes
• Therefore, various approximate procedures have been proposed for its computation; for a recent review see X. Gao, B. Xiao, D. Tao, X. Li: A survey of graph edit distance, Pattern Analysis & Applications 13, 113-119, 2010
• In this presentation we describe work towards a new approximate procedure, based on Hausdorff distance, that runs in quadratic time
Graph Edit Distance
• Measures the distance (dissimilarity) of two given graphs $g_1$ and $g_2$
• Is based on the idea of editing $g_1$ into $g_2$
• Common edit operations are deletion, insertion, and substitution of nodes and edges
• Can be used with a cost function that assigns a cost to each edit operation
Computational Procedure
• Bunke, H., Allermann, G.: Inexact graph matching for structural pattern recognition, Pattern Recognition Letters 1, 245–253, 1983
Approximating the GED by an Assignment Procedure
• Given are two sets, $X=\{x_1,\dots,x_n\}$ and $Y=\{y_1,\dots,y_n\}$, together with a cost function $c_{ij}$ for assigning $x_i$ to $y_j$
• We want to find a one-to-one mapping $f$ that minimizes the total cost $\sum_{i=1}^{n} c_{i f(i)}$
• The problem was originally studied in the context of operations research (assignment of workers to jobs)
• Many algorithms exist, typically with $O(n^3)$ complexity (Hungarian, Munkres, Volgenant/Jonker, …)
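The assignment problem stated above can be made concrete with a small sketch. The code below is not one of the cubic-time algorithms named on the slide; it is an exhaustive search over all permutations, usable only for tiny n, and is meant purely to illustrate the objective. The function name and the example cost matrix are hypothetical.

```python
from itertools import permutations


def solve_assignment_brute_force(cost):
    """Return (minimum total cost, mapping f) for a square cost matrix.

    Exhaustive search over all n! one-to-one mappings: illustration only,
    NOT the O(n^3) Hungarian/Munkres/Volgenant-Jonker algorithms.
    """
    n = len(cost)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # perm[i] plays the role of f(i): element x_i is assigned to y_perm[i]
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_cost, best_perm = total, perm
    return best_cost, best_perm


# Hypothetical 3 x 3 cost matrix c_ij (rows: x_i, columns: y_j)
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
best_cost, mapping = solve_assignment_brute_force(cost)
print(best_cost, mapping)
```

For realistic problem sizes one would replace the exhaustive loop with one of the polynomial-time solvers mentioned on the slide.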
• In itself, the assignment problem has nothing to do with the GED problem
• However, GED can be reformulated (simplified) such that it can be approximately solved with an assignment procedure
• Different reformulations are possible (only nodes, nodes plus edges)
• The procedures that solve the assignment problem are optimal
• They are only suboptimal w.r.t. the GED problem, but they run in cubic time and give good approximations of the true distance
K. Riesen and H. Bunke. Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27(7):950–959, 2009
Hausdorff Distance (1)
• A well-known distance measure between sets of points in a metric space
• Often used in image processing as a distance between sets of points in the 2-D plane or in 3-D space; see, for example, Huttenlocher, D.P., Klanderman, G.A., Rucklidge, W.J.: Comparing images using the Hausdorff distance, PAMI 15, 850–863, 1993
• Given sets $A$ and $B$, and a distance metric $d(a,b)$:
$H(A,B) = \max\left(\max_{a \in A} \min_{b \in B} d(a,b),\ \max_{b \in B} \min_{a \in A} d(a,b)\right)$
• Computational complexity is $O(nm)$, where $|A|=n$ and $|B|=m$
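The definition above translates almost literally into code. The following is a minimal sketch, assuming points are coordinate tuples and d is the Euclidean distance; the function names are illustrative. The two nested loops over A and B make the O(nm) cost visible.

```python
import math


def directed_hausdorff(A, B, d=math.dist):
    """max over a in A of the distance from a to its nearest point in B."""
    return max(min(d(a, b) for b in B) for a in A)


def hausdorff(A, B, d=math.dist):
    """Symmetric Hausdorff distance H(A, B); assumes A and B are non-empty."""
    return max(directed_hausdorff(A, B, d), directed_hausdorff(B, A, d))


# Two small hypothetical point sets in the 2-D plane
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (4.0, 0.0)]

print(directed_hausdorff(A, B))  # farthest point of A from B
print(directed_hausdorff(B, A))  # farthest point of B from A
print(hausdorff(A, B))
```

In this example the point (4, 0) in B lies 3 units from its nearest neighbour in A, and this single point determines the overall distance.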
Hausdorff Distance (2)
• Because of the max operator, the H-distance is sensitive to outliers in the data
• There are various possibilities to overcome this problem: delete the top n, average, median, …
• In the following: replace the outer max operators by summation (equivalent to averaging, up to normalization)
• $H'(A,B) = \sum_{a \in A} \min_{b \in B} d(a,b) + \sum_{b \in B} \min_{a \in A} d(a,b)$
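A minimal sketch of the summed variant H' follows, again assuming coordinate tuples and Euclidean distance; the function name is illustrative. Every point contributes its nearest-neighbour distance to the result, instead of only the single worst point.

```python
import math


def summed_hausdorff(A, B, d=math.dist):
    """H'(A, B): nearest-neighbour distances summed in both directions.

    Assumes A and B are non-empty; runs in O(|A| * |B|).
    """
    forward = sum(min(d(a, b) for b in B) for a in A)
    backward = sum(min(d(a, b) for a in A) for b in B)
    return forward + backward


# Hypothetical point sets in the 2-D plane
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 0.0), (4.0, 0.0)]

print(summed_hausdorff(A, B))
```

Here the forward sum is 0 + 1 and the backward sum is 0 + 3, so all four points contribute to the value.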
Approximating Graph Edit Distance with H-Distance
• Sets $A$ and $B$ correspond to the sets of nodes of graphs $g_1$ and $g_2$
• Distance $d(a,b)$ between $a \in A$ and $b \in B$ is given by the node substitution cost
• In the present case, it is the Euclidean distance of the node attribute vectors $(x,y)_u$ and $(x,y)_v$ of nodes $u \in g_1$ and $v \in g_2$: $c(u,v) = \lVert (x,y)_u - (x,y)_v \rVert$
• Result:
  • $h(g_1,g_2)$: original Hausdorff distance, applied to graphs
  • $h'(g_1,g_2)$: max operation replaced by summation
• Possible enhancement: include the cost of edit operations on the edges adjacent to the considered pair of nodes (similar to the assignment approximation)
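Additional Enhancement
• $h(g_1,g_2)$ and $h'(g_1,g_2)$ force all nodes in both graphs to be matched with each other, i.e. there are only substitutions (possibly with multiple assignments), and no deletions or insertions are allowed
• Measure $h''(g_1,g_2)$ also allows deletion and insertion of nodes
• It is identical to $h'(g_1,g_2)$, but uses the following cost function, where $c(u,\varepsilon)$ is the cost of deleting node $u$ ($\varepsilon$ denotes the empty node):
$c''(u,v) = \begin{cases} c(u,v)/2, & \text{if } c(u,v) < c(u,\varepsilon) \\ c(u,\varepsilon), & \text{otherwise} \end{cases}$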
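The three graph measures can be sketched on plain node lists. This is a hedged illustration, not the authors' Java implementation: nodes are assumed to be (x, y) tuples, edges are ignored, and the deletion/insertion cost c(u, ε) is assumed to be a single constant passed in as `deletion_cost` (the slides do not state its value). All function names are hypothetical.

```python
import math


def c(u, v):
    """Node substitution cost: Euclidean distance of the (x, y) attributes."""
    return math.dist(u, v)


def c2(u, v, deletion_cost):
    """Modified cost c'': half the substitution cost, capped by deletion."""
    cost = c(u, v)
    return cost / 2 if cost < deletion_cost else deletion_cost


def h_prime(nodes1, nodes2):
    """h': summed nearest-neighbour substitution costs, both directions."""
    return (sum(min(c(u, v) for v in nodes2) for u in nodes1)
            + sum(min(c(u, v) for u in nodes1) for v in nodes2))


def h_double_prime(nodes1, nodes2, deletion_cost):
    """h'': same as h', but with c'' so unmatched nodes cost at most
    deletion_cost (assumed identical for deletions and insertions)."""
    return (sum(min(c2(u, v, deletion_cost) for v in nodes2) for u in nodes1)
            + sum(min(c2(u, v, deletion_cost) for u in nodes1) for v in nodes2))


# Hypothetical graphs: g2 has one extra node far away from everything in g1
g1_nodes = [(0.0, 0.0), (1.0, 0.0)]
g2_nodes = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]

print(h_prime(g1_nodes, g2_nodes))              # extra node forced to match
print(h_double_prime(g1_nodes, g2_nodes, 2.0))  # extra node treated as inserted
```

With h', the extra node must be substituted by its nearest neighbour at distance 9; with h'' it is charged only the assumed insertion cost of 2.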
Application, Experimental Evaluation and Results: Recognition of Handwritten Historical Text
Conventional Features
• Based on a sliding window, e.g. features by
  • Marti et al.: 9 features extracted from a window of 1 pixel width
  • Vinciarelli et al.: 16 windows of size 4 x 4 pixels; fraction of black pixels in each window; result: 16 features
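The second feature set can be illustrated with a short sketch. It assumes, which the slide does not state, that the 16 windows are arranged as a 4 x 4 grid of cells inside a 16 x 16 pixel patch, and that the image is binary with 1 for black and 0 for white. The function name is hypothetical.

```python
def cell_density_features(patch, cell=4):
    """Fraction of black pixels in each cell x cell window of a binary patch.

    patch: list of rows (1 = black, 0 = white); height and width are
    assumed to be multiples of `cell`. Windows are visited row by row.
    """
    height, width = len(patch), len(patch[0])
    features = []
    for top in range(0, height, cell):
        for left in range(0, width, cell):
            black = sum(patch[r][col]
                        for r in range(top, top + cell)
                        for col in range(left, left + cell))
            features.append(black / (cell * cell))
    return features


# Hypothetical 16 x 16 patch whose four left-most columns are black
patch = [[1 if col < 4 else 0 for col in range(16)] for _ in range(16)]
features = cell_density_features(patch)
print(len(features), features[:4])
```

Sliding such a patch over the word image from left to right would yield one 16-dimensional feature vector per window position.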
Potential problem with the conventional approach:
• The two-dimensional shape of characters is not adequately modeled; no structural relations
Possible solution:
• Use skeletons to represent the handwriting by a graph
• Transform the graph of a handwritten text into a sequence of feature vectors
• Apply HMMs or RNNs to the sequence of feature vectors
Graph Extraction
• Apply a thinning operator to generate the skeleton of the image
• Nodes:
  • Key points: crossings, junctions, end points, left-most points of circular arcs
  • Secondary points: equidistant points on the skeleton between key points; distance d is a parameter
• Edges:
  • Nodes that are neighbors on the skeleton are connected by edges
  • However, in the experiments it turned out that the performance without edges is comparable to that with edges if parameter d is chosen appropriately; therefore, no edges were used
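Placing secondary points can be sketched as resampling a path at fixed arc-length steps. The slides do not specify how the points are placed, so the following is only one plausible reading: the skeleton between two key points is assumed to be available as a polyline of (x, y) coordinates, and points are inserted every d units of arc length, excluding the key points themselves. The function name is hypothetical.

```python
import math


def secondary_points(path, d):
    """Points spaced d apart (in arc length) along a polyline.

    path: list of (x, y) vertices running from one key point to the next.
    The end points (the key points) are not included in the result.
    """
    total = sum(math.dist(p, q) for p, q in zip(path, path[1:]))
    points = []
    travelled = 0.0  # arc length covered before the current segment
    next_at = d      # arc length at which the next point is due
    for p, q in zip(path, path[1:]):
        seg = math.dist(p, q)
        while seg > 0 and next_at <= travelled + seg and next_at < total:
            t = (next_at - travelled) / seg  # relative position on segment
            points.append((p[0] + t * (q[0] - p[0]),
                           p[1] + t * (q[1] - p[1])))
            next_at += d
        travelled += seg
    return points


# Hypothetical skeleton pieces between two key points
straight = [(0.0, 0.0), (10.0, 0.0)]
bent = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]

print(secondary_points(straight, 4.0))
print(secondary_points(bent, 5.0))
```

A smaller d yields more nodes per graph, which is the trade-off behind the remark that performance without edges is comparable if d is chosen appropriately.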
Experiments: Motivation and Aim
• Typical graph size is about 30 nodes
• The approximate GED using an assignment algorithm is still slow
• Questions to be answered in the experiments:
  • How much speed-up do we gain with the H-distance based approach?
  • How much recognition accuracy do we lose?
Experimental Setup
• Data: Parzival dataset http://www.iam.unibe.ch/fki/databases/iam-historical-document-database
  • 13th century manuscript written in Old German
  • Segmented into individual words
  • 11,743 word instances (images)
  • 3,177 word classes
  • 79 character prototypes
• Distance measures $h$, $h'$, and $h''$ were normalized
• Division of the database into training, validation, and test sets
Experimental Results
• Word recognition rate on the test set
• $h$, $h'$, $h''$ as introduced before; $s$ based on the assignment procedure
• Computational speed (Java implementation)
• Median graph size: 30 nodes
• Median number of graph matchings per word: 6162
• Run time in seconds
Conclusions
• GED is a powerful concept but is, in its original form, too slow for most applications
• Various faster approximations of GED have been proposed
• In this talk, a new approximate version with quadratic complexity is proposed, based on Hausdorff distance
• It was evaluated in practice in the context of a handwriting recognition task and has shown good results
• Nevertheless, more experiments are needed, especially with other graph datasets (other attributes) and larger graphs; it would be interesting to compare the new distances more directly with distances obtained from other approximate methods
Acknowledgments
• HISDOC consortium: R. Ingold, J. Savoy, M. Bächler, N. Naji (collaborators in the historical handwriting recognition project)
• SNF (financial support for HISDOC)
• SNF (postdoc stipend for AF)