Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram. OCR- a Brief Review.
Optical Character Recognition: Using the Ullman Algorithm for Graphical Matching Iddo Aviram
OCR- a Brief Review • Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. • OCR is a task, not a mathematically defined problem.
OCR- a Brief Review • People draw on many disciplines for OCR: decision making, Fourier transforms, expert systems, topology, machine learning, pattern matching, neural networks, optimization problems, differential geometry, computer vision. • We will show just one simple, non-representative approach that deals with part of the OCR task.
OCR- a Brief Review • The task can be very hard, and state-of-the-art algorithms may not be good enough for some practical purposes. In several cases, however, OCR tools perform well and are useful.
OCR- a Brief Review • The human brain does amazingly well at OCR tasks, so computer results are usually evaluated by comparison with manually created ground-truth data. • However, sometimes even humans are not capable of recognition.
OCR- a Brief Review • Can you read these scripts? Real estate in Petah Tikva (top: from yad1.co.il, 2012; bottom: from "Havatzelet", 1912)
OCR- a Brief Review • Can you read this script? An early version of "Shir Ke'ev" ("Song of Pain") + "Shir Mayim Rabim" ("Song of Many Waters"), Meir Ariel, late 1970s
OCR- a Brief Review • Can you read this script? Inscription on pottery (ostracon), Horvat Uza, Iron Age II, 7th century BCE, ink on pottery, Israel Antiquities Authority. "Message of Lumalak: say to Bilbel: Are you well? I bless you by Qos. And now, give the food that is with Ahi'immoh [ ] and let U[z]iel raise it on the al[tar of Qos, lest] the food ferment."
OCR- Motivation for Graphical Matching • Using graphical tools for object recognition. • A possible scheme: • Binarization • Segmentation by connected components • Thinning • Graphical modeling • Graphical matching • Rule-Based Selection
OCR- Motivation for Graphical Matching • Binarization:
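A minimal sketch of the binarization step. The slides do not say which binarization method is used, so this assumes the simplest one: a fixed global threshold on a grayscale image stored as a list of rows of 0-255 intensities. The name `binarize` and the threshold 128 are illustrative assumptions.

```python
def binarize(gray, threshold=128):
    """Return a 0/1 image: 1 marks ink (dark pixels), 0 marks background.

    Assumption: a fixed global threshold; real manuscripts usually need
    something more adaptive.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


# Hypothetical 3x4 grayscale patch with a dark stroke in the middle.
gray = [
    [250, 240, 30, 245],
    [248, 20, 25, 250],
    [251, 249, 35, 247],
]
binary = binarize(gray)
```

The result is a 0/1 grid that the later steps (segmentation, thinning) can work on.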
OCR- Motivation for Graphical Matching • Segmentation-> Thinning-> Graphical modeling:
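A sketch of segmentation by connected components, assuming a 0/1 binary image (1 = ink) and 8-connectivity. Both choices are assumptions, as is the name `connected_components`. Thinning and graph construction are not shown.

```python
from collections import deque


def connected_components(binary):
    """Return the 8-connected components of ink pixels as a list of pixel sets."""
    rows, cols = len(binary), len(binary[0])
    seen = set()
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or (r, c) in seen:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen.add((r, c))
            component = set()
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1
                                and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            components.append(component)
    return components


# Hypothetical binary patch containing four separate blobs.
binary = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1],
]
segments = connected_components(binary)
```

Each returned pixel set is one candidate segment that could then be thinned and modeled as a graph.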
OCR- Motivation for Graphical Matching • Given a historical manuscript, a Brit Milah blessing:
OCR- Motivation for Graphical Matching • We’re interested in finding the occurrences of the letter Mem (not final):
OCR- Motivation for Graphical Matching • By subgraph matching we can find candidates. [Figure: graphical modeling followed by graphical matching]
Subgraph Isomorphism Problem • Given two graphs H and G as input, the problem is to decide whether H has a subgraph that is isomorphic to G. • In this example the answer is 'yes', since there is an isomorphic correspondence: 1G-1H, 2G-3H, 3G-2H. (There are additional isomorphic correspondences.)
Subgraph Isomorphism Problem • Graph isomorphism • Graphs $G(V_G,E_G)$ and $H(V_H,E_H)$ are isomorphic if $|V_G|=|V_H|$ and there is an invertible function $F$ from $V_G$ to $V_H$ such that for all nodes $u$ and $v$ in $V_G$, $(u,v)\in E_G$ if and only if $(F(u),F(v))\in E_H$. • Such a function $F$ is said to be an isomorphic correspondence.
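The definition can be checked directly for a given F. The sketch below assumes simple undirected graphs without self-loops, with edges stored as frozensets. The function name and the two example graphs are hypothetical and are not the ones drawn on the slide.

```python
from itertools import combinations


def is_isomorphic_correspondence(v_g, e_g, v_h, e_h, f):
    """Check that f: V_G -> V_H is invertible and preserves adjacency both ways."""
    if len(v_g) != len(v_h):
        return False
    # f must be defined on all of V_G and hit all of V_H (a bijection,
    # given that the two vertex sets have equal size).
    if set(f) != set(v_g) or set(f.values()) != set(v_h):
        return False
    # (u, v) in E_G  iff  (f(u), f(v)) in E_H, for every pair of distinct nodes.
    for u, v in combinations(sorted(v_g), 2):
        in_g = frozenset((u, v)) in e_g
        in_h = frozenset((f[u], f[v])) in e_h
        if in_g != in_h:
            return False
    return True


# Hypothetical example: G is the path 1-2-3, H is the path 1-3-2.
v_g, e_g = {1, 2, 3}, {frozenset((1, 2)), frozenset((2, 3))}
v_h, e_h = {1, 2, 3}, {frozenset((1, 3)), frozenset((3, 2))}

good = is_isomorphic_correspondence(v_g, e_g, v_h, e_h, {1: 1, 2: 3, 3: 2})
bad = is_isomorphic_correspondence(v_g, e_g, v_h, e_h, {1: 1, 2: 2, 3: 3})
```

Here `good` is true and `bad` is false: the identity map sends the edge (1,2) of G to a non-edge of H.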
Subgraph Isomorphism Problem • The subgraph isomorphism problem is NP-complete. • There is a very simple reduction: CLIQUE $\le_P$ Subgraph Isomorphism. • However, for many specific types of practical problems (even with 'big' inputs), algorithms answer quickly.
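The reduction can be sketched as follows: H contains a clique of size k exactly when the complete graph K_k is isomorphic to a subgraph of H. So a CLIQUE instance (H, k) maps to the subgraph isomorphism instance (H, K_k). The brute-force checker below is only there to make the sketch runnable, and all names are illustrative assumptions.

```python
from itertools import combinations, permutations


def complete_graph(k):
    """Edge set of the complete graph K_k on vertices 0..k-1."""
    return {frozenset(pair) for pair in combinations(range(k), 2)}


def has_subgraph_isomorphic_to(n_h, e_h, n_g, e_g):
    """Brute force: try every injective map from V_G into V_H."""
    for image in permutations(range(n_h), n_g):
        if all(frozenset((image[u], image[v])) in e_h
               for u, v in (tuple(edge) for edge in e_g)):
            return True
    return False


def has_clique(n_h, e_h, k):
    """Answer CLIQUE(H, k) by asking whether K_k is subgraph isomorphic to H."""
    return has_subgraph_isomorphic_to(n_h, e_h, k, complete_graph(k))


# Hypothetical H: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
e_h = {frozenset(e) for e in [(0, 1), (1, 2), (0, 2), (2, 3)]}
triangle_found = has_clique(4, e_h, 3)
four_clique_found = has_clique(4, e_h, 4)
```

Building K_k takes time polynomial in k, which is what makes this a polynomial-time reduction. The brute-force checker itself is exponential.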
The Ullman Algorithm • An Algorithm for Subgraph Isomorphism, J. R. Ullmann, Journal of the ACM, 1976. • Although old, this algorithm is still very popular and gives good results in practice.
The Ullman Algorithm • There are algebraic formulations of graph isomorphism and subgraph isomorphism, which we will make use of. • The adjacency matrix $A_H$ of a graph H is the $|V_H| \times |V_H|$ matrix whose entry $(i,j)$ is 1 if $(i,j)\in E_H$ and 0 otherwise.
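A small sketch of building an adjacency matrix from an edge list, assuming an undirected graph on vertices 0..n-1. The helper name and the example graph are hypothetical.

```python
def adjacency_matrix(n, edges):
    """Build the n x n 0/1 adjacency matrix of an undirected graph on 0..n-1."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = 1
        a[v][u] = 1  # undirected, so the matrix is symmetric
    return a


# Hypothetical H with edges 0-1, 0-2, 1-2, 1-3.
a_h = adjacency_matrix(4, [(0, 1), (0, 2), (1, 2), (1, 3)])
```

The symmetry of the matrix is what the later derivation of the isomorphism criterion relies on.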
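The Ullman Algorithm • We will use the notion of a permutation matrix. • Any permutation matrix M' is equivalent to an isomorphic correspondence F, written F ~ M', where $m'_{ij}=1$ exactly when F maps vertex $i$ to vertex $j$. [Figure: an isomorphic correspondence F and the equivalent permutation matrix M']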
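The equivalence between a correspondence and a permutation matrix can be sketched in both directions. This assumes the convention that row i has its single 1 in column F(i), with vertices numbered from 0. The helper names are illustrative.

```python
def permutation_matrix(f, n):
    """M'[i][j] = 1 exactly when f maps vertex i to vertex j (vertices 0..n-1)."""
    return [[1 if f[i] == j else 0 for j in range(n)] for i in range(n)]


def correspondence(m):
    """Recover f from a permutation matrix: row i has its single 1 at column f(i)."""
    return {i: row.index(1) for i, row in enumerate(m)}


# The correspondence 1-1, 2-3, 3-2, written with 0-based vertex numbers.
f = {0: 0, 1: 2, 2: 1}
m = permutation_matrix(f, 3)
recovered = correspondence(m)
```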
The Ullman Algorithm • Two graphs G and H are isomorphic with a correspondence F iff $A_G$ is similar to $A_H$, and the similarity matrix is the permutation matrix M' ~ F. • Isomorphism criterion: $A_G = M' A_H (M')^{-1}$ iff G is isomorphic to H, with a correspondence F ~ M'. [Figure: an isomorphic correspondence F and the equivalent permutation matrix M']
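The Ullman Algorithm • We can develop the equation that defines an isomorphism: $A_G = M' A_H (M')^{-1} = M' A_H (M')^T$, since M' is an orthonormal matrix and thus $(M')^T M' = I$; and $M' A_H (M')^T = M' A_H^T (M')^T = M' (M' A_H)^T$, since $A_H$ is a symmetric matrix. • Isomorphism criterion: $A_G = M' (M' A_H)^T$ iff G is isomorphic to H, with a correspondence F ~ M'.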
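A numeric check of the criterion, assuming it takes the form A_G = M'(M'A_H)^T with M' holding a 1 at (i, F(i)). The matrices are plain Python lists. The helper names and the two example graphs are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (p x q) and b (q x r)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def satisfies_isomorphism_criterion(a_g, a_h, m):
    """True when A_G == M' (M' A_H)^T."""
    return a_g == matmul(m, transpose(matmul(m, a_h)))


# Hypothetical graphs: G is the path 0-1-2, H is the path 0-2-1.
a_g = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
a_h = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]

m_good = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]  # F: 0->0, 1->2, 2->1
m_identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

ok = satisfies_isomorphism_criterion(a_g, a_h, m_good)
not_ok = satisfies_isomorphism_criterion(a_g, a_h, m_identity)
```

With the identity matrix the right-hand side is just A_H, which differs from A_G, so only `m_good` passes.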
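The Ullman Algorithm • In a similar fashion (without proof) we have an algebraic criterion for subgraph isomorphism, in which M' is a rectangular $|V_G| \times |V_H|$ matrix with exactly one 1 in each row and at most one 1 in each column. • Subgraph isomorphism criterion: for all $i,j$, $(A_G)_{ij}=1 \Rightarrow (M'(M'A_H)^T)_{ij}=1$, iff G is subgraph isomorphic to H, with a correspondence F ~ rectangular M'. [Figure: the correspondence F (1G-1H, 2G-3H, 3G-2H, 4G-φ) and the equivalent rectangular permutation matrix M']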
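A sketch of the rectangular case, assuming the criterion is the one from Ullmann's paper: every 1 in A_G must also be a 1 in C = M'(M'A_H)^T, where M' has one row per vertex of G and one column per vertex of H. The helper names and example graphs are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (p x q) and b (q x r)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def satisfies_subgraph_criterion(a_g, a_h, m):
    """True when every edge of G is an edge of the image of G under M'."""
    c = matmul(m, transpose(matmul(m, a_h)))
    n = len(a_g)
    return all(c[i][j] == 1
               for i in range(n) for j in range(n) if a_g[i][j] == 1)


# Hypothetical graphs: G is the path 0-1-2; H has edges 0-1, 0-2, 1-2, 1-3.
a_g = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
a_h = [[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]]

m_good = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0]]  # F: 0->0, 1->2, 2->1
m_bad = [[0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]   # F: 0->3, 1->0, 2->1

ok = satisfies_subgraph_criterion(a_g, a_h, m_good)
not_ok = satisfies_subgraph_criterion(a_g, a_h, m_bad)
```

In the passing case C is the adjacency matrix of a triangle, which is not equal to A_G. The implication form therefore accepts subgraphs that are not induced.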
The Ullman Algorithm • We have a graph G and a graph H, and we want to know whether G is subgraph isomorphic to H. • So we search for a permutation matrix M* of size $|V_G| \times |V_H|$ that satisfies the subgraph isomorphism criterion. • We enumerate candidate permutation matrices of the same size, denoting a candidate by M', from a set of candidates that satisfies: (the set of all M*-s) $\subseteq$ (the set of all M'-s). • During the enumeration, we check the subgraph isomorphism criterion for each candidate. If a candidate satisfies the criterion, we return 'yes'. If we find no such candidate, we return 'no'.
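The Ullman Algorithm • Ullmann's algorithm I • Construction of another matrix M(0) with the same size as the M'-s; as in Ullmann's paper, $m^{(0)}_{ij} = 1$ if $\deg(j_H) \ge \deg(i_G)$, and 0 otherwise. • Generation of all M'-s by setting to 0 all but one 1 in each row of M(0), with no column keeping more than one 1. • A subgraph isomorphism has been found if some M' satisfies: for all $i,j$, $(A_G)_{ij}=1 \Rightarrow (M'(M'A_H)^T)_{ij}=1$.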
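A sketch of the plain enumeration, assuming M(0) is built from the degree condition in Ullmann's paper and the criterion is the implication form. Instead of multiplying matrices, the check uses the fact that entry (i, k) of M'(M'A_H)^T equals A_H[F(i)][F(k)]. The function names are illustrative.

```python
from itertools import product


def initial_matrix(a_g, a_h):
    """M(0)[i][j] = 1 when vertex j of H has degree >= that of vertex i of G."""
    return [[1 if sum(a_h[j]) >= sum(a_g[i]) else 0
             for j in range(len(a_h))] for i in range(len(a_g))]


def ullmann_enumerate(a_g, a_h):
    """Try every M' obtainable from M(0); return F as a tuple, or None."""
    n_g, n_h = len(a_g), len(a_h)
    m0 = initial_matrix(a_g, a_h)
    # Keeping one 1 per row of M(0) means picking one allowed column per row.
    allowed = [[j for j in range(n_h) if m0[i][j] == 1] for i in range(n_g)]
    for choice in product(*allowed):
        if len(set(choice)) < n_g:
            continue  # a column holds two 1s, so this is not a permutation matrix
        # Criterion: every edge (i, k) of G maps to an edge (F(i), F(k)) of H.
        if all(a_h[choice[i]][choice[k]] == 1
               for i in range(n_g) for k in range(n_g) if a_g[i][k] == 1):
            return choice
    return None


# Hypothetical graphs: G is the path 0-1-2; H has edges 0-1, 0-2, 1-2, 1-3.
a_g = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
a_h = [[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]]
found = ullmann_enumerate(a_g, a_h)

# A triangle cannot be embedded in a 3-vertex path.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
missing = ullmann_enumerate(triangle, path)
```

The enumeration is exponential in the worst case, which is what the pruning in the second version is meant to reduce.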
The Ullman Algorithm • Ullmann's algorithm I • Example [Figure: the search tree. Root: M(0); inner nodes: M-s; leaves: M'-s]
The Ullman Algorithm • Ullmann's algorithm II • Construction of another matrix M(0) with the same size as the M'-s; as in Ullmann's paper, $m^{(0)}_{ij} = 1$ if $\deg(j_H) \ge \deg(i_G)$, and 0 otherwise. • Generation of all M'-s by setting to 0 all but one 1 in each row of M(0). However, in this version we also prune every inner node M that has at least one 1 entry that does not comply with the refinement rule (to be defined). We are guaranteed to end up with the right answer since we still have: (the set of all M*-s) $\subseteq$ (the set of all M'-s). • A subgraph isomorphism has been found if there is an M' that satisfies: for all $i,j$, $(A_G)_{ij}=1 \Rightarrow (M'(M'A_H)^T)_{ij}=1$.
The Ullman Algorithm • Ullmann's refinement rule for pruning the search tree: • Observation: • If a vertex of G, $i$, corresponds to a vertex of H, $j$, then for each vertex adjacent to $i$ in G, denoted $x$, there must be a vertex in H, denoted $y$, such that: • A. $y$ is adjacent to $j$ in H • B. $x$ corresponds to $y$
The Ullman Algorithm • Algebraic notation: • For all $m_{ij}=1$ (that is already fixed): $\forall x:\ (A_G)_{ix}=1 \Rightarrow \exists y:\ m_{xy}\cdot (A_H)_{yj}=1$. • Any inner node M that does not satisfy this rule is pruned, because none of its descendants is an M*.
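A sketch of the search with pruning, under stated assumptions: rows are fixed top to bottom, M(0) uses the degree condition from Ullmann's paper, and the rule is checked only for the 1 entries of rows that are already fixed. Ullmann's paper additionally clears failing 1s in unfixed rows, which is not done here. All names are illustrative.

```python
def passes_refinement(a_g, a_h, m, fixed_rows):
    """For each fixed m[i][j] = 1: every G-neighbour x of i must still have
    some candidate y (m[x][y] = 1) that is adjacent to j in H."""
    for i in fixed_rows:
        j = m[i].index(1)
        for x in range(len(a_g)):
            if a_g[i][x] == 1 and not any(
                    m[x][y] == 1 and a_h[y][j] == 1 for y in range(len(a_h))):
                return False
    return True


def ullmann_refined(a_g, a_h):
    """Depth-first search over the tree rooted at M(0); returns F as a list or None."""
    n_g, n_h = len(a_g), len(a_h)
    m0 = [[1 if sum(a_h[j]) >= sum(a_g[i]) else 0 for j in range(n_h)]
          for i in range(n_g)]

    def search(m, row, used):
        if row == n_g:
            # Leaf M': check the subgraph isomorphism criterion explicitly.
            f = [r.index(1) for r in m]
            ok = all(a_h[f[i]][f[k]] == 1
                     for i in range(n_g) for k in range(n_g) if a_g[i][k] == 1)
            return f if ok else None
        for j in range(n_h):
            if m[row][j] == 1 and j not in used:
                child = [r[:] for r in m]
                child[row] = [1 if y == j else 0 for y in range(n_h)]
                # Prune the child when a fixed 1 violates the refinement rule.
                if passes_refinement(a_g, a_h, child, range(row + 1)):
                    result = search(child, row + 1, used | {j})
                    if result is not None:
                        return result
        return None

    return search(m0, 0, frozenset())


# Hypothetical graphs: G is the path 0-1-2; H has edges 0-1, 0-2, 1-2, 1-3.
a_g = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
a_h = [[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]]
found = ullmann_refined(a_g, a_h)

# A triangle into a 3-vertex path: the very first child is pruned.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
missing = ullmann_refined(triangle, path)
```

Pruning is safe because a fixed 1 that fails the rule can never be repaired by zeroing more entries further down the tree.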