Whole-Book Recognition using Mutual-Entropy-Driven Model Adaptation
Pingping Xiu* Henry S. Baird
DR&R XV, January 29, 2008
Whole-Book Recognition
Motivation
• Millions of books are being scanned cover-to-cover, with OCR results available on the web.
• Many documents are strikingly isogenous, containing only a small fraction of all possible languages, words, typefaces, image qualities, layout styles, etc.
• Research in the last decade or so shows that isogeny can be exploited to improve recognition fully automatically (Nagy & Baird (1994), Hong (1995), Sarkar (2000), Veeramachaneni (2002), Sarkar, Baird & Zhang (2003)).
Given: a book's images, an initial buggy OCR transcription, and a dictionary that might be incomplete…
Can we: improve recognition accuracy substantially and fully automatically?
Fully-Automatic Model Adaptation
Two models, both initialized from the input:
• Iconic: a character-image classifier.
• Linguistic: a dictionary (a list of valid words).
Look for disagreements between the two models by measuring mutual entropy between two probability distributions:
• a posteriori probabilities of character classes
• a posteriori probabilities of word classes
Automatically correct both models to reduce disagreement:
• We illustrate this approach on a small test set of real book-image data.
• We show that we can drive improvements in both models and lower the recognition error rate.
Defining ‘Disagreements’
Mutual Entropy (ME) is measured between the two models at three levels:
• Character-level ME
• Word-level ME
• Passage-level ME
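A plausible formalization of these three quantities, assuming ME is a KL-style divergence between the iconic posterior P_I and the linguistic posterior P_L; the notation below is illustrative, and the paper gives the exact definitions:

```latex
% Illustrative only: P_I(c \mid x_{ij}) is the iconic posterior for the j-th
% character image of word i; P_L(c \mid w_i) is the linguistic posterior
% implied by the dictionary for the same character position.
\[
\mathrm{ME}_{\mathrm{char}}(i,j)
  = \sum_{c} P_L(c \mid w_i)\,\log\frac{P_L(c \mid w_i)}{P_I(c \mid x_{ij})}
\]
\[
\mathrm{ME}_{\mathrm{word}}(i) = \sum_{j} \mathrm{ME}_{\mathrm{char}}(i,j),
\qquad
\mathrm{ME}_{\mathrm{passage}} = \sum_{i} \mathrm{ME}_{\mathrm{word}}(i)
\]
```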
Model-Adaptation Policies
• How to correct the iconic model: characters with the largest character-level disagreement first.
• How to correct the linguistic model: words with the largest word-level disagreement first.
• Minimize this objective function: drive passage-level mutual entropy down (see the sketch after this list).
• Test experimentally:
  • Can this policy achieve unsupervised improvement?
  • Does it work better the longer the passage?
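A runnable toy sketch of this greedy policy, assuming the KL-style ME from the previous slide; the posteriors, the "correction" step, and all names here are illustrative stand-ins, not the authors' implementation:

```python
import math

def char_me(p_iconic, p_linguistic):
    # Character-level ME: disagreement between the iconic classifier's
    # posterior and the posterior implied by the linguistic model.
    return sum(pl * math.log(pl / pi)
               for pi, pl in zip(p_iconic, p_linguistic) if pl > 0 and pi > 0)

def passage_me(iconic, linguistic):
    # Passage-level ME: sum of character-level disagreements over the passage.
    return sum(char_me(pi, pl) for pi, pl in zip(iconic, linguistic))

# Toy passage: three character images, posteriors over a 2-class alphabet.
iconic     = [[0.55, 0.45], [0.50, 0.50], [0.90, 0.10]]   # character classifier
linguistic = [[0.99, 0.01], [0.95, 0.05], [0.90, 0.10]]   # dictionary-implied

# Iconic stage: visit characters in order of decreasing character-level ME;
# keep a candidate correction only if it drives passage-level ME down.
order = sorted(range(len(iconic)),
               key=lambda k: char_me(iconic[k], linguistic[k]),
               reverse=True)
for k in order:
    candidate = [row[:] for row in iconic]
    candidate[k] = linguistic[k][:]          # toy correction: adopt the
    if passage_me(candidate, linguistic) < passage_me(iconic, linguistic):
        iconic = candidate                   # linguistic posterior wholesale

print(round(passage_me(iconic, linguistic), 6))   # disagreement driven to 0.0
```

The acceptance test inside the loop mirrors the objective above: every model change must reduce passage-level mutual entropy, so the greedy process can never make overall agreement worse.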
Experimental Design – Training
• Iconic model: initialized from a training image with its hOCR transcript.*
• Linguistic model: a dictionary containing all the true words on that page, except "the" (intentional damage).
* Google Inc., Book Search Dataset, Version V1.0, Volume 0, Page 28, Aug. 2007.
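A minimal sketch of how the two models might be initialized, assuming the hOCR transcript supplies one label per glyph image; the function and data structures are hypothetical:

```python
def init_models(glyph_images, transcript_labels, word_list):
    # Iconic model: map each character class to its labeled example glyphs.
    iconic = {}
    for image, label in zip(glyph_images, transcript_labels):
        iconic.setdefault(label, []).append(image)
    # Linguistic model: the dictionary as a set of valid words.
    linguistic = set(word_list)
    return iconic, linguistic

# Toy usage: two labeled glyphs and a deliberately damaged dictionary.
iconic, linguistic = init_models(["glyph0.png", "glyph1.png"], ["t", "h"],
                                 ["quick", "brown", "fox"])   # no "the"
```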
Experimental Design – Testing
Initial recognition results:
• Character classification: highly erroneous.
• Word classification: not too bad, but all the "the"s are missing.
Interpreting Disagreements
• What to do next? Apply a series of iconic corrections.
• The series of model corrections: ‘t’ (word 7, character 1), ‘3’ (word 8, character 3), ‘t’ (word 9, character 4), etc.
Effects of Iconic Adaptation
• Initial state, after inferring models from the input.
• Improved state, after iconic model corrections.
Iconic Model Improves
• Before the sequence of iconic model corrections: 1st and 2nd choices aren't very different (low confidence).
• After iconic model corrections: 1st and 2nd choices differ much more (higher confidence).
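One simple way to quantify that confidence is the margin between the first and second posterior choices; a hypothetical sketch:

```python
def top2_margin(posterior):
    # Confidence proxy: gap between the best and second-best class posteriors.
    best, second = sorted(posterior, reverse=True)[:2]
    return best - second

print(top2_margin([0.55, 0.45]))        # ~0.10: low confidence (before)
print(top2_margin([0.97, 0.02, 0.01]))  # ~0.95: high confidence (after)
```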
Effects on Disagreement Distribution
• Before iconic adaptation, disagreements are scattered throughout the passage.
• After iconic adaptation, disagreements concentrate on errors due to defects in the linguistic model.
Correcting the Linguistic Model
• First, examine the word with the highest word-level ME: cluster (3, 8, 11).
• Voting suggests that the correct word interpretation is ‘the’, which is then inserted into the dictionary, fully automatically.
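A minimal sketch of the voting step, assuming the cluster's word images each carry a current reading; the vote rule and names are hypothetical:

```python
from collections import Counter

def vote_and_update(cluster_readings, dictionary):
    # Majority vote over word images clustered as the same word; if the
    # winning reading is absent from the dictionary, insert it automatically.
    winner, _ = Counter(cluster_readings).most_common(1)[0]
    if winner not in dictionary:
        dictionary.add(winner)
    return winner

dictionary = {"quick", "brown", "fox"}                     # no "the"
print(vote_and_update(["the", "the", "tne"], dictionary))  # -> the
print("the" in dictionary)                                 # -> True
```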
Algorithm & Results Summary
• Unsupervised (fully automatic).
• Greedy: multiple cycles, each with two stages: iconic changes, then linguistic changes.
• Character-level ME suggests priorities for iconic model changes.
• Word-level ME suggests priorities for linguistic model changes.
• Every model change must reduce passage-level ME.
• Stopping rule: we have experimented with an adaptive threshold heuristic.
• In the real-data example in the paper, the recognition error rate was driven to zero (of course, this is a very short passage).
Isogeny & Long Passages
• If, as we expect, isogeny plays a crucial role, error rates should decline as passage length increases.
• A recent experiment (not in the paper) used a whole page and a 45k-word dictionary.
Discussion
• Mutual entropy is only one information-theoretic framework for unsupervised model improvement.
• Even using ME, other interesting greedy and branch-and-bound heuristics come to mind.
• Minimizing passage-level ME may not necessarily minimize recognition error (experts disagree).
• Proofs of global optimality seem hard, but we're investigating.
• Scaling up to, say, 100-page passages is urgently needed.
Thanks!
Pingping Xiu pix206@lehigh.edu
Henry Baird baird@cse.lehigh.edu