Automatic Handwriting Transcription

Adam Redford, Justin Dayley, Abdul Khalifa, Danny Perry

Outline
• Introduction
  • Problem: automatic handwriting transcription, online vs. offline
  • Previous work: Hidden Markov Models, Recurrent Neural Networks, etc.
• Methods
  • Recurrent Neural Network with Long Short-Term Memory
  • Advantages and disadvantages
• Results
  • Data sets
  • Analysis
• Conclusion
  • What we learned
  • Future work

Introduction - Handwriting Recognition
• The ability of a computer to receive and interpret intelligible handwritten input.
• Generally divided into two problems:
  • Offline handwriting recognition: scanned images of handwriting are transcribed.
  • Online handwriting recognition: recorded pen strokes are transcribed.
• Data can be converted back and forth:
  • Algorithms can “trace” text in an image (offline becomes online).
  • Given strokes can be rendered as an image (online becomes offline).
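Of the two conversions, rendering strokes as an image is the simpler one. The sketch below is a minimal illustration, assuming strokes are lists of integer (x, y) points already scaled to the target grid; `render_strokes` is a hypothetical helper, not part of any data set's tooling, and it joins consecutive points with one-pixel straight segments rather than modelling pen width.

```python
def render_strokes(strokes, width, height):
    """Rasterize strokes (lists of integer (x, y) points) into a height x width 0/1 grid.

    Consecutive points in a stroke are joined by straight segments; nothing is
    drawn between strokes (pen up). Points outside the grid are ignored.
    """
    grid = [[0] * width for _ in range(height)]

    def plot(x, y):
        if 0 <= x < width and 0 <= y < height:
            grid[y][x] = 1

    for stroke in strokes:
        if len(stroke) == 1:
            plot(*stroke[0])  # a single pen sample is drawn as a dot
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            # At least one sample per pixel along the longer axis, so the line has no gaps.
            steps = max(abs(x1 - x0), abs(y1 - y0), 1)
            for i in range(steps + 1):
                plot(round(x0 + (x1 - x0) * i / steps),
                     round(y0 + (y1 - y0) * i / steps))
    return grid


if __name__ == "__main__":
    # Two crossing strokes drawn on a 4x4 grid.
    for row in render_strokes([[(0, 0), (3, 3)], [(0, 3), (3, 0)]], 4, 4):
        print("".join("#" if v else "." for v in row))
```

The example prints a 4x4 “X” made of `#` characters. The opposite direction (tracing strokes out of a scanned image) has to recover stroke order and timing, which is why it is the harder conversion.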

Introduction - Previous Work
• Recurrent Neural Networks with LSTM
  • Graves, et al. “Unconstrained Online Handwriting Recognition with Recurrent Neural Networks”, NIPS, 2007
• Hidden Markov Model (HMM)
  • Hu, et al. “HMM Based On-Line Handwriting Recognition”, PAMI, 1996
• SVM
  • Bahlmann, et al. “On-line Handwriting Recognition with Support Vector Machines - A Kernel Approach”, IWFHR, 2002

Methods - Bidirectional Recurrent Neural Network with Long Short-Term Memory (LSTM)*
* Graves, A. “Supervised Sequence Labelling with Recurrent Neural Networks”, PhD Thesis, 2008
• Bidirectional: the network can look forward (wait) and backward (remember) in time.
• Recurrent: hidden layers are connected to themselves across time steps.
• LSTM: an explicit representation of memory; one way to address the “vanishing gradient” problem.
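To make the three terms concrete, here is a minimal pure-Python sketch of a bidirectional LSTM forward pass. It is an illustration under stated assumptions, not RNNLIB's implementation: `LSTMLayer` and `bidirectional` are hypothetical names, the weights are random, and it omits refinements such as peephole connections and the output layer. The forward layer reads the sequence left to right, the backward layer reads it right to left, and their hidden states are concatenated at each time step, so every output sees both past and future pen movement.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LSTMLayer:
    """A single LSTM layer (input, forget and output gates; no peepholes)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to [x, h_prev, 1]; the last column is the bias.
        cols = input_size + hidden_size + 1
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(hidden_size)]
            for gate in ("input", "forget", "output", "cell")
        }

    def step(self, x, h_prev, c_prev):
        z = list(x) + list(h_prev) + [1.0]
        pre = {gate: [sum(w * v for w, v in zip(row, z)) for row in rows]
               for gate, rows in self.weights.items()}
        i = [sigmoid(a) for a in pre["input"]]   # how much new content to write
        f = [sigmoid(a) for a in pre["forget"]]  # how much old memory to keep
        o = [sigmoid(a) for a in pre["output"]]  # how much memory to expose
        g = [math.tanh(a) for a in pre["cell"]]  # candidate memory content
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def run(self, xs):
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def bidirectional(forward_layer, backward_layer, xs):
    """Concatenate forward and backward hidden states at every time step."""
    fwd = forward_layer.run(xs)
    bwd = backward_layer.run(xs[::-1])[::-1]  # re-align backward states with time
    return [hf + hb for hf, hb in zip(fwd, bwd)]


if __name__ == "__main__":
    # Made-up input: 5 time steps of (dx, dy, end_of_stroke) pen features.
    xs = [[0.1, 0.0, 0.0], [0.2, -0.1, 0.0], [0.0, 0.3, 1.0], [0.1, 0.1, 0.0], [0.0, 0.0, 1.0]]
    out = bidirectional(LSTMLayer(3, 4, seed=1), LSTMLayer(3, 4, seed=2), xs)
    print(len(out), len(out[0]))  # 5 8: one 4+4 vector per time step
```

Because the backward layer starts at the end of the sequence, changing the last pen sample changes the backward half of the first output but leaves its forward half untouched; that is the “wait” in the slide above.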

Methods - Bidirectional RNN with LSTM
Advantages
• No feature engineering
• No character segmentation
• Learns temporal relationships
• Fast evaluation
Disadvantages
• Very slow training (1-2 days)
• Not very interpretable
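“No character segmentation” comes from training with Connectionist Temporal Classification (CTC), the output layer Graves uses: the network emits a probability distribution over the characters plus an extra “blank” at every time step, and the transcription is read off without ever marking where one character ends. The simplest way to read it off is best-path (greedy) decoding, sketched below; the blank symbol `_` and the function name are hypothetical, and RNNLIB may use a different decoder.

```python
BLANK = "_"  # hypothetical blank symbol; a real label set defines its own


def best_path_decode(frame_probs, labels):
    """Greedy CTC decoding: most likely label per frame, merge repeats, drop blanks.

    frame_probs: one list of label probabilities per time frame.
    labels: label names in the same order as the probability columns (including BLANK).
    """
    best = [labels[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    decoded = []
    prev = None
    for label in best:
        # A label is emitted only when it differs from the previous frame and is not blank.
        if label != prev and label != BLANK:
            decoded.append(label)
        prev = label
    return decoded


if __name__ == "__main__":
    labels = [BLANK, "f", "o"]
    frames = [
        [0.1, 0.1, 0.8],    # o
        [0.2, 0.1, 0.7],    # o again: merged with the previous frame
        [0.1, 0.8, 0.1],    # f
        [0.9, 0.05, 0.05],  # blank: keeps the next f separate
        [0.1, 0.8, 0.1],    # f
    ]
    print("".join(best_path_decode(frames, labels)))  # off
```

The blank is what lets a genuine double letter (“ff”) survive the merging of repeated frames.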

Methods - Bidirectional RNN with LSTM
We used the RNNLIB implementation:
• Authored by Alex Graves
  • PhD with Schmidhuber
  • Postdoc with Hinton
• http://sourceforge.net/p/rnnl/wiki/Home/

Methods - Contribution
1. We generated our own English and math handwriting data.
2. How well does the network perform on math data?

Results - Data Sets
1. IAM On-Line Handwriting Database
   • online, English data set
2. Test data we generated from our own handwriting using a Wacom tablet
   • online, English data set
3. CROHME: Competition on Recognition of Online Handwritten Mathematical Expressions
   • online, math data set
4. Math test data we generated ourselves
   • online, math data set

Results - Data Sets
Each line in the data set has:
1. An ASCII transcription
   • e.g. “Leaning back carries the concealed”
2. A line-stroke XML file
   • contains several “strokes”
   • each stroke contains:
     • the x, y locations of the pen
     • a time point for each x, y location
   • plotting the points reproduces the handwritten line (color = stroke) [figure]
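Reading one line-stroke file takes only a few lines of standard-library Python. The element and attribute names below (`Stroke`, `Point`, `x`, `y`, `time`) are our assumption about the IAM-OnDB layout and should be checked against the actual files; `load_strokes` and `to_offsets` are hypothetical helpers. The second function turns absolute positions into per-sample pen offsets with an end-of-stroke flag, one plausible way (not necessarily RNNLIB's) to feed strokes to a sequence model.

```python
import xml.etree.ElementTree as ET


def load_strokes(xml_text):
    """Return the strokes in a line-stroke file as lists of (x, y, time) tuples.

    Assumes <Stroke> elements containing <Point x=".." y=".." time=".."/> children.
    """
    root = ET.fromstring(xml_text)
    return [
        [(float(p.get("x")), float(p.get("y")), float(p.get("time"))) for p in stroke.iter("Point")]
        for stroke in root.iter("Stroke")
    ]


def to_offsets(strokes):
    """Flatten strokes into (dx, dy, end_of_stroke) steps, one per pen sample."""
    steps = []
    prev = None
    for stroke in strokes:
        for k, (x, y, _t) in enumerate(stroke):
            dx, dy = (0.0, 0.0) if prev is None else (x - prev[0], y - prev[1])
            steps.append((dx, dy, 1 if k == len(stroke) - 1 else 0))
            prev = (x, y)
    return steps


if __name__ == "__main__":
    sample = """<StrokeSet>
      <Stroke><Point x="10" y="20" time="0.00"/><Point x="12" y="21" time="0.01"/></Stroke>
      <Stroke><Point x="30" y="20" time="0.20"/></Stroke>
    </StrokeSet>"""
    print(to_offsets(load_strokes(sample)))
    # [(0.0, 0.0, 0), (2.0, 1.0, 1), (18.0, -1.0, 1)]
```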

Results - Data Sets
Dataset 1: English
We used 12,195 lines of online stroke data in total:
• 10,195 for training
• 1,000 for validation (to avoid overfitting)
• 1,000 for testing
Training time: 1 day 9 hours
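The three subsets add up to the total: 10,195 + 1,000 + 1,000 = 12,195 lines. The slides do not say how lines were assigned to each subset; the sketch below assumes a seeded random shuffle, and `split_lines` is a hypothetical helper.

```python
import random


def split_lines(line_ids, n_val, n_test, seed=0):
    """Shuffle line IDs reproducibly and cut them into train / validation / test lists."""
    ids = list(line_ids)
    random.Random(seed).shuffle(ids)  # fixed seed: the same split on every run
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_lines(range(12195), n_val=1000, n_test=1000)
    print(len(train), len(val), len(test))  # 10195 1000 1000
```

Keeping the validation and test lines disjoint from training is what makes the validation set useful for detecting overfitting.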


Results - Data Sets
Dataset 1: English, test data
• Label error: 17.24%
• CTC error: 17.55%
• Sequence error: 94.70%
• Insertions: 1.07%
• Deletions: 4.35%
Label and CTC error:
• Label: how many individual character labels were wrong
• CTC: how many were wrong after applying a simple language model
Sequence error:
• If any label in a sequence is wrong, the whole sequence counts as an error
• So about 5% of the sequences were transcribed 100% correctly!
Insertions and deletions:
• Measure how far the result was from the truth: characters added to (insertions) or missing from (deletions) the output
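Label error, insertions and deletions all come from aligning the output with the provided transcription by edit (Levenshtein) distance. The sketch below assumes the usual definitions: label error = (substitutions + deletions + insertions) / number of reference labels, and sequence error = fraction of lines with at least one error. `align_counts` and `error_rates` are hypothetical names, RNNLIB's own bookkeeping may differ in detail, and the CTC error is not reproduced here.

```python
def align_counts(ref, hyp):
    """Minimum edit-distance alignment of two label lists.

    Returns (substitutions, deletions, insertions) needed to turn ref into hyp.
    """
    n, m = len(ref), len(hyp)
    # best[i][j] = (cost, subs, dels, ins) for ref[:i] versus hyp[:j]
    best = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = (i, 0, i, 0)  # delete everything
    for j in range(1, m + 1):
        best[0][j] = (j, 0, 0, j)  # insert everything
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            miss = 0 if ref[i - 1] == hyp[j - 1] else 1
            c, s, d, a = best[i - 1][j - 1]
            diag = (c + miss, s + miss, d, a)  # match or substitution
            c, s, d, a = best[i - 1][j]
            up = (c + 1, s, d + 1, a)          # deletion
            c, s, d, a = best[i][j - 1]
            left = (c + 1, s, d, a + 1)        # insertion
            best[i][j] = min(diag, up, left)
    return best[n][m][1:]


def error_rates(pairs):
    """pairs: (reference, output) label lists. Rates are relative to the number of reference labels."""
    n_labels = subs = dels = ins = wrong = 0
    for ref, hyp in pairs:
        s, d, a = align_counts(ref, hyp)
        n_labels += len(ref)
        subs, dels, ins = subs + s, dels + d, ins + a
        wrong += int(s + d + a > 0)  # any error makes the whole sequence wrong
    return {
        "label_error": (subs + dels + ins) / n_labels,
        "substitutions": subs / n_labels,
        "deletions": dels / n_labels,
        "insertions": ins / n_labels,
        "sequence_error": wrong / len(pairs),
    }


if __name__ == "__main__":
    ref = "I * L i k e * c h e e s e".split()
    out = "I L i k e c h e e s e".split()
    print(align_counts(ref, out))  # (0, 2, 0): the two spaces were deleted
    print(error_rates([(ref, out)])["deletions"])  # 2 of 13 reference labels, about 0.154
```

Under these definitions a single wrong character is enough to push a line into the sequence-error count, which is why sequence error (94.70%) is so much higher than label error.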


Results - Online English: Testing Example
(“*” marks a space.)
Provided transcription: “Y o u * w a n t * y o u r * p o n y , * m i s t e r ?”
RNN output: “y o u w a n t y * o u * p o n y * h u i s t e r * ?”
• The “m” was read as “hu”; the two look somewhat alike.


Results - Online English: Testing Example
Provided transcription: “w * s * a l w a y s * f a s c i n a t e d * b y * t h e”
RNN output: “w s a l * w a y s f a s a i v a t e d * b y * t h e”
• “cina” looks somewhat like “aiva”.


Results - Data Sets
Dataset 2: Generated Online English and Math
• Used the network weights trained on the IAM On-Line Handwriting Database.
• Converted our generated data into a similar format.
• Evaluated the network on the new data.
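Converting the tablet recordings means writing them out in the same structure as the training files. A minimal sketch, assuming the recorder gives strokes as lists of (x, y, t) samples and that the target uses `StrokeSet`/`Stroke`/`Point` elements with `x`, `y` and `time` attributes; `strokes_to_xml` and its `scale` parameter are hypothetical, the latter a knob for matching the training data's coordinate range.

```python
import xml.etree.ElementTree as ET


def strokes_to_xml(strokes, scale=1.0):
    """Serialize [(x, y, t), ...] strokes as a StrokeSet document (assumed structure)."""
    root = ET.Element("StrokeSet")
    for stroke in strokes:
        stroke_el = ET.SubElement(root, "Stroke")
        for x, y, t in stroke:
            # Hypothetical rescaling so tablet coordinates fall in the training data's range.
            ET.SubElement(stroke_el, "Point",
                          x=str(round(x * scale)), y=str(round(y * scale)), time=f"{t:.3f}")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    tablet_strokes = [[(100, 200, 0.0), (120, 210, 0.01)], [(300, 200, 0.2)]]  # made-up samples
    print(strokes_to_xml(tablet_strokes, scale=0.5))
```

Matching the file structure is only part of “a similar format”: differences in coordinate range, sampling rate or writing size between the tablet and the training data can still hurt recognition, so those are worth checking too.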


Results - Generated Online English
Provided transcription: “M a c h i n e * l e a r n i n g * r o c k s .”
RNN output: “M i a c h i n e * l e a r n m i n g * r o c k s .”
• Introduced an extra “i” and “m”.


Results - Generated Online English
Provided transcription: “T h e * s e m e s t e r * i s * o v e r .”
RNN output: “t h e * s e m e s t e r s * o v e r .”
• Did really well on this example.
• Merged “semester is” into “semesters”, as if “is” were abbreviated.


Results - Generated Online English
Provided transcription: “E l v i s * h a s * l e f t * t h e * b u i l d i n g”
RNN output: “I t s * h a s t e f t * t r e * b u l d " g”
• Did very poorly on this example.
• Perhaps “Elvis” does not occur in the training set?


Results - Generated Online English
Provided transcription: “I * L i k e * c h e e s e”
RNN output: “I L i k e c h e e s e”
• Did very well; only the spaces are missing from the result.

Results - Online Math
Dataset 3: Online Math Data Set
• 1,236 math expressions for training
• 100 for validation (to avoid overtraining)
• 486 for testing
Training time: 13.5 hours
Sample: [figure]


Results - Online Math: Overall Results
Test set:
• Label error: 69.2%
• CTC error: 55.1%
• Sequence error: 95.8%
• Deletions: 27.7%
• Insertions: 0.7%
Not that great...


Results - Online Math: Test Data Set
Provided transcription: “( z 2 - n + y 2 - n ) ( z n - 2 - y n - 2 ) = x”
RNN output: “( z 2 - n + y 2 - n ) ( z n - 2 - n 2 - ) = x”
• Did pretty well, but garbled the final “y n - 2” (output “n 2 -”).


Results - Online Math: Test Data Set
Provided transcription: “\sin x - \sin y - \sin ( x - y )”
RNN output: “\sin x - \sin y - \sin ( x - y )”
• Perfect!


Results - Online Math: Test Data Set
Provided transcription: “( a 2 + b 2 ) ( c 2 + d 2 ) \geq ( a c + b d ) 2”
RNN output: “\sin x - - z - - 2”
• Completely wrong!

Results - Generated Online Math
Dataset 4: Generated Online Math Data Set
• Used the network weights trained on the CROHME data set.
• Converted our generated data into a similar format.
• Evaluated the network on the new data.


Results - Generated Online Math: Test Data
Provided transcription: “\log 2”
RNN output: “b 2 -”
• Didn’t do very well, but it got the 2!


Results - Generated Online Math: Test Data
Provided transcription: “e = m c 2”
RNN output: “- 2”
• Didn’t do very well again, but it is very good at finding 2s!

Conclusion - To Do for the Report
• Compare our math data more carefully with the training math data set.
  • There appear to be some specific differences that prevent a successful transcription.
• Re-run the network on our generated math set.

Conclusion - What We Learned
• The RNN does pretty well on most of our handwriting,
  • but it is not quite ready to transcribe notebooks automatically without significant editing.
• An RNN may not be the best approach for math.
  • It may need help relating the 2D positions of some symbols (e.g. superscripts)?

Conclusion - Future Work
• Apply a Multidimensional RNN (MDRNN) to math transcription
  • MDRNN: instead of only using sequential data, use data with two or more dimensions
  • Graves, et al. applied it successfully to 2D image data
  • It would be interesting to see whether this improves math transcription, where some symbols depend on their 2D location
• Direct comparison to non-NN approaches
  • Few papers directly compare against a feature-engineered SVM (or similar) approach.
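The difference from a sequential RNN is the recurrence: in two dimensions, each cell's hidden state depends on its left and upper neighbors as well as on its own input, so context can flow both across and down the page. The sketch below shows only that recurrence, with a single scalar tanh unit and one scan direction; Graves' MDRNNs use multidimensional LSTM cells and typically combine scans from all four corners, and `mdrnn_2d` and its weights are hypothetical.

```python
import math


def mdrnn_2d(grid, w_in, w_left, w_up, bias=0.0):
    """One-directional 2D recurrent scan with a single tanh unit per cell.

    h[i][j] = tanh(w_in * x[i][j] + w_left * h[i][j-1] + w_up * h[i-1][j] + bias),
    scanning from the top-left corner; missing neighbors count as 0.
    """
    rows, cols = len(grid), len(grid[0])
    h = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            left = h[i][j - 1] if j > 0 else 0.0
            up = h[i - 1][j] if i > 0 else 0.0
            h[i][j] = math.tanh(w_in * grid[i][j] + w_left * left + w_up * up + bias)
    return h


if __name__ == "__main__":
    # A single "ink" pixel in the top-left corner; its influence spreads right and down.
    image = [[1.0, 0.0, 0.0],
             [0.0, 0.0, 0.0]]
    for row in mdrnn_2d(image, w_in=1.0, w_left=0.5, w_up=0.5):
        print(" ".join(f"{v:.2f}" for v in row))
```

A sequential RNN reading the same pixels row by row would only see the left-hand context within a row; here the cell below the ink pixel sees it directly, which is the kind of vertical relation a superscript needs.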
