Automatic Handwriting Transcription Adam Redford, Justin Dayley, Abdul Khalifa, Danny Perry
Outline
• Introduction
  • Problem: automatic handwriting transcription
  • Online vs. offline
• Previous Work
  • Hidden Markov Model
  • Recurrent Neural Networks
  • etc.
• Methods
  • Recurrent Neural Network with Long Short-Term Memory
  • Advantages
  • Disadvantages
• Results
  • Data sets
  • Analysis
• Conclusion
  • What we learned
  • Future work
Introduction - Handwriting Recognition
• The ability of a computer to receive and interpret intelligible handwritten input.
• Generally divided into two different problems:
  • Offline handwriting recognition: scanned images of handwriting are transcribed
  • Online handwriting recognition: recorded pen strokes are transcribed
• It is possible to convert data back and forth:
  • Algorithms can “trace” text from an image (offline becomes online)
  • Given strokes can be rendered as an image (online becomes offline)
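The online-to-offline direction (rendering strokes as an image) can be sketched in a few lines. This is only an illustration under assumptions: the function name `render_strokes`, the image size, and the representation of a stroke as a list of (x, y) pairs are all hypothetical and not taken from the project.

```python
def render_strokes(strokes, width=64, height=32):
    """Rasterize pen strokes into a binary image (a list of pixel rows).

    `strokes` is a list of strokes; each stroke is a list of (x, y) points.
    Consecutive points inside a stroke are joined by a line; nothing is
    drawn between strokes, because the pen was lifted there.
    """
    points = [p for stroke in strokes for p in stroke]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0  # avoid division by zero
    span_y = (max(ys) - min_y) or 1.0

    image = [[0] * width for _ in range(height)]

    def to_pixel(x, y):
        # Scale pen coordinates into the pixel grid.
        col = round((x - min_x) / span_x * (width - 1))
        row = round((y - min_y) / span_y * (height - 1))
        return col, row

    for stroke in strokes:
        pixels = [to_pixel(x, y) for x, y in stroke]
        if len(pixels) == 1:
            col, row = pixels[0]
            image[row][col] = 1
        for (c0, r0), (c1, r1) in zip(pixels, pixels[1:]):
            # Walk along the segment in enough steps to leave no gaps.
            steps = max(abs(c1 - c0), abs(r1 - r0), 1)
            for i in range(steps + 1):
                col = round(c0 + (c1 - c0) * i / steps)
                row = round(r0 + (r1 - r0) * i / steps)
                image[row][col] = 1
    return image
```

Note that the timing information of each point is discarded here, which is why going the other way (tracing an image back into strokes) is the harder direction.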
Introduction - Previous Work
• Recurrent Neural Networks with LSTM
  • Graves, et al. “Unconstrained Online Handwriting Recognition with Recurrent Neural Networks”, NIPS, 2007
• Hidden Markov Model (HMM)
  • Hu, et al. “HMM Based On-Line Handwriting Recognition”, PAMI, 1996
• Support Vector Machine (SVM)
  • Bahlmann, et al. “On-line Handwriting Recognition with Support Vector Machines - A Kernel Approach”, IWFHR, 2002
Methods - Bidirectional Recurrent Neural Network with Long Short-Term Memory (LSTM)
• Reference: Graves, A. “Supervised Sequence Labelling with Recurrent Neural Networks”, PhD Thesis, 2008
• Bidirectional = can look forward (wait) and backward (remember) in time
• Recurrent = hidden layers are connected to themselves
• LSTM = explicit representation of memory; one way to address the “vanishing gradient” problem
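To make the three terms concrete, here is a minimal NumPy sketch of a bidirectional LSTM forward pass. It is an illustration only, not the RNNLIB implementation: it uses a common LSTM variant without peephole connections, and the function names, weight layout, and sizes are assumptions made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4*H, D+H) and b has shape (4*H,), where D is the input
    size and H the hidden size. The cell state c is the explicit memory.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b  # recurrent: h_prev feeds back in
    i = sigmoid(z[0:H])          # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old memory to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much memory to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate content
    c = f * c_prev + i * g       # additive update eases gradient flow
    h = o * np.tanh(c)
    return h, c

def run_lstm(xs, W, b, hidden_size):
    """Run the cell over a sequence xs of shape (T, D); returns (T, H)."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    outputs = []
    for x in xs:
        h, c = lstm_step(x, h, c, W, b)
        outputs.append(h)
    return np.stack(outputs)

def run_bidirectional(xs, fwd_params, bwd_params, hidden_size):
    """Concatenate a forward pass with a pass over the reversed sequence."""
    W_f, b_f = fwd_params
    W_b, b_b = bwd_params
    forward = run_lstm(xs, W_f, b_f, hidden_size)
    backward = run_lstm(xs[::-1], W_b, b_b, hidden_size)[::-1]
    return np.concatenate([forward, backward], axis=1)
```

At each time step the forward half of the output depends only on earlier inputs and the backward half only on later ones, which is what lets the network use context from both sides of a pen stroke.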
Methods - Bidirectional RNN with LSTM
Advantages
• No feature engineering
• No character segmentation
• Learns temporal relationships
• Fast evaluation
Disadvantages
• Very slow training (1-2 days)
• Not very interpretable
Methods - Bidirectional RNN with LSTM
We used the “RNNLIB” implementation
• Authored by Alex Graves
  • PhD with Schmidhuber
  • Postdoc with Hinton
• http://sourceforge.net/p/rnnl/wiki/Home/
Methods - Contribution
1. We generated our own English and Math handwriting data.
2. How well does the network perform on Math data?
Results - Datasets
1. IAM On-Line Handwriting Database
   • online, English data set
2. Test data we generated from our own handwriting using a Wacom tablet
   • online, English data set
3. CROHME: Competition on Recognition of Handwritten Mathematical Expressions
   • online, Math data set
4. Math test data we generated as well
   • online, Math data set
Results - Data sets
Each line in the data set has:
1. An ASCII transcription
   • ex: “Leaning back carries the concealed”
2. A line stroke XML file
   • Contains several “strokes”
   • A stroke contains:
     • x,y locations of the pen
     • time points for each x,y
   • Plotting the points reproduces the handwritten line (color = stroke)
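A stroke file of the kind described above could be read roughly as follows. The element and attribute names (`Stroke`, `Point`, `x`, `y`, `time`) and the sample document are assumptions made for illustration; the slides only say that a file holds strokes made of timed x,y points.

```python
import xml.etree.ElementTree as ET

# Made-up sample in an assumed layout; real files may differ.
SAMPLE_XML = """
<WhiteboardCaptureSession>
  <StrokeSet>
    <Stroke>
      <Point x="10" y="20" time="0.00"/>
      <Point x="12" y="24" time="0.01"/>
    </Stroke>
    <Stroke>
      <Point x="30" y="20" time="0.50"/>
    </Stroke>
  </StrokeSet>
</WhiteboardCaptureSession>
"""

def parse_strokes(xml_text):
    """Return a list of strokes; each stroke is a list of (x, y, time)."""
    root = ET.fromstring(xml_text.strip())
    strokes = []
    for stroke in root.iter("Stroke"):
        points = [
            (float(p.get("x")), float(p.get("y")), float(p.get("time")))
            for p in stroke.iter("Point")
        ]
        strokes.append(points)
    return strokes
```

Keeping the strokes as separate lists preserves the pen-up events between them, which is the information an online recognizer has and an offline one does not.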
Results - Data sets, Dataset 1: English
We used 12,195 total lines of online stroke data
• 10,195 for training
• 1,000 for cross-validation (avoid overfitting)
• 1,000 for testing
Training time: 1 day 9 hours
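A split with these sizes can be sketched as below. Whether the lines were shuffled or a predefined partition was used is not stated, so the shuffling, the seed, and the function name `split_lines` are assumptions for illustration.

```python
import random

def split_lines(line_ids, n_validation=1000, n_test=1000, seed=0):
    """Shuffle line identifiers and split into train/validation/test."""
    ids = list(line_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split repeatable
    test = ids[:n_test]
    validation = ids[n_test:n_test + n_validation]
    train = ids[n_test + n_validation:]
    return train, validation, test

train, validation, test = split_lines(range(12195))
print(len(train), len(validation), len(test))  # 10195 1000 1000
```

The validation lines are never used for weight updates; training stops when the error on them stops improving, which is how the held-out set helps avoid overfitting.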
Results - Data sets, Dataset 1: English
Testing data:
• Label Error: 17.24%
• CTC Error: 17.55
• Sequence Error: 94.70%
• Insertions: 1.07%
• Deletions: 4.35%
Label and CTC Error:
• Label: how many individual character labels were wrong
• CTC (Connectionist Temporal Classification): how many are wrong after applying a simple language model
Sequence Error:
• If any label in a sequence is wrong, the sequence error for that sequence is 100%
• About 5% of sequences were transcribed 100% correctly!
Insertions and Deletions:
• Measure how far the result was from the truth
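The label, insertion, deletion, and sequence errors above can be illustrated with a standard edit-distance computation. This is a sketch of the general technique and is not confirmed to match how RNNLIB computes its numbers; the function names are hypothetical.

```python
def edit_operations(truth, output):
    """Count (substitutions, insertions, deletions) in a minimal edit script
    that turns the label sequence `truth` into `output`."""
    n, m = len(truth), len(output)
    # cost[i][j] = edit distance between truth[:i] and output[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        cost[i][0] = i
    for j in range(m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = 0 if truth[i - 1] == output[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + diff,  # match/substitute
                             cost[i - 1][j] + 1,         # deletion
                             cost[i][j - 1] + 1)         # insertion
    # Walk back through the table to count each kind of operation.
    subs = ins = dels = 0
    i, j = n, m
    while i > 0 or j > 0:
        diff = 1 if i > 0 and j > 0 and truth[i - 1] != output[j - 1] else 0
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + diff:
            subs += diff
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, ins, dels

def label_error_rate(truth, output):
    """Edit operations per ground-truth label."""
    return sum(edit_operations(truth, output)) / len(truth)

def sequence_error_rate(pairs):
    """Fraction of (truth, output) pairs that are not an exact match."""
    return sum(1 for truth, output in pairs if truth != output) / len(pairs)
```

For example, with labels split on spaces, the truth “I * L i k e * c h e e s e” against the output “I L i k e c h e e s e” gives two deletions and no other errors, yet the line still counts as a full sequence error. This is why a label error near 17% can coexist with a sequence error near 95%.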
Results Online English: testing example Provided transcription: “Y o u * w a n t * y o u r * p o n y , * m i s t e r ?” RNN Output: “y o u w a n t y * o u * p o n y * h u i s t e r * ?” “m” looks kind of like an “hi”
Results Online English: testing example Provided transcription: “w * s * a l w a y s * f a s c i n a t e d * b y * t h e” RNN Output: “w s a l * w a y s f a s a i v a t e d * b y * t h e” “cina” looks kind of like an “aiva”
Results - Data sets, Dataset 2: Generated Online English and Math
• Used the network weights trained on the IAM On-Line Handwriting Database
• Converted our generated data into a similar format
• Evaluated the NN on the new data
Results Generated Online English Provided transcription: “M a c h i n e * l e a r n i n g * r o c k s .” RNN Output: “M i a c h i n e * l e a r n m i n g * r o c k s .” introduced an extra “m” and “i”
Results Generated Online English Provided transcription: “T h e * s e m e s t e r * i s * o v e r .” RNN Output: “t h e * s e m e s t e r s * o v e r .” Did really well on this example. Introduced an abbreviation instead of “is”.
Results Generated Online English Provided transcription: “E l v i s * h a s * l e f t * t h e * b u i l d i n g” RNN Output: “I t s * h a s t e f t * t r e * b u l d " g” Did very poorly on this example. Was “Elvis” not observed in the training set?
Results Generated Online English Provided transcription: “I * L i k e * c h e e s e” RNN Output: “I L i k e c h e e s e” Did very well. Just missing spaces in the result.
Results - Online Math, Dataset 3: Online Math Data set
• 1,236 math expressions for training
• 100 for cross-validation (avoid overtraining)
• 486 for testing
Training time: 13.5 hours
Results - Online Math: Overall Results
Test set:
• Label Error: 69.2%
• CTC Error: 55.1%
• Sequence Error: 95.8%
• Deletions: 27.7%
• Insertions: 0.7%
Not that great...
Results Online Math: Testing Dataset Provided transcription: “( z 2 - n + y 2 - n ) ( z n - 2 - y n - 2 ) = x” RNN Output: “( z 2 - n + y 2 - n ) ( z n - 2 - n 2 - ) = x” Did pretty well, but messed up the sequence “y n - 2” (output: “n 2 -”).
Results Online Math: Testing Dataset Provided transcription: “\sin x - \sin y - \sin ( x - y )” RNN Output: “\sin x - \sin y - \sin ( x - y )” Perfect!
Results Online Math: Testing Dataset Provided transcription: “( a 2 + b 2 ) ( c 2 + d 2 ) \geq ( a c + b d ) 2” RNN Output: “\sin x - - z - - 2” Completely Wrong!
Results - Generated Online Math, Dataset 4: Generated Online Math Data set
• Used the network weights trained on the CROHME data set
• Converted our generated data into a similar format
• Evaluated the NN on the new data
Results Online Math: Testing Dataset Provided transcription: “\log 2” RNN Output: “b 2 -” Didn’t do very well, but it got the 2!
Results Online Math: Testing Dataset Provided transcription: “e = m c 2” RNN Output: “- 2” Didn’t do very well again, but it is very good at finding 2’s!
Conclusion - to do for report
• Make a more careful comparison of our math data and the training math data set
  • There appear to be some specific differences precluding a successful transcription
• Re-run the NN on our generated Math set
Conclusion - what we learned
• The RNN does pretty well on most of our handwriting
  • Not quite ready for auto-transcription of notebooks without significant editing
• The RNN may not be the best approach for Math
  • May need some help in correlating the 2D positions of some symbols?
Conclusion - Future work
• Apply a Multidimensional RNN (MDRNN) to Math transcription
  • MDRNN: instead of only using sequential data, use 2+ dimensional data
  • Graves, et al. applied it to 2D image data with success
  • It would be interesting to see if this improved the Math transcription, where some symbols require 2D location
• Direct comparison to non-NN approaches
  • Few papers give a direct comparison to a feature-engineered SVM (or similar) approach