Microsoft’s Cursive Recognizer Jay Pittman and the entire Microsoft Handwriting Recognition Research and Development Team jpittman@microsoft.com Microsoft Tablet PC
The Handwriting Recognition Team • An experiment: • A research group, but not housed in MSR • Positioned inside a product group • Our direction and inspiration come directly from the users • This isn’t for everyone, but we like it • Just over a dozen researchers • Half with PhDs • Mostly CS, but 1 Chemistry, 1 Industrial Engineering, 1 Math, 1 Speech • Mostly neural network researchers • Small to moderate experience in other recognition technologies
Neural Network Review • [Figure: small example network with sample activation and weight values] • Directed acyclic graph • Nodes and arcs, each containing a simple value • Nodes contain activations, arcs contain weights • At run-time, we do a “forward pass” which computes activation from inputs to hiddens, and then to outputs • From the outside, the application only sees the input nodes and output nodes • Node values (in and out) range from 0.0 to 1.0
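The forward pass described on this slide can be sketched in a few lines. This is a minimal illustration, not the team's implementation: the sigmoid squashing function, the absence of bias terms, the layer sizes, and the weight values are all assumptions chosen so that node values stay between 0.0 and 1.0.

```python
import math


def sigmoid(x):
    """Squash any real number into the open range (0.0, 1.0)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward_pass(inputs, hidden_weights, output_weights):
    """Compute activations layer by layer: inputs -> hiddens -> outputs.

    Each row of a weight matrix holds the arc weights feeding one node.
    Bias terms are omitted to keep the sketch short.
    """
    hiddens = [sigmoid(sum(w * a for w, a in zip(row, inputs)))
               for row in hidden_weights]
    outputs = [sigmoid(sum(w * h for w, h in zip(row, hiddens)))
               for row in output_weights]
    return outputs


# Hypothetical network: 3 input nodes, 2 hidden nodes, 1 output node.
outputs = forward_pass(
    inputs=[1.0, 0.0, 0.8],
    hidden_weights=[[1.4, -2.3, 0.6], [0.1, -0.1, 0.7]],
    output_weights=[[1.0, -0.8]],
)
print(outputs)
```

The caller only supplies `inputs` and reads `outputs`; the hidden activations never leave the function, which mirrors the point that the application sees only the input and output nodes.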
TDNN: Time Delayed Neural Network • [Figure: network diagram with one column of nodes per input item] • This is still a normal back-propagation network • All the points in the previous slide still apply • The difference is in the connections • Connections are limited • Weights are shared • The input is segmented, and the same features are computed for each segment • Small detail: edge effects • For the first two and last two columns, the hidden nodes and input nodes that reach outside the range of our input receive zero activations
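The weight sharing and edge handling on this slide can be illustrated with a toy hidden layer. This is a sketch under stated assumptions: one hidden node per time step, a window that reaches two columns to either side (inferred from the "first two and last two columns" remark), and a sigmoid activation. The function and parameter names are hypothetical.

```python
import math


def sigmoid(x):
    """Squash any real number into the open range (0.0, 1.0)."""
    return 1.0 / (1.0 + math.exp(-x))


def tdnn_hidden_layer(segments, shared_weights, reach=2):
    """Apply the same weight set at every time step (weight sharing).

    segments: list of per-segment feature vectors, all the same length.
    shared_weights: 2 * reach + 1 weight vectors, one per window offset.
    Window positions that fall outside the input contribute zero
    activation, which is the edge effect mentioned on the slide.
    """
    zero = [0.0] * len(segments[0])
    activations = []
    for t in range(len(segments)):
        total = 0.0
        for k, weights in enumerate(shared_weights):
            pos = t + k - reach
            column = segments[pos] if 0 <= pos < len(segments) else zero
            total += sum(w * x for w, x in zip(weights, column))
        activations.append(sigmoid(total))
    return activations


# Hypothetical input: three segments with two features each.
segments = [[0.2, 0.9], [0.5, 0.1], [0.7, 0.3]]
weights = [[0.1, -0.2], [0.3, 0.4], [1.0, -1.0], [0.3, 0.4], [0.1, -0.2]]
print(tdnn_hidden_layer(segments, weights))
```

Because the same `shared_weights` are reused at every position, the number of trainable weights does not grow with the length of the ink.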
Training • We use back-propagation training • We collect millions of words of ink data from thousands of writers • Young and old, male and female, left handed and right handed • Natural text, newspaper text, URLs, email addresses, street addresses • We collect in nearly two dozen languages around the world • Training on such large databases takes weeks • We constantly worry about how well our data reflect our customers • Their writing styles • Their text content • We can be no better than the quality of our training sets • And that goes for our test sets too
Languages • We ship now in: • English (US), English (UK), French, German, Spanish, Italian • We have done some initial work in: • Dutch, Portuguese, Swedish, Danish, Norwegian, Finnish • We cannot predict when we might ship these • We are starting initial research in more • Using a completely different approach, we also ship now in: • Japanese, Chinese (Simplified), Chinese (Traditional), Korean
Recognizer Architecture • [Figure: recognition pipeline diagram with parts labeled Ink, Segments, TDNN, Output Matrix (character scores per segment), Lexicon, Beam Search, and Top 10 List] • Example Top 10 List with scores: dog 68, clog 57, dug 51, doom 42, divvy 37, ooze 35, cloy 34, doxy 29, client 22, dozy 13
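A toy version of the output-matrix, lexicon, and beam-search interaction is sketched below. It is heavily simplified and should not be read as the shipped algorithm: it assumes each character occupies exactly one segment, adds raw scores, and uses a plain set of prefixes for the lexicon. The matrix values and word list are made up for illustration.

```python
def beam_search(output_matrix, lexicon, beam_width=10):
    """Return up to beam_width (word, score) pairs, best first.

    output_matrix: one dict per segment mapping character -> score (0-100).
    lexicon: set of allowed words.
    Simplification: each character occupies exactly one segment.
    """
    # Every prefix of every lexicon word is a legal partial hypothesis.
    prefixes = {word[:i] for word in lexicon for i in range(len(word) + 1)}
    beam = [("", 0)]
    for column in output_matrix:
        candidates = [
            (prefix + ch, score + ch_score)
            for prefix, score in beam
            for ch, ch_score in column.items()
            if prefix + ch in prefixes
        ]
        candidates.sort(key=lambda item: item[1], reverse=True)
        beam = candidates[:beam_width]  # prune to the best hypotheses
    return [(word, score) for word, score in beam if word in lexicon]


# Hypothetical three-segment output matrix.
matrix = [
    {"d": 92, "c": 86},
    {"o": 77, "u": 51},
    {"g": 68, "t": 53},
]
top = beam_search(matrix, {"dog", "dug", "cot", "dot", "cat"})
print(top)
```

The lexicon prunes hypotheses such as "cu" early, so only strings that can still become dictionary words compete for places in the beam.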
Language Model • We get better recognition if we bias our interpretation of the output matrix with a language model • Better recognition means we can handle sloppier cursive • You can write faster, in a more relaxed manner • The lexicon (system dictionary) is the main part • But there is also a user dictionary • And there are regular expressions for things like dates and currency amounts • We want a generator • We ask it: “what characters could be next after this prefix?” • It answers with a set of characters • We still output the top letter recognitions • In case you are writing a word out-of-dictionary • You will have to write more neatly
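The "generator" interface, which answers "what characters could be next after this prefix?", maps naturally onto a trie. The sketch below is one plausible way to build it; the class name `LexiconGenerator` and its methods are hypothetical, and the slides do not say how the real lexicon is stored.

```python
class LexiconGenerator:
    """Trie-backed generator over a word list."""

    _END = None  # key marking the end of a complete word

    def __init__(self, words):
        self.root = {}
        for word in words:
            node = self.root
            for ch in word:
                node = node.setdefault(ch, {})
            node[self._END] = {}

    def _walk(self, prefix):
        """Return the trie node for prefix, or None if it is not in the trie."""
        node = self.root
        for ch in prefix:
            if ch not in node:
                return None
            node = node[ch]
        return node

    def next_chars(self, prefix):
        """Set of characters that can legally follow prefix."""
        node = self._walk(prefix)
        if node is None:
            return set()
        return {ch for ch in node if ch is not self._END}

    def is_word(self, text):
        """True if text is a complete word in the lexicon."""
        node = self._walk(text)
        return node is not None and self._END in node


gen = LexiconGenerator(["dog", "dot", "dug"])
print(gen.next_chars("do"))
```

A system dictionary, a user dictionary, and a compiled regular expression could all sit behind the same `next_chars` question, which is presumably why the slide asks for a generator rather than a membership test.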
Clumsy Lexicon Issue • The lexicon includes all the words in the spellchecker • The spellchecker includes obscenities • Otherwise they would get marked as misspelled • But people get upset if these words are offered as corrections for other misspellings • So the spellchecker marks them as “restricted” • We live in an apparently stochastic world • We will throw up 6 theories about what you were trying to write • If your ink is near an obscene word, we might include that • Dilemma: • We want to recognize your obscene word when you write it • Otherwise we are censoring, which is NOT our place • We DON’T want to offer these outputs when you don’t write them • Solution (weak): • We took these words out of the lexicon • You can still write them, because you can write out-of-dictionary • But you have to write very neat cursive, or nice handprint • Only works at the word level • Can’t remove words with dual meanings • Can’t handle phrases that are obscene when the individual words are not
Regular Expressions • Many built-in, callable by ISVs, web pages • Number, date, time, currency amount, phone number, address, URL, email address, file name, phrase list • Many components of the above: • Month, day of month, day of week, year, area code, hour, minute • Isolated characters: • Digit, lowercase letter, uppercase letter • None: • Yields an out-of-dictionary-only system (turns off the language model) • Great for form-filling apps and web pages • Accuracy is greatly improved • This is in addition to the ability to load the user dictionary • One could load 500 color names for a color field in a form-based app • Or 8000 drug names in a prescription app • The regular expression compiler is available at run time • Software vendors can add their own regular expressions • One could imagine the DMV adding automobile VINs • Example expressions (from the built-in date format): • digit = "0123456789"; • nummonth = ["0"] "123456789" | "1" "012"; • numday = ["0"] "123456789" | "12" digit | "3" "01"; • numyear = [ "12" digit ] digit digit ; • numyear = "'" digit digit; • numdate = nummonth "/" numday ["/" [ "12" digit ] digit digit]; • numdate = nummonth "-" numday ["-" [ "12" digit ] digit digit];
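For readers more familiar with conventional regex syntax, the `numdate` rules can be approximated with Python's `re` module. This is a translation under an assumed reading of the slide's notation: a quoted string such as "123456789" is taken to mean "any one of these characters", square brackets mean "optional", and `|` means alternation. It is not the recognizer's own compiler, and the constant and function names are hypothetical.

```python
import re

DIGIT = r"[0-9]"
NUMMONTH = r"(?:0?[1-9]|1[0-2])"           # ["0"] "123456789" | "1" "012"
NUMDAY = r"(?:0?[1-9]|[12][0-9]|3[01])"    # ["0"] "123456789" | "12" digit | "3" "01"
YEAR = rf"(?:[12]{DIGIT})?{DIGIT}{DIGIT}"  # [ "12" digit ] digit digit


def numdate_pattern(sep):
    """Build month SEP day [SEP year] for one separator character."""
    s = re.escape(sep)
    return rf"{NUMMONTH}{s}{NUMDAY}(?:{s}{YEAR})?"


# The slide defines numdate twice, once with "/" and once with "-".
NUMDATE = re.compile(rf"(?:{numdate_pattern('/')}|{numdate_pattern('-')})")


def is_numdate(text):
    """True if the whole string matches the assumed numdate grammar."""
    return NUMDATE.fullmatch(text) is not None


for sample in ["12/31/2004", "2/9", "12-31-04", "13/01", "12/31-04"]:
    print(sample, is_numdate(sample))
```

Keeping the two separators as distinct alternatives, as the slide does, rejects mixed forms such as "12/31-04".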
Default Factoid • Used when no factoid is set • Intended for natural text, such as the body of an email • Includes system dictionary, user dictionary, hyphenation rule, number grammar, web address grammar • All wrapped by optional leading punctuation and trailing punctuation • Hyphenation rule allows sequence of dictionary words with hyphens between • Alternatively, can be a single character (any character supported by the system) • [Figure: state diagram from Start to Final with nodes labeled Leading Punc, SysDict, UserDict, Hyphenation, Number, Web, Trailing Punc, and Single Char]
Error Correction: SetTextContext() Goal: Better context usage for error correction scenarios • User writes “Dictionary” • Recognizer misrecognizes it as “Dictum” • User selects “um” and rewrites “ionary” • TIP notes partial word selection, puts recognizer into correction mode with left and right context • Beam search artificially recognizes left context • Beam search runs ink as normal • Beam search artificially recognizes right context • This produces “ionary” in top 10 list; TIP must insert this to the right of “Dict” • [Figure: numbered walk-through of the steps above, showing Left Context “Dict”, Right Context “”, and sample output-matrix character scores]
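The effect of left and right context can be imitated with a small function that only considers lexicon words consistent with the context and scores the rewritten ink against the remaining middle portion. This is an illustration of the idea only, not the `SetTextContext()` API or the real beam search: it scans the lexicon exhaustively, assumes one character per segment, and uses invented scores.

```python
def recognize_with_context(ink_matrix, lexicon, left="", right="", top_n=10):
    """Rank the strings the ink could be, given fixed left/right context.

    ink_matrix: one dict per segment mapping character -> score (0-100).
    Returns (middle_text, score) pairs, best first. Only the middle text
    is returned, because the caller already has the context in place.
    """
    results = []
    for word in lexicon:
        if len(left) + len(right) > len(word):
            continue
        if not (word.startswith(left) and word.endswith(right)):
            continue
        middle = word[len(left):len(word) - len(right)]
        if len(middle) != len(ink_matrix):
            continue  # simplification: one character per segment
        score = sum(col.get(ch, 0) for col, ch in zip(ink_matrix, middle))
        results.append((middle, score))
    results.sort(key=lambda item: item[1], reverse=True)
    return results[:top_n]


# Hypothetical scores for the rewritten ink "ionary".
ink = [
    {"i": 85, "a": 57}, {"o": 72, "p": 20}, {"n": 80, "h": 10},
    {"a": 70, "o": 15}, {"r": 60, "n": 30}, {"y": 75, "e": 25},
]
lexicon = {"Dictionary", "Dictaphone", "Dictum", "Diction"}
ranked = recognize_with_context(ink, lexicon, left="Dict", right="")
print("Dict" + ranked[0][0])
```

As on the slide, the result contains only the rewritten part, so the caller (the TIP in the original design) is responsible for splicing it in after “Dict”.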
Calligrapher • The Russian recognition company Paragraph sold itself to SGI (Silicon Graphics, Incorporated), who then sold it to Vadem, who sold it to Microsoft. • In the purchase we obtained: • Calligrapher • Cursive recognizer that shipped on the first Apple Newton (but not the second) • Transcriber • Handwriting app for handheld computers (shipped on PocketPC) • Calligrapher has a very similar architecture • Instead of a TDNN it employs a hand-built HMM • The lexicon and beam search are similar in nature (many small differences) • We combined our system with Calligrapher • We use a voting system (neural nets) to combine each recognizer’s top 10 list • They are very different, and make different mistakes • We get the best of both worlds • If either recognizer outputs a single-character “word” we forget these lists and run the isolated character recognizer
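The slide says the two top 10 lists are merged by a neural-net voting system, whose details are not given. As a stand-in, the sketch below uses a plain weighted sum of scores, which is a deliberately simpler technique, together with the single-character check. The weights, function names, and the choice to test every entry in either list (rather than only the top entry) are assumptions.

```python
def combine_top_lists(list_a, list_b, weight_a=0.5, weight_b=0.5, top_n=10):
    """Merge two (word, score) lists with a weighted sum of scores.

    A word missing from one list simply gets no contribution from it.
    """
    scores = {}
    for word, score in list_a:
        scores[word] = scores.get(word, 0.0) + weight_a * score
    for word, score in list_b:
        scores[word] = scores.get(word, 0.0) + weight_b * score
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


def needs_isolated_character_pass(list_a, list_b):
    """True if either list proposes a single-character 'word'."""
    return any(len(word) == 1 for word, _ in list(list_a) + list(list_b))


# Hypothetical outputs from the two recognizers.
tdnn_list = [("dog", 68), ("clog", 57), ("dug", 51)]
hmm_list = [("clog", 70), ("dog", 60)]

if needs_isolated_character_pass(tdnn_list, hmm_list):
    print("run the isolated character recognizer instead")
else:
    print(combine_top_lists(tdnn_list, hmm_list))
```

Words that both recognizers propose accumulate score from both sides, which is one simple way two systems that "make different mistakes" can outvote each other's errors.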
Personalization • Ink shape personalization • Simple concept: just do same training on this customer’s ink • Start with components already trained on massive database of ink samples • Train further on specific user’s ink samples • Explicit training • User must go to a wizard and copy a short script • Do have labels from customer • Limited in quantity, because of tediousness • Implicit training • Data is collected in the background during normal use • Doesn’t have labels from customer • We must assume correctness of our recognition result using our confidence measure • We get more data • Much of the work is in the infrastructure: • GUI, database, management of different users’ trained networks, etc. • Lexicon personalization: Harvesting • Simple concept: just add the user’s new words to the lexicon • Examples (at Microsoft): RTM, dev, SDET, dogfooding, KKOMO, featurization • Happens when correcting words in the TIP • Also scan Word docs and outgoing email (avoid spam)
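Two of the ideas on this slide are easy to sketch: keeping only high-confidence results as implicit training labels, and harvesting words that are missing from the lexicon. Both functions below are illustrations with assumed details: the 0.9 threshold, the tuple layout, the letters-only tokenizer, and the function names are hypothetical.

```python
import re


def select_implicit_samples(recognized, threshold=0.9):
    """Keep (ink_id, text) pairs whose recognition confidence is high enough.

    recognized: iterable of (ink_id, text, confidence) with confidence in 0-1.
    The recognized text is then trusted as the label for that ink.
    """
    return [(ink_id, text)
            for ink_id, text, confidence in recognized
            if confidence >= threshold]


def harvest_new_words(text, lexicon):
    """Return words in text that are absent from the lexicon, in order.

    Comparison is case-insensitive; each new word is reported once.
    """
    known = {word.lower() for word in lexicon}
    new_words = []
    for word in re.findall(r"[A-Za-z]+", text):
        if word.lower() not in known:
            known.add(word.lower())
            new_words.append(word)
    return new_words


samples = select_implicit_samples([(1, "dog", 0.95), (2, "clog", 0.40)])
harvested = harvest_new_words(
    "We RTM after dogfooding the dev build",
    {"we", "after", "the", "build"},
)
print(samples, harvested)
```

The threshold is the trade-off the slide hints at: a higher value yields cleaner labels but less data, a lower one the reverse.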
Best Job at Microsoft • Bill Gates makes more money, but I have more fun • No one hassles me for money or slots • I remember senior people at several research institutions saying “waste of time and money” • Insert here • I still have a sense of wonder that it works at all • It’s as if your dog started talking to you • People tell me it recognizes their writing when no one else can • But I also know there are others who get poor recognition • I wonder if Garry Trudeau has tried it • People will adapt to a recognizer, if they use it enough • Just as they adapt to the people they live with and work with • My physician in Issaquah gets perfect recognition on a Newton • Biggest complaint: we don’t yet ship their language • Other complaints: • Weak on URLs, email addresses, slashes • Some handprint gets poor recognition • Adaptation to my handwriting style (coming) Raspberry