Handwriting Synthesis for Human Interactive Proof in Web Services

CSE 717 Project – Progress Report, Gabriel Terejanu



  1. Handwriting Synthesis for Human Interactive Proof in Web Services • CSE 717 Project – Progress Report • Gabriel Terejanu

  2. Outline • HIP Introduction • CAPTCHA Challenges • Guidelines • Previous work in Handwritten CAPTCHA • Proposal • Problem Approach • Conclusion

  3. Human Interactive Proof (HIP) • “Defend services from malicious attacks differentiating bots from human users”. [Zhang04] • Turing Test: 1950, Alan M. Turing • First idea for web: 1996, Moni Naor • First web implementation: 1997, AltaVista

  4. Human Interactive Proof (HIP) • Text-based: • Printed text: Gimpy (captcha.net) • Handwritten text: Handwritten CAPTCHA [Rusu04] • Non-text-based: • Clock face (exploits common knowledge) - broken • Picture-based (exploits understanding): spot the person, count cars, tell the weather • http://hotcaptcha.com/captcha :)

  5. CAPTCHA • Completely Automated Public Turing test to Tell Computers and Humans Apart • Manuel Blum - Carnegie Mellon University • Exploits human ability to read corrupted text

  6. CAPTCHA : Applications • Portals • Weblogs • IM, message boards • Social networking • Spam-filtering • Banking • Web-Ticket • Web-vote • Very business-oriented field

  7. CAPTCHA : Challenges • W3C : Inaccessibility of CAPTCHA (2005) • Sometimes very frustrating for “normal” people • Accessibility: blind, low vision, dyslexia, color blindness (lost business) • Increases the cost of business to compensate for accessibility (hot lines) • False security: • Paid human operator – “Borrow (or rent) someone’s eyes” • Breaking a Visual CAPTCHA (EZ-Gimpy 92%, Gimpy 33%) [Greg Mori - UC Berkeley]

  8. CAPTCHA : Challenges (2) • Cognitive disabilities • Foreign languages • Static CAPTCHA dangerous • Variable font (http://sam.zoy.org/pwntcha)

  9. Dyslexia (www.dyslexia-teacher.com) • “dyslexia” comes from the Greek meaning “difficulty with words” • difficulties with spelling • confusion over left and right (b <-> d) • confusion over up and down ( p <-> 9) • writing letters or numbers backwards • difficulties with math/s • difficulty following 2- or 3-step instructions • 10-15% of the US population has dyslexia • (http://www.dyslexia-add.org) • 8-12% of males of European origin have color blindness

  10. “An explicitly inaccessible access control mechanism should not be promoted as a solution, especially when other systems exist that are not only more accessible, but may be more effective, as well.” [W3C05]

  11. Guidelines • Redefine the ability to read corrupted text • Easy to use • Low cost (mass usage by small sites) • Hard to solve (out of context) by a third person • Rely on first-grade-level understanding • Very clean and correctly spelled text • Only very light deformations allowed

  12. Importance of Handwriting Generation • CAPTCHA project • After writer identification, makes handwriting recognition easier • Error correction for handwritten text • Adds a personal touch to communication • Create customized fonts (My Font Tool for Tablet PC, fontifier.com)

  13. Handwriting Synthesis • Movement simulation technique • Based on motor models • Usually accompanied by on-line acquisition • Shape simulation techniques • More practical when the dynamics of handwriting are not available • Off-line acquisition (easy to collect samples)

  14. Why Handwritten CAPTCHA ? • “As agreed by most researchers, it is impossible to achieve a correct ratio of 100% for handwriting recognition and segmentation.” [J.Wang2004]

  15. Handwritten CAPTCHA • Rusu and Govindaraju - HIP 2005 • Collect handwritten words and glue them together • Original images are public knowledge (city names from postal applications) • Gestalt laws of perception • Closure, similarity, proximity, symmetry, continuity, familiarity, figure and ground, memory • Randomly selected transformations • Overlaps, occlusion, extra strokes, orientation

  16. H-CAPTCHA (Room for improvement) • Accessibility • Some instances hard to read for “normal” people • Paid eye problem, easy to solve out of context

  17. Sequences of CAPTCHA’s • Leveraging the CAPTCHA Problem by Daniel Lopresti • Relies on digital libraries and transcripts • n CAPTCHA challenges • One decision CAPTCHA permits/denies access • The rest are used to label the challenges (assuming we have a human) -> promoted to a decision CAPTCHA • No intermediate results are provided

  18. Sequences of CAPTCHA’s (2)

  19. Sequences of CAPTCHA’s (Room for improvement) • Time consuming • Might be complicated for people with dyslexia • Paid eye problem, easy to solve out of context • May be too static / predictable

  20. CAPTCHA Proposal • Combine the ability to read handwritten text with 1st grade understanding of the text • Moderate complexity for each task (human) => very difficult for machines to solve • Single CAPTCHA image with at most 2 lines of handwritten text • Synthetic handwriting • Vast number of handwriting styles • Ligature generator • Controlled randomness • Sentence / QA generator • Answer not necessarily in the CAPTCHA image • Include prior information • Available in the web form fields • “Light” deformations – easily tolerated by humans

  21. Example Proposal • English • terejanu@buffalo.edu

  22. Sample Collection • Depends on the method • I.Guyon – glyphs, e.g.: port, sid, wil – 1 hr • J.Wang – words – relies on segmentation • Collect a series of samples for each character – no segmentation, easily scannable • A writer may have a few different styles for the same character

  23. Landmark extraction • Good choice: consistent from one image to another (high curvature, junctions) – intermediate points added for precision • Mark by hand • J.Wang – series of 1-D Gabor filters • C.H.Teh – On the Detection of Dominant Points On Digital Curves • No parameters http://cg.scs.carleton.ca/~luc
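
To make the "high curvature" criterion concrete, here is a minimal sketch that marks the points of a traced stroke where the direction turns sharply. It is neither the Gabor-filter method nor the Teh dominant-point algorithm cited on the slide; it is a plain turning-angle threshold, and the function names and the 45-degree default are assumptions made for illustration.

```python
import math

def turning_angle(p, q, r):
    """Absolute change of direction (radians) when walking p -> q -> r."""
    a1 = math.atan2(q[1] - p[1], q[0] - p[0])
    a2 = math.atan2(r[1] - q[1], r[0] - q[0])
    # Wrap the difference into [-pi, pi) before taking the magnitude.
    d = (a2 - a1 + math.pi) % (2 * math.pi) - math.pi
    return abs(d)

def landmarks(points, min_angle=math.radians(45)):
    """Indices of the endpoints plus every interior point with a sharp turn.

    min_angle is a hypothetical threshold, not a value from the slides.
    """
    idx = [0]
    for i in range(1, len(points) - 1):
        if turning_angle(points[i - 1], points[i], points[i + 1]) >= min_angle:
            idx.append(i)
    idx.append(len(points) - 1)
    return idx

# An L-shaped stroke: the corner at index 2 is the only interior landmark.
stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
corner_indices = landmarks(stroke)
```

Unlike the parameter-free method the slide mentions, this sketch needs a threshold, which is the price of its simplicity.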

  24. Character prototype creation • Point correspondence • Shape pose/scale differences • Create prototype • Extract variations from shape clustering • N.Duta – Automatic Construction of 2D Shape Models
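
The pose/scale step can be sketched as a least-squares similarity alignment followed by averaging. This is a one-pass simplification that assumes point correspondence is already known; it is not the full procedure from the Duta paper cited above, and the function names are hypothetical. Points are stored as complex numbers so that rotation and scaling become a single complex multiplication.

```python
def center(shape):
    """Translate a shape (list of complex points) so its centroid is at 0."""
    c = sum(shape) / len(shape)
    return [z - c for z in shape]

def align(shape, reference):
    """Rotate and scale `shape` onto `reference` in the least-squares sense.

    Both shapes must list corresponding points in the same order.
    """
    z = center(shape)
    w = center(reference)
    # The optimal complex factor a minimises sum |a*z_i - w_i|^2.
    a = sum(zi.conjugate() * wi for zi, wi in zip(z, w)) / sum(abs(zi) ** 2 for zi in z)
    return [a * zi for zi in z]

def prototype(shapes):
    """Mean shape after aligning every sample to the first one."""
    ref = center(shapes[0])
    aligned = [align(s, ref) for s in shapes]
    return [sum(pts) / len(shapes) for pts in zip(*aligned)]

# A triangle and a rotated, scaled, shifted copy of it.
ref = [0j, 2 + 0j, 0 + 1j]
moved = [2j * z + (3 + 4j) for z in ref]
proto = prototype([ref, moved])
```

Because the second sample is an exact similarity transform of the first, the prototype here coincides with the centred reference; with real samples the residual differences after alignment are the variations that shape clustering would work on.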

  25. Random Character Generator • 1 character – 30 control points • Small perturbations not allowed • Preserve readability • Random generation in a simplex (uniformly distributed)
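
One way to read "random generation in a simplex" is as a random convex combination of several prototypes of the same character. The sketch below makes that assumption explicit: the vertices of the simplex are taken to be a writer's variants of one character, each with the same number of control points, and the function names are hypothetical. Normalised exponential draws give weights that are uniformly distributed over the simplex.

```python
import random

def sample_simplex(k, rng):
    """Return k non-negative weights summing to 1, uniform on the simplex."""
    # Normalised i.i.d. exponential draws follow Dirichlet(1, ..., 1).
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def random_character(prototypes, rng):
    """Blend prototype shapes (lists of (x, y) control points) with random weights."""
    weights = sample_simplex(len(prototypes), rng)
    shape = []
    for i in range(len(prototypes[0])):
        x = sum(w * p[i][0] for w, p in zip(weights, prototypes))
        y = sum(w * p[i][1] for w, p in zip(weights, prototypes))
        shape.append((x, y))
    return shape, weights

# Three toy "variants" with 30 control points each (placeholder coordinates).
rng = random.Random(0)
variants = [[(float(k), 2.0 * k)] * 30 for k in (0, 1, 2)]
shape, weights = random_character(variants, rng)
```

Every generated point stays inside the convex hull of the corresponding prototype points, which is one plausible way to keep the result readable.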

  26. Curve Generation • Essential for high-resolution graphics • Lagrange interpolation • Smooth curve passes through a group of ordered control points • Blending functions - thought of as functions specifying how much the ith control point draws the curve towards it • Curve wiggles between the control points • Corners at control points when connecting curves • B-Spline Curves • The curve does not pass through each control point, but instead just passes near them • Slopes between curve segments are continuous • Usually cubic B-splines are used • Bezier Curves • Pass through the first and last control points • http://web.cs.wpi.edu/~matt/courses/cs563/talks/curves.html
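
The difference between the last two curve families can be checked numerically. The sketch below evaluates a Bezier curve with de Casteljau's algorithm and one segment of a uniform cubic B-spline with the standard blending functions; the control points are arbitrary illustration values and the function names are hypothetical.

```python
def lerp(p, q, t):
    """Linear interpolation between 2-D points p and q."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)

def bezier_point(ctrl, t):
    """De Casteljau evaluation of a Bezier curve at parameter t in [0, 1]."""
    pts = list(ctrl)
    while len(pts) > 1:
        pts = [lerp(pts[i], pts[i + 1], t) for i in range(len(pts) - 1)]
    return pts[0]

def bspline_point(p0, p1, p2, p3, t):
    """One segment of a uniform cubic B-spline at parameter t in [0, 1]."""
    # Blending functions: non-negative on [0, 1] and summing to 1.
    b0 = (1 - t) ** 3 / 6
    b1 = (3 * t ** 3 - 6 * t ** 2 + 4) / 6
    b2 = (-3 * t ** 3 + 3 * t ** 2 + 3 * t + 1) / 6
    b3 = t ** 3 / 6
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)

ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 0.0)]
bezier_start = bezier_point(ctrl, 0.0)   # coincides with ctrl[0]
bezier_end = bezier_point(ctrl, 1.0)     # coincides with ctrl[-1]
bspline_start = bspline_point(*ctrl, 0.0)  # (p0 + 4*p1 + p2) / 6, near p1 but not on it
```

The B-spline segment starts at a weighted average of three neighbouring control points, which is why it only passes near them.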

  27. Ligature Generator • Aesthetic quality • Optimization process • Limited centrifugal acceleration • Limited acceleration / retardation in the direction of the velocity vector • Works for variable spacing between characters • M.Kokula – Automatic generation of script font ligature based on curve smoothness optimization

  28. Variable width splines ? • R.V.Klassen - Variable width splines: a possible font representation? • Centerline curve (spline) + width function (spline) • Control points + w (control width / scale factor) • Little experience designing characters with variable width splines • Storage burden & creation complexity
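
The "centerline + width function" idea can be illustrated on sampled data. In this sketch both the centerline and the width are given as already-sampled lists rather than splines, which is a simplification of the representation in the Klassen paper; the function name is hypothetical. Each sample is pushed half the local width along the stroke normal to obtain the two sides of the outline.

```python
import math

def stroke_outline(centerline, widths):
    """Left and right outline points of a stroke with per-sample widths."""
    left, right = [], []
    n = len(centerline)
    for i, (x, y) in enumerate(centerline):
        # Estimate the tangent from the neighbouring samples.
        x0, y0 = centerline[max(i - 1, 0)]
        x1, y1 = centerline[min(i + 1, n - 1)]
        tx, ty = x1 - x0, y1 - y0
        norm = math.hypot(tx, ty) or 1.0  # guard against repeated points
        nx, ny = -ty / norm, tx / norm    # unit normal, 90 degrees left of the tangent
        half = widths[i] / 2.0
        left.append((x + nx * half, y + ny * half))
        right.append((x - nx * half, y - ny * half))
    return left, right

# A horizontal stroke that is thicker in the middle.
left, right = stroke_outline([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], [2.0, 4.0, 2.0])
```

Storing one extra width value per control point is the "storage burden" the slide refers to.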

  29. Randomly Generated Sentences • http://www.manythings.org/rs/ • Subject + Verb • I swim. Joe swims. They swam. • Subject + Verb + Object • I drive a car. Joe plays the guitar. They ate dinner. • Subject + Verb + Complement • I am busy. Joe became a doctor. They look sick. • Subject + Verb + Indirect Object + Direct Object • I gave her a gift. She teaches us English. • Subject + Verb + Object + Complement • I left the door open. • We elected him president. • They named her Jane.

  30. Possible Generated Text • Sentence + Question => one-word answer • I drove a car. • What did I drive? => car • Sentence + Question (prior) • Your name is Gabriel Terejanu. • What is your zip code? => 14217 • Instructions (use with prior information) • Write your email address again. => terejanu@buffalo.edu • Etc… (help)
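
The "Sentence + Question => one-word answer" pattern can be sketched with a tiny Subject + Verb + Object template. The lexicon, the function names, and the case-insensitive answer check below are all assumptions made for illustration; the slides do not specify a generator.

```python
import random

# Hypothetical mini-lexicon: past-tense verb -> (base form, possible objects).
LEXICON = {
    "drove": ("drive", ["car", "bus", "truck"]),
    "ate": ("eat", ["sandwich", "pizza", "banana"]),
    "painted": ("paint", ["fence", "door", "wall"]),
}
SUBJECTS = ["I", "We", "They", "Joe"]

def make_challenge(rng):
    """Return (sentence, question, answer) for a Subject + Verb + Object sentence."""
    subject = rng.choice(SUBJECTS)
    past = rng.choice(sorted(LEXICON))
    base, objects = LEXICON[past]
    obj = rng.choice(objects)
    sentence = f"{subject} {past} a {obj}."
    # Keep "I" and proper names capitalised inside the question.
    asked = subject if subject in ("I", "Joe") else subject.lower()
    question = f"What did {asked} {base}?"
    return sentence, question, obj

def check_answer(expected, typed):
    """Accept the answer regardless of case and surrounding whitespace."""
    return typed.strip().lower() == expected

rng = random.Random(1)
sentence, question, answer = make_challenge(rng)
```

Both the sentence and the question would then be rendered as synthetic handwriting; variants that use prior information would take the answer from a web form field instead of from the sentence.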

  31. Difficulties for Recognition Algorithms • Variety of handwritten styles • Random characters • Random spacing between characters • Ligatures • Variable width strokes ? • Huge lexicon • Prior information from the web form • Random spacing between words • Random sentence generator • Variable baselines for the words in a sentence • Maybe write on a curve / wave • Different handwritten style for the two sentences • Understanding engine

  32. Pros / Cons • Seems difficult for machines • Accessible • Easy to automate the process (collection, modulation …) • Prior information (against paid eye) • Foreign language • Breaking possibility – need variety in sentence formulation

  33. Conclusion • Integrate the CAPTCHA generation process into a script font (Postscript Type3): random character, ligature paper • Handwriting is a characteristic task to humans that is difficult to reproduce using algorithms • Need first results • Test procedure ?

  34. A Sense of Success : Get the first Handwritten CAPTCHA into the W3C Reports as a better alternative for CAPTCHA
