Visual CAPTCHA with Handwritten Image Analysis Amalia Rusu and Venu Govindaraju CEDAR University at Buffalo
Background on CAPTCHA
• Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA)
• A CAPTCHA should be automatically generated and graded
• Tests should be taken quickly and easily by human users
• Tests should accept virtually all human users and reject software agents
• Tests should resist automated attack for many years, despite advances in technology and prior knowledge of the algorithms
• Exploits the difference in abilities between humans and machines (e.g., recognition of text, speech, or facial features)
• A new formulation of Alan Turing's question: "Can machines think?"
Securing Cyberspace Using CAPTCHA
Automatic authentication session for Web services: the user initiates the dialog and has to be authenticated by the server.
[Diagram: user ↔ authentication server over the Internet; the server sends a challenge, the user returns a response for authentication]
• Initialization
• Handwritten CAPTCHA challenge
• User response
• Verification
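As a rough sketch of this challenge-response session, assuming a server-side session store and a one-shot token scheme (both are illustrative assumptions, not the system's actual protocol details):

```python
import secrets

# In-memory session store: token -> truth word of the issued CAPTCHA.
# An illustrative assumption; a real deployment would use an expiring store.
SESSIONS = {}

def issue_challenge(captcha_image, truth_word):
    """Initialization: the server binds a fresh token to the truth word
    of a handwritten CAPTCHA image and sends both token and image."""
    token = secrets.token_hex(8)
    SESSIONS[token] = truth_word.lower()
    return token, captcha_image

def verify(token, response):
    """Verification: one-shot check of the user's transcription;
    the session is consumed whether or not the answer is correct."""
    truth = SESSIONS.pop(token, None)
    return truth is not None and response.strip().lower() == truth
```

Consuming the session on the first attempt prevents an attacker from replaying the same challenge against a recognizer offline.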
Objective
Develop CAPTCHAs based on the ability gap between humans and machines in handwriting recognition, using Gestalt laws of perception.
[Table: state of the art in handwriting recognition; speed and accuracy of a handwriting recognizer, feature-extraction time excluded; testing platform is an Ultra-SPARC] [Xue, Govindaraju 2002]
H-CAPTCHA Motivation
• Machine recognition of handwriting is more difficult than machine recognition of printed text
• Handwriting recognition is a task that humans perform easily and reliably
• Several CAPTCHAs based on machine-printed text have already been broken:
• Greg Mori and Jitendra Malik of UC Berkeley have written a program that solves EZ-Gimpy with 83% accuracy
• Thayananthan, Stenger, Torr, and Cipolla of the Cambridge vision group have written a program that achieves a 93% correct recognition rate against EZ-Gimpy
• Gabriel Moy, Nathan Jones, Curt Harkless, and Randy Potter of Areté Associates have written a program that achieves 78% accuracy against Gimpy-r
• CAPTCHAs based on speech or visual features are impractical
• H-CAPTCHAs are thus far unexplored by the research community
H-CAPTCHA Challenges
• Generating random and 'infinitely many' distinct handwritten CAPTCHAs
• Quantifying and exploiting the weaknesses of state-of-the-art handwriting recognizers and OCR systems
• Controlling distortion so that images remain human readable (conforming to Gestalt laws) but not machine readable
Generation of Random and Infinitely Many Distinct Handwritten Text Images
• Use handwritten word images that current recognizers cannot read
• Handwritten US city-name images are available from postal applications
• Collect new handwritten word samples
• Create real (or nonsense) handwritten words and sentences by gluing together isolated upper- and lower-case handwritten characters or word images
Generation of Random and Infinitely Many Distinct Handwritten Text Images
• Use a handwriting distorter to generate "human-like" samples
• Models that change the trajectory/shape of a letter in a controlled fashion (e.g., Hollerbach's oscillation model)
Original handwritten image (a); synthetic images (b–f).
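The distorter is described above only at the level of Hollerbach's oscillation model, which treats handwriting as coupled horizontal and vertical oscillations plus a horizontal drift. As an illustrative sketch (all parameter names and values here are assumptions, not the paper's settings):

```python
import math

def oscillation_trajectory(n_points=200, ax=1.0, ay=0.6,
                           fx=2.0, fy=2.0, phase=math.pi / 2, vx=0.8):
    """Generate a cursive-like stroke from two coupled oscillators,
    in the spirit of Hollerbach's oscillation model: a horizontal
    oscillation riding on a constant drift, and a vertical oscillation
    with a phase offset. Perturbing ax/ay/fx/fy/phase in a controlled
    way yields distinct 'human-like' synthetic variants."""
    pts = []
    for i in range(n_points):
        t = i / n_points * 2 * math.pi
        x = vx * t + ax * math.sin(fx * t)   # drift + horizontal oscillation
        y = ay * math.sin(fy * t + phase)    # vertical oscillation
        pts.append((x, y))
    return pts

def distort(points, scale_y=1.1, shear=0.15):
    """Apply a controlled affine distortion (shear + vertical scaling)
    to a trajectory, as one simple shape-changing transformation."""
    return [(x + shear * y, scale_y * y) for x, y in points]
```

Rendering such trajectories onto a raster, with randomized parameters per sample, is one way to obtain the synthetic images (b–f) from an original (a).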
Exploit the Sources of Error in State-of-the-Art Handwriting Recognizers
Word Model Recognizer (WMR) [Kim, Govindaraju 1997]
• Lexicon-driven approach
• Chain-code-based image processing: pre-processing, segmentation, feature extraction, dynamic matching
• Example: the distance between 'w' (the first character of lexicon entry 'word') and the image between segments 1 and 4 is 5.0; between segments 1 and 3 it is 7.2; between segments 1 and 2 it is 7.6
• Finds the best way of accounting for the characters 'w', 'o', 'r', 'd' by consuming all segments 1 to 8 in the process
Accuscript [Xue, Govindaraju 2002]
• Grapheme-based recognizer
• Extracts high-level structural features from characters (loops, turns, junctions, arcs, ends) without prior segmentation
• Uses a stochastic finite-state automaton model based on the extracted features
• Uses static lexicons in the recognition process
[Figure: WMR dynamic-matching table of character/segment distances, e.g. w[5.0] o[7.7] r[5.8] d[4.9]; grapheme-based model illustration showing loops, turns, junctions, and ends]
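WMR's lexicon-driven dynamic matching can be sketched as a dynamic program that assigns each character of a lexicon word a contiguous run of image segments so as to minimize total distance. This is an illustrative reconstruction, not WMR's actual implementation; `seg_cost` is a hypothetical stand-in for the real character-vs-segment distance function:

```python
from functools import lru_cache

def match_word(word, n_segments, seg_cost, max_span=4):
    """Find the cheapest way of accounting for every character of
    `word` by consuming all `n_segments` segments in order.
    seg_cost(ch, i, j) is the assumed distance between character `ch`
    and the image between segments i and j (j exclusive)."""
    INF = float("inf")

    @lru_cache(maxsize=None)
    def best(ci, si):
        # best(ci, si): cheapest match of word[ci:] to segments si..end
        if ci == len(word):
            return 0.0 if si == n_segments else INF
        cost = INF
        for sj in range(si + 1, min(si + max_span, n_segments) + 1):
            cost = min(cost, seg_cost(word[ci], si, sj) + best(ci + 1, sj))
        return cost

    return best(0, 0)
```

The deck's point is that the transformations below attack exactly this machinery: occlusions and overlaps corrupt the segment boundaries and the distances that the dynamic program relies on.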
Sources of Error in State-of-the-Art Handwriting Recognizers
• Image quality: background noise, printing surface, writing styles
• Image features: variable stroke width, slope, rotations, stretching, compressing
• Segmentation errors: over-segmentation, merging, fragmentation, ligatures, scrawls
• Recognition errors: confusion with similar lexicon entries, large lexicons
Gestalt Laws
• Gestalt psychology is based on the observation that we often experience things that are not part of our simple sensations
• What we see is an effect of the whole event, not contained in the sum of the parts (a holistic approach)
• Organizing principles: the Gestalt laws
• By no means restricted to perception (they apply, e.g., to memory)
Gestalt Laws
1. Law of closure
2. Law of similarity
3. Law of proximity
4. Law of symmetry
[Figure: illustrations of each law, e.g. a diagonal of O's standing out from a field of X's (similarity), grouped bracket pairs (proximity), and aligned rows of asterisks (symmetry)]
Gestalt Laws
5. Law of continuity
• Ambiguous segmentation
• Segmentation based on good continuity follows the path of minimal curvature change
• Perceptually implausible segmentation
6. Law of familiarity
• Ambiguous segmentation
• Perceptual segmentation
• Segmentation based on good continuity proves to be erroneous
Gestalt Laws
7. Figure and ground
8. Memory
Control Overlaps
Gestalt laws: proximity, symmetry, familiarity, continuity, figure and ground
• Create horizontal or vertical overlaps
• For the same word, use smaller-distance overlaps
• For different words, use larger-distance overlaps
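A minimal sketch of the horizontal-overlap operation, assuming word images are represented as equal-height lists of 0/1 rows (the representation is an assumption; the paper does not specify one):

```python
def overlap_horizontal(img_a, img_b, overlap):
    """Paste two binary word images side by side with their ends
    overlapping by `overlap` columns, OR-ing the ink in the shared
    region. Larger `overlap` means a smaller distance between words."""
    h = len(img_a)
    wa, wb = len(img_a[0]), len(img_b[0])
    out = [[0] * (wa + wb - overlap) for _ in range(h)]
    for y in range(h):
        for x in range(wa):
            out[y][x] |= img_a[y][x]
        for x in range(wb):
            out[y][wa - overlap + x] |= img_b[y][x]
    return out
```

With `img_a` and `img_b` cut from the same word, a larger `overlap` keeps strokes perceptually grouped (proximity); for different words, a smaller `overlap` keeps them separable.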
Control Occlusions
Gestalt laws: closure, proximity, familiarity
• Add occlusions using circles, rectangles, or lines at random angles
• Keep occlusions small enough that they do not hide letters completely
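The line-occlusion variant might look like the following sketch; the 0/1 row-list image representation, the slope scheme, and the length cap are all assumptions standing in for the paper's unstated parameters:

```python
import random

def add_line_occlusion(img, rng=None, max_len_frac=0.5):
    """Draw one short line at a random shallow angle over a binary
    image (list of 0/1 rows, 1 = ink). Capping the length at a
    fraction of the image width keeps the occlusion small enough
    that no letter is hidden completely."""
    rng = rng or random.Random(0)
    h, w = len(img), len(img[0])
    length = max(1, int(max_len_frac * w))
    x0 = rng.randrange(w - length + 1)   # random horizontal start
    y0 = rng.randrange(h)                # random vertical start
    dy = rng.choice([-1, 0, 1])          # random shallow slope
    for step in range(length):
        y = min(max(y0 + (dy * step) // 4, 0), h - 1)
        img[y][x0 + step] = 1
    return img
```

Because the occluding stroke uses the same pixel value as the ink, a recognizer cannot trivially filter it out by color or intensity.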
Control Occlusions
Gestalt laws: closure, proximity, familiarity
• Add occlusions as waves running left to right across the entire image, with various amplitudes and wavelengths, or rotate them by an angle
• Choose areas with more foreground pixels, on the bottom part of the text image (not too low, not too high)
Control Extra Strokes
Gestalt laws: continuity, figure and ground, familiarity
• Add occluding arcs or lines of various thicknesses, drawn in the same pixel value as the foreground (black) pixels
• Curved strokes can be confused with part of a character
• Use asymmetric strokes so that the pattern cannot be learned
Control Letter/Word Orientation
Gestalt laws: memory, internal metrics, familiarity of letters
• Transformations: vertical mirror, horizontal mirror, flip-flop
• Change the orientation of the whole word, or of a few letters only
• Use variable rotation, stretching, and compressing
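The three orientation operations are straightforward on a binary image; a sketch, again assuming the 0/1 row-list representation:

```python
def vertical_mirror(img):
    """Mirror left-right (reflect across the vertical axis)."""
    return [list(reversed(row)) for row in img]

def horizontal_mirror(img):
    """Mirror top-bottom (reflect across the horizontal axis)."""
    return list(reversed([list(row) for row in img]))

def flip_flop(img):
    """Rotate 180 degrees: the two mirrors composed."""
    return vertical_mirror(horizontal_mirror(img))
```

Applying these to only a few letters of a word, rather than the whole image, is what forces the reader to rely on memory and familiarity rather than raw shape matching.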
General H-CAPTCHA Generation Algorithm
Input:
• An original (randomly selected) handwritten image: an existing US city-name image, a synthetic word image 5 to 8 characters long, or a meaningful sentence
• A lexicon containing the image's truth word
Output:
• An H-CAPTCHA image
Method:
• Randomly choose a number of transformations
• Randomly select that many transformations
• If more than one transformation is chosen:
• An a priori order is assigned to each transformation based on experimental results
• Sort the chosen transformations by their a priori order and apply them in sequence, so that the effect is cumulative
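Assuming each transformation is a callable on an image, the method above can be sketched as follows; the transformation names and the priority values in `PRIORITY` are placeholders, not the experimentally derived order from the paper:

```python
import random

# Hypothetical transformation names standing in for the operations on the
# preceding slides; the values model the a priori application order that
# the algorithm assigns from experimental results.
PRIORITY = {"overlap": 0, "occlusion": 1, "stroke": 2, "orientation": 3}

def generate_captcha(image, transforms, rng=None):
    """Pick a random number of transformations, pick which ones at
    random, sort them by a priori order, and apply them in sequence
    so the effect is cumulative. `transforms` maps a name in PRIORITY
    to a callable(image) -> image."""
    rng = rng or random.Random()
    k = rng.randint(1, len(transforms))            # how many to apply
    chosen = rng.sample(sorted(transforms), k)     # which ones
    for name in sorted(chosen, key=PRIORITY.__getitem__):
        image = transforms[name](image)            # cumulative effect
    return image
```

Sorting by a fixed a priori order matters because the transformations do not commute: occluding before overlapping, for instance, distorts different pixels than the reverse.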
Testing Results on Machines
The accuracy of handwriting recognizers on images deformed using the Gestalt-laws approach. The number of tested images is 4,127 for each type of transformation. Recognizer running time increases from a few seconds per image with a 4,000-entry lexicon to several minutes per image with a 40,000-entry lexicon.
Testing Results on Humans
The accuracy of human readers on images deformed using the Gestalt-laws approach. A word image counts as correctly recognized only when all of its characters are recognized.
H-CAPTCHA Evaluation
• No risk of image repetition
• Image generation is completely automated: words, images, and distortions are chosen at random
• The transformed images cannot easily be normalized or rendered noise-free by current computer programs, even though the original images must be public knowledge
• The deformed images do not pose problems for humans: human subjects succeeded on our test images
• Tested against state-of-the-art recognizers: Word Model Recognizer and Accuscript
• The CAPTCHAs remain unbroken by state-of-the-art recognizers
Future Work
• Develop general methods to attack H-CAPTCHAs (e.g., pre- and post-processing techniques)
• Research lexicon-free approaches to handwriting recognition
• Quantify the gap between humans and machines in reading handwriting, by category of distortion and Gestalt law
• Parameterize the difficulty levels of Gestalt-based H-CAPTCHAs
Thank You Questions?