“A Low-cost Attack on a Microsoft CAPTCHA” The annual ACM Computer and Communications Security Conference (2008) By Jeff Yan and Ahmad Salah El Ahmad Presentation by Kathleen Stoeckle
Outline • Overview on CAPTCHA • Related Work • The MSN CAPTCHA • Microsoft CAPTCHA Segmentation Attack • Results & Analysis • Strengths and Weaknesses
CAPTCHA • Completely Automated Public Turing Test to tell Computers and Humans Apart • Primitive CAPTCHAs were developed in 1997 by Andrei Broder, Martin Abadi, Krishna Bharat, and Mark Lillibridge. • Luis von Ahn and Manuel Blum coined the term “CAPTCHA” and improved the method in 2000. http://www.searchenginepeople.com/blog/page/16
Google CAPTCHA http://www.codinghorror.com/blog/images/google-error-were-sorry-rate-limiter-captcha.png
Yahoo CAPTCHA http://www.carolisayazakuser.com/helppage/1aghostloginauthbox1a.gif
MSN CAPTCHA http://www.msn.com
Text-Based CAPTCHAs • The most widely used type of CAPTCHA • Distort text images so that pattern-recognition programs cannot recognize them. • Popular because: • The task is intuitive (character recognition) • Few localization issues (Roman characters are widely recognized) • Strong security potential
Good CAPTCHAs • Robust • Human-friendly
Character Recognition • Computers excel at recognizing characters, even when the characters are distorted. • If the positions of the characters in a CAPTCHA are known, breaking the scheme reduces to a pattern-recognition problem. • When the positions are not known, computer programs have difficulty solving the CAPTCHA.
Recognition rate for individual characters under different distortions
Segmentation • Segmentation – Identifying the location of each character, in the right order. • Challenging for both handwriting recognition and computer vision. • Traditionally both computationally expensive and difficult, especially when every character in the challenge must be located.
“State of the Art” CAPTCHAs • The robustness of a CAPTCHA should rely on the difficulty of segmentation rather than recognition. • If a text-based CAPTCHA can be reduced to the challenge of recognizing individual characters, the scheme is effectively broken.
Purpose of Paper Yan and El Ahmad’s paper examines the security of the Microsoft CAPTCHA. • This scheme was designed by an interdisciplinary Microsoft team, which established the principle of “segmentation resistance.” • By attacking this CAPTCHA, the authors aim to determine how the MSN scheme and similar CAPTCHAs can be improved. • Using their algorithms, the authors broke the Microsoft CAPTCHA on a desktop computer with a 1.86 GHz Intel Core 2 CPU and 2 GB of RAM.
E-Z Gimpy and Gimpy Broken by Mori and Malik: • E-Z Gimpy (92% success) • Gimpy (33% success) • Object-recognition algorithms http://www.cs.sfu.ca/~mori/research/gimpy
E-Z Gimpy and Gimpy Broken by Moy et al.: • E-Z Gimpy (99% success) • 4-letter Gimpy-r (78% success) • Used distortion estimation techniques http://www.cs.sfu.ca/~mori/research/gimpy
Other Work… • Chellapilla and Simard attacked visual CAPTCHAs (4.89% to 66.2% success) • Yan and El Ahmad defeated CAPTCHAs generated on Captchaservice.org (almost 100% success). • Accomplished by counting the pixels of segmented characters. • Examined robustness from a security angle. • Simple pattern-recognition analysis. • PWNtcha – a website that demonstrates the weaknesses and inefficiencies of CAPTCHAs. Broke visual CAPTCHAs (49% to 100% success)
The MSN CAPTCHA Challenge • Each challenge consists of 8 characters. • Only upper-case letters and digits are used. • Text is dark blue and the background is light gray. • Warping is used to distort characters. • Random arcs of different thicknesses are used as an anti-segmentation measure.
Warping • Local • Small ripples, waves and elastic deformations along the pixels of the character. • Global • Character-level elastic deformations that foil template-matching algorithms.
Warping, cont’d Local Global
Random Arcs • Thick Foreground Arcs • Same color as characters • As thick as the characters • Non-intersecting • Thin Foreground Arcs • Same color as characters • As thick as the thinnest parts of characters • Intersecting • Thin Background Arcs • Thin • Same color as background • Cut through characters
Low-Cost Segmentation Attack On Microsoft CAPTCHA
Low-Cost Segmentation Attack • Goal: Segment Microsoft CAPTCHA challenges. • Identify and remove random arcs • Identify all character locations in the right order. • Accomplishes this by: • Dividing each challenge into 8 ordered segments.
Attack in 7 Steps • Binarization • Fixing Broken Characters • Vertical Segmentation • Color Filling Segmentation • Thick Arc Removal • Locating Connected Characters • Segment Connected Characters
Step 1: Binarization • Convert a color challenge to a two-color image using a threshold method. • High-intensity pixels become white (background) • Low-intensity pixels become black (foreground)
Step 2: Fixing Broken Characters • Keep each character as a single entity. • Prevent small portions of characters from being removed when an arc cuts through them.
Step 2 • Find background-color pixels whose left and right neighbors are both foreground. • Find background-color pixels whose top and bottom neighbors are both foreground. • Convert the pixels identified above to the foreground color.
Step 3: Vertical Segmentation • Segmentation method: divide the challenge vertically into chunks (divide and conquer).
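Binarization can be sketched as a single fixed threshold. This is a minimal sketch, assuming a grayscale input given as rows of 0-255 intensities and a hypothetical threshold of 128; the slide does not state the threshold the authors used. Dark text is encoded as foreground (1) and the light background as 0.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 intensities) to a two-color image.

    Pixels at or above `threshold` (high intensity) become background (0, white);
    pixels below it (low intensity) become foreground (1, black).
    The default threshold of 128 is a hypothetical choice.
    """
    return [[0 if value >= threshold else 1 for value in row] for row in gray]


if __name__ == "__main__":
    sample = [[250, 30, 240],
              [20, 200, 15]]
    print(binarize(sample))  # [[0, 1, 0], [1, 0, 1]]
```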
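The two neighbor rules above can be sketched as follows, assuming a binary image stored as rows with 1 for foreground and 0 for background (the function name is hypothetical). All qualifying pixels are identified against the unmodified image first and only then converted, mirroring the "find, then convert" order on the slide.

```python
def fix_broken_characters(img):
    """Fill one-pixel gaps left where an arc cut through a character.

    A background pixel (0) whose left and right neighbors are both foreground,
    or whose top and bottom neighbors are both foreground, becomes foreground (1).
    Decisions use the input image, so newly filled pixels do not trigger
    further fills in the same pass. The input is not modified.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 0:
                continue
            horizontal = 0 < x < w - 1 and img[y][x - 1] == 1 and img[y][x + 1] == 1
            vertical = 0 < y < h - 1 and img[y - 1][x] == 1 and img[y + 1][x] == 1
            if horizontal or vertical:
                out[y][x] = 1
    return out


if __name__ == "__main__":
    print(fix_broken_characters([[1, 0, 1],
                                 [0, 0, 0]]))  # [[1, 1, 1], [0, 0, 0]]
```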
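A minimal sketch of vertical segmentation, assuming chunks are cut at columns that contain no foreground pixels (a vertical projection histogram). The slide only says the challenge is divided vertically, so this splitting criterion and the function name are our assumptions.

```python
def vertical_segmentation(img):
    """Split a binary image (1 = foreground) into chunks of non-empty columns.

    Returns (start_col, end_col) pairs with end exclusive, one per chunk,
    cutting wherever a column holds no foreground pixels.
    """
    h, w = len(img), len(img[0])
    # Foreground pixel count per column (the vertical projection histogram).
    column_counts = [sum(img[y][x] for y in range(h)) for x in range(w)]
    chunks, start = [], None
    for x, count in enumerate(column_counts):
        if count > 0 and start is None:
            start = x                    # a chunk begins
        elif count == 0 and start is not None:
            chunks.append((start, x))    # an empty column ends the chunk
            start = None
    if start is not None:
        chunks.append((start, w))        # chunk touching the right border
    return chunks


if __name__ == "__main__":
    print(vertical_segmentation([[1, 1, 0, 0, 1],
                                 [0, 1, 0, 1, 1]]))  # [(0, 2), (3, 5)]
```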
Step 4: Color Filling Segmentation (CFS) • CFS applied to each chunk (Step 3) • Find every connected component or “object” in each chunk.
Step 4 • CFS algorithm: • Detect a foreground pixel and trace all pixels connected to it. This creates an object. • Locate a foreground pixel outside the object and trace its connecting pixels to identify the next object. • The process amounts to color filling each object; the number of colors used equals the number of objects.
Step 4 • 8-connectivity: each pixel has 8 neighbors.
Step 4 A color fill is applied to each chunk, regardless of number of objects in the chunk.
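Color filling segmentation is essentially connected-component labeling. The sketch below, with a hypothetical function name, uses a breadth-first flood fill over the 8-connected neighborhood described above and gives each object its own integer "color".

```python
from collections import deque

# The 8 neighbors of a pixel: horizontal, vertical and diagonal.
NEIGHBORS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def color_filling_segmentation(img):
    """Label every 8-connected foreground object with its own color (1, 2, ...).

    Returns (labels, count): labels[y][x] is 0 for background, else the object id;
    count is the number of colors used, i.e. the number of objects.
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] != 1 or labels[sy][sx]:
                continue
            count += 1  # an unvisited foreground pixel starts a new object
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in NEIGHBORS_8:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == 1 and not labels[ny][nx]:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


if __name__ == "__main__":
    labels, n = color_filling_segmentation([[1, 0, 0],
                                            [0, 1, 0],
                                            [0, 0, 0],
                                            [1, 1, 0]])
    print(n)  # 2 (the diagonal pair is one object under 8-connectivity)
```

To apply it per chunk, as the slide describes, slice the chunk's columns first, for example `[row[a:b] for row in img]` for a chunk spanning columns `a` to `b - 1`.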
Step 5: Thick Arc Removal Thick arc characteristics: • Pixel count – generally small • Location – close to or intersecting the image border • Shape – thick arcs do not contain circles (closed loops), unlike characters such as A, B, D, P, Q, 4, 6, 8 and 9 • Interplay between shape and location – the geometric shape of a thick arc correlates with its position: • Tall but narrow near the start of the CAPTCHA • Wide and short near the middle
Step 5 Thick Arc Removal Algorithm • Circle Detection • Scan objects without circles for distinctive features. • Relative Position Checking • Detection of Remaining Arcs
Step 5-1: Circle Detection • Draw a bounding box around an object. • Flood-fill the background reachable from the box's edges with a color different from both foreground and background. • Scan the box for remaining background-color pixels. If any are found, a circle has been detected.
Step 5-2: Scan non-circle objects for distinctive features Pixel Checking • Characters generally have a pixel count of over 50. • Any object with 50 or fewer pixels is removed as an arc.
Step 5-3: Relative Position Checking This step is applied to all chunks with more than one object. Premise: the relative positions of objects distinguish arcs from characters. Characters are always closer to the baseline. Characters are juxtaposed horizontally, but never stacked vertically.
Step 5-4: Detection of Remaining Arcs • Count the number of arcs remaining in the image. • Remaining arcs are generally the first and last objects in the current image. • Check the first and last objects using these rules: 1) If one object contains a circle, the other is removed. 2) If neither object contains a circle, the one with fewer pixels is removed.
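One way to implement the bounding-box test is to flood-fill the background from the box's edges with a third color and then look for background pixels that were never reached. This sketch assumes the image passed in contains only the object being tested (for example, a mask built from the CFS labels). It floods the background with 4-connectivity so that a diagonal gap, which 8-connectivity treats as joined foreground, does not count as an opening. The function name is hypothetical.

```python
from collections import deque


def contains_circle(img, box):
    """Return True if the object inside `box` encloses background pixels.

    img: binary image holding only this object (1 = foreground, 0 = background).
    box: (top, left, bottom, right), inclusive bounds of the object.
    """
    top, left, bottom, right = box
    # Copy the bounding box so the caller's image is left untouched.
    region = [row[left:right + 1] for row in img[top:bottom + 1]]
    h, w = len(region), len(region[0])
    # Seed the fill with every background pixel on the box's edges.
    queue = deque((y, x) for y in range(h) for x in range(w)
                  if (y in (0, h - 1) or x in (0, w - 1)) and region[y][x] == 0)
    for y, x in queue:
        region[y][x] = 2  # third color: neither foreground nor background
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and region[ny][nx] == 0:
                region[ny][nx] = 2
                queue.append((ny, nx))
    # Any background pixel not reached from the edges is enclosed: a circle.
    return any(0 in row for row in region)


if __name__ == "__main__":
    ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
    hook = [[1, 1, 1], [1, 0, 0], [1, 1, 1]]
    print(contains_circle(ring, (0, 0, 2, 2)), contains_circle(hook, (0, 0, 2, 2)))  # True False
```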
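The pixel-count rule reduces to a simple filter. In this sketch, the object representation (a dict from object id to a list of pixel coordinates), the function name, and the `circle_ids` parameter are assumptions; objects found to contain a circle in Step 5-1 are always kept, matching the slide's restriction to non-circle objects.

```python
def pixel_count_check(objects, circle_ids, max_arc_pixels=50):
    """Split objects into (kept, removed) using the pixel-count rule.

    objects: dict mapping object id -> list of (y, x) foreground pixels.
    circle_ids: ids of objects that contain a circle; these are never removed.
    A non-circle object with `max_arc_pixels` or fewer pixels is treated as an arc.
    """
    kept, removed = {}, {}
    for obj_id, pixels in objects.items():
        is_arc = obj_id not in circle_ids and len(pixels) <= max_arc_pixels
        (removed if is_arc else kept)[obj_id] = pixels
    return kept, removed
```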
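The slide gives the premise but not the exact test, so the following is a hypothetical reading of it: within one chunk, two objects whose column ranges overlap are treated as vertically stacked, and because characters are never stacked, the one whose bottom edge lies farther from the baseline is flagged as an arc. The bounding-box representation, the `baseline` row parameter, the overlap criterion, and the function name are all assumptions.

```python
def relative_position_check(boxes, baseline):
    """Flag likely arcs among the objects of one chunk.

    boxes: dict mapping object id -> (top, left, bottom, right), inclusive.
    baseline: row index of the text baseline (an assumed input).
    Returns the set of object ids flagged as arcs.
    """
    arcs = set()
    ids = sorted(boxes)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            _, la, ba, ra = boxes[a]
            _, lb, bb, rb = boxes[b]
            if la <= rb and lb <= ra:  # column ranges overlap: vertically stacked
                # The object farther from the baseline is taken to be the arc
                # (on a tie, this sketch flags the second object).
                arcs.add(a if abs(baseline - ba) > abs(baseline - bb) else b)
    return arcs
```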
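The two first/last rules can be sketched as a loop, assuming that any objects beyond the expected 8 characters are leftover arcs and that each object is summarized by its pixel count and whether it contains a circle (the `pixels` and `circle` field names and the function name are hypothetical). The slide does not say what happens when both objects contain circles; this sketch falls back to the pixel-count rule in that case.

```python
def remove_remaining_arcs(objects, expected=8):
    """Drop first/last objects until only `expected` objects remain.

    objects: left-to-right list of dicts with keys 'pixels' (int) and 'circle' (bool).
    Returns a new list; the input list is not modified.
    """
    objs = list(objects)
    while len(objs) > expected:
        first, last = objs[0], objs[-1]
        if first["circle"] != last["circle"]:
            # Rule 1: exactly one contains a circle, so the other one is removed.
            drop_first = last["circle"]
        else:
            # Rule 2: remove the object with fewer pixels
            # (also used here, as an assumption, when both contain circles).
            drop_first = first["pixels"] < last["pixels"]
        objs.pop(0 if drop_first else -1)
    return objs
```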
Step 6: Locating Connected Characters n = number of objects in an image. If n < 8, at least one object contains two or more connected characters. MSN challenge: 1. Each image contains 8 characters. 2. Characters connect horizontally, not vertically, so connected characters form wider objects. 3. A segmented chunk contains more than one character if it is wider than 35 pixels. The number of chunks, the width of each chunk, and the number of objects in a chunk are used to guess which chunks contain connected characters.
Step 7: Segment Connected Characters • Find the width of an object from its leftmost and rightmost pixels. • Divide the object vertically into c parts of equal width, where c = the number of characters the object is estimated to contain.
Results of the 7-Step Attack • Success rate: 91% (91 out of 100 challenges); 92% of 500 random challenges • Attack speed: implemented in Java on a 1.86 GHz Intel Core 2 CPU with 2 GB RAM
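The slide says the number of chunks, their widths, and the number of objects are combined to guess where connected characters are, without spelling out the rule. As a purely hypothetical heuristic (not the authors' procedure), the sketch below gives every object one character and hands out the missing characters, one at a time, to the object wider than 35 pixels with the largest width per assigned character.

```python
def estimate_characters_per_object(widths, total=8, wide=35):
    """Guess how many characters each object holds (hypothetical heuristic).

    widths: left-to-right object widths in pixels.
    total: characters per challenge (8 for the MSN CAPTCHA).
    wide: objects wider than this may hold more than one character.
    """
    counts = [1] * len(widths)
    candidates = [i for i, w in enumerate(widths) if w > wide]
    while sum(counts) < total and candidates:
        # Give one more character to the candidate that is widest per character.
        best = max(candidates, key=lambda i: widths[i] / counts[i])
        counts[best] += 1
    return counts
```

For widths `[20, 70, 22, 25, 40, 24]` (six objects), it returns `[1, 2, 1, 1, 2, 1]`: the 70-pixel object gets the first extra character, after which the 40-pixel object is wider per character (40 vs. 35).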
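Step 7 can be sketched directly: take the object's leftmost and rightmost columns and cut that range into `c` equal-width slices. The function names are hypothetical, and because of integer division, when the width is not a multiple of `c` some slices end up one pixel wider than others.

```python
def object_extent(pixels):
    """Return (leftmost, rightmost) columns of an object given its (y, x) pixels."""
    xs = [x for _, x in pixels]
    return min(xs), max(xs)


def split_connected(left, right, c):
    """Divide the column range [left, right] into c parts of (nearly) equal width.

    Returns c (start, end) column pairs with end inclusive.
    """
    width = right - left + 1
    bounds = [left + (k * width) // c for k in range(c + 1)]
    return [(bounds[k], bounds[k + 1] - 1) for k in range(c)]


if __name__ == "__main__":
    print(split_connected(10, 49, 2))  # [(10, 29), (30, 49)]
```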