563.10.3 CAPTCHA Presented by: Sari Louis SPAM Group: Marc Gagnon, Sari Louis, Steve White University of Illinois Spring 2006
Agenda • Definition • Background • Applications • Types of CAPTCHAs • Breaking CAPTCHAs • Proposed Approach • Conclusion
Definition • CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart • A.K.A. Reverse Turing Test, Human Interaction Proof • The challenge: develop a software program that can create and grade challenges most humans can pass but computers cannot
Background • First used by AltaVista in 1997 • Reduced spam add-URL submissions by over 95% • CMU/Yahoo! • Automated the creation and grading of challenges • PARC • Relies on document image degradation to prevent successful OCR • Conducted user-focused studies to assess the effectiveness of CAPTCHAs
Background • CAPTCHAs are based on open AI problems • Breaking CAPTCHAs helps advance AI by solving these open problems • Improving CAPTCHAs helps tell computers and humans apart • Win-win situation
Background - Papers • Pessimal Print: A Reverse Turing Test. Allison L. Coates, Henry S. Baird, Richard J. Fateman • Telling Humans and Computers Apart Automatically. Luis von Ahn, Manuel Blum, and John Langford • CAPTCHA: Using Hard AI Problems for Security. Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford • Using Machine Learning to Break Visual Human Interaction Proofs (HIPs). Kumar Chellapilla, Patrice Y. Simard
Applications • Free email services • Online polls • Preventing dictionary attacks • Newsgroups, blogs, etc. • Spam prevention
Types of CAPTCHAs • Text based • Gimpy, ez-gimpy • Gimpy-r, Google CAPTCHA • Simard’s HIP (MSN) • Graphic based • Bongo • PIX • Audio based
Text Based CAPTCHAs • Gimpy, ez-gimpy • Pick a word or words from a small dictionary • Distort them and add noise and background • Gimpy-r, Google’s CAPTCHA • Pick random letters • Distort them, add noise and background • Simard’s HIP • Pick random letters and numbers • Distort them and add arcs
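The random-character variants above can be sketched in a few lines. This is a minimal sketch, not the code of Gimpy-r, Google's CAPTCHA, or Simard's HIP: the function names, the 6-character length, and the rotation and offset ranges are all assumptions, and the rendering step that would actually draw the distorted glyphs, noise, and arcs is left out.

```python
import random
import string

ALPHABET = string.ascii_uppercase + string.digits  # random letters and numbers

def make_challenge(length=6, rng=None):
    """Pick random characters plus a per-character distortion plan.

    The length and the distortion ranges are illustrative assumptions.
    """
    rng = rng or random.SystemRandom()
    text = "".join(rng.choice(ALPHABET) for _ in range(length))
    plan = [
        {"char": c, "rotate_deg": rng.uniform(-30, 30), "dy_px": rng.randint(-5, 5)}
        for c in text
    ]
    return text, plan

def grade(expected, answer):
    """Grade a typed answer, ignoring case and surrounding whitespace."""
    return answer.strip().upper() == expected
```

A renderer would consume `plan` to draw each character; only the server keeps `text` for grading.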
Graphic Based CAPTCHAs • Bongo • Display two series of blocks • User must find the characteristic that sets the two series apart • User is asked to determine which series each of four single blocks belongs to • Example difference: thick vs. thin lines
Graphic Based CAPTCHAs • PIX • Create a large database of labeled images • Pick a concrete object • Pick four images of the object from the image database • Distort the images • Ask the user to pick the object from a list of words
Graphic Based CAPTCHAs • [Example image challenges; images omitted. Labels: Pool, Dog]
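The PIX steps can be sketched as follows. This is an illustration under stated assumptions rather than the PIX implementation: `IMAGE_DB` and its file names are made up, the word list shown to the user is simply every label in the database, and the image distortion step is omitted.

```python
import random

# Hypothetical labeled image database: object name -> image file names.
IMAGE_DB = {
    "dog": ["dog1.jpg", "dog2.jpg", "dog3.jpg", "dog4.jpg", "dog5.jpg"],
    "pool": ["pool1.jpg", "pool2.jpg", "pool3.jpg", "pool4.jpg"],
    "car": ["car1.jpg", "car2.jpg", "car3.jpg", "car4.jpg"],
}

def make_pix_challenge(db, rng=None, images_per_challenge=4):
    """Pick a concrete object and four of its images (distortion not shown)."""
    rng = rng or random.SystemRandom()
    answer = rng.choice(sorted(db))
    images = rng.sample(db[answer], images_per_challenge)
    return {"answer": answer, "images": images, "choices": sorted(db)}

def grade(challenge, picked):
    """The user passes by picking the object's label from the word list."""
    return picked == challenge["answer"]
```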
Audio Based CAPTCHAs • Pick a word or a sequence of numbers at random • Render them into an audio clip using text-to-speech (TTS) software • Distort the audio clip • Ask the user to identify and type the word or numbers
Breaking CAPTCHAs • Most text based CAPTCHAs have been broken by software • OCR • Segmentation • Other CAPTCHAs were broken by streaming the tests to unsuspecting users, who solve them
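A rough sketch of the first and third steps, under clearly stated assumptions: the TTS step is not shown and is assumed to produce a list of float samples in [-1.0, 1.0], and additive uniform noise stands in for whatever distortion a real audio CAPTCHA applies (the slides do not say which). The function names and defaults are hypothetical.

```python
import random

def pick_digits(length=5, rng=None):
    """Pick a random sequence of digits to be spoken."""
    rng = rng or random.SystemRandom()
    return "".join(rng.choice("0123456789") for _ in range(length))

def add_noise(samples, noise_level=0.1, rng=None):
    """Add uniform noise to each sample and clamp the result to [-1.0, 1.0]."""
    rng = rng or random.SystemRandom()
    return [
        max(-1.0, min(1.0, s + rng.uniform(-noise_level, noise_level)))
        for s in samples
    ]
```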
Proposed Approach • Very similar to PIX • Pick a concrete object • Get 6 images at random from images.google.com that match the object • Distort the images • Build a list of 100 words: 90 from a full dictionary, 10 from the objects dictionary • Prompt the user to pick the object from the list of words
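The word-list step can be sketched like this. It is one possible reading, not the group's implementation: it assumes the correct object counts as one of the 10 words drawn from the objects dictionary, and that both dictionaries contain unique words. `build_word_list` and its parameters are hypothetical names.

```python
import random

def build_word_list(answer, full_dictionary, object_dictionary, rng=None,
                    n_full=90, n_objects=10):
    """Build the 100-word answer list: 90 general words plus 10 object words."""
    rng = rng or random.SystemRandom()
    # Assumption: the correct answer is one of the 10 object words.
    other_objects = [w for w in object_dictionary if w != answer]
    objects = rng.sample(other_objects, n_objects - 1) + [answer]
    # Keep object words out of the filler so the list has no duplicates.
    taken = set(objects)
    filler_pool = [w for w in full_dictionary if w not in taken]
    words = objects + rng.sample(filler_pool, n_full)
    rng.shuffle(words)  # hide the position of the correct answer
    return words
```

With a 100-word list, a program guessing uniformly at random passes with probability 1/100 per attempt.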
Proposed Approach - Technical • Make an HTTP call to images.google.com and search for the object • Screen-scrape 2-3 pages of results to get the list of images • Pick 6 images at random • Randomly distort both the images and their URLs before displaying them • Expire the CAPTCHA in 30-45 seconds
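The sampling and expiration steps can be sketched as below. The HTTP call, the screen scraping, and the image and URL distortion are deliberately left out; `scraped_urls` is assumed to be the list of image URLs already collected. The `Challenge` class and its method names are hypothetical, and drawing the lifetime uniformly from 30-45 seconds is one way to read the last bullet.

```python
import random
import time

def pick_images(scraped_urls, count=6, rng=None):
    """Pick 6 distinct images at random from the scraped result list."""
    rng = rng or random.SystemRandom()
    return rng.sample(scraped_urls, count)

class Challenge:
    """A challenge that stops accepting answers after 30-45 seconds."""

    def __init__(self, answer, image_urls, now=None, rng=None):
        rng = rng or random.SystemRandom()
        self.answer = answer
        self.image_urls = image_urls
        self.issued_at = time.monotonic() if now is None else now
        self.ttl = rng.uniform(30, 45)  # lifetime in seconds

    def check(self, picked, now=None):
        """Return True only for a correct answer given before expiry."""
        now = time.monotonic() if now is None else now
        if now - self.issued_at > self.ttl:
            return False  # expired: limits streaming the test to other users
        return picked == self.answer
```

The `now` parameter exists only so the expiry logic can be exercised without waiting; a server would normally omit it.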
Proposed Approach - Benefits • The database already exists and is public • The database is constantly being updated and maintained • Adding “concrete objects” to the dictionary is virtually instantaneous • Distortion prevents caching hacks • Quick expiration limits streaming hacks
Proposed Approach - Drawbacks • Not accessible to people with disabilities (as is the case with most CAPTCHAs) • Relies on Google’s infrastructure • Unlike CAPTCHAs using random letters and numbers, the number of challenge words is limited