RE: Captchas – Understanding Captcha-Solving Services in an Economic Context • Jade Slayter, Chris Hare, Jay Kim
What is a captcha? • CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart • Created to differentiate between bots and humans • Must be • Easy for humans to solve • Easy to generate and evaluate • Hard for computers to solve
History of the Captcha • The term was first coined in 2000 • Developed as a way to limit automated attacks • Attackers evolved to solve captchas easily • Scripted solvers and paid human solving services • Captchas are one of the few examples where the defenders have the advantage
Automated software solvers • Advantages • Near zero marginal cost • Near infinite capacity • Use optical character recognition to identify text • Writing the algorithms is complex • Algorithms often fail • Changed the captcha into a question of commercial viability
Xrumer • Forum spamming tool that solves captchas • Released in 2006 for $540 • Led to an arms race between captcha complexity and algorithm accuracy • Popular forum software has adapted so that the latest Xrumer cannot crack its captchas • Simple Machines Forum still crackable • In 2009 Xrumer added human solving integration
RecaptchaOCR • Created to solve the recaptcha service specifically • Designed for the 2008 version of recaptcha • Recaptcha was updated in 2009, but the solver was not • Tests run on the old and new implementations • ~30% solving accuracy for the 2008 version • ~18% solving accuracy for the 2009 version • Human accuracy between 75% and 90%
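One way to read these accuracy figures, as a rough sketch: if each attempt is assumed to be independent and a failed attempt can simply be retried (assumptions the slide does not state), the expected number of attempts per solved captcha is 1/p. The function name below is illustrative only.

```python
def expected_attempts(accuracy: float) -> float:
    """Mean number of tries until the first success (geometric distribution).

    Assumes independent attempts with a fixed success probability.
    """
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    return 1.0 / accuracy


# Accuracy figures quoted on the slide for the two recaptcha versions.
for label, acc in [("2008 recaptcha", 0.30), ("2009 recaptcha", 0.18)]:
    print(f"{label}: about {expected_attempts(acc):.1f} attempts per solve")
```

Under these assumptions the 2009 update nearly doubles the work an automated solver needs per success, which is the kind of pressure that feeds the arms race described on the slides.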
Economics – Automated • Arms race between solvers and website defenses • Automated solving comes down to • Cost of development • Accuracy • Responsiveness of defending sites • Human solvers became a cheaper solution
Human Solving Services • Opportunistic Solving • Tricking users into solving CAPTCHAs • Does not play a major role • Paid services • Earliest mention: a Symantec blog post in September 2006 • “Full time Captcha solver” • Approximately $10/1000
Human solving services cont… (Workflow) • [Figure: CAPTCHA-solving workflow]
Human solving services cont… (Paid services - freelance) • After the Symantec blog post, retail solvers became prevalent. • Wages (for freelancers) • In 2007 as high as $10/1000 solved • 2008 went down to $1.50/1000 solved • 2009 $1/1000 solved • 2010 was as low as $0.50/1000
Human solving services cont… (Paid services - retail) • Because freelance work became cheap, retail services were pressured to lower prices • Typethat.biz went from $1/1000 to $0.75/1000 • Some tried to tie services with a product (see picture) • This allowed for wages as high as $7/1000 (Bypass captcha and beatCaptcha) and $20/1000 (image2type) • However others (like GYC) tried to use plugins to reduce wages • According to Mr. E: 50% was profit, 10% to maintenance, the rest split between workers and incentives to partners
Customer Account Creation • Very exclusive • Uses “invite” codes • Live telephone call • “Region-locked” CAPTCHA • Prepayment
The Interface • Services generally (though not always) have an API • ImagetoText requires users to test the API first • Others don’t have an API • Authors wrote one using Ruby
Pricing • Some services (Antigate and de-captcher) use a bidding system in which the highest bid gets priority • Generally the highest bid is about $1/1000 • No (observable) price change on the worker side • Implies pure profit for the service provider
How well do they work? Accuracy • Human solvers • Of 1,025 CAPTCHAs, 1,009 were solved correctly • 7 were unreadable • 6 had ambiguous characters • 3 were ambiguous because of overlap • Captcha-solving services • 86%-89% solved (for most programs)
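The human-solver counts on this slide can be checked with a few lines of arithmetic. The sketch below only restates the slide's numbers; the variable names are illustrative.

```python
# Counts quoted on the slide for the human-solver sample.
total = 1025
correct = 1009
unreadable = 7
ambiguous_chars = 6
ambiguous_overlap = 3

# The four categories should account for every CAPTCHA in the sample.
assert correct + unreadable + ambiguous_chars + ambiguous_overlap == total

accuracy = correct / total
print(f"Human solver accuracy: {accuracy:.1%}")
```

That works out to roughly 98%, noticeably above the 86%-89% reported for the solving services.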
How well do they work? Accuracy cont… • Correct solutions depend on the site being targeted • PayPal solved more accurately than Youku • Possibly because solvers were less familiar with Youku
How well do they work? Response Time • On average, took about 15-20 seconds. • Response time usually beat the Internet timeout time • Fastest service took just over 9 seconds. Slowest took around a minute
Capacity • Antigate was by far the best • Can solve between 27 and 41 captchas per second (maybe even more) • Others could be maxed out • Can solve anywhere from 4 to 15 per second • Response time and accuracy varied with the time of day (Pacific Time)
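To give the per-second rates a sense of scale, the sketch below converts them to daily volumes. It assumes the measured rate could be sustained for a full 24 hours, which the slide does not claim, so the results are upper-bound illustrations only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_day(rate_per_second: int) -> int:
    """Daily volume if a per-second rate were sustained all day (an assumption)."""
    return rate_per_second * SECONDS_PER_DAY


# Rate ranges quoted on the slide.
for name, low, high in [("Antigate", 27, 41), ("Other services", 4, 15)]:
    print(f"{name}: {per_day(low):,} to {per_day(high):,} per day")
```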
Measurement Issues • Legality and ethics questioned • Legal, but related cases • Murky ethics, deemed worth the cost • Purchase supported CAPTCHA solving devs • Services not used on intended target sites • Human solved CAPTCHAs done via copies
Captcha Solving Site Comparison • Similar registration • Captcha test • Simple interface given to solve new captchas • Accuracy recorded, accounts banned if inaccurate • Priority given to accurate solvers • Wages vary from $0.50 to $10 per thousand • Monthly leaderboards to encourage competition
Captcha Solving Site Comparison (2) • Captchas with varying questions and answers used to guess countries based on accuracy • Translating numbers to roman numerals • Time zone question • 10% gave clear answer • China and India mostly
Changes in trends • Sites update their captcha images • Identifying animals and objects • Solving services start off with poor success and rapidly improve
Targeted Sites • Global sites a big target for solving services • Local sites big for services specializing in a language • Top five of each site covers majority of the traffic
Discussion and Conclusion • CAPTCHAs are • Simple and easy to solve by humans • “low-impact” quality appeals to sites wary of defense turning away visitors • Easily outsourced to the global unskilled labor market • Do CAPTCHAs actually work?
Discussion and Conclusion cont. • Telling computers and humans apart • Preventing the automated site access • Limiting automated site access • The role of CAPTCHAs today
Telling Computers and Humans Apart • Original purpose of CAPTCHAs was to distinguish humans from machines • To date, no completely general means of solving CAPTCHAs has emerged, nor is creating automated solvers viable as a business model • CAPTCHAs have succeeded thus far in this regard
Preventing Automated Site Access • Today, retail price for solving one million CAPTCHAs is as low as $1,000 • CAPTCHAs are an acceptable cost of doing business when measured against the value of gaining access to the protected resource • E-mail spammers using Web mail to send advertisements • Blog spammers seek organic “clicks” and influence result placement of major search engines • CAPTCHAs do not prevent large-scale automated site access.
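The retail price quoted above translates into a per-CAPTCHA cost with simple arithmetic. In the sketch below, the function name and the 50,000-account example are illustrative assumptions, not figures from the slides.

```python
PRICE_PER_MILLION_USD = 1_000  # retail price quoted on the slide


def solve_cost(n_captchas: int, price_per_million: float = PRICE_PER_MILLION_USD) -> float:
    """Cost in USD of having n_captchas solved at a flat retail price."""
    return n_captchas * price_per_million / 1_000_000


print(f"one CAPTCHA:      ${solve_cost(1):.3f}")
# Hypothetical campaign size, chosen only for illustration.
print(f"50,000 CAPTCHAs:  ${solve_cost(50_000):.2f}")
```

At a tenth of a cent per solve, the CAPTCHA is a small line item for any attacker whose protected resource is worth more than that per access.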
Limiting Automated Site Access • A CAPTCHA reduces an attacker’s expected profit by the cost of solving the CAPTCHA • If the attacker cannot afford this extra cost, the defense mechanism is successful • For many sites (e.g., low PageRank blogs), CAPTCHAs alone might be sufficient to dissuade abuse
Limiting Automated Site Access cont… • For higher-value sites, they place a utilization constraint on otherwise “free” resources • CAPTCHAs naturally limit site access to those attackers still profitable despite these costs
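A minimal sketch of the argument on these two slides: the attacker's profit is the value of the accesses minus the cost of solving the CAPTCHAs. The per-access values below are hypothetical and chosen only to show a low-value and a high-value target; they are not taken from the slides.

```python
def net_profit(accesses: int, value_per_access: float, cost_per_solve: float) -> float:
    """Gross value of the automated accesses minus what the CAPTCHAs cost to solve."""
    return accesses * (value_per_access - cost_per_solve)


COST_PER_SOLVE = 0.001  # USD per CAPTCHA, i.e. $1 per 1,000

# Hypothetical per-access values, not taken from the slides.
low_value = net_profit(100_000, 0.0002, COST_PER_SOLVE)
high_value = net_profit(100_000, 0.05, COST_PER_SOLVE)

print(f"low-value target:  {low_value:+.2f} USD")
print(f"high-value target: {high_value:+.2f} USD")
```

In this toy example the CAPTCHA alone makes the low-value target a losing proposition, while the high-value target stays profitable and needs further defenses.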
The Role of CAPTCHAs Today • The profitability of any scam is a function of three factors • The cost of the CAPTCHA-solving • The effectiveness of any secondary defenses (e.g., SMS validation, account shutdowns, additional CAPTCHA screens, etc.) • The efficiency of the attacker’s business model • As the cost of CAPTCHA solving decreases, a site operator must employ secondary defenses more aggressively to maintain a given level of fraud.
The Role of CAPTCHAs Today cont. • Secondary defenses are invariably more expensive both in infrastructure and customer impact when compared to CAPTCHAs • The optimal point for this transition is precisely the point at which the attacker “breaks even” • CAPTCHAs, while traditionally viewed as a technological impediment, should be considered more as an economic one • CAPTCHAs continue to put strain on attackers’ business models while minimizing the cost and user impact of secondary defenses, but simply work less efficiently over time
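The three factors named under "The Role of CAPTCHAs Today" can be combined into a toy model. This is a sketch under stated assumptions, not the authors' model: the linear profit formula, the function names, and all numeric values are hypothetical.

```python
def expected_profit(revenue_per_account: float,
                    survival_rate: float,
                    cost_per_solve: float,
                    solves_per_account: int = 1) -> float:
    """Expected profit per attempted account in a toy linear model.

    revenue_per_account: earnings from a working account (business-model efficiency)
    survival_rate: fraction of accounts that get past secondary defenses
    cost_per_solve: price paid to a solving service for one CAPTCHA
    """
    return revenue_per_account * survival_rate - cost_per_solve * solves_per_account


def break_even_survival(revenue_per_account: float,
                        cost_per_solve: float,
                        solves_per_account: int = 1) -> float:
    """Survival rate at which the attacker exactly breaks even."""
    if revenue_per_account <= 0:
        raise ValueError("revenue_per_account must be positive")
    return cost_per_solve * solves_per_account / revenue_per_account


# Hypothetical revenue of one cent per working account.
for cost in (0.001, 0.0005):
    rate = break_even_survival(0.01, cost)
    print(f"solve cost ${cost:.4f}: attacker breaks even at {rate:.0%} survival")
```

In this model, halving the solving price halves the survival rate the attacker needs, so the site's secondary defenses must block a larger share of accounts to keep fraud at the same level.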