Phinding Phish: An Evaluation of Anti-Phishing Toolbars

Yue Zhang, Serge Egelman, Lorrie Cranor, and Jason Hong

Anti-Phishing Tools
• 84 listed on download.com (Sept. ’06)
• Included in many browsers
• Poor usability
  - Many users don’t see indicators
  - Many choose to ignore them
• But usability is being addressed
• Are they accurate?

Tools Tested
• CallingID
• Cloudmark
• EarthLink
• eBay
• Firefox
• IE7
• Netcraft
• Netscape
• SpoofGuard
• TrustWatch

Source of Phish
• High volume of fresh phish
• Sites taken down after a day on average
• Fresh phish yield blacklist update information
• Can’t use toolbar blacklists
• We experimented with several sources
  - APWG: high volume, but many duplicates and legitimate URLs included
  - PhishTank.org: lower volume, but easier to extract phish
  - Assorted other phish archives: often low volume or not fresh enough

Phishing Feeds
• Anti-Phishing Working Group
  - reportphishing@antiphishing.org
  - ISPs, individuals, etc.
  - >2,000 messages/day
  - URLs extracted from messages
• PhishTank
  - http://www.phishtank.org/
  - Submitted by public
  - ~48 messages/day
  - Manually verify URLs
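Pulling candidate URLs out of a high-volume message feed, and dropping the duplicates the APWG feed is noted for, could look roughly like the sketch below. It is an illustration only: the slides do not describe the actual filter, and `URL_PATTERN` and `extract_candidate_urls` are hypothetical names. Candidates would still need manual verification.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: an http/https URL running up to whitespace or a bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>()]+", re.IGNORECASE)


def extract_candidate_urls(messages):
    """Return the unique URLs found in the message bodies, in first-seen order."""
    seen = set()
    candidates = []
    for body in messages:
        for match in URL_PATTERN.findall(body):
            # Drop trailing punctuation or quotes picked up from the prose.
            url = match.rstrip(".,;:!?\"'")
            parts = urlsplit(url)
            if not parts.netloc:
                continue
            # Scheme and host are case-insensitive; path and query are kept as written.
            key = (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query)
            if key not in seen:
                seen.add(key)
                candidates.append(url)
    return candidates
```

For example, two reports that mention `http://Example.com/login` and `http://example.com/login` yield a single candidate.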

Testbed for Anti-Phishing Toolbars
• Automated testing
• Aggregate performance statistics
• Key design issue:
  - Different browsers
  - Different toolbars
  - Different indicator types
• Solution: Image analysis
  - Compare screenshots with known states

Image-Based Comparisons
• Two examples: TrustWatch and Google
  - TrustWatch: "Not verified" vs. "Verified"
  - Google: "Warning!!" vs. "Phish!!"
• [Figure: toolbar screenshots compared against these known states]
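Comparing a screenshot against known indicator states can be sketched as a nearest-reference match. This is a minimal illustration under stated assumptions, not the testbed's actual image analysis: the toolbar region is assumed to be already cropped and given as a list of (R, G, B) tuples, and `classify_indicator`, `known_states`, and the `max_diff` threshold are hypothetical.

```python
def mean_abs_diff(region_a, region_b):
    """Average per-channel absolute difference between two equal-sized RGB regions."""
    if len(region_a) != len(region_b) or not region_a:
        raise ValueError("regions must be non-empty and the same size")
    total = 0
    for (r1, g1, b1), (r2, g2, b2) in zip(region_a, region_b):
        total += abs(r1 - r2) + abs(g1 - g2) + abs(b1 - b2)
    return total / (3 * len(region_a))


def classify_indicator(region, known_states, max_diff=10.0):
    """Return the label of the closest known state, or None if none is close enough."""
    best_label, best_diff = None, None
    for label, reference in known_states.items():
        diff = mean_abs_diff(region, reference)
        if best_diff is None or diff < best_diff:
            best_label, best_diff = label, diff
    if best_diff is not None and best_diff <= max_diff:
        return best_label
    return None
```

Returning `None` when nothing matches lets an unrecognized screenshot be flagged for manual review instead of being forced into a known state.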

Testbed System Architecture
• Retrieve Potential Phishing Sites
• Send URL to Workers
• Worker Evaluates Potential Phishing Site
• Task Manager Aggregates Results
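The four architecture steps (retrieve sites, send URLs to workers, evaluate, aggregate) can be sketched in a single process as follows. This is a simplified stand-in: threads play the role of the worker machines, and `run_testbed` and the `evaluate` callback are hypothetical names, not the testbed's real interface.

```python
import queue
import threading
from collections import Counter


def run_testbed(urls, evaluate, num_workers=2):
    """Hand URLs to workers, then aggregate the verdict reported for each URL."""
    tasks = queue.Queue()
    results = queue.Queue()
    for url in urls:
        tasks.put(url)

    def worker():
        # Each worker pulls URLs until the task queue is empty.
        while True:
            try:
                url = tasks.get_nowait()
            except queue.Empty:
                return
            results.put((url, evaluate(url)))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Task manager step: collect per-URL verdicts and count each verdict type.
    verdicts = {}
    while not results.empty():
        url, verdict = results.get()
        verdicts[url] = verdict
    return verdicts, Counter(verdicts.values())
```

In a real run, `evaluate` would load the URL in a browser with the toolbar installed and classify the resulting screenshot.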

Experiment Methodology
• Catch Rate: given a set of phishing URLs, the percentage correctly labeled as phish by the tool
  - Count block and warning only
  - Taken-down sites removed
• False Positives: given a set of legitimate URLs, the percentage incorrectly labeled as phish by the tool
  - Count block and warning only
  - Taken-down sites removed
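Both metrics reduce to the same calculation applied to different URL sets: the share of still-reachable sites on which the tool blocked or warned. A minimal sketch, assuming each outcome is recorded as a string; the labels `"block"`, `"warning"`, and `"taken_down"` and the function name are illustrative, not the testbed's own.

```python
FLAGGED = {"block", "warning"}  # only these outcomes count as "labeled as phish"


def flagged_rate(outcomes):
    """Percentage of live sites that were blocked or warned about.

    `outcomes` maps URL -> outcome label. Sites marked "taken_down" are
    removed before computing the rate. Returns None if no live sites remain.
    """
    live = [label for label in outcomes.values() if label != "taken_down"]
    if not live:
        return None
    flagged = sum(1 for label in live if label in FLAGGED)
    return 100.0 * flagged / len(live)
```

Applied to the phishing URL set, `flagged_rate` gives the catch rate; applied to the legitimate URL set, it gives the false positive rate.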

Experiment 1
• PhishTank feed used
• Equipment:
  - 1 notebook as Task Manager
  - 2 notebooks as Workers
• 10 tools examined:
  - Cloudmark
  - EarthLink
  - eBay
  - IE7
  - Google/Firefox
  - McAfee
  - Netcraft
  - Netscape
  - SpoofGuard
  - TrustWatch

Experiment 1
• 100 phishing URLs
  - PhishTank feed
  - Manually verified
  - Re-examined at 1, 2, 12, 24 hour intervals
  - Examined blacklist update rate (except w/SpoofGuard)
  - Examined take-down rate
• 516 legitimate URLs
  - 416 from 3Sharp report
  - 35 from bank log-in pages
  - 35 from top pages by Alexa
  - 30 random pages

Experiment 2
• APWG phishing feed
• 9 of the same toolbars tested + CallingID
• Same testing environment

Results of Experiment 1

Results of Experiment 2

False Positives
• Not a big problem for most of the toolbars

Overall findings
• No toolbar caught 100%
• Good performers:
  - SpoofGuard (>90%), though with 42% false positives
  - IE7 (70%-80%)
  - Netcraft (60%-80%)
  - Firefox (50%-80%)
• Most performed poorly:
  - Netscape (10%-30%)
  - CallingID (20%-40%)

More findings
• Performance varied with feed
  - Better with PhishTank: Cloudmark, EarthLink, Firefox, Netcraft
  - Better with APWG: eBay, IE7, Netscape
  - Almost the same: SpoofGuard, TrustWatch
• Different increases over time
  - More increases on APWG
  - Reflects the “freshness” of URLs

CDN Attack
• Many tools use blacklists
• Many examine IP addresses (location, etc.)
• Proxies distort URLs
• Used Coral CDN
  - Append .nyud.net:8090 to the hostname in URLs
  - Uses PlanetLab
• Works on:
  - Cloudmark
  - Google
  - TrustWatch
  - Netcraft
  - Netscape
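The attack works because the proxied URL has a different hostname from the blacklisted one. One way to see this is to undo the rewriting before a lookup. The sketch below is an illustration only, not a fix proposed in the talk: `strip_coral_proxy`, `is_blacklisted`, and the host-based `blacklist_hosts` set are hypothetical, and real tools may key their blacklists differently.

```python
from urllib.parse import urlsplit, urlunsplit

CORAL_SUFFIX = ".nyud.net"  # suffix the Coral CDN appends to the original hostname


def strip_coral_proxy(url):
    """Return the URL with the Coral suffix and proxy port removed, if present."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if not host.endswith(CORAL_SUFFIX):
        return url
    original_host = host[: -len(CORAL_SUFFIX)]
    if not original_host:
        return url
    # Rebuild the URL on the original host; the proxy port is dropped.
    return urlunsplit(
        (parts.scheme, original_host, parts.path, parts.query, parts.fragment)
    )


def is_blacklisted(url, blacklist_hosts):
    """Check the canonicalized hostname against a set of blacklisted hosts."""
    host = urlsplit(strip_coral_proxy(url)).hostname or ""
    return host.lower() in blacklist_hosts
```

A naive lookup on `phish.example.com.nyud.net` misses an entry for `phish.example.com`, while the canonicalized lookup finds it.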

Page Load Attack
• Some wait for page to be fully loaded
  - SpoofGuard
  - eBay
• Insert a web bug taking infinite load time
  - 5 lines of PHP
  - 1x1 GIF
  - Infinite loop spitting out data very slowly
• Tool stays in previous state
  - Unable to indicate anything

Conclusion
• Tool Performance
  - No toolbars are perfect
  - No single toolbar will outperform others
  - Heuristics have false positives
  - Whitelists?
  - Hybrid approach?
• Testing Methodology
  - Get fresher URLs
  - Test other than default settings
• User interfaces
  - Usability is important
  - Traffic light?
  - Pop up message?
  - Re-direct page?

CMU Usable Privacy and Security Laboratory http://cups.cs.cmu.edu/
