Bayesian Filtering Anti-Phishing Toolbar Benefits

P. Likarish, E. Jung, D. Dunbar, T. E. Hansen, and J.-P. Hourcade. 12/04/07, presented by EJ Jung.


  1. Bayesian Filtering Anti-Phishing Toolbar Benefits P. Likarish, E. Jung, D. Dunbar, T. E. Hansen, and J.-P. Hourcade 12/04/07 presented by EJ Jung

  2. Phishing

  3. Why study phishing? • Identity theft* • One of the fastest-growing crimes • ~15 million Americans/year, $2.8 billion *Gartner, Inc. 2007 press release. http://www.gartner.com/it/page.jsp?id=501912, March 2007 **Phishing report. http://apwg.org

  4. Phishing leads to malware** • Trojans and keyloggers **Phishing report. http://apwg.org

  5. Phishing and botnets feed into the black market (Franklin et al., 2007) • 6 months of IRC logs

  6. … and into a national security threat • FBI Director Robert Mueller says: • Younis Tsouli and his colleagues stole thousands of credit card accounts through phishing schemes. They ran up charges of more than $3 million for items they thought fellow extremists might need, from night vision goggles to GPS devices. • Botnets are the Swiss Army knives of hackers

  7. Phishing attack

  8. Anti-Phishing Tools • Client or server side? • Server-side protection is limited • server-client cooperation • hash of system • Client side is more common • web browser toolbar • password management

  9. Early Efforts • Largely heuristics-based • Set of rules developed by experts • Still used by most anti-phishing tools • Examples: • IE7 phishing filter • SpoofGuard

  10. SpoofGuard* • IE6 toolbar • Developed by Chou, Ledesma, Teraguchi, Boneh, Mitchell at Stanford • Heuristics+whitelist *N. Chou, R. Ledesma, Y. Teraguchi, D. Boneh, and J. C. Mitchell. Client-side defense against web-based identity theft. In NDSS '04: Proceedings of the 11th Annual Network and Distributed System Security Symposium, February 2004

  11. Stateless Heuristics • URL check • Suspicious URLs: @, IP, hex • Image check • Hashed image database • Image hashing • Produces same hash for similar images • Link check • Fails if >¼ of links fail URL check • Password check
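
The URL check and link check on this slide can be sketched as follows. This is a minimal illustration, not SpoofGuard's actual code: the function names are hypothetical, and "hex" is assumed here to mean percent-encoded characters in the URL.

```python
import re
from urllib.parse import urlsplit

# Assumption: a "raw IP" host is a dotted-quad IPv4 address.
IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")
# Assumption: "hex" on the slide refers to percent-encoded characters such as %70.
HEX_ESCAPE = re.compile(r"%[0-9a-fA-F]{2}")


def url_is_suspicious(url: str) -> bool:
    """Flag URLs that contain '@', a raw IP host, or hex-encoded characters."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    has_at = "@" in parts.netloc
    has_ip = bool(IPV4.match(host))
    has_hex = bool(HEX_ESCAPE.search(url))
    return has_at or has_ip or has_hex


def link_check_fails(links: list[str]) -> bool:
    """Fail the page if more than a quarter of its links fail the URL check."""
    if not links:
        return False
    bad = sum(url_is_suspicious(link) for link in links)
    return bad > len(links) / 4
```

Both checks are stateless: they need only the page being viewed, with no browsing history or stored data.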

  12. Stateful Heuristics • Domain check • Hamming distance to known domains • Referrals • From email site? • May require DNS lookup • Image-domain association • Extension of hashed image heuristic • <image, URL> tuples
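
A rough sketch of the domain check, assuming it flags a domain that differs from a known one in only a few character positions. The `max_distance` threshold and the function names are hypothetical; the slide does not give SpoofGuard's actual cutoff.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def looks_like_known_domain(domain, known_domains, max_distance=2):
    """Return the known domain that `domain` appears to imitate, or None."""
    domain = domain.lower()
    known = [k.lower() for k in known_domains]
    if domain in known:
        return None  # an exact match is the real site, not a look-alike
    for candidate in known:
        # Hamming distance is only defined for strings of equal length.
        if len(candidate) != len(domain):
            continue
        if hamming_distance(domain, candidate) <= max_distance:
            return candidate
    return None
```

For instance, `looks_like_known_domain("paypa1.com", ["paypal.com"])` would return `"paypal.com"`. The list of known domains is what makes this check stateful.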

  13. Scoring • TSS = Total Spoof Score • Ex: P1 = URL check (0 if page passes, 1 if it fails), w1 = 0.2 Source: N. Chou, R. Ledesma, Y. Teraguchi, D. Boneh, and J. C. Mitchell. Client-side defense against web-based identity theft. In NDSS '04: Proceedings of the 11th Annual Network and Distributed System Security Symposium, February 2004
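
The slide's example (P1 with weight w1 = 0.2) suggests that each check contributes a weighted term to the total. A minimal sketch under that assumption follows; it covers only a linear weighted sum, and every weight other than 0.2 for the URL check is made up for illustration. The aggregation in the SpoofGuard paper may include further terms.

```python
def total_spoof_score(results: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted sum of check results, where each result is 0 (pass) or 1 (fail)."""
    return sum(weights[name] * results[name] for name in results)


# Hypothetical weights; only the URL-check weight of 0.2 comes from the slide.
weights = {"url": 0.2, "image": 0.3, "link": 0.2, "password": 0.3}

# A page that fails the URL check and the password check.
results = {"url": 1, "image": 0, "link": 0, "password": 1}

score = total_spoof_score(results, weights)
```

A tool built this way would warn the user when the score exceeds some configured threshold.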

  14. Drawbacks to Heuristics • Difficult to develop accurate rules* • Large number of false positives and negatives** • Heuristics don't evolve; phishing sites do. *M. Sahami, S. Dumais, D. Heckerman, and E. Horvitz. A Bayesian approach to filtering junk e-mail. In AAAI Workshop on Learning for Text Categorization, July 1998. **Y. Zhang, J. I. Hong, and L. F. Cranor. CANTINA: a content-based approach to detecting phishing web sites. In WWW '07: Proceedings of the 16th international conference on World Wide Web, pages 639–648, New York, NY, USA, 2007. ACM Press.

  15. Next: Blacklist/Whitelist • ~2004-current • Largely blacklist-based • rely on phishing site reports • still used by most anti-phishing tools • Examples: • IE7 phishing filter • Firefox 2 phishing protection & Google safe-browsing • Netcraft* Toolbar *Netcraft Ltd. http://toolbar.netcraft.com

  16. Drawbacks to Blacklist/Whitelist • Need reliable and timely sources for reports • Window of vulnerability • after site launch, before being blacklisted • avg lifetime of a phishing site: 3 days • avg lifetime after blacklisted: 22 hours • cost of undoing identity theft: priceless • Adapt classification methods: CANTINA, B-APT *Y. Zhang, J. I. Hong, and L. F. Cranor. CANTINA: a content-based approach to detecting phishing web sites. In WWW '07: Proceedings of the 16th international conference on World Wide Web, pages 639–648, New York, NY, USA, 2007. ACM Press.

  17. CANTINA* • Technique • TF-IDF + Robust Hyperlinks • Domain name • Heuristics *Y. Zhang, J. I. Hong, and L. F. Cranor. CANTINA: a content-based approach to detecting phishing web sites. In WWW '07: Proceedings of the 16th international conference on World Wide Web, pages 639–648, New York, NY, USA, 2007. ACM Press.

  18. TF-IDF • Text classification technique • Information retrieval • Term Frequency-Inverse Document Frequency • Importance of a word in a document in a given corpus • Document = website • Corpus = English language
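
For reference, one common formulation of the score is shown below. The slides do not say which variant CANTINA uses, so the exact form (raw counts, natural log, no smoothing) is an assumption.

```latex
\mathrm{tfidf}(t, d) \;=\; \mathrm{tf}(t, d)\,\cdot\,\log\frac{N}{\mathrm{df}(t)}
```

Here $\mathrm{tf}(t, d)$ is the number of times term $t$ occurs in document $d$, $N$ is the number of documents in the corpus, and $\mathrm{df}(t)$ is the number of documents that contain $t$. A word scores highly when it is frequent on the page but rare in the corpus.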

  19. Robust Hyperlinks • Phelps and Wilensky • TF-IDF on all words on page • Lexical signature • 5 words with highest TF-IDF scores • Almost uniquely identifies 1,000,000,000 pages…
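
A lexical signature can be sketched as below, assuming a precomputed table of document frequencies for the corpus. The tokenizer, the smoothed IDF, and the sample frequencies are illustrative assumptions, not details from Phelps and Wilensky or from CANTINA.

```python
import math
import re
from collections import Counter


def lexical_signature(page_text, doc_freq, corpus_size, size=5):
    """Return the `size` words on the page with the highest TF-IDF scores."""
    words = re.findall(r"[a-z]+", page_text.lower())
    tf = Counter(words)

    def score(word):
        # Smoothed IDF so that words missing from the table do not divide by zero.
        df = doc_freq.get(word, 0)
        return tf[word] * math.log((corpus_size + 1) / (df + 1))

    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(tf, key=lambda w: (-score(w), w))
    return ranked[:size]


# Hypothetical document frequencies for a 1000-document corpus.
doc_freq = {"in": 900, "to": 950, "your": 800, "sign": 300,
            "account": 200, "user": 250, "ebay": 5}
signature = lexical_signature(
    "eBay sign in to your eBay account eBay user", doc_freq, corpus_size=1000
)
```

With these made-up frequencies, the rare brand name ranks first and common words such as "to" and "in" fall out of the signature.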

  20. TF-IDF + Hyperlinks in CANTINA • Calculate lexical signature • Google search on signature • If domain name is within top 30 hits, site is legitimate • Otherwise, it is phishing • Results: • 94% true positives : 30% false positives
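
The decision rule on this slide can be sketched as follows. The `search` argument is a hypothetical callable standing in for the Google query (no real search API is assumed), and matching on the exact host name is a simplification.

```python
from urllib.parse import urlsplit


def classify_by_search(page_url, signature, search, top_n=30):
    """Label a page 'legitimate' if its host appears in the top search hits.

    `search` is a hypothetical function: it takes a query string and
    returns a ranked list of result URLs.
    """
    domain = (urlsplit(page_url).hostname or "").lower()
    hits = search(" ".join(signature))[:top_n]
    hit_domains = {(urlsplit(hit).hostname or "").lower() for hit in hits}
    return "legitimate" if domain in hit_domains else "phishing"


def fake_search(query):
    """Stand-in for a search engine, used only to exercise the rule."""
    return ["https://www.ebay.com/", "https://en.wikipedia.org/wiki/EBay"]


real = classify_by_search("https://www.ebay.com/signin", ["ebay", "account"], fake_search)
fake = classify_by_search("http://ebay-secure.example.net/signin", ["ebay", "account"], fake_search)
```

The idea is that a phishing copy of a page shares the original's lexical signature, but the search engine ranks the original site, not the copy.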

  21. Improving on TF-IDF • Add domain name to Google search • 97% t.p. : 30% f.p. • TF-IDF + Zero-results-Means-Phishing + domain name • 97% t.p. : 10% f.p. • 67% t.p. • 10% f.p.

  22. Adding heuristics to CANTINA • Heuristics from SpoofGuard and other sources • Trade-off • Reduces true positive accuracy • 97% → 89% t.p. • Reduces false positive rate • 10% → 1% f.p.

  23. Drawbacks to CANTINA • Relies on outside sources for information • Google • Requires heuristics to reduce false positives • Reduces accuracy… • Language-specific • Different corpus for each foreign language • Difficulties with East Asian languages • Unacceptable false positive rate • Misclassifications undermine user confidence in tool

  24. B-APT: Bayesian Anti-Phishing toolbar • Firefox browser toolbar • will extend to other browsers • goals: detect, communicate, and educate • Bayesian filtering + whitelist • similar to spam filtering • different from spam filtering • phishing sites mirror legitimate sites • hard to find training set (inbox vs. blacklist database) • comprehensive whitelist • Innovative UI • no known effective security indicators for warning user of phishing sites (Dhamija, 2006; Wu, 2007)

  25. Bayesian classification • Bayes' law on conditional probability • Pros • easy to compute • training and tailoring • Cons • assumes independence among words • Bayesian poisoning
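
A minimal naive Bayes sketch of the idea on this slide, assuming word tokens as features and add-one (Laplace) smoothing. This is a generic illustration of Bayesian text classification, not B-APT's implementation; the class name and training tokens are hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Two-class naive Bayes over tokens; assumes both classes get trained."""

    def __init__(self):
        self.token_counts = {"phish": Counter(), "legit": Counter()}
        self.doc_counts = Counter()

    def train(self, tokens, label):
        self.token_counts[label].update(tokens)
        self.doc_counts[label] += 1

    def phishing_probability(self, tokens):
        vocab = set(self.token_counts["phish"]) | set(self.token_counts["legit"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.token_counts.items():
            total = sum(counts.values())
            # Prior P(label), then one independent likelihood term per token.
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total + len(vocab)))
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long pages.
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        return exp["phish"] / sum(exp.values())


nb = NaiveBayes()
nb.train(["verify", "account", "password", "urgent"], "phish")
nb.train(["news", "sports", "weather", "account"], "legit")
p_phish = nb.phishing_probability(["verify", "password"])
p_legit = nb.phishing_probability(["news", "weather"])
```

The per-token product is where the independence assumption listed under "Cons" enters. Bayesian poisoning exploits the same product by padding a page with innocuous tokens to pull the score down.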

  26. Implementation details • Training on phishing pages and legitimate pages • Phishtrack: HTML of phishing pages* • 1200+ phishing sites = 160+ unique sites • Alexa top 500: most popular websites** • same KBs of phishing sites (17k vs 64k tokens) *http://www.dslreports.com/phishtrack **http://www.alexa.com/

  27. B-APT detecting phishing sites • Anti-phishing tools tested on 60 phishing sites

  28. B-APT detecting legitimate sites • Anti-phishing tools tested on 60 legitimate sites

  29. Summary • Classification + heuristics do well • B-APT has no false negatives, some false positives • working on communicating false positives • detect, communicate, and educate • Use of any toolbar is better than none • the lowest rate was 42%, for IE7 • blacklist-based ones get better as time passes (Zhang, 2007) • Beware of malware • Badware.org with Google
