PhishDef: URL Names Say It All • Michalis Faloutsos, University of California, Riverside, USA • Anh Le, Athina Markopoulou, University of California, Irvine, USA
What is Phishing? • Social engineering and technical means to steal consumers’ personal identity, data, etc. • Causes billions of dollars in losses annually Anh Le - UC Irvine - PhishDef
Antiphishing.org
Example of a Phishing Site
Current Protection • Google Safe Browsing • Microsoft SmartScreen • Third-Party
Current Protection Model • Google Safe Browsing • Motivation: • Blacklist-based protection is reactive: it cannot protect against zero-day phishing
Outline • Phishing Background • Motivation • Our Proposal • New Protection Model • Learning Algorithms • Dataset • Feature Selection • Evaluation Results • Concluding Remarks
Our Proposed Protection Model • Main challenges: Accuracy and Classification Latency • Which classification algorithm works best? • Which set of features works best?
Prior Work • Whittaker et al. [NDSS ’10]: Google Safe Browsing • Ma et al. [SIGKDD ’09]: Batch-based Classification • Ma et al. [ICML ’09]: Batch-based vs. Online Learning • Server-Side Classification
Main Contributions • New Protection Model: Client-side classification • Propose using Adaptive Regularization of Weights (AROW) • High accuracy • Resilient to noise • Set of Lexical Features • Fast to extract at client side • Obfuscation resistant
Machine Learning Algorithms • Batch-based Support Vector Machine • Online Perceptron • Confidence-Weighted (CW) [Dredze et al., ICML 2008] • Adaptive Regularization of Weights (AROW) [Crammer et al., NIPS 2009]
Online Classification • Maintain a weight vector and use it for classification • Online Perceptron • [Figure labels: Client Side: Trained Beforehand, Extract In Real Time; Server Side]
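The online perceptron named on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sparse-dictionary feature encoding, the feature names, and the convention that +1 means phishing and -1 means benign are all assumptions made for the example.

```python
# Minimal online perceptron sketch over sparse feature dicts.
# Assumptions (not from the slides): +1 = phishing, -1 = benign,
# and the feature names below are hypothetical.

def predict(w, x):
    """Classify x by the sign of the dot product w . x."""
    score = sum(w.get(f, 0.0) * v for f, v in x.items())
    return 1 if score >= 0 else -1

def perceptron_update(w, x, y):
    """Change w in place only when the prediction is wrong.

    Returns True if a mistake was made (and w was updated).
    """
    if predict(w, x) == y:
        return False
    for f, v in x.items():
        w[f] = w.get(f, 0.0) + y * v
    return True

# A tiny labeled stream, processed one example at a time.
stream = [
    ({"tok=taxrefund": 1.0, "bias": 1.0}, 1),
    ({"tok=intro": 1.0, "bias": 1.0}, -1),
    ({"tok=taxrefund": 1.0, "bias": 1.0}, 1),
]
w = {}
mistakes = sum(perceptron_update(w, x, y) for x, y in stream)
```

Classification needs only one dot product with the current weight vector, which is why it is cheap to run per URL; the weight vector itself changes only on mistakes.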
Online Classification • Confidence-Weighted (CW): minimum change, enough to correct last mistake • Adaptive Regularization of Weights (AROW): minimum change, increasing confidence, penalty for mistake
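A rough sketch of a single AROW update step, following the update rule in Crammer et al. (NIPS 2009), may help make the three terms concrete. Two simplifications here are assumptions and are not stated on the slide: the covariance is kept diagonal (one variance per feature), and the regularization parameter `r` defaults to 1.0.

```python
# Sketch of one AROW step with a diagonal covariance approximation.
# Assumptions (not from the slides): dense list features, diagonal
# sigma, r = 1.0, labels y in {+1, -1}.

def arow_update(w, sigma, x, y, r=1.0):
    """Update mean weights w and per-feature variances sigma in place.

    Returns True if the example violated the margin and caused an update.
    """
    margin = sum(wi * xi for wi, xi in zip(w, x))
    if y * margin >= 1.0:
        return False  # confident and correct: no change needed
    # Uncertainty of the prediction along x under the current variances.
    v = sum(si * xi * xi for si, xi in zip(sigma, x))
    beta = 1.0 / (v + r)
    alpha = (1.0 - y * margin) * beta  # scaled hinge loss
    for i, xi in enumerate(x):
        w[i] += alpha * y * sigma[i] * xi        # move the mean toward a correct margin
        sigma[i] -= beta * (sigma[i] * xi) ** 2  # shrink variance of features just seen
    return True

w, sigma = [0.0, 0.0], [1.0, 1.0]
updated = arow_update(w, sigma, [1.0, 0.0], +1)
```

Because `r` keeps the step size bounded, a single mislabeled example cannot force the weights to fully fit it, which is the usual explanation for AROW's tolerance of label noise.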
Dataset • Phishing URLs • PhishTank (4,082) • MalwarePatrol (2,001) • Benign URLs • Open directory (4,012) • Yahoo directory (4,143) • Time period: June 2010
Feature Selection • Lexical Features • External Features • Country, AS number, registration date, registrant, registrar, etc.
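To illustrate what lexical features can look like, the sketch below derives a few from the URL string alone, with no network lookups. The exact feature set used in PhishDef is not given in this transcript, so the specific features here (lengths, dot count, and bag-of-tokens for hostname and path) and the function name `lexical_features` are assumptions for illustration only.

```python
# Illustrative lexical feature extraction from a URL string.
# Assumption: the feature set below is hypothetical, not the deck's own.
import re
from urllib.parse import urlparse

def lexical_features(url):
    """Return a sparse feature dict computed from the URL text only."""
    parts = urlparse(url)
    host = parts.hostname or ""
    feats = {
        "len_url": float(len(url)),
        "len_host": float(len(host)),
        "num_dots": float(url.count(".")),
    }
    # Bag-of-tokens: split hostname and path on non-alphanumeric characters.
    for tok in re.split(r"[^A-Za-z0-9]+", host):
        if tok:
            feats["host=" + tok.lower()] = 1.0
    for tok in re.split(r"[^A-Za-z0-9]+", parts.path):
        if tok:
            feats["path=" + tok.lower()] = 1.0
    return feats

feats = lexical_features(
    "http://pilety.ru/c548c205d7660ed0628b467d7d5aa54c9c3a7124/image/taxrefund.htm"
)
```

External features such as registration date or AS number would instead require queries to remote services, which is where the extra latency comes from.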
Evaluation Results: Lexical vs. Full Features • Full features compared with lexical features alone: • (+) ~1% higher accuracy • (-) Dependency on Remote Server • (-) Avg. Latency: 1.64 s • Lexical features alone are better suited than full features for client-side phishing classification
Evaluation Results: CW vs. AROW • AROW is more resilient to noise than CW
Conclusion: PhishDef • Client-side phishing classification system • Proactive, on-the-fly classification of zero-day phishing URLs • Low client-side delay (ms), high accuracy (97%) • Resilient to noisy data • Future Work: • Develop an add-on for Firefox
Questions
Example of a Phishing Site • Phishing: http://pilety.ru/c548c205d7660ed0628b467d7d5aa54c9c3a7124/image/taxrefund.htm • Legitimate: http://www.hmrc.gov.uk/intro-income-tax.htm
Evaluation Results: Batch-Based vs. Online Learning • Online learning outperforms batch-based learning for phishing classification
Chrome 11 > Firefox 4