20th Annual Network & Distributed System Security Symposium (NDSS 2013). CAMP: Content-Agnostic Malware Protection. Niels Provos, Moheeb Abu Rajab, Lucas Ballard, Noe Lutz and Panayiotis Mavrommatis, Google Inc. 左昌國 2013/04/01 Seminar @ ADLab, NCU-CSIE.
X-agnostic • Without the knowledge of X • Content-agnostic malware protection • The protection operates without the knowledge of the malware content
Outline • Introduction • Related Work • System Architecture • Reputation System • Evaluation • Conclusion
Introduction • Malware distribution through web browsers • Drive-by downloads • Not covered in this paper • Social engineering • Fake anti-virus • The defense? • Blacklists / whitelists • Signature-based solutions • CAMP • Reputation system • Low false positive rate
Related Work • Content-based Detection • Anti-virus software • CloudAV • Blacklist-based Protection • Google Safe Browsing API • McAfee Site Advisor • Symantec Safe Web • Whitelist-based Schemes • Bit9 • CoreTrace • Reputation-based Detection • SNARE • Notos and EXPOSURE • Microsoft SmartScreen
System Architecture • [Figure: architecture diagram with Client and Server components]
System Architecture – Binary Analysis • Produces labels (benign or malicious) for training purposes • Classifies binaries based on static and dynamic analysis • The labels are also used to decide thresholds • Goal: low false positive rate
System Architecture – Client • Performs local checks before asking the server for a decision • In blacklists? Google Safe Browsing API • Potentially harmful? e.g. DMG files on Mac OS X • In whitelists? Trusted domains and trusted signing certificates • If the local checks yield no decision • Extract features from the downloaded binary • Final download URL / IP address • Referrer URL / (corresponding) IP address • Size / hash • Signature • Send the features to the server
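The client-side flow on this slide can be sketched roughly as follows. This is a minimal illustration, not Chrome's actual implementation: the list contents, the `Download` type, the extension-based "potentially harmful" check and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import os
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Hypothetical local lists; the real client relies on Safe Browsing data
# and built-in whitelists whose format the slide does not describe.
BLACKLISTED_URLS = {"http://bad.example/setup.exe"}
TRUSTED_DOMAINS = {"trusted.example"}
POTENTIALLY_HARMFUL_EXTENSIONS = {".exe", ".dmg"}  # assumed file types


@dataclass
class Download:
    final_url: str
    final_ip: str
    referrer_url: str
    referrer_ip: str
    content: bytes
    signer: Optional[str] = None  # signing certificate subject, if any


def local_verdict(d: Download) -> Optional[str]:
    """Return a verdict if a local check decides, otherwise None."""
    parsed = urlparse(d.final_url)
    if d.final_url in BLACKLISTED_URLS:
        return "malicious"
    extension = os.path.splitext(parsed.path)[1].lower()
    if extension not in POTENTIALLY_HARMFUL_EXTENSIONS:
        return "benign"  # treated as benign by policy
    if (parsed.hostname or "") in TRUSTED_DOMAINS:
        return "benign"  # matches the client-side whitelist
    return None  # no local decision: ask the server


def extract_features(d: Download) -> dict:
    """Collect the features listed on the slide for the server request."""
    return {
        "final_url": d.final_url,
        "final_ip": d.final_ip,
        "referrer_url": d.referrer_url,
        "referrer_ip": d.referrer_ip,
        "size": len(d.content),
        "hash": hashlib.sha256(d.content).hexdigest(),  # hash choice assumed
        "signature": d.signer,
    }
```

Only when `local_verdict` returns `None` would the client call `extract_features` and send the result to the server.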
System Architecture – Client • The returned decision
System Architecture – Client • ~70% of all downloads are considered benign due to policy or because they match client-side whitelists • On the server side, binaries hosted on the trusted domains or signed by trusted signers are analyzed regularly
System Architecture – Server • The server receives the client request and renders a reputation verdict • The server uses the information to update its reputation data • BigTable and MapReduce
System Architecture – Frontend and Data Storage • Frontend • RPC to the reputation system • URL as index? • Problematic for popular URLs • Instead: timestamp (of the request to the URL) as a reverse-ordered hexadecimal string
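A reverse-ordered hexadecimal timestamp key might look like the sketch below. The slide does not give the exact encoding, so the microsecond unit, the 16-digit zero padding and the function name `row_key` are assumptions. The idea is that reversing the string puts the fastest-changing digits first, so requests arriving close together in time spread across the sorted key space instead of clustering.

```python
def row_key(timestamp_us: int) -> str:
    """Build a storage key from a request timestamp (assumed microseconds).

    The timestamp is written as a fixed-width hexadecimal string and then
    reversed, so the least significant digit comes first.
    """
    hex_string = format(timestamp_us, "016x")  # 16 hex digits, zero padded
    return hex_string[::-1]


# Two consecutive timestamps end up with different leading characters.
first = row_key(0x1234)
second = row_key(0x1235)
```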
System Architecture – Spam Filtering • Velocity controls on the user IP address • The output of the spam filter is used to fetch binaries from the web that have not yet been analyzed by the binary classifier • Filter: only binaries that exhibit sufficient diversity of context are considered • The analysis may complete a long time after a reputation decision was made
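The slide does not say how the velocity controls are implemented. One common way to limit request velocity per IP address is a sliding-window counter, sketched below; the class name `VelocityFilter` and its parameters are hypothetical.

```python
from collections import defaultdict, deque


class VelocityFilter:
    """Allow at most max_requests per IP within a sliding time window."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip: str, now: float) -> bool:
        timestamps = self._seen[ip]
        # Drop accepted requests that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False  # too many recent requests from this IP
        timestamps.append(now)
        return True
```

Requests rejected by such a filter would simply not contribute to the reputation data.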
System Architecture – Aggregator • Aggregates requests to form the reputation data • 3-dimensional index • Source (where the observation comes from): client / analysis • Feature • Category: reputation / urls / hash • Examples • client | site:foo.com | reputation → (6, 10) • analysis | ip:1.2.3.4/24 | urls → (0, 3) • Value: (a, b) • a: the number of interesting observations • b: the total number of observations
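The aggregate structure can be illustrated with an in-memory table keyed by the three index dimensions. This is only a sketch: the real system runs on BigTable and MapReduce, and the class and method names here are hypothetical.

```python
from collections import defaultdict
from typing import Tuple


class Aggregator:
    """Keep (interesting, total) counts per (source, feature, category)."""

    def __init__(self):
        self._table = defaultdict(lambda: [0, 0])

    def observe(self, source: str, feature: str, category: str,
                interesting: bool) -> None:
        cell = self._table[(source, feature, category)]
        if interesting:
            cell[0] += 1  # a: number of interesting observations
        cell[1] += 1      # b: total number of observations

    def get(self, source: str, feature: str, category: str) -> Tuple[int, int]:
        a, b = self._table.get((source, feature, category), (0, 0))
        return (a, b)


# Reproduce the first example from the slide: 6 interesting out of 10.
agg = Aggregator()
for i in range(10):
    agg.observe("client", "site:foo.com", "reputation", interesting=(i < 6))
```

Here `agg.get("client", "site:foo.com", "reputation")` yields the pair `(6, 10)` shown on the slide.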
Reputation System • Feature Extraction • IP address: single address or netblock • URL: direct download URL or host/domain/site • Signature / hash
Reputation System – Decision • Threshold • Thresholds are chosen according to the precision and recall for each AND gate • Precision and recall are determined from a labeled training set • Training set: matching (hash from requests) with (hash from binary analysis) • Binary analysis provides the label (benign or malicious) • Request provides the features • 4000 benign requests / 1000 malicious requests • Precision and recall • http://en.wikipedia.org/wiki/Precision_and_recall
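Choosing a threshold from precision and recall on a labeled training set can be sketched as follows. The scoring (for example a ratio a/b from an aggregate) and the selection rule (lowest threshold that reaches a target precision) are assumptions for illustration; the slide does not state the exact procedure.

```python
from typing import List, Optional, Tuple


def precision_recall(scores: List[float], labels: List[bool],
                     threshold: float) -> Tuple[float, float]:
    """Treat score >= threshold as a 'malicious' prediction."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(scores: List[float], labels: List[bool],
                   min_precision: float) -> Optional[float]:
    """Return the lowest threshold whose precision meets the target.

    Recall never increases as the threshold rises, so the lowest
    qualifying threshold also gives the highest recall among them.
    """
    for t in sorted(set(scores)):
        precision, _ = precision_recall(scores, labels, t)
        if precision >= min_precision:
            return t
    return None


# Toy training set: label True means the binary was labeled malicious.
scores = [0.1, 0.2, 0.6, 0.8, 0.9]
labels = [False, False, True, False, True]
threshold = pick_threshold(scores, labels, min_precision=0.6)
```

A stricter precision target pushes the threshold up and trades away recall, which matches the slide's emphasis on a low false positive rate.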
Evaluation • Google Chrome • Targeting Windows executables • Accuracy of Binary Analysis • Compared against VirusTotal • 2200 samples selected • 1100 were labeled clean by the binary analysis component • 1100 were labeled malicious • Submitted to VirusTotal, then waited 10 days • 99% of the binaries labeled malicious were flagged by 20%+ of AV engines on VirusTotal • 12% of the binaries labeled clean were flagged by 20%+ of AV engines on VirusTotal
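The "flagged by 20%+ of AV engines" criterion amounts to a simple share computation per sample. The sketch below uses made-up detection counts and hypothetical function names; it does not call the VirusTotal API.

```python
from typing import List, Tuple


def flagged_by_share(detections: int, total_engines: int,
                     share: float = 0.20) -> bool:
    """True if at least `share` of the engines flagged the sample."""
    return total_engines > 0 and detections / total_engines >= share


def agreement_rate(samples: List[Tuple[int, int]], share: float = 0.20) -> float:
    """Fraction of samples flagged by at least `share` of engines.

    Each sample is a (detections, total_engines) pair.
    """
    if not samples:
        return 0.0
    flagged = sum(1 for d, n in samples if flagged_by_share(d, n, share))
    return flagged / len(samples)


# Made-up counts for three samples scanned by 40 engines each.
example = [(10, 40), (5, 40), (9, 40)]
rate = agreement_rate(example)
```

Applied separately to the 1100 samples labeled malicious and the 1100 labeled clean, this kind of rate corresponds to the 99% and 12% figures reported on the slide.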
Evaluation – Accuracy of CAMP • Feb. 2012 to July 2012 • 200 million users in total • 8 to 10 million requests per day • 200 to 300 thousand of them labeled as malicious • 3.2 billion aggregates in total • Overall accuracy
Evaluation – Comparison to other systems • A random sample of 10,000 binaries labeled as benign • 8,400 binaries labeled as malicious
Conclusion • This paper presents a content-agnostic malware protection system, CAMP • The paper performs a large-scale evaluation and shows that the detection approach is both accurate and fast (processing requests in less than 130 ms)