
Security Analytics Thrust

Anthony D. Joseph (UCB), Rachel Greenstadt (Drexel), Ling Huang (Intel), Dawn Song (UCB), Doug Tygar (UCB). Outline: our view of Security Analytics; adversaries, humans, and machine learning; joint research with McAfee; our proposed malware analysis pipeline.


Presentation Transcript


  1. Security Analytics Thrust
     Anthony D. Joseph (UCB), Rachel Greenstadt (Drexel), Ling Huang (Intel), Dawn Song (UCB), Doug Tygar (UCB)

  2. Outline
     • Our view of Security Analytics
     • Adversaries, Humans, and Machine Learning
     • Joint research with McAfee
     • Our proposed malware analysis pipeline
     • Today’s Security Analytics talks

  3. Our View of Security Analytics
     • Using robust ML for adversary-resistant security metrics and analytics
        • Pattern mining and prediction at scale on big data
        • Detecting malware, spam, and malicious sites/URLs
        • Identifying authors of user-generated content (UGC) and malware
        • Also, Sybil detection in crowds and obfuscating authors of UGC
        • Detecting human biosignals: EEG, vision tracking, SAFE continuous authentication
     • Helping the humans in the loop (situational awareness)
        • End users of systems
        • Crowds and human reviewers
        • Domain experts

  4. Adversarial Exploitation of ML
     • Traditional approach: Evading Adversary
        • Attacker determines the decision boundary
        • Crafts (positive-instance) content that is classified as negative
     • Newer approach: Influencing Adversary
        • Patient attacker operates during the periodic retraining stage by injecting “tricky” positive instances
        • Shifts the decision boundary over time during retraining so that (positive-instance) content is eventually classified as negative
     • Need novel adaptive, robust ML techniques to defend against Influencing Adversaries
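To make the Influencing Adversary concrete, here is a toy sketch that is not taken from the talk. It uses a hypothetical centroid-based anomaly detector (`CentroidDetector`). The detector flags a point as malicious when the point lies more than a fixed radius from the mean of its training window, and it retrains on every point it accepts. The attacker never submits anything that gets flagged. Instead, it injects points just inside the boundary on the side facing its target. Each retraining round therefore drags the centroid closer, until the target itself is classified as benign. The radius, window size and injection rate are arbitrary assumptions.

```python
from collections import deque
from statistics import fmean


class CentroidDetector:
    """Toy 1-D anomaly detector (hypothetical): x is flagged as malicious
    if |x - mean(window)| > radius.

    Retraining appends every point classified as benign to a fixed-size
    sliding window, so the centroid follows recent "normal" data.
    """

    def __init__(self, initial, window=50, radius=3.0):
        self.window = deque(initial, maxlen=window)
        self.radius = radius

    @property
    def center(self):
        return fmean(self.window)

    def is_malicious(self, x):
        return abs(x - self.center) > self.radius

    def retrain(self, batch):
        """Add the benign-looking points in `batch` to the window; return how many were rejected."""
        accepted = [x for x in batch if not self.is_malicious(x)]
        self.window.extend(accepted)  # oldest points fall out of the window
        return len(batch) - len(accepted)


def influence_attack(detector, target, per_round=10, max_rounds=100):
    """Shift the boundary toward `target` by injecting points at 90% of the radius.

    Returns (rounds_used, points_rejected). Every injected point lies inside
    the current boundary, so the detector accepts all of them.
    """
    rounds = rejected = 0
    while detector.is_malicious(target) and rounds < max_rounds:
        step = 0.9 * detector.radius
        center = detector.center
        craft = center + step if target > center else center - step
        rejected += detector.retrain([craft] * per_round)
        rounds += 1
    return rounds, rejected


if __name__ == "__main__":
    benign = [0.5 * (-1) ** i for i in range(50)]  # 50 benign points, mean 0.0
    detector = CentroidDetector(benign)
    target = 10.0
    print("target flagged before attack:", detector.is_malicious(target))  # True
    rounds, rejected = influence_attack(detector, target)
    print(f"{rounds} rounds, {rejected} injected points rejected")
    print("target flagged after attack:", detector.is_malicious(target))
```

With these toy parameters the detector never rejects an injected point, yet the target eventually slips under the boundary. This illustrates why a defense against Influencing Adversaries has to reason about what enters the retraining set, not only about individual classifications.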

  5. Synergy between Humans and ML
     • Users: providing clear answers and usable security
        • Is this content spam or malicious?
        • What is the reasoning behind a security decision?
        • Can my UGC be identified as being mine?
        • Also, understanding how users reason about security
     • Crowds: augmenting ML with human capabilities
        • Leveraging humans to disambiguate borderline instances (e.g., is this application or website malicious or benign?)
     • Domain Experts: prioritizing a limited resource
        • Identifying when to rely on experts to evaluate model changes
        • Helping with authorship identification for malware
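Uncertainty sampling is one simple way to decide which instances go to crowds or domain experts. The items whose model score is closest to the decision threshold are sent to human review, up to a fixed review budget, and the rest are labeled automatically. The sketch below assumes the classifier emits a probability of maliciousness in [0, 1]. The threshold, the budget and the name `route_for_review` are illustrative assumptions, not part of the talk.

```python
from typing import Dict, List, Tuple


def route_for_review(
    scores: Dict[str, float], budget: int, threshold: float = 0.5
) -> Tuple[List[str], Dict[str, str]]:
    """Split items into a human-review queue and automatic labels.

    scores: item id -> model probability that the item is malicious.
    budget: maximum number of items humans can review.
    The items whose score is nearest the threshold (most uncertain) are reviewed.
    """
    by_uncertainty = sorted(scores, key=lambda item: abs(scores[item] - threshold))
    review = by_uncertainty[:budget]
    auto = {
        item: ("malicious" if scores[item] >= threshold else "benign")
        for item in by_uncertainty[budget:]
    }
    return review, auto


if __name__ == "__main__":
    scores = {"app_a": 0.97, "site_b": 0.52, "app_c": 0.08, "site_d": 0.41}
    review, auto = route_for_review(scores, budget=2)
    print(review)  # ['site_b', 'site_d']
    print(auto)    # {'app_c': 'benign', 'app_a': 'malicious'}
```

A fixed budget reflects the slide's point that expert time is a limited resource. A real deployment might also weigh the cost of an error, not just closeness to the threshold.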

  6. Collaboration with McAfee
     • Special academic-industry collaboration
        • Unique opportunity for academic access to massive-scale, real-world adversarial data
        • Pathway for research to yield real-world impact
     • Two Robust ML research efforts
        • Current: Active protection
        • Future: Malicious URL/site detection (SiteAdvisor)
     • Update:
        • Signed university-level NDAs with UC Berkeley and Drexel
        • Held meetings at Intel and UC Berkeley
        • Delivered a prototype ML-based malware classification system that supports large-scale classification of polymorphic threats
        • Ongoing: Refining the research focus and exploring the Artemis sample dataset

  7. Artemis and GTI
     • Artemis and GTI collect voluminous “suspicious events and metadata” from millions of end hosts
     • McAfee needs to:
        • Classify events with clean/dirty labels
        • Cluster events into groups
        • Rank groups by their level of suspiciousness
        • Help identify malware families (authorship classification)
     • Our planned efforts
        • Build a large-scale, online, adaptive ML system for automated malware classification with humans in the loop
        • Apply stylometry for forensic analysis and malware classification
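Stylometry attributes a piece of text, or by extension code, to an author from low-level stylistic features. A common baseline builds a character n-gram frequency profile for each known author. It then assigns an unknown sample to the author whose profile is most similar. The sketch below assumes character trigrams and cosine similarity. The function names and the toy writing samples are hypothetical. Attributing malware would require features extracted from code or binaries rather than English prose.

```python
import math
from collections import Counter
from typing import Dict


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count overlapping, lower-cased character n-grams."""
    text = text.lower()
    return Counter(text[i : i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(sample: str, known: Dict[str, str], n: int = 3) -> str:
    """Return the author in `known` whose n-gram profile is closest to `sample`."""
    target = ngram_profile(sample, n)
    return max(known, key=lambda author: cosine(target, ngram_profile(known[author], n)))


if __name__ == "__main__":
    # Toy writing samples for two hypothetical authors
    known = {
        "alice": "lol gonna pwn that box l8r, gonna need moar shells lol",
        "bob": "Greetings. I shall provide the requested exploit upon receipt of payment.",
    }
    print(attribute("lol gonna grab moar creds l8r", known))  # alice
```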

  8. Proposed Malware Analysis Pipeline
     [Diagram: Data from McAfee’s GTI and Google’s VirusTotal (program code, mobile apps, executables) → Program Analysis (static, dynamic, and human analysis) → Program Features → Feature Encoding → Machine Learning → Malware Classification Models → Further analysis, with Feedback from Human: Domain Experts]
     • Categorization and prioritization are critical!
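The "Feature Encoding" stage has to turn variable-length program features (for example API calls, opcodes or app permissions) into fixed-length vectors that a learner can consume. The slides do not specify an encoding. One standard option is the hashing trick: hash each feature name into one of D buckets and add a signed count. The sketch below assumes string features and D = 16, and the feature names are hypothetical. It uses `hashlib.blake2b` as a stable hash so that vectors are reproducible across runs, because Python's built-in `hash` is salted per process.

```python
import hashlib
from typing import Iterable, List, Tuple


def bucket_and_sign(feature: str, dim: int) -> Tuple[int, int]:
    """Map a feature name to (bucket index, +1/-1) using a stable 64-bit hash."""
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
    value = int.from_bytes(digest, "little")
    bucket = value % dim
    sign = 1 if ((value >> 63) & 1) == 0 else -1  # top bit chooses the sign
    return bucket, sign


def hash_features(features: Iterable[str], dim: int = 16) -> List[int]:
    """Encode a bag of program features as a fixed-length signed count vector."""
    vector = [0] * dim
    for feature in features:
        bucket, sign = bucket_and_sign(feature, dim)
        vector[bucket] += sign
    return vector


if __name__ == "__main__":
    # Hypothetical features from static analysis of one sample
    features = [
        "api:CreateRemoteThread",
        "api:WriteProcessMemory",
        "api:WriteProcessMemory",
        "perm:SEND_SMS",
    ]
    print(hash_features(features))
```

The signed update keeps collisions from biasing every bucket upward. The fixed dimension lets new, never-before-seen features (common with polymorphic samples) be encoded without rebuilding a vocabulary.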

  9. Security Analytics Talks (Session 1)
     • Big data for security analytics
        • Using adaptive, large-scale ML to identify and classify malware families using code features
     • Learning as an “attack”: De-anonymization
        • Automated analysis of encrypted traffic: identifying the URLs/topics of SSL-encrypted web pages
     • Learning for web-based malware detection
        • Not code features, but rather: where scripts and objects come from, who makes the requests, and how the user gets to the site
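Pages behind SSL can be identified because encryption hides content but does not, by default, hide the approximate sizes of the objects a page loads. A minimal sketch of size-based page fingerprinting works as follows:

1. Assume each observed page load has already been reduced to a list of object sizes in bytes.
2. Round those sizes into coarse buckets.
3. Pick the labeled page whose bucket set has the highest Jaccard similarity to the observation.

The bucket width, function names, URLs and sizes are hypothetical, and the technique used in the talk may differ.

```python
from typing import Dict, FrozenSet, List


def size_signature(sizes: List[int], bucket: int = 100) -> FrozenSet[int]:
    """Reduce observed object sizes (bytes) to a set of coarse size buckets."""
    return frozenset(size // bucket for size in sizes)


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def identify_page(observed: List[int], known: Dict[str, List[int]], bucket: int = 100) -> str:
    """Return the labeled page whose size signature best matches the observation."""
    signature = size_signature(observed, bucket)
    return max(known, key=lambda url: jaccard(signature, size_signature(known[url], bucket)))


if __name__ == "__main__":
    # Hypothetical training traces: object sizes seen when loading each page over SSL
    known = {
        "https://example.org/news": [14230, 5120, 880, 30411],
        "https://example.org/login": [2048, 512, 7400],
    }
    print(identify_page([14250, 5170, 30480, 900], known))  # https://example.org/news
```

In the example, 880 and 900 bytes fall into different buckets, so the match is partial (3 of 5 buckets shared). Bucket width trades robustness to small size changes against the ability to distinguish pages.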

  10. Security Analytics Talks (Session 2)
     • Using network science to detect Sybils in social networks
        • Leveraging social structure to detect fake accounts and improve user authentication
     • Learning as an “attack”: De-anonymization
        • Automated analysis and identification of underground forum users
     • Understanding how end users reason about risk
        • Security, privacy, and a 9-dimensional model for users
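Sybil defenses based on social structure (SybilRank is a well-known example) rest on one observation: fake accounts have few edges to honest users. Trust seeded at known-honest nodes and spread over a short random walk therefore stays mostly in the honest region. The sketch below runs a few rounds of trust propagation on a toy undirected graph, where each node splits its trust evenly among its neighbors. It then ranks nodes by trust divided by degree. The graph, the seed choice and the iteration count are assumptions. SybilRank itself stops after on the order of log n rounds, and this is not necessarily the method presented in the talk.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def propagate_trust(graph: Graph, seeds: Set[str], iterations: int) -> Dict[str, float]:
    """Early-terminated power iteration: each round, every node splits its
    trust evenly among its neighbors. Total trust starts at 1.0 on the seeds."""
    trust = {node: (1.0 / len(seeds) if node in seeds else 0.0) for node in graph}
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, neighbors in graph.items():
            share = trust[node] / len(neighbors)  # assumes no isolated nodes
            for neighbor in neighbors:
                nxt[neighbor] += share
        trust = nxt
    return trust


def rank_suspicious(graph: Graph, seeds: Set[str], iterations: int) -> List[str]:
    """Order nodes by degree-normalized trust, most suspicious (lowest) first."""
    trust = propagate_trust(graph, seeds, iterations)
    return sorted(graph, key=lambda node: trust[node] / len(graph[node]))


if __name__ == "__main__":
    # Toy graph: honest region h1..h4, Sybil region s1..s3, one attack edge h4-s1
    edges = [("h1", "h2"), ("h1", "h3"), ("h1", "h4"), ("h2", "h3"), ("h2", "h4"),
             ("h3", "h4"), ("h4", "s1"), ("s1", "s2"), ("s1", "s3"), ("s2", "s3")]
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    print(rank_suspicious(graph, seeds={"h1"}, iterations=3))
```

Stopping early matters here. If the walk ran to convergence, trust would spread across the attack edge into the Sybil region and the two regions would become indistinguishable.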

  11. Security Analytics Goals
     • Developing tools that combine machine learning and analysis to automatically extract features and build models
     • Improving users’ experiences by translating the reasoning behind security decisions into human-understandable concepts
     • Designing robust algorithms for large-scale machine learning in the presence of adversarial manipulation
