Security Analytics Thrust

Anthony D. Joseph (UCB), Rachel Greenstadt (Drexel), Ling Huang (Intel), Dawn Song (UCB), Doug Tygar (UCB)

Outline
• Our view of Security Analytics
• Adversaries, Humans, and Machine Learning
• Joint research with McAfee
• Our proposed malware analysis pipeline
• Today’s Security Analytics talks

Our View of Security Analytics
• Using robust ML for adversary-resistant security metrics and analytics
  • Pattern mining and prediction at scale on big data
  • Detecting malware, spam, and malicious sites/URLs
  • Identifying authors of User Generated Content (UGC) and malware
  • Also, Sybil detection in crowds and obfuscating authors of UGC
  • Detecting human biosignals – EEG, vision tracking, SAFE continuous authentication
• Helping the humans-in-the-loop (situational awareness)
  • End-users of systems
  • Crowds and human reviewers
  • Domain experts

Adversarial Exploitation of ML
• Traditional approach – Evading Adversary
  • Attacker determines decision boundary
  • Crafts (positive instance) content that is classified as negative
• Newer approach – Influencing Adversary
  • Patient attacker operates during periodic retraining stage by injecting “tricky” positive instances
  • Shifts decision boundary over time during retraining such that (positive instance) content is eventually classified as negative
• Need novel adaptive, robust ML techniques to defend against Influencing Adversaries
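The Influencing Adversary described on this slide can be illustrated with a toy example. The sketch below is not the method from the talk: it assumes a one-dimensional centroid-based anomaly detector that is retrained on everything it accepts, and all names and numbers (`simulate_influencing_attack`, the radius, the target value) are hypothetical.

```python
from statistics import mean


def is_flagged(x, centroid, radius):
    """A point is flagged (classified positive) if it lies outside the radius."""
    return abs(x - centroid) > radius


def simulate_influencing_attack(target=10.0, radius=3.0, rounds=200,
                                injected_per_round=5):
    """Toy poisoning loop: the attacker injects points that sit just inside
    the current boundary, so each retraining step drags the centroid toward
    the target. Assumes the target starts out flagged.
    Returns (round in which the target is first accepted, centroid history)."""
    # Hypothetical benign training data centered on 0.
    training = [-1.0, 0.0, 0.0, 1.0, 0.0, -1.0, 1.0, 0.0]
    centroid = mean(training)
    history = [centroid]
    for round_no in range(1, rounds + 1):
        # "Tricky" instance: close to the boundary, but still accepted.
        tricky = centroid + 0.95 * radius
        training.extend([tricky] * injected_per_round)
        # Periodic retraining on all accepted data, poison included.
        centroid = mean(training)
        history.append(centroid)
        if not is_flagged(target, centroid, radius):
            return round_no, history
    return None, history


evaded_after, history = simulate_influencing_attack()
print("target accepted after round:", evaded_after)
print("centroid moved from", history[0], "to", round(history[-1], 2))
```

Every injected point is individually unremarkable, which is why a defense that only inspects single instances does not help here; the shift is visible only across retraining rounds.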

Synergy between Humans and ML
• Users – providing clear answers and usable security
  • Is this content spam or malicious?
  • What is the reasoning behind a security decision?
  • Can my UGC be identified as being mine?
  • Also, understanding how users reason about security
• Crowds – augmenting ML with human capabilities
  • Leveraging humans to disambiguate borderline instances (e.g., is this a malicious or benign application or website?)
• Domain Experts – prioritizing a limited resource
  • Identifying when to rely on experts to evaluate model changes
  • Helping determine authorship identification for malware
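One way to picture the division of labor on this slide is a triage step over classifier scores. The following is a minimal sketch under assumptions not stated in the talk: scores are probabilities of maliciousness, a fixed margin around 0.5 defines "borderline", and the most ambiguous items go to a small expert budget while the rest go to the crowd. The function name `triage` and the item names are hypothetical.

```python
def triage(scores, margin=0.15, expert_budget=2):
    """Split items into automatic decisions, expert review, and crowd review.

    scores: dict mapping item name -> estimated probability of being malicious.
    Items whose score is within `margin` of 0.5 are treated as borderline.
    """
    automatic = {}
    borderline = []
    for item, p in scores.items():
        if abs(p - 0.5) >= margin:
            # Confident enough: let the model decide.
            automatic[item] = "malicious" if p > 0.5 else "benign"
        else:
            borderline.append(item)
    # Most ambiguous first, so the limited expert time goes where it matters.
    borderline.sort(key=lambda item: abs(scores[item] - 0.5))
    experts = borderline[:expert_budget]
    crowd = borderline[expert_budget:]
    return automatic, experts, crowd


scores = {"app_a": 0.97, "site_b": 0.52, "app_c": 0.45,
          "site_d": 0.03, "app_e": 0.60}
automatic, experts, crowd = triage(scores)
print(automatic)  # decisions the model makes on its own
print(experts)    # most ambiguous items, sent to domain experts
print(crowd)      # remaining borderline items, sent to the crowd
```

The margin and the expert budget are the two knobs that trade automation against human effort; both values here are arbitrary.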

Collaboration with McAfee
• Special academic-industry collaboration
  • Unique opportunity for academic access to massive scale real-world adversarial data
  • Pathway for research to yield real-world impact
• Two Robust ML research efforts
  • Current: Active protection
  • Future: Malicious URL/site detection (Site Advisor)
• Update:
  • Signed University-level NDAs with UC Berkeley and Drexel
  • Had meetings at Intel and UC Berkeley
  • Delivered prototype ML-based malware classification system that supports large-scale classification of polymorphic threats
• Ongoing: Refining research focus and exploring Artemis sample dataset

Artemis and GTI
• Artemis and GTI collect voluminous “suspicious events and metadata” from millions of end hosts
• McAfee needs to:
  • Classify events into clean/dirty labels
  • Cluster events into groups
  • Rank groups according to their suspiciousness level
  • Help identify malware families (authorship classification)
• Our planned efforts
  • Build a large-scale, online, adaptive ML system for automated malware classification with humans in the loop
  • Apply stylometry for forensic analysis and malware classification
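The classify, cluster, and rank steps listed above can be sketched on toy data. This is an illustration only: the event format, the indicator set `SUSPICIOUS`, the scoring rule, and the greedy Jaccard clustering are all assumptions made for the example and say nothing about how Artemis or GTI data is actually structured.

```python
# Hypothetical indicators; real event metadata would look different.
SUSPICIOUS = {"packed", "writes_autorun", "no_signature", "contacts_rare_domain"}


def score(features):
    """Fraction of the known suspicious indicators present in an event."""
    return len(features & SUSPICIOUS) / len(SUSPICIOUS)


def jaccard(a, b):
    """Jaccard similarity of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def classify(events, threshold=0.5):
    """Label each event clean or dirty by thresholding its score."""
    return {name: ("dirty" if score(feats) >= threshold else "clean")
            for name, feats in events.items()}


def cluster(events, min_similarity=0.5):
    """Greedy clustering: join the first group whose first member is similar."""
    groups = []
    for name, feats in events.items():
        for group in groups:
            if jaccard(feats, events[group[0]]) >= min_similarity:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def rank(groups, events):
    """Order groups from most to least suspicious by mean event score."""
    def mean_score(group):
        return sum(score(events[n]) for n in group) / len(group)
    return sorted(groups, key=mean_score, reverse=True)


events = {
    "e1": {"packed", "no_signature", "writes_autorun"},
    "e2": {"packed", "no_signature", "writes_autorun", "contacts_rare_domain"},
    "e3": {"signed", "installer"},
    "e4": {"signed", "installer", "updater"},
    "e5": {"packed", "signed"},
}
print(classify(events))
print(rank(cluster(events), events))
```

A production system would replace each step with a learned model, but the data flow (label events, group them, order the groups for review) stays the same.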

Proposed Malware Analysis Pipeline
[Figure: pipeline diagram. Data from McAfee’s GTI and Google’s VirusTotal (program code, mobile apps, executables) goes through program analysis (static, dynamic, and human analysis) to produce program features. Feature encoding feeds machine learning, which yields malware classification models. Human domain experts carry out further analysis and provide feedback.]
• Categorization and Prioritization are critical!
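The feature encoding and machine learning stages of the pipeline, together with the expert feedback loop, might look roughly like the sketch below. It assumes program features arrive as string tokens, encodes them with the hashing trick, and uses a plain online perceptron that is updated one labeled sample at a time. None of this is confirmed by the slides; `encode`, `OnlinePerceptron`, and the token names are hypothetical.

```python
import hashlib

DIM = 64  # size of the hashed feature vector (arbitrary for this sketch)


def encode(tokens, dim=DIM):
    """Hashing-trick encoding: each token increments one of `dim` buckets."""
    vec = [0.0] * dim
    for token in tokens:
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim
        vec[index] += 1.0
    return vec


class OnlinePerceptron:
    """Minimal online linear classifier: 1 = malware, 0 = benign."""

    def __init__(self, dim=DIM):
        self.weights = [0.0] * dim
        self.bias = 0.0

    def predict(self, x):
        activation = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if activation > 0 else 0

    def update(self, x, label):
        """Apply one perceptron step; returns True if the model changed."""
        error = label - self.predict(x)
        if error != 0:
            self.weights = [w + error * xi for w, xi in zip(self.weights, x)]
            self.bias += error
        return error != 0


model = OnlinePerceptron()
# Hypothetical tokens produced by static or dynamic program analysis.
sample = encode(["api:CreateRemoteThread", "section:.upx0", "str:cmd.exe /c"])
print("before feedback:", model.predict(sample))
# Feedback from a domain expert: this sample is malware (label 1).
model.update(sample, 1)
print("after feedback:", model.predict(sample))
```

Hashing keeps the vector size fixed no matter how many distinct tokens the analysis stage emits, which matters for an online system that keeps seeing new features.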

Security Analytics Talks (Session 1)
• Big data for security analytics
  • Using adaptive, large-scale ML to identify and classify malware families using code features
• Learning as an “attack”: De-anonymization
  • Automated analysis of encrypted traffic – Identifying the URLs/topics of SSL-encrypted web pages
• Learning for web-based malware detection
  • Not code features, but rather: where scripts and objects come from, who makes the requests, and how the user gets to the site

Security Analytics Talks (Session 2)
• Using Network Science to detect Sybils in social networks
  • Leveraging social structure to detect fake accounts and improve user authentication
• Learning as an “attack”: De-anonymization
  • Automated analysis and identification of underground forum users
• Understanding how End Users reason about Risk
  • Security, privacy, and a 9-dimensional model for users

Security Analytics Goals
• Developing tools combining machine learning and analysis to automatically extract features and build models
• Improving users’ experiences by translating the reasoning behind security decisions into human-understandable concepts
• Designing robust algorithms for large-scale machine learning in the presence of adversarial manipulation
