
Combatting Advanced Cybersecurity Threats with AI and Machine Learning


Presentation Transcript


  1. Combatting Advanced Cybersecurity Threats with AI and Machine Learning Andrew B. Gardner, Ph.D. SPO1-T11 Senior Technical Director Symantec Corporation @andywocky

  2. The AI and ML Revolution is Here Self-driving cars

  3. The AI and ML Revolution is Here AI-generated art L. Gatys, A.S. Ecker and M. Bethge, “A Neural Algorithm of Artistic Style,” https://arxiv.org/pdf/1508.06576v1.pdf

  4. The AI and ML Revolution is Here Computational perception – face recognition (and speech, text, social, video, etc.) Y. Taigman et al., “DeepFace: Closing the Gap to Human-Level Performance in Face Verification,” https://research.fb.com/wp-content/uploads/2016/11/deepface-closing-the-gap-to-human-level-performance-in-face-verification.pdf

  5. Key Points to Cover in This Talk • What are AI and ML? • Why are AI and ML important for cybersecurity? • AI / ML at Symantec • AI / ML and the future of cybersecurity

  6. AI & ML Overview

  7. What are ML and AI? MACHINE LEARNING: The capability of a machine to learn without explicitly being programmed. ARTIFICIAL INTELLIGENCE: The capability of a machine to imitate intelligent human behavior. Keywords: learning, perception, decisions, autonomy

  8. An Example: AI vs. ML AI: self-driving car ML: pedestrian detection

  9. In Cybersecurity We Focus More on Learning Reasons for ML focus: • Complex sequential data • Not human-intuitive • What should a program trace or log file look like? • Scarce | expensive labels • Closed research models → Slower to advance AI/ML [Figure caption: What should a log file “look like?”]

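  10. Is This New? [Timeline figure: AI & ML core developments, multi-industry, spanning 1950s – 2000s, 2006, 2008, 2009, 2012, 2014, 2015 and 2016, split into “before” and “after” deep learning; labeled milestones include decision trees, ensembling, Netflix Prize, ImageNet, deep learning, deep learning (mobile), AI services and CAML]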

  11. How is AI/ML Used in Security Today? Yet Another Threat Detector (YATD) • Straightforward recipe • Data with labels • Build / update classifiers • Debate about techniques • Rely on data scientists • Feature engineering • Updates & tweaks [Diagram labels: Collect Datasets, Training Algorithm, Trained Model, Researcher/Scientist, Updated Classifiers]

  12. How is AI/ML Used in Security Today? Hidden (Automated) Systems • Primarily for automation • Not user-facing • Services and applications • Data + software engineering + ML • Examples: • Continual detector retraining • Smart data collection and labeling • Anomaly detection for IDS

  13. Why are AI/ML Important for Cybersecurity?

  14. AI/ML Adoption Drivers: • Complex threats • Advanced persistent threats • New malware vectors • non-PEs • ransomware • Social engineering • … plus many others • Humans are slow • Humans are expensive Benefits: • Automation • Scaling and velocity • Faster response and protection • Personalization • Learn to adapt to me, unobtrusively • Usability • Cross-domain protection • Firewalls talking to email servers and endpoints

  15. Are There Downsides to Using AI/ML? Poor architecture & unintended side effects • Detectors A & B independent • New system introduced • creates feedback between A/B • inadvertent, unknown? • New sample arrives: • A → 2/10, B → 1/10 • … but B sees ΔA, B → 3/10 • … but A sees ΔB, A → 4/10 • … and so on [Diagram labels: data, A, B]
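
The escalation on this slide can be reproduced with a few lines of Python. This is only a sketch under a stated assumption: each detector adds the other detector's most recent score change to its own score, capped at 10. The slide gives the sequence of scores but not the coupling rule, so the function name `simulate_feedback` and the additive rule are hypothetical.

```python
def simulate_feedback(a, b, rounds=4, max_score=10):
    """Simulate two detectors that react to each other's score changes.

    Assumption (not from the slide): a detector adds the other detector's
    latest score change to its own score, capped at max_score.
    """
    history = [(a, b)]
    delta = a  # A's initial rise from a baseline of 0, as observed by B
    for step in range(rounds):
        if step % 2 == 0:
            # B reacts to the change it just observed in A
            new_b = min(max_score, b + delta)
            delta = new_b - b
            b = new_b
        else:
            # A reacts to the change it just observed in B
            new_a = min(max_score, a + delta)
            delta = new_a - a
            a = new_a
        history.append((a, b))
    return history


for score_a, score_b in simulate_feedback(2, 1, rounds=8):
    print(f"A={score_a}/10  B={score_b}/10")
```

Starting from A = 2/10 and B = 1/10, the first two rounds reproduce the slide's B → 3/10 and A → 4/10, and under this rule both scores keep climbing until they hit the cap even though the sample itself never changed.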

  16. Are There Downsides to Using AI/ML? ML Technical Debt • Traditional software • Source code → program • Machine learning software • Source code + data → program • Data are embedded, opaque • Reconstruction is hard or impossible • ML data versioning is hard • Introduces data and system dependencies [Diagram labels: source code, data, ML program, stateful complex system]

  17. Adversaries Have AI/ML, Too!! Adversarial Machine Learning • Model extraction • Adversary learns an approximate model using fewest possible queries • Poisoning • Adversary biases machine learning model through interaction • Adversarial examples • Crafting inputs to defeat ML. [Diagram labels: data, model; example image: panda + perturbation → gibbon] I. J. Goodfellow, J. Shlens, C. Szegedy, “Explaining and Harnessing Adversarial Examples.” ICLR 2015.
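
To make the "adversarial examples" bullet concrete, here is a minimal sketch of the fast gradient sign method from the cited Goodfellow et al. paper, applied to a tiny logistic-regression classifier instead of an image network. The weights, input and `epsilon` value are made up for illustration, and the helper names are hypothetical.

```python
import math


def predict(weights, bias, x):
    """Probability of the positive class under logistic regression."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(value):
    return (value > 0) - (value < 0)


def fgsm(weights, bias, x, label, epsilon):
    """Move every feature by epsilon in the direction that increases the loss.

    For logistic loss, the gradient with respect to the input is (p - y) * w.
    """
    p = predict(weights, bias, x)
    gradient = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, gradient)]


# Illustrative values only (assumptions, not from the talk).
weights, bias = [2.0, -1.0, 0.5], 0.0
x = [0.5, -0.2, 0.4]
x_adv = fgsm(weights, bias, x, label=1, epsilon=0.5)
print("original score:   ", round(predict(weights, bias, x), 3))
print("adversarial score:", round(predict(weights, bias, x_adv), 3))
```

Each feature moves by at most `epsilon`, yet the classifier's decision flips from positive to negative. The panda-to-gibbon image is the same idea at a much higher dimension, where the per-pixel change can be too small to see.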

  18. Advanced Behavioral Attacks Microsoft Real-Time Translation (2012) https://www.youtube.com/watch?v=Nu-nlQqFCKg • Imagine a business email compromise attack • you get an email to wire payment for an invoice from the CFO • The email is written like your CFO • natural language processing from emails • You’re suspicious and call the CFO • But your phone is compromised • You’re connected to an adversary who has a speechbot with your CFO’s voice • Science fiction or possible today?

  19. AI/ML at Symantec

  20. The Symantec AI/ML Story • Define the goal: • Perfect, ubiquitous, unobtrusive protection for every context of our customers’ digital lives. • Invest in AI/ML resources • Center for Advanced Machine Learning (CAML), ~20 PhDs • Capitalize on unique telemetry assets (data!) • Automate and scale → ML everywhere • Improve protection using AI/ML

  21. Doing AI & ML (Correctly) is Hard! BOUNTIFUL DATA DIMENSIONS • 9 Trillion rows of security data • 4.5B queries processed daily from 175M endpoint devices • 2B emails scanned daily • 1B previously unseen web requests scanned daily • Outputs from other systems & products • Static attributes • Dynamic behaviors • Reputation • Relations • Sequential state ADVANCED TECHNIQUES • Ensembling • Boosting • Sequential Learning • Deep Learning • Automation at Scale LEADING EXPERTS • Dedicated org of recognized machine learning experts • World-renowned attack investigation team • Centuries of combined ML experience

  22. Optimize for Outcomes • This is an ROC • Shows TP/FP tradeoff • Create one for any classifier • How good is this detector? • Textbook: AUC, area under the curve = 0.83 • 0.83 is good, but… • Only the region with low FPs is useful to customers • The blue ROC is better in the real world! [Figure: two ROC curves, True Positives (TP) versus False Positives (FP), both axes from 0 to 100%; curve annotations AUC = 0.83 and AUC = 0.75, with AUC ranging from 0 to 1.0; the region up to 2% FP is marked “useful”]
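
The point of this slide can be checked numerically by comparing full AUC with the area restricted to the low-false-positive region. The sketch below assumes binary labels (1 = malicious) with both classes present. The two hand-made curves are illustrative only and do not reproduce the 0.83 and 0.75 curves on the slide.

```python
def roc_points(labels, scores):
    """ROC points (fpr, tpr), sweeping the threshold from high to low."""
    pairs = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        # Samples with tied scores cross the threshold together.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def partial_auc(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Cut the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Two made-up ROC curves: "red" wins on full AUC, "blue" wins below 2% FP.
red = [(0.0, 0.0), (0.02, 0.10), (0.20, 0.95), (1.0, 1.0)]
blue = [(0.0, 0.0), (0.02, 0.60), (1.0, 1.0)]
for name, curve in (("red", red), ("blue", blue)):
    print(name, "full AUC:", round(partial_auc(curve), 4),
          "area up to 2% FP:", round(partial_auc(curve, max_fpr=0.02), 4))
```

A detector chosen by textbook AUC alone would be the red one here, while the blue one catches far more threats at the false-positive rates a customer would tolerate.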

  23. Tackle Fundamental Problems. As Services. Charlatan – String Scoring Service • One HTTP RESTful ML service • containerized for deployment • available for products • available for experimentation • Deep learning on sequences • String scoring is really useful! • Malicious package names & filenames • DGA identification • Phishing domains • Adult content URLs • … and more
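
To illustrate what "string scoring" means, here is a deliberately simple stand-in: a character-bigram model fitted on a handful of benign names, which scores a string by how surprising its character transitions are. This is not Charlatan's method, since the slide says that service uses deep learning on sequences. The class name `BigramScorer`, the tiny training list and the smoothing constant are all assumptions for the sketch.

```python
import math
from collections import Counter


class BigramScorer:
    """Scores strings by average negative log-likelihood of their bigrams."""

    def __init__(self, benign_strings, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant
        self.pair_counts = Counter()
        self.context_counts = Counter()
        self.alphabet = set("^$")  # start and end markers
        for text in benign_strings:
            padded = "^" + text.lower() + "$"
            self.alphabet.update(padded)
            for a, b in zip(padded, padded[1:]):
                self.pair_counts[(a, b)] += 1
                self.context_counts[a] += 1

    def score(self, text):
        """Higher means less like the benign training strings."""
        padded = "^" + text.lower() + "$"
        vocab = len(self.alphabet) + 1  # one extra slot for unseen characters
        total = 0.0
        for a, b in zip(padded, padded[1:]):
            prob = (self.pair_counts[(a, b)] + self.alpha) / (
                self.context_counts[a] + self.alpha * vocab
            )
            total -= math.log(prob)
        return total / (len(padded) - 1)


# Toy training data, chosen only for illustration.
scorer = BigramScorer(["google", "amazon", "netflix", "symantec", "wikipedia"])
for candidate in ("amazon", "xqzvjq"):
    print(candidate, round(scorer.score(candidate), 3))
```

A random-looking string such as a DGA-style domain label scores higher than a name built from familiar transitions. A production service would replace the bigram counts with a learned sequence model and expose `score` behind an HTTP endpoint.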

  24. Services Everywhere! • Shift: sequential changepoint detection • Forester: automagic, optimal decision tree ensembles • Sifter: robust, automated feature selection • Dolphin: computer vision phishing website detection • Murdoch: behavioral & sequential anomaly detection • Multirater: unsupervised ground truth labeling • Lyrebird: active learning for detecting targeted spearphishing
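
As one example of the kind of problem these services address, sequential changepoint detection can be sketched with a one-sided CUSUM statistic. CUSUM is a textbook technique chosen here for illustration. The slide does not say which algorithm Shift uses, and the function name and parameter values below are assumptions.

```python
def cusum_first_alarm(values, target_mean, slack, threshold):
    """Return the index of the first upward-shift alarm, or None.

    The statistic accumulates deviations above target_mean + slack and
    resets to zero whenever it would go negative.
    """
    statistic = 0.0
    for index, value in enumerate(values):
        statistic = max(0.0, statistic + (value - target_mean - slack))
        if statistic > threshold:
            return index
    return None


# Synthetic stream: the level shifts from 0 to 2 at index 10.
stream = [0.0] * 10 + [2.0] * 10
print(cusum_first_alarm(stream, target_mean=0.0, slack=0.5, threshold=4.0))
```

The `slack` and `threshold` parameters trade detection delay against false alarms, which is the same TP/FP tradeoff that applies to any other detector.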

  25. Keep the Customer in Mind Endpoint Static Protection • Code name “Sapient” • Goal: upgrade efficacy • better detections • fewer false positives • How? • Not techniques – we still use lots of tree ensembles → Better data, experiment design, optimization [Chart: true positives versus false positives, contrasting a “typical” detector with a “world-class” one (better detection, fewer false positives); true-positive labels include 93%, 99%+ and 100%, false-positive labels include 0, 0.1% and 1%]

  26. Practice Good Practices Pro vs. Joe: Best Practices Practiced Well • Scholarship – literature surveys, measurement, build on other work • Exploratory data analysis • Strong imputing • Sampling for class imbalance and experiment design • Hyperparameter optimization • Encoding & embedding for feature engineering • Feature selection • Model selection • Ensembling techniques • Thresholdout for reusable holdouts All of this is table stakes … before we do “hard” ML … or fancy “new” stuff. This can be difficult for engineers and data scientists.

  27. Practice Good Practices Pro vs. Joe: Best Practices Practiced Well • Scholarship – literature surveys, measurement, build on other work • Exploratory data analysis • Strong imputing • Sampling for class imbalance and experiment design • Hyperparameter optimization • Encoding & embedding for feature engineering • Feature selection • Model selection • Ensembling techniques • Thresholdout for reusable holdouts Highlighted: Scholarship – literature, measurements, past work; Sampling and experiment design; Ensembling techniques – combining models. All of this is table stakes … before we do “hard” ML … or fancy “new” stuff. This can be difficult for engineers and data scientists.
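
For "Ensembling techniques – combining models", the simplest possible instance is a majority vote over the labels predicted by several models. The sketch below uses made-up predictions. The talk does not specify which ensembling methods are used beyond tree ensembles and boosting, so the function name and data are assumptions.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists (all the same length) by majority vote.

    Use an odd number of models for binary labels so that ties cannot occur.
    """
    combined = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Three hypothetical detectors, each wrong on a different sample.
truth = [1, 0, 1, 1, 0]
model_a = [0, 0, 1, 1, 0]
model_b = [1, 1, 1, 1, 0]
model_c = [1, 0, 0, 1, 0]
print(majority_vote([model_a, model_b, model_c]) == truth)
```

Because each model errs on a different sample, the vote corrects all three mistakes. The benefit shrinks as the models' errors become correlated, which is one reason sampling and experiment design sit on the same list.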

  28. Closing Time

  29. The Future of AI & ML in Cybersecurity • Superpowers for analysts • hunting for targeted spearphishing attacks 100x faster • Threat detection systems that self-evolve • Programs that understand other program binaries • Real-time conversation monitoring for • social engineering, cyberbullying, fake news, help, etc. • Predictive protection • AI for fuzzing, bugs, exploits & zero days [Arrow: now → future]

  30. Key Takeaways • AI/ML is real, it’s here and it’s disruptive! • Lots of opportunities and challenges • The “terminator wars” of the future will play out at scale, speed and cost that humans cannot match. Deep resources – cash, expertise, data, systems – will be required table stakes. • AI/ML benefits bad actors, too • The right expertise and experience are essential • AI/ML researchers + data scientists + security professionals • Systems, data and integration are differentiators • Symantec has the right pieces to play in this area

  31. Where Do You Go from Here? • Level up your organization’s AI/ML skills • Recruit, hire, train, borrow ASAP – you’re already behind! • Treat your ML systems and features as attack surfaces • Raise the bar on best practices when using ML • Work hard to reduce technical debt and unintended side effects! • Buy/use integrated solutions vs. point products • Preferably those which provably and usably leverage AI/ML

  32. Thank you! For follow-ups: Andrew B. Gardner, Ph.D. Senior Technical Director Symantec Corporation 470-330-2435 andrew_gardner@symantec.com @andywocky
