Cyber-Security: Some Thoughts

V.S. Subrahmanian
Center for Digital International Government
Computer Science Dept. & UMIACS, University of Maryland
vs@cs.umd.edu
www.cs.umd.edu/~vs/

Parts of this talk reflect joint work with M. Albanese, S. Jajodia, C. Molinaro, A. Pugliese, N. Rullo, and C. Thomas.

V.S. Subrahmanian, Geo-Intelligence India 2013

Disclaimers
• All work described in this talk uses only open-source data.
• All work in this talk is basic research, tested wherever possible against real-world data.
• All work reported in this talk has been published in the scientific literature.

Talk Outline
• Terminology
• Vulnerabilities
• Exploits
• Technology
• Monitoring networks for known attacks
• Monitoring networks for unknown attacks
• Social media (Sybil, sockpuppet) attacks

Terminology
• Vulnerability: a feature of software that an attacker can use, usually in a way the software designer did not anticipate, to attack a system. The US National Vulnerability Database (nvd.nist.gov) contains over 56K vulnerabilities together with suggested patches.
• Exploit: a piece of code that takes advantage of a vulnerability to carry out an attack. Databases of exploits also exist; some sites claim over 22K exploits in their database.

The Cyber Trade: The Scary Part
• “Exploits as a service” is now cheap and efficient for attackers [criminals, nation states].
• Exploits (or parts of them) for different kinds of attacks can be bought for a very small price compared with the price of artifacts used in kinetic attacks.

[Figure: system architecture, divided into OFFLINE and ONLINE parts. Labeled components: tMAGIC Activity Detection Engine; PASS (Parallel Activity Search System); Unexplained Activity Detection Engine; Parallel Unexplained Activity Detection; ALE (Activity Learning Engine); Known Activities (Bad); Known Activities (Good); Security Analyst Interface. Observation data sources: database, real-time, network, resource use, and more.]

Attack Graphs
• Attack graphs
  • Cs are conditions.
  • Vs are vulnerabilities.
  • C4 and C5 are both needed to exploit vulnerability V4.
  • Exploiting vulnerability V4 causes condition C6.
• Temporal attack graphs
  • Only worry about vulnerabilities.
  • The figure on the left says vulnerability V4 can be exploited if V3 and either V1 or V2 can be exploited.
  • Probabilistic versions exist.
Databases of vulnerabilities and attack graphs are available.
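The two graph styles on this slide can be written down as small data structures. The following is a minimal sketch only: the dictionary layout and the names `ATTACK_GRAPH`, `TEMPORAL_GRAPH`, `reachable_conditions`, and `can_exploit` are illustrative assumptions, not the representation used in the published work, and the temporal version here ignores any timing constraints.

```python
# Hypothetical encoding of the slide's example; all names are illustrative.

# Condition/vulnerability attack graph: each vulnerability needs ALL of its
# required conditions and, once exploited, causes new conditions.
ATTACK_GRAPH = {
    "V4": {"requires": {"C4", "C5"}, "causes": {"C6"}},
}

def reachable_conditions(initial, graph):
    """Return every condition an attacker can reach from `initial`."""
    held = set(initial)
    changed = True
    while changed:  # repeat until no vulnerability adds a new condition
        changed = False
        for spec in graph.values():
            if spec["requires"] <= held and not spec["causes"] <= held:
                held |= spec["causes"]
                changed = True
    return held

# Temporal attack graph: vulnerabilities only. Each target maps to a list of
# alternatives (OR); every vulnerability inside one alternative is needed (AND).
# V4 needs V3 and (V1 or V2).
TEMPORAL_GRAPH = {
    "V4": [{"V3", "V1"}, {"V3", "V2"}],
}

def can_exploit(target, exploited, graph):
    """True if some alternative for `target` is fully contained in `exploited`."""
    return any(alternative <= exploited for alternative in graph.get(target, []))
```

For example, `reachable_conditions({"C4", "C5"}, ATTACK_GRAPH)` adds C6 to the starting conditions, while `can_exploit("V4", {"V1", "V2"}, TEMPORAL_GRAPH)` is false because V3 is missing.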

Attack Graphs Can Be Merged
Merging a large set of attack graphs lets you run the search once, over a single stream of transactional data, and find occurrences of all the merged graphs at the same time.
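One way to picture the benefit of merging is a shared index that lets a single pass over the stream advance many patterns at once. The sketch below is an assumption-laden simplification, not the merging algorithm from the published work: each attack pattern is reduced to a plain sequence of event labels, matches may be non-contiguous, and the names `merge_patterns` and `scan` are hypothetical.

```python
from collections import defaultdict

def merge_patterns(patterns):
    """Build one index: event label -> [(pattern name, position in pattern)]."""
    index = defaultdict(list)
    for name, steps in patterns.items():
        for pos, event in enumerate(steps):
            index[event].append((name, pos))
    return index

def scan(stream, patterns):
    """Single pass over `stream`; return names of patterns seen in order."""
    index = merge_patterns(patterns)
    progress = {name: 0 for name in patterns}  # next position awaited
    completed = set()
    for event in stream:
        # Decide who advances before updating anything, so one event can
        # move a given pattern forward by at most one step.
        advancing = [name for name, pos in index.get(event, ())
                     if progress[name] == pos]
        for name in advancing:
            progress[name] += 1
            if progress[name] == len(patterns[name]):
                completed.add(name)
    return completed
```

Each stream event is looked up once in the merged index, regardless of how many patterns are being tracked; only patterns that mention that event do any work.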

Attack Graphs
• Attack graphs can be built semi-automatically to monitor live network traffic. But two key problems need to be solved:
  • How to monitor huge volumes of traffic?
  • How to identify unexpected activities that you did not know about in the past and add them to your activity knowledge base?
• Activities are both bad (attacks) and good (innocuous).
• Models of both good and bad activities are needed in order to identify what is abnormal or unexplained.

Finding Known Activities
PASS: Parallel Activity Search System
• Developed an algorithm to identify all instances of a [known] activity in an observation stream that have at least a certain probability.
• Demonstrated the ability to automatically detect activities in a stream of observation data arriving at 500K+ observations per second on an 8-node cloud.
• Demonstrated the ability to identify unexplained behavior in observation streams with precision over 80% and recall over 70%.
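The first bullet, finding every instance of a known activity whose probability clears a threshold, can be illustrated with a naive, single-threaded search. This is a sketch under stated assumptions and not the PASS algorithm: the activity is assumed to be a graph whose edges carry transition probabilities, an instance's probability is assumed to be the product of its edge probabilities, and the function name `find_instances` and the event labels in the usage note are hypothetical.

```python
def find_instances(stream, edges, start, end, threshold):
    """Enumerate (indices, probability) for every instance of an activity.

    stream:    list of observed event labels
    edges:     {(event_a, event_b): transition probability}
    start/end: first and last event label of the activity
    An instance is a sequence of increasing stream positions (not necessarily
    contiguous) that follows edges from `start` to `end`.
    """
    results = []

    def extend(path, prob):
        last = stream[path[-1]]
        if last == end:
            results.append((tuple(path), prob))
            return
        for j in range(path[-1] + 1, len(stream)):
            p = edges.get((last, stream[j]))
            # Probabilities are at most 1, so the product can only shrink:
            # a partial instance below the threshold can be pruned safely.
            if p is not None and prob * p >= threshold:
                extend(path + [j], prob * p)

    for i, event in enumerate(stream):
        if event == start:
            extend([i], 1.0)
    return results
```

With the made-up activity `edges = {("scan", "login"): 0.8, ("login", "exfil"): 0.5, ("scan", "exfil"): 0.2}` and the stream `["scan", "noise", "login", "exfil"]`, a threshold of 0.3 keeps only the instance at positions (0, 2, 3); lowering the threshold to 0.1 also admits the direct scan-to-exfil instance at (0, 3). The enumeration is exponential in the worst case, which is one reason a scalable system needs indexing and parallelism that this sketch omits.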

Unexplained Activities
• How can we look for activities that have never been anticipated?
• Answer:
  • Set up a framework to continuously track unexplained activities.
  • Present unexplained activities quickly to a security analyst, who flags each one as either
    • a bad activity, or
    • an OK activity.
  • Update the repertoire of known activity models with the security analyst's feedback.
• What is an unexplained activity?
  • It is a sequence (not necessarily contiguous) of events that is inconsistent with all known activity models (good or bad).
  • Unexplained does not necessarily mean bad.
• There is also a lot of work on statistical anomaly detection [not in my lab].
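The analyst feedback loop described above can be sketched in a few lines. This is a deliberately simplified illustration, not the published detection method: each known activity model is assumed to be a plain sequence of event labels, and a candidate counts as "explained" if it appears, in order but not necessarily contiguously, inside some model. The class `KnowledgeBase`, its methods, and the event labels in the usage note are hypothetical.

```python
def is_subsequence(candidate, model):
    """True if `candidate` occurs in `model` in order, gaps allowed."""
    remaining = iter(model)
    # `event in remaining` consumes the iterator up to the first match,
    # so later events must appear after earlier ones.
    return all(event in remaining for event in candidate)

class KnowledgeBase:
    """Known activity models, split into good (innocuous) and bad (attacks)."""

    def __init__(self, good, bad):
        self.models = {"good": [list(m) for m in good],
                       "bad": [list(m) for m in bad]}

    def explain(self, candidate):
        """Return 'good' or 'bad' if some model explains it, else None."""
        for label, models in self.models.items():
            if any(is_subsequence(candidate, m) for m in models):
                return label
        return None  # unexplained: hand it to the security analyst

    def record_feedback(self, candidate, label):
        """Store the analyst's verdict so the same activity is explained next time."""
        if label not in self.models:
            raise ValueError("label must be 'good' or 'bad'")
        self.models[label].append(list(candidate))
```

Starting from one good model (`login`, `read`, `logout`) and one bad model (`scan`, `login`, `exfil`), the sequence `login`, `exfil`, `logout` matches neither, so `explain` returns `None`. Once the analyst flags it with `record_feedback(..., "bad")`, the same sequence is reported as bad on its next appearance.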

Example Unexplained Activity

Unexplained Activity Detection
[Figure panels: Totally unexplained; Partially unexplained]
Tested using network traffic from a university. Wireshark was used to capture the network traffic; SNORT was used for the activity models.

Unexplained Activity Detection
• Looking for more top-K results increases runtime.
• Increasing t reduces runtime.
• Increasing sequence length reduces runtime.
• Looking at more worlds increases runtime.
Tested using network traffic from a university. Wireshark was used to capture the network traffic; SNORT was used for the activity models.

An Election Social Media Attack

Election Social Media Attack

Social Media Attacks (SMAs)
• A major state-backed threat.
• SMAs cause a viral increase in the number of social media posts in support of a particular cause or position.
• SMAs can destabilize a country's decision making by providing a false picture of support for or against a given position.

Other Relevant Work
• Algorithms to identify common patterns in huge networks (1B+ edges)
• Ability to update identified patterns in huge networks as the network changes (540M+ edges)
• Algorithms to find a set of K nodes that optimizes an arbitrary objective function on a network (31M+ edges)
• Algorithms to identify important nodes in attributed, weighted networks
• Learning to cluster malware variants

Current Directions
• Learning activity models: given that there is some set of low-level events that can be detected, can we learn the stochastic temporal automata directly from the data in a semi-supervised manner?
• Parallel unexplained activity detection: can we scale up our current algorithms to identify unexplained activities in high-throughput streams?

Contact Information
V.S. Subrahmanian
Dept. of Computer Science & UMIACS
University of Maryland
College Park, MD 20742
Tel: 301-405-6724
Email: vs@cs.umd.edu
Web: www.cs.umd.edu/~vs/
