Vern Paxson, Stefan Savage, George Varghese, Geoff Voelker, Nick Weaver

Collaborative Center for Internet Epidemiology and Defenses (CCIED) Technical Advisory Board Meeting • Vern Paxson, Stefan Savage, George Varghese, Geoff Voelker, Nick Weaver • Mark Allman, Juan Caballero, Martin Casado, Jay Chen, Simon Crosby, Weidong Cui, Cristian Estan, Ranjit Jhala, Jaeyeon Jung, Chris Kanich, Jayanth Kumar Kannan, Erin Kenneally, Kirill Levchenko, Justin Ma, Marvin McNett, David Moore, Michelle Panik, Colleen Shannon, Sumeet Singh, Alex Snoeren, Amin Vahdat, Erik Vandekieft, Michael Vrable, Ming Woo-Kawaguchi, Vinod Yegneswaran

Welcome! • First some context… • This isn’t a “sales pitch” • We created a TAB for our benefit • We want to improve the effectiveness of the project and we think you can help • …and some ground rules • We’re going to give some informal presentations • Ask questions and give informal feedback anytime • The meeting today is private, but nothing is confidential • We have some specific high-level focus questions that we’d like you to think about and give feedback on

Focus questions for the TAB • Are we considering the right threats? • Are there other technical approaches we should be considering? • Are we missing any important partnership opportunities? • Are we missing any key capabilities on our team? • What education/training is necessary/missing for practitioners in the field? How can we best help here?

Agenda • 9:30-10:30 Intro • 10:45-12:00 Data Collection (Honeyfarms) • 12:00-1:30 Lunch • 1:30-1:45 Potpourri • 1:45-2:30 Detection/Defense • 2:30-3:00 Future • 3:30-4:30 TAB Breakout • 4:30-5:30 TAB Feedback • Dinner

For the rest of our time… • Motivation and scope • What we promised NSF • Research & education • Prior activity and background • Monitoring • Analyses • Defense

Motivation: threat transformation • Traditional threats • Attacker manually targets high-value system/resource • Defender increases cost to compromise high-value systems • Biggest threat: insider attacker • Modern threats • Attacker uses automation to target all systems at once (can filter later) • Defender must defend all systems at once • Biggest threats: software vulnerabilities & naïve users

Driving economic forces • No longer just for fun, but for profit • SPAM forwarding (MyDoom.A backdoor, SoBig), credit card theft (Korgo), DDoS extortion, etc. • Symbiotic relationship: worms, bots, SPAM, DDoS, etc. • Fluid third-party exchange market (millions of hosts for sale) • Going rate for SPAM proxying: 3-10 cents/host/week • Seems small, but a 25k-host botnet gets you $40k-130k/yr • Raw bots: $1+/host, special orders • Generalized search capabilities are next • “Virtuous” economic cycle • Bottom line: compromised hosts are a platform
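The $40k-130k/yr figure follows from the per-host rate: 25,000 hosts at 3-10 cents per host per week over 52 weeks. A quick Python check, assuming every host is rented out every week of the year (full utilization is our assumption, not a figure from the slide):

```python
WEEKS_PER_YEAR = 52

def annual_revenue(hosts: int, cents_per_host_week: float) -> float:
    """Dollars per year from renting `hosts` as SPAM proxies every week (assumed full utilization)."""
    return hosts * cents_per_host_week * WEEKS_PER_YEAR / 100

low = annual_revenue(25_000, 3)
high = annual_revenue(25_000, 10)
print(f"${low:,.0f} to ${high:,.0f} per year")  # prints "$39,000 to $130,000 per year"
```

The low end rounds to the slide's $40k; any idle weeks would scale both ends down proportionally.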

Overall CCIED Scope • Developing understanding and technology to address the threats of large-scale host compromise

CCIED’s research responsibilities • Internet Epidemiology: Understanding • What kinds of new attacks are going on? • What are their limits? • Automated Network Defenses: Reacting • Stop new attacks without humans in the loop • Legal and Economic issues: Worrying • What are liability issues? • How to create forensic and commercial value?

CCIED’s education responsibilities • We are committed to providing a yearly workshop to help train researchers and the workforce (interpreted broadly) in these issues • Input appreciated on its format and on who the best short-term audience might be • Curriculum development • Worm/virus segments for undergrad and grad classes

Year one milestones • Development and deployment of a large-scale network worm detection system (telescope/simple honeyfarm) • Testing of prototype in-line defenses (scan suppression, signature extraction) • Legal issues related to both technologies • Initial worm/virus curriculum for security courses • CCIED Web Portal running

Ancient history – independent groups • In the late ’90s, Paxson deploys the Bro IDS at LBL and starts looking at network-based intrusions • In 2000, UCSD develops the “network telescope”-based backscatter technique for inferring DoS activity See: Paxson, Bro: a System for Detecting Intruders in Real Time, USENIX Security, 1998 & Moore et al, Inferring Internet Denial of Service Activity, USENIX Security, 2001

Code Red • Code Red epidemic takes off in 2001: the first large-scale network worm in over a decade • Selects IP addresses at random and probes for vulnerability • Monitored via telescopes • ~360,000 hosts infected in a day • Slow admin response • Didn’t do much • Growth matches logistic function See: Moore et al, CodeRed: a Case study on the Spread of an Internet Worm, IMW 2002 and Staniford et al, How to 0wn the Internet in your Spare Time, USENIX Security 2002

Code Red is only a proof of concept • Better targeting possible • Biased: locally biased scanning spreads faster and is more likely to hit • Topological: exploit application-level networks (e.g. e-mail, p2p apps, google vs searchers, etc.) • Hitlist: predetermine vulnerable hosts (at least some) • Metaserver worms – exploit directory servers for this purpose • Permutation scanning: don’t duplicate effort • Contagion worms: hide in existing communication patterns • More destructive payload possible • Toast disk, toast BIOS, patch microcode • Simple cost models suggest multi-billion-dollar costs are achievable • Call for a Cyber-CDC See: Staniford et al, How to 0wn the Internet in your Spare Time, USENIX Security 2002 and Weaver et al, A Worst-case Worm, WEIS 2004
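Permutation scanning, as described by Staniford et al., has every worm instance walk a shared pseudorandom permutation of the address space and jump to a new random position when it runs into an already-infected host, on the theory that another instance is already covering that stretch. A toy Python sketch of that idea; the 16-bit address space, the affine permutation, and the constants `A` and `B` are illustrative assumptions, not details from the paper:

```python
import random

ADDR_BITS = 16          # toy address space; IPv4 would be 32 bits
SPACE = 1 << ADDR_BITS
A, B = 40503, 12345     # hypothetical shared key; A must be odd for a bijection

def perm(index: int) -> int:
    """Shared pseudorandom permutation: affine map modulo 2**ADDR_BITS."""
    return (A * index + B) % SPACE

def scan_from(start: int, infected: set, budget: int, rng: random.Random) -> list:
    """Probe addresses in permutation order starting at position `start`.

    On hitting an already-infected address, assume another instance is
    covering this stretch and jump to a fresh random position.
    """
    probed = []
    pos = start
    for _ in range(budget):
        addr = perm(pos)
        probed.append(addr)
        if addr in infected:
            pos = rng.randrange(SPACE)
        else:
            pos = (pos + 1) % SPACE
    return probed
```

Because `A` is odd, the affine map is a bijection on the toy space, so an uninterrupted walk visits every address exactly once before repeating.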

How well must defense work? • Containment strategy • “Sharable” signatures offer huge advantages • Reaction time • For Code Red densities • 3 hrs for 10 probes/sec • 2 mins for 1000 probes/sec • Deployment • Need to interdict most paths • Worms form the world’s best overlay network See: Moore et al, Internet Quarantine: Requirements for Containing Self-Propagating Code, Infocom 2003

Aside • Around this time both groups are providing input to Anup Ghosh (DARPA) for a new program: Dynamic Quarantine • We join forces and put in a joint proposal • Highest-rated proposal for DQ • Project then classified (then reclassified again!) • Group stays in touch…

A pretty fast outbreak: Slammer (2003) • First ~1 min behaves like classic random scanning worm • Doubling time of ~8.5 seconds • CodeRed doubled every 40 mins • After ~1 min, worm starts to saturate access bandwidth • Some hosts issue >20,000 scans per second • Self-interfering (no congestion control) • Peaks at ~3 min • >55 million IP scans/sec • 90% of Internet scanned in <10 mins • Infected ~100k hosts (conservative) See: Moore et al, The Spread of the Sapphire/Slammer Worm, IEEE Security & Privacy, 1(4), 2003
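Two back-of-envelope checks on these numbers, in Python: the ratio of the two doubling times, and how many uniform random probes it takes to touch 90% of IPv4 once duplicate hits are counted. The second calculation assumes uniform random scanning of the full 2^32 space and that the ~55 million scans/sec peak is sustained, which it was not over the whole outbreak:

```python
import math

IPV4_SPACE = 2 ** 32

def coverage(total_scans: float) -> float:
    """Approximate fraction of IPv4 probed at least once by uniform random scans."""
    return 1 - math.exp(-total_scans / IPV4_SPACE)

def scans_for_coverage(fraction: float) -> float:
    """Uniform random scans needed before `fraction` of IPv4 has been probed."""
    return -math.log(1 - fraction) * IPV4_SPACE

# Ratio of doubling times: Code Red (~40 min) vs Slammer (~8.5 s).
speedup = (40 * 60) / 8.5
# Scans to touch 90% of IPv4, and the time that takes if the ~55M scans/s peak held.
needed = scans_for_coverage(0.90)
seconds_at_peak = needed / 55e6
print(f"doubling-time ratio ~{speedup:.0f}x")
print(f"90% coverage: ~{needed:.2e} scans, ~{seconds_at_peak:.0f} s at peak rate")
```

Duplicates push the requirement to about 2.3 × 2^32 probes, roughly three minutes at the peak rate, which is consistent with the <10 minute figure above.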

Was Slammer really fast? • Yes, it was orders of magnitude faster than CR • No, it was poorly written and unsophisticated • Who cares? It is literally an academic point • The current debate is whether one can get < 500ms • Bottom line: way faster than people! See: Staniford et al, The Top Speed of Flash Worms, ACM WORM, 2004

Aside: How to think about worms • Reasonably well described as infectious epidemics • Simplest model: homogeneous random contacts • Classic SI model • N: population size • S(t): susceptible hosts at time t • I(t): infected hosts at time t • β: contact rate • i(t): I(t)/N, s(t): S(t)/N courtesy Paxson, Staniford, Weaver
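With these definitions, the classic SI model is dI/dt = β I(t) S(t)/N; dividing by N and using s = 1 − i gives di/dt = β i(t)(1 − i(t)), whose solution is the logistic curve, i(t) = i0 e^{βt} / (1 − i0 + i0 e^{βt}). A minimal Python sketch comparing that closed form with a forward-Euler integration of the same equation; the initial fraction and β = 2 per hour are hypothetical values chosen only for illustration:

```python
import math

def si_closed_form(i0: float, beta: float, t: float) -> float:
    """Logistic solution of di/dt = beta * i * (1 - i) with i(0) = i0."""
    growth = i0 * math.exp(beta * t)
    return growth / (1 - i0 + growth)

def si_euler(i0: float, beta: float, t_end: float, dt: float = 1e-3) -> float:
    """Integrate the same SI equation with forward-Euler steps of size dt."""
    i = i0
    for _ in range(round(t_end / dt)):
        i += beta * i * (1 - i) * dt
    return i

# Hypothetical parameters: one initial infection among 360,000 susceptible
# hosts, and 2 successful contacts per infected host per hour.
I0, BETA = 1 / 360_000, 2.0
for hours in (0, 3, 6, 9, 12):
    print(hours, round(si_closed_form(I0, BETA, hours), 4))
```

The curve stays near zero for hours, then climbs to saturation within a few multiples of 1/β, which is the S-shape the Code Red data followed.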

What’s important? • There are lots of improvements to the model… • Chen et al, Modeling the Spread of Active Worms, Infocom 2003 (discrete time) • Wang et al, Modeling Timing Parameters for Virus Propagation on the Internet, ACM WORM ’04 (delay) • Ganesh et al, The Effect of Network Topology on the Spread of Epidemics, Infocom 2005 (topology) • … but the bottom line is the same. We care about two things: • How likely is it that a given infection attempt is successful? • Target selection (random, biased, hitlist, topological,…) • Vulnerability distribution (e.g. density – S(0)/N) • How frequently are infections attempted? • β: Contact rate
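For the simplest target-selection strategy, the two questions combine into β directly. If a worm scans IPv4 uniformly at random at r probes per second per infected host and S(0) vulnerable hosts are spread over the 2^32 addresses, each probe succeeds with probability S(0)/2^32, so β ≈ r·S(0)/2^32; the logistic solution of di/dt = β i(1 − i) then reaches half the vulnerable population after t = ln((1 − i0)/i0)/β. A Python sketch of both, assuming uniform random scanning (biased, hitlist, or topological worms need a different hit probability) and a hypothetical population of 360,000 vulnerable hosts:

```python
import math

IPV4_SPACE = 2 ** 32

def contact_rate(probes_per_sec: float, vulnerable_hosts: int) -> float:
    """beta for a uniform random scanner: probe rate times hit probability S(0)/2**32."""
    return probes_per_sec * vulnerable_hosts / IPV4_SPACE

def time_to_half(i0: float, beta: float) -> float:
    """Time (in beta's time unit) for the logistic SI curve to grow from i0 to 1/2."""
    return math.log((1 - i0) / i0) / beta

# Hypothetical worm: 360,000 vulnerable hosts, a single initial infection.
V = 360_000
slow = time_to_half(1 / V, contact_rate(10, V))
fast = time_to_half(1 / V, contact_rate(1000, V))
print(f"10 probes/s: {slow / 3600:.1f} h   1000 probes/s: {fast / 60:.1f} min")
```

Under these assumptions, time to any given infection level scales as 1/β, so a 100× faster scanner (or 100× denser vulnerable population) compresses the whole outbreak 100-fold.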

What can be done? • Reduce the number of susceptible hosts • Prevention: reduce S(t) while I(t) is still small (ideally reduce S(0)) • Reduce the contact rate • Containment: reduce β while I(t) is still small This is where most of our work has focused

Scan Detection • Basic idea: detect scanning behavior indicative of worms and shoot down the offending hosts • Threshold Random Walk algorithm • Scanners will not usually succeed • Track the ratio of failed connection attempts to connection attempts per IP address; it should be small for benign hosts • Can be approximated for line-rate implementation in hardware (being built by Nick) See: Jung et al, Fast Portscan Detection Using Sequential Hypothesis Testing, Oakland 2004, Weaver et al, Very Fast Containment of Scanning Worms, USENIX Security 2004
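Underneath the “failure ratio should be small” intuition, TRW is a sequential hypothesis test: each first-contact connection outcome multiplies a per-source likelihood ratio, and the walk stops when that ratio crosses an upper threshold (scanner) or a lower one (benign), using Wald-style thresholds η1 = P_D/P_F and η0 = (1 − P_D)/(1 − P_F). A minimal Python sketch of that idea; the success probabilities `THETA0`/`THETA1` and the target rates `P_D`/`P_F` are illustrative assumptions, and the class name `RemoteHost` is hypothetical:

```python
from dataclasses import dataclass

# Assumed parameters (illustrative): probability that a first-contact
# connection succeeds for a benign host (THETA0) and for a scanner (THETA1),
# plus the desired detection (P_D) and false-positive (P_F) probabilities.
THETA0, THETA1 = 0.8, 0.2
P_D, P_F = 0.99, 0.01
ETA1 = P_D / P_F                  # upper threshold: declare "scanner"
ETA0 = (1 - P_D) / (1 - P_F)      # lower threshold: declare "benign"

@dataclass
class RemoteHost:
    ratio: float = 1.0            # likelihood ratio P(obs | scanner) / P(obs | benign)
    verdict: str = "pending"

    def observe(self, success: bool) -> str:
        """Update the random walk with one first-contact connection outcome."""
        if self.verdict != "pending":
            return self.verdict
        if success:
            self.ratio *= THETA1 / THETA0
        else:
            self.ratio *= (1 - THETA1) / (1 - THETA0)
        if self.ratio >= ETA1:
            self.verdict = "scanner"
        elif self.ratio <= ETA0:
            self.verdict = "benign"
        return self.verdict
```

With these assumed values each failure multiplies the ratio by 4 and each success by 1/4, so four consecutive failed first contacts flag a source as a scanner and four consecutive successes clear it.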

Content sifting • Key idea: quickly infer a content signature for a new worm • Assume there exists some (relatively) unique invariant bitstring W across all instances of a particular worm • Two consequences • Content Prevalence: W will be more common in traffic than other bitstrings of the same length • Address Dispersion: the set of packets containing W will address a disproportionate number of distinct sources and destinations • Content sifting: find W’s with high content prevalence and high address dispersion and drop that traffic • By using approximate data structures, it can be implemented at line rate See: Singh et al, Automated Worm Fingerprinting, OSDI 2004.
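An exact-counting Python sketch of content sifting: count how many packets carry each fixed-length substring and how many distinct sources and destinations those packets involve, then report substrings that clear both thresholds. The substring length, the thresholds, and the use of plain dicts and sets are illustrative assumptions; the approach on the slide relies on approximate data structures precisely because exact per-substring state like this would not keep up with line rate:

```python
from collections import defaultdict

# Hypothetical parameters, chosen small so the example is easy to follow.
SIG_LEN = 8
PREVALENCE_THRESHOLD = 3
DISPERSION_THRESHOLD = 3

def sift(packets):
    """Return substrings that are both prevalent and widely dispersed.

    `packets` is an iterable of (src, dst, payload_bytes) tuples.
    """
    prevalence = defaultdict(int)
    sources = defaultdict(set)
    dests = defaultdict(set)
    for src, dst, payload in packets:
        seen = set()  # count each substring at most once per packet
        for off in range(len(payload) - SIG_LEN + 1):
            w = payload[off:off + SIG_LEN]
            if w in seen:
                continue
            seen.add(w)
            prevalence[w] += 1
            sources[w].add(src)
            dests[w].add(dst)
    return {
        w for w, count in prevalence.items()
        if count >= PREVALENCE_THRESHOLD
        and len(sources[w]) >= DISPERSION_THRESHOLD
        and len(dests[w]) >= DISPERSION_THRESHOLD
    }
```

A payload repeated between a single pair of hosts is prevalent but not dispersed, so only a substring shared across many distinct sources and destinations is reported.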

CCIED formed in 2004 • Joint UCSD/ICSI collaboration • $6.2M from NSF over 5 years • Synergistic support from Microsoft, HP, Intel, VMware, CNS • Between 20 and 25 people involved • Our first year of operation completes in November

Questions?
