UNM CS Dept. Profile

Technological Networks: From Empirical Laws to Theory
Stephanie Forrest
Dept. of Computer Science
University of New Mexico
http://cs.unm.edu/~forrest
forrest@cs.unm.edu

UNM CS Dept. Profile
• 18 Faculty:
  • 9 tenured (5 full, 4 associate), 5 assistant, 1 lecturer
  • 3 openings (2 junior, 1 senior)
  • Prince of Asturias Endowed Chair in Information Technology
  • External faculty appointments from other departments and national labs
• Close collaborations with SFI and national labs (SNL and LANL)
• Students:
  • Degrees: BS: ~40, MS: ~35, PhD: ~5
  • Undergraduate: ~200 majors; BS degrees
    • >20% Female; >35% Minorities
  • Graduate: ~120 MS, ~80 PhD
    • >20% Female; 40-60% Foreign
• Funding (2004-2005):
  • Total: $3.5M
  • NSF, DARPA, DOE, NIH, Sandia and Los Alamos

UNM CS Dept. Research
• Strongly interdisciplinary:
  • Adaptive computation
  • New paradigms of computing (molecular and quantum computation)
  • Computational biology and bioinformatics (phylogenetic tree reconstruction, radiology, RNAi)
  • Graphics and visualization
• High-performance computing:
  • Light-weight distributed operating systems
• Automated reasoning and machine learning:
  • Otter
  • POMDP
• Complex networks:
  • Provably robust scalable algorithms for P2P networks
  • Phase transitions in NP-complete problems

Themes of Talk
• The real world isn't exactly scale-free.
• Understanding and predicting network structure is important for engineering:
  • Network properties can be exploited to enhance computer security (computer epidemiology, Border Gateway Protocol).
• We lack theory to explain and predict the structure of technological networks:
  • Preferential attachment isn’t good enough.
• Initial steps toward theory.

Technological Networks
• Distribute resources: energy, materials, information
• Energy distribution: power grids, gas pipelines
• Transportation: highways, airline routes
• The Internet: physical connectivity, autonomous systems (AS), World Wide Web
• Social contacts, e.g., email
• Microarchitecture

Network Structure
• Network topology affects network properties:
  • Shortest distance between two nodes
  • Bisection width
  • Rate and extent of contagion
• Analysis:
  • Epidemiological models
  • The epidemic threshold
• Degree distribution of network:
  • Scale-free (power-law) networks: $P_k \propto k^{-c}$, where $P_k$ is the fraction of nodes with degree $k$
• Controlling infections on scale-free networks:
  • Random vaccination (e.g., anti-virus software) is ineffective.
  • Targeted vaccination of high-connectivity nodes.
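Why degree distribution matters for the epidemic threshold can be seen numerically. Assuming the standard heterogeneous mean-field result for SIS-type epidemics, $\lambda_c = \langle k \rangle / \langle k^2 \rangle$, the sketch below evaluates that ratio for a power law truncated to $1 \le k \le k_{\max}$. The truncation and the function names are illustrative assumptions. For $c \le 3$ the second moment keeps growing with $k_{\max}$, so the threshold drifts toward zero. This is one way to see why random vaccination struggles on scale-free networks.

```python
# Sketch: mean-field SIS epidemic threshold lambda_c = <k> / <k^2>,
# evaluated for a truncated power law P_k ∝ k^(-c) on 1 <= k <= k_max.
# The truncation range and function names are illustrative assumptions.


def degree_moments(c, k_max, k_min=1):
    """Return (<k>, <k^2>) for P_k proportional to k^(-c) on k_min..k_max."""
    ks = range(k_min, k_max + 1)
    weights = [k ** -c for k in ks]          # unnormalized P_k
    norm = sum(weights)
    first = sum(k * w for k, w in zip(ks, weights)) / norm
    second = sum(k * k * w for k, w in zip(ks, weights)) / norm
    return first, second


def sis_threshold(c, k_max):
    """Mean-field SIS threshold <k> / <k^2> for the truncated power law."""
    first, second = degree_moments(c, k_max)
    return first / second


if __name__ == "__main__":
    for c in (2.5, 3.5):
        for k_max in (10**2, 10**3, 10**4):
            print(f"c={c} k_max={k_max:>6} threshold={sis_threshold(c, k_max):.4f}")
```

For c = 2.5 the printed threshold shrinks roughly tenfold as k_max goes from 10^2 to 10^4. For c = 3.5 it barely moves.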

Example 1: Computer Epidemiology
Justin Balthrop, Mark Newman, Matt Williamson
• Viruses and worms spread over networks of contacts between computers:
  • Email address books
  • URL links
• Different types of networks are exploited by different types of infections.
[Figure: UNM CS Dept. network of address books]

Degree Distributions of Four Networks Relevant to Computer Security
• Not scale-free.
• Targeted vaccination unlikely to be effective:
  • > 10% of nodes required for address book data
  • > 87% of nodes required for email traffic data
• Computer infections can choose their own topology, so network topology is not static.
• Viruses spread faster than repairs.
Science 304:527-529 (2004)
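A judgment like "not scale-free" depends on how the degree data are fitted. A common starting point is the maximum-likelihood estimate of a power-law exponent, $\hat c = 1 + n \left[\sum_i \ln(x_i / x_{\min})\right]^{-1}$, over the $n$ values at or above a chosen cutoff $x_{\min}$. The sketch below applies this continuous-variable estimator to synthetic data drawn from a known power law, which makes it possible to check that the estimator recovers the exponent. Several caveats apply. For integer degrees the estimator is only an approximation. The choice of $x_{\min}$ is left to the caller. A fitted exponent alone does not show that a power law fits well, so a goodness-of-fit test or a comparison against alternative distributions is still needed. The function names are hypothetical, and the data are not from the study cited above.

```python
import math
import random


def fit_power_law_exponent(values, x_min):
    """Continuous MLE of c in p(x) ∝ x^(-c), using only values >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0:
        raise ValueError("need values strictly above x_min")
    return 1.0 + len(tail) / log_sum


def sample_power_law(n, c, x_min, rng):
    """Draw n values from p(x) ∝ x^(-c), x >= x_min, by inverse-transform sampling."""
    # 1 - rng.random() lies in (0, 1], so the power below is always defined.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (c - 1.0)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    data = sample_power_law(20_000, 2.5, 1.0, rng)
    print(round(fit_power_law_exponent(data, 1.0), 3))  # close to the true value 2.5
```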

Throttling: Generic Control of Epidemics
Matt Williamson and Justin Balthrop
• Control network topology in time rather than space.
• Limit the rate at which a computer can make new connections:
  • Limits the spreading rate rather than stopping the infection.
• Assumes that virus traffic is significantly different from normal traffic:
  • Nimda infects up to 400 new machines per second, compared to a normal rate of connections to new Web servers of about one connection per second or slower (Williamson, 2002).
  • Throttling Nimda would have increased the epidemic time from one day to over one year.
• Advantages:
  • Effective when the form of infection is not known in advance.
  • Reduces the amount of traffic generated during an epidemic.
• Limitations:
  • Effectiveness decreases if not all nodes are throttled (throttling is altruistic).
  • Stealth attacks.
• Implementations:
  • RIOT
  • HP’s Virus Throttle
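The throttling idea can be sketched as a rate limiter on connections to new hosts. This is a simplified illustration, not RIOT or HP’s Virus Throttle. It rests on several assumptions. Connections to a small working set of recently contacted hosts (size 5 here, a hypothetical default) proceed immediately. Connections to any other host wait in a delay queue. One queued host is released per call to `tick()`, and the caller is assumed to call `tick()` once per second. The class, method, and function names are hypothetical.

```python
from collections import OrderedDict, deque


class ConnectionThrottle:
    """Simplified new-host rate limiter (hypothetical names and defaults)."""

    def __init__(self, working_set_size=5):
        self.working_set_size = working_set_size
        self.working_set = OrderedDict()  # recently contacted hosts, least recent first
        self.delay_queue = deque()        # connections to new hosts awaiting release

    def _remember(self, host):
        """Add host to the working set, evicting the least recently used entry if full."""
        self.working_set[host] = None
        self.working_set.move_to_end(host)
        while len(self.working_set) > self.working_set_size:
            self.working_set.popitem(last=False)

    def request(self, host):
        """Return True if the connection may proceed now; otherwise queue it and return False."""
        if host in self.working_set:
            self.working_set.move_to_end(host)
            return True
        self.delay_queue.append(host)
        return False

    def tick(self):
        """Assumed to run once per second: release the oldest queued host and its queued duplicates."""
        if not self.delay_queue:
            return []
        host = self.delay_queue.popleft()
        self._remember(host)
        duplicates = [h for h in self.delay_queue if h == host]
        self.delay_queue = deque(h for h in self.delay_queue if h != host)
        return [host] + duplicates


def simulate_worm(seconds, new_hosts_per_second):
    """Count connections that get out when every attempt targets a never-seen host."""
    throttle = ConnectionThrottle()
    sent = 0
    for second in range(seconds):
        for i in range(new_hosts_per_second):
            if throttle.request(f"host-{second}-{i}"):
                sent += 1
        sent += len(throttle.tick())
    return sent


if __name__ == "__main__":
    print(simulate_worm(10, 400))  # prints 10
```

A worm attempting 400 new hosts per second gets one through per simulated second. Repeat connections to recently used hosts are not delayed at all, which is why normal traffic is assumed to be largely unaffected.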

Responsive I/O Throttle (RIOT)
Justin Balthrop and Matt Williamson
• Vision: an adaptive desktop firewall:
  • Hamper worms, viruses, DoS, misconfigurations, etc.
  • Graduated automated response (throttling) on all connections.
  • Adaptive (learns normal behavior).
  • Robust to false positives.
• Personal desktop firewalls:
  • Coarse-grained
  • Static and preprogrammed
• Generic rate limit (throttle):
  • Delay network connections that occur at an anomalous rate.
  • Detector activation triggers delay, and packets are dropped.
• How to detect anomalous connections?
  • Set of detectors (lymphocytes) that observe TCP connections:
    • Data path (below)
    • Meta-information (direction, TCP flags)
  • Each detector matches some portion of IP space.
  • Each detector has its own normal activity level.
[Figure: data path bitstring with fields of 32, 32, 16, and 16 bits]

Learning and Throttling Connections
• A new connection is initiated (SYN packet).
• The connection information is translated to a bitstring.
• The bitstring is shown to all detectors.
• An immature detector exceeds its activation threshold.
• The detector’s activation threshold is raised.
• Detectors with stable activation thresholds become mature.
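The learning steps above can be sketched as follows. This is a hypothetical illustration, not the RIOT implementation, and it makes several assumptions:
• A connection's bitstring is the 32-bit source address, 32-bit destination address, 16-bit source port, and 16-bit destination port, concatenated in that order. This matches the 32/32/16/16 field widths in the data-path figure.
• A detector covers the region of that space selected by a bit mask.
• Activation is counted over fixed observation windows.
• An immature detector becomes mature after three windows without a threshold increase.
`encode_connection`, `Detector`, `dst_prefix_detector`, and this maturation rule are all invented for the example.

```python
import ipaddress


def encode_connection(src_ip, dst_ip, src_port, dst_port):
    """Pack a TCP connection 4-tuple into one 96-bit integer (32 + 32 + 16 + 16 bits)."""
    for port in (src_port, dst_port):
        if not 0 <= port <= 0xFFFF:
            raise ValueError(f"port out of range: {port}")
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src << 64) | (dst << 32) | (src_port << 16) | dst_port


class Detector:
    """Hypothetical detector: matches bitstrings under a mask and learns its own threshold."""

    def __init__(self, mask, value, threshold=1, maturation_windows=3):
        self.mask = mask
        self.value = value & mask
        self.threshold = threshold            # learned "normal" matches per window
        self.maturation_windows = maturation_windows
        self.stable_windows = 0
        self.mature = False
        self.activation = 0                   # matches seen in the current window

    def observe(self, bits):
        """Return True if this connection should be delayed."""
        if (bits & self.mask) != self.value:
            return False
        self.activation += 1
        # Only a mature detector that exceeds its learned threshold asks for a delay.
        return self.mature and self.activation > self.threshold

    def end_window(self):
        """Learning step, assumed to run at the end of each observation window."""
        if not self.mature:
            if self.activation > self.threshold:
                self.threshold = self.activation  # immature detector exceeded: raise threshold
                self.stable_windows = 0
            else:
                self.stable_windows += 1
                self.mature = self.stable_windows >= self.maturation_windows
        self.activation = 0


def dst_prefix_detector(prefix, **kwargs):
    """Detector covering every connection whose destination lies in an IPv4 prefix."""
    net = ipaddress.IPv4Network(prefix)
    return Detector(int(net.netmask) << 32, int(net.network_address) << 32, **kwargs)
```

In this sketch a detector that saw two matching connections per window while learning will, once mature, delay the third matching connection in a window.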

Autonomic Responses: A Repertoire

Example 2: Inter-Domain Routing
Josh Karlin, Stephanie Forrest, and Jennifer Rexford
• ~20,000 Autonomous Systems (ASs) connected via the Border Gateway Protocol (BGP).
• ASs route blocks of IP addresses, known as prefixes:
  • 64.106.0.0/17 (owned by UNM)
  • ~170,000 prefixes owned by ASs today.
• Border Gateway Protocol (BGP):
  • Tell neighbors about new routes (announcements).
  • Tell neighbors about old routes gone bad (withdrawals).
  • That’s it.
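Python's standard `ipaddress` module can illustrate prefixes and the effect of announcements and withdrawals. The `PrefixTable` class below is a hypothetical toy, not a BGP implementation. It keeps one origin AS per prefix and forwards by longest-prefix match. It ignores AS paths and BGP best-path selection. AS numbers 64512 and 64999 come from the private-use range and are placeholders, not real origins. The example shows why a more specific announcement (a sub-prefix) draws traffic away from the covering prefix.

```python
import ipaddress


class PrefixTable:
    """Toy routing table: BGP-style announce/withdraw plus longest-prefix match."""

    def __init__(self):
        self.routes = {}  # ip_network -> origin AS number

    def announce(self, prefix, origin_as):
        """Install or replace the route for a prefix."""
        self.routes[ipaddress.ip_network(prefix)] = origin_as

    def withdraw(self, prefix):
        """Remove the route for a prefix, if present."""
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return (prefix, origin AS) of the most specific covering prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return best, self.routes[best]


if __name__ == "__main__":
    table = PrefixTable()
    table.announce("64.106.0.0/17", 64512)   # covering prefix (placeholder origin AS)
    table.announce("64.106.64.0/18", 64999)  # more specific sub-prefix from another AS
    print(table.lookup("64.106.100.1"))      # the /18 wins by longest-prefix match
    table.withdraw("64.106.64.0/18")
    print(table.lookup("64.106.100.1"))      # traffic falls back to the /17
```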

BGP Networks are Interesting and Important
• Distributed:
  • Nodes are AUTONOMOUS systems.
  • No centralized routing information (routes stored and maintained locally).
  • No authentication of new nodes or routes.
• Dynamic:
  • Network connectivity changes routinely and continually.
  • Network updates are spread through local contact (BGP).
• Confluence of technological and economic constraints:
  • A “policy” network as well as a routing network.
  • All inter-domain Internet traffic relies upon BGP.
• Vulnerable:
  • Trivial to inject false routing information into the network.
  • Man-in-the-middle attacks.
• Pretty Good BGP (PGBGP) and Internet Alert Registry (IAR):
  • Throttle the adoption of new routes.

PGBGP Algorithm
• Main idea: delay suspicious routes.
  • Lower the preference of suspicious routes (24 hr).
• Detection:
  • Monitor BGP update messages.
  • Treat origin ASs for a prefix seen within the past few days as normal.
  • Treat new origin ASs as suspicious for 24 hours, then accept as normal (possible prefix hijack).
  • Treat new sub-prefixes as suspicious for 24 hours, then accept as normal (possible sub-prefix hijack).
• Response:
  • Suspicious origin AS routes are temporarily given low local preference.
  • Suspicious sub-prefixes are temporarily ignored (not forwarded to).
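The detection rules can be sketched as a small history table keyed by (prefix, origin AS). This toy version is not the PGBGP implementation. The 24-hour quarantine comes from the rules above, and everything else is an assumption:
• A 72-hour history window stands in for "the past few days".
• A brand-new prefix with no covering history is accepted.
• Times are measured in hours.
• The caller reports every route it currently holds through `observe`.
All names are hypothetical.

```python
import ipaddress

SUSPICIOUS_HOURS = 24   # quarantine length from the detection rules
HISTORY_HOURS = 72      # hypothetical value for "the past few days"


class PGBGPMonitor:
    """Toy sketch of PGBGP-style detection; names and data layout are hypothetical."""

    def __init__(self, history_hours=HISTORY_HOURS, suspicious_hours=SUSPICIOUS_HOURS):
        self.history_hours = history_hours
        self.suspicious_hours = suspicious_hours
        self.first_seen = {}  # (network, origin AS) -> hour the current run of sightings began
        self.last_seen = {}   # (network, origin AS) -> hour of the latest sighting

    def _trusted(self, key, now):
        """Trusted = seen within the history window and first seen at least one quarantine ago."""
        last = self.last_seen.get(key)
        if last is None or now - last > self.history_hours:
            return False
        return now - self.first_seen[key] >= self.suspicious_hours

    def classify(self, prefix, origin, now):
        net = ipaddress.ip_network(prefix)
        if self._trusted((net, origin), now):
            return "normal"
        trusted = [key for key in self.last_seen if self._trusted(key, now)]
        if any(n == net for n, _ in trusted):
            return "suspicious-origin"     # response: temporarily lower local preference
        if any(n.version == net.version and n != net and net.subnet_of(n) for n, _ in trusted):
            return "suspicious-subprefix"  # response: temporarily ignore (do not forward to)
        return "normal"                    # assumption: no history to compare against

    def observe(self, prefix, origin, now):
        """Classify an announcement, then record it in the history."""
        verdict = self.classify(prefix, origin, now)
        key = (ipaddress.ip_network(prefix), origin)
        last = self.last_seen.get(key)
        if last is None or now - last > self.history_hours:
            self.first_seen[key] = now     # new or long-absent pair: restart the clock
        self.last_seen[key] = now
        return verdict


if __name__ == "__main__":
    monitor = PGBGPMonitor()
    monitor.observe("64.106.0.0/17", 64512, 0)
    monitor.observe("64.106.0.0/17", 64512, 30)
    print(monitor.observe("64.106.0.0/17", 64999, 30))   # new origin for a known prefix
    print(monitor.observe("64.106.64.0/18", 64999, 30))  # new sub-prefix of a known prefix
```

Because a suspicious route is only deprioritized or set aside for a day rather than rejected, a legitimate change such as a new origin AS is accepted once it has persisted past the quarantine.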

PGBGP Advantages
• Incremental deployability:
  • No change to BGP protocol, just to path selection.
  • Immediate benefits to adopting AS and customers.
• Automated and immediate response:
  • Avoid using and propagating the bogus route.
  • Network has a chance to stop the attack before it spreads.
• Robust to false positives:
  • Lowering preference for suspicious routes.
  • No loss of reachability.
  • Accidental short-term delays do no harm.
• Offline investigation of suspicious routes:
  • Internet Alert Registry, active probing, …
• Adaptive, simple.

Incremental Deployment
[Figure panels: Prefix Hijack; Subprefix Hijack]
• Limitations:
  • Doesn’t address path spoofing or redistribution attacks.
  • Negligence.
ICNP (2006)

BGP Network Structure
Petter Holme and Josh Karlin
• Barabási-Albert (BA) model (Barabási and Albert, 1999):
  • Vertices and edges added iteratively.
  • Probability of attaching to vertex i is proportional to its degree k(i).
• Inet model (Winick and Jamin, 2000):
  • No simple growth principle.
  • Generate random graph with known degree distribution.
  • Augment to mimic additional known correlations (e.g., connecting all high-degree nodes).
Proc. Royal Soc. A (in press)
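The BA growth rule can be sketched in a few lines. This is an illustrative implementation, not the code used in the study. The seed graph (a complete graph on m + 1 vertices) and the function names are assumptions, and the Inet model is not shown. Sampling uniformly from a list of edge endpoints gives attachment probability proportional to k(i) without computing the distribution explicitly.

```python
import random
from collections import Counter


def barabasi_albert(n, m, seed=None):
    """Grow an undirected graph by preferential attachment; return its edge list.

    Each new vertex attaches m edges to distinct existing vertices, choosing
    vertex i with probability proportional to its current degree k(i).
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    # Assumed seed graph: a complete graph on m + 1 vertices (every degree is m).
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each vertex appears here once per edge end, so a uniform draw from this
    # list selects vertex i with probability k(i) / (2 * number_of_edges).
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def degree_counts(edges):
    """Map each vertex to its degree."""
    degrees = Counter()
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


if __name__ == "__main__":
    degrees = degree_counts(barabasi_albert(2000, 2, seed=1))
    print("min degree:", min(degrees.values()), "max degree:", max(degrees.values()))
```

Degree statistics from a synthetic graph like this can then be compared against the measured AS graph, which is one way to check whether preferential attachment reproduces its structure.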

Tentative Conclusions
• Real AS graph is more heterogeneous than can be expected from its degree distribution alone:
  • Core providers in the low-d tail
  • Peak at d=3 (vertices directly connected to the core)
  • Second peak at d=4 (vertices directly connected to d=3 nodes)
  • More structure in the periphery than predicted by earlier models.
• Preferential attachment is a poor model for network growth.
• What constraints determine network architecture and growth?

Allometric Scaling in Biology
• Dominant design constraint:
  • Distribute resources to every cell in the organism.
• Internal, space-filling, hierarchical networks (vascular system)
• Invariant terminal units (capillaries)
• Optimality (minimize transport time, maximize metabolism)

Examples: Scaling in Software
H. Inoue
[Figures: LimeWire behavior; HelloWorld: unique function calls vs. invocation frequency]

Example: Scaling in Software
Dave Ackley and Terry Van Belle
• How to measure evolvability?
  • Likelihood (how likely a location is to change).
  • Impact (how much a change in one location affects other locations).
  • Work = Likelihood × Impact:
    • Software maintenance costs.
    • Expected time to evolve.
• Study long-lived Java code bases:
  • Code change sizes and frequencies are roughly power-law.
Van Belle, 2004
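The Work measure can be made concrete with a hypothetical proxy computed from a change history. The sketch below assumes each commit is recorded as the set of locations (files, classes, or methods) it changed. It takes Likelihood for a location as the fraction of commits that touch it. It takes Impact as the average number of other locations changed alongside it. These operational definitions are assumptions for illustration, not the measures used by Ackley and Van Belle.

```python
from collections import Counter


def work_scores(commits):
    """Return {location: likelihood * impact} from a list of sets of changed locations."""
    if not commits:
        return {}
    touches = Counter()      # number of commits that changed each location
    co_changes = Counter()   # other locations changed alongside it, summed over commits
    for changed in commits:
        for loc in changed:
            touches[loc] += 1
            co_changes[loc] += len(changed) - 1
    scores = {}
    for loc, count in touches.items():
        likelihood = count / len(commits)   # fraction of commits touching loc
        impact = co_changes[loc] / count    # mean number of co-changed locations per touch
        scores[loc] = likelihood * impact
    return scores


if __name__ == "__main__":
    # Hypothetical history: each set lists the files changed by one commit.
    history = [{"Parser.java", "Lexer.java"}, {"Parser.java"},
               {"Parser.java", "Lexer.java", "Ast.java"}, {"Main.java"}]
    for loc, work in sorted(work_scores(history).items(), key=lambda kv: -kv[1]):
        print(f"{loc}: {work:.2f}")
```

Under these definitions the product reduces to the total number of co-changes for a location divided by the number of commits. A location that changes often but in isolation therefore scores low.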

A Theory of Network Scaling: Conclusions
• Why it’s important:
  • Networking infrastructure continues to expand by orders of magnitude:
    • NSF GENI project: redesign the Internet from the ground up.
    • Interplanetary networking.
  • Architecture:
    • Move from performance-oriented to power-aware designs.
    • The end of silicon.
  • Scaling problems in software, security problems everywhere.
• Why it’s hard:
  • Terminal units aren’t necessarily invariant in size.
  • Dimensionality and geometry are not obvious.
  • May require new mathematics.
