Trends in Internet Measurement
Fall 2004
Paul Barford, Assistant Professor, Computer Science, University of Wisconsin
Motivation
• The Internet is gigantic, complex, and constantly evolving
• Began as something quite simple
• Infrequent use of “scientific method” in network research
• Historical artifact
• Lack of inherent measurement capability
• Decentralization and privacy concerns
• Recognition of importance of empirically-based research
• Critical trend over past five years (Internet Measurement Conf.)
• Good research hypothesis + good data + good analysis = good research results
• Focus of this talk: “good data”, where we’ve been and where we’re going
wail.cs.wisc.edu
In the beginning…
• Measurement was part of the original ARPAnet in ’70
• Kleinrock’s Network Measurement Center at UCLA
• Resources in the network were reserved for measurement
• Formation of Network Measurement Group in ’72
• RFC 323: who is involved and what is important
• First network measurement publication in ’74
• “On Measured Behavior of the ARPA Network,” Kleinrock and Naylor
• No significant difference between operations and research
• Size kept things tractable
From ARPAnet to Internet
• In the ’80s, measurement-based publications increased
• “The Experimental Literature of the Internet: An Annotated Bibliography,” J. Mogul, ’88.
• RFC 1262: Guidelines for Internet Measurement Activities, 1991
• V. Cerf, “Measurement of the Internet is critical for future development, evolution and deployment planning.”
• What happened?
• “On the Self-Similar Nature of Ethernet Traffic,” Leland et al., ’94.
• Novel measurement combined with thorough analysis
• A transition point between operational and research measurement (?)
Gold in the streets in the ’90s
• Lots of juicy problems garnered much attention in the ’90s
• Transport, ATM, QoS, Multicast, Lookup scalability, etc.
• The rise of simulation (aaagggghhhhh!!!!)
• Measurement activity didn’t die…
• Research focus on Internet behavior and structure
• Self-similarity established as an invariant in a series of studies
• Paxson’s NPD study from ’93 to ’97
• Routing (BGP) studies by Labovitz et al.
• Structural properties (the Internet as a graph) by Govindan et al.
• Organizations focused on measurement
• National Laboratory for Applied Network Research (’95)
• Cooperative Association for Internet Data Analysis (’97)
Measurement must be hard…
• Well, not really… lots of folks are measuring the Internet
• See CAIDA or SLAC pages
• Businesses get paid to measure the Internet
• Lots of tools are available for Internet measurement
• See CAIDA and SLAC pages
• Dedicated hardware
• Public infrastructures
So, what’s the problem?
• “Strategies for Sound Internet Measurement,” Paxson ’04.
• Lack of consistent methods for measurement-based experiments
• Problems faced in other sciences years ago
• Issues of scale in every direction
• What is representative?
• HUGE, HIGH-DIMENSION data sets make things break
• Disconnect between measurements for operations and measurements for research
• Operational interests: SLAs, billing, privacy, …
• Research interests: network-wide properties
Current measurement trends
• Open end host network measurement infrastructures
• Available for a variety of uses
• Large public data repositories
• Domain specific
• Suitable for longitudinal studies
• Network telescope monitors
• Malicious traffic
• Laboratory-based testbeds
• Bench environments
• Standard anonymization methods
• Address privacy concerns
End host infrastructures
• Paxson’s NPD study: an end-host prototype
• Accounts on 35 systems distributed throughout the Internet
• Active, end-to-end measurement focus
• National Internet Measurement Infrastructure (NIMI) and others evolved from NPD
• Perhaps a bit too ambitious at the time
• Today’s end host infrastructure “success story”: PlanetLab
PlanetLab overview
• Collaboration between Intel, Princeton, Berkeley, Washington, others starting in early ’02
• Began as a distributed, virtualized system project
• Peer-to-peer overlay systems were getting hot
• Applications BOF at SIGCOMM ’02 had only 6/80 people
• Systems were donated to an initial set of sites in ’02
• Most major universities and Abilene POPs
• Available to members who host systems
• Developers have done a fine job creating a management environment
• Isolates individual experiments from each other
PlanetLab sites
• 449 nodes at 209 sites (source: www.planet-lab.org)
End host infrastructures & SP
• End host infrastructures are primarily for active measurement
• Generate probes and measure responses
• Problem domains
• Network structure via tomography
• Network distance estimation
• End-to-end resource estimation
• End-to-end packet dynamics
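Note: “generate probes and measure responses” can be sketched in a few lines. The following is a minimal illustration, not a tool from the talk; the TCP-handshake probe and the names `tcp_connect_probe`, `summarize` and `run_probes` are assumptions made for this example.

```python
import socket
import statistics
import time


def tcp_connect_probe(host, port, timeout=2.0):
    """One active probe: time a TCP handshake to host:port.

    Returns the elapsed time in seconds, or None if the probe failed
    (refused, unreachable, or timed out). Hypothetical helper.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        return None


def summarize(samples):
    """Reduce raw probe results to loss count and RTT statistics."""
    rtts = [s for s in samples if s is not None]
    return {
        "sent": len(samples),
        "lost": len(samples) - len(rtts),
        "min_rtt": min(rtts) if rtts else None,
        "median_rtt": statistics.median(rtts) if rtts else None,
    }


def run_probes(probe, count, spacing=0.0):
    """Call probe() count times, sleeping spacing seconds after each call."""
    samples = []
    for _ in range(count):
        samples.append(probe())
        time.sleep(spacing)
    return summarize(samples)
```

A run against a real target would look like `run_probes(lambda: tcp_connect_probe("example.org", 80), count=10, spacing=1.0)`, where the host and port are placeholders. Passing the probe in as a callable keeps the summary logic testable without touching the network.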
Large public data repositories
• First data repository: Internet Traffic Archive (LBL)
• Hodgepodge of traces from various projects
• Current projects are more focused
• Passive Measurement and Analysis Project
• Packet traces from high performance monitors
• Abilene Observatory
• Flow traces from the Internet2 backbone routers
• Route Views/RIPE
• BGP routing updates from ~150 networks
• Datasets for network security
• DHS project focused on making attack and intrusion data available for research
Data repositories & SP
• Most of the data in the aforementioned repositories was gathered via passive means
• Counters/logs on devices
• Installed instrumentation
• Configuration to measure specific traffic (BGP)
• Problem domains
• Anomaly detection
• Traffic dynamics
• Routing dynamics
Network telescopes
• Simple observation 1: number of globally routed IP addresses ≠ number of live hosts
• Network address translation
• Networks (ranges of IP addresses) are routed
• Simple observation 2: traffic to/from standard services should only arrive at live hosts
• Misconfigurations and malicious traffic are the exceptions
• Network telescope = traffic monitor on routed but otherwise unused IP addresses
• This traffic is otherwise usually dropped at the border router
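Note: the telescope definition above (traffic to routed but otherwise unused addresses) can be sketched as a simple filter. This is an illustration only; the prefix, the live-host list, the packet tuples and the name `make_telescope_filter` are assumptions for the example, using documentation address ranges.

```python
import ipaddress


def make_telescope_filter(routed_prefix, live_hosts):
    """Build a predicate that is True for destinations inside the routed
    prefix that are not assigned to any live host (hypothetical helper)."""
    network = ipaddress.ip_network(routed_prefix)
    live = {ipaddress.ip_address(h) for h in live_hosts}

    def is_telescope_traffic(dst):
        addr = ipaddress.ip_address(dst)
        return addr in network and addr not in live

    return is_telescope_traffic


# Assumed example: a routed /24 in which only two addresses are live.
is_dark = make_telescope_filter("192.0.2.0/24", ["192.0.2.10", "192.0.2.20"])

# (source, destination) pairs standing in for observed packets.
packets = [
    ("198.51.100.7", "192.0.2.10"),   # to a live host: normal traffic
    ("203.0.113.9", "192.0.2.77"),    # to an unused address: telescope traffic
    ("203.0.113.9", "198.51.100.1"),  # outside the routed prefix: not ours
]
unsolicited = [p for p in packets if is_dark(p[1])]
```

Only the second packet lands in `unsolicited`, since nothing legitimate should be addressed to an unused address.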
So, what’s the point?
• Bad guys don’t know which IP addresses in a network are live
• Random and systematic scanning commonly used
• Spoofed source addresses are used in DoS attacks
• Misconfigurations are fairly rare
• Ergo, network telescopes can provide important perspective on malicious traffic
• Most importantly, a clean signal
• Implementation is fairly simple
• Honeypots of live systems or honeypot-specific monitors
What do we see?
• “Characteristics of Internet Background Radiation,” Yegneswaran et al., ’04.
• Active monitors (small, medium, large) at 3 sites
• Traffic is dominated by activity on common services
• Worms and probes targeting HTTP and NetBIOS
• The focus of our study
• Traffic is highly variable and diverse
• Perspectives from 3 monitors are quite different
• Traffic mutates rapidly
• Much deeper analysis is necessary
Network telescopes & SP
• An emerging, rich source of data
• Network security is critically important
• Problem domains
• Outbreak and attack detection
• Collaborative monitoring
• Dynamic quarantine
• (Misconfiguration analysis)
Laboratory-based testbeds
• Most scientific disciplines commonly use bench environments to conduct research
• Control
• Instrumentation
• Repeatability
• Network research community has relied on analytic modeling, simulation and empirical measurement
• Openly available bench environments for network research are emerging
• EMULAB at Utah: collection of end hosts
• Wisconsin Advanced Internet Lab: collection of routers and end hosts
Laboratory testbeds & SP
• The effectiveness of bench research hinges on research hypothesis and experimental design
• Aspects of scale (emergent behavior) are difficult to capture
• Problem domains
• Inference tool analysis
• Protocol (implementation) analysis
• Anomaly detection
• Network system evaluation
Data anonymization
• Lots of people measure, most are scared s*!#less about sharing data
• This is a legal issue
• No standards (sure, payloads are off limits, but addresses?)
• Don’t ask, don’t tell?
• The community is developing tools for trace anonymization
• “A High-Level Programming Environment for Packet Trace Anonymization and Transformation,” Pang et al., ’03.
• Prefix preserving address anonymization
• Payload hashing
• Probably no direct SP application
• But, implications vis-à-vis future data availability
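Note: a rough sketch of what “prefix preserving address anonymization” and “payload hashing” mean in practice. This is not the tool from Pang et al.; it is a simplified bit-by-bit construction in the spirit of Crypto-PAn, with HMAC-SHA256 swapped in as the keyed pseudorandom function. The function names and the key are assumptions for the example.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr, key):
    """Prefix-preserving anonymization of one IPv4 address.

    Each output bit is the input bit XORed with a keyed pseudorandom bit
    that depends only on the higher-order input bits before it, so two
    addresses sharing a k-bit prefix map to outputs sharing a k-bit prefix.
    """
    bits = int(ipaddress.IPv4Address(addr))
    out = 0
    for i in range(32):
        prefix = bits >> (32 - i)  # the i most significant bits (0 when i == 0)
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (bits >> (31 - i)) & 1
        out = (out << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(out))


def hash_payload(payload, key):
    """Replace a payload with a keyed digest: equal payloads stay
    comparable, but the content itself is not released."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def common_prefix_len(a, b):
    """Number of leading bits two IPv4 addresses have in common."""
    x = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - x.bit_length()
```

With the same key, `common_prefix_len(anonymize_ipv4(a, key), anonymize_ipv4(b, key))` equals `common_prefix_len(a, b)` for any two addresses, which is what keeps subnet structure usable in a shared trace. The key has to stay secret, since anyone holding it can reverse the mapping bit by bit.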
Summary
• Trends over past 30 years
• Divergence of research and operations
• Decline of importance of measurement in research
• Empirical study of the Internet as an artifact
• Current trends
• Rise of measurement as a discipline
• Open infrastructures and network testbeds
• Large-scale domain specific data repositories
• Novel measurement methods
• Future trends
• ??
• Embedded measurement systems