Explore the historical evolution of Internet measurement from the Arpanet to current trends, focusing on the importance of good data for research results. Learn about end-host infrastructures like PlanetLab and current measurement methodologies.
Trends in Internet Measurement • Fall, 2004 • Paul Barford, Assistant Professor, Computer Science, University of Wisconsin
Motivation • The Internet is gigantic, complex, and constantly evolving • Began as something quite simple • Infrequent use of “scientific method” in network research • Historical artifact • Lack of inherent measurement capability • Decentralization and privacy concerns • Recognition of importance of empirically-based research • Critical trend over past five years (Internet Measurement Conf.) • Good research hypothesis + good data + good analysis = good research results • Focus of this talk: “good data” - where we’ve been and where we’re going wail.cs.wisc.edu
In the beginning… • Measurement was part of the original Arpanet in ’70 • Kleinrock’s Network Measurement Center at UCLA • Resources in the network were reserved for measurement • Formation of Network Measurement Group in ’72 • RFC 323 – who is involved and what is important • First network measurement publication in ’74 • “On Measured Behavior of the ARPA Network,” Kleinrock and Naylor • No significant difference between operations and research • Size kept things tractable
From ARPAnet to Internet • In the 80’s, measurement-based publications increased • “The Experimental Literature of the Internet: An Annotated Bibliography”, J. Mogul, ’88. • RFC 1262 – Guidelines for Internet Measurement Activities, 1991 • V. Cerf, “Measurement of the Internet is critical for future development, evolution and deployment planning.” • What happened? • “On the Self-Similar Nature of Ethernet Traffic”, Leland et al., ’94. • Novel measurement combined with thorough analysis • A transition point between operational and research measurement (?)
Gold in the streets in the 90’s • Lots of juicy problems garnered much attention in 90’s • Transport, ATM, QoS, Multicast, Lookup scalability, etc. • The rise of simulation (aaagggghhhhh!!!!) • Measurement activity didn’t die… • Research focus on Internet behavior and structure • Self-similarity established as an invariant in series of studies • Paxson’s NPD study from ’93 to ’97 • Routing (BGP) studies by Labovitz et al. • Structural properties (the Internet as a graph) by Govindan et al. • Organizations focused on measurement • National Laboratory for Applied Network Research (’95) • Cooperative Association for Internet Data Analysis (’97)
Measurement must be hard… • Well, not really…lots of folks are measuring the Internet • See CAIDA or SLAC pages • Businesses get paid to measure the Internet • Lots of tools are available for Internet measurement • See CAIDA and SLAC pages • Dedicated hardware • Public infrastructures
So, what’s the problem? • “Strategies for Sound Internet Measurement,” Paxson ’04. • Lack of consistent methods for measurement-based experiments • Problems faced in other sciences years ago • Issues of scale in every direction • What is representative? • HUGE, HIGH-DIMENSION data sets make things break • Disconnect between measurements for operations and measurements for research • Operational interests: SLAs, billing, privacy, … • Research interests: network-wide properties
Current measurement trends • Open end host network measurement infrastructures • Available for a variety of uses • Large public data repositories • Domain specific • Suitable for longitudinal studies • Network telescope monitors • Malicious traffic • Laboratory-based testbeds • Bench environments • Standard anonymization methods • Address privacy concerns
End host infrastructures • Paxson’s NPD study: an end-host prototype • Accounts on 35 systems distributed throughout the Internet • Active, end-to-end measurement focus • National Internet Measurement Infrastructure (NIMI) and others evolved from NPD • Perhaps a bit too ambitious at the time • Today’s end host infrastructure “success story”: PlanetLab
PlanetLab overview • Collaboration between Intel, Princeton, Berkeley, Washington, others starting in early ’02 • Began as a distributed, virtualized system project • Peer-to-peer overlay systems were getting hot • Applications BOF at SIGCOMM ’02 had only 6/80 people • Systems were donated to an initial set of sites in ’02 • Most major universities and Abilene POPs • Available to members who host systems • Developers have done a fine job creating a management environment • Isolates individual experiments from each other
PlanetLab sites • 449 nodes at 209 sites (source: www.planet-lab.org)
End host infrastructures & SP • End host infrastructures are primarily for active measurement • Generate probes and measure responses • Problem domains • Network structure via tomography • Network distance estimation • End-to-end resource estimation • End-to-end packet dynamics
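As a rough illustration of "generate probes and measure responses", here is a minimal sketch that times TCP handshakes from an end host. It is not a tool from the talk: the function names `tcp_connect_rtt` and `probe` are hypothetical, and connect time is only a coarse stand-in for round-trip time.

```python
import socket
import time
from typing import List, Optional


def tcp_connect_rtt(host: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Return the seconds taken to complete a TCP handshake, or None on failure."""
    start = time.perf_counter()
    try:
        # The probe: open a connection and close it as soon as it is established.
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:
        # Refused, unreachable, or timed out: no response to measure.
        return None


def probe(host: str, port: int, count: int = 5) -> List[float]:
    """Send `count` probes and keep the samples that got a response."""
    samples = []
    for _ in range(count):
        rtt = tcp_connect_rtt(host, port)
        if rtt is not None:
            samples.append(rtt)
    return samples
```

Calling `probe(host, port)` from many vantage points against the same targets yields the kind of end-to-end samples that distance and resource estimation start from.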
Large public data repositories • First data repository - Internet Traffic Archive (LBL) • Hodgepodge of traces from various projects • Current projects are more focused • Passive Measurement and Analysis Project • Packet traces from high performance monitors • Abilene Observatory • Flow traces from the Internet2 backbone routers • Route Views/RIPE • BGP routing updates from ~150 networks • Datasets for network security • DHS project focused on making attack and intrusion data available for research
Data repositories & SP • Most of the data in aforementioned repositories was gathered via passive means • Counters/logs on devices • Installed instrumentation • Configuration to measure specific traffic (BGP) • Problem domains • Anomaly detection • Traffic dynamics • Routing dynamics
Network telescopes • Simple observation 1: number of globally routed IP addresses ≠ number of live hosts • Network address translation • Networks (ranges of IP addresses) are routed • Simple observation 2: traffic to/from standard services should only arrive at live hosts • Misconfigurations and malicious traffic are the exceptions • Network telescope = traffic monitor on routed but otherwise unused IP addresses • This traffic is otherwise usually dropped at the border router
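The telescope definition above can be sketched as a simple filter: keep only packets whose destination is inside a routed range but is not a live host. This is a sketch under stated assumptions: the address plan (`ROUTED`, `LIVE_HOSTS`) uses documentation prefixes and is purely hypothetical, as are the function names.

```python
import ipaddress
from collections import Counter

# Hypothetical address plan: a routed /24 in which only two hosts are live.
ROUTED = ipaddress.ip_network("192.0.2.0/24")
LIVE_HOSTS = {
    ipaddress.ip_address("192.0.2.10"),
    ipaddress.ip_address("192.0.2.20"),
}


def is_telescope_traffic(dst: str) -> bool:
    """True if dst is routed to us but no live host uses that address."""
    addr = ipaddress.ip_address(dst)
    return addr in ROUTED and addr not in LIVE_HOSTS


def summarize(packets):
    """Count telescope hits per destination port for (src, dst, dport) tuples."""
    hits = Counter()
    for _src, dst, dport in packets:
        if is_telescope_traffic(dst):
            hits[dport] += 1
    return hits
```

Because no legitimate service lives on the unused addresses, every packet counted here is either a misconfiguration or unsolicited traffic.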
So, what’s the point? • Bad guys don’t know which IP addresses in a network are live • Random and systematic scanning commonly used • Spoofed source addresses are used in DoS attacks • Misconfigurations are fairly rare • Ergo, network telescopes can provide important perspective on malicious traffic • Most importantly, a clean signal • Implementation is fairly simple • Honeypots of live systems or honeypot specific monitors
What do we see? • “Characteristics of Internet Background Radiation,” Yegneswaran et al., ’04. • Active monitors (small, medium, large) at 3 sites • Traffic is dominated by activity on common services • Worms and probes targeting HTTP and NetBIOS • The focus of our study • Traffic is highly variable and diverse • Perspectives from 3 monitors are quite different • Traffic mutates rapidly • Much deeper analysis is necessary
Network telescopes & SP • An emerging, rich source of data • Network security is critically important • Problem domains • Outbreak and attack detection • Collaborative monitoring • Dynamic quarantine • (Misconfiguration analysis)
Laboratory-based testbeds • Most scientific disciplines commonly use bench environments to conduct research • Control • Instrumentation • Repeatability • Network research community has relied on analytic modeling, simulation and empirical measurement • Openly available bench environments for network research are emerging • EMULAB at Utah - collection of end hosts • Wisconsin Advanced Internet Lab - collection of routers and end hosts
Laboratory testbeds & SP • The effectiveness of bench research hinges on research hypothesis and experimental design • Aspects of scale (emergent behavior) are difficult to capture • Problem domains • Inference tool analysis • Protocol (implementation) analysis • Anomaly detection • Network system evaluation
Data anonymization • Lots of people measure, most are scared s*!#less about sharing data • This is a legal issue • No standards (sure payloads are off limits, but addresses?) • Don’t ask, don’t tell? • The community is developing tools for trace anonymization • “A High-Level Programming Environment for Packet Trace Anonymization and Transformation,” Pang et al., ’03. • Prefix preserving address anonymization • Payload hashing • Probably no direct SP application • But, implications vis-à-vis future data availability
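To make "prefix preserving address anonymization" and "payload hashing" concrete, here is a minimal sketch, not the tool from Pang et al. It assumes a keyed hash (HMAC-SHA256) as the pseudorandom function; the names `anonymize_ipv4` and `hash_payload` are hypothetical. Each output bit is the input bit XORed with a pseudorandom bit that depends only on the preceding bits, so two addresses that share a k-bit prefix map to addresses that share exactly a k-bit prefix.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Map an IPv4 address to another one, preserving shared prefix lengths."""
    original = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        # The i most significant bits of the original address (0 when i == 0).
        prefix = original >> (32 - i)
        # Pseudorandom bit derived from the key, the prefix length, and the prefix.
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (original >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


def hash_payload(payload: bytes, key: bytes) -> str:
    """Replace a payload with a keyed digest so equal payloads remain comparable."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()
```

The mapping is deterministic for a given key, so subnet structure survives in an anonymized trace while the real addresses stay hidden from anyone without the key.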
Summary • Trends over past 30 years • Divergence of research and operations • Decline of importance of measurement in research • Empirical study of the Internet as an artifact • Current trends • Rise of measurement as a discipline • Open infrastructures and network testbeds • Large-scale domain specific data repositories • Novel measurement methods • Future trends • ?? • Embedded measurement systems