340 likes | 471 Views
New Directions in Internet Topology Measurement and Modeling. John Byers Topology Modeling Group, Boston University CS: Mark Crovella , Marwan Fayed, Anukool Lakhina , Ibrahim Matta, Alberto Medina Physics: Paul Krapivsky , Sid Redner Statistics: Eric Kolaczyk.
E N D
New Directions in Internet Topology Measurement and Modeling John Byers Topology Modeling Group, Boston University CS: Mark Crovella, Marwan Fayed, Anukool Lakhina, Ibrahim Matta, Alberto Medina Physics: Paul Krapivsky, Sid Redner Statistics: Eric Kolaczyk
Some observations about the Internet • Rapid, decentralized growth: • 90% of Internet systems were added in the last four years • Connecting to the network can be a purely local operation • This rapid, decentralized growth has opened significant questions about the physical structure of the network; e.g., • The number of hosts connected to the network • The properties of network links (delay, bandwidth) • The interconnection pattern of hosts and routers • The interconnection relationships of ISPs • The geographic locations of hosts, routers and links
Approaching the Internet Scientifically • Engineering or Science? Engineering: study of things made Science: study of things found • Although the Internet is an engineered artifact, it now presents us with questions that are better approached from a scientific posture. • Worthwhile scientific goals • Understand what drives Internet growth • Basic investigations pay off in unexpected ways
Talk Organization • A brief retrospective. • Towards a scientific understanding of the Internet. • Specific directions: • Geometry/geography-driven topology generation. • Where are the nodes/links in the Internet (recap) • What is the geographical extent of ASes? • Measuring and modelling the time-evolution of AS sizes.
Case Study: “Origins” paper, [MMB ‘00] • Goal: Causal explanation for then unexplained power-laws in [FFF ‘99]. • Our hypothesis: Simple Barabasi-Albert model of incremental growth, preferential attachment. • Model led to topologies which fit known metrics… • BUT • How to validate this explanation? • [FFF ‘99] snapshots inadequate for testing hypotheses about time-evolution of the system. • In fact, no adequate set of measurements available. • Also, much easier to invalidate than validate.
Case Study: “Origins” paper, [MMB ‘00] • Not an entirely wasted effort... • Some positive outcomes: • BRITE/BRIANA topology generation framework / analysis engine for testing wide assortment of models. • Motivation to focus on modeling problems where validation was possible… • … or better yet, to start from measurements themselves. • And some future considerations: • Do topology models need to be explanatory, or just descriptive? • How to place value on a model that cannot be validated?
Our current approach • Measurement: • understand topological features from direct study • leverage measurements from others as possible • measurements of time-evolution of a system are especially helpful • Characterization and Modeling • Validation: provide empirical confirmation that model predictions fit measured data • Tool-Building: build our models / other models into BRITE (open source).
Breathing life into topology generation • Raw topology analogous to a skeleton • presents coarse structure, but incomplete, inanimate • inadequate for conducting most simulations • Flesh out by building annotated graphs: • Label nodes with autonomous system (AS) ID’s. • Label edges with link bandwidths. • Label edges with latencies. • Do this in a representative manner. • Animate the topology: • Generate representative traffic workloads across the annotated graph. • Consider other dynamic factors (churn, link failures) • Now we’re ready to conduct a simulation.
One primary direction: Geography • Long-term goal: annotated graph generation: • Label nodes with autonomous system (AS) ID’s. • Label edges with link bandwidths and latencies. • How? • All of the problems seem more tractable when we consider the underlying geometry of the network. • But next to nothing is known about the geometry/ geography of today’s Internet. • Geographic extent of ASes? • Distribution of link lengths? Inter-AS link lengths? • Our first step (now complete): measurements.
Where is the Internet? ? ? ? ? ?
Assumptions and Definitions • We treat the Internet as an undirected graph embedded on the Earth’s surface • Nodes correspond to routers or interfaces • Edges correspond to physical router-router links • Routers associated with an administering AS • Not concerned with hosts (end systems) • We will ignore many higher and lower level questions
Our Basic Approach • Obtain IP-level router maps • Mercator and Skitter • Find geographic location of each router • Ixia’s IxMapping; Akamai’s EdgeScape • Identify AS associated with each router • RouteViews
Mercator:Govindan et al., USC/ISI, ICSI • Based on active probing from a single site • Resolves aliases • Uses loose source routing to explore alternate paths Skitter:Moore et al., CAIDA • Traceroutes from 19 monitors to large set of destinations • Does not resolve aliases • Destinations attempt to cover IP address space
Datasets • Mercator • Collected August 1999 • 228,263 routers • 320,149 links • Skitter • Collected January 2002 • 704,107 interfaces • 1,075,454 links
IxMapping:Moore et al., CAIDA • Given an IP address, infers geographic location based on a variety of heuristics • Hostnames, DNS LOC, whois e.g., 190.ATM8-0-0.GW3.BOS1.ALTER.NET is in Boston • Able to map over 98% of routers/interfaces • Similar to GeoTrack [Padmanabhan] which exhibits reasonable accuracy • Median error of 64 mi • 90% queries within 250 mi
EdgeScape:Akamai • Given an IP address, returns lat./long. • Methods employed not currently published; available as a commercial service. • Claims mean error of < 50 miles • Able to map over 99% of routers/interfaces
RouteViews • Provides daily BGP table snapshots • For each of the router/interface inventories, we pull a BGP snapshot from the same date. • Then, for each interface, infer the associated AS by the AS advertising the containing block. • For routers with multiple interfaces, use the majority vote; discard if there is no majority vote (2% of all routers).
Interfaces and People: USA, Skitter Grid size: ~90 mi x 90 mi
Routers and PeopleUpper, Mercator; Lower, Skitter USA Europe Japan
Router Location: Summary • Router location is strongly driven by population density • Superlinear relationship between router and population density: R k Pa k varies with economic development (users online) a is greater than one (1.2 - 1.7) • More routers per person in more densely populated areas
Link Preference Function • Interested in influence of distance on link formation: f(d) = P[C|d] i.e., Probability two nodes separated by distance d are directly connected by a link. • Estimated as: number links of length d f(d) = ------------------------------------------- number of router pairs separated by d
Distance Sensitive Distance Insensitive f(d) for USA (Skitter)
Link Distance Preference for USA Skitter, d < 250, semi-log plot L 140 mi.
d F(d) = f(u) u=1 Large d: distance insensitivity USA data, Skitter
Link Formation: Summary • Link formation seems to be a mixture of distance-dependent and –independent processes • Waxman (exponential) model remarkably good for large fraction of all links! • But, crucial difference is that we are using a very irregular spatial distribution of nodes • Small fraction of non-local links are very important (structural)
Where are the ASes? • Two measures: • # of distinct locations (grid cells) • area of the convex hull of the set of distinct locations • Computing convex hull • cut earth along Int’l Date Line and unroll • use Albers equal area projection to approximately preserve areas
AS Findings (1) • [TDGJSW ‘01] Size of an AS (in routers) and AS degree are well correlated. • We find 3-way correlation between size, degree and # of distinct locations. • Distribution of number of locations is long-tailed, highly variable.
AS Findings (2) • 80% of ASes have 0 area -- two or fewer distinct locations. • The rest of the ASes fall in two regimes: • small ASes have considerable variability • largest ASes are fully dispersed • cutoff: degree > 100 or interfaces > 1000. • size, degree and # of distinct locations.
Direction 2: AS Size Distribution • Goal: Model the growth and evolution of ASes and their sizes. • Bonus: RouteViews BGP logs may later help validate model predictions. • Hosts enter the system and either: • Create a new AS or • Join an existing AS • At each timestep, a pair of ASes may also merge.
The Simplest Plausible Model • N(t) = number of Ases • M(t) = number of hosts dN/dt = (q-r)N dM/dt = pM + qN • where q is the rate of new AS creation • r is the rate of AS coalescence • p is the rate of creation of new nodes • Relative values of p, q and r determine average AS size. • When p > q - r, average AS size grows as N^((p-q+r)/(q-r)).
Preliminary Findings • Model behavior tractable to analyze. • AS births, deaths and mergers can be identified with some degree of confidence from RouteViews logs. • But… differentiating BGP churn from bona fide events can be challenging • Statistical de-noising methods may apply (?) • Simple model makes reasonable predictions • But… coalescence kernel needs fine-tuning, i.e. measurements indicate that r is not size-indep.
In Conclusion • Generating test networks rather than test topologies is a natural next step. • Geometry/geography provides leverage. • Plenty of unexplored territory. • Validation and measurement continues to be underappreciated. • Measurements of time-evolving systems are in particularly short supply. • Modeling problems can be a bonanza for statisticians and physicists.