Internet Measurements CS 401/601 Computer Network Systems Mehmet Gunes
Internet • Web of interconnected networks • Grows with no central authority • Autonomous Systems optimize local communication efficiency • The building blocks are engineered and studied in depth • The global entity has not been characterized • Most real-world complex networks have non-trivial properties • Global properties cannot be inferred from local ones • Engineered with large technical diversity • Ranges from local campuses to transcontinental backbone providers
Role of Internet Directories and Databases • Address registries • Domain Name System (DNS) • Internet Address and Routing Registries • Internet Assigned Numbers Authority (IANA) • Internet Routing Registry • Clearinghouse for AS number mapping • Regional Internet Registries (RIR)
Internet Measurements • Need for Internet measurements arises due to commercial, social, and technical issues • Realistic simulation environment for developed products • Improve network management • Robustness with respect to failures/attacks • Comprehend spreading of worms/viruses • Know social trends in Internet use • Scientific discovery • Scale-free (power-law), Small-world, Rich-club, Disassortativity,…
Challenges to measurement “Poor Observability” • Reasons for this: • Core simplicity • Layered architecture • Hidden pieces • Administrative barriers
Internet Measurements are anything but straightforward… • Internet measurement is key to designing the next-generation communication network • Fundamental design principles of the current Internet make various aspects of it hard to measure • Preliminary research has produced a set of basic tools and methods to measure aspects such as topology and traffic • There is still a lot of ground to cover in this direction
Properties to Measure • Topology Properties • Autonomous System (AS) • Point of Presence (PoP) • Router • Interface
Longitudinal comparison Sources: 1971 - "Casting the Net", page 64; 1980 - http://mappa.mundi.net/maps/maps_001/ http://personalpages.manchester.ac.uk/staff/m.dodge/cybergeography/atlas/historical.html
Internet Topology CAIDA 2006
Internet Topology Measurement CAIDA 2006
Internet Topology Measurement CAIDA 2006
IPv4 address space (2010) • Ant Census data: researchers have been collecting data about the Internet address space • ~3.5 B IPs • ~250 M replies
Active Measurement Tools • Methods that involve adding traffic to the network for the purposes of measurement • Ping: sends ICMP ECHO_REQUEST and captures ECHO_REPLY • Useful for measuring RTTs • Only the sender needs to be under experiment control • One-Way Active Measurement Protocol (OWAMP): a daemon running on the target listens for and records probe packets sent by the sender • Useful for measuring one-way delay • Requires both sender and receiver to be under experiment control • Requires synchronized clocks or a method to remove clock offset
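To make the ping probe concrete, the following is a minimal sketch of how an ICMP ECHO_REQUEST could be assembled in Python. The function names `internet_checksum` and `build_echo_request` are illustrative and not taken from any tool named in the slides. The sketch only builds the packet bytes; actually sending them needs a raw socket and usually elevated privileges, which is left out here.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by ICMP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request: type 8, code 0, checksum, id, sequence."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum field zeroed
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload


packet = build_echo_request(ident=0x1234, seq=1, payload=b"ping")
```

An RTT measurement would record a timestamp when this packet is sent and another when the matching ECHO_REPLY (same identifier and sequence number) arrives.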
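Probing • Direct probing: the vantage point sends a probe addressed to IPD with a large TTL (TTL=64), so the probe reaches D itself • Indirect probing: the vantage point sends probes addressed to IPD with small TTLs (TTL=1, TTL=2), eliciting responses from intermediate addresses such as IPB and IPC [Figure: path from the vantage point through A, B, C to D]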
Traceroute • Useful for determining path from a source to a destination • Uses the TTL (Time To Live) field in the IP header in a clever but distorted way • Large scale measurement systems use traceroute to discover network topology
Traceroute • Probe packets are carefully constructed to elicit an intended response from a probe destination • Traceroute probes all nodes on a path towards a given destination • TTL-scoped probes obtain ICMP error messages from routers on the path • Each ICMP message carries the IP address of the intermediate router as its source • Merging end-to-end path traces yields the network map [Figure: probes with TTL=1 to TTL=4 sent from vantage point S elicit responses from IPA, IPB, IPC, IPD along the path A, B, C, D to the destination]
IP Header and the TTL field (each row is 32 bits) • ver: IP protocol version number • head. len: header length (bytes) • type of service: “type” of data • length: total datagram length (bytes) • 16-bit identifier, flgs, fragment offset: for fragmentation/reassembly • time to live: max number of remaining hops (decremented at each router) • upper layer: upper layer protocol to deliver payload to • Internet checksum • 32 bit source IP address • 32 bit destination IP address • Options (if any): e.g. timestamp, record route taken, specify list of routers to visit • data (variable length, typically a TCP or UDP segment)
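As a hedged illustration of the header layout, the sketch below unpacks the fixed 20-byte IPv4 header with Python's `struct` module and exposes the TTL field. The function name `parse_ipv4_header` and the sample addresses are made up for this example, and options are ignored.

```python
import struct
from ipaddress import IPv4Address


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are skipped)."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,   # IHL counts 32-bit words
        "total_len": total_len,
        "identifier": ident,
        "flags": flags_frag >> 13,
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": proto,                    # e.g. 1 = ICMP, 6 = TCP, 17 = UDP
        "checksum": checksum,
        "src": str(IPv4Address(src)),
        "dst": str(IPv4Address(dst)),
    }


# Hypothetical header: version 4, IHL 5, TTL 64, protocol ICMP, checksum left at 0.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 1, 0, 64, 1, 0,
                     IPv4Address("10.0.0.1").packed,
                     IPv4Address("10.0.0.2").packed)
fields = parse_ipv4_header(sample)
```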
TTL normal usage • TTL is initialized by the sender and decremented by one each time the packet passes through a router • If it reaches zero before reaching the destination, IP protocol requires that the packet be discarded and an error message be sent back to the sender • Error message is an ICMP “time exceeded” packet
Traceroute Problem • Suppose the path between A and D is to be determined using traceroute [Figure: nodes A, B, C, D, X, Y]
Traceroute Process • A sends a probe with Dest = D, TTL = 1 • B replies: “time exceeded”
Traceroute Process • A sends a probe with Dest = D, TTL = 2 • C replies: “time exceeded”
Traceroute Process • A sends a probe with Dest = D, TTL = 3 • D replies: “echo reply”
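The TTL-limited probing walked through on these slides can be mimicked with a small simulation. This is only a sketch over an in-memory path; no packets are sent, and `send_probe` and `traceroute` are illustrative names. The `path` list holds the nodes after the source, ending with the destination.

```python
def send_probe(path, ttl):
    """Return (responder, message) for one probe sent along `path` with this TTL."""
    for router in path[:-1]:
        ttl -= 1                       # each router decrements the TTL
        if ttl == 0:
            return router, "time exceeded"
    return path[-1], "echo reply"      # the probe reached the destination


def traceroute(path, max_ttl=30):
    """Send probes with TTL = 1, 2, ... until the destination answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, message = send_probe(path, ttl)
        hops.append((ttl, responder, message))
        if message == "echo reply":
            break
    return hops


# Path A -> B -> C -> D from the slides; A is the source.
hops = traceroute(["B", "C", "D"])
```

With the path from the slides, the three probes are answered by B, C, and D in turn.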
Internet Topology Measurement [Figure: Internet2 backbone with routers S, N, W, C, U, K, L, A, H and their interfaces; a trace to NY and a trace to Seattle are highlighted]
Internet Topology Measurement • Traces • d - H - L - S - e • d - H - A - W - N - f • e - S - L - H - d • e - S - U - K - C - N - f • f - N - C - K - H - d • f - N - C - K - U - S - e [Figure: Internet2 backbone routers S, N, W, C, U, K, L, A, H with their interfaces, and end hosts d, e, f]
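Merging the traces listed on this slide into a map can be sketched as follows. The traces are copied from the slide; the function name `merge_traces` and the choice of undirected links are assumptions made for the example.

```python
def merge_traces(traces):
    """Union of all nodes and of all consecutive-hop links (undirected)."""
    nodes, links = set(), set()
    for trace in traces:
        nodes.update(trace)
        for a, b in zip(trace, trace[1:]):
            links.add(frozenset((a, b)))   # A-B and B-A are the same link
    return nodes, links


traces = [
    ["d", "H", "L", "S", "e"],
    ["d", "H", "A", "W", "N", "f"],
    ["e", "S", "L", "H", "d"],
    ["e", "S", "U", "K", "C", "N", "f"],
    ["f", "N", "C", "K", "H", "d"],
    ["f", "N", "C", "K", "U", "S", "e"],
]
nodes, links = merge_traces(traces)
```

For these six traces the merged map has 12 nodes and 13 distinct links; several traces contribute no new link at all, which is the redundancy that later slides discuss.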
Challenges • Infrastructural Issues • Sampling • Vantage Points and Destination List • Probing Overhead • Inter- and Intra-monitor Redundancy • Responsiveness of Routers • ICMP, UDP, TCP • Load Balancing Routers • Per destination, per flow, per packet
Traceroute issues • Path Asymmetry • Destination -> Source need not retrace Source -> Destination • Unstable Paths and False Edges • Aliases • Measurement Load
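Unstable Paths and False Edges • Probe with Dest = D, TTL = 1: B replies “time exceeded” • Probe with Dest = D, TTL = 2: Y replies “time exceeded” • Inferred path: A -> B -> Y • If the route changes between the two probes, the inferred edge B -> Y may not exist in the network [Figure: nodes A, B, C, D, X, Y]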
Topology Sampling: Issues • Sampling to discover networks • Infer characteristics of the topology • Different studies considered • Effect of sample size • Sampling bias • Path accuracy • Sampling approach • Utilized protocol • ICMP echo request • TCP SYN • UDP port unreachable • ~ 10% of routers are unresponsive
Measurement Load • Traceroute imposes considerable load on network links when used for large-scale topology discovery • Optimizations reduce this load considerably • If a single source is used, a better approach than probing from source to destination is to probe backward from the destination toward the source, stopping once an already-discovered hop is reached • If multiple sources and multiple destinations are used, sharing information among them reduces the load considerably
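The saving from probing backward can be illustrated with a toy probe count. The sketch assumes that paths from a single monitor form a tree (so once a known hop is seen, the rest of the path toward the monitor is already known) and that the hop distance to each destination is known in advance. The router names and paths are hypothetical.

```python
def forward_probes(paths):
    """Classic traceroute: one probe per hop on every path."""
    return sum(len(path) for path in paths)


def backward_probes(paths):
    """Probe from the destination toward the monitor; stop at a known hop."""
    seen = set()
    probes = 0
    for path in paths:
        for hop in reversed(path):
            probes += 1
            if hop in seen:
                break          # remainder toward the monitor assumed known
            seen.add(hop)
    return probes


# Hypothetical paths from one monitor to three destinations.
paths = [
    ["r1", "r2", "r3", "d1"],
    ["r1", "r2", "r3", "d2"],
    ["r1", "r2", "r4", "d3"],
]
```

For these paths, forward probing spends 12 probes and backward probing 9, because hops shared near the monitor are not probed again.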
Intra-monitor redundancy Destination 2 Destination 1 Destination 3 Monitor 1
Inter-monitor redundancy Destination 1 Monitor 2 Monitor 1 Monitor 3
Unresponsive Routers • Unresponsive routers do not respond to traceroute probes and appear as “*” in traceroute output • The same router may appear as “*” in multiple traces • When L responds: y: S – L – H – x and x: H – L – S – y • When L is unresponsive: y: S – * – H – x and x: H – * – S – y [Figure: path between x and y through S, L, H]
Unresponsive Router Resolution • Traces • d - * - L - S - e • d - * - A - W - * - f • e - S - L - * - d • e - S - U - * - C - * - f • f - * - C - * - * - d • f - * - C - * - U - S - e [Figure: Internet2 backbone routers S, N, C, W, U, K, L, A, H and end hosts d, e, f]
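One simple way to attack this problem is to merge "*" hops that sit between the same pair of known neighbors. The sketch below implements only that heuristic, as an assumption for illustration rather than the method of the slides. It can wrongly merge distinct routers on parallel paths, and it leaves runs of consecutive "*" hops unmerged.

```python
from itertools import count


def resolve_anonymous(traces):
    """Relabel '*' hops; identical known neighbors imply the same label."""
    labels = {}                 # {prev, next} neighbor pair -> label
    fresh = count(1)
    resolved = []
    for trace in traces:
        out = []
        for i, hop in enumerate(trace):
            if hop != "*":
                out.append(hop)
                continue
            prev = trace[i - 1] if i > 0 else None
            nxt = trace[i + 1] if i + 1 < len(trace) else None
            if prev not in (None, "*") and nxt not in (None, "*"):
                key = frozenset((prev, nxt))   # direction-independent
                if key not in labels:
                    labels[key] = f"anon{next(fresh)}"
                out.append(labels[key])
            else:
                out.append(f"anon{next(fresh)}")   # cannot be matched; keep unique
        resolved.append(out)
    return resolved


resolved = resolve_anonymous([
    ["y", "S", "*", "H", "x"],
    ["x", "H", "*", "S", "y"],
])
```

Here both "*" hops lie between S and H, so they receive the same label and the two traces collapse onto one path.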
Common Structures due to ARs • Parallel *-substring • Clique • Complete Bipartite • Star [Figure: example graphs for each structure]
IP Alias Resolution • Each interface of a router has an IP address • A router may respond with different IP addresses to different queries • Alias resolution is the process of grouping the interface IP addresses of each router into a single node • Inaccuracies in alias resolution may result in a network map that • includes artificial links/nodes • misses existing links [Figure: router labeled Denver with interface labels .33, .5, .18, .7, .13]
IP Alias Resolution • Traces • d - h.4 - l.3 - s.2 - e • d - h.4 - a.3 - w.3 - n.3 - f • e - s.1 - l.1 - h.1 - d • e - s.1 - u.1 - k.1 - c.1 - n.1 - f • f - n.2 - c.2 - k.2 - h.2 - d • f - n.2 - c.2 - k.2 - u.2 - s.3 - e [Figure: Internet2 backbone routers S, N, W, C, U, K, L, A, H with their interfaces, and end hosts d, e, f]
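Once alias pairs are known, grouping interfaces into routers is a connected-components problem. The sketch below uses a small union-find. The alias pairs passed in are hypothetical inputs chosen to match the interface naming on the slide (s.1, s.2, s.3 on router S, and so on); how those pairs are discovered is a separate question.

```python
def group_aliases(alias_pairs):
    """Map every interface to a representative of its router (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in alias_pairs:
        parent[find(a)] = find(b)
    return {ip: find(ip) for ip in list(parent)}


def collapse(trace, router_of):
    """Rewrite an interface-level trace as a router-level trace."""
    return [router_of.get(hop, hop) for hop in trace]


router_of = group_aliases([("s.1", "s.2"), ("s.2", "s.3"), ("h.1", "h.4")])
forward = collapse(["d", "h.4", "l.3", "s.2", "e"], router_of)
backward = collapse(["e", "s.1", "l.1", "h.1", "d"], router_of)
```

After collapsing, h.4 in the first trace and h.1 in the second map to the same node. Without alias resolution they would appear as two separate routers.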
IP Alias Resolution Approaches • Source IP Address Based Method • Relies on a particular implementation of ICMP error generation • IP Identification Based Method (ally) • Relies on a particular implementation of the IP identifier field • Many routers ignore direct probes • DNS Based Method • Relies on similarities in the host name structures, e.g. sl-bb21-lon-14-0.sprintlink.net and sl-bb21-lon-8-0.sprintlink.net • Works when systematic naming is used • Record Route Based Method • Depends on router support for IP record route processing [Figure: probes with Dest = A and Dest = B and their replies, labeled “A, ID=100”, “B, ID=99”, “B, ID=103”]
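The IP identification idea can be sketched as a check on three IP ID samples taken in quick succession: the first and third from address A, the second from address B. If both addresses share one counter, the values should increase in order and stay close together. The function name and the `max_gap` threshold of 200 are assumptions for illustration. Real tools add repeated trials and rate-limit handling.

```python
def ip_ids_in_sequence(x, y, z, max_gap=200):
    """True if IDs x (from A), y (from B), z (from A) look like one shared counter.

    Differences are taken modulo 2**16 because the IP ID field is 16 bits wide.
    """
    d1 = (y - x) % 65536
    d2 = (z - y) % 65536
    return d1 > 0 and d2 > 0 and d1 + d2 <= max_gap


likely_aliases = ip_ids_in_sequence(100, 101, 103)
unrelated = ip_ids_in_sequence(100, 40000, 103)
wrapped = ip_ids_in_sequence(65535, 1, 3)
```

The test passes only for routers that use a single incrementing counter for all interfaces, which is why the slide calls it dependent on a particular implementation.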
Subnet Inference • Subnet resolution • Identify IP addresses that are connected over the same medium • Improve the quality of the resulting topology map [Figure: routers A, B, C, D with interfaces IP1, IP2, IP3, shown as underlying topology, observed topology, and inferred topology]
Subnet Inference Approach • Observed addresses within 129.110.0.0/16: 129.110.1.1, 129.110.1.2, 129.110.2.0, 129.110.2.1, 129.110.4.1, 129.110.4.83, 129.110.4.217, 129.110.12.1, 129.110.12.2, 129.110.12.6, 129.110.17.1, 129.110.17.135, 129.110.219.1 [Figure: each address is annotated with its hop distance (1 to 5) from the vantage point (V.P.), and addresses are grouped into candidate subnets including 129.110.1.0/30, 129.110.1.0/31, 129.110.2.0/30, 129.110.2.0/31, 129.110.4.0/24, 129.110.6.0/28, 129.110.12.0/29, 129.110.17.0/24 and 129.110.219.0/24]
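One ingredient of subnet inference is finding the smallest prefix that covers a group of observed addresses. The sketch below does only that, using the standard `ipaddress` module. The function name `smallest_covering_subnet` is illustrative, and the address grouping is assumed to be given. A full approach would also check hop distances and other conditions, which are omitted here.

```python
from ipaddress import IPv4Address, IPv4Network


def smallest_covering_subnet(addresses):
    """Longest common prefix of the addresses, returned as a network."""
    values = [int(IPv4Address(a)) for a in addresses]
    differing_bits = 0
    for value in values:
        differing_bits |= value ^ values[0]     # collect every bit that varies
    prefix_len = 32 - differing_bits.bit_length()
    return IPv4Network((values[0], prefix_len), strict=False)


subnet_a = smallest_covering_subnet(["129.110.12.1", "129.110.12.2", "129.110.12.6"])
subnet_b = smallest_covering_subnet(["129.110.1.1", "129.110.1.2"])
subnet_c = smallest_covering_subnet(["129.110.4.1", "129.110.4.83", "129.110.4.217"])
```

For the address groups above, taken from the slide, this gives 129.110.12.0/29, 129.110.1.0/30 and 129.110.4.0/24.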
Analytical IP Alias Resolution • Aliases inferred from traces between UTD and MIT • 129.110.5.1 - 206.223.141.74 • 206.223.141.73 - 206.223.141.69 • 206.223.141.70 - 198.32.8.33 • … [Figure: path between UTD and MIT showing interface addresses 129.110.95.1, 129.110.5.1, 206.223.141.74, 206.223.141.73, 206.223.141.69, 206.223.141.70, 198.32.8.33, 198.32.8.34, 198.32.8.65, 198.32.8.66, 198.32.8.85, 198.32.8.84, 192.5.89.10, 192.5.89.89, 192.5.89.9, 192.5.89.90, 18.168.0.27, 18.168.0.25, 18.7.21.1, 18.7.21.84, with two hops giving no response]
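The alias pairs on this slide are consistent with a simple rule on point-to-point /30 links. If a forward trace reports hop b right after hop a, and the other host address of b's /30 shows up in the reverse trace, then that address and a likely belong to the same router. The sketch below encodes this rule under stated assumptions: links are /30, the paths are symmetric, and the hop ordering in `forward` and `reverse` is reconstructed for illustration rather than read from the figure.

```python
from ipaddress import IPv4Address


def p2p_mate(ip):
    """Other usable host address in the same /30, or None for network/broadcast."""
    value = int(IPv4Address(ip))
    if value & 3 not in (1, 2):
        return None
    return str(IPv4Address(value ^ 3))   # .1 <-> .2 within the /30


def infer_aliases(forward, reverse):
    """Pair each forward hop with the /30 mate of the next hop, if seen in reverse."""
    seen_in_reverse = set(reverse)
    aliases = []
    for prev_hop, hop in zip(forward, forward[1:]):
        mate = p2p_mate(hop)
        if mate in seen_in_reverse:
            aliases.append((prev_hop, mate))
    return aliases


# Hypothetical hop orderings built from addresses on the slide.
forward = ["129.110.5.1", "206.223.141.73", "206.223.141.70", "198.32.8.34"]
reverse = ["198.32.8.33", "206.223.141.69", "206.223.141.74"]
aliases = infer_aliases(forward, reverse)
```

With these inputs the rule yields the same three alias pairs that the slide lists.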
Geolocation • Given the network address of a target host, what is the host’s geographic location? • The answer to this is useful for a wide variety of social, economic and engineering purposes • The actual location of network infrastructure sheds light on how it relates to population, social organization and economic activity
Geolocation methods • Name Based Geolocation • Extracting location details from ISPs' domain names • Location Databases • Delay Based Geolocation • Best Landmark • Constraint-based
Landmark based geolocation • In the best landmark approach, the minRTT between each pair of identified landmarks is measured and stored • The same metric is then measured between the node in question and each of the landmarks • The landmark whose minRTT values best match those of the node is taken to be the closest to the node
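A minimal sketch of the best landmark idea follows. Each landmark has a vector of minRTTs to all landmarks, and the target has a vector of minRTTs measured to the same landmarks in the same order. "Best matching" is interpreted here as smallest Euclidean distance between vectors, which is an assumption. The landmark names and RTT values (in milliseconds) are made up.

```python
import math


def best_landmark(landmark_rtts, target_rtts):
    """Return the landmark whose minRTT vector is closest to the target's."""
    def mismatch(name):
        return math.dist(landmark_rtts[name], target_rtts)
    return min(landmark_rtts, key=mismatch)


# Hypothetical minRTT vectors, ordered as (to L1, to L2, to L3).
landmark_rtts = {
    "L1": [0.0, 70.0, 20.0],
    "L2": [70.0, 0.0, 55.0],
    "L3": [20.0, 55.0, 0.0],
}
target_rtts = [18.0, 58.0, 4.0]
estimate = best_landmark(landmark_rtts, target_rtts)
```

The target's delay profile resembles that of L3 most closely, so L3's known location is used as the estimate.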
Constraint based geolocation • The distances from the target to a sufficient number of fixed points are estimated and combined using multilateration • The same principle is used in GPS • However, Internet delay is affected by many factors and is not a linear function of distance
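Because delay only bounds distance from above, a constraint-based estimate can be sketched as a feasibility test: a candidate location is kept if it lies within the maximum distance implied by every landmark's RTT. The sketch assumes a propagation speed of about 200 km per millisecond (roughly two thirds of the speed of light, typical for fiber) and uses approximate coordinates for illustration. The function names are illustrative.

```python
import math

KM_PER_MS = 200.0  # assumed propagation speed, about 2/3 of c


def max_distance_km(rtt_ms):
    """Upper bound on distance: half the RTT at the assumed speed."""
    return rtt_ms / 2 * KM_PER_MS


def haversine_km(a, b):
    """Great-circle distance between (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def feasible(candidate, measurements):
    """True if the candidate satisfies every landmark's distance constraint."""
    return all(haversine_km(candidate, landmark) <= max_distance_km(rtt)
               for landmark, rtt in measurements)


new_york, los_angeles = (40.71, -74.01), (34.05, -118.24)
chicago, miami = (41.88, -87.63), (25.76, -80.19)
measurements = [(new_york, 15.0), (los_angeles, 30.0)]  # hypothetical RTTs in ms
```

With these numbers Chicago satisfies both constraints while Miami violates the New York constraint. Queuing and indirect routes inflate RTTs, so the bounds are loose in practice.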
Passive Measurements • Methods that capture traffic generated by other users and applications • The RouteViews repository collects BGP views (routing tables) from a large set of ASes • Similarly, OSPF LSAs can be captured and processed to generate router graphs within an AS
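As a sketch of how BGP views turn into an AS-level graph, the code below extracts AS adjacencies from AS paths and collapses repeated AS numbers caused by path prepending. The input format (a list of integer AS paths) and the AS numbers, which come from the range reserved for documentation, are assumptions. Parsing real routing table dumps is not shown.

```python
def as_links(as_paths):
    """Set of undirected AS-AS links implied by consecutive ASes in each path."""
    links = set()
    for path in as_paths:
        # Drop consecutive duplicates introduced by AS path prepending.
        hops = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        for a, b in zip(hops, hops[1:]):
            links.add((min(a, b), max(a, b)))
    return links


# Hypothetical AS paths using documentation AS numbers.
paths = [
    [64496, 64497, 64497, 64500],
    [64496, 64501],
]
links = as_links(paths)
```

The resulting link set can be fed to the same graph analysis used for traceroute-derived maps, with the caveat that it reflects routing adjacencies visible to the contributing ASes only.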