Explore the physical properties and components of network infrastructure, measurement challenges, and tools used for active and passive measurements. Dive into the intricacies of network topology, traffic properties, and tools classification.
Infrastructure adapted from Mark Crovella and Balachander Krishnamurthy
Properties to Measure • Physical Properties • Links • wired, wireless • radio spectrum • propagation delay, capacity, packet delay, packet loss, jitter • Devices • NAT boxes • Firewalls • Switches • Routers • FIFO queue, active queue management • network measurements typically try to elicit responses from routers • internal mechanisms affect responses • IP aliases, geo-location
Properties to Measure • Topology Properties • Autonomous System (AS) • Point of Presence (PoP) • Router • Interface
Properties to Measure • Traffic Properties • Delays • Transmission (s/t) • Propagation (d/v) • Routing (Queuing + Processing + Other) • Losses • ln = Ln / Cn • Throughput • (Cn-Ln)/T • Jitter • depends on inter-arrival time • End-to-end connections • goodput
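The formulas on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not define its symbols, so s is read as packet size, t as link transmission rate, d as link length, v as signal propagation speed, Cn as the number of packets sent in a measurement interval, Ln as the number lost, and T as the interval length. All function names are hypothetical.

```python
def transmission_delay(size_bits, rate_bps):
    """s/t: time to push all bits of a packet onto the link."""
    return size_bits / rate_bps


def propagation_delay(distance_m, speed_mps):
    """d/v: time for a bit to travel the length of the link."""
    return distance_m / speed_mps


def loss_rate(lost, sent):
    """Ln / Cn, assuming Cn counts packets sent and Ln packets lost."""
    return lost / sent


def throughput(sent, lost, interval_s):
    """(Cn - Ln) / T: packets delivered per second over the interval."""
    return (sent - lost) / interval_s


# Example: a 1500-byte packet on a 10 Mbit/s link that is 2000 km long,
# assuming a propagation speed of 2e8 m/s.
tx = transmission_delay(1500 * 8, 10e6)
prop = propagation_delay(2_000_000, 2e8)
print(f"transmission {tx * 1e3:.2f} ms, propagation {prop * 1e3:.2f} ms")
print(loss_rate(5, 1000), throughput(1000, 5, 10))
```

In this example the propagation term dominates; on a short, slow link the transmission term would dominate instead.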
Challenges “Poor Observability” • Reasons for this: • Core simplicity • Layered architecture • Hidden pieces • Administrative barriers
Core Simplicity • Keep It Simple Stupid (KISS) design principle • End-to-End argument • Stateless nature w.r.t connections/flows • no support for measurement • SNMP: considerable overhead even with per-packet/byte counters • As network elements do not track packets individually, interaction of traffic with the network is hard to observe
Layered Architecture • IP hourglass model hides details of lower level layers • While this provides abstraction improving interoperability, it impedes detailed visibility of lower layers • Even detailed measurements such as packet capture cannot detect differences between link types
Hidden Pieces - Middleboxes • Firewalls – provide security • Traffic Shapers – assist in traffic management • Proxies – improve performance • NAT boxes – utilize IP address space efficiently • Each of these impedes visibility of network components • firewalls may block active probing requests • NATs hide the number of hosts and the structure of the network on the other side
Administrative Barriers • ISPs actively seek to hide details from outside discovery owing to the competition-sensitive nature of the data required • topology, traffic etc. • Information that they do provide is often simplified • No cross-ISP SNMP • Instead of publishing router-level topologies, ISPs often publish PoP-level topologies
Tools Classification • Active Measurement • Passive Measurement • Fused/Combined Measurement • Bandwidth Measurement • Latency Measurement • Geolocation • Others
Active Measurement Tools • Methods that involve adding traffic to the network for the purposes of measurement • Ping: sends ICMP ECHO_REQUEST and captures ECHO_REPLY • Useful for measuring RTTs • Only sender needs to be under experiment control • OWAMP: a daemon running on the target which listens for and records probe packets sent by the sender • Useful for measuring one-way delay • Requires both sender and receiver to be under experiment control • Requires synchronized clocks or a method to remove clock offset
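The difference between the two measurements comes down to which clocks are involved. The sketch below is not how ping or OWAMP are implemented; it only shows the arithmetic, assuming the receiver's clock reads sender time plus a known constant offset. The function and parameter names are hypothetical.

```python
def round_trip_time(send_ts, reply_ts):
    """RTT: both timestamps are taken on the sender's clock."""
    return reply_ts - send_ts


def one_way_delay(send_ts, recv_ts, receiver_offset):
    """One-way delay: recv_ts is read on the receiver's clock, which is
    assumed to run `receiver_offset` seconds ahead of the sender's clock."""
    return (recv_ts - receiver_offset) - send_ts


# A probe sent at t = 10.000 s and stamped 10.530 s by a receiver whose
# clock runs 0.5 s ahead experienced roughly 30 ms of one-way delay.
print(round(one_way_delay(10.000, 10.530, 0.5), 3))
print(round(round_trip_time(10.000, 10.060), 3))
```

Without the offset correction the one-way delay in this example would be reported as 530 ms, which is why the slide lists clock synchronization as a requirement.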
Traceroute • Useful for determining path from a source to a destination • Uses the TTL (Time To Live) field in the IP header in a clever but distorted way • Large scale measurement systems use traceroute to discover network topology
IP Header and the TTL field • Header rows are 32 bits wide • ver: IP protocol version number • head. len: header length (bytes) • type of service: “type” of data • length: total datagram length (bytes) • 16-bit identifier, flgs, fragment offset: for fragmentation/reassembly • time to live: max number of remaining hops (decremented at each router) • upper layer: upper layer protocol to deliver payload to • Internet checksum • 32 bit source IP address • 32 bit destination IP address • Options (if any): e.g. timestamp, record route taken, specify list of routers to visit • data (variable length, typically a TCP or UDP segment)
TTL normal usage • TTL is initialized by the sender and decremented by one each time the packet passes through a router • If it reaches zero before reaching the destination, IP protocol requires that the packet be discarded and an error message be sent back to the sender • Error message is an ICMP “time exceeded” packet
Traceroute • Probe packets are carefully constructed to elicit the intended response from a probe destination • traceroute probes all nodes on a path towards a given destination • TTL-scoped probes obtain ICMP error messages from routers on the path • Each ICMP message carries the IP address of the intermediate router as its source • Merging end-to-end path traces yields the network map • (Figure: vantage point S sends probes with TTL=1 to TTL=4 towards the destination; A, B, C and D answer from IPA, IPB, IPC and IPD)
Traceroute Problem • Suppose the path between A and D is to be determined using traceroute • (Figure: topology with nodes A, B, C, D, X, Y)
Traceroute Process • Probe with Dest = D, TTL = 1 • B: “time exceeded”
Traceroute Process • Probe with Dest = D, TTL = 2 • C: “time exceeded”
Traceroute Process • Probe with Dest = D, TTL = 3 • D: “echo reply”
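The three Traceroute Process slides can be replayed with a small simulation. This is a sketch, not a real traceroute: no packets are sent, and the path B, C, D from source A is taken from the slides. The function names are hypothetical.

```python
def send_probe(path, ttl):
    """Return (responder, message) for a probe sent along `path`.

    `path` lists the nodes after the source, ending with the destination.
    Each router decrements the TTL; the router where it reaches zero
    answers with "time exceeded", the destination with "echo reply".
    """
    for hop_number, node in enumerate(path, start=1):
        if hop_number == len(path):
            return node, "echo reply"
        if hop_number == ttl:
            return node, "time exceeded"


def traceroute(path, max_ttl=30):
    """Send probes with TTL = 1, 2, ... until the destination answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        node, message = send_probe(path, ttl)
        hops.append((ttl, node, message))
        if message == "echo reply":
            break
    return hops


for ttl, node, message in traceroute(["B", "C", "D"]):
    print(f"TTL = {ttl}  {node}: {message}")
```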
Traceroute issues • Path Asymmetry • Destination -> Source need not retrace Source -> Destination • Unstable Paths and False Edges • Aliases • Measurement Load
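Unstable Paths and False Edges • Probe with Dest = D, TTL = 1 • B: “time exceeded” • Probe with Dest = D, TTL = 2 • Y: “time exceeded” • Inferred path: A -> B -> Y • The two probes followed different routes, so the edge B -> Y does not exist in the network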
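A false edge arises when successive probes of one trace follow different routes. The sketch below assumes, as a reading of the figure, that the two routes from A to D are A-B-C-D and A-X-Y-D, and that the route flips between the first and second probe. The function names are hypothetical.

```python
def responder(path, ttl):
    """Node that answers a probe with this TTL along `path`
    (the destination answers once the TTL is large enough)."""
    return path[min(ttl, len(path)) - 1]


def infer_path(source, route_per_probe):
    """Build the path traceroute would report when probe number k
    (sent with TTL = k) happens to follow route_per_probe[k - 1]."""
    inferred = [source]
    for ttl, path in enumerate(route_per_probe, start=1):
        inferred.append(responder(path, ttl))
    return inferred


via_b = ["B", "C", "D"]   # assumed route A-B-C-D
via_x = ["X", "Y", "D"]   # assumed route A-X-Y-D

real_edges = {("A", "B"), ("B", "C"), ("C", "D"),
              ("A", "X"), ("X", "Y"), ("Y", "D")}

inferred = infer_path("A", [via_b, via_x, via_x])
false_edges = [e for e in zip(inferred, inferred[1:]) if e not in real_edges]
print(inferred)      # ['A', 'B', 'Y', 'D']
print(false_edges)   # [('B', 'Y')]
```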
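Probing • Direct probing • Probes are addressed to the node of interest itself (figure: vantage point A sends probes to IPB and IPD with TTL=64) • Indirect probing • TTL-scoped probes addressed to a farther destination elicit responses from nodes on the way (figure: probes to IPD with TTL=1 and TTL=2 draw responses from IPB and IPC)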
Aliases • IP addresses are for interfaces and not routers • Routers typically have many interfaces, each with its own IP address • IP addresses of all the router interfaces are aliases • Traceroute results require resolution of aliases if they are to be used for topology building
Measurement Load • Traceroute places considerable load on network links when used for large-scale topology discovery • Optimizations reduce this load considerably • If a single source is used, instead of probing from source to destination, a better approach is to retrace from destination to source • If multiple sources and multiple destinations are used, sharing information among them brings down the load considerably
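The saving from retracing towards the source can be illustrated by counting probes. This is a simplified sketch under strong assumptions: routes from the single source form a tree, the hop count to each destination is already known, and probing stops as soon as a (hop distance, node) pair seen in an earlier trace is hit. The node names and function names are hypothetical.

```python
def forward_probe_count(paths):
    """Classic traceroute: one probe per hop on every path."""
    return sum(len(path) for path in paths)


def backward_probe_count(paths):
    """Probe from the destination back towards the source and stop at
    the first hop that an earlier trace has already discovered."""
    seen = set()
    probes = 0
    for path in paths:
        for ttl in range(len(path), 0, -1):
            probes += 1
            hop = (ttl, path[ttl - 1])
            if hop in seen:
                break          # the rest of the path is already known
            seen.add(hop)
    return probes


# Hypothetical paths from one source; they share their first hops.
paths = [
    ["r1", "r2", "r3", "d1"],
    ["r1", "r2", "r3", "d2"],
    ["r1", "r2", "r4", "d3"],
]
print(forward_probe_count(paths), backward_probe_count(paths))   # 12 9
```

The hops close to the source are shared by most paths, so they are the ones that backward probing avoids re-measuring.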
Other Methods • Multicast • Replied by routers along the path • Synchronize measurements • Packet loss • Queue size
System Support • Efficient packet injection and accurate measurement of arrival and departure times are best done at kernel level • Unrestricted access to the network interface raises security concerns • Using Scriptroute, unprivileged users can inject and capture packets • Periscope’s API helps define new probing structures and inference techniques for extracting results from arrival patterns of responses
Passive Measurement • Methods that capture traffic generated by other users and applications to build the topology • The Route Views repository collects BGP views (routing tables) from a large set of ASes • Similarly, OSPF LSAs can be captured and processed to generate router graphs within an AS
Passive Measurement: Advantages and Disadvantages • A large set of AS-AS and router-router connections can be learned by simply processing captured tables • However, especially when using BGP views, cross-connections between ASes along the path may be missed • Secondly, route aggregation and filtering tend to hide some connections • Also, multiple connections between ASes will be shown as a single connection in the graph
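Processing captured tables into an AS graph amounts to reading adjacent pairs out of each AS path. The sketch below assumes the AS paths have already been extracted from a BGP table dump; the AS numbers are hypothetical values from the private-use range, and the function name is an assumption.

```python
def as_edges(as_paths):
    """Collect undirected AS-AS edges from a list of AS paths."""
    edges = set()
    for path in as_paths:
        # Collapse AS-path prepending (the same AS repeated in a row).
        hops = [asn for i, asn in enumerate(path)
                if i == 0 or asn != path[i - 1]]
        for a, b in zip(hops, hops[1:]):
            edges.add(tuple(sorted((a, b))))
    return edges


# Hypothetical AS paths as seen from two different vantage points.
as_paths = [
    [64512, 64513, 64514],
    [64512, 64513, 64513, 64515],   # 64513 prepends itself once
    [64516, 64513, 64514],
]
for edge in sorted(as_edges(as_paths)):
    print(edge)
```

Because an edge is only a pair of AS numbers, two ASes that interconnect at several locations still yield a single edge, which is the last limitation listed on the slide.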
Challenges • Infrastructural Issues • Sampling • Vantage Points and Destination List • Probing Overhead • Inter- and Intra-monitor Redundancy • Responsiveness of Routers • ICMP, UDP, TCP • Load Balancing Routers • Per destination, per flow, per packet • (Figure: intra-monitor and inter-monitor redundancy)
Topology Collection • (Figure: Internet2 backbone with routers S, N, C, W, U, K, L, A, H and end hosts d, e, f) • Traces • d - H - L - S - e • d - H - A - W - N - f • e - S - L - H - d • e - S - U - K - C - N - f • f - N - C - K - H - d • f - N - C - K - U - S - e
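Merging the six traces into one map is a matter of collecting the adjacent pairs of every trace. The sketch below uses the traces from the slide verbatim; the function name is hypothetical.

```python
TRACES = [
    "d H L S e",
    "d H A W N f",
    "e S L H d",
    "e S U K C N f",
    "f N C K H d",
    "f N C K U S e",
]


def merge_traces(traces):
    """Return (nodes, undirected edges) observed in the path traces."""
    nodes, edges = set(), set()
    for text in traces:
        hops = text.split()
        nodes.update(hops)
        for a, b in zip(hops, hops[1:]):
            edges.add(tuple(sorted((a, b))))
    return nodes, edges


nodes, edges = merge_traces(TRACES)
print(len(nodes), "nodes,", len(edges), "edges")   # 12 nodes, 13 edges
```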
Topology Sampling: Issues • Sampling to discover networks • Infer characteristics of the topology • Different studies considered • Effect of sample size [Barford 01] • Sampling bias [Lakhina 03] • Path accuracy [Augustin 06] • Sampling approach [Gunes 07] • Utilized protocol [Gunes 08] • ICMP echo request • TCP SYN • UDP port unreachable • ~ 10% of routers are unresponsive
Unresponsive Routers • Unresponsive routers do not respond to traceroute probes and appear as * in traceroute output • The same router may appear as a * in multiple traces • y: S – L – H – x becomes y: S – * – H – x • x: H – L – S – y becomes x: H – * – S – y • Current daily raw topology data sets include • ~ 20 million path traces with • ~ 20 million occurrences of *s along with • ~ 500K public IP addresses • The raw topology data is far from representing the underlying sampled network topology • (Figure: path between x and y through H, L, S, where the unresponsive router L shows up as two separate anonymous nodes, one per trace)
Unresponsive Router Resolution • (Figure: Internet2 backbone with routers S, N, C, W, U, K, L, A, H and end hosts d, e, f) • Traces • d - * - L - S - e • d - * - A - W - * - f • e - S - L - * - d • e - S - U - * - C - * - f • f - * - C - * - * - d • f - * - C - * - U - S - e
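Unresponsive Router Resolution • Traces • d - * - L - S - e • d - * - A - W - * - f • e - S - L - * - d • e - S - U - * - C - * - f • f - * - C - * - * - d • f - * - C - * - U - S - e • (Figures: the sampled network with routers S, U, K, C, N, L, H, A, W and end hosts d, e, f; the resulting network built from the traces, in which only C, U, S, L, W, A and d, e, f are known nodes)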
Previous Approaches • Basic heuristics • IP: Combine anonymous nodes between same known nodes [Bilir 05] • Limited resolution • NM: Combine all anonymous neighbors of a known node [Jin 06] • High false positives • (Figures: a sampled network, the resulting network, and the network after resolution with each heuristic)
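The first heuristic can be tried on the six Internet2 traces with unresponsive hops. This is a sketch of the rule as stated on the slide, not the implementation from [Bilir 05]: each unresponsive hop is written as *, every trace is assumed to start and end at a known node, and a run of consecutive *s is merged with another run only if it has the same length and sits between the same two known nodes. The function name is hypothetical.

```python
TRACES = [
    "d * L S e",
    "d * A W * f",
    "e S L * d",
    "e S U * C * f",
    "f * C * * d",
    "f * C * U S e",
]


def merge_between_same_known_nodes(traces):
    """Give each * a label derived from the known nodes around it;
    *s that receive the same label are treated as one router."""
    labels = set()
    for text in traces:
        hops = text.split()
        i = 1
        while i < len(hops) - 1:
            if hops[i] != "*":
                i += 1
                continue
            j = i
            while hops[j] == "*":      # find the end of the run of *s
                j += 1
            left, right, run = hops[i - 1], hops[j], j - i
            for k in range(run):
                # Number positions from the smaller end so that a run
                # seen in the opposite direction gets the same labels.
                pos = k if left <= right else run - 1 - k
                labels.add((min(left, right), max(left, right), run, pos))
            i = j
    return labels


raw = sum(text.split().count("*") for text in TRACES)
merged = merge_between_same_known_nodes(TRACES)
print(raw, "anonymous nodes before,", len(merged), "after")   # 11 before, 7 after
```

Comparing these traces with the fully responsive ones on the Topology Collection slide shows that only three routers (H, K, N) are unresponsive, so 7 remaining anonymous nodes is an instance of the "limited resolution" noted above.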
Previous Approaches • More theoretic approaches • Graph minimization [Yao 03] • Combine *s as long as they do not violate two accuracy conditions: • (1) Trace preservation condition and (2) distance preservation condition • High complexity O(n^5), where n is the number of *s • ISOMAP based dimensionality reduction [Jin 06] • Build an n x n distance matrix, then use ISOMAP to reduce it to an n x 5 matrix • Distance: (1) hop count or (2) link delay • High complexity O(n^3), where n is the number of nodes • Semisupervised Spectral Clustering [Shavitt 08] • A node will not be chosen to be an unknown root if it shares two or more neighbors with an unknown root. • Nodes that share two or more neighbors are usually very close to each other, and it is difficult to distinguish between them even manually. • After splitting them into unknowns, these nodes will have at least one common unknown node. • This makes the task of cleanly separating the unknowns impossible
Structural Graph Indexing (SGI) • Structural Graph Indexing • A graph data mining technique • Index all pre-defined substructures in graph data • Use of SGI for anonymous router resolution • Apply SGI to collected path traces • Merge anonymous routers using identified structures • Trace Preservation Condition • Don’t merge anonymous routers within the same trace • Subnet distance as tie-breaker
Common Structures due to ARs • Parallel *-substring • Clique • Complete Bipartite • Star • (Figure: an example graph for each of the four structures)
Graph Indexing based Resolution • Indexing Phase: parallel, star, bipartite, clique • Resolution Phase: parallel, clique, bipartite, star
IP Alias Resolution • (Figure: Internet2 backbone with routers S, N, C, W, U, K, L, A, H, their interfaces labeled s.1 to s.3, n.1 to n.3, c.1 to c.4, w.1 to w.3, u.1 to u.3, k.1 to k.3, l.1 to l.3, a.1 to a.3, h.1 to h.4, and end hosts d, e, f) • Traces • d - h.4 - l.3 - s.2 - e • d - h.4 - a.3 - w.3 - n.3 - f • e - s.1 - l.1 - h.1 - d • e - s.1 - u.1 - k.1 - c.1 - n.1 - f • f - n.2 - c.2 - k.2 - h.2 - d • f - n.2 - c.2 - k.2 - u.2 - s.3 - e
IP Alias Resolution • Traces • d - h.4 - l.3 - s.2 - e • d - h.4 - a.3 - w.3 - n.3 - f • e - s.1 - l.1 - h.1 - d • e - s.1 - u.1 - k.1 - c.1 - n.1 - f • f - n.2 - c.2 - k.2 - h.2 - d • f - n.2 - c.2 - k.2 - u.2 - s.3 - e • (Figures: the sampled network with routers S, U, K, C, N, L, H, A, W and end hosts d, e, f; the sample map without alias resolution, in which every observed interface such as s.1, s.2, s.3 appears as a separate node)
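The effect of skipping alias resolution can be quantified on these six traces. The sketch below builds the map twice: once treating every interface address as its own node, and once after mapping interfaces to routers. The mapping simply uses the slide's labeling convention (s.1, s.2, s.3 all belong to router S), which a real measurement would not have; it stands in for the output of an alias resolution step. Function names are hypothetical.

```python
TRACES = [
    "d h.4 l.3 s.2 e",
    "d h.4 a.3 w.3 n.3 f",
    "e s.1 l.1 h.1 d",
    "e s.1 u.1 k.1 c.1 n.1 f",
    "f n.2 c.2 k.2 h.2 d",
    "f n.2 c.2 k.2 u.2 s.3 e",
]


def build_map(traces, node_of=lambda label: label):
    """Return (nodes, undirected edges), mapping each hop through node_of."""
    nodes, edges = set(), set()
    for text in traces:
        hops = [node_of(label) for label in text.split()]
        nodes.update(hops)
        for a, b in zip(hops, hops[1:]):
            if a != b:
                edges.add(tuple(sorted((a, b))))
    return nodes, edges


def router_of(label):
    """Ground-truth alias sets taken from the labels: 'h.4' -> 'H'."""
    return label.split(".")[0].upper() if "." in label else label


raw_nodes, raw_edges = build_map(TRACES)
res_nodes, res_edges = build_map(TRACES, router_of)
print(len(raw_nodes), len(raw_edges))   # 22 25
print(len(res_nodes), len(res_edges))   # 12 13
```

Without alias resolution the map has almost twice as many nodes and edges as the network that was actually sampled.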
Previous Approaches • Source IP Address Based Method [Pansiot 98] • Relies on a particular implementation of ICMP error generation • (Figure: a probe with Dest = A is answered with source address B) • IP Identification Based Method (ally) [Spring 03] • Relies on a particular implementation of the IP identifier field • Many routers ignore direct probes • (Figure: probes with Dest = A and Dest = B return replies A, ID=100; B, ID=99; B, ID=103) • DNS Based Method [Spring 04] • Relies on similarities in the host name structures • sl-bb21-lon-14-0.sprintlink.net • sl-bb21-lon-8-0.sprintlink.net • Works when a systematic naming is used • Record Route Based Method [Sherwood 06] • Depends on router support for IP record route processing
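The DNS based idea can be illustrated with the two host names on the slide. The sketch below is not the method from [Spring 04]; it assumes one particular naming convention, namely that a name has the form router-slot-port.domain, so that the trailing two numeric fields identify the interface and everything before them identifies the router. The function names and the regular expression are assumptions made for illustration.

```python
import re

# Assumed convention: "<router>-<slot>-<port>.<domain>"
NAME_PATTERN = re.compile(r"^(.+?)-\d+-\d+\.(.+)$")


def router_key(hostname):
    """Return (router part, domain) or None if the name does not fit."""
    match = NAME_PATTERN.match(hostname)
    return (match.group(1), match.group(2)) if match else None


def likely_aliases(name_a, name_b):
    """Two names are alias candidates if they share router part and domain."""
    key_a, key_b = router_key(name_a), router_key(name_b)
    return key_a is not None and key_a == key_b


a = "sl-bb21-lon-14-0.sprintlink.net"
b = "sl-bb21-lon-8-0.sprintlink.net"
print(router_key(a))          # ('sl-bb21-lon', 'sprintlink.net')
print(likely_aliases(a, b))   # True
```

A rule like this has to be written per operator, which is why the slide notes that the method only works where systematic naming is used.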
Analytical Alias Resolution • (Figure: path traces between UTD and MIT; two hops give no response) • Addresses observed: 129.110.95.1, 129.110.5.1, 206.223.141.74, 206.223.141.73, 206.223.141.69, 206.223.141.70, 198.32.8.33, 198.32.8.34, 198.32.8.65, 198.32.8.66, 198.32.8.85, 198.32.8.84, 192.5.89.10, 192.5.89.89, 192.5.89.9, 192.5.89.90, 18.168.0.27, 18.7.21.1, 18.168.0.25, 18.7.21.84 • Aliases • 129.110.5.1 - 206.223.141.74 • 206.223.141.73 - 206.223.141.69 • 206.223.141.70 - 198.32.8.33 • …
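One way to arrive at the alias pairs listed on this slide is to align a forward and a reverse trace on point-to-point subnets. The sketch below is an illustration of that idea, not the published algorithm. It assumes that routers report the address of the interface a probe arrived on, that links are /30 or /31 subnets, and that the path is the same in both directions. The hop order of the two trace fragments is a reading of the flattened figure and is an assumption; function names are hypothetical.

```python
import ipaddress


def point_to_point_mates(a, b):
    """True if a and b can be the two ends of one /31 or /30 link."""
    x, y = int(ipaddress.IPv4Address(a)), int(ipaddress.IPv4Address(b))
    if x == y:
        return False
    if x >> 1 == y >> 1:                      # both addresses of a /31
        return True
    # the two usable host addresses of a /30
    return x >> 2 == y >> 2 and {x & 3, y & 3} == {1, 2}


def analytical_aliases(forward, reverse):
    """Infer alias pairs from a forward trace and a reverse trace."""
    aliases = set()
    for i in range(1, len(forward)):
        for j, addr in enumerate(reverse):
            if point_to_point_mates(forward[i], addr):
                # addr is the far end of the link that forward[i] sits on,
                # so it belongs to the previous router of the forward trace.
                aliases.add((forward[i - 1], addr))
                if j > 0:
                    # The hop before addr in the reverse trace is the
                    # router that reported forward[i].
                    aliases.add((forward[i], reverse[j - 1]))
    return aliases


# Trace fragments (assumed hop order), in the UTD -> MIT direction and back.
forward = ["129.110.5.1", "206.223.141.73", "206.223.141.70", "198.32.8.34"]
reverse = ["198.32.8.33", "206.223.141.69", "206.223.141.74"]

for pair in sorted(analytical_aliases(forward, reverse)):
    print(" - ".join(pair))
```

With these fragments the sketch yields exactly the three pairs listed under Aliases. No probes are sent, which is what makes the approach analytical, but a wrong subnet guess or a misaligned trace produces false aliases.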
Analytical & Probe-based Alias Resolution • There is a possibility of • incorrect subnet assumption • Two /30 subnets assumed as a /29 • incorrect alignment of path traces • IP4 and IP8 are thought of as aliases • To prevent false positives, some conditions are defined • Trace preservation • Distance preservation (probing component of APAR) • Completeness • Common neighbor • (Figure: a sample network with nodes a, b, c, d, e, f and interfaces IP1, IP2, IP3, IP4, IP7, IP8, IP9)
Subnet Resolution • Alias resolution • IP addresses that belong to the same router • Subnet resolution • IP addresses that are connected over the same medium • (Figure: example interfaces IP1 to IP6)
Subnet Inference • Subnet resolution • Identify IP addresses that are connected over the same medium • Improve the quality of resulting topology map • (Figure: routers A, B, C, D with interfaces IP1, IP2, IP3, shown as underlying topology, observed topology, and inferred topology)