AS Relationship Inference • AS Graph and AS Relationship Inference • Gao’s Degree-based Heuristics • SARK’s “Multiple Vantage Point” Approach • Computation/Optimization based Approach • Summary and Discussion CSci8211: AS Relation Inference
AS Relationship Inference • AS graph as a (simple) model for Internet structure • nodes: ASes; edges: BGP connections between ASes • not the same as “physical” topology • “connectivity” in AS graph does not mean “reachability” • BGP policy based • Need to augment “edges” in AS graph with types of relationships • AS relationship inference problem [Figure: AS 1 and AS 2 joined by an edge labeled “Relationship?”]
Applications of AS Relationships Some examples • Construct Internet distance map • Place proxy or mirror site servers • Potentially avoid route divergence • Help ISPs or domain administrators to achieve load balancing and congestion avoidance • Help ISPs or companies to plan for future contractual agreements • Help ISPs to reduce effect of misconfiguration and to debug router configuration files
Data Sources • How to obtain an Internet graph at the AS level? • BGP routing tables (AS PATH) • Active probes (traceroute) • BGP routing tables (BGP views) • Route Views Project (www.routeviews.org/) • RIPE (www.ripe.net/) • UMN (www.cs.umn.edu/research/networking/BGP/traces/) • …. • Traceroute data (need to map IPs to ASNs) • CAIDA • Route servers via telnet (www.traceroute.org/#Route%20Servers) • iPlane • ….
Caveats Challenge: Can we get a complete Internet AS graph? If not, why? • Impact of partial BGP views • Where are the vantage points? • What is likely to be missed? • Impact of (partial) traceroute data • … • Beware of sampling bias
AS Relationship Inference Problem Basic Assumptions • Most common AS relationships • provider-customer • peer-to-peer • (some may be sibling-sibling) • Common BGP routing practices • Prefer customer routes over peer/provider routes • Prefer peer routes over provider routes • Filter peer routes to providers (do not export them) • Filter provider routes to peers (do not export them) • (most) AS paths in the entries of a BGP routing table are valley-free
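The routing practices listed above can be sketched as a preference order plus an export filter. This is a minimal illustration, not any router's actual configuration: the names `PREFERENCE`, `best_route` and `should_export` are hypothetical, the tie-break on shorter AS path is an assumption, and the export rule is the usual selective-export reading (routes learned from peers or providers are passed on only to customers).

```python
# Hypothetical numeric preferences: higher is better.
PREFERENCE = {"customer": 3, "peer": 2, "provider": 1}


def best_route(routes):
    """Pick one route from a list of (learned_from, as_path) tuples.

    Customer routes beat peer routes, which beat provider routes.
    Assumption: ties are broken by the shorter AS path.
    """
    return max(routes, key=lambda r: (PREFERENCE[r[0]], -len(r[1])))


def should_export(learned_from, send_to):
    """Selective export rule.

    A route learned from a customer may be announced to anyone.
    A route learned from a peer or provider is announced only to customers.
    """
    return learned_from == "customer" or send_to == "customer"
```

Paths that obey `should_export` at every hop are exactly the ones that end up valley-free, which is why the inference algorithms can lean on that property.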
Valley-Free Property An AS path (u1, u2, …, un) is valley-free if and only if a provider-to-customer (p-c) or peer-to-peer (p-p) edge is followed only by provider-to-customer (p-c) or sibling-to-sibling (s-s) edges. [Figure: example AS paths over nodes 1 to 7 with edge types P-C, C-P and P-P; some segments marked “possibly missing”]
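The valley-free condition can be checked with one pass over a path's edge labels. The sketch below assumes each edge has already been labeled as seen when walking the path from left to right; the label strings and the function name `is_valley_free` are illustrative choices, not part of the slides.

```python
# Edge labels as seen when walking the path from left to right.
C2P, P2C, P2P, S2S = "c2p", "p2c", "p2p", "s2s"


def is_valley_free(labels):
    """Return True if a sequence of edge labels forms a valley-free path."""
    past_top = False  # becomes True after the first p2c or p2p edge
    for label in labels:
        # Once a p2c or p2p edge has been used, only p2c or s2s may follow.
        if past_top and label not in (P2C, S2S):
            return False
        if label in (P2C, P2P):
            past_top = True
    return True
```

For example, `[C2P, C2P, P2P, P2C]` (uphill, one peering edge, downhill) passes, while `[P2C, C2P]` contains a valley and fails.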
Abstract Model: ToR Graph • Given a graph G=(V,E) • Edges are either “directed” or “undirected” (but their orientations are unknown) • directed edge <u,v>: u is the customer, v is the provider • undirected edge (u,v): u and v are peers • We are given a set of paths P (in G) • AS relationship inference problem: • ToR Edge Orientation Problem: orient (some) edges in G so as to minimize # of paths in P that are “invalid”, i.e., non-valley-free
AS ToR Graph [Figure: example ToR graph over AS1 to AS7; legend: customer-to-provider edge, peer-peer edge, sibling-sibling edge]
A “Simplified” Problem • For any “valley-free” path p=(u1, …,um) with a “flat top” (ui,ui+1), i.e., an undirected edge, we can orient this edge either way and the path is still “valley-free”! • Given any solution to the ToR edge orientation problem with some undirected edges, we can orient these edges without increasing # of invalid paths in P! • A simpler version of the problem: • ToR-Simple: orient all edges in G so as to minimize # of paths in P that are “invalid”, i.e., non-valley-free • A general two-step process: • solve ToR-Simple first (i.e., customer-provider edges); • then figure out “peer-to-peer” edges • this points to the difficulty of inferring “peer-to-peer” edges • some “arbitrariness” involved
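To make the ToR-Simple objective concrete, the sketch below counts invalid paths for a given orientation and finds the optimum by exhaustive search. It is only usable on toy inputs (2^|E| orientations), and the names `count_invalid` and `brute_force_tor_simple` are hypothetical. An orientation is represented as a dict from an undirected edge to the node chosen as provider.

```python
from itertools import product


def count_invalid(paths, provider):
    """Count paths that contain a valley.

    provider[frozenset((a, b))] is the node acting as provider on edge a-b.
    With only customer-provider edges, a path is valid iff it never goes
    uphill again after it has gone downhill.
    """
    invalid = 0
    for path in paths:
        went_down = False
        for s, t in zip(path, path[1:]):
            uphill = provider[frozenset((s, t))] == t
            if uphill and went_down:
                invalid += 1
                break
            if not uphill:
                went_down = True
    return invalid


def brute_force_tor_simple(paths):
    """Try every orientation of every edge; return (min_invalid, orientation)."""
    edges = sorted({tuple(sorted((s, t)))
                    for path in paths for s, t in zip(path, path[1:])})
    best = None
    for choice in product((0, 1), repeat=len(edges)):
        provider = {frozenset(e): e[c] for e, c in zip(edges, choice)}
        score = count_invalid(paths, provider)
        if best is None or score < best[0]:
            best = (score, provider)
    return best
```

On the paths `[(1, 2, 3), (3, 2, 4)]` the search finds an orientation with zero invalid paths, for instance by making AS 2 the provider on every edge.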
Gao’s Algorithm • A degree-based approach • Intuitively, ASes with high degrees (# of AS neighbors) are likely providers • Two-step process • First, try to orient all edges to minimize invalid paths • Intuition for step 1: • given a path p=(u1, …,um), pick uk, where degree(uk) is the largest among all ui, and make uk the top provider on the path • there may be many paths involving uk • need to figure out which paths and which nodes to start with • Then pick some “customer-provider” edges and “flatten” them out as “peer-to-peer” edges • Heuristic: nodes with similar degrees are likely peers • Full mesh at the top of the hierarchy • Also address “sibling” edges
Basic Algorithm • Heuristics: • Top provider has largest degree • Based on patterns on BGP routing table entries • Consecutive AS pairs on the left of top provider are customer-to-provider or sibling-sibling edges • Consecutive AS pairs on the right of top provider are provider-to-customer or sibling-sibling edges
Basic Algorithm (cont.) • Computational complexity • O(N), where N is the total number of consecutive AS pairs in the routing table • Problem • BGP misconfiguration: some BGP-speaking routers do not conform to the selective export rule • Example: u and v are both providers of w. w announces the route w-v to u, so we observe the path (u, w, v). If d(v) is the maximum degree on the path, we infer Edge[u,w] = customer-to-provider, i.e., that u is a customer of w, although u is actually w’s provider • Consequence: incorrect inference of AS relationships • Solution: refined algorithm
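The basic algorithm described above can be sketched as follows. This is a minimal reading of the heuristic, not Gao's exact code: degrees are computed from the given paths only, the first AS of maximum degree is taken as the top provider, and an AS pair seen in both directions is labeled as siblings. The names `as_degrees` and `basic_inference` are hypothetical.

```python
from collections import defaultdict


def as_degrees(paths):
    """Degree of each AS = number of distinct neighbors seen in the paths."""
    neighbors = defaultdict(set)
    for path in paths:
        for a, b in zip(path, path[1:]):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {asn: len(n) for asn, n in neighbors.items()}


def basic_inference(paths):
    """Return {(x, y): label}; "customer-to-provider" means x is a customer of y."""
    degree = as_degrees(paths)
    transit = set()  # (x, y) in transit: some path implies y provides transit for x
    for path in paths:
        # Index of the top provider (first AS with the largest degree).
        j = max(range(len(path)), key=lambda i: degree[path[i]])
        for i in range(len(path) - 1):
            a, b = path[i], path[i + 1]
            # Left of the top provider: uphill. Right of it: downhill.
            transit.add((a, b) if i < j else (b, a))
    labels = {}
    for x, y in transit:
        if (y, x) in transit:
            labels[(x, y)] = "sibling-to-sibling"
        else:
            labels[(x, y)] = "customer-to-provider"
    return labels
```

Feeding it the misconfiguration case, for example `[("u", "w", "v"), ("a", "v"), ("b", "v")]` so that `v` has the largest degree, yields `("u", "w")` as customer-to-provider, which is the wrong inference that motivates the refined algorithm.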
Refined Algorithm
• 1. Compute the degree of each AS
    Degree[u] = |neighbor[u]|
• 2. Count the # of paths that imply a customer-provider (transit) relationship for each AS pair (e.g., the AS paths 217 57 11537 10466 55 and 217 57 3908 19092 209)
    For each AS path (u1, u2, …, un):
        find j such that degree[u_j] is the maximum
        for i = 1 to j – 1: transit[u_i, u_{i+1}] = transit[u_i, u_{i+1}] + 1
        for i = j to n – 1: transit[u_{i+1}, u_i] = transit[u_{i+1}, u_i] + 1
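Steps 1 and 2 can be sketched in a few lines. The function name `count_transit` is hypothetical, and computing degrees from only the supplied paths is a simplification (the slides compute them from a full routing table).

```python
from collections import Counter, defaultdict


def count_transit(paths):
    """Return (degree, transit).

    transit[(x, y)] counts the paths implying that y provides transit for x.
    """
    neighbors = defaultdict(set)
    for path in paths:
        for a, b in zip(path, path[1:]):
            neighbors[a].add(b)
            neighbors[b].add(a)
    degree = {asn: len(n) for asn, n in neighbors.items()}

    transit = Counter()
    for path in paths:
        # 0-based index of the first AS with maximum degree (the top provider).
        j = max(range(len(path)), key=lambda i: degree[path[i]])
        for i in range(len(path) - 1):
            if i < j:
                transit[(path[i], path[i + 1])] += 1   # uphill pair
            else:
                transit[(path[i + 1], path[i])] += 1   # downhill pair
    return degree, transit


# The two example AS paths from the slide.
example_paths = [(217, 57, 11537, 10466, 55), (217, 57, 3908, 19092, 209)]
degree, transit = count_transit(example_paths)
```

With just these two paths AS 57 has the largest degree, so both paths count the pair (217, 57) as uphill and every later pair as downhill.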
Refined Algorithm (cont.)
• 3. Assign relationships to AS pairs
    For each AS path (u1, u2, …, un):
        for i = 1, …, n-1:
            if (transit[u_i, u_{i+1}] > L and transit[u_{i+1}, u_i] > L) or (0 < transit[u_i, u_{i+1}] <= L and 0 < transit[u_{i+1}, u_i] <= L):
                Edge[u_i, u_{i+1}] = sibling-to-sibling
            else if transit[u_{i+1}, u_i] > L or transit[u_i, u_{i+1}] = 0:
                Edge[u_i, u_{i+1}] = provider-to-customer
            else if transit[u_i, u_{i+1}] > L or transit[u_{i+1}, u_i] = 0:
                Edge[u_i, u_{i+1}] = customer-to-provider
L: a small constant
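The step 3 decision rule depends only on the two transit counts of a pair, so it can be isolated in a small function. The name `classify_pair` and its argument names are hypothetical; `t_fwd` stands for transit[u_i, u_{i+1}] and `t_rev` for transit[u_{i+1}, u_i].

```python
def classify_pair(t_fwd, t_rev, L=1):
    """Label the edge (u_i, u_{i+1}) from its two transit counts."""
    both_strong = t_fwd > L and t_rev > L
    both_weak = 0 < t_fwd <= L and 0 < t_rev <= L
    if both_strong or both_weak:
        return "sibling-to-sibling"
    if t_rev > L or t_fwd == 0:
        return "provider-to-customer"
    if t_fwd > L or t_rev == 0:
        return "customer-to-provider"
    # For non-negative counts, one of the branches above always applies.
    raise AssertionError("unreachable for non-negative counts")
```

With L = 1, `classify_pair(5, 1)` still returns customer-to-provider: a single path pointing the other way is treated as a likely misconfiguration rather than as evidence of a sibling relationship.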
Final Algorithm • Phase 1: use either the basic or the refined algorithm to coarsely classify AS pairs into provider-customer or sibling relationships • Phase 2: identify AS pairs that cannot have a peering relationship
    For each AS path (u1, u2, …, un):
        find the AS u_j such that degree[u_j] = max over 1 <= i <= n of degree[u_i]
        for i = 1, …, j-2: notpeering[u_i, u_{i+1}] = 1
        for i = j+1, …, n-1: notpeering[u_i, u_{i+1}] = 1
        if edge[u_{j-1}, u_j] <> sibling-to-sibling and edge[u_j, u_{j+1}] <> sibling-to-sibling:
            if degree[u_{j-1}] > degree[u_{j+1}]: notpeering[u_j, u_{j+1}] = 1
            else: notpeering[u_{j-1}, u_j] = 1
Final Algorithm (cont.) • Phase 3: assign peering relationships to AS pairs
    For each AS path (u1, u2, …, un):
        for i = 1, …, n-1:
            if notpeering[u_i, u_{i+1}] <> 1 and notpeering[u_{i+1}, u_i] <> 1 and degree[u_i] / degree[u_{i+1}] < R and degree[u_i] / degree[u_{i+1}] > 1/R:
                edge[u_i, u_{i+1}] = peer-to-peer
R: a sensitive constant that is very difficult to set properly
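Phases 2 and 3 can be sketched together. In each path only the two edges touching the top provider stay candidates for peering, and of those only the one toward the higher-degree neighbor survives. The function name `infer_peering`, the `edge` dict of Phase 1 labels, and the handling of a top provider at either end of a path (its single adjacent edge stays a candidate) are assumptions of this sketch.

```python
SIBLING = "sibling-to-sibling"


def infer_peering(paths, degree, edge, R=60):
    """Return the set of AS pairs (as frozensets) inferred to be peers.

    degree: dict AS -> degree; edge: dict (a, b) -> Phase 1 label.
    """
    notpeering = set()
    for path in paths:
        j = max(range(len(path)), key=lambda i: degree[path[i]])  # top provider
        for i in range(len(path) - 1):
            # Every pair not adjacent to the top provider cannot be peering.
            if i < j - 1 or i > j:
                notpeering.add((path[i], path[i + 1]))
        if 0 < j < len(path) - 1:
            left = (path[j - 1], path[j])
            right = (path[j], path[j + 1])
            if edge.get(left) != SIBLING and edge.get(right) != SIBLING:
                # Keep only the edge toward the higher-degree neighbor.
                if degree[path[j - 1]] > degree[path[j + 1]]:
                    notpeering.add(right)
                else:
                    notpeering.add(left)

    peers = set()
    for path in paths:
        for a, b in zip(path, path[1:]):
            ratio = degree[a] / degree[b]
            if ((a, b) not in notpeering and (b, a) not in notpeering
                    and 1 / R < ratio < R):
                peers.add(frozenset((a, b)))
    return peers
```

For the single path `(1, 2, 3, 4, 5)` with degrees `{1: 1, 2: 5, 3: 50, 4: 40, 5: 2}`, AS 3 is the top provider, the edge toward AS 4 survives Phase 2, and the degree ratio 50/40 falls inside (1/R, R) for R = 60, so the pair {3, 4} is reported as peers.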
Inference Results: 13,800 AS pairs (2000/3/9) [R = 60, L = 1]
    Relationship   # AS pairs   Percentage
    P-C / C-P      12930        93.7%
    P-P            713          5.2%
    S-S            157          1.1%
Verification of Inferred Relationships by AT&T • Comparing inference results from Basic and Final (R = ∞) with AT&T internal information
Issues with AS Degree based Approach • Challenges • Can’t obtain accurate degree for all ASes from a single BGP view • Assumptions may not always hold • (1) top provider has the highest degree, and • (2) highest degree ASes’ peers have higher degree than their customers • Impact of BGP configuration errors • Router configuration typo. (e.g., 7018 3561 7057 7075 7057) • Mis-configuration of small ISPs, e.g., (1239 11 116 701 7018) • Unusual AS relationships • (1239 3561 2856 701 702 1849 9090)
Multiple Vantage Points Approach Main Idea of SARK approach: • exploit the structure of partial views of AS graph as seen from multiple vantage points • assign a rank to each AS for each of the partial views • infer the relationships between neighboring ASes by comparing their vectors of ranks
Computing AS Ranks • Each BGP vantage point has a partial view (sub-graph) of the global AS graph • A tree (or DAG) rooted at the vantage point • In general, “leaves” of the trees are likely customer ASes • Combine all views to rank each AS • map each AS i to an N-dimensional vector (r_{i1}, r_{i2}, …, r_{iN}) • r_{ij} is the rank of AS i from vantage point j • ASes far away from most vantage points (i.e., with smaller ranks) are likely customers of other ASes • A reverse pruning algorithm computes the ranks from each vantage point
Reverse Pruning Algorithm • Let X denote the source AS of a particular view of the AS graph • Let P(X) denote the set of AS paths seen from X • Let G_X denote the graph formed from P(X), and v(G_X) the set of all vertices in G_X
    G = G_X
    r = 1
    while (leaves(G) ≠ ∅) {
        for all u ∈ leaves(G): rank(u) = r;
        v’ = v(G) – leaves(G);
        r = r + 1;
        G = G[v’]   (the subgraph of G induced by v’)
    }
    for all u ∈ v(G): rank(u) = r;
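The pruning loop can be sketched as below. The slide does not define leaves(G), so this sketch assumes that edges are directed away from the vantage point along each AS path and that a leaf is a node with no outgoing edge; that is one reading, not necessarily the authors' definition. The function name `reverse_pruning_ranks` is hypothetical.

```python
def reverse_pruning_ranks(paths):
    """Rank the ASes in one vantage point's view; leaves get rank 1."""
    succ = {}  # node -> set of successors along the AS paths
    for path in paths:
        for node in path:
            succ.setdefault(node, set())
        for a, b in zip(path, path[1:]):
            succ[a].add(b)

    rank, r = {}, 1
    while True:
        leaves = {u for u, out in succ.items() if not out}
        if not leaves:
            break
        for u in leaves:
            rank[u] = r
            del succ[u]            # prune the leaf
        for out in succ.values():
            out -= leaves          # drop edges into pruned leaves
        r += 1
    for u in succ:                 # whatever is left (nodes on cycles)
        rank[u] = r
    return rank
```

For the view `[("X", "A", "B"), ("X", "A", "C"), ("X", "D")]`, the stub ASes B, C and D are pruned first (rank 1), then A (rank 2), and the vantage point X last (rank 3).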
Inferring AS Relationships • N vantage points • l(i, j): number of coordinates k where r_{ik} > r_{jk} • e(i, j): number of coordinates k where r_{ik} = r_{jk} • For an adjacent AS pair (i, j): • Provider-customer: AS i is a provider of AS j if l(i, j) >= N/2 and l(j, i) = 0 (i “dominates” j) • Peer-to-peer: AS i and AS j are peers if e(i, j) > N/2 (i is “equivalent” to j)
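The dominance and equivalence rules above translate directly into a comparison of two rank vectors. The function name `sark_relationship` and the returned label strings are illustrative; pairs matching neither rule are returned as "unknown".

```python
def sark_relationship(rank_i, rank_j):
    """Compare the rank vectors of two adjacent ASes i and j.

    rank_i[k] and rank_j[k] are the ranks seen from vantage point k.
    """
    n = len(rank_i)
    l_ij = sum(a > b for a, b in zip(rank_i, rank_j))   # coordinates where i ranks higher
    l_ji = sum(b > a for a, b in zip(rank_i, rank_j))   # coordinates where j ranks higher
    e_ij = sum(a == b for a, b in zip(rank_i, rank_j))  # coordinates with equal rank

    if l_ij >= n / 2 and l_ji == 0:
        return "i-provider-of-j"     # i dominates j
    if l_ji >= n / 2 and l_ij == 0:
        return "j-provider-of-i"     # j dominates i
    if e_ij > n / 2:
        return "peer-to-peer"        # i is equivalent to j
    return "unknown"
```

For instance, with four vantage points, `sark_relationship([3, 3, 2, 3], [1, 2, 2, 1])` gives l(i, j) = 3 and l(j, i) = 0, so i is inferred to be a provider of j.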
“Probabilistic” Version • Probabilistic dominance: • if l(i, j)/l(j, i) > δ_0 for a high value of δ_0, then i probably dominates j, and thus AS i is (more likely) a provider of AS j • Probabilistic equivalence: • two ASes are probably equivalent if 1/δ_1 <= l(i, j)/l(j, i) <= δ_1 for a δ_1 close to 1 • This rule is used to infer peering relationships between ASes when visibility is poor across the partial views • N = 10 and a threshold value of 2 in the experiments • No sibling edges inferred
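The probabilistic rules can be sketched with two threshold parameters. The names `delta0` (dominance threshold) and `delta1` (equivalence threshold) are hypothetical stand-ins for the slide's threshold symbols, and no default values are assumed. The ratio tests are written as cross-multiplications so that l(j, i) = 0 does not cause a division by zero.

```python
def probabilistic_relationship(l_ij, l_ji, delta0, delta1):
    """Classify an adjacent AS pair from the counts l(i, j) and l(j, i).

    delta0: large threshold for probabilistic dominance.
    delta1: threshold close to 1 for probabilistic equivalence.
    """
    # l(i, j) / l(j, i) > delta0, written without division.
    if l_ij > delta0 * l_ji:
        return "i-provider-of-j"
    if l_ji > delta0 * l_ij:
        return "j-provider-of-i"
    # 1/delta1 <= l(i, j) / l(j, i) <= delta1, written without division.
    if l_ij <= delta1 * l_ji and l_ji <= delta1 * l_ij:
        return "peer-to-peer"
    return "unknown"
```

With `delta0 = 3` and `delta1 = 1.5`, the counts (8, 2) indicate that i probably dominates j, (5, 4) indicate probable peers, and (6, 3) satisfy neither rule.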
“Hardness” of ToR Inference Problem • The ToR edge orientation problem is NP-complete! • The original formulation is a minimization problem • ToR-D problem (a decision problem): • Given a graph G, a set of paths P, and an integer k, test if it is possible to give an orientation to some of the edges of G so that the number of invalid paths is at most k • ToR-D-Simple problem: • Given a graph G, a set of paths P, and an integer k, test if it is possible to give an orientation to all edges of G so that the number of invalid paths is at most k • ToR-D admits a solution iff ToR-D-Simple admits one
How Hard is ToR-D-Simple? • We can map ToR-D-Simple to MAX2SAT • MAX2SAT is an NP-complete problem • However, when k=0, ToR-D-Simple is equivalent to 2SAT, which is solvable in linear time • find the strongly connected components of the directed graph G2SAT and verify that no variable lies in the same component as its negation • Heuristic for solving the ToR-Simple problem • find the maximum subset of paths that can all be made valid by removing paths involved in “cycles” in the directed graph G2SAT
Solving 2SAT Problem • All clauses can be satisfied (all paths can be made valid) if there is no variable xi belonging with its negation to the same SCC in G2SAT (conflict variable/edge) • SCC (strongly connected component) is a set of mutually reachable nodes in a directed graph • Proper direction of non-conflict edges can be done via topological sorting in G2SAT (if the variable negation is before the variable itself, then the variable is true, and vice versa) • Topological sorting is a natural ordering of nodes in directed acyclic graphs
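The k = 0 case can be sketched end to end. The encoding below is one possible mapping and an assumption of this sketch: one Boolean variable per edge (true when the smaller-numbered endpoint is the customer), and for every three consecutive ASes (a, b, c) on a path a clause saying that b is a provider of a or of c, which forbids a valley at b. SCCs are found with Kosaraju's algorithm, whose component numbering follows a topological order. The function names are hypothetical.

```python
def scc_ids(n, adj):
    """Kosaraju's algorithm; component ids follow a topological order."""
    radj = [[] for _ in range(n)]
    for u in range(n):
        for v in adj[u]:
            radj[v].append(u)
    order, seen = [], [False] * n
    for s in range(n):                      # first pass: record finish order
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(adj[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, iter(adj[v])))
                    break
            else:
                order.append(u)             # u is finished
                stack.pop()
    comp, c = [-1] * n, 0
    for s in reversed(order):               # second pass on the reversed graph
        if comp[s] != -1:
            continue
        comp[s], stack = c, [s]
        while stack:
            u = stack.pop()
            for v in radj[u]:
                if comp[v] == -1:
                    comp[v] = c
                    stack.append(v)
        c += 1
    return comp


def orient_all_valley_free(paths):
    """Return {(a, b): provider} for every edge (a < b), or None if impossible."""
    var = {}  # edge (a, b), a < b -> variable index; True means a is a customer of b

    def up(s, t):
        """Literal for 's is a customer of t'; 2*v is x_v, 2*v + 1 is NOT x_v."""
        v = var.setdefault((min(s, t), max(s, t)), len(var))
        return 2 * v if s < t else 2 * v + 1

    clauses = []
    for path in paths:
        for s, t in zip(path, path[1:]):
            up(s, t)                        # register every edge as a variable
        for a, b, c in zip(path, path[1:], path[2:]):
            # No valley at b: b is a provider of a, or b is a provider of c.
            clauses.append((up(a, b), up(c, b)))

    n = 2 * len(var)
    adj = [[] for _ in range(n)]
    for x, y in clauses:                    # (x OR y) gives NOT x -> y and NOT y -> x
        adj[x ^ 1].append(y)
        adj[y ^ 1].append(x)

    comp = scc_ids(n, adj)
    provider = {}
    for (a, b), v in var.items():
        if comp[2 * v] == comp[2 * v + 1]:
            return None                     # conflict: x and NOT x share an SCC
        # True when NOT x comes earlier than x in the topological order.
        a_is_customer = comp[2 * v] > comp[2 * v + 1]
        provider[(a, b)] = b if a_is_customer else a
    return provider
```

A `None` result means that at least one path must be left invalid, which is where the MAX2SAT-style heuristics take over.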
Summary • All three approaches have their own advantages and disadvantages • Applying these approaches to the same data can yield very different AS relationships • Example: running Gao’s and SARK’s algorithms on the same dataset (2003/01/09) (www.cs.berkeley.edu/~sagarwal/research/BGP-hierarchy)
    Results   Gao     SARK             Common
    P-C       29446   29320            27852
    P-P       1015    1495             262
    S-S       339     Not considered   N/A
• Inferring peering relationships is a hard problem! • There are some further improvements upon these algorithms
Causes of Some Problems and Possible Resolutions • Case 1: some edges can be directed either way without causing invalid paths • Fix: introduce an additional incentive to direct each edge along the node degree gradient • Case 2: trying to infer sibling links leads to a proliferation of errors • Fix: try to discover sibling links using the WHOIS database [Figure: examples for cases 1 and 2 involving ASes 701, 616, 617, 618 and 8043, with edges labeled “sibling”, “cust-prov” and “?”]
Discussion • No perfect solution yet (will there ever be one?) • A complete global Internet topology is unlikely to be obtainable from BGP routing tables • How to get more satisfying results with such partial datasets? • BGP misconfiguration complicates inference • Inferring p-p is much more difficult than inferring p-c • Given two interconnected ASes, can we develop some “AS attributes” to distinguish them? • How to validate AS relationships without internal information?
Main References • L. Gao, “On Inferring Autonomous Systems Relationships in the Internet,” IEEE/ACM Trans. Networking, Dec. 2001. • L. Subramanian et al., “Characterizing the Internet Hierarchy from Multiple Vantage Points,” INFOCOM, Jun. 2002. • G. Di Battista et al., “Computing the Types of the Relationships between Autonomous Systems,” INFOCOM, Apr. 2003.