Knowledge Plane -- Scaling of the WHY App
Bob Braden, ISI
24 Sept 03
Scaling
• [How] can we make KP services "scalable" (whatever that means)?
  • Network traffic
  • Processing
  • Storage
• E.g., suppose that every end system uses WHY. This should be a good example of scaling issues in the KP.
• To diagnose the cause of a failure, we typically need information that is available only in the neighborhood of the failure => wide-area problem.
IP Path Diagnosis
• Consider a subset of the WHY problem: diagnosis of an IP data path:
  • Can S send an IP datagram to D, and if not, why not?
  • For the "cause", simply tell which node or link is broken.
• Thus, ask for the information currently provided by traceroute.
A Simple, Analogous Scaling Problem
• Let's think about a connectivity testing tool that runs in the data/control plane (not the KP): IPdiagnose.
  • Operates like traceroute, hop-by-hop along the data path.
  • Returns a path vector: the list of router hops up to the failure point.
• We want to make IPdiagnose scalable, in case all the users trigger it.
• Purpose:
  • Insight into more general KP scaling
  • Insight into DDC's model
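To make the path-vector idea concrete, here is a minimal sketch of the IPdiagnose loop in Python. The topology model (a next-hop table and a set of failed links) and all names are illustrative assumptions, not part of the original design.

# Minimal sketch of IPdiagnose semantics: walk the data path from S toward D
# hop by hop and return the path vector up to the first broken link.
# NEXT_HOP and FAILED_LINKS model a made-up example topology.

NEXT_HOP = {("S", "D"): "R1", ("R1", "D"): "R2", ("R2", "D"): "R3", ("R3", "D"): "D"}
FAILED_LINKS = {("R3", "D")}              # link R3 -> D is down

def ipdiagnose(src, dst):
    """Return (reachable, path_vector) for src -> dst."""
    path = [src]
    node = src
    while node != dst:
        nxt = NEXT_HOP.get((node, dst))
        if nxt is None or (node, nxt) in FAILED_LINKS:
            return False, path            # failure point: last node in path
        path.append(nxt)
        node = nxt
    return True, path

print(ipdiagnose("S", "D"))               # (False, ['S', 'R1', 'R2', 'R3'])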
Possible Approaches to IPdiagnose
• Using vanilla traceroute: OH ~ w * Ne * l^2
   where
     l  = path length (number of hops)
     w  = diagnostic frequency (WHY requests per sec per end node)
     Ne = number of end nodes that issue traceroutes.
• Record-&-Return-Route (RRR) msg, processed in each router: OH ~ w * Ne * l
[Diagram: an RRR message from S accumulates the path vector hop by hop along S - 1 - 2 - 3 - X - D: "S>D", "S,1>D", "S,1,2>D", "S,1,2,3>D"; the broken element X stops it before D.]
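A rough back-of-the-envelope comparison of the two overhead expressions above, as a sketch; the particular numbers plugged in are arbitrary illustrations, not measurements.

# Sketch of the overhead comparison: for vanilla traceroute each of the ~l
# probes travels part of the path, giving OH ~ w * Ne * l^2, while a single
# RRR message walks the path once, giving OH ~ w * Ne * l.

def oh_traceroute(w, n_e, l):
    return w * n_e * l * l

def oh_rrr(w, n_e, l):
    return w * n_e * l

w, n_e, l = 0.01, 1_000_000, 15     # requests/sec/node, end nodes, hops (assumed)
print("traceroute:", oh_traceroute(w, n_e, l))   # ~2.25e6 message-hops/sec
print("RRR       :", oh_rrr(w, n_e, l))          # ~1.5e5 message-hops/sec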
Make it Scalable
• Lower the overhead by decreasing the average path length l.
• Move (prior) results as close to the end points as possible/practicable. This reduces the diagnostic traffic in the center of the network.
• To achieve this, use:
  3. "Aggregation"
     • If a matching request for the same D arrives while a previous one is pending, hold it and satisfy it when the reply comes back.
  4. Demand-driven result caching
     • Cache results back along the path from S.
     • Use cached results to satisfy subsequent requests for the same destination that come later.
Result Caching
• Search messages from S gather the path vector in the forward direction.
• Return messages visit each node along the return path to S and leave IPdiagnose result state there.
[Diagram: along the path S - 1 - 2 - 3 - X - D, the search message grows "S,1>D", "S,1,2>D"; the return message "S,1,2,3>D" travels back toward S, leaving cached state {>D} at node 3, {3>D} at node 2, and {2,3>D} at node 1.]
• State retained in node 1: if a later IPdiagnose S'->D reaches this node, return the path {S', …, 1, 2, 3 > D}.
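A minimal sketch of the per-node cache behavior described above; the class name, message format, and node labels are illustrative assumptions.

# Sketch of demand-driven result caching at a single node. A return message
# carries the suffix of the broken path beyond this node; the node stores it,
# keyed by destination, and uses it to answer later requests for the same D.

class Node:
    def __init__(self, name):
        self.name = name
        self.cache = {}                       # dst -> path suffix beyond this node

    def on_return_message(self, dst, path_suffix):
        """Return message passing through on its way back to S."""
        self.cache[dst] = path_suffix         # e.g. node 1 stores ['2', '3'] for D

    def on_request(self, src_path, dst):
        """A later IPdiagnose request arrives with the path gathered so far."""
        if dst in self.cache:
            # Answer from the cache: requester's path so far + cached suffix.
            return src_path + [self.name] + self.cache[dst]
        return None                           # no cached result; keep forwarding

n1 = Node("1")
n1.on_return_message("D", ["2", "3"])         # return msg from the first diagnosis
print(n1.on_request(["S'"], "D"))             # ["S'", '1', '2', '3']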
This is not quite certain…
• Note: the cached path could be unreliable, but it generally works in the absence of policy routing.
[Diagram: the cached path S - 1 - 2 - 3 - X - D, with an alternate next hop 2' and destination D', illustrating that later traffic may not actually follow the cached hops.]
More Scalability
• Suppose we have cached state for the failed path S -> D. Does this help for another path S' -> D' that shares nodes 1, 2, 3, …?
• Suppose that routing in node 3 supplies an address range Dr that contains address D.
• Cached results can then contain Dr.
• If D' is contained in Dr, then node 1 can use the cached state {2,3>Dr} to infer the broken path {1,2,3>D'}.
[Diagram: path S - 1 - 2 - 3 - X - D with D inside range Dr; node 2 caches {3>Dr}, node 1 caches {2,3>Dr}; a later request from S' for D' in Dr is answered from this state.]
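A small sketch of range-based cache lookup, assuming the cached entry is keyed by an address prefix (Dr) supplied by routing rather than by a single destination; the use of Python's ipaddress module and the specific prefixes are illustrative.

# Sketch of prefix-keyed result caching: the cache stores the broken-path
# suffix against an address range Dr, so a request for any D' inside Dr
# can be answered from the same entry. The prefixes below are made up.

import ipaddress

cache = {
    ipaddress.ip_network("192.0.2.0/24"): ["2", "3"],   # Dr -> path suffix {2,3>Dr}
}

def lookup(dst_addr):
    """Return the cached broken-path suffix covering dst_addr, if any."""
    dst = ipaddress.ip_address(dst_addr)
    for prefix, suffix in cache.items():
        if dst in prefix:
            return suffix
    return None

print(lookup("192.0.2.77"))     # ['2', '3'] -- D' falls inside Dr
print(lookup("198.51.100.5"))   # None       -- outside every cached range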
Flushing the Cache
• New requests matching the cache inhibit its timeout.
• Some percentage of matching requests will be forwarded anyway, as probe requests.
• A node will initiate a reverse message towards all relevant senders to adjust/remove cached state if:
  • Routing changes Dr, or
  • A probe request finds next-hop info that differs from the cached path.
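A sketch of the flush policy just described: cache hits refresh a timeout, a fixed fraction of hits are still forwarded as probes, and a probe result that disagrees with the cached path invalidates the entry. The parameter values and structure are assumptions for illustration only.

# Sketch of the cache-flushing rules at one node.

import random, time

CACHE_TIMEOUT = 60.0      # seconds of inactivity before an entry is dropped (assumed)
PROBE_FRACTION = 0.05     # share of matching requests forwarded as probes (assumed)

cache = {}                # dst -> {"suffix": [...], "last_hit": timestamp}

def on_request(dst):
    entry = cache.get(dst)
    if entry is None:
        return "forward"                      # no state; forward normally
    entry["last_hit"] = time.time()           # matching request inhibits timeout
    if random.random() < PROBE_FRACTION:
        return "forward_as_probe"             # re-verify the cached path
    return entry["suffix"]                    # answer from the cache

def on_probe_result(dst, observed_suffix):
    entry = cache.get(dst)
    if entry and entry["suffix"] != observed_suffix:
        del cache[dst]                        # path changed: flush stale state
        # ...and send a reverse message toward cached requesters (not shown)

def expire():
    now = time.time()
    for dst in [d for d, e in cache.items() if now - e["last_hit"] > CACHE_TIMEOUT]:
        del cache[dst]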
Relation to DDC Model in KP
• The destination address D is the (only) variable of the "tuple" composing the request.
• Forwarding is not offer-based (unless the next-hop routing calculation is considered an "offer").
• Does not exactly match DDC's "aggregation" story (?)
  • First request arrives: we don't want to delay it to await a matching request, so cache and forward it. Is this an "aggregation"?
• DDC's model does not have result caching.
• In the KP, we must consider the complexity caused by regions.
  • Sparse overlay mesh of TPs
Other Approaches to IPdiagnose
4. Flooding (unconstrained diffusion)
   • Every diagnostic event (link-down event) is flooded out to the edges, where it matches requests.
   • I am confused about scalability here. Intuitively this seems unscalable, but I don't see how to justify that.
     • Flooding cost ~ O(#links * #faults) (one message per fault per link)
     • Request cost ~ O(w * Ne) (path length = 1)
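A sketch contrasting the flooding costs quoted above with the request-driven RRR cost from the earlier overhead slide; all the numeric values are arbitrary assumptions chosen only to show how the expressions compare.

# Sketch: flooding cost vs. request-driven costs, using the slide's formulas.

def flooding_cost(n_links, n_faults):
    return n_links * n_faults            # one notification per fault per link

def request_cost(w, n_e):
    return w * n_e                       # requests answered locally, path length = 1

def rrr_cost(w, n_e, l):
    return w * n_e * l                   # from the earlier overhead estimate

n_links, n_faults = 3_000_000, 100       # purely illustrative
w, n_e, l = 0.01, 1_000_000, 15
print("flooding :", flooding_cost(n_links, n_faults))   # 3e8 messages
print("requests :", request_cost(w, n_e))               # 1e4 messages/sec
print("RRR      :", rrr_cost(w, n_e, l))                # 1.5e5 message-hops/sec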
More Approaches to IPdiagnose
5. Directed Diffusion
   • Link-state changes are flooded out towards the edges, in the directions of significant fluxes of incoming WHY requests.
   • In sparse directions, use RRR messages or result caching within the network, as discussed earlier.
   • This is the reverse of Clark's proposal -- here the requests create a gradient that controls the diffusion of satisfactions closer to the users.
(The End)
Demand-Driven Result Caching
• Creates a depth-first diffusion of IPdiagnose replies, triggered by requests for the same destination that share part of the same path.
• Note that if the path is not in fact broken, then nothing is cached, and the scaling of IPdiagnose stinks.
DDC's Request Satisfaction Model
• Route a request hop-by-hop, (roughly) paralleling the data path, to reach a Request Satisfier (RS) near the failure node F.
• Satisfaction: the IP path vector from S to F.
• Recursive induction step at node K (assume an RS is in each node):
  • Request "(IPFAIL, D, (S, N1, … Nn))" arrives at node Nn.
  • Analysis:
    • "S cannot send datagrams to D, but packets from S to D reach me.
    • The next-hop node towards D from me is Nn+1.
    • I will test whether I can get to Nn+1 and, if so, pass the request "(IPFAIL, D, (S, N1, … Nn+1))" along to it.
    • If not, I will return the path vector (S, N1, … Nn) back to S."
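A minimal sketch of this induction step, assuming a Request Satisfier runs in every node; the data structures and the reachability test are illustrative stand-ins for whatever a real RS would use.

# Sketch of DDC-style request satisfaction for IPFAIL: each node's RS tests
# reachability of its next hop toward D and either forwards the request with
# that hop appended, or returns the accumulated path vector to S as the answer.
# NEXT_HOP and REACHABLE model a toy topology (assumed, not from the slides).

NEXT_HOP = {("S", "D"): "N1", ("N1", "D"): "N2", ("N2", "D"): "N3", ("N3", "D"): "D"}
REACHABLE = {("S", "N1"), ("N1", "N2"), ("N2", "N3")}       # N3 -> D is broken

def satisfy_ipfail(node, dst, path_vector):
    """Induction step at `node`: request (IPFAIL, dst, path_vector) has arrived."""
    nxt = NEXT_HOP.get((node, dst))
    if nxt is not None and (node, nxt) in REACHABLE:
        # I can reach the next hop: pass the request along with it appended.
        return satisfy_ipfail(nxt, dst, path_vector + [nxt])
    # I cannot reach the next hop: return the path vector so far to S.
    return path_vector

print(satisfy_ipfail("S", "D", ["S"]))      # ['S', 'N1', 'N2', 'N3']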
DDC's Model …
• More complex version of the model: take into account the region structure of the Internet. E.g., one RS per region.
• A request arrives at RSn; the induction step of the analysis is:
  • "Packets from S to D reach my (entry) edge node En.
  • I have evidence that packets are flowing from En to my appropriate (exit) edge node E'n.
  • The next-hop RS, in the next AS along the data path towards D, is RSn+1, and the next hop towards D from E'n is En+1.
  • I will test whether I can get to En+1, and if so, pass this request along to RSn+1; else I will return the path vector (S, N1, … En, … E'n) to S."
Result Caching … general case
• Any source S ∈ ||S|| uses the broken link to reach any D ∈ ||D||.
[Diagram: the Internet with a broken link X on the paths from the source set ||S|| to the destination set ||D||.]
• Infer ||D|| from routing; store information about the broken link to ||D|| near every S ∈ ||S||.