Felix Project Inferential Topology Discovery: From Delay Data to Network Graph

Felix Project Inferential Topology Discovery: From Delay Data to Network Graph. Mark W. Garrett, 14 February 2001. J. Baron, D. Shallcross, C. Huitema, J. DesMarais, B. Siegell, P. Seymour, F. Chung. DARPA ITO Intrusion Detection Program. An SAIC Company. The Felix Project Goals.

Presentation Transcript


  1. Felix Project. Inferential Topology Discovery: From Delay Data to Network Graph. Mark W. Garrett, 14 February 2001. J. Baron, D. Shallcross, C. Huitema, J. DesMarais, B. Siegell, P. Seymour, F. Chung. DARPA ITO Intrusion Detection Program. An SAIC Company.

  2. The Felix Project: Goals • Evaluate network status independently from the usual network management protocols and data. • E.g., no use of routing protocols, ping, traceroute, ICMP, SNMP, etc. • Measure the network by sending sparse probe packets among a set of monitors. Collect delay and loss data. • From these data, discover the network topology and evaluate the performance of all links in the network. • A small new field of research is developing, called “Inferential Topology Discovery” (Kurose, Towsley, Paxson, McCanne, Caceres, Duffield, et al.). • This talk presents a particular method based on modeling correlation across the observations.

  3. Felix Data Analysis Approach

     [Figure: data-flow diagram. Labels: Network Monitoring, measurement system, raw data, Identify links, common component matrix (intermediate results), path component matrix, Create graph, graph specification (nodes and links), Graph Rendering, network graph, Add geographic information, network map, Internet, Topology Discovery, Performance Assessment, Simulator. Performance outputs: network element and link performance (Delay, Loss, Load, Throughput, Pr cong). The example network graphs show monitors A to F, edges e1 to e9, and node labels (NAP), (NAP), (backbone site).]

     Raw data (sample):

       898896670 145718 F A : Fri Jun 26 17:31:10 1998 Fri Jun 26 17:31:10 1998 1 0 0 0 0
       898896693 159087 D E : Fri Jun 26 17:31:33 1998 Fri Jun 26 17:31:33 1998 22 0 0 0 0
       898896707 184151 C D : Fri Jun 26 17:31:47 1998 Fri Jun 26 17:31:47 1998 6 0 0 0 0
       898896718 173311 B F : Fri Jun 26 17:31:58 1998 Fri Jun 26 17:31:58 1998 6 0 0 0 0
       898896762 195353 D E : Fri Jun 26 17:32:42 1998 Fri Jun 26 17:32:42 1998 22 0 0 0 0
       898896907 243507 F A : Fri Jun 26 17:35:07 1998 Fri Jun 26 17:35:07 1998 1 0 0 0 0
       898896923 252194 A C : Fri Jun 26 17:35:23 1998 Fri Jun 26 17:35:23 1998 8 0 0 0 0
       898897096 315751 D C : Fri Jun 26 17:38:16 1998 Fri Jun 26 17:38:16 1998 9 0 0 0 0
       898897099 321974 E B : Fri Jun 26 17:38:19 1998 Fri Jun 26 17:38:19 1998 2 0 0 0 0
       898897101 326261 F C : Fri Jun 26 17:38:21 1998 Fri Jun 26 17:38:21 1998 3 0 0 0 0
       898897265 376966 E F : Fri Jun 26 17:41:05 1998 Fri Jun 26 17:41:05 1998 7 0 0 0 0
       898897280 371363 B C : Fri Jun 26 17:41:20 1998 Fri Jun 26 17:41:20 1998 6 0 0 0 0
       898897285 371371 B F : Fri Jun 26 17:41:25 1998 Fri Jun 26 17:41:25 1998 6 0 0 0 0
       898897333 401269 C E : Fri Jun 26 17:42:13 1998 Fri Jun 26 17:42:13 1998 14 0 0 0 0
       898897351 385009 A F : Fri Jun 26 17:42:31 1998 Fri Jun 26 17:42:31 1998 8 0 0 0 0
       898897355 389369 D B : Fri Jun 26 17:42:35 1998 Fri Jun 26 17:42:35 1998 5 0 0 0 0
       898897458 428081 C B : Fri Jun 26 17:44:18 1998 Fri Jun 26 17:44:18 1998 9 0 0 0 0
       898897511 470461 B D : Fri Jun 26 17:45:11 1998 Fri Jun 26 17:45:11 1998 2 0 0 0 0
       898897631 472162 E B : Fri Jun 26 17:47:11 1998 Fri Jun 26 17:47:11 1998 0 0 0 0 0
       898897782 558276 D F : Fri Jun 26 17:49:42 1998 Fri Jun 26 17:49:42 1998 9 0 0 0 0
       898897897 608592 C D : Fri Jun 26 17:51:37 1998 Fri Jun 26 17:51:37 1998 4 0 0 0 0
       898897925 605581 A F : Fri Jun 26 17:52:05 1998 Fri Jun 26 17:52:05 1998 8 0 0 0 0
       898897926 616708 E F : Fri Jun 26 17:52:06 1998 Fri Jun 26 17:52:06 1998 3 0 0 0 0
       898897938 614421 C B : Fri Jun 26 17:52:18 1998 Fri Jun 26 17:52:18 1998 13 0 0 0 0
       898898220 693504 C D : Fri Jun 26 17:57:00 1998 Fri Jun 26 17:57:00 1998 5 0 0 0 0

     Common component matrix (paths × paths):

            AB AC AD AE AF BC BD BE BF CD CE CF DE DF EF
       AB    1  1  1  1  1  1  1  1  1  1  1  1  0  0  0
       AC    1  1  1  1  1  1  1  1  1  1  1  1  0  0  0
       AD    1  1  1  1  1  0  1  1  1  1  1  1  1  1  1
       AE    1  1  1  1  1  0  1  1  1  1  1  1  1  1  1
       AF    1  1  1  1  1  0  1  1  1  1  1  1  0  1  1
       BC    1  1  0  0  0  1  1  1  1  1  1  1  0  0  0
       BD    1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
       BE    1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
       BF    1  1  1  1  1  1  1  1  1  1  1  1  0  1  1
       CD    1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
       CE    1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
       CF    1  1  1  1  1  1  1  1  1  1  1  1  0  1  1
       DE    0  0  1  1  0  0  0  1  0  1  1  0  1  1  1
       DF    0  0  1  1  1  0  1  1  1  1  1  1  1  1  1
       EF    0  0  1  1  1  0  1  1  1  1  1  1  1  1  1

     Path component matrix (paths × edges):

            e1 e2 e3 e4 e5 e6 e7 e8 e9
       AB    1  1  1  0  0  0  0  0  0
       AC    1  1  0  1  0  0  0  0  0
       AD    1  0  0  0  1  0  1  0  1
       AE    1  0  0  0  1  0  1  1  0
       AF    1  0  0  0  1  1  0  0  0
       BC    0  0  1  1  0  0  0  0  0
       BD    0  1  1  0  1  0  1  0  1
       BE    0  1  1  0  1  0  1  1  0
       BF    0  1  1  0  1  1  0  0  0
       CD    0  1  0  1  1  0  1  0  1
       CE    0  1  0  1  1  0  1  1  0
       CF    0  1  0  1  1  1  0  0  0
       DE    0  0  0  0  0  0  0  1  1
       DF    0  0  0  0  0  1  1  0  1
       EF    0  0  0  0  0  1  1  1  0
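
The slide does not spell out how the two matrices relate. A minimal sketch, assuming that a 1 in the common component matrix means "these two paths share at least one edge", derives such a matrix from the path component matrix. The function name `common_component` is hypothetical, and the result is not guaranteed to match the slide's matrix entry for entry.

```python
# Path component matrix copied from the slide: rows are monitor pairs,
# columns are edges e1..e9.
PATHS = ["AB", "AC", "AD", "AE", "AF", "BC", "BD", "BE", "BF",
         "CD", "CE", "CF", "DE", "DF", "EF"]
X = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0],  # AB
    [1, 1, 0, 1, 0, 0, 0, 0, 0],  # AC
    [1, 0, 0, 0, 1, 0, 1, 0, 1],  # AD
    [1, 0, 0, 0, 1, 0, 1, 1, 0],  # AE
    [1, 0, 0, 0, 1, 1, 0, 0, 0],  # AF
    [0, 0, 1, 1, 0, 0, 0, 0, 0],  # BC
    [0, 1, 1, 0, 1, 0, 1, 0, 1],  # BD
    [0, 1, 1, 0, 1, 0, 1, 1, 0],  # BE
    [0, 1, 1, 0, 1, 1, 0, 0, 0],  # BF
    [0, 1, 0, 1, 1, 0, 1, 0, 1],  # CD
    [0, 1, 0, 1, 1, 0, 1, 1, 0],  # CE
    [0, 1, 0, 1, 1, 1, 0, 0, 0],  # CF
    [0, 0, 0, 0, 0, 0, 0, 1, 1],  # DE
    [0, 0, 0, 0, 0, 1, 1, 0, 1],  # DF
    [0, 0, 0, 0, 0, 1, 1, 1, 0],  # EF
]


def common_component(x):
    """Assumed definition: entry (i, j) is 1 when paths i and j share an edge."""
    return [[1 if any(a and b for a, b in zip(row_i, row_j)) else 0
             for row_j in x]
            for row_i in x]


C = common_component(X)
for name, row in zip(PATHS, C):
    print(name, *row)
```

Under this assumption the derived matrix is symmetric by construction, which gives a quick consistency check on any measured common component matrix.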

  4. Network Discovery: Terminology for Network Topology and Monitoring • [Figure: network cloud with monitors M1 to M15 at the edge; legend: Monitor, (Interior) Node, Cloud, Path.] • For m monitors, there are n_p = m(m-1) paths. • The number of links is between m (star) and m^2 (full mesh). • Links are unidirectional, so a line in the graph usually represents two links.
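
As a small illustration of the counting above, the sketch below enumerates the directed monitor-to-monitor paths and returns the slide's bounds on the number of links. The function names are hypothetical.

```python
from itertools import permutations


def monitor_paths(monitors):
    """All ordered (source, destination) monitor pairs: m(m-1) paths."""
    return list(permutations(monitors, 2))


def link_count_bounds(m):
    """Bounds quoted on the slide: m links (star) up to m^2 (full mesh)."""
    return m, m * m


paths = monitor_paths(["A", "B", "C", "D", "E", "F"])
print(len(paths))            # number of unidirectional paths for 6 monitors
print(link_count_bounds(6))
```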

  5. Network Discovery: Reduced Graph Concept • [Figure: network with monitors M1 to M5, marking “Series Equivalent Edges” and “Links Not Traversed by Monitor Packets”.] • Define the Reduced Graph as the sub-graph within the network that is discoverable. • It excludes links not traversed by monitor packets. • It combines equivalent edges, i.e., edges traversed by exactly the same set of paths. • Non-series equivalent edges can occur when reducing a real graph, but they are very rare.
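
The two reduction rules can be sketched as follows, assuming the true routes are known (as they would be in a simulator). The function `reduce_edges` and the toy edge names are hypothetical; this is an illustration of the definition, not the project's code.

```python
from collections import defaultdict


def reduce_edges(path_links):
    """Group edges by the exact set of paths that traverse them.

    path_links maps a path name to the set of edges on that path.
    Edges on no path never appear, so they are excluded automatically.
    Returns a dict: frozenset of paths -> sorted list of merged edges.
    """
    traversed_by = defaultdict(set)
    for path, edges in path_links.items():
        for edge in edges:
            traversed_by[edge].add(path)

    groups = defaultdict(list)
    for edge, paths in traversed_by.items():
        groups[frozenset(paths)].append(edge)
    return {paths: sorted(edges) for paths, edges in groups.items()}


# Toy network: edges "a" and "b" are in series on both paths; an edge "z"
# that carries no monitor traffic is simply absent from the input.
routes = {"M1M2": {"a", "b", "c"}, "M1M3": {"a", "b", "d"}}
reduced = reduce_edges(routes)
for paths, edges in reduced.items():
    print(sorted(paths), "->", edges)
```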

  6. Network Discovery: Example of Complete Network and Reduced Graph • 3150 nodes; WAN-MAN-LAN design; 100 monitors; 187 nodes; 698 (unidirectional) links. • The reduced graph tends to include more of the backbone and less of the edges.

  7. Network Discovery: Reduced Graph – Non-series Equivalent Edges • [Figure: graph with nodes A and B, marking “Non-Series Equivalent Edges”.] • Here is an (artificially) symmetrical graph with equivalent edges. • We have seen non-series equivalent edges only once in reducing randomly generated graphs (out of 100+ examples).

  8–16. Network Discovery: Reduced Graph Related to Paths • The reduced graph determined by n = 2, 3, 4, 5, 6, 7, 8, 9, 10, etc. monitors is a successive approximation to the network. [Animation build: one slide per value of n.]

  17. A Relationship Between Observable Path Metric, Topology and Link Performance • The delay along a path = sum of delays for each link: $D_P = X \, d_L$. • X identifies topology (in terms of links on paths), and is always rank deficient. • To illustrate, consider adding a constant delay c to each link into a particular node, and subtracting c from each outgoing link. [Figure: node with incoming links marked +c and outgoing links marked -c.] • A variation on this general relationship can be formulated with each performance metric: packet loss, link load, throughput, congestion probability.
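
The rank-deficiency argument can be checked numerically. The sketch below assumes a hypothetical star network with three monitors around a hub "H": shifting the delay of every link into the hub by +c and every link out of it by -c changes the link delays but leaves every path delay unchanged, so the link delays cannot be recovered uniquely from path delays alone.

```python
from itertools import permutations

MONITORS = ["M1", "M2", "M3"]
HUB = "H"  # hypothetical interior node
LINKS = [(m, HUB) for m in MONITORS] + [(HUB, m) for m in MONITORS]
PATHS = list(permutations(MONITORS, 2))

# X[p][l] = 1 when unidirectional link l lies on path p.
X = [[1 if link in ((src, HUB), (HUB, dst)) else 0 for link in LINKS]
     for (src, dst) in PATHS]


def path_delays(x, link_delay):
    """Path delay = sum of the delays of the links on the path."""
    return [sum(xi * di for xi, di in zip(row, link_delay)) for row in x]


d = [3.0, 5.0, 2.0, 4.0, 1.0, 6.0]   # arbitrary link delays
c = 1.5
# Add c to links into the hub, subtract c from links out of the hub.
shifted = [di + c if dst == HUB else di - c
           for di, (src, dst) in zip(d, LINKS)]

print(path_delays(X, d))
print(path_delays(X, shifted))  # same path delays, different link delays
```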

  18–20. Felix Data Measurements: Routing Changes Apparent in Data. Data courtesy of Advanced Network Solutions. [Three slides with the same title and credit.]

  21. Felix Topology Discovery: Correlation Method: Concept

  22. Felix Correlation Method: Identifying Links By Correlation of Paths • [Figure: 0/1 time series for Path A, Path B, Path C and Path D, with labels Group 1 through Group 5.]

  23. Felix Correlation Method: Abstracting Congestion Event Sequence From Data • Open problem: how exactly to get from a delay measurement on a real network to a series of thresholded congestion “events”. • Several approaches: • Average delay in a fixed-length sliding window. • Cross-correlation function (pair-wise between paths, but promising…). • Congestion decision can be a complex combination of delay and loss in a window; probably the most robust method, but it needs some empirical experience to create a useful methodology. • We assume a solution and solve the next part…
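
The first approach listed, average delay in a fixed-length sliding window, might look like the sketch below. The window length, the threshold and the sample delays are illustrative assumptions, not values from the project.

```python
def congestion_events(delays, window, threshold):
    """Return one 0/1 flag per full window position.

    A flag is 1 when the mean delay inside the window exceeds threshold.
    """
    events = []
    for start in range(len(delays) - window + 1):
        mean = sum(delays[start:start + window]) / window
        events.append(1 if mean > threshold else 0)
    return events


delays_ms = [6, 6, 7, 22, 22, 20, 8, 6]   # made-up delay samples
events = congestion_events(delays_ms, window=3, threshold=15)
print(events)
```

A longer window smooths out isolated spikes at the cost of blurring the start and end of each event, which is one reason the slide treats the choice as an open problem.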

  24. Felix Correlation Method: Network Model Assumptions • Node processing delay is negligible, so paths sharing nodes (but not links) do not show correlation. Queueing delay is associated with the link. • Network links congest independently. • Congestion is modeled as fixed-length discrete-time events. • Congestion rate is fixed for each link, but can vary over a range for the set of links in the network. • Routes are stable. • Monitor packets are exchanged frequently enough that congestion events will be recorded consistently across all paths crossing a given link. • Note, this does not require every event to be noticed, and real congestion events do occur over a wide range of time scales.
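
A minimal simulation of these assumptions, for illustration only: each link congests independently in each discrete time slot with its own fixed rate, and a path is observed as congested whenever any of its links is. The function name, the toy routes and the rates are hypothetical.

```python
import random


def simulate(path_links, link_rates, n_slots, seed=0):
    """Return {path: [0/1 per time slot]} under the independent-link model."""
    rng = random.Random(seed)
    series = {path: [] for path in path_links}
    for _ in range(n_slots):
        # Each link congests independently with its own fixed rate.
        congested = {link for link, rate in link_rates.items()
                     if rng.random() < rate}
        for path, links in path_links.items():
            series[path].append(1 if congested & links else 0)
    return series


routes = {"AB": {"e1", "e2"}, "AC": {"e1", "e3"}, "BC": {"e2", "e3"}}
rates = {"e1": 0.05, "e2": 0.01, "e3": 0.10}
obs = simulate(routes, rates, n_slots=1000)
print({path: sum(flags) for path, flags in obs.items()})
```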

  25. Felix Correlation Method: Observations and Triggers • An Observation is a measurement of congestion (however defined) on a path between two monitors. • A Trigger is a hypothetical cause of congestion, such as a link, or a group of links, in the network. • Method of solution: Based on joint observations across all paths, define a model that discriminates statistically between the true triggers, which represent links in the network, and the apparent (or false) triggers, which are due to combinations of true links congesting simultaneously. Then reduce the triggers down to single links.

  26. Felix Correlation Method: Observations and Triggers • [Figure: illustration of observations, triggers, paths and links among monitors M1 to M5.] Observation “a” = path M1M3; Observation “b” = path M2M4; Trigger a = all links on path a; Trigger ab = links in common between paths a and b. • Definitions and Notation: • An observation event occurs at time t when one set of paths is congested and another set is not congested, as specified. • For example, an observation variable can state that paths a, b, d, k are congested and paths c, g are not congested at time t. Paths not included in the subscript are “don’t care” for this observation variable.

  27. Felix Correlation Method: Observations and Triggers • A trigger event occurs at time t when at least one link congests that is on every included path and on none of the excluded paths, as specified. • For example, a trigger variable can denote the event that some link congests that is shared by paths a, b, d, k, and is not on path c or path g. • We refer to paths in the specification as “included” or “excluded”. • If all paths are included or excluded, the trigger is “fully specified”. • Observation and trigger probabilities are written following these examples.
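
The two definitions can be expressed as predicates. This is a sketch under the definitions above, with hypothetical function names and a toy link-to-path map; in practice only the observation side is directly measurable.

```python
def observation_holds(path_state, congested, clear):
    """True when every path in `congested` is congested and every path in
    `clear` is not; other paths are "don't care"."""
    return (all(path_state[p] for p in congested)
            and not any(path_state[p] for p in clear))


def trigger_fires(congested_links, link_paths, included, excluded):
    """True when some congested link is on all included paths and on none
    of the excluded paths."""
    for link in congested_links:
        on_paths = link_paths[link]
        if included <= on_paths and not (excluded & on_paths):
            return True
    return False


# Toy map from each link to the set of paths that traverse it.
link_paths = {"l1": {"a", "b"}, "l2": {"a", "c"}, "l3": {"c"}}

print(observation_holds({"a": 1, "b": 1, "c": 0}, {"a", "b"}, {"c"}))
print(trigger_fires({"l1"}, link_paths, included={"a", "b"}, excluded={"c"}))
print(trigger_fires({"l2"}, link_paths, included={"a", "b"}, excluded={"c"}))
```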

  28. Felix Correlation Method: Relationship Between Observations and Triggers • Now we can relate the observation and trigger probabilities in several interesting ways, e.g., [Ratnasamy & McCanne]. • This set of relations says, considering only two paths: if we see congestion on both paths, then it is caused either by a link the two paths share in common, or by one link on each of the paths (not in common) congesting together. • Similarly, if we see congestion on only one path, it must be due to a link that is on that path, and not on the other. • Note, this forces us to explicitly write the combinations of triggers that can cause an observation (not very scalable).
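
The two-path statement can be checked by brute-force enumeration. The sketch below assumes three independent triggers (shared by both paths, on path a only, on path b only) with made-up probabilities; the closed-form expressions in the final comments follow from independence and are my own restatement, not the slide's missing equations.

```python
from fractions import Fraction
from itertools import product


def observation_probs(p_shared, p_a_only, p_b_only):
    """Exact P(path a state, path b state) by enumerating trigger states."""
    probs = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for s, x, y in product((0, 1), repeat=3):
        weight = 1
        for state, p in ((s, p_shared), (x, p_a_only), (y, p_b_only)):
            weight *= p if state else 1 - p
        # Path a is congested if the shared or the a-only trigger fires.
        probs[(s | x, s | y)] += weight
    return probs


ps, px, py = Fraction(1, 10), Fraction(1, 5), Fraction(1, 4)
probs = observation_probs(ps, px, py)
print(probs[(1, 1)])  # equals ps + (1 - ps) * px * py
print(probs[(1, 0)])  # equals (1 - ps) * px * (1 - py)
```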

  29. Felix Correlation Method: Relationship Between Observations and Triggers • Another interesting and useful relationship is this: • This one says that we observe no congestion on a set of paths only when none of the triggers that are on those paths are active. • We say a path (in the trigger specification) contradicts the observation when a path turned off in the observation is included in the trigger. (It is easy to write down these combinations.) • Inclusion of observations with multiple paths makes this model more powerful than an earlier method ($D_P = X \, d_L$) that relied on a rank-deficient matrix.

  30. Felix Correlation Method: Organization of Triggers • Tree contains all potential triggers, i.e., all possible combinations of paths that can specify a link or group of links. • Triggers on a level partition the set of (potential) links in the graph. • The tree grows exponentially as we add paths, but the number of true triggers is bounded by the number of links in the network.

  31. Felix Correlation Method: Some More Useful Stuff From the Model… • Observation of congestion on a path means some link on that path is congesting (single-path observation and trigger). • Something must be happening, so the sum over all possible observations with n paths specified equals unity. • Child triggers are related to their parent. • No congestion observed anywhere means all triggers are quiet. (The product of all inverse triggers on any level is constant.)

  32. Felix Correlation Method: Solving for Trigger Probabilities – 3 Path Example • Observation of no congestion on 3, 2, 1 paths implies no activity on any trigger that includes one of the named paths. • Triangular form: each equation produces one Pvt

  33. Felix Correlation Method: Generalization of Solution to Any Number of Paths • [Figure: triggers arranged by j = 0, 1, …, n-1 against k, with n paths in the trigger and k paths in the observation; the Master equation has k = n. Regions: class 1 triggers, 0 ≤ j < k; class 2 triggers, j = k; class 3 triggers, k < j ≤ n-1.] • Count various things: • n = number of paths in the triggers = level in tree diagram • k = number of paths in the observation (varying from n down to 1) • j = number of paths excluded in the triggers (varying from 0 to n-1) • Divide the “Master” equation by each “Specific” equation to find one trigger probability.

  34. Felix Correlation Method: Generalization of Solution to Any Number of Paths • For n paths there are 2^n - 1 equations and 2^n - 1 triggers. • The “Master” equation has all possible triggers, i.e., any active trigger contradicts the observation of no congestion anywhere. • For class 1 triggers (0 ≤ j < k): • The j paths excluded in the trigger cannot cover all k paths in the observation, so at least one path is included in the trigger that contradicts the observation. • All such triggers then occur in both the master and specific equations, and cancel out in the division. • For class 2 triggers (j = k): • The j paths excluded in the trigger can cover the k paths in the observation, but there is only one combination. Call this the target trigger. All other triggers contradict the observation and cancel out. • There is one equation in which each such target trigger survives the division.

  35. Felix Correlation Method: Generalization of Solution to Any Number of Paths • For class 3 triggers (k < j ≤ n-1): • There are such triggers. • No class 3 triggers exist in the first two stages (k = n, and k = n-1). • All class 3 triggers are computed at previous stages, when they appear as class 2 triggers. • For example, consider the case k = 8 < j = 9. In the previous stage when we had k = 9, the class 2 triggers with j = 9 were solved. • Each “Quotient” equation is left with one unknown trigger.

  36. Felix Correlation Method: Generalization of Solution to Any Number of Paths • General form of solution, for trigger probabilities with paths excluded (first case), and with no paths excluded (second case): Where: • E is the set of excluded paths in the trigger • I is the set of included paths in the trigger • N is the set of all paths • w is the set of class-3 trigger probabilities in the master equation, but not in the specific equation • u is the set of all trigger probabilities with at least one path excluded.
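
The transcript does not preserve the slide's formulas, so the following is a reconstruction sketch of the triangular solution rather than the authors' code. It assumes fully specified triggers that fire independently, works with "quiet" probabilities (1 minus the trigger probability), and takes as input the probability of observing no congestion on each non-empty set of paths. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def nonempty_subsets(items):
    items = sorted(items)
    return [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def p_no_congestion(quiet, observed):
    """No congestion on `observed` paths: every trigger that includes one of
    those paths must be quiet."""
    return prod(q for included, q in quiet.items() if included & observed)


def solve_triggers(paths, p_clear):
    """Recover each trigger's quiet probability, keyed by its included set."""
    all_paths = frozenset(paths)
    master = p_clear[all_paths]          # observation with k = n
    quiet = {}
    for k in range(len(all_paths) - 1, 0, -1):
        for combo in combinations(sorted(all_paths), k):
            observed = frozenset(combo)
            quotient = master / p_clear[observed]
            # Class 3: triggers excluding a strict superset of `observed`,
            # already solved at an earlier stage.
            known = prod(q for inc, q in quiet.items() if not (inc & observed))
            # Class 2 target: the trigger that excludes exactly `observed`.
            quiet[all_paths - observed] = quotient / known
    # The trigger with no paths excluded is what remains of the master.
    quiet[all_paths] = master / prod(quiet.values())
    return quiet


# Three paths; made-up trigger probabilities, zero for false triggers.
true_p = {frozenset("a"): Fraction(1, 10), frozenset("c"): Fraction(1, 20),
          frozenset("ab"): Fraction(1, 10), frozenset("abc"): Fraction(1, 50)}
true_quiet = {s: 1 - true_p.get(s, 0) for s in nonempty_subsets("abc")}
p_clear = {k: p_no_congestion(true_quiet, k) for k in nonempty_subsets("abc")}

recovered = solve_triggers("abc", p_clear)
for included, q in sorted(recovered.items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(included)), 1 - q)
```

With exact inputs the false triggers come out as exactly zero; with probabilities estimated from finite data they would only be approximately zero, which is the discrimination issue the later slides discuss.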

  37. Felix Correlation Method: Pruning Tree Reduces Computational Complexity • Returning to the tree of trigger probabilities… • For triggers that specify actual links in the network, the trigger probability is the (aggregate) congestion rate on that set of links. • False triggers (for which no link exists) are approximately zero. • (True) triggers on the last level identify single links and their associated paths (reduced graph). • Therefore, a trigger probability of zero can be pruned out along with all of its descendants. • Number of triggers to compute is bounded by (paths × links). Let’s see some results…

  38. Felix Correlation Method: Results • 18 monitors, 23 nodes, 95 (unidirectional) links

  39. Felix Correlation Method: Results • 19 monitors, 27 nodes, 114 (unidirectional) links

  40. Felix Correlation Method: Results • 20 monitors, 29 nodes, 121 (unidirectional) links

  41. Felix Correlation Method: Results • 50 monitors, 61 nodes, 269 (unidirectional) links • Run with link congestion rate of 1% (best efficiency) • Approx. 12 hours to compute

  42. Felix Correlation Method: Algorithm Complexity • Complexity of the correlation algorithm is more than (paths × links) because the computation of triggers increases with the number of paths… • …but it is polynomial: O(LPN + L^2 P) for L links, P paths, N simulated time intervals. • However, the overall run-time is apparently exponential, because it takes more data to discriminate the true and false triggers as the network gets larger.

  43. Felix Correlation Method: Algorithm Complexity • Running time of simulation and correlation code as a function of network size (number of links). • Exponential increase if quality of result is held constant. • Link Congestion Rate = 10% (constant).

  44. Felix Correlation Method: Results With Variable Link Congestion • Constant link congestion rate is an artificial constraint. • Algorithm works well with links congesting in a range, e.g., tried 1% – 5%, 1% – 10%, 1% – 15%, etc. • Effect is to spread the distribution of true trigger probabilities. • Longer convergence time. • Probably all of the simplifying assumptions in the model can be relaxed at the cost of increased convergence time. • Correlation algorithm ran fastest with 1% link congestion. • Probably an artifact of implementation…

  45. Felix Correlation Method: Statistical Discrimination Problem • [Figure: distributions of false triggers (near 0) and true triggers, annotated with μ1, μ2 and ησ.] • Nice scaling property of the algorithm depends on being able to discriminate true from false triggers. • False triggers are approximately zero, but at the edge of the solvable parameter space, both populations are more noisy: • Too little data (from simulation or measurement) • Too much variability in link loss rates • Too much dependence between link congestions, etc. • Need to set threshold, group triggers and evaluate “goodness” of resulting topology.
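
The slide does not say how the threshold is chosen. One simple stand-in, shown here purely as an illustration and not as the project's method, is to place the threshold in the middle of the largest gap between sorted trigger probability estimates. The function name and the sample estimates are made up.

```python
def split_by_largest_gap(trigger_probs):
    """Largest-gap heuristic (an assumption, not the slide's method).

    Returns (threshold, set of triggers classified as true). Needs at least
    two trigger estimates.
    """
    values = sorted(trigger_probs.values())
    gaps = [(hi - lo, lo, hi) for lo, hi in zip(values, values[1:])]
    _, lo, hi = max(gaps)
    threshold = (lo + hi) / 2
    true_triggers = {t for t, p in trigger_probs.items() if p > threshold}
    return threshold, true_triggers


estimates = {"t1": 0.001, "t2": 0.0, "t3": 0.051, "t4": 0.048, "t5": 0.002}
threshold, true_triggers = split_by_largest_gap(estimates)
print(threshold, sorted(true_triggers))
```

When the two populations overlap, as the slide describes at the edge of the solvable parameter space, no single gap separates them cleanly and a heuristic like this one would misclassify triggers.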

  46. Felix Project: General Discussion • We can make use of the multicast idea (MINC project) to reduce load on the network: each source multicasts packets to all receivers. • This will improve coincidence of measurements in time across all paths.

  47. Felix Topology / Performance Inference: Applicability • Does not replace “traditional” autodiscovery methods (SNMP) • May augment autodiscovery in difficult environments: • Military network under physical attack • Military or commercial network under cyber-attack • Network with buggy software (e.g. routing implementation) • Multiple protocol layers, not all included in autodiscovery • Protocols too old or new for the autodiscovery technology • Good for observing networks not under your control • Commercial context: ISP tries to locate fault between networks • Military context: Map out foreign network • Future networks will probably be more chaotic • Track changing topology & performance with minimal extra load

  48. Felix Project: Further Work • Augment algorithms to work in a more fully realistic environment: • Non-discrete time: congestion events with “ragged edges” • Less stable routing (this is hard) • Dependence in link congestion: cross traffic routed through the net • More volatile delay and loss patterns (most significant issue) • Wider range of congestion rates; more erratic time dependence • Variation with delay metric (instead of probability of congestion) is possible. • Result would be bounds on mean, variance, (higher moments) of delay distribution on each link. • Procedure is analogous (but not identical) to present algorithm. • Progressive version of algorithm to update existing topology estimate based on continuous data. • More experience with real data

  49. Felix Correlation Method: Summary: Three Stages in Topology Discovery • [Figure: Packet Delay & Loss Data → Event Abstraction Algorithm → Event Time Series → Event Correlation Algorithm → Path-Link Matrix → Graph Construction Algorithm (“Matroid” Alg) → Network Graph. The figure also carries the label “Future Work”.] • Reduced graph concept: limitation of observability • Decomposition of topology/performance inference into separable problems • Allows optimization and variation of algorithms at each stage • Correlation Method: • Uses entire time series of data for each path. • Takes advantage of joint statistics across all paths

  50. Felix Project Extra Slides
