Routing and Network Design: Algorithmic Issues • Kamesh Munagala, Duke University
Graph Model for the Links • Model sensor nodes as vertices in a graph • [Figure: sensor network with a gateway; d(7,8) = “length” of the link between nodes 7 and 8] • “Length” of link models communication cost per bit • “Length” should be a function of #bits being sent (Why?)
Specialized Nature • “Geometric Random” graph • Nodes on a 2D plane • Each node has a fixed communication radius • Correlation Structures: • Spatial Gaussian models • Simple AR(1) temporal models • Assumptions do not always hold!
Unique Features • Distributed algorithms: • Reconfigure routes around failures • Learning network topology • Learning correlation structures • Query processing • Light-weight implementations: • Low compute power and memory • Limited communication and battery life • Noisy sensing and transmission
Goals in this Lecture • General algorithmic ideas capturing: • Simplicity and efficiency • Some performance guarantees • Distributed implementations • Low reliance on specific assumptions • Caveats: • Ideas need to be tailored to context • Specialized algorithms might work better
Topics • What constitutes good routing? • Measures of quality • Algorithm design framework • Basic problem statements • Spanning, shortest path, and Steiner trees • Aggregation networks • Location and subset selection problems • Solution techniques • Types of guarantees on solution quality • Models of information in a sensor network • Tailoring generic algorithms to specific models
Routing Tree • Problem Statement: • Route information from nodes to gateway • Choose subset of edges to route data • Edges “connect” all nodes to gateway • Tree Property • Minimize: • Long-term average “cost” of routing • Answer will depend on: • What constitutes “cost” • Correlations in data being collected
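Toy Example • [Figure: “star” network; the gateway is joined to each of four nodes by a link of cost 6, and adjacent nodes are joined by links of cost 1] • Each node has 100 bits of information to send to the gateway • The value on a link (edge) is the cost of transmitting one bit • How should we route the bits?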
Depends on Correlations • [Figure: the star network routed along the spanning tree: one cost-6 link plus the three cost-1 links] • Suppose information is perfectly correlated • Information from all sources together is also 100 bits! • Ignore cost of compression • Cost = 100 * (6+1+1+1) = 900 units • Spanning tree is optimal
Other Extreme: No Correlation • [Figure: the star network with each node routed over its own cost-6 link] • Suppose information is not correlated at all • Information from all sources together is now 400 bits • Cost = 100 * (6+6+6+6) = 2400 units • Shortest path tree is optimal
Had We Used a Spanning Tree • [Figure: the star network routed along the chain of cost-1 links] • Suppose information is not correlated at all • Cost = 100 * (6+7+8+9) = 3000 units > 2400 units! • Shortest path tree is optimal
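The arithmetic of the three toy-example slides can be checked with a few lines of code. This is a minimal sketch that hard-codes the star network described above; it assumes, as the slides do, that with perfect correlation and free compression every tree edge carries the same 100 bits exactly once, while with no correlation each node's 100 bits travel its entire path.

```python
BITS = 100  # bits each node must deliver to the gateway

# Star example: four nodes, each with a cost-6 link to the gateway,
# adjacent nodes joined by cost-1 links.
SPT_EDGES = [6, 6, 6, 6]     # shortest path tree: every node uses its own cost-6 link
CHAIN_EDGES = [6, 1, 1, 1]   # spanning tree: one cost-6 link plus the three cost-1 links
SPT_PATHS = [6, 6, 6, 6]     # per-bit cost from each node to the gateway in the SPT
CHAIN_PATHS = [6, 7, 8, 9]   # per-bit cost when relaying along the chain


def cost_perfectly_correlated(tree_edges):
    """Perfect correlation, free compression: each tree edge carries 100 bits once."""
    return BITS * sum(tree_edges)


def cost_uncorrelated(path_costs):
    """No correlation: each node's 100 bits travel its whole path unaggregated."""
    return BITS * sum(path_costs)


for name, edges, paths in [("spanning", CHAIN_EDGES, CHAIN_PATHS),
                           ("shortest-path", SPT_EDGES, SPT_PATHS)]:
    print(name, cost_perfectly_correlated(edges), cost_uncorrelated(paths))
# spanning 900 3000
# shortest-path 2400 2400
```

Each tree wins in exactly one of the two extremes, which is the point of the slides.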
In summary… • Moral of the story: • Choosing good routes is important • Choice depends on correlation structure • Issues to address: • How do we specify correlations • Simple yet faithful specifications desirable • Algorithms for finding (near-)optimal routes • Efficient and simple to implement • Reliability and “backup” routes
Minimum Spanning Tree • There can be as many as $n^{n-2}$ spanning trees in general • Exhaustive enumeration is out of the question • [Figure: example graph with edge costs 10, 20, 12, 5, 1, 7, 15, shown next to its MST] • Cost of MST = 23
Spanning Tree Algorithm • “Greedy” schemes add edges one at a time in a clever fashion • No backtracking • Kruskal's algorithm: Consider edges in ascending order of cost. Insert an edge unless doing so would create a cycle. • Prim's algorithm: Start with the gateway and greedily grow a tree from the gateway outward. At each step, add the cheapest edge that has exactly one endpoint in the current tree.
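Both greedy rules can be sketched in a few lines each. The slide's figure does not survive in this text, so `EDGES` is a hypothetical 5-node topology that reuses the slide's edge costs and happens to have an MST of cost 23; node 0 plays the gateway. Kruskal's cycle check uses a union-find forest, and Prim's "cheapest edge with exactly one endpoint in the tree" uses a binary heap from Python's `heapq`.

```python
import heapq


def kruskal(n, edges):
    """Scan edges by ascending cost; insert an edge unless it would close a cycle."""
    parent = list(range(n))  # union-find forest: each node starts in its own component

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:  # endpoints in different components, so no cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


def prim(n, edges, root=0):
    """Grow one tree from root, always adding the cheapest edge leaving it."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    in_tree = [False] * n
    in_tree[root] = True
    heap = [(w, root, v) for w, v in adj[root]]
    heapq.heapify(heap)
    tree = []
    while heap:
        w, u, v = heapq.heappop(heap)
        if in_tree[v]:
            continue  # both endpoints already in the tree
        in_tree[v] = True
        tree.append((u, v, w))
        for wx, x in adj[v]:
            if not in_tree[x]:
                heapq.heappush(heap, (wx, v, x))
    return tree


# Hypothetical topology using the slide's edge costs; node 0 is the gateway.
EDGES = [(0, 1, 10), (1, 3, 5), (3, 4, 1), (2, 4, 7),
         (0, 2, 20), (1, 2, 12), (2, 3, 15)]
print(sum(w for _, _, w in kruskal(5, EDGES)))  # 23
print(sum(w for _, _, w in prim(5, EDGES)))     # 23
```

Neither routine ever revisits a decision, matching the "no backtracking" point on the slide.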
Prim’s Algorithm: Execution • [Figure: Prim's algorithm growing the tree on the example graph, one edge at a time]
“Distributed” Algorithm? • Nodes connect in arbitrary order • Each node simply connects to its “closest” existing neighbor • [Figure: on the example graph, one joining order yields a tree of cost 25, versus 23 for the MST]
Guarantee on “Online” Scheme • n nodes in graph • Cost of “online” tree is within an O(log n) factor of the cost of the MST • Irrespective of the order in which nodes join the system! • Intuition: In the “star” network, the “online” scheme produces the MST! • Natural implementation: Greedy starting from gateway • Such a guarantee is called an “approximation guarantee”
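A minimal sketch of the online scheme, assuming (for illustration only) that nodes are points in the plane and that link cost is Euclidean distance; `online_tree` is a hypothetical helper, not part of any protocol. Each newcomer links to the closest node already in the tree, and the resulting cost depends on the joining order.

```python
import math


def online_tree(points, order):
    """Nodes join one at a time in `order`; each newcomer links to the closest
    node already in the tree. Link cost = Euclidean distance (an assumption)."""
    joined = [order[0]]  # the first node (e.g. the gateway) seeds the tree
    links, cost = [], 0.0
    for v in order[1:]:
        u = min(joined, key=lambda x: math.dist(points[x], points[v]))
        links.append((v, u))
        cost += math.dist(points[u], points[v])
        joined.append(v)
    return links, cost


# Four collinear nodes; node 0 is the gateway. The MST (a path) costs 3.
P = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(online_tree(P, [0, 1, 2, 3])[1])  # 3.0
print(online_tree(P, [0, 3, 1, 2])[1])  # 5.0
```

Joining in order along the line reproduces the MST; a bad order costs more, and the slide's guarantee is that such losses stay within a logarithmic factor.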
Shortest Paths: OSPF • Key algorithmic idea: Greedy local updates • Each node v maintains “tentative” distance d(v) to gateway • Initially, all these distances are infinity • Each node v does a greedy check: • If for some neighbor u, d(v) > d(u) + Length(u,v), then: • Route v through u • Set d(v) = d(u) + Length(u,v) • Run this till it stabilizes
OSPF Execution • [Figure: step-by-step run on an example graph; tentative distances start at 0 at the gateway and ∞ elsewhere, then shrink as updates propagate]
Rate of Convergence • $n$ nodes in graph • The protocol converges to the shortest path tree • The number of rounds until convergence is roughly $n$ (with synchronous rounds, at most $n-1$)
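The greedy local update on the previous slides is the Bellman-Ford relaxation, the rule used by distance-vector protocols (OSPF itself is a link-state protocol that runs Dijkstra's algorithm on a full topology map). The sketch below simulates it in synchronous rounds, where every node checks its neighbors' values from the previous round; `distance_vector` and the example graph are hypothetical.

```python
import math


def distance_vector(nodes, edges, gateway):
    """Synchronous rounds of the local update d(v) = min_u d(u) + Length(u, v)."""
    adj = {v: [] for v in nodes}
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    d = {v: math.inf for v in nodes}      # tentative distances, all infinite...
    d[gateway] = 0                        # ...except at the gateway
    next_hop = {v: None for v in nodes}   # neighbor that v routes through
    rounds = 0
    while True:
        new_d = dict(d)
        for v in nodes:
            for u, w in adj[v]:
                # Greedy check against last round's values: route v through u if shorter.
                if d[u] + w < new_d[v]:
                    new_d[v] = d[u] + w
                    next_hop[v] = u
        if new_d == d:  # stabilized: no distance changed this round
            return d, next_hop, rounds
        d = new_d
        rounds += 1


# Hypothetical example: the direct link 0-1 (length 10) loses to the path via node 2.
dist, hop, rounds = distance_vector([0, 1, 2], [(0, 1, 10), (0, 2, 2), (2, 1, 3)], gateway=0)
print(dist, hop, rounds)  # {0: 0, 1: 5, 2: 2} {0: None, 1: 2, 2: 0} 2
```

Here `rounds` counts the rounds in which some distance changed; with positive lengths it never exceeds $n-1$.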
Intermediate Correlations • One tree for all correlation values? • Both spanning and shortest path trees at once? • Doable if we settle for “nearly” optimal trees • In other words, there exists a tree with: • Cost at most thrice the cost of the MST • Distances to gateway at most twice the S.P. distances
Example: MST • [Figure: gateway joined to each of $n^2$ nodes by a link of length $n$; consecutive nodes joined by links of length 1] • Cost of MST ≈ $n^2+n$ • Path length ≈ $n^2+n$
Example: Shortest Path Tree • [Figure: same network; every node uses its own length-$n$ link to the gateway] • Cost = $n^3$ • Path length = $n$
Example: Balanced Tree • [Figure: same network; the chain is cut into groups of $n$ nodes, each group linked to the gateway once] • Cost = $2n^2$ • Path length = $2n$
Walk on a Tree • [Figure: a walk around the spanning tree, starting and ending at the gateway]
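Balancing Algorithm • Walk along Spanning Tree • Add shortcuts to gateway • At node v: • Suppose previous shortcut at u • If SP(u) + Walk(u,v) > 2 SP(v) • Add “shortcut” (shortest path) from v to gateway • Here SP(x) is the shortest-path distance from x to the gateway, and Walk(u,v) is the length of the tree walk from u to v • [Figure: gateway, shortcut, and a segment where the walk is too long]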
Example Revisited • [Figure: the chain example with shortcuts to the gateway splitting it into groups of $n$ nodes] • Walk length = $2n$
Proof Idea • Final Path Lengths ≤ 2 S.P. Lengths • Follows from description • Final Cost ≤ 3 MST Cost • Final Cost ≤ MST + Shortest Paths Added • Suppose paths are added at …,u,v… on walk • SP(u) + Walk(u,v) > 2 SP(v) • Add these up (the SP(u) terms telescope): • Total Walk Length ≥ Total Length of added Paths • But, Total Walk Length = 2 MST Cost
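A sketch of the balancing algorithm, assuming an undirected graph with positive lengths and no parallel edges; `balanced_tree` and its helpers are hypothetical names. It walks around the MST, adds the shortest path from v to the gateway whenever SP(u) + Walk(u,v) > 2 SP(v), and finally takes a shortest path tree of the MST plus the added shortcuts.

```python
import heapq
from collections import defaultdict


def dijkstra(adj, src):
    """Shortest-path distances and parent pointers from src."""
    dist, parent = {src: 0}, {src: None}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v], parent[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    return dist, parent


def prim(adj, root):
    """MST as a child -> parent map (graph assumed connected)."""
    parent, pq = {root: None}, [(w, root, v) for v, w in adj[root]]
    heapq.heapify(pq)
    while pq:
        _, u, v = heapq.heappop(pq)
        if v in parent:
            continue
        parent[v] = u
        for x, w in adj[v]:
            if x not in parent:
                heapq.heappush(pq, (w, v, x))
    return parent


def balanced_tree(edges, gateway, alpha=2.0):
    adj, length = defaultdict(list), {}
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
        length[u, v] = length[v, u] = w
    sp, sp_parent = dijkstra(adj, gateway)
    mst = prim(adj, gateway)
    kept = {frozenset(e) for e in mst.items() if e[1] is not None}  # start from MST edges
    children = defaultdict(list)
    for v, p in mst.items():
        if p is not None:
            children[p].append(v)
    # Walk around the MST (each tree edge traversed twice), done iteratively.
    walk, stack = [gateway], [(gateway, iter(children[gateway]))]
    while stack:
        c = next(stack[-1][1], None)
        if c is None:
            stack.pop()
            if stack:
                walk.append(stack[-1][0])  # step back up to the parent
        else:
            walk.append(c)
            stack.append((c, iter(children[c])))
    last, walked = gateway, 0  # previous shortcut node u and Walk(u, v)
    for prev, v in zip(walk, walk[1:]):
        walked += length[prev, v]
        if sp[last] + walked > alpha * sp[v]:  # walk too long: add shortcut from v
            x = v
            while sp_parent[x] is not None:
                kept.add(frozenset((x, sp_parent[x])))
                x = sp_parent[x]
            last, walked = v, 0
    # Final tree: shortest path tree of (MST + shortcuts); returns (dist, parent).
    sub = defaultdict(list)
    for e in kept:
        a, b = tuple(e)
        sub[a].append((b, length[a, b]))
        sub[b].append((a, length[a, b]))
    return dijkstra(sub, gateway)
```

On the chain example with $n = 4$ (16 nodes, gateway links of length 4), every final distance stays within $2n$ and the tree cost within 3 times the MST cost of $n^2 + n - 1$.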
“Most Informative” Placement • Close-by locations are not very “informative”
Abstraction • Parameters: • Each node v has communication cost to gateway = cv • Depends on location • Subset S of nodes has “information” f(S) • Information is a property of a set of nodes • Depends on whether “close by” nodes also in set • Problem Statement: • Choose set S so that: • Sum of costs of nodes in S is at most C • Maximize Information = f(S)
Algorithmic Issue • Number of subsets of n locations = $2^n$ • Inefficient to enumerate over them • Given subset S, how do we compute f(S)? • Needs a correlation model among locations • Communication costs are not additive • Also depend on location of nodes!
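Information Functions • f(S) = Entropy of S • Correlations are multidimensional Gaussian: • $\Sigma$ = Covariance matrix between locations • Entropy $= \tfrac{1}{2}\log\det(\Sigma_S) + \tfrac{|S|}{2}\log(2\pi e)$, where $\Sigma_S$ is the covariance matrix of the locations in S • $\mathrm{Covariance}(j,k) \propto \exp(-\mathrm{dist}(j,k)^2 / h^2)$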
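The entropy formula can be evaluated directly. This sketch assumes the squared-exponential covariance above with a hypothetical overall variance `sigma2` (default 1) and bandwidth `h`; the log-determinant is computed via a hand-written Cholesky factorization, so no numerical library is needed. Entropies are in nats.

```python
import math


def sq_exp_cov(points, h, sigma2=1.0):
    """Assumed covariance model: Cov(j,k) = sigma2 * exp(-dist(j,k)^2 / h^2)."""
    return [[sigma2 * math.exp(-math.dist(p, q) ** 2 / h ** 2) for q in points]
            for p in points]


def log_det(a):
    """log det of a symmetric positive-definite matrix via Cholesky (a = L L^T)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return 2.0 * sum(math.log(L[i][i]) for i in range(n))


def entropy(points, subset, h, sigma2=1.0):
    """Gaussian entropy of the locations in `subset`:
    0.5 * log det(Sigma_S) + 0.5 * |S| * log(2*pi*e)."""
    if not subset:
        return 0.0
    cov = sq_exp_cov([points[i] for i in subset], h, sigma2)
    return 0.5 * log_det(cov) + 0.5 * len(subset) * math.log(2 * math.pi * math.e)


# Two nearby locations are nearly redundant; a distant one adds more information.
P = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0)]
print(entropy(P, [0, 1], h=1.0) < entropy(P, [0, 2], h=1.0))  # True
```

This illustrates the previous slide: close-by locations are highly correlated, so together they carry less information.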
Properties of f(S) • Property 1 (monotonicity): f(A+v) ≥ f(A) • Property 2 (diminishing returns), for A ⊆ B: f(A+v) - f(A) ≥ f(B+v) - f(B) • [Figure: location v relative to sets A and B] • Location v is more informative w.r.t. A than w.r.t. B
Greedy Algorithm • Start with S = ∅ • Repeat till cost of S exceeds C: • Choose v such that: • ( f(S+v) - f(S) ) / cv is maximized • “Information gain per unit cost” • Add v to S
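A sketch of the greedy rule for an arbitrary set function `f` and costs `cost[v]`. Two assumptions differ slightly from the slide's wording: a candidate is only considered if it still fits in the budget C, and the loop stops once no affordable candidate adds information. The example `f` is a hypothetical coverage function (number of grid cells covered), chosen because it is a simple function with both properties from the previous slide; the entropy of the Gaussian model could be plugged in instead.

```python
def greedy_select(candidates, cost, f, budget):
    """Repeatedly add the affordable candidate with the largest
    (f(S+v) - f(S)) / cost[v]; stop when nothing affordable helps."""
    S, spent = [], 0.0
    while True:
        best, best_ratio = None, 0.0
        base = f(S)
        for v in candidates:  # fixed order keeps tie-breaking deterministic
            if v in S or spent + cost[v] > budget:
                continue  # already chosen, or would exceed the budget C
            ratio = (f(S + [v]) - base) / cost[v]
            if ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None:
            return S
        S.append(best)
        spent += cost[best]


# Hypothetical coverage model: each location "sees" a set of grid cells.
COVER = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 2}}


def covered(S):
    return len(set().union(*(COVER[v] for v in S)))


print(greedy_select(["a", "b", "c", "d"], {v: 1 for v in COVER}, covered, 2))  # ['a', 'c']
```

With unit costs this is exactly the setting of the analysis that follows.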
Analysis • Suppose: • All costs cv = 1 • O = Best Information set of size at most C • At any stage, if adding v is best greedy decision • Adding entire O cannot give more information per unit cost! • f(S + v) - f(S) ≥ ( f(S + O) - f(S) )/C ≥ ( f(O) - f(S) )/C • Let d(S) = f(O) - f(S) = Deficit w.r.t. Optimal Solution • Implies: d(S) - d(S+v) ≥ d(S) / C
Analysis • d(S+v) ≤ d(S) (1 - 1/C) • d(Initial) = f(O) (taking f(∅) = 0) • d(Final) = f(O) - f(Final) • f(O) - f(Final) = d(Final) ≤ d(Initial) $(1 - 1/C)^C$ ≤ f(O)/e ≤ f(O)/2 • Implies: f(Final) ≥ f(O)/2 (in fact ≥ (1 - 1/e) f(O)) • Greedy set has at least 1/2 the information in the optimal set
Two-Level Routing • [Figure: nodes routing through an aggregation hub]
Clustering • Optimal placement of cluster-heads • Minimize routing cost
K-Means Algorithm • Start with k arbitrary leaders • Repeat Steps 1 and 2 till convergence: • Step 1: • Assign each node to closest leader • Yields k “clusters” of nodes • Step 2: • For each cluster, choose “best” leader • Minimizes total routing cost within cluster
Analysis • Convergence is guaranteed: • Each step reduces (never increases) total distance • Step 1: Each node travels a smaller (or equal) distance • Step 2: Each cluster’s routing cost does not increase • Rate of convergence: • Fast in practice • Quality of solution: • “Local” optimum depending on initial k nodes • Need not be best possible solution • Works very well in practice
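A sketch of the two alternating steps. It assumes nodes are points in the plane, routing cost is Euclidean distance, and leaders must be nodes themselves; with that restriction the procedure behaves like a k-medoids heuristic rather than textbook k-means, which averages coordinates. `cluster` is a hypothetical helper.

```python
import math


def cluster(points, leaders, max_iters=100):
    """Alternate Step 1 (assign to closest leader) and Step 2 (re-pick each
    cluster's leader as the member with least total distance to the others)."""
    leaders = list(leaders)
    for _ in range(max_iters):
        # Step 1: assign every node to its closest leader.
        clusters = {l: [] for l in leaders}
        for i, p in enumerate(points):
            nearest = min(leaders, key=lambda l: math.dist(p, points[l]))
            clusters[nearest].append(i)
        # Step 2: within each cluster, pick the member minimizing routing cost.
        new_leaders = [
            min(members, key=lambda c: sum(math.dist(points[c], points[m]) for m in members))
            for members in clusters.values()
        ]
        if sorted(new_leaders) == sorted(leaders):
            break  # no change: a local optimum
        leaders = new_leaders
    return leaders, clusters


# Two well-separated groups, started from a poor choice of leaders (0 and 1).
PTS = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
print(cluster(PTS, [0, 1]))  # ([0, 3], {0: [0, 1, 2], 3: [3, 4, 5]})
```

Here a poor start still recovers the natural clusters, but as the slide notes, other starting leaders can end in a worse local optimum.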