Troubleshooting Mesh Networks Lili Qiu Joint Work with Victor Bahl, Ananth Rao, Lidong Zhou Microsoft Research Mesh Networking Summit 2004
Motivation Why is it so slow? Cordless phone interference? Neighbors drop traffic? MAC misbehavior? Too much user traffic? Routing problems? TCP problems? … [Figure label: Internet]
Research Challenges Just knowing link statistics is insufficient Complicated interactions • Between different network elements • Between different network protocols • Between different faults • Signature-based schemes may not capture all the interactions Need to apply to a wide range of networks Multi-hop wireless networks • Unpredictable physical medium and dynamic topology • Limited resources • Scale to hundreds of nodes
Our Approach Framework: online trace-driven simulation • Create a real network inside a simulator • Identify root cause by searching for the faults that reproduce the same faulty symptom Advantages • Applicable to a large class of networks • Capture complicated interactions • Extensible to diagnose new faults • Facilitate what-if analysis
Troubleshooting Framework [Diagram; labels: Data Collection, Raw Data, Data Cleaning, Routes, Link Loads, Trace-Driven Simulation, Simulated Performance, Measured Performance, Candidate Faults, Fault Diagnosis, Root Causes, Root cause analysis module]
Common Concerns and Our Approaches for Simulation-Based Diagnosis 1. Simulation accuracy - Trace-driven simulation - Remove erroneous data from the trace 2. Too expensive to simulate - Advances in network simulators - Focus on long-term faults - Compression, spatial scoping, adaptive monitoring, multicast 3. Too large a fault space - Develop an efficient search heuristic
Data Gathering What data to collect? • Network topology • Traffic statistics • Physical medium • Link performance Data sources: SNMP, WRAPI, Packet sniffers, NativeWiFi Dealing with Imperfect Data • Neighbor monitoring • Using history information • Find the smallest number of misbehaving nodes to explain inconsistency in traffic reports
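One way to read the last bullet above: treat every pair of nodes whose traffic reports contradict each other as an edge, then look for a small set of nodes that touches every edge. The sketch below uses a greedy vertex-cover heuristic for that. The heuristic, the function name `greedy_suspects`, and the input format are assumptions made for illustration; the slides do not say how the smallest set is computed, and a greedy pass is not guaranteed to find the true minimum.

```python
from collections import defaultdict

def greedy_suspects(inconsistent_pairs):
    """Return a small set of nodes that explains every inconsistent pair.

    inconsistent_pairs: iterable of (node_a, node_b) tuples, one per pair of
    nodes whose traffic reports disagree (hypothetical input format).
    Greedy heuristic: repeatedly blame the node involved in the most
    still-unexplained inconsistencies.
    """
    remaining = {frozenset(pair) for pair in inconsistent_pairs}
    suspects = []
    while remaining:
        degree = defaultdict(int)
        for edge in remaining:
            for node in edge:
                degree[node] += 1
        # Sort first so that ties are broken deterministically by node name.
        worst = max(sorted(degree), key=lambda n: degree[n])
        suspects.append(worst)
        # Every inconsistency involving the blamed node is now explained.
        remaining = {edge for edge in remaining if worst not in edge}
    return suspects

if __name__ == "__main__":
    pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F")]
    print(greedy_suspects(pairs))
```

In this example node A disagrees with three neighbors, so blaming A alone explains three inconsistencies; one more suspect is needed for the unrelated E/F disagreement.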
Fault Diagnosis Algorithm Challenge • Large fault space: brute-force search is infeasible 1. Initialization: diagnosed fault set F = { } 2. while (diff(MeasuredPerf, SimulatedPerf(F)) > threshold) { foreach f in F { adjust f’s magnitude if necessary; delete f if its magnitude is too small } add a new candidate fault if necessary; simulate } 3. Report F
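The loop above can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the toy `simulate` function stands in for the trace-driven simulator, the only fault type is a per-node packet-drop rate, `diff` is taken to be the summed absolute throughput error, and the parameter names and defaults are hypothetical. It shows the shape of the search, not the authors' implementation.

```python
def simulate(faults, baseline):
    """Toy simulator: a drop fault of magnitude m at a node scales that
    node's delivered throughput by (1 - m)."""
    return {n: baseline[n] * (1.0 - faults.get(n, 0.0)) for n in baseline}

def diagnose(measured, baseline, threshold=0.01, min_magnitude=0.02,
             max_rounds=50):
    """Search for a fault set F whose simulated performance matches the
    measured performance. Baseline throughputs must be non-zero."""
    faults = {}  # step 1: F = { }
    for _ in range(max_rounds):  # step 2, with a safety cap on iterations
        simulated = simulate(faults, baseline)
        # Positive residual: the simulation is more optimistic than reality.
        residual = {n: simulated[n] - measured[n] for n in baseline}
        if sum(abs(r) for r in residual.values()) <= threshold:
            break
        # Adjust the magnitude of each existing fault; drop negligible ones.
        for n in list(faults):
            adjusted = faults[n] + residual[n] / baseline[n]
            faults[n] = min(1.0, max(0.0, adjusted))
            if faults[n] < min_magnitude:
                del faults[n]
        # Add one new candidate: the node with the largest unexplained loss.
        candidates = [n for n in baseline
                      if n not in faults
                      and residual[n] / baseline[n] >= min_magnitude]
        if candidates:
            worst = max(candidates, key=lambda n: residual[n])
            faults[worst] = residual[worst] / baseline[worst]
    return faults  # step 3: report F

if __name__ == "__main__":
    baseline = {"A": 10.0, "B": 10.0, "C": 10.0}
    measured = {"A": 10.0, "B": 5.0, "C": 8.0}
    print(diagnose(measured, baseline))
```

Adding one candidate per round keeps the search far smaller than enumerating all fault combinations, which is the point of the heuristic: each simulation run is used to decide which single fault to add or rescale next.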
Performance Evaluation Effectiveness of data cleaning • Detects >80% of misbehaving nodes with <15% false positives Effectiveness of fault diagnosis • Accuracy of detecting combinations of packet dropping, MAC misbehavior, and external noise in a 25-node random topology
Performance Evaluation Testbed • Implemented the technique in a small multi-hop IEEE 802.11a mesh testbed • Detected network congestion and random packet dropping
Conclusion & Future Work Propose online trace-driven simulation • Diagnose faults • Test alternative network configurations • Our evaluation results show it is promising Future work • Validate it in a larger-scale testbed • Extend it to handle mobility • Apply it to handle other types of faults
Related Work Protocols for wireless network management • Ad Hoc Network Management Protocol (ANMP) • Guerrilla Management Architecture • Complementary to our work Fault management for wireless infrastructure networks • AirWave, AirDefense, UniCenter, WNMS, IBM WSA, Wibhu SpectraMon … • Different from multihop wireless networks Detect specific faults in multihop wireless networks • Routing misbehavior • MAC misbehavior, …
Trace-driven Simulation [Diagram; labels: Candidate Faults, Fault Injection, Simulated Performance, Routing Updates, Route Simulation, Link Loads, Traffic Simulation]