DYSWIS
Kyung-Hwa Kim, Henning Schulzrinne
12/09/2008
Internet Real-Time Lab, Columbia University
Do You See What I See?
[Figure: end users connected through the Internet, one asking the others "Do you see what I see?"]
Outline
• Overview
• Fault Detection
• Peer Selection
• Probing
• Problem
• Implementation
• Demo
Overview
• DYSWIS (Do You See What I See?)
  • Distributed network fault detection and analysis system
• Motivation
  • A given network fault can have several different causes
  • Diagnosis needs a different "view" of the fault from other sources
  • End-to-end diagnosis
  • Need for a user-friendly interface
• Current problems
  • Centralized management schemes
  • Complexity of user networks and devices
  • Failure to solve the service quality problem
• Approach
  • Collaborate with other end users
  • P2P-based
  • Remote probing
For Quick Understanding
[Figure: nodes spread across the Internet, each running Detect, Diagnosis, and Probe components. A DHT is used to look up remote nodes, and XML-RPC is used for remote function calls.]
Fault Detection
• Automatic fault detection
  • Capture raw network packets
  • Analyze network packets and protocols
• Raw packet capturing
  • Check for error responses
  • Check for timeouts
  • Check for TCP congestion
    • Monitor TCP sequence numbers
• Define fault cases
  • Automatic vs. manual
  • FSM approach
    • Pre-defined
    • Learned
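One way to read "monitoring TCP sequence numbers" is to flag data segments that resend bytes already seen on the same flow, since frequent retransmissions hint at loss or congestion. The sketch below is only an illustration: it assumes packets have already been captured and parsed into (flow_id, seq, payload_len) tuples, and the function name and input format are hypothetical rather than taken from DYSWIS.

```python
from collections import defaultdict


def count_retransmissions(segments):
    """Count data segments that resend bytes already seen on the same flow.

    `segments` is an iterable of (flow_id, seq, payload_len) tuples in
    capture order. This input format is an assumption for illustration;
    the sketch ignores sequence-number wraparound and reordering.
    """
    highest_end = {}                 # flow_id -> highest (seq + len) seen so far
    retransmits = defaultdict(int)   # flow_id -> number of suspected retransmissions
    for flow_id, seq, payload_len in segments:
        if payload_len == 0:
            continue                 # pure ACKs carry no data to retransmit
        end = seq + payload_len
        if flow_id in highest_end and end <= highest_end[flow_id]:
            # All of these bytes were already sent once on this flow.
            retransmits[flow_id] += 1
        else:
            highest_end[flow_id] = max(end, highest_end.get(flow_id, 0))
    return dict(retransmits)
```

A detector could compare the count per flow against a threshold before reporting a congestion fault; the right threshold would have to come from real traces.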
FSM - Approach
* "Automatic Protocol Failure Detection Using Finite State Machines," Zhifeng Wang, Kai X. Miao, Tao Zuo, Henning Schulzrinne, Kyung Hwa Kim, Vishal Kumar Singh
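The slides do not reproduce the state machines from the cited paper, so the following is only a minimal sketch of the pre-defined FSM idea for a generic request/response protocol. The states, event names, and transition table are all hypothetical: observed packets are assumed to have been reduced to a sequence of event strings, and any event that has no transition from the current state is treated as a fault.

```python
# Hypothetical transition table for a simple request/response exchange.
TRANSITIONS = {
    ("IDLE", "request"): "WAITING",
    ("WAITING", "response_ok"): "DONE",
    ("WAITING", "response_error"): "FAULT_ERROR_RESPONSE",
    ("WAITING", "timeout"): "FAULT_TIMEOUT",
}
FAULT_STATES = {"FAULT_ERROR_RESPONSE", "FAULT_TIMEOUT", "FAULT_UNEXPECTED"}


def run_fsm(events, start="IDLE"):
    """Feed events through the table and return the state reached."""
    state = start
    for event in events:
        # An event with no defined transition is itself a protocol failure.
        state = TRANSITIONS.get((state, event), "FAULT_UNEXPECTED")
        if state in FAULT_STATES:
            break
    return state


def detect_fault(events):
    """Return the fault state reached, or None if no fault was seen."""
    state = run_fsm(events)
    return state if state in FAULT_STATES else None
```

Keeping the protocol knowledge in a table rather than in code is what would let fault cases be either written by hand or learned, the two options listed on the Fault Detection slide.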
Peer Selection
• DHT or database
• Register myself with the DHT network
  • AS number, subnet, first hop, AP
• Search for probing nodes
  • Inner nodes and outer nodes
[Figure: node A asks the DHT, "I need some nodes that can help me. Who is in the same subnet as me?" The DHT answers, "You can contact B. His IP address is 218.59.21.16 and port number is 9090."]
Peer Selection - DHT (key, value)
[Figure: node A asks the DHT, "I need some nodes that can help me. Who is in the same subnet as me?", and is pointed to node B.]
Example entry 1:
<key>
  <type>node</type>
  <asn>14</asn>
  <subnet>128.59.0.0/16</subnet>
</key>
<value>
  <type>node</type>
  <ip>128.59.21.15</ip>
  <port>9090</port>
  <protocol>udp</protocol>
</value>
Example entry 2:
<key>
  <type>node</type>
  <asn>9880</asn>
  <subnet>45.45.45.0/24</subnet>
  <firewall>no</firewall>
  <nat>no</nat>
</key>
<value>
  <type>node</type>
  <ip>128.59.21.15</ip>
  <hostname>kkh.cs.columbia.edu</hostname>
  <port>9090</port>
  <protocol>tcp</protocol>
</value>
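The register-then-search pattern behind these (key, value) pairs can be sketched with an in-memory dictionary standing in for the DHT. This is not the DYSWIS implementation: the class name, method names, and the choice to index each attribute separately are assumptions made for illustration, and a real deployment would store the entries in a distributed hash table.

```python
from collections import defaultdict


class PeerDirectory:
    """In-memory stand-in for the DHT (hypothetical, for illustration only)."""

    def __init__(self):
        # (attribute name, attribute value) -> set of (ip, port, protocol)
        self._index = defaultdict(set)

    def register(self, ip, port, protocol, **attributes):
        """Publish a node's contact info under each of its attributes."""
        contact = (ip, port, protocol)
        for name, value in attributes.items():
            self._index[(name, value)].add(contact)

    def lookup(self, name, value, exclude=None):
        """Return the contacts registered under one attribute, minus `exclude`."""
        contacts = self._index.get((name, value), ())
        return sorted(c for c in contacts if c != exclude)


directory = PeerDirectory()
directory.register("128.59.21.15", 9090, "udp", asn=14, subnet="128.59.0.0/16")
directory.register("128.59.21.16", 9090, "udp", asn=14, subnet="128.59.0.0/16")

# "Who is in the same subnet as me?" asked by the first node.
me = ("128.59.21.15", 9090, "udp")
inner_nodes = directory.lookup("subnet", "128.59.0.0/16", exclude=me)
```

Outer nodes could be found the same way by looking up a different subnet or AS number.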
Remote Probing
• Distributing modules
  • Detection and probing modules need to be added and updated
  • Dynamic class loading
  • Dynamic module distribution
  • Modules can be created and updated separately
• XML-RPC
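As a rough illustration of remote probing over XML-RPC, the sketch below uses Python's standard `xmlrpc` modules. A helper node exposes a single probe, and the node that saw the fault asks it to try a TCP connection. The function names (`tcp_connect`, `start_probe_server`, `ask_peer`) are hypothetical, and the slides do not say which probes DYSWIS exposes or how they are secured.

```python
import socket
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def tcp_connect(host, port, timeout=3.0):
    """Probe: return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def start_probe_server(bind_host="127.0.0.1", bind_port=0):
    """Run on the helper node: expose the probe over XML-RPC in a thread."""
    server = SimpleXMLRPCServer((bind_host, bind_port), logRequests=False)
    server.register_function(tcp_connect)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask_peer(peer_url, host, port):
    """Run on the node that saw the fault: ask a peer to try the connection."""
    with ServerProxy(peer_url) as peer:
        return peer.tcp_connect(host, port)
```

With a peer's address obtained from the DHT, a call such as `ask_peer("http://128.59.21.15:9090", server_host, 80)` would return that peer's view of the same server, where `server_host` is whatever host the local node failed to reach.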
Probing Scenarios
• HTTP
  • Causes: dead web server, page moved, low bandwidth, …
  • Check the DNS query
  • Check the TCP connection
  • Ask another node to try the same query
  • Check for TCP congestion
  • …
• DNS
  • Causes: dead DNS server, resolution failed, UDP not working, …
  • Check another DNS server
  • Ask another node to try to connect to my DNS server
  • Ask another node to query the same host on another DNS server
• SIP/RTP
  • Causes: NAT, DNS, proxy server, authentication
  • Proxy connectivity test
  • Ask another node to try the same action
  • …
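The three DNS checks can be combined into a small decision function. The mapping from probe results to causes below is an assumption made for illustration, since the slides list the checks and the candidate causes but not the rules that connect them. The function and label names are hypothetical.

```python
def diagnose_dns(other_server_ok, peer_reaches_my_server, peer_other_server_ok):
    """Guess the cause of a failed DNS query against my own DNS server.

    other_server_ok:        I can resolve the name through another DNS server
    peer_reaches_my_server: a peer can query my DNS server
    peer_other_server_ok:   a peer can resolve the name through another server
    The rules are illustrative assumptions, not taken from DYSWIS.
    """
    if not other_server_ok and not peer_other_server_ok:
        # Nobody can resolve the name anywhere: the name itself is the problem.
        return "resolution_failed"
    if other_server_ok and not peer_reaches_my_server:
        # Other servers work for me, and a peer cannot reach my server either.
        return "dns_server_dead"
    if not other_server_ok and peer_reaches_my_server:
        # My server is alive for others, but none of my queries succeed.
        return "local_udp_problem"
    return "inconclusive"
```

Returning "inconclusive" for mixed results leaves room for further probes instead of forcing a guess.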
Probing Scenarios
• Connection problem
  • Causes: dead server, firewall, wrong port number, …
  • Traceroute: check routers
  • Ask another node to try to connect to the server
  • Ask another node to check my port
  • …
• TCP congestion
  • Causes: queuing delay, dead routers
  • Traceroute, ping
  • Try to find the bottleneck
  • …
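For the traceroute-based checks, one simple heuristic is to look for the last hop that answered (where packets may be blocked) and for the hop with the largest round-trip-time increase (a bottleneck candidate). The sketch assumes traceroute output has already been parsed into (hop_number, address, rtt_ms) tuples, with `None` for hops that did not answer. The function names and the heuristic are illustrative assumptions, and routers that rate-limit ICMP can make either result misleading.

```python
def last_responding_hop(hops):
    """Return the number of the last hop that answered, or None."""
    responding = [hop for hop, _addr, rtt in hops if rtt is not None]
    return responding[-1] if responding else None


def largest_rtt_jump(hops):
    """Return (hop_number, increase_ms) for the biggest RTT increase.

    The increase is measured against the previous responding hop,
    starting from 0 ms before the first hop.
    """
    best = None
    prev_rtt = 0.0
    for hop, _addr, rtt in hops:
        if rtt is None:
            continue          # silent hop: nothing to compare
        jump = rtt - prev_rtt
        if best is None or jump > best[1]:
            best = (hop, jump)
        prev_rtt = rtt
    return best


# Hypothetical parsed traceroute: hop 3 did not answer.
hops = [
    (1, "10.0.0.1", 1.0),
    (2, "10.0.1.1", 2.5),
    (3, None, None),
    (4, "192.0.2.1", 40.0),
]
```

Comparing these results with the same traceroute run from a peer would show whether the suspicious hop is on a shared part of the path.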
Data Gathering
• Problem
  • We have resources: other machines
  • But how do we use them efficiently?
  • We need real data
• Approach
  • Collecting data
  • Collecting scenarios
  • Implementing a prototype
Implementation
• Architecture: http://wiki.cs.columbia.edu/display/res/DYSWIS
For details, visit: http://wiki.cs.columbia.edu/display/res/DYSWIS
Demo • Demo
Future work
• Implementation
  • http://www.cs.columbia.edu/~khkim/project/dyswis
  • Coming soon: Mac & Linux
• Testbed: PlanetLab
• More mature research on analysis
• Support for real-time protocols
• How to find solutions for end users
Backup
• Check the local network.
• Select two nodes: one from the same subnet and one from an outside subnet.
• Have both nodes try to connect to the server.
• If both nodes fail to connect to the server, log this fault as "server failure".
• If only the internal node fails, run traceroute to check where the packet is blocked.
• If the internal node succeeds, the problem may be caused by a local firewall or something else.
• Check the incoming/outgoing port: have other nodes open the same port and try to connect to it. Check whether the remote node received the packet, and whether the ACK from the remote node came back.
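The branching in these steps can be written as a small function. The sketch assumes the two probe results are already available as booleans (True if the node reached the server), and the returned labels are hypothetical names for the next action, not DYSWIS identifiers.

```python
def classify_connection_fault(inner_peer_ok, outer_peer_ok):
    """Pick the next step from the results of the two remote probes.

    inner_peer_ok: a node in my subnet could connect to the server
    outer_peer_ok: a node outside my subnet could connect to the server
    """
    if not inner_peer_ok and not outer_peer_ok:
        # Nobody can reach the server: log it as a server failure.
        return "server_failure"
    if not inner_peer_ok and outer_peer_ok:
        # Only the internal node fails: find where packets are blocked.
        return "run_traceroute"
    # The internal node succeeds, so the cause is likely on my own machine.
    return "check_local_firewall"
```

The "check_local_firewall" result would lead into the port checks described in the last step.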