DYSWIS

Kyung-Hwa Kim, Henning Schulzrinne
12/09/2008
Internet Real-Time Lab, Columbia University

Do You See What I See?
[Figure: three end users and the Internet]

Outline
• Overview
• Fault Detection
• Peer Selection
• Probing
• Problem
• Implementation
• Demo

Overview
• DYSWIS: Do You See What I See
  • Distributed network fault detection and analysis system
• Motivation
  • A particular network fault can have different causes
  • Need different "views" of the fault from other sources
  • End-to-end diagnosis
  • Need a user-friendly interface
• Current problem
  • Centralized management schemes
  • Complexity in the user network and devices
  • Failure to solve the service quality problem
• Approach
  • Collaborate with other end users
  • P2P based
  • Remote probing

For Quick Understanding
• DHT: for finding remote nodes
• XML-RPC: for remote function calls
[Figure: nodes across the Internet, each with Detect, Diagnosis, and Probe components]

Fault Detection
• Automatic fault detection
  • Network raw packet capturing
  • Analyze network packets and protocols
• Raw packet capturing
  • Check error responses
  • Check timeouts
  • Check TCP congestion
    • Monitor TCP sequence numbers
• Define fault cases
  • Automatic vs. manual
  • FSM approach
    • Predefined
    • Learning
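The "monitoring TCP sequence numbers" check can be illustrated with a minimal sketch. It assumes the capture layer has already reduced one direction of one TCP flow to (sequence number, payload length) pairs. The function names and the use of a retransmission ratio as a congestion hint are illustrative assumptions, not the DYSWIS implementation, and sequence-number wraparound is ignored.

```python
from typing import Iterable, Tuple

# Hypothetical input format: (sequence number, payload length) per captured segment.
Segment = Tuple[int, int]


def count_retransmissions(segments: Iterable[Segment]) -> Tuple[int, int]:
    """Return (retransmitted, total) data segments for one direction of a flow."""
    highest_end = 0      # highest byte offset (seq + length) seen so far
    seen_data = False
    retransmitted = 0
    total = 0
    for seq, length in segments:
        if length == 0:
            continue  # pure ACKs carry no data
        total += 1
        end = seq + length
        if seen_data and end <= highest_end:
            # Every byte of this segment was already sent once.
            retransmitted += 1
        else:
            # New data (partial overlaps are counted as new in this sketch).
            highest_end = max(highest_end, end) if seen_data else end
            seen_data = True
    return retransmitted, total


def retransmission_ratio(segments: Iterable[Segment]) -> float:
    """Fraction of data segments that were retransmissions (0.0 if no data)."""
    retransmitted, total = count_retransmissions(segments)
    return retransmitted / total if total else 0.0
```

A detector could compare the ratio against a threshold of its choosing to flag a flow as possibly congested; the slides do not specify such a threshold.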

FSM Approach
* "Automatic Protocol Failure Detection Using Finite State Machines", Zhifeng Wang, Kai X. Miao, Tao Zuo, Henning Schulzrinne, Kyung Hwa Kim, Vishal Kumar Singh
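As a rough illustration of the FSM idea, the sketch below models a single request/response exchange (for example one DNS query) with a predefined transition table. The states, event names, and table are hypothetical and are not taken from the cited paper; the point is only that an error response, a timeout, or an event the table does not allow moves the machine into a fault state.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    WAITING = "waiting"   # request sent, response pending
    DONE = "done"
    FAULT = "fault"


# Predefined transitions: (current state, observed event) -> next state.
TRANSITIONS = {
    (State.IDLE, "query"): State.WAITING,
    (State.WAITING, "response_ok"): State.DONE,
    (State.WAITING, "response_error"): State.FAULT,
    (State.WAITING, "timeout"): State.FAULT,
}


def run_fsm(events):
    """Feed observed events to the FSM; return (final state, fault reason)."""
    state = State.IDLE
    for event in events:
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            # An event the protocol model does not allow here is also a fault.
            return State.FAULT, f"unexpected {event} in state {state.value}"
        if next_state is State.FAULT:
            return State.FAULT, event
        state = next_state
    return state, ""
```

For example, `run_fsm(["query", "timeout"])` ends in `State.FAULT` with the reason `"timeout"`, which a detector could then hand to the diagnosis step.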

Peer Selection
• DHT or database
• Register myself with the DHT network
  • AS number, subnet, first hop, AP
• Search for probing nodes
  • Inner nodes and outer nodes
[Figure: nodes A and B and the DHT]
  A: "I need some nodes that can help me. Who is in the same subnet as me?"
  DHT: "You can contact B. His IP address is 218.59.21.16 and port number is 9090."

Peer Selection - DHT (key, value)

<key>
  <type>node</type>
  <asn>14</asn>
  <subnet>128.59.0.0/16</subnet>
</key>
<value>
  <type>node</type>
  <ip>128.59.21.15</ip>
  <port>9090</port>
  <protocol>udp</protocol>
</value>

<key>
  <type>node</type>
  <asn>9880</asn>
  <subnet>45.45.45.0/24</subnet>
  <firewall>no</firewall>
  <nat>no</nat>
</key>
<value>
  <type>node</type>
  <ip>128.59.21.15</ip>
  <hostname>kkh.cs.columbia.edu</hostname>
  <port>9090</port>
  <protocol>tcp</protocol>
</value>

[Figure: nodes A and B and the DHT]
  A: "I need some nodes that can help me. Who is in the same subnet as me?"
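The register-then-search flow can be sketched with an in-memory stand-in for the DHT. Everything here is an assumption made for illustration: `ToyDHT` is a plain dictionary rather than a real DHT, the key is a tuple built from the AS number and subnet as in the first (key, value) example, and the helper names are hypothetical.

```python
import ipaddress
from collections import defaultdict


class ToyDHT:
    """In-memory stand-in for a DHT: each key maps to a list of values."""

    def __init__(self):
        self._store = defaultdict(list)

    def put(self, key, value):
        self._store[key].append(value)

    def get(self, key):
        return list(self._store.get(key, []))


def node_key(asn, subnet):
    """Build the lookup key from the AS number and subnet."""
    return ("node", int(asn), str(subnet))


def register(dht, ip, port, protocol, asn, prefix_len):
    """Register this node under its (AS number, subnet) key."""
    # strict=False lets us derive the network from a host address.
    subnet = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    key = node_key(asn, subnet)
    dht.put(key, {"type": "node", "ip": ip, "port": port, "protocol": protocol})
    return key


def find_same_subnet(dht, asn, subnet, exclude_ip=None):
    """Return the registered nodes in the given subnet, skipping the caller."""
    return [v for v in dht.get(node_key(asn, subnet)) if v["ip"] != exclude_ip]
```

A node would call `register(...)` once at startup and `find_same_subnet(...)` when it needs an inner node; an outer node could be found the same way with a different subnet key.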

Remote Probing
• Distributing modules
  • Detecting and probing modules should be added and updated
  • Dynamic class loading
  • Dynamic module distribution
  • Modules can be created and updated separately
• XML-RPC
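A remote probe over XML-RPC might look like the following minimal sketch, written with Python's standard `xmlrpc` modules. The probe function, the method name `probe.tcp_connect`, and the helper names are hypothetical; the slides do not describe the actual DYSWIS RPC interface. Port 9090 is borrowed from the peer-selection example.

```python
import socket
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def tcp_connect_probe(host, port, timeout=3.0):
    """Try a TCP connection from this node and report the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return {"ok": True, "error": ""}
    except OSError as exc:  # refused, unreachable, timed out, ...
        return {"ok": False, "error": str(exc)}


def make_probe_server(bind_host="127.0.0.1", bind_port=9090):
    """Create (but do not start) an XML-RPC server exposing the probe."""
    server = SimpleXMLRPCServer((bind_host, bind_port), logRequests=False)
    server.register_function(tcp_connect_probe, "probe.tcp_connect")
    return server


def ask_peer(peer_url, host, port):
    """Ask the peer at peer_url to run the TCP probe against host:port."""
    with ServerProxy(peer_url) as peer:
        return peer.probe.tcp_connect(host, port)
```

A helping node would run `make_probe_server().serve_forever()`, and the node with the problem would call something like `ask_peer("http://<peer-ip>:9090/", "<server>", 80)` after finding the peer through the DHT.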

Probing Scenarios
• HTTP
  • Causes: dead web server, page moved, low bandwidth, …
  • Check DNS query
  • TCP connection
  • Ask another node to try the same query
  • Check TCP congestion
  • …
• DNS
  • Causes: dead DNS server, resolution failed, UDP not working, …
  • Check another DNS server
  • Ask another node to try to connect to my DNS server
  • Ask another node to query the same host on another DNS server
• SIP/RTP
  • Causes: NAT, DNS, proxy server, authentication
  • Proxy connectivity test
  • Ask another node to try the same action
  • …
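One way the three DNS checks could be combined after a local lookup has failed is sketched below. The decision rules and the returned labels are illustrative assumptions, not rules stated in the slides; they only show how results from the local node and from a peer can narrow down the listed causes.

```python
def diagnose_dns(local_other_server_ok, peer_my_server_ok, peer_other_server_ok):
    """Combine DNS probe results gathered after my own lookup failed.

    local_other_server_ok: I queried another DNS server and it resolved the host.
    peer_my_server_ok:     a peer queried my DNS server and it resolved the host.
    peer_other_server_ok:  a peer queried another DNS server and it resolved the host.
    """
    if peer_my_server_ok:
        # My server answers a peer, so suspect my own path to it (e.g. UDP blocked).
        return "local_path"
    if local_other_server_ok or peer_other_server_ok:
        # Other servers resolve the name, so suspect my DNS server.
        return "dns_server"
    # Nobody can resolve the name: suspect the name itself (resolution failed).
    return "name"
```

A real diagnosis would likely weigh more evidence, such as several peers and repeated queries, before settling on a cause.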

Probing Scenarios
• Connection problem
  • Causes: dead server, firewall, wrong port number, …
  • Traceroute: check routers
  • Ask another node to try to connect to the server
  • Ask another node to check my port
  • …
• TCP congestion
  • Causes: queuing delay, dead routers
  • Traceroute, ping
  • Try to find the bottleneck
  • …

Probing Scenarios
[Figure: nodes A and B]

Data Gathering
• Problem
  • We have resources: other machines
  • But how do we use them efficiently?
  • We need real data
• Approach
  • Collecting data
  • Collecting scenarios
  • Implementing a prototype

Implementation
• Architecture: http://wiki.cs.columbia.edu/display/res/DYSWIS

For details, visit: http://wiki.cs.columbia.edu/display/res/DYSWIS

Demo

Future Work
• Implementation
  • http://www.cs.columbia.edu/~khkim/project/dyswis
  • Coming soon: Mac & Linux
• Testbed: PlanetLab
• Maturing the research on analysis
• Support for real-time protocols
• How to find solutions for end users

Backup
• Check the local network.
• Select two nodes: one from the same subnet and one from an outside subnet.
• Let the nodes try to connect to the server.
• If both nodes fail to connect to the server, log this fault as 'server failure'.
• If only the internal node fails, execute traceroute to check where the packet is blocked.
• If the internal node succeeds, the problem may be caused by a local firewall or something else.
• Check the incoming/outgoing port: let other nodes open the same port and try to connect there. Check whether the remote node received the packet and whether the ACK from the remote node came back.
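The decision steps above can be written down as a small function. This is a sketch of the logic as listed on the slide; the function name and the returned labels are hypothetical, and the inputs are assumed to be the connection results reported by the internal (same-subnet) and external (outside-subnet) nodes.

```python
def classify_connection_fault(internal_ok, external_ok):
    """Map the two remote connection attempts to the next diagnosis step.

    internal_ok: the node in my subnet could connect to the server.
    external_ok: the node outside my subnet could connect to the server.
    """
    if not internal_ok and not external_ok:
        # Both helpers failed: log the fault as a server failure.
        return "server failure"
    if not internal_ok:
        # Only the internal node failed: find where packets are blocked.
        return "run traceroute"
    # The internal node succeeded, so the cause is probably on my own machine.
    return "check local firewall"
```

For example, `classify_connection_fault(False, True)` returns `"run traceroute"`, matching the "only the internal node fails" step.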
