GT: Picking up the Truth from the Ground for Internet Traffic

Francesco Gringoli, Luca Salgarelli, Niccolo' Cascarano, Fulvio Risso, and Kimberly C. Claffy. SIGCOMM Comput. Commun. Rev., 2009. Networking Journal Club, 28th May 2010.

Presentation Transcript


  1. GT: Picking up the Truth from the Ground for Internet Traffic • Francesco Gringoli, Luca Salgarelli, Niccolo' Cascarano, Fulvio Risso, and Kimberly C. Claffy • SIGCOMM Comput. Commun. Rev., 2009 • Networking Journal Club, 28th May 2010

  2. Outline • Introduction and Related Work • The GT Architecture • Testbed Setup and Experimental Analysis • Design choices • Conclusions

  3. Introduction and Related Work • Motivation: • Traffic modeling, intrusion detection, and similar tasks need traces in which each packet or flow is associated with its application (and protocol). • Current Approaches: • Manual generation • Problems: bias (lack of human behavior), background applications • DPI • Problems: encrypted traffic, ambiguity (different protocols with similar signatures); port-based classification is obsolete

  4. Introduction and Related Work • Current Approaches: • Application stamping • Problems: real time, packet size close to the MTU • BLINC • Problems: accuracy 30% • Proposed solution: • GT: • Monitors the host's kernel • Associates each packet (flow) with the name of its controlling application

  5. GT Architecture • Four parts: • Client daemon • Packet capture engine • Database server • IPClass

  6. GT Architecture: client daemon • Functionality: • Tracks changes in active network sockets, then collects and transmits to the database server the relevant information about the application that owns each socket. • Runs in user space, mirroring the active socket list handled by the kernel: a thread loop periodically synchronizes with the kernel tables (configurable frequency) • Currently compiles and runs on many platforms (Windows Vista/XP/2003, Linux 2.4 and 2.6, Mac OS X 10.4 and 10.5, FreeBSD 5 and 6)
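
The synchronization step described on this slide can be pictured as a diff between two successive snapshots of the socket table. The following is a minimal sketch under stated assumptions, not the GT daemon's code: the `FiveTuple` and `Snapshot` types, the `diff_snapshots` function, and the sample addresses are all hypothetical, and reading the real kernel tables (which is platform specific) is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical types for illustration; not taken from the GT source code.
FiveTuple = Tuple[str, str, int, str, int]  # (proto, src_ip, src_port, dst_ip, dst_port)
Snapshot = Dict[FiveTuple, str]             # socket -> name of the owning application
Event = Tuple[str, FiveTuple, str]          # (event type, socket, application)


def diff_snapshots(previous: Snapshot, current: Snapshot) -> List[Event]:
    """Compare two polls of the socket table and report what changed."""
    events: List[Event] = []
    # Sockets present now but not at the previous poll were created in between.
    for sock, app in current.items():
        if sock not in previous:
            events.append(("create", sock, app))
    # Sockets present at the previous poll but gone now were destroyed in between.
    for sock, app in previous.items():
        if sock not in current:
            events.append(("destroy", sock, app))
    return events


# Two made-up snapshots taken one polling interval apart.
old = {("tcp", "10.0.0.5", 50000, "93.184.216.34", 80): "firefox"}
new = {("udp", "10.0.0.5", 40000, "10.0.0.1", 53): "skype"}
events = diff_snapshots(old, new)
```

A socket that is opened and closed between two polls never appears in either snapshot, which is why the polling frequency matters for completeness.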

  7. GT Architecture: packet capture engine and database server • Packet capture engine: tcpdump • Database server: MySQL • Each entry: • 5-tuple • Log time • Name of the application • Type of log event (create, destroy,…)
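
As a rough illustration of the entry layout listed on this slide, the sketch below stores one log record per socket event. It is an assumption-laden stand-in: GT uses MySQL, whereas this example uses Python's built-in `sqlite3` so that it runs without a server, and the table name `socket_log`, the column names, and the sample row are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL server used by GT.
conn = sqlite3.connect(":memory:")

# One row per log event: the 5-tuple, the log time, the application name,
# and the type of event. All identifiers here are illustrative assumptions.
conn.execute(
    """
    CREATE TABLE socket_log (
        proto    TEXT    NOT NULL,
        src_ip   TEXT    NOT NULL,
        src_port INTEGER NOT NULL,
        dst_ip   TEXT    NOT NULL,
        dst_port INTEGER NOT NULL,
        log_time REAL    NOT NULL,  -- seconds since the epoch
        app_name TEXT    NOT NULL,
        event    TEXT    NOT NULL   -- e.g. 'create' or 'destroy'
    )
    """
)

# A made-up record, as a client daemon might report it.
conn.execute(
    "INSERT INTO socket_log VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("udp", "10.0.0.5", 40000, "10.0.0.1", 53, 1274954400.0, "skype", "create"),
)

# Look up every event recorded for one 5-tuple.
rows = conn.execute(
    """
    SELECT app_name, event FROM socket_log
    WHERE proto = ? AND src_ip = ? AND src_port = ? AND dst_ip = ? AND dst_port = ?
    """,
    ("udp", "10.0.0.5", 40000, "10.0.0.1", 53),
).fetchall()
```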

  8. GT Architecture: IPClass tool • IPClass reconciles the information contained in the database with the captured traffic traces. • For each packet of a flow (with timestamp t_0), IPClass looks in the database for a flow with a log time close to t_0. • If one is found, the entry unequivocally identifies the application that generated the flow • Associating protocols to flows: • Inspecting each application and compiling a list of the protocols it uses (l7filter)
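
The lookup described on this slide can be sketched as a nearest-in-time match on the 5-tuple. This is only an illustration of the idea, not IPClass itself: the function `label_packet`, the `LogEntry` layout, and in particular the `max_gap` tolerance are assumptions, since the slide says only that the log time must be "close to" t_0.

```python
from typing import List, Optional, Tuple

# Hypothetical types for illustration; not taken from the IPClass source code.
FiveTuple = Tuple[str, str, int, str, int]  # (proto, src_ip, src_port, dst_ip, dst_port)
LogEntry = Tuple[FiveTuple, float, str]     # (socket, log_time, application name)


def label_packet(five_tuple: FiveTuple, t0: float, log: List[LogEntry],
                 max_gap: float = 5.0) -> Optional[str]:
    """Return the application whose log entry for this 5-tuple is closest to t0.

    max_gap (seconds) is an assumed tolerance; entries further away than
    this are ignored, and None is returned when nothing qualifies.
    """
    candidates = [
        (abs(log_time - t0), app)
        for sock, log_time, app in log
        if sock == five_tuple and abs(log_time - t0) <= max_gap
    ]
    if not candidates:
        return None
    # Pick the entry with the smallest time difference.
    return min(candidates, key=lambda c: c[0])[1]


# Made-up log: the same 5-tuple was used by two applications at different times.
sock = ("tcp", "10.0.0.5", 50000, "93.184.216.34", 80)
log = [(sock, 100.0, "firefox"), (sock, 900.0, "wget")]

near_first = label_packet(sock, 101.5, log)
near_second = label_packet(sock, 898.0, log)
too_far = label_packet(sock, 500.0, log)
```

Matching on time as well as on the 5-tuple matters because port numbers are reused: the same 5-tuple can belong to different applications at different points in a multi-day trace.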

  9. Testbed setup • Campus network environments (upstream): • UNIBS: 6 days, 18 GB, captured with tcpdump on a 2.4 GHz quad-core machine (<100 Mbps) • POLITO: 3 days, 200 GB, captured with an Endace card (>100 Mbps) • NTP used to maintain clock synchronization

  10. Experimental Analysis • Two metrics: completeness and accuracy • Completeness: • Relevant parameter: polling time • Too short -> unnecessary overhead • Too long -> missing flows • Polling time vs. % CPU: • 4 s -> CPU<5% • 1 s -> CPU~5% • 125 ms -> CPU 20-50% • (depending on Operating System)
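
The "too long -> missing flows" side of this trade-off can be made concrete with a deliberately simplified model, which is an assumption of this note and not a result from the paper. Suppose a socket is visible to the daemon only while it is open, and polls occur every T seconds with a phase that is independent of when the flow starts. A flow lasting d seconds is then observed by at least one poll with probability min(1, d/T). The function name `detection_probability` is hypothetical.

```python
def detection_probability(flow_duration_s: float, polling_interval_s: float) -> float:
    """Probability that at least one poll falls inside the flow's lifetime.

    Simplified model (an assumption, not from the paper): polls are evenly
    spaced and their phase is uniformly random relative to the flow start.
    """
    if polling_interval_s <= 0:
        raise ValueError("polling interval must be positive")
    if flow_duration_s < 0:
        raise ValueError("flow duration must be non-negative")
    return min(1.0, flow_duration_s / polling_interval_s)


# A 0.5 s flow under the three polling times listed on the slide.
p_4s = detection_probability(0.5, 4.0)
p_1s = detection_probability(0.5, 1.0)
p_125ms = detection_probability(0.5, 0.125)
```

Under this model, shortening the polling time from 4 s to 125 ms turns a flow that is usually missed into one that is always seen, at the CPU cost listed on the slide.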

  11. Experimental Analysis: Completeness • Tagged flows/bytes: • 99% of TCP bytes • 60-80% of TCP sessions (90% for Mac OS) • >87% of UDP sessions (~100% for Linux) • Untagged flows are mostly very short flows; for these, the tool looks for another (unique) tagged flow with the same 5-tuple • -> 99% of flows tagged

  12. Experimental Analysis: Accuracy • GT reports only application names, but there is normally a relation between applications and protocols. • GT can improve the accuracy of a traffic analyzer (e.g. DPI): • Only use the signatures “relevant” to each flow • GT+DPI improves accuracy by up to 41% (of bytes) and 21% (of flows).
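
The idea of using only the signatures "relevant" to each flow can be sketched as follows. This is a toy illustration under stated assumptions, not the l7filter integration used in the paper: the three regular expressions, the `APP_PROTOCOLS` map, and the `classify` function are all hypothetical.

```python
import re
from typing import Dict, List, Optional, Pattern

# Toy payload signatures, for illustration only (not the l7filter patterns).
SIGNATURES: Dict[str, Pattern[bytes]] = {
    "http": re.compile(rb"^(GET|POST|HEAD) \S+ HTTP/1\.[01]"),
    "ssh": re.compile(rb"^SSH-\d\.\d"),
    "bittorrent": re.compile(rb"^\x13BitTorrent protocol"),
}

# Hypothetical map from application name (the ground truth label) to the
# protocols that application is known to speak.
APP_PROTOCOLS: Dict[str, List[str]] = {
    "firefox": ["http"],
    "ssh": ["ssh"],
}


def classify(payload: bytes, app: Optional[str]) -> Optional[str]:
    """Match the payload against signatures, restricted by the application label.

    With a known application only its protocols are tried; without a label
    (or with an unknown application) every signature is tried.
    """
    names = APP_PROTOCOLS.get(app, list(SIGNATURES))
    for name in names:
        if SIGNATURES[name].search(payload):
            return name
    return None


http_from_browser = classify(b"GET /index.html HTTP/1.1\r\n", "firefox")
torrent_unlabeled = classify(b"\x13BitTorrent protocol", None)
torrent_from_browser = classify(b"\x13BitTorrent protocol", "firefox")
```

Restricting the candidate set means a payload that happens to resemble another protocol's signature cannot be assigned to a protocol the labeled application never uses.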

  13. Design Choices and Conclusions • Design Choices: • Centralized vs. Distributed • User-space vs. Kernel-space • Conclusions: • Implementation of an open source toolset: • GT assigns application labels to traffic flows, allowing storage of ground truth with the trace itself • Tags 99% of bytes and 95% of flows without affecting CPU load • Improves the accuracy of DPI up to 85% (UDP Skype) and up to 91% (P2P applications).
