InTraBase: Integrated Traffic Analysis Based on a DBMS

M. Siekkinen (1), E.W. Biersack (1), V. Goebel (2), T. Plagemann (2), G. Urvoy-Keller (1)
(1) Institut Eurecom, France; (2) University of Oslo, Norway

[Figure: database tables and their columns]
• traces: tid, description, location, date, trafficType, connectionsTable, packetTable
• packets: tid, cnxid, reverse, timestamp, flags, startSeq, endSeq, nbBytes, ack, win, urgent, options
• connections: tid, cnxid, reverse, started, duration, throughput, bytes, packets, dataPkts, acks, pureAcks, pushes, syns, fins, resets, urgents, sacks, minRwnd, maxRwnd, avgRwnd
• cid2tuple: tid, cnxid, reverse, srcIP, srcPort, dstIP, dstPort
• retransmissions: tid, cnxid, reverse, uniqueBytes, retransmittedBytes
Relation labels in the figure: maps, summarizes, describes.

[Figure: analysis cycle and tools]
Labels: Load raw data into DB; Process raw data; Filter raw data; Process filtered data; Query (filter & combine) data; Combine w/ previous results; Store intermediate results; Store into files; Interpret results; Off-line analysis; Descriptions; Queries; Results; Base data.
Tools: TCP rate limit analyser, Xplot plotter, Flight identifier, Connection identifier, Connection summarizer, elementary functions.

[Figure: measurement architecture]
Labels: Network link, IP, TCP, Application; tcpdump, Web100, Application logs; Raw base data files; Preprocess; DBMS; Data Warehouse; Interpret results.
Base data: tcpdump packet traces, Web100 timeseries, application logs.

Overview

Internet traffic analysis as a research area has experienced rapid growth over the last decade. The traffic data collected for analysis are usually stored in plain files, and the analysis tools consist of customized scripts, each tailored for a specific task. As data are often collected over a longer period of time or from different vantage points, it is important to keep metadata that describe the data collected. The use of separate files to store the data, the metadata, and the analysis scripts provides an abstraction that is much too primitive: the information that “glues” these different files together is not made explicit but is solely in the heads of the people involved in the activity.
As a consequence, manipulating the data is very cumbersome, does not scale, and severely limits the way these data can be analyzed. This poster describes our approach, which we call InTraBase: using a database management system (DBMS) that provides the infrastructure for the analysis and management of data from measurements, related metadata, and obtained results. A first prototype of InTraBase is also introduced.

Introduction

• Problems in analyzing Internet traffic:
  • Management
    • Data, metadata, and tools
    • Getting lost with files containing data and scripts
  • Analysis cycle
    • Data loses semantics
  • Scalability
    • Cannot analyze large enough data sets
• Existing approaches do not fit our needs.

InTraBase Approach

• Goal: to devise an open platform for traffic analysis that facilitates researchers’ work and that:
  • (i) conserves the semantics of data during the analysis process;
  • (ii) enables users to easily manage their own set of analysis tools and methods;
  • (iii) enables users to share them with colleagues; and
  • (iv) allows users to quickly retrieve small pieces of information from analysis data while simultaneously developing tools for heavier and more advanced processing.

The database consists of reusable components
• Performing new analysis is less laborious and less error-prone

Data conserves semantics
• Store reusable intermediate results
• It is possible to easily combine different data sources
  • E.g. application-level events explain some of the phenomena in the traffic at the TCP layer

Data becomes structured
• Processing and updating data is easier
• Searching is more efficient (indexes)

First Prototype

• The prototype allows analysis of tcpdump packet traces with PostgreSQL.
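The table and column names in the poster's schema figure suggest how such a database could be laid out and queried. The following is only a rough sketch: Python's built-in sqlite3 module stands in for PostgreSQL so that it runs without a server, the column types are assumptions, only a subset of the connections columns is declared, and `open_db` and `fastest_connections` are hypothetical helper names that do not come from the poster.

```python
import sqlite3

# Column names follow the poster's schema figure; the types are assumptions.
SCHEMA = """
CREATE TABLE traces (
    tid INTEGER PRIMARY KEY, description TEXT, location TEXT, date TEXT,
    trafficType TEXT, connectionsTable TEXT, packetTable TEXT
);
CREATE TABLE cid2tuple (
    tid INTEGER, cnxid INTEGER, reverse INTEGER,
    srcIP TEXT, srcPort INTEGER, dstIP TEXT, dstPort INTEGER
);
-- Only a subset of the connections columns listed in the figure.
CREATE TABLE connections (
    tid INTEGER, cnxid INTEGER, reverse INTEGER,
    started REAL, duration REAL, throughput REAL,
    bytes INTEGER, packets INTEGER
);
"""


def open_db():
    """Create an in-memory database holding the sketched tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def fastest_connections(db, tid, limit=5):
    """Return (srcIP, srcPort, dstIP, dstPort, throughput) rows for one trace,
    highest throughput first, by joining connections with cid2tuple."""
    return db.execute(
        """SELECT t.srcIP, t.srcPort, t.dstIP, t.dstPort, c.throughput
           FROM connections AS c
           JOIN cid2tuple AS t
             ON t.tid = c.tid AND t.cnxid = c.cnxid AND t.reverse = c.reverse
           WHERE c.tid = ?
           ORDER BY c.throughput DESC
           LIMIT ?""",
        (tid, limit),
    ).fetchall()
```

Keeping the 4-tuple in its own table and joining on (tid, cnxid, reverse) is one way to read the "maps" relation in the figure; the actual prototype may organize this differently.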
Five steps to process a tcpdump packet trace:
• Copy packets from a file into the packets table
• Build an index for the packets table based on cnxid
• Create connection-level statistics in the connections table
• Insert unique 4-tuple to cnxid mapping data into cid2tuple
• Count retransmitted data per connection into retransmissions

[Figure: loading a trace] tcpdump file → tcpdump → filter program (enforce structure; add cnxid & reverse) → psql copy → DBS

[Figure: working with InTraBase]
• Perform traffic measurements and store results in base data files
• Upload base data into the db and process it within the db
• Issue SQL queries
• Use extensibility of the DBMS to create functions for advanced processing
Labels: SQL queries; Advanced queries w/ PL functions; Develop new PL functions.

A set of PL functions provided
• pl/pgSQL
  • Produce timeseries (packet inter-arrival times, throughput…)
  • Plot time-sequence diagram in xplot format
  • Perform analysis on a whole packet trace using timeseries
• pl/R
  • Produce graphs
  • Statistical calculations

Tested on Linux 2.6.3, 2x Intel Xeon 2.2 GHz, SCSI RAID, 6 GB RAM

Conclusions

• Our previous experience of Internet traffic analysis with a plain file-based approach has led us to the conclusion that new approaches are needed for improved manageability and scalability.
• We advocate a DBMS-based approach, and the initial experiences with the first prototype are clearly encouraging.
• Much work is still left to do, such as developing more complex tools, building support for a variety of different base data (including other types of traffic analysis data, e.g. BGP data), and applying the approach to much larger data sets on the order of terabytes.
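The five processing steps listed for the prototype (load packets, index by cnxid, summarize connections, map 4-tuples, count retransmissions) can be illustrated with a small, self-contained sketch. Several things here are assumptions rather than the poster's method: sqlite3 replaces PostgreSQL, the packets table carries the 4-tuple directly, the sample rows in `ROWS` are invented, `process_trace` is a hypothetical name, and a retransmission is approximated as a repeated (startSeq, endSeq) segment, which ignores partially overlapping segments.

```python
import sqlite3

# Invented sample rows, shaped as a filter program might emit them:
# (tid, cnxid, reverse, timestamp, srcIP, srcPort, dstIP, dstPort,
#  startSeq, endSeq, nbBytes)
ROWS = [
    (1, 1, 0, 0.00, "10.0.0.1", 4000, "10.0.0.2", 80, 1, 101, 100),
    (1, 1, 0, 0.10, "10.0.0.1", 4000, "10.0.0.2", 80, 101, 201, 100),
    (1, 1, 0, 0.30, "10.0.0.1", 4000, "10.0.0.2", 80, 101, 201, 100),  # resent
    (1, 1, 1, 0.05, "10.0.0.2", 80, "10.0.0.1", 4000, 1, 1, 0),  # pure ACK
    (1, 2, 0, 1.00, "10.0.0.3", 5000, "10.0.0.2", 80, 1, 51, 50),
]


def process_trace(rows):
    """Run the five steps on the given packet rows; return the connection."""
    db = sqlite3.connect(":memory:")
    # Step 1: copy packets into the packets table.
    db.execute(
        """CREATE TABLE packets (
               tid INTEGER, cnxid INTEGER, reverse INTEGER, timestamp REAL,
               srcIP TEXT, srcPort INTEGER, dstIP TEXT, dstPort INTEGER,
               startSeq INTEGER, endSeq INTEGER, nbBytes INTEGER)"""
    )
    db.executemany("INSERT INTO packets VALUES (?,?,?,?,?,?,?,?,?,?,?)", rows)
    # Step 2: build an index on cnxid.
    db.execute("CREATE INDEX packets_cnxid_idx ON packets (cnxid)")
    # Step 3: connection-level statistics (a small subset of the columns).
    db.execute(
        """CREATE TABLE connections AS
           SELECT tid, cnxid, reverse,
                  MIN(timestamp) AS started,
                  MAX(timestamp) - MIN(timestamp) AS duration,
                  SUM(nbBytes) AS bytes,
                  COUNT(*) AS packets
           FROM packets GROUP BY tid, cnxid, reverse"""
    )
    # Step 4: unique 4-tuple to cnxid mapping.
    db.execute(
        """CREATE TABLE cid2tuple AS
           SELECT DISTINCT tid, cnxid, reverse, srcIP, srcPort, dstIP, dstPort
           FROM packets"""
    )
    # Step 5: retransmitted data per connection. Assumption: a segment seen
    # more than once with identical sequence bounds was retransmitted.
    db.execute(
        """CREATE TABLE retransmissions AS
           SELECT tid, cnxid, reverse,
                  SUM(segBytes) AS uniqueBytes,
                  SUM(segBytes * (copies - 1)) AS retransmittedBytes
           FROM (SELECT tid, cnxid, reverse, startSeq, endSeq,
                        MAX(nbBytes) AS segBytes, COUNT(*) AS copies
                 FROM packets WHERE nbBytes > 0
                 GROUP BY tid, cnxid, reverse, startSeq, endSeq)
           GROUP BY tid, cnxid, reverse"""
    )
    db.commit()
    return db


if __name__ == "__main__":
    db = process_trace(ROWS)
    for row in db.execute("SELECT * FROM retransmissions ORDER BY cnxid"):
        print(row)
```

Expressing steps 3 to 5 as queries over the packets table mirrors the poster's idea of processing the base data inside the database instead of in external scripts; in PostgreSQL, step 1 would more likely use a bulk copy than row-by-row inserts.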