Supporting real-time & offline network traffic analysis

Chung-Min Chen, Munir Cochinwala, Allen McIntosh, Marc Pucci. Telcordia Technologies Applied Research, Morristown, NJ, USA. Outline: OSS Requirements, Work Proposal, Stream Data Management Issues, Traffic Warehouse.

Presentation Transcript


  1. Supporting real-time & offline network traffic analysis. Chung-Min Chen, Munir Cochinwala, Allen McIntosh, Marc Pucci. Telcordia Technologies Applied Research, Morristown, NJ, USA

  2. Outline • OSS Requirements • Work Proposal • Stream Data Management Issues • Traffic Warehouse • Tribeca: a stream database manager

  3. OSS Requirements
     OSS                             Data time frame / response time
     Traffic control & monitoring    seconds – minutes
     Service level agreement         15 min. – hours
     Capacity planning               weeks – months

  4. Work proposal (system overview) [Diagram; component labels: LAN, R, WAN, SNMP agent, EMS, BPF, tcpdump, adaptor, Stream Engine, DBMS, Live SQL, Warehouse, Live Monitor, Live Monitor client]

  5. Real-time traffic analysis: state of the industry • Ad hoc or canned programs/scripts • Slow deployment • No data sharing • Hard to maintain, with little reuse • Traditional DBMS: • Can it keep up with high line speeds (e.g., OC48)? • Cumbersome to program (write into the DB, then query) • Semantic mismatch between “stream” and “relation”

  6. Stream Data Management • “stream” as a first class object (like “relation”) • Stream: • a continuous, unbounded sequence of records with a total ordering • Issues • Stream algebra • Data types • Query language • Implementation

  7. Stream Algebra • Operators: • Selection: relatively easy • Join: can be defined nicely (assuming unbounded buffer) • Demultiplex/multiplex: the result could be multiple streams • Operands: • Stream + stream • Stream + relation
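
The operators listed on this slide can be illustrated with a minimal Python sketch. It is not the algebra proposed by the authors: the record layout (dicts with `seq`, `src`, `size`), the function names, and the `customers` table are all assumptions made for illustration. `demux` materializes sub-streams in lists, which only works on a finite trace.

```python
import heapq
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]  # assumed record shape, not from the slides


def select(stream: Iterable[Record], pred: Callable[[Record], bool]) -> Iterator[Record]:
    """Selection: pass through only the records that satisfy the predicate."""
    for rec in stream:
        if pred(rec):
            yield rec


def demux(stream: Iterable[Record], key: Callable[[Record], object]) -> Dict[object, List[Record]]:
    """Demultiplex: split one stream into several sub-streams, one per key value."""
    out: Dict[object, List[Record]] = defaultdict(list)
    for rec in stream:
        out[key(rec)].append(rec)
    return dict(out)


def mux(streams: Iterable[Iterable[Record]], order: Callable[[Record], object]) -> Iterator[Record]:
    """Multiplex: merge individually ordered sub-streams into one ordered stream."""
    return heapq.merge(*streams, key=order)


def join_with_relation(stream: Iterable[Record], relation: Dict[object, Record], key: str) -> Iterator[Record]:
    """Stream + relation: look up each stream record in a finite keyed table."""
    for rec in stream:
        row = relation.get(rec[key])
        if row is not None:
            yield {**rec, **row}


packets = [
    {"seq": 1, "src": "10.0.0.1", "size": 40},
    {"seq": 2, "src": "10.0.0.2", "size": 1500},
    {"seq": 3, "src": "10.0.0.1", "size": 576},
]
customers = {"10.0.0.1": {"customer": "A"}, "10.0.0.2": {"customer": "B"}}

large = list(select(packets, lambda r: r["size"] > 250))
by_src = demux(packets, lambda r: r["src"])
merged = list(mux(by_src.values(), lambda r: r["seq"]))
joined = list(join_with_relation(packets, customers, "src"))
```

Stream + relation is the easy operand pair because the relation is finite and can be indexed up front. A stream + stream join needs buffering on both sides, which is why the slide assumes an unbounded buffer.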

  8. Data Types • BLOB: • leaves the burden to the application developers • Conventional relational data types: • need “adaptors” to convert from raw types to relational types • Native support for structured binary objects (SBO): • separate fields at the “bit” level • most flexible & efficient, but requires re-implementing the database type system
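
To make "separate fields at the bit level" concrete, here is a hedged Python sketch of an adaptor that pulls a few fields out of a raw IPv4 header. The field offsets follow the standard IPv4 header layout. The `IPv4Fields` type and `extract_ipv4_fields` function are hypothetical names, not part of any system described in the slides.

```python
import struct
from typing import NamedTuple


class IPv4Fields(NamedTuple):
    version: int      # upper 4 bits of byte 0
    header_len: int   # lower 4 bits of byte 0, converted to bytes
    total_len: int    # bytes 2-3, network byte order
    protocol: int     # byte 9 (6 = TCP, 17 = UDP)


def extract_ipv4_fields(raw: bytes) -> IPv4Fields:
    """Split the first header bytes into typed fields, including sub-byte ones."""
    if len(raw) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    ver_ihl, _tos, total_len = struct.unpack_from("!BBH", raw, 0)
    (protocol,) = struct.unpack_from("!B", raw, 9)
    return IPv4Fields(
        version=ver_ihl >> 4,
        header_len=(ver_ihl & 0x0F) * 4,  # IHL counts 32-bit words
        total_len=total_len,
        protocol=protocol,
    )


# A synthetic 20-byte header: version 4, IHL 5, total length 84, protocol TCP.
sample = bytes([0x45, 0x00, 0x00, 0x54]) + bytes(5) + bytes([6]) + bytes(10)
fields = extract_ipv4_fields(sample)
```

With BLOB storage, every application would repeat this decoding. With an adaptor, it runs once per record before the data reaches relational types. Native SBO support would instead let the query engine address `version` or `protocol` directly.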

  9. Stream Query Language • How to handle multi-stream output, e.g. group-by?
       select avg(ip_stream.packet_size) from ip_stream group by ip_stream.source_ip_addr
     • How to handle indefinite waiting in a join?
       select * from s1, s2 where s1.packet_id = s2.packet_id
     • Time window clause, temporal attributes/operators, …
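
One way a time window clause could answer both questions is sketched below in Python. This illustrates the general idea only, not the query language the authors propose. The tuple layouts, function names, and the choice to skip empty windows are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def windowed_avg_by_key(
    stream: Iterable[Tuple[float, str, int]], window_sec: float
) -> Iterator[Tuple[float, Dict[str, float]]]:
    """Group-by over a stream: emit one {key: avg} result per fixed time window.

    stream yields (timestamp, source_ip_addr, packet_size) in time order.
    Windows that contain no records are skipped.
    """
    current = None
    sums = defaultdict(lambda: [0, 0])  # key -> [total size, record count]
    for ts, src, size in stream:
        start = (ts // window_sec) * window_sec
        if current is not None and start != current:
            yield current, {k: s / n for k, (s, n) in sums.items()}
            sums.clear()
        current = start
        sums[src][0] += size
        sums[src][1] += 1
    if sums:
        yield current, {k: s / n for k, (s, n) in sums.items()}


def window_join(
    events: Iterable[Tuple[float, str, int]], window_sec: float
) -> Iterator[Tuple[int, float, float]]:
    """Join s1 and s2 on packet_id, giving up on records older than window_sec.

    events is a single time-ordered feed of (timestamp, side, packet_id),
    where side is "s1" or "s2". The linear eviction scan is for clarity only.
    """
    pending = {"s1": {}, "s2": {}}  # side -> {packet_id: arrival time}
    for ts, side, pid in events:
        other = "s2" if side == "s1" else "s1"
        for table in pending.values():
            for old in [k for k, t in table.items() if ts - t > window_sec]:
                del table[old]  # no partner arrived in time: stop waiting
        if pid in pending[other]:
            yield pid, pending[other].pop(pid), ts
        else:
            pending[side][pid] = ts


sizes = [(0.5, "a", 100), (1.0, "b", 300), (4.9, "a", 200), (5.1, "a", 50), (7.0, "b", 150)]
averages = list(windowed_avg_by_key(sizes, 5))

feed = [(0.0, "s1", 1), (1.0, "s2", 1), (2.0, "s1", 2), (9.0, "s2", 2)]
matches = list(window_join(feed, 5))
```

The window bounds the state in both cases. The group-by emits a finite result per window instead of one that never completes, and the join discards unmatched records rather than waiting forever (packet 2 in `feed` is never matched because its partner arrives 7 seconds later).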

  10. Implementation Issues • Bounded buffer management • Time-constrained query processing: must beat the buffer refresh rate • Storage & I/O bandwidth requirement (OC48 or higher?) • Migration of data & processing to disk • Data loss & incomplete query
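
The interaction between a bounded buffer, the processing rate, and data loss can be sketched as follows. The drop-oldest policy and the `BoundedBuffer` class are assumptions for illustration; the slides do not specify a policy.

```python
from collections import deque
from typing import Deque, List


class BoundedBuffer:
    """Fixed-capacity record buffer that counts what it had to discard."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.dropped = 0
        self._items: Deque[object] = deque()

    def push(self, rec: object) -> None:
        """Called by the capture side; overwrites the oldest record when full."""
        if len(self._items) >= self.capacity:
            self._items.popleft()
            self.dropped += 1
        self._items.append(rec)

    def drain(self, max_records: int) -> List[object]:
        """Called by the query side; removes up to max_records in arrival order."""
        n = min(max_records, len(self._items))
        return [self._items.popleft() for _ in range(n)]


buf = BoundedBuffer(capacity=3)
for rec in range(5):        # the producer outruns the consumer
    buf.push(rec)
batch = buf.drain(2)
```

If the query processor drains more slowly than the link fills the buffer, `dropped` grows. A query could report that counter alongside its result to flag the answer as incomplete.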

  11. Traffic Warehouse • Repository of traffic data for off-line analysis • Efficient navigation across protocol stack & other business table dimensions • Storage (cluster, parallelism) • Distributed warehouse approach • Chen et al. [SIGMOD2000]: • HTTP, FTP, TCP, IP • tcpdump, HTTP server logs • Caceres et al. [IEEE Comm. 2000]: AT&T WorldNet data warehouse

  12. Tribeca* [VLDB96, USENIX98] • Single stream input (no “join”) • Supported operators: • Selection • Projection • Aggregates • Mux/demux multi-stream output • Time window • User-defined data types and extraction functions (in C) • Tested on ATM cell traces • Achieved a 5-7 MB/s (30-40k rec/s) processing rate on a Sun Sparc10 *former contributors: M. Sullivan, Y. Saraiya, A. Heybey

  13. Tribeca: example query [Diagram labels: ATM, VCs (virtual circuits)] • Q1: Count the accumulated number of large IP packets (> 250 bytes) transmitted over the link. • Q2: Find the number & avg length of TCP/IP packets for every successive 5 second time window. Save to a file.

  14. Tribeca: example query
        source_stream s1 is {live, “atm_link_1476”, AtmCellTrace}
        result_stream r1 is {file “res1”}
        stream_demux {s1.atm.vci} p1
      [Diagram labels: s1, atm cells, demux on VCI, p1]

  15. Tribeca: example query
        source_stream s1 is {live, “atm_link_1476”, AtmCellTrace}
        result_stream r1 is {file “res1”}
        stream_demux {s1.atm.vci} p1
        stream_proj {{p1.assemble_ip}} p2
        stream_mux p2 p3
      assemble_ip is a user-defined function.
      [Diagram labels: s1, atm cells, demux, assemble, extract, P2, mux, p3, IP packets]

  16. Tribeca: example query
        source_stream s1 is {live, “atm_link_1476”, AtmCellTrace}
        result_stream r1 is {file “res1”}
        stream_demux {s1.atm.vci} p1
        stream_proj {{p1.assemble_ip}} p2
        stream_mux p2 p3
        stream_qual {p3.length.geq 250} p4
        stream_agg {p4.count}
      [Diagram labels: s1, atm cells, demux, assemble, extract, mux, IP packets, length > 250, p4, count, display]

  17. Tribeca: example query
        source_stream s1 is {live, “atm_link_1476”, AtmCellTrace}
        result_stream r1 is {file “res1”}
        stream_demux {s1.atm.vci} p1
        stream_proj {{p1.assemble_ip}} p2
        stream_mux p2 p3
        stream_qual {p3.length.geq 250} p4
        stream_agg {p4.count}
        stream_qual {{p3.type.eq TCP}} p5
        stream_agg {p5.count, p5.length.avg} on fixed window {5 sec} r1
      [Diagram labels: s1, atm cells, demux, assemble, extract, mux, IP packets, length > 250, count, display, type = TCP, p5, fixed 5 sec window, count, avg (length), r1 (save to file)]
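
For readers unfamiliar with Tribeca's syntax, the dataflow of the example query can be mimicked in a few lines of Python. This is a loose analogue, not Tribeca itself. The cell layout (`ts`, `vci`, `nbytes`, `proto`, `end`) is invented, real ATM cells carry fixed-size payloads, and this `assemble_ip` is a hypothetical stand-in for the user-defined function named in the query.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Cell = Dict[str, object]
Packet = Dict[str, object]


def assemble_ip(cells: Iterable[Cell]) -> Iterator[Packet]:
    """Demux on VCI, reassemble per circuit, and mux completed packets back out.

    Keeping one partial-length counter per VCI plays the role of the demux;
    yielding into a single output stream plays the role of the mux.
    """
    partial: Dict[object, int] = defaultdict(int)  # vci -> bytes seen so far
    for cell in cells:
        partial[cell["vci"]] += cell["nbytes"]
        if cell["end"]:  # assumed end-of-packet marker
            yield {"ts": cell["ts"], "length": partial.pop(cell["vci"]), "type": cell["proto"]}


def q1_count_large(packets: Iterable[Packet], min_len: int = 250) -> int:
    """Q1: count large packets; >= mirrors length.geq 250 in the query text."""
    return sum(1 for p in packets if p["length"] >= min_len)


def q2_tcp_windows(packets: Iterable[Packet], window_sec: float = 5) -> List[Tuple[float, int, float]]:
    """Q2: (window start, count, average length) of TCP packets per fixed window."""
    out: List[Tuple[float, int, float]] = []
    current, count, total = None, 0, 0
    for p in packets:
        if p["type"] != "TCP":
            continue
        start = (p["ts"] // window_sec) * window_sec
        if current is not None and start != current:
            out.append((current, count, total / count))
            count, total = 0, 0
        current = start
        count += 1
        total += p["length"]
    if count:
        out.append((current, count, total / count))
    return out


cells = [
    {"ts": 0.0, "vci": 1, "nbytes": 200, "proto": "TCP", "end": False},
    {"ts": 1.0, "vci": 2, "nbytes": 100, "proto": "UDP", "end": True},
    {"ts": 2.0, "vci": 1, "nbytes": 200, "proto": "TCP", "end": True},
    {"ts": 3.0, "vci": 2, "nbytes": 300, "proto": "TCP", "end": True},
    {"ts": 6.0, "vci": 1, "nbytes": 100, "proto": "TCP", "end": True},
]
packets = list(assemble_ip(cells))
large_count = q1_count_large(packets)
tcp_windows = q2_tcp_windows(packets)
```

Both queries read the same reassembled packet stream (`p3` in the Tribeca text), which matches the slide: one demux/assemble/mux front end feeds two independent qualify-and-aggregate branches.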

  18. Tribeca • data type inheritance (IP - TCP, UDP) • window: fixed vs. moving; user-defined delimiter • record: fixed length, variable length, framing • implementation optimization • dual buffers • minimize data copying: passing pointers instead
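
The difference between a fixed and a moving window can be shown with a short Python sketch of the moving case. A fixed window emits one result per interval and then resets, whereas a moving window emits a result per record over the most recent interval. The function name and tuple layout are assumptions, and this is not Tribeca's implementation.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def moving_window_avg(
    stream: Iterable[Tuple[float, float]], window_sec: float
) -> Iterator[Tuple[float, float]]:
    """For each (ts, value), yield the average over records in (ts - window_sec, ts]."""
    window: Deque[Tuple[float, float]] = deque()
    total = 0.0
    for ts, value in stream:
        window.append((ts, value))
        total += value
        # Expire records that have slid out of the window.
        while window[0][0] <= ts - window_sec:
            _, old = window.popleft()
            total -= old
        yield ts, total / len(window)


readings = [(0, 10), (2, 20), (4, 30), (6, 40)]
moving = list(moving_window_avg(readings, 5))
```

The running `total` avoids rescanning the window for every record, a small instance of the same concern behind the slide's "minimize data copying" bullet.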

  19. Related Activities • CAIDA • SLAC • NLANR • XIWT • AT&T,HP,Sun,Telcordia, … • passive Internet traffic collection at major Internet backbone routers

  20. Related Work • Tangram [Parker90,92] • a model that captures streams, sets and parallelism • more a state machine than a query language • SEQ [Seshadri95,96] • static sequences • Datacycle [Bowen92] • information filtering on broadcast data
