Supporting real-time & offline network traffic analysis
Chung-Min Chen, Munir Cochinwala, Allen McIntosh, Marc Pucci
Telcordia Technologies Applied Research, Morristown, NJ, USA
Outline
• OSS Requirements
• Work Proposal
• Stream Data Management Issues
• Traffic Warehouse
• Tribeca: a stream database manager
OSS Requirements
OSS function                    | Data time frame / response time
Traffic control & monitoring    | seconds – minutes
Service level agreement         | 15 min. – hours
Capacity planning               | weeks – months
Work proposal (system overview)
[Figure: system overview. LANs and a WAN connected by routers (R); SNMP agents and an EMS; a BPF/tcpdump adaptor feeding the Stream Engine; a DBMS and the Warehouse; Live SQL and Live Monitor clients.]
Real-time traffic analysis: state of the industry
• Ad hoc or canned programs/scripts
  • Slow deployment
  • No data sharing
  • Hard to maintain, little reuse
• Traditional DBMS:
  • Can it keep up with high line speeds (e.g., OC-48)?
  • Cumbersome to program (write into the DB, then query)
  • Semantic mismatch between “stream” and “relation”
Stream Data Management
• “Stream” as a first-class object (like “relation”)
• Stream: a continuous, unbounded sequence of records with a total ordering
• Issues:
  • Stream algebra
  • Data types
  • Query language
  • Implementation
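To make the definition concrete, here is a minimal Python sketch, assuming a hypothetical `Record` schema whose timestamp `ts` is the ordering attribute. The stream is a generator that never ends on its own, and the hypothetical `ordered` wrapper rejects any record that violates the total order; a consumer can only ever materialize a finite prefix.

```python
from dataclasses import dataclass
from itertools import count, islice
from typing import Iterator


@dataclass(frozen=True)
class Record:
    """One stream record; `ts` is the ordering attribute (hypothetical schema)."""
    ts: int       # timestamp, e.g. microseconds since capture start
    src: str      # source IP address as text
    length: int   # packet length in bytes


def packet_stream() -> Iterator[Record]:
    """An unbounded stream: a generator that never terminates by itself."""
    for i in count():
        yield Record(ts=i * 10, src=f"10.0.0.{i % 4}", length=40 + (i * 37) % 1460)


def ordered(stream: Iterator[Record]) -> Iterator[Record]:
    """Pass records through, enforcing the total order on `ts`."""
    last = None
    for rec in stream:
        if last is not None and rec.ts < last:
            raise ValueError(f"out-of-order record: {rec.ts} < {last}")
        last = rec.ts
        yield rec


# A consumer can only look at a finite prefix of an unbounded stream.
first_five = list(islice(ordered(packet_stream()), 5))
```

Treating the stream as an iterator rather than a table is the point of the slide: operators consume records as they arrive instead of querying a stored relation.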
Stream Algebra
• Operators:
  • Selection: relatively easy
  • Join: can be defined cleanly (assuming an unbounded buffer)
  • Demultiplex/multiplex: the result can be multiple streams
• Operands:
  • Stream + stream
  • Stream + relation
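A minimal Python sketch of three of these operators, assuming records are plain dicts (a hypothetical representation, not Tribeca's). Selection is stateless and works one record at a time. Demultiplexing tags each record with the key of the substream it belongs to. The stream + relation join indexes the finite relation once, so stream records never need to be buffered; a stream + stream join has no finite side to index, which is why the slide assumes an unbounded buffer.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Rec = Dict[str, object]  # hypothetical record representation: field name -> value


def select(stream: Iterable[Rec], pred: Callable[[Rec], bool]) -> Iterator[Rec]:
    """Selection: stateless, so it runs record-at-a-time on an unbounded stream."""
    return (r for r in stream if pred(r))


def demux(stream: Iterable[Rec], key: str) -> Iterator[Tuple[object, Rec]]:
    """Demultiplex: tag each record with the substream (key value) it belongs to."""
    for r in stream:
        yield r[key], r


def mux(tagged: Iterable[Tuple[object, Rec]]) -> Iterator[Rec]:
    """Multiplex: merge tagged substreams back into one stream in arrival order."""
    for _, r in tagged:
        yield r


def join_with_relation(stream: Iterable[Rec], relation: List[Rec], key: str) -> Iterator[Rec]:
    """Stream + relation equi-join: index the finite relation once, then probe
    it for each stream record, so no stream records are buffered."""
    index: Dict[object, List[Rec]] = defaultdict(list)
    for row in relation:
        index[row[key]].append(row)
    for r in stream:
        for row in index.get(r[key], []):
            yield {**row, **r}


packets = [{"vci": 5, "length": 300}, {"vci": 7, "length": 60}, {"vci": 5, "length": 512}]
circuits = [{"vci": 5, "customer": "acme"}, {"vci": 7, "customer": "globex"}]  # hypothetical

large = list(select(packets, lambda r: r["length"] >= 250))
per_vc: Dict[object, List[int]] = defaultdict(list)
for vci, r in demux(packets, "vci"):
    per_vc[vci].append(r["length"])  # one logical output stream per VCI
remerged = list(mux(demux(packets, "vci")))
joined = list(join_with_relation(packets, circuits, "vci"))
```

Viewing demux output as one logical stream per key is also one way to read a streaming group-by: each group yields its own stream of results.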
Data Types
• BLOB:
  • leaves the burden to application developers
• Conventional relational data types:
  • need “adaptors” to convert raw types to relational types
• Native support for structured binary objects (SBO):
  • separate fields at the bit level
  • most flexible and efficient, but requires re-implementing the database type system
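As an illustration of SBO-style access, the sketch below pulls fields out of a raw IPv4 header (layout per RFC 791) with Python's `struct` module, using shifts and masks for the fields that are not byte-aligned. The function name and the returned field names are hypothetical; a native SBO type system would perform this extraction inside the DBMS rather than in application code.

```python
import struct
from typing import Dict


def ipv4_fields(header: bytes) -> Dict[str, int]:
    """Extract selected IPv4 header fields directly from raw bytes (RFC 791 layout).

    Byte-aligned fields come from struct.unpack; sub-byte fields (version,
    header length, DF flag, fragment offset) are split out with shifts and masks.
    """
    if len(header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    # "!" = network byte order; B = 1 byte, H = 2 bytes, 4s = 4 raw bytes
    (ver_ihl, _tos, total_len, _ident, flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", header[:20])
    return {
        "version": ver_ihl >> 4,                 # high nibble
        "header_bytes": (ver_ihl & 0x0F) * 4,    # low nibble counts 32-bit words
        "total_length": total_len,
        "dont_fragment": (flags_frag >> 14) & 1,
        "fragment_offset": flags_frag & 0x1FFF,  # low 13 bits
        "ttl": ttl,
        "protocol": proto,                       # 6 = TCP, 17 = UDP
        "src": int.from_bytes(src, "big"),
        "dst": int.from_bytes(dst, "big"),
    }


# A hand-built 20-byte header: version 4, IHL 5, 300 bytes total, DF set, TCP.
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 300, 1, 0x4000, 64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
fields = ipv4_fields(sample)
```

No adaptor copies the packet into relational columns here; fields are read in place, which is where the efficiency claim for SBO comes from.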
Stream Query Language
• How to handle multi-stream output, e.g., group-by (one output stream of averages per source address)?
    select avg(ip_stream.packet_size)
    from ip_stream
    group by ip_stream.source_ip_addr
• How to handle indefinite waiting in a join (a record in s1 may wait forever for a matching s2 record)?
    select *
    from s1, s2
    where s1.packet_id = s2.packet_id
• Time window clause, temporal attributes/operators, …
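One common answer to the indefinite-waiting problem is to bound the join with a time window, as the last bullet suggests. A minimal sketch, assuming both inputs arrive as one time-ordered sequence of hypothetical `(timestamp, side, packet_id)` tuples: each side buffers only records younger than `window`, so an unmatched record is eventually dropped instead of waited on forever. This is a symmetric windowed join chosen for illustration, not necessarily the semantics the authors intended.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

Event = Tuple[int, str, int]  # hypothetical layout: (timestamp, "s1" or "s2", packet_id)


def windowed_join(events: Iterable[Event], window: int) -> Iterator[Tuple[int, int, int]]:
    """Join s1 and s2 on packet_id, matching only records at most `window` apart.

    Yields (packet_id, s1_timestamp, s2_timestamp).
    """
    buffers: Dict[str, Deque[Tuple[int, int]]] = {"s1": deque(), "s2": deque()}
    for ts, side, pid in events:
        other = "s2" if side == "s1" else "s1"
        # Evict expired records from both buffers, bounding memory and waiting.
        for buf in buffers.values():
            while buf and buf[0][0] < ts - window:
                buf.popleft()
        # Probe the other side's buffer for matching packet ids.
        for ots, opid in buffers[other]:
            if opid == pid:
                yield (pid, ts, ots) if side == "s1" else (pid, ots, ts)
        buffers[side].append((ts, pid))


events = [(0, "s1", 1), (2, "s2", 1), (3, "s1", 2), (20, "s2", 2)]
# Packet 2's s2 record arrives 17 time units after its s1 record, outside the window.
matches = list(windowed_join(events, window=5))
```

The trade-off is explicit: a late partner is lost (incomplete results) in exchange for bounded buffers.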
Implementation Issues
• Bounded buffer management
• Time-constrained query processing: each buffer must be processed before it is refilled
• Storage and I/O bandwidth requirements (OC-48 or higher?)
• Migration of data and processing to disk
• Data loss and incomplete query results
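The bandwidth question can be quantified with back-of-the-envelope arithmetic. An OC-n SONET link runs at n × 51.84 Mbit/s, so OC-48 carries about 2.49 Gbit/s. The sketch below assumes every record must be handled at full line rate and does not subtract SONET framing overhead, so its figures are upper bounds; the function names are hypothetical.

```python
OC1_MBIT_PER_S = 51.84  # SONET OC-1 line rate; OC-n runs at n times this


def oc_line_rate_bytes_per_s(n: int) -> float:
    """Raw OC-n line rate in bytes/s (SONET framing overhead not subtracted)."""
    return n * OC1_MBIT_PER_S * 1e6 / 8


def per_record_budget_ns(rate_bytes_per_s: float, record_bytes: int) -> float:
    """Nanoseconds available per record if every record is processed at line rate."""
    records_per_s = rate_bytes_per_s / record_bytes
    return 1e9 / records_per_s


oc48 = oc_line_rate_bytes_per_s(48)          # ~311.04e6 bytes/s
ip_budget = per_record_budget_ns(oc48, 40)   # 40-byte packets (TCP + IP headers only): ~128.6 ns
atm_budget = per_record_budget_ns(oc48, 53)  # 53-byte ATM cells: ~170.4 ns
tb_per_hour = oc48 * 3600 / 1e12             # storage for one hour of capture: ~1.12 TB

for label, value in [("OC-48 bytes/s", oc48), ("ns per 40-byte packet", ip_budget),
                     ("ns per ATM cell", atm_budget), ("TB per hour", tb_per_hour)]:
    print(f"{label:>22}: {value:,.2f}")
```

A budget of roughly 130 ns per record leaves no room for a per-record database insert, which is the practical content of the "time-constrained query processing" and "migration to disk" bullets.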
Traffic Warehouse
• Repository of traffic data for off-line analysis
• Efficient navigation across the protocol stack and other business-table dimensions
• Storage (clusters, parallelism)
• Distributed warehouse approach
• Chen et al. [SIGMOD2000]:
  • HTTP, FTP, TCP, IP
  • tcpdump, HTTP server logs
• Caceres et al. [IEEE Comm. 2000]: AT&T WorldNet data warehouse
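Navigation across protocol layers and business dimensions maps naturally onto a star schema. The sketch below uses Python's built-in `sqlite3` with a hypothetical schema (one packet fact table, a protocol dimension and a customer dimension) and rolls traffic up by SLA tier, transport and application. It is an illustration only, not the schema of either cited system.

```python
import sqlite3

# Hypothetical star schema: one fact table of packets plus two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE protocol_dim (proto_id INTEGER PRIMARY KEY, app TEXT, transport TEXT);
CREATE TABLE customer_dim (cust_id INTEGER PRIMARY KEY, name TEXT, sla_tier TEXT);
CREATE TABLE packet_fact (
    ts INTEGER,
    proto_id INTEGER REFERENCES protocol_dim(proto_id),
    cust_id INTEGER REFERENCES customer_dim(cust_id),
    bytes INTEGER
);
""")
conn.executemany("INSERT INTO protocol_dim VALUES (?, ?, ?)",
                 [(1, "HTTP", "TCP"), (2, "FTP", "TCP"), (3, "DNS", "UDP")])
conn.executemany("INSERT INTO customer_dim VALUES (?, ?, ?)",
                 [(10, "acme", "gold"), (20, "globex", "silver")])
conn.executemany("INSERT INTO packet_fact VALUES (?, ?, ?, ?)",
                 [(0, 1, 10, 1500), (1, 1, 20, 400), (2, 2, 10, 1500),
                  (3, 3, 20, 80), (4, 1, 10, 600)])

# Roll up bytes by business dimension (SLA tier) and down the protocol stack.
rows = conn.execute("""
SELECT c.sla_tier, p.transport, p.app, SUM(f.bytes) AS total_bytes
FROM packet_fact f
JOIN protocol_dim p ON f.proto_id = p.proto_id
JOIN customer_dim c ON f.cust_id = c.cust_id
GROUP BY c.sla_tier, p.transport, p.app
ORDER BY c.sla_tier, p.transport, p.app
""").fetchall()
```

Drilling from transport to application, or slicing by customer, is then just a change to the GROUP BY list rather than a new ad hoc script.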
Tribeca* [VLDB96, USENIX98]
• Single stream input (no “join”)
• Supported operators:
  • Selection
  • Projection
  • Aggregates
  • Mux/demux (multi-stream output)
  • Time window
• User-defined data types and extraction functions (in C)
• Tested on ATM cell traces
• Achieved a 5-7 MB/s (30-40k records/s) processing rate on a Sun SPARCstation 10
*Former contributors: M. Sullivan, Y. Saraiya, A. Heybey
Tribeca: example query
[Figure: an ATM link carrying multiple VCs (virtual circuits)]
• Q1: Count the accumulated number of large IP packets (> 250 bytes) transmitted over the link.
• Q2: Find the number and average length of TCP/IP packets for every successive 5-second time window. Save to a file.
Tribeca: example query
    source_stream s1 is {live, “atm_link_1476”, AtmCellTrace}
    result_stream r1 is {file “res1”}
    stream_demux {s1.atm.vci} p1
    stream_proj {{p1.assemble_ip}} p2
    stream_mux p2 p3
    stream_qual {p3.length.geq 250} p4
    stream_agg {p4.count}
    stream_qual {{p3.type.eq TCP}} p5
    stream_agg {p5.count, p5.length.avg} on fixed window {5 sec} r1
[Figure: dataflow. ATM cells (s1) → demux on VCI (p1) → assemble/extract IP packets (p2) → mux (p3); p3 → length > 250 (p4) → count → display; p3 → type = TCP (p5) → fixed 5 sec window → count, avg(length) → r1 (save to file).]
assemble_ip is a user-defined function.
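For readers unfamiliar with Tribeca's syntax, the dataflow above can be emulated in Python. This is a sketch, not Tribeca: the `Cell` framing (an explicit `last` flag per cell, no AAL5 trailer or padding), the zero-filled fake IP packets, and the single-pass structure are all assumptions. The `assemble_ip` function here stands in for the user-defined function of the same name by combining demux, reassembly and mux; Q1 uses `>= 250` to match the query's `geq 250` qualifier.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Dict, Iterable, Iterator, List, NamedTuple, Optional, Tuple

TCP, UDP = 6, 17  # IP protocol numbers


class Cell(NamedTuple):
    ts: float     # arrival time in seconds
    vci: int      # virtual circuit identifier
    data: bytes   # up to 48 payload bytes
    last: bool    # True on the final cell of an IP packet (assumed framing)


class Packet(NamedTuple):
    ts: float     # arrival time of the packet's last cell
    length: int
    proto: int


def assemble_ip(cells: Iterable[Cell]) -> Iterator[Packet]:
    """Demux on VCI, reassemble each VC's cells into IP packets, mux the output."""
    partial: Dict[int, List[bytes]] = defaultdict(list)  # one buffer per VC
    for c in cells:
        partial[c.vci].append(c.data)
        if c.last:
            pkt = b"".join(partial.pop(c.vci))
            yield Packet(c.ts, len(pkt), pkt[9])  # IPv4 protocol field is byte 9


def run_queries(cells: Iterable[Cell], window: float = 5.0):
    """One pass over the muxed packet stream (p3), feeding both branches.

    Q1: count of packets with length >= 250.
    Q2: (window_start, count, avg_length) of TCP packets per fixed window.
    """
    big = 0
    q2: List[Tuple[float, int, float]] = []
    cur: Optional[float] = None
    n, total = 0, 0
    for p in assemble_ip(cells):
        if p.length >= 250:                       # stream_qual {p3.length.geq 250}
            big += 1
        if p.proto == TCP:                        # stream_qual {{p3.type.eq TCP}}
            start = (p.ts // window) * window
            if cur is not None and start != cur:  # window closed: emit its aggregate
                q2.append((cur, n, total / n))
                n, total = 0, 0
            cur = start
            n += 1
            total += p.length
    if cur is not None:
        q2.append((cur, n, total / n))
    return big, q2


def cells_for(ts: float, vci: int, proto: int, length: int) -> List[Cell]:
    """Hypothetical test helper: a zero-filled fake IP packet split into 48-byte cells."""
    pkt = bytes(9) + bytes([proto]) + bytes(length - 10)
    chunks = [pkt[i:i + 48] for i in range(0, length, 48)]
    return [Cell(ts, vci, ch, i == len(chunks) - 1) for i, ch in enumerate(chunks)]


# Interleave cells from two VCs so correct reassembly depends on the demux step.
a = cells_for(1.0, 5, TCP, 300)
b = cells_for(1.0, 7, UDP, 100)
trace = [c for pair in zip_longest(a, b) for c in pair if c is not None]
trace += cells_for(6.0, 5, TCP, 60)
big_count, tcp_windows = run_queries(trace)
```

Without the per-VC buffers in `assemble_ip`, interleaved cells from different circuits would be glued into the wrong packets, which is why the query demultiplexes on VCI before reassembly.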
Tribeca
• Data type inheritance (TCP and UDP derived from IP)
• Windows: fixed vs. moving; user-defined delimiters
• Records: fixed length, variable length, framing
• Implementation optimizations:
  • dual buffers
  • minimal data copying: pass pointers instead of copying records
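The difference between fixed and moving windows can be seen in a small Python sketch, assuming windows are defined on a numeric timestamp and the aggregate is a simple count. Fixed windows partition time into non-overlapping buckets and emit one result per bucket; a moving window emits a result after every record, covering the last `size` time units. The half-open boundaries and function names are assumptions, and user-defined delimiters are not modeled.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional, Tuple


def fixed_window_counts(timestamps: Iterable[float], size: float) -> List[Tuple[float, int]]:
    """Fixed (tumbling) windows: non-overlapping [k*size, (k+1)*size) buckets.

    Emits (window_start, count) each time a window closes; empty windows are skipped.
    """
    out: List[Tuple[float, int]] = []
    cur: Optional[float] = None
    n = 0
    for ts in timestamps:
        start = (ts // size) * size
        if cur is not None and start != cur:
            out.append((cur, n))
            n = 0
        cur = start
        n += 1
    if cur is not None:
        out.append((cur, n))
    return out


def moving_window_counts(timestamps: Iterable[float], size: float) -> List[Tuple[float, int]]:
    """Moving (sliding) window: after each record, count records in (ts - size, ts]."""
    buf: Deque[float] = deque()
    out: List[Tuple[float, int]] = []
    for ts in timestamps:
        buf.append(ts)
        while buf[0] <= ts - size:  # drop records that slid out of the window
            buf.popleft()
        out.append((ts, len(buf)))
    return out


ts = [0.5, 1.0, 4.9, 5.1, 9.0, 12.0]
fixed = fixed_window_counts(ts, 5.0)
moving = moving_window_counts(ts, 5.0)
```

A fixed window needs only a running aggregate, while a moving window must keep every record still inside the window, so its buffer size depends on the arrival rate.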
Related Activities
• CAIDA
• SLAC
• NLANR
• XIWT
• AT&T, HP, Sun, Telcordia, …
• Passive Internet traffic collection at major Internet backbone routers
Related Work
• Tangram [Parker90, 92]
  • a model that captures streams, sets and parallelism
  • more a state machine than a query language
• SEQ [Seshadri95, 96]
  • static sequences
• Datacycle [Bowen92]
  • information filtering on broadcast data