Data Streaming in Computer Networking
Cristian Estan, George Varghese
University of California, San Diego
Talk structure
• Traditional streaming in networking
• Rules of the game
• Iteration paradigm: packet scheduling example
• New streaming problems
• Detecting malicious traffic
• Understanding network workloads
Data streaming in computer networking - MPDS 2003
Internet service model
[Figure: a packet crossing the Internet, made of a header (source IP address, destination IP address, source port, destination port) and data; the header fields identify the flow.]
Traditional router functions
[Figures: a router with three incoming links (Incoming 1-3) and three outgoing links (Outgoing 1-3), shown performing three functions.]
• IP lookup: determines the outgoing link for each arriving packet (Out2 in the example)
• Switching: moves packets tagged Out1, Out2, or Out3 from the incoming links to those outgoing links
• Scheduling: orders the packets of Flow 1, Flow 2, and Flow 3 waiting for an outgoing link
Rules of the game
• Wire speed processing
  • At 40 gigabits/s there are only 8 nanoseconds per minimum-size (40-byte) packet, so state must live in fast SRAM
  • SRAM is limited (say 32 megabits), but there are millions of flows
• What does this mean for algorithms?
  • Low worst case complexity bounds
  • Low bounds on the amount of memory used
• Differences from databases
  • One pass vs. multiple passes
  • Worst case vs. average case
  • Small constants vs. asymptotic complexity
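
The two numbers on this slide can be checked with a few lines of arithmetic. The sketch below assumes a minimum-size packet of 40 bytes and a megabit of 10^6 bits. The 64-bit per-flow entry is a hypothetical size chosen only for illustration; it is not a figure from the talk.

```python
def per_packet_budget_ns(link_gbps: float, packet_bytes: int) -> float:
    """Time between back-to-back packets of the given size, in nanoseconds."""
    bits = packet_bytes * 8
    # 1 gigabit/s moves one bit per nanosecond, so bits / Gbps gives nanoseconds.
    return bits / link_gbps


def max_flow_entries(sram_megabits: int, bits_per_entry: int) -> int:
    """How many fixed-size per-flow entries fit in the given amount of SRAM."""
    return (sram_megabits * 1_000_000) // bits_per_entry


if __name__ == "__main__":
    # Assumed 40-byte packets on a 40 Gbps link.
    print(per_packet_budget_ns(40, 40))
    # Hypothetical 64-bit entry (for example a small flow ID hash plus a counter).
    print(max_flow_entries(32, 64))
```

Under these assumptions the budget is 8 ns per packet, and 32 megabits hold 500,000 entries, well short of the millions of flows mentioned above. That gap is why per-flow state is not an option.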
Iteration paradigm
• Many networking algorithms use iteration in time
  • Allows multi-pass algorithms without storing the input, by assuming the input does not change quickly
• Many examples (MULTOPS for DoS detection [Gil01], CSFQ for scheduling [Stoica98])
• It would be nice to formalize the tradeoff between quality of results and drift rate of the input
Example: Core Stateless FQ
• Each packet is marked with its flow's rate R
• The router iteratively computes the fair share F
• If R > F, the packet is dropped with probability 1 - F/R
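
A minimal sketch of the drop rule and of one way to iterate toward the fair share. It is not the CSFQ implementation from [Stoica98]: the update `F <- F * C / accepted` is a simplified stand-in, and the names `update_fair_share`, `converge_fair_share`, and `forward` are hypothetical. Each round stands for one measurement interval, and the labeled rates are assumed to stay constant between rounds, which is the "input does not change quickly" assumption of the iteration paradigm.

```python
import random


def drop_probability(rate: float, fair_share: float) -> float:
    """Drop probability for a packet whose label says its flow sends at `rate`."""
    if rate <= fair_share:
        return 0.0
    return 1.0 - fair_share / rate


def update_fair_share(fair_share: float, labeled_rates, capacity: float) -> float:
    """One iteration: rescale F so the accepted traffic moves toward the capacity."""
    if sum(labeled_rates) <= capacity:
        # No congestion: every flow fits, so nothing needs to be dropped.
        return max(labeled_rates)
    # A flow labeled R gets min(R, F) through on average under the drop rule.
    accepted = sum(min(r, fair_share) for r in labeled_rates)
    return fair_share * capacity / accepted


def converge_fair_share(labeled_rates, capacity: float, rounds: int = 50) -> float:
    """Repeat the update, one round per (assumed stable) measurement interval."""
    fair_share = max(labeled_rates)
    for _ in range(rounds):
        fair_share = update_fair_share(fair_share, labeled_rates, capacity)
    return fair_share


def forward(rate: float, fair_share: float, rng: random.Random) -> bool:
    """Return True if the packet is forwarded, False if it is dropped."""
    return rng.random() >= drop_probability(rate, fair_share)


if __name__ == "__main__":
    # Three flows with labeled rates 2, 4, and 10 share a link of capacity 9.
    f = converge_fair_share([2.0, 4.0, 10.0], 9.0)
    print(round(f, 3), round(drop_probability(10.0, f), 3))
```

For these example rates the fair share settles near 3.5: the flow at rate 2 is untouched, and the other two are each cut back to about 3.5.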
New streaming problems
• Detecting malicious activity
  • Flooding (denial of service attacks)
  • Worms
  • Scans looking for vulnerable servers
• Understanding workloads
  • Billing
  • Planning network growth
  • Application mix
Detecting malicious traffic
• Well defined building blocks
  • Detecting large aggregates
    • Similar to iceberg queries
  • Counting active flows in an aggregate
    • Similar to counting distinct values
• Many open problems, e.g. detecting worms and DoS attacks (the right formal problem statement is not clear)
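
To make the "counting active flows" building block concrete, here is a sketch of one well-known small-memory approach to counting distinct values: a direct bitmap with the linear-counting estimate. The slide does not say which algorithm the authors have in mind, so this is an illustrative choice. The class name `FlowBitmap`, the bitmap size, and the use of SHA-256 as the hash are all assumptions.

```python
import hashlib
import math


class FlowBitmap:
    """Estimate the number of distinct flow IDs seen, using a fixed-size bitmap."""

    def __init__(self, num_bits: int = 8192):
        self.num_bits = num_bits
        self.bits = bytearray(num_bits)  # one byte per bit, kept simple for clarity

    def add(self, flow_id: str) -> None:
        """Hash the flow ID to one position and set it; repeats hit the same bit."""
        digest = hashlib.sha256(flow_id.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % self.num_bits
        self.bits[index] = 1

    def estimate(self) -> float:
        """Linear-counting estimate: n is roughly b * ln(b / zero_bits)."""
        zeros = self.num_bits - sum(self.bits)
        if zeros == 0:
            return float("inf")  # bitmap saturated; a larger bitmap is needed
        return self.num_bits * math.log(self.num_bits / zeros)


if __name__ == "__main__":
    bitmap = FlowBitmap()
    for packet in range(5000):
        flow = packet % 1000  # 1000 distinct flows, 5 packets each
        bitmap.add(f"10.0.{flow // 256}.{flow % 256}:80")
    print(round(bitmap.estimate()))
```

Each packet costs one hash and one memory write, and memory is fixed in advance, which matches the constraints from the rules of the game. The price is an approximate answer that degrades as the bitmap fills up.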
Informal problem definition
• Terabytes of measurement data -> Analysis -> Traffic reports
• Reports on one field at a time:
  • Applications: 50% of traffic is Kazaa
  • Sources: 20% of traffic comes from Steve’s PC
• Reports combining fields:
  • 20% is Kazaa from Steve’s PC
  • 50% is Kazaa from the dorms
Formal problem definition
• Define clusters:
  • Atoms: fields 1 to n, with a hierarchy in each field that includes *
  • Cluster: intersection of one set from each field hierarchy
  • Example: Source=*, Destination=CS Net, App=Email
• Threshold clusters:
  • Report traffic clusters above threshold T (e.g. 1% of traffic)
• Omit redundant clusters:
  • Compression rule: remove a general cluster from the report when its traffic can be inferred (up to error T) from non-overlapping, more specific clusters
Solution status
• The good:
  • Offline tool AutoFocus; SIGCOMM 2003 paper
  • Detected worm, busy servers, squid cache, etc.
  • Network managers like it
• The bad:
  • Slow: 3 hours at T=0.5% for a one-day trace
  • Memory-hungry: 300 Mbytes
• The wanted:
  • Streaming algorithm - we invite improvements
Conclusions
• New rules: strict constraints on algorithms running in routers
• Iteration in time: can give simple algorithms, but the quality of results needs more formalization
• General open problems: many challenges in detecting malicious traffic such as worms and DoS attacks
• Specific open problem: computing traffic cluster reports in a streaming fashion
Thank you!
[Figure labels: Algorithms, Databases, Networking, ?]
Unidimensional clusters
[Figure: traffic of eight hosts aggregated up the source-prefix hierarchy; each line is a cluster and its traffic.]
10.8.0.0/28: 500
  10.8.0.0/29: 120
    10.8.0.0/30: 50
      10.8.0.2/31: 50 (hosts 10.8.0.2 and 10.8.0.3, carrying 15 and 35)
    10.8.0.4/30: 70
      10.8.0.4/31: 70 (hosts 10.8.0.4 and 10.8.0.5, carrying 30 and 40)
  10.8.0.8/29: 380
    10.8.0.8/30: 305
      10.8.0.8/31: 270
        10.8.0.8: 160
        10.8.0.9: 110
      10.8.0.10/31: 35
        10.8.0.10: 35
    10.8.0.12/30: 75
      10.8.0.14/31: 75
        10.8.0.14: 75
Unidimensional clusters
[Figure: the same hierarchy after dropping the low-traffic clusters.]
10.8.0.0/28: 500
  10.8.0.0/29: 120
  10.8.0.8/29: 380
    10.8.0.8/30: 305
      10.8.0.8/31: 270
        10.8.0.8: 160
        10.8.0.9: 110
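
The thresholding and compression rules from the formal problem definition can be sketched for this one-dimensional case. This is an offline illustration only, neither AutoFocus nor a streaming algorithm. The threshold of 100 is an assumption, since the slides do not state it. Which host of each low-traffic pair (10.8.0.2/10.8.0.3 and 10.8.0.4/10.8.0.5) carries which count is also assumed; only the pair totals of 50 and 70 are fixed by the aggregates. The function names are hypothetical.

```python
import ipaddress


def prefix_totals(host_traffic, root):
    """Sum traffic for every prefix from each host (/32) up to the root network."""
    root_net = ipaddress.ip_network(root)
    totals = {}
    for host, amount in host_traffic.items():
        if ipaddress.ip_address(host) not in root_net:
            raise ValueError(f"{host} is outside {root}")
        for plen in range(32, root_net.prefixlen - 1, -1):
            # strict=False masks the host bits, giving the enclosing prefix.
            net = ipaddress.ip_network(f"{host}/{plen}", strict=False)
            totals[net] = totals.get(net, 0) + amount
    return totals


def explained(net, reported, totals):
    """Traffic of `net` covered by non-overlapping, more specific reported clusters."""
    if net.prefixlen == net.max_prefixlen:
        return 0
    covered = 0
    for child in net.subnets(prefixlen_diff=1):
        if child in reported:
            covered += reported[child]
        elif child in totals:  # only descend where there is traffic
            covered += explained(child, reported, totals)
    return covered


def compressed_report(totals, threshold):
    """Keep large clusters whose traffic is not inferable (within threshold)."""
    large = {net: t for net, t in totals.items() if t >= threshold}
    reported = {}
    # Most specific first, so descendants are decided before their ancestors.
    for net in sorted(large, key=lambda n: n.prefixlen, reverse=True):
        if large[net] - explained(net, reported, totals) >= threshold:
            reported[net] = large[net]
    return reported


if __name__ == "__main__":
    hosts = {  # within-pair split for .2/.3 and .4/.5 is assumed
        "10.8.0.2": 15, "10.8.0.3": 35, "10.8.0.4": 30, "10.8.0.5": 40,
        "10.8.0.8": 160, "10.8.0.9": 110, "10.8.0.10": 35, "10.8.0.14": 75,
    }
    totals = prefix_totals(hosts, "10.8.0.0/28")
    for net, traffic in compressed_report(totals, 100).items():
        print(net, traffic)
```

With these inputs, seven clusters pass the threshold, but the compressed report keeps only four: 10.8.0.8 (160), 10.8.0.9 (110), 10.8.0.0/29 (120), and 10.8.0.8/29 (380). For example, 10.8.0.8/30 (305) is dropped because the two reported hosts already account for 270 of it, and the /28 is dropped because the two reported /29s sum to exactly 500.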
Multidimensional clusters
• Two dimensions
  • Source network
  • Protocol (traffic type)
• Trees turn into lattice
  • Multiple parents
  • Nodes overlap
Offline solution
Sample report