Oral Qualifying Examination • David V. Schuehler • Papers reviewed: • Packet Classification on Multiple Fields (Gupta and McKeown) • Scalable Packet Classification (Baboescu and Varghese) • What Packets May Come: Automata for Network Monitoring (Bhargavan, Chandra, McCann and Gunter) • Protocol Boosters (Feldmeier, McAuley, Smith, Bakin, Marcus and Raleigh)
Services Provided by Packet Classifiers • Packet Filtering • Policy Routing • Accounting & Billing • Traffic Rate Limiting • Traffic Shaping
Network Monitoring • Troubleshoot problems • Analyze performance • Validate correctness of operations • Data gathering • Network tuning
Heterogeneous Internet • Fiber Optic • Copper • Wireless • Satellite
First Paper • Packet Classification on Multiple Fields • Pankaj Gupta and Nick McKeown • Computer Systems Laboratory • Stanford University • Published in SIGCOMM 1999 • August, 1999 • Cambridge, MA
Challenge • Develop a high performance packet classifier • Exploit structure and redundancy found in existing classifier rule sets
Analysis of 793 Classifiers from 101 ISPs • 41,505 total rules • Small rule sets • 99% contained < 1000 rules, mean of 50 rules • Filter on maximum of 8 fields • src/dst addr, src/dst port, TOS, protocol, flags • Small number of protocols filtered • 10% contain ranges • 14% contain a non-contiguous mask • Ex. 137.98.217.0/8.22.160.80 • Duplication found in rule field specifications • 8% of rules were redundant
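The non-contiguous mask example above can be read as an ordinary value/mask match in which the mask bits are simply not a prefix. A minimal sketch, assuming the usual rule that an address matches when it agrees with the value on every bit set in the mask (the function name `matches` is hypothetical):

```python
import ipaddress

def matches(addr: str, value: str, mask: str) -> bool:
    """True when addr agrees with value on every bit that is set in mask."""
    a = int(ipaddress.IPv4Address(addr))
    v = int(ipaddress.IPv4Address(value))
    m = int(ipaddress.IPv4Address(mask))
    return (a & m) == (v & m)

# Value and mask taken from the slide; the mask bits are scattered, not a prefix.
VALUE, MASK = "137.98.217.0", "8.22.160.80"
print(matches("137.98.217.0", VALUE, MASK))    # the value itself
print(matches("255.98.217.0", VALUE, MASK))    # differs only in unmasked bits
print(matches("137.98.217.255", VALUE, MASK))  # differs in masked bits
```

Because the cared-about bits are not contiguous, such rules cannot be handled by a plain longest-prefix-match structure.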
Structure of Classifiers • Small amount of rule intersection in existing classifiers • For 1734 rules in 4 dimensions, found 4316 overlapping regions – worst case is 10^13
Recursive Flow Classification (RFC) • Perform mapping from packet header fields to classification ID in multiple phases • Each phase consists of multiple parallel lookups • Each lookup is a reduction in bit length
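The phased reduction can be illustrated with a toy two-phase classifier. This is a sketch under stated assumptions, not the paper's implementation: the rule set, the 4-bit field width, and all function names are hypothetical. Phase 0 maps each field value to an equivalence-class ID (values that match the same set of rules share an ID); phase 1 maps the pair of IDs to the best matching rule.

```python
from itertools import product

# Hypothetical toy rules: (src_range, dst_range) over 4-bit fields, in priority order.
RULES = [((0, 3), (8, 15)), ((0, 7), (0, 15)), ((4, 15), (4, 7))]
FIELD_BITS = 4

def build_phase0(dim):
    """Map every value of one field to an equivalence-class ID (eqID)."""
    table, class_of, bitmaps = [], {}, []
    for value in range(1 << FIELD_BITS):
        bitmap = frozenset(i for i, r in enumerate(RULES)
                           if r[dim][0] <= value <= r[dim][1])
        if bitmap not in class_of:          # first value with this rule set
            class_of[bitmap] = len(bitmaps)
            bitmaps.append(bitmap)
        table.append(class_of[bitmap])
    return table, bitmaps

def build_phase1(bitmaps_a, bitmaps_b):
    """Map each pair of eqIDs to the highest-priority common rule (or None)."""
    table = {}
    for (a, ba), (b, bb) in product(enumerate(bitmaps_a), enumerate(bitmaps_b)):
        common = ba & bb
        table[(a, b)] = min(common) if common else None
    return table

SRC_TABLE, SRC_MAPS = build_phase0(0)
DST_TABLE, DST_MAPS = build_phase0(1)
FINAL = build_phase1(SRC_MAPS, DST_MAPS)

def classify(src, dst):
    """Two parallel phase-0 lookups, then one phase-1 lookup."""
    return FINAL[(SRC_TABLE[src], DST_TABLE[dst])]

print(classify(2, 9), classify(5, 5), classify(12, 0))
```

Each phase-0 lookup shrinks a 4-bit field to a small eqID, which is the "reduction in bit length" the slide refers to; the memory cost lies in the precomputed tables.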
RFC Performance Tuning • Number of phases • Time (# of lookups) • Reduction tree selected • Space (memory utilization) • Tuning operation • Select number of phases • Combine chunks with most correlation • Combine as many chunks as possible • Tree A is optimal
Memory – Time Tradeoff • 2 Phases: < 10 GBytes • 3 Phases: < 2.5 MBytes • 4 Phases: < 1.1 MBytes
Software Performance • 333 MHz Pentium-II (Windows NT) • Worst case time double that of average • Average time for 100,000 classifications
Adjacency Groups • Combine rules which contain differences in one dimension, but are otherwise identical • Lose knowledge of which rule the packet matched • Additional preprocessing work required • Reduces the total number of rules • Handles 15,000 rules in 3.85 MB
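The merging step can be sketched as follows. This is a simplified illustration, assuming each field is a set of allowed values and that rules with the same action may be merged regardless of their order; the rule set and the names `merge_dimension` and `adjacency_groups` are hypothetical.

```python
def merge_dimension(rules, dim):
    """Merge rules that are identical except in field `dim` (same action too)."""
    groups = {}
    for fields, action in rules:
        key = (fields[:dim], fields[dim + 1:], action)
        groups.setdefault(key, set()).update(fields[dim])
    return [(before + (frozenset(values),) + after, action)
            for (before, after, action), values in groups.items()]

def adjacency_groups(rules):
    """Repeat the merge over every dimension until the rule count stops shrinking."""
    n_dims = len(rules[0][0])
    while True:
        merged = rules
        for dim in range(n_dims):
            merged = merge_dimension(merged, dim)
        if len(merged) == len(rules):
            return merged
        rules = merged

# Hypothetical rules: ((src_values, dst_port_values), action)
RULES = [
    ((frozenset({1}), frozenset({80})), "permit"),
    ((frozenset({2}), frozenset({80})), "permit"),
    ((frozenset({1}), frozenset({443})), "permit"),
    ((frozenset({2}), frozenset({443})), "permit"),
    ((frozenset({3}), frozenset({80})), "deny"),
]
GROUPS = adjacency_groups(RULES)
print(len(RULES), "rules ->", len(GROUPS), "groups")
```

After merging, a match identifies only the group and its action, which is why the slide notes that knowledge of the individual matching rule is lost.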
Summary • Exploit structure & redundancy in rules • Recursive Flow Classification (RFC) • 1 million packets/sec in S/W • 30 million packets/sec in H/W • Supports < 6000 rules, < 15,000 with Adj Grp • Utilizing knowledge of rule set to reduce complexity • Combine rules (adjacency groups) to reduce the number of chunk equivalence classes • Hardware performance optimistic • Problems with small number of phases and large rule sets
Second Paper • Scalable Packet Classification • Florin Baboescu & George Varghese • Dept. of Computer Science & Engineering • University of California, San Diego • Published in SIGCOMM 2001 • August, 2001 • San Diego, CA
Challenge • Develop a high performance packet classifier that supports large rule sets (100,000 rules) • Exploit structure and redundancy found in existing classifier rule sets • Extend Bell Labs/Lucent Bit Vector search algorithm
Lucent Bit Vector • Point location in multi-dimensional space • Parallel lookups for each dimension • Bit vector generated for each field (dimension) • Take intersection of result vectors • Search is linear with respect to number of rules • Scales to 10,000 rules
Lucent Bit Vector (continued) • Max 2n+1 intervals for n rules
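A minimal sketch of the bit vector scheme, assuming range rules over small integer fields (the rule set and all names are hypothetical). Each dimension is cut into elementary intervals at rule boundaries, each interval stores a bit vector of the rules covering it, and a lookup ANDs one vector per dimension.

```python
from bisect import bisect_right

# Hypothetical toy rules: (src_range, dst_range), listed in priority order.
RULES = [((0, 3), (8, 15)), ((0, 7), (0, 15)), ((4, 15), (4, 7))]

def build_dimension(dim):
    """Return interval start points and one rule bit vector per interval."""
    points = sorted({0}
                    | {r[dim][0] for r in RULES}
                    | {r[dim][1] + 1 for r in RULES})
    vectors = []
    for p in points:
        bits = 0
        for i, r in enumerate(RULES):
            if r[dim][0] <= p <= r[dim][1]:
                bits |= 1 << i            # bit i set: rule i covers this interval
        vectors.append(bits)
    return points, vectors

TABLES = [build_dimension(0), build_dimension(1)]

def lookup(packet):
    """AND the per-dimension vectors; the lowest set bit is the best rule."""
    result = (1 << len(RULES)) - 1
    for (points, vectors), value in zip(TABLES, packet):
        result &= vectors[bisect_right(points, value) - 1]
    if result == 0:
        return None
    return (result & -result).bit_length() - 1

print(lookup((2, 9)), lookup((5, 5)), lookup((12, 0)))
```

The AND touches every bit of every vector, so the work grows linearly with the number of rules, which is the limitation the aggregated scheme addresses.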
Aggregate Bit Vector • Rule Aggregation • Bit vectors are large (scale with # of rules) • Bit vectors are sparsely populated • Packets match at most 4 rules • Large rule sets created by combining smaller disjoint rule sets • Rule Rearrangement • Rearrange rules to improve aggregation • Reduce false matches • Must compute lowest cost for all matches
Rearrangement Example • Aggregation size = 2 • Packet from source X to destination Y • Before rearrangement: 30 false matches • After rearrangement: no false matches
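The effect of aggregation and rearrangement can be seen on a small made-up example (8 rules, two fields, aggregation size 2; the vectors below are hypothetical and are not the slide's data). The sketch reads a chunk of rule bits only when every field's aggregate bit for that chunk is set, and counts a false match whenever such a chunk turns out to contain no common rule.

```python
def aggregate(vector, a):
    """One summary bit per chunk of `a` rule bits: 1 if any bit in the chunk is set."""
    return [int(any(vector[i:i + a])) for i in range(0, len(vector), a)]

def abv_search(vectors, a):
    """Return (first matching rule index or None, chunks fetched, false matches)."""
    aggs = [aggregate(v, a) for v in vectors]
    fetched = false_matches = 0
    for c in range(len(aggs[0])):
        if not all(agg[c] for agg in aggs):
            continue                       # some field has no match here: skip unread
        fetched += 1
        for i in range(c * a, min((c + 1) * a, len(vectors[0]))):
            if all(v[i] for v in vectors):
                return i, fetched, false_matches
        false_matches += 1                 # aggregates agreed, real bits did not
    return None, fetched, false_matches

# Bit i of each vector says whether rule i matches that field of the packet.
SRC = [1, 0, 1, 0, 1, 0, 0, 1]
DST = [0, 1, 0, 1, 0, 1, 0, 1]
BEFORE = abv_search([SRC, DST], 2)

# Rearrange the rules so those matching the source value sit next to each other.
ORDER = [0, 2, 4, 7, 1, 3, 5, 6]
AFTER = abv_search([[SRC[i] for i in ORDER], [DST[i] for i in ORDER]], 2)
print(BEFORE, AFTER)
```

Once rules are rearranged, position no longer reflects priority, so a real implementation would have to keep searching and pick the lowest-cost match rather than stop at the first one as this sketch does.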
Results • Worst case memory access for 4 databases with 5 fields (A=32) • Improvement: 27% - 54% unsorted, 40% - 75% sorted
Multiple Levels of Aggregation • Comparison of one & two levels of aggregation • Zero length prefixes are injected • 60% improvement for large rule set • Number of memory accesses required
Summary • Add aggregation & rearrangement to Lucent Bit Vector algorithm • Order of magnitude faster than BV scheme • Suitable for large rule sets (100,000 rules) • Multiple levels of aggregation reduce memory operations for large databases • Wide memory widths improve efficiency
Third Paper • What Packets May Come: Automata for Network Monitoring • Karthikeyan Bhargavan & Carl A. Gunter • University of Pennsylvania • Satish Chandra & Peter J. McCann • Bell Laboratories • Published in POPL 2001 • Principles of Programming Languages • January, 2001 • London, UK
Challenge • Formulate an external network protocol monitor as a language recognition problem • Given a language specification of input & output sequences, develop a second specification that corresponds to the sequences observed externally
Complications • Observed traffic could differ from traffic observed by target • Protocol specifications are often vague • Implementations of protocols vary • Observed language could be significantly different from language that target device processes
Basic Monitor • Sequence at M: iqa iqb iqc iqe oqd • Sequence at S: ida idb odd • Figure labels: id, iq, oq, od
Admissibility • Given string at S: i1 i2 o1 i3 o2 i4 i5 • Queue sizes: input = 3, output = 2 • A: iq1 id1 iq2 id2 od1 oq1 iq3 id3 od2 oq2 iq4 id4 iq5 id5 • B: iq1 iq2 id1 id2 od1 oq1 iq3 id3 od2 oq2 iq4 id4 iq5 id5 • C: iq1 iq2 iq3 id1 id2 od1 oq1 id3 od2 oq2 iq4 id4 iq5 id5 • D: iq1 iq2 iq3 id1 id2 od1 id3 od2 oq1 oq2 iq4 id4 iq5 id5 • E: iq1 iq2 iq3 id1 id2 od1 iq4 iq5 id3 od2 oq1 oq2 id4 id5 • F: iq1 iq2 iq3 iq4 id1 id2 od1 iq5 id3 od2 oq1 oq2 id4 id5
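The six interleavings can be checked mechanically against the queue sizes. A minimal sketch, assuming iq means an input enters the input queue, id means it is delivered to S, od means S places an output in the output queue, and oq means the output leaves that queue (this reading of the tokens, and the name `admissible`, are assumptions). Only queue occupancy is checked, not token ordering by index.

```python
def admissible(tokens: str, in_cap: int, out_cap: int) -> bool:
    """True if no queue overflows or underflows along the interleaving."""
    in_q = out_q = 0
    for tok in tokens.split():
        kind = tok[:2]
        if kind == "iq":
            in_q += 1
        elif kind == "id":
            in_q -= 1
        elif kind == "od":
            out_q += 1
        elif kind == "oq":
            out_q -= 1
        else:
            raise ValueError(f"unknown token {tok!r}")
        if not (0 <= in_q <= in_cap and 0 <= out_q <= out_cap):
            return False
    return True

INTERLEAVINGS = {
    "A": "iq1 id1 iq2 id2 od1 oq1 iq3 id3 od2 oq2 iq4 id4 iq5 id5",
    "B": "iq1 iq2 id1 id2 od1 oq1 iq3 id3 od2 oq2 iq4 id4 iq5 id5",
    "C": "iq1 iq2 iq3 id1 id2 od1 oq1 id3 od2 oq2 iq4 id4 iq5 id5",
    "D": "iq1 iq2 iq3 id1 id2 od1 id3 od2 oq1 oq2 iq4 id4 iq5 id5",
    "E": "iq1 iq2 iq3 id1 id2 od1 iq4 iq5 id3 od2 oq1 oq2 id4 id5",
    "F": "iq1 iq2 iq3 iq4 id1 id2 od1 iq5 id3 od2 oq1 oq2 id4 id5",
}
RESULTS = {name: admissible(seq, in_cap=3, out_cap=2)
           for name, seq in INTERLEAVINGS.items()}
print(RESULTS)
```

Under this reading, F is the only sequence that fails, because four inputs are queued before any is delivered while the input queue holds three.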
Elimination of Output Buffer • CU: the maximum number of input symbols without an intervening output symbol • M(S, m, n) => M(S, m+CU*n, 0) • Example: m = 2, n = 2, CU = 2 • iq1 iq2 od1 id1 iq3 id2 iq4 od2 id3 iq5 id4 iq6 oq1 oq2 • Move iq and oq tokens as far left as possible • iq1 iq2 iq3 iq4 iq5 iq6 od1 oq1 id1 id2 od2 oq2 id3 id4 • Maximum input buffer size = 6 (2 + 2 * 2)
Dealing with Packet Loss • CL: the number of dropped tokens between two id tokens must be less than CL • LM(S,m,n) => LM(S,m+CU*CL*n,0) • Example: iq1 il1 iq2 iq3 id2 od1 il3 oq1 • Tokens at M: iq1 iq2 iq3 oq1 • Tokens at S: id2 od1
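The two buffer transformations are simple arithmetic on the queue parameters. A small sketch that just evaluates the bounds stated on the slides (the function names are hypothetical):

```python
def input_bound_no_loss(m: int, n: int, cu: int) -> int:
    """Input buffer size after folding the output buffer away: m + CU*n."""
    return m + cu * n

def input_bound_with_loss(m: int, n: int, cu: int, cl: int) -> int:
    """Same transformation for the lossy monitor: m + CU*CL*n."""
    return m + cu * cl * n

print(input_bound_no_loss(2, 2, 2))        # the slide's example: 2 + 2 * 2
print(input_bound_with_loss(2, 2, 2, 3))   # hypothetical CL = 3
```

With CL = 1 the lossy bound reduces to the lossless one.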
Brute Force Search • g is a function that checks S on a sequence of tokens and indicates whether it is in LS • F(g,T) is a function that tells us whether trace T corresponds to proper execution with respect to S • Construct all possible token sequences at S based on tokens observed at M • Iterate through each sequence checking for an admissible string • If found, observed string is in LS • Otherwise, failure
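The brute force search can be sketched as an exhaustive walk over every way the queues could have delayed tokens. This is an illustrative sketch only, assuming no loss, FIFO queues, and a trace that ends with both queues drained; `candidates`, `conforms`, and the example specification `spec_g` are hypothetical names, with `conforms` playing the role of F(g, T).

```python
def candidates(observed, m, n):
    """Every token string S may have seen, given the trace observed at M."""
    total_out = sum(1 for t in observed if t[0] == "o")
    found = set()

    def walk(pos, queued_in, produced, seen_out, s_seq):
        # queued_in: inputs seen at M but not yet delivered to S
        # produced / seen_out: outputs emitted by S / already observed at M
        if pos == len(observed) and not queued_in and produced == seen_out:
            found.add(s_seq)
            return
        if pos < len(observed):
            tok = observed[pos]
            if tok[0] == "i" and len(queued_in) < m:
                walk(pos + 1, queued_in + (tok,), produced, seen_out, s_seq)
            elif tok[0] == "o" and produced > seen_out:
                walk(pos + 1, queued_in, produced, seen_out + 1, s_seq)
        if queued_in:                                   # deliver next input to S
            walk(pos, queued_in[1:], produced, seen_out, s_seq + ("i",))
        if produced < total_out and produced - seen_out < n:
            walk(pos, queued_in, produced + 1, seen_out, s_seq + ("o",))

    walk(0, (), 0, 0, ())
    return found

def spec_g(s_seq):
    """Hypothetical specification: one output after every two inputs, (i i o)*."""
    text = "".join(s_seq)
    return len(text) % 3 == 0 and text == "iio" * (len(text) // 3)

def conforms(g, trace, m, n):
    """Accept the trace if any candidate string at S satisfies g."""
    return any(g(s) for s in candidates(trace.split(), m, n))

print(conforms(spec_g, "i1 i2 i3 o1 i4 o2", 3, 2))
print(conforms(spec_g, "i1 o1 i2 i3 i4 o2", 3, 2))
```

The number of candidates grows quickly with trace length and buffer sizes, which motivates the restricted properties on the following slides.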
No Data Loss Optimizations (CL = 1) • P1: Counting Properties • Every output must consume between cmin & cmax inputs • P2: Independent Inputs and Outputs • Validate input and output sequences separately • P3: Periodic Outputs • Output is produced every P inputs • P4: Deterministic Placement of Outputs • One position for output after sequence of inputs • P5: Contiguously Enabled Commutative Outputs • Output is valid for a contiguous range of inputs • P6: Output-checkpointed Automata • For each output, there is at most one next state • P7: Finite State Machines • If g is FSM, BFS has polynomial bound in # of states & size of buffers (|T| * B^2)
Optimizations with Data Loss • P1*: Counting Properties • Buffer limit becomes m + cmax * CL * (n + 1) • P2o: Independent Output Properties • Same as no loss case • P8: Insert-closed Commutative Outputs • If string is accepted, so is string with arbitrary inputs added • P7*: Finite State Machines • Still bounded, but must consider 2^B lossy substrings • P9: Deterministic Stateless Transducers • Stateless automata where all inputs are distinct • P10: Output-checkpointed Stateful Transducers • Unique state after consuming odx • P6*: Output-checkpointed Automata • Check maximum of 2^(B+CU*CL) strings against g at output
Complexities (each property implies those in parentheses) • P1: Counting (P6, P7) • P2: Independent In & Outputs (P5) • P2o: Independent Outputs (P2, P8) • P3: Periodic Outputs (P4) • P4: Deterministic Placement (P5) • P5: Commutative Outputs (ALL) • P6: Checkpointed Automata (ALL) • P7: Finite State Machines (ALL) • P8: Commutative Outputs (P5) • P9: Finite State Machines (P7, P10) • P10: Stateless Transducers (P4, P6)
Monitoring TCP • Property 1 describes counting property • Monitors ACKs generated for at least every other message • Property 2 describes independent inputs & outputs • Monitors non-decreasing sequence numbers • Property 3 describes periodic outputs (no loss) • Monitors ACKs generated for contiguously received set of segments
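Two of these properties are simple enough to sketch as standalone trace checks. The sketch below assumes a trace captured right at the endpoint, with no queueing or loss between monitor and host; the event encoding ("seg", "ack") and the function names are hypothetical.

```python
def acks_every_other(events):
    """Counting property: never more than two segments without an ACK."""
    pending = 0
    for event in events:
        if event == "seg":
            pending += 1
            if pending > 2:
                return False
        elif event == "ack":
            pending = 0
    return True

def nondecreasing(seq_numbers):
    """Sequence numbers observed in one direction never go backwards."""
    return all(a <= b for a, b in zip(seq_numbers, seq_numbers[1:]))

print(acks_every_other(["seg", "seg", "ack", "seg", "ack"]))
print(acks_every_other(["seg", "seg", "seg", "ack"]))
print(nondecreasing([1, 1, 2, 5]))
```

A monitor placed away from the endpoint would additionally need the buffering and loss bounds from the earlier slides, which these checks ignore.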
Summary • External monitor developed as a language recognition problem • Problem unbounded with respect to space & time • Properties defined to limit complexity • Impressive goal to attempt monitoring of complex protocols with finite automata • Disappointed at TCP monitoring examples • Does not account for loss of output events • Monitor should be placed close to endpoint
Fourth Paper • Protocol Boosters • D.C. Feldmeier, A.J. McAuley, J.M. Smith, D.S. Bakin, W.S. Marcus, T.M. Raleigh • Bellcore and University of Pennsylvania • Published in IEEE JSAC • Journal on Selected Areas in Communications • April, 1998
Challenge • Develop a new methodology for protocol design • Support localized customization in heterogeneous networks • Provide for rapid protocol evolution
Current Limitations with IP Internet • Protocols evolve slowly with respect to advances in networking technology • IPv6 • Multicast • Short duration connections (HTTP) • Sacrifice efficiency in order to support a large heterogeneous network • Satellite communication • Wi-Fi wireless Ethernet • ATM
Protocol Booster • Software or hardware module that transparently improves protocol performance
One-Element Protocol Boosters • UDP checksum generation • Generate UDP checksum within network • TCP ACK compression • Compress multiple ACKs on slow link • TCP congestion control • Generate duplicate ACKs to reduce window size • TCP ARQ booster • Caches packets and performs retransmission ARQ (automatic repeat request)
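For the UDP checksum booster, the core computation is the standard 16-bit one's-complement Internet checksum. A minimal sketch of that sum only; a real UDP checksum also covers a pseudo-header with the IP addresses, protocol, and length, which is omitted here, and the function name is hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                    # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                     # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

PAYLOAD = bytes.fromhex("0001f203f4f5f6f7")
CHECKSUM = internet_checksum(PAYLOAD)
print(hex(CHECKSUM))
```

A receiver can verify by summing the data together with the transmitted checksum, which yields zero when nothing was corrupted.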
Two-Element Protocol Boosters • Forward error correction coding • Add parity and correction bits • Regenerates missing data • Jitter elimination for real-time communication • Match packet arrival rate at other end • Eliminates jitter by increasing latency • TCP Selective ARQ • Cache packets, add sequence numbers • Generate NACK for missing packet • Retransmit packet on receipt of NACK
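The forward error correction idea can be sketched with the simplest possible code: one XOR parity packet per group, added by the upstream booster element and used by the downstream element to rebuild a single lost packet. This is an assumed, simplified code chosen for illustration, not necessarily the one used in the paper; all names are hypothetical.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def add_parity(packets):
    """Upstream element: append one XOR parity packet (equal-length packets)."""
    return list(packets) + [reduce(xor_bytes, packets)]

def recover(received):
    """Downstream element: rebuild at most one lost packet (marked None)."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one loss per group")
    received = list(received)
    if missing:
        survivors = [p for p in received if p is not None]
        received[missing[0]] = reduce(xor_bytes, survivors)
    return received[:-1]                   # strip the parity packet

DATA = [b"abcd", b"efgh", b"ijkl"]
SENT = add_parity(DATA)
SENT[1] = None                             # simulate a loss on the boosted link
print(recover(SENT))
```

The end hosts see the original packets and are unaware of the parity traffic, which is what makes the booster transparent.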
Fast Evolution • No standards body • Developed by small team • Contained insertion into network • Free market supports competition and collaboration • Proprietary boosters offer competitive advantage
Targeted Improvements • Quick fix applied to individual network segments • Rapid deployment • Isolated boosters • Targeted trouble spots • Doesn’t affect other areas of the network
Comparisons to Other Approaches • Link Layer Adaptation • Only operates at link layer • Protocol Conversion • Conversion changes message syntax • Protocol Termination • Loses end-to-end properties • Special Purpose End-to-End Protocols • Cannot account for changes in network
Example Implementation • Protocol boosters added to Linux & NetBSD systems • Forward error correction booster implemented • UDP data traffic • Random and bursty error models used • Booster successfully reduced effective packet loss