Francis Chang Wu-chang Feng Wu-chi Feng Kang Li Efficient Packet Classification with Digest Caches
Packet classification • Essential in all network devices • Routers, firewalls, NATs, Diffserv/QoS markers, etc. • But, complexity increasing • Number of rules • Number of fields to classify • Size of header (IPv6) • Number of flows
Packet classification • Performance-bound by memory • Must store and access large headers and many rules quickly • Lookup algorithms perform better when given more memory • Classic space-time trade-off in performance • Supporting line speeds requires a large amount of fast memory • Fast memory expensive • Large memory slow
Probabilistic Networking • Goal of work • Throw a wrench into space-time trade-off • Examine a third axis: accuracy • Reduce memory requirements by relaxing the accuracy of packet classification function What are the quantifiable benefits of sacrificing accuracy on the packet classification function?
What? Willingly make mistakes? • Sure • Packet errors and lack of reliability are a fact of life • Masked by application layer or ignored • Lots of packets are bad, some are undetectably bad [Stone00] • TCP • 1 in 1100 to 32000 TCP packets fail checksum • 1 in 16 million to 10 billion TCP packets are UNDETECTABLY bad • UDP • UDP packets are not required to have a valid checksum • Even if the checksum is bad, OS will give the packet to the application (Linux) • Routing problems occur frequently • Transient loops [Hengartner02] • Outages
Several places to apply idea… • Full classification • Exact multi-dimensional solutions still too costly [Baboescu03] • Inaccuracy may help • Work in progress… • Classification caches • Space requirements grow linearly with number of flows and fields • Use lossy recall in remembering previous classification decisions to reduce cache size • Our current work…
Initial approach • Bloom filter • Approximate data structure to store flows matching a binary predicate • Spell checkers • Browser and web caches • How it works • L × N array of memory • Addressed by L independent hash functions • Each function addresses N buckets • Storing new flows • Set bits corresponding to the results of L hash functions on header • Looking up flows • Check bits corresponding to the results of L hash functions on header • Collisions in filter cause inaccurate classifications Francis Chang, Wu-chang Feng, Kang Li, “Approximate Caches for Packet Classification”, in Proceedings of INFOCOM ’04, March 2004.
[Figure: Bloom filter with L hash functions (h0, h1, …, hL-1), each addressing bins 0 to N-1 of its own level; shows a flow insertion and an unknown-flow lookup. N^L virtual bins out of L*N actual bins.]
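The store and lookup steps of the Bloom filter cache can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `BloomFilterCache`, the use of salted BLAKE2b as the family of L hash functions, and the one-byte-per-bit storage are all assumptions made for readability.

```python
import hashlib


class BloomFilterCache:
    """L levels of N one-bit buckets; level i is addressed by hash function i."""

    def __init__(self, levels: int, buckets: int):
        self.levels = levels
        self.buckets = buckets
        # One byte per bit for clarity; real hardware would pack the bits.
        self.bits = [bytearray(buckets) for _ in range(levels)]

    def _index(self, level: int, header: bytes) -> int:
        # Hypothetical hash family: BLAKE2b salted with the level number.
        salt = level.to_bytes(2, "big")
        digest = hashlib.blake2b(header, digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "big") % self.buckets

    def insert(self, header: bytes) -> None:
        # Storing a flow: set one bit per level.
        for level in range(self.levels):
            self.bits[level][self._index(level, header)] = 1

    def contains(self, header: bytes) -> bool:
        # Looking up a flow: all L bits must be set. A flow that was never
        # inserted can still pass if every one of its bits collides.
        return all(
            self.bits[level][self._index(level, header)]
            for level in range(self.levels)
        )


flows = BloomFilterCache(levels=4, buckets=1024)
flows.insert(b"10.0.0.1:1234->10.0.0.2:80/tcp")
print(flows.contains(b"10.0.0.1:1234->10.0.0.2:80/tcp"))
```

Note that every lookup touches L separate memory locations and that a set bit cannot be cleared without possibly affecting other flows.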
The value of making mistakes • Initial results promising • Small, high-performance caches with 1 in a billion error rate • Storage capacity invariant to header size and fields • Size of approximate cache determined by number of flows to store and desired accuracy • Size of exact cache determined by number of flows to store and header size and fields • IPv4-based connection identifier = 13 bytes • IPv6-based connection identifier = 37 bytes
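To make the size comparison concrete, the sketch below applies the textbook Bloom filter sizing approximations (about 1.44 · log2(1/p) bits per stored element, with log2(1/p) hash functions). These formulas are for a standard Bloom filter and are used here as an assumption; the paper's own model of the L × N structure may differ in detail. The function names are illustrative.

```python
import math


def bloom_bits_per_flow(error_rate: float) -> float:
    """Textbook approximation: bits per element at the optimal hash count."""
    return math.log2(1.0 / error_rate) / math.log(2)


def optimal_levels(error_rate: float) -> int:
    """Textbook approximation: optimal number of hash functions (levels)."""
    return round(math.log2(1.0 / error_rate))


p = 1e-9  # the 1-in-a-billion error rate from the slide
approx_bytes = bloom_bits_per_flow(p) / 8
print(f"levels: {optimal_levels(p)}")
print(f"approximate cache: {approx_bytes:.1f} bytes per flow")
print("exact cache: 13 bytes per IPv4 flow, 37 bytes per IPv6 flow")
```

Under these assumptions the approximate cache needs the same storage per flow whether the headers are IPv4 or IPv6, while the exact cache grows with the connection identifier.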
But… • Some glaring disadvantages • Large number of levels and memory lookups required • Not amenable to most NP architectures • Requires hardware support and parallel, bit-level memory addressing • Aging properties • Cannot gracefully age cache • No selective replacement policies possible (e.g. LRU) • Must periodically expunge entire cache • Results in large variance in full classifications required
New approach • Digest caches • Use a traditional cache architecture • Store and use a digest of classification fields instead of full header(s)
Digest caches • How it works • Upon full classification of packet header fields (P) • Calculate h1(P) and h2(P) • Use h1(P) to select cache line • Insert h2(P) and classification result into cache line • Subsequent packets • Calculate h1(P) and h2(P) • Use h1(P) to select cache line • Lookup h2(P) in cache line • If match, follow cached result • If no match, perform full classification • Misclassification caused by hash-signature collisions • Increases as the number of bits in digest decreases (c) • Increases as the associativity of cache increases (d)
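The insert and lookup paths above can be sketched as a small set-associative cache. This is an illustration under stated assumptions, not the authors' code: `DigestCache`, `classify`, and `full_classify` are hypothetical names, h1 and h2 are taken from disjoint halves of one BLAKE2b hash, and LRU replacement within a line is one possible policy.

```python
import hashlib
from collections import OrderedDict


class DigestCache:
    """Set-associative cache that stores a c-bit digest instead of the header."""

    def __init__(self, num_lines: int, ways: int, digest_bits: int):
        self.num_lines = num_lines
        self.ways = ways                # associativity (d)
        self.digest_bits = digest_bits  # digest width (c), assumed <= 64
        # Each cache line maps digest -> classification result, in LRU order.
        self.lines = [OrderedDict() for _ in range(num_lines)]

    def _hashes(self, header: bytes):
        raw = int.from_bytes(hashlib.blake2b(header, digest_size=16).digest(), "big")
        h1 = (raw >> 64) % self.num_lines           # selects the cache line
        h2 = raw & ((1 << self.digest_bits) - 1)    # digest stored in the line
        return h1, h2

    def lookup(self, header: bytes):
        h1, h2 = self._hashes(header)
        line = self.lines[h1]
        if h2 in line:
            line.move_to_end(h2)  # mark as most recently used
            return line[h2]       # may be wrong if another flow shares h1 and h2
        return None

    def insert(self, header: bytes, result) -> None:
        h1, h2 = self._hashes(header)
        line = self.lines[h1]
        line[h2] = result
        line.move_to_end(h2)
        if len(line) > self.ways:
            line.popitem(last=False)  # evict the least recently used digest


def classify(cache: DigestCache, header: bytes, full_classify):
    """Follow the cached result on a match, else fall back to full classification."""
    result = cache.lookup(header)
    if result is None:  # assumes classification results are never None
        result = full_classify(header)
        cache.insert(header, result)
    return result
```

Because each entry is an ordinary cache slot, a single old entry can be evicted without flushing the rest of the cache.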
Digest caches • Fixes all of the problems of Bloom filter caches • Fewer memory accesses • NP-friendly • Does not require parallel, bit-addressable memory access • Can alleviate need for associative hardware (more later) • Gracefully ages • Can smoothly remove old entries
Evaluation • Trace-driven simulation • PCCS simulator (http://pccs.sourceforge.net) • Packet traces • Bell Labs trace • One hour trace at Bell Labs Research in May 2002 • OGI trace • One hour trace of OGI’s OC-3c link on July 26, 2002
Choosing associativity • Experiment • Fixed misclassification probability of 10^-9 • Variable bit digest based on associativity • Results similar to previous studies • Small amount of associativity ideal for performance
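One way to see how the digest width follows from the associativity at a fixed misclassification probability is the sketch below. It assumes digests are uniformly distributed and that a new flow is compared against a full line of d stored c-bit digests, giving p = 1 - (1 - 2^-c)^d, roughly d / 2^c. This model and the function names are assumptions for illustration; it is, however, consistent with the 32-bit, 4-way configuration used in the comparison that follows.

```python
import math


def misclassification_probability(digest_bits: int, ways: int) -> float:
    """Chance that a new flow's digest matches one of `ways` stored digests."""
    # Numerically stable form of 1 - (1 - 2**-digest_bits) ** ways.
    return -math.expm1(ways * math.log1p(-(2.0 ** -digest_bits)))


def digest_bits_needed(target: float, ways: int) -> int:
    """Smallest digest width keeping ways / 2**bits at or below the target."""
    return math.ceil(math.log2(ways / target))


for ways in (1, 2, 4, 8):
    bits = digest_bits_needed(1e-9, ways)
    print(ways, bits, misclassification_probability(bits, ways))
```

Under this model each doubling of associativity costs one extra digest bit at the same target probability.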
Comparing approaches • Digest cache • 32-bit digests • 4-way associativity • Bloom filter cache • Optimal, 30-level filter • Exact caches • IPv4 and IPv6 flow caches
NP implementation • IXP1200, L3Forwarder • 4-way associative digest cache • 803 Mb/s • Bloom filter cache • 1 level = 990 Mb/s • 4 levels = 652 Mb/s
A final note to those who hate being wrong • Can be used to accelerate exact caches • Consider • Exact cache where associativity is emulated • Entire cache line must be read sequentially to find match • Digest cache acceleration • Use smaller, digest cache stored in fastest (possibly associative) memory to mirror entries in exact cache • Lookup in digest cache gives exact location of relevant entry in exact cache • Good for implementing associative caches on NPs that do not have hardware support • Speed-up analysis in paper
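The acceleration idea can be sketched as two parallel arrays: a small table of digests (standing in for the fast memory) and the full exact entries (standing in for the slower memory). Everything here is a hypothetical illustration: the class name, the 8-bit default digest, the round-robin replacement, and the final full-header comparison that keeps the result exact are assumptions, not details taken from the paper.

```python
import hashlib


class AcceleratedExactCache:
    """Exact cache whose lines are indexed by a small mirror of digests."""

    def __init__(self, num_lines: int, ways: int, digest_bits: int = 8):
        self.num_lines = num_lines
        self.ways = ways
        self.digest_bits = digest_bits
        self.digests = [[None] * ways for _ in range(num_lines)]  # fast memory
        self.entries = [[None] * ways for _ in range(num_lines)]  # slow memory
        self.next_victim = [0] * num_lines  # round-robin replacement pointer

    def _hashes(self, header: bytes):
        raw = int.from_bytes(hashlib.blake2b(header, digest_size=16).digest(), "big")
        return (raw >> 64) % self.num_lines, raw & ((1 << self.digest_bits) - 1)

    def _find(self, header: bytes):
        line, digest = self._hashes(header)
        for way, stored in enumerate(self.digests[line]):
            # Only slots whose digest matches are fetched from slow memory.
            if stored == digest and self.entries[line][way][0] == header:
                return line, way
        return line, None

    def lookup(self, header: bytes):
        line, way = self._find(header)
        return None if way is None else self.entries[line][way][1]

    def insert(self, header: bytes, result) -> None:
        line, way = self._find(header)
        if way is None:  # not cached yet: take the next round-robin slot
            way = self.next_victim[line]
            self.next_victim[line] = (way + 1) % self.ways
        self.digests[line][way] = self._hashes(header)[1]
        self.entries[line][way] = (header, result)
```

Because the stored header is still compared in full, a digest collision costs only an extra slow-memory read and never a wrong answer.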
[Figure: Misclassification rates for digest caches (4-way associative digest cache)]