
Towards a Packet Classification Benchmark


Presentation Transcript


  1. Towards a Packet Classification Benchmark ARL Current Research Talk 20 October 2003

  2. Packet Classification Example
  • Data services:
    • Reserved bandwidth
    • AES security
    • VLANs
  • Multi-Service Routers:
    • Filter databases updated manually or automatically based on service agreements
    • Services applied based on classification results
  • Example query: packet from 12.34.244.1 going to 168.92.44.32 using TCP from port 1200 to port 1450
    • Result: Decrypt all packets using AES; transmit packet on port 3
  • Example query: packet from 12.34.244.1 going to 168.92.44.32 using TCP from port 1200 to port 1450
    • Result: Encrypt packet using AES; send copy of header to usage accounting with userID 110; transmit packet on port 5

  3. Formal Problem Statement
  • Given a packet P containing fields Pj and a collection of filters F with each filter Fi containing fields Fij, select the highest priority exclusive filter and the k highest priority non-exclusive filters such that, for each selected filter, Fij matches Pj for all j (a matching sketch follows this slide)
  • Performance tradeoffs are commonly characterized by the point location problem in computational geometry
    • For n regions defined in j dimensions, with j > 3, a point may be located in multi-dimensional space in O(log n) time with O(n^j) space, or in O(log^(j-1) n) time with O(n) space
  • [Figure: example with n = 13, j = 2; a packet header maps to a point in 2-D space with source address and destination address axes]
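To make the matching condition concrete, here is a minimal, illustrative brute-force classifier. It is not an algorithm from the talk; the Filter layout, the priority convention (lower number wins), and the restriction to a single best match are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Filter:
    src_prefix: Tuple[int, int]   # (value, prefix_length) over 32-bit addresses
    dst_prefix: Tuple[int, int]
    src_ports: Tuple[int, int]    # inclusive range [lo, hi]
    dst_ports: Tuple[int, int]
    protocol: Optional[int]       # None = wildcard
    priority: int                 # lower number = higher priority (assumption)

def prefix_matches(prefix: Tuple[int, int], addr: int) -> bool:
    value, length = prefix
    if length == 0:
        return True               # a /0 prefix matches every address
    shift = 32 - length
    return (addr >> shift) == (value >> shift)

def filter_matches(f: Filter, src, dst, sport, dport, proto) -> bool:
    # "Fij matches Pj for all j": every field condition must hold
    return (prefix_matches(f.src_prefix, src)
            and prefix_matches(f.dst_prefix, dst)
            and f.src_ports[0] <= sport <= f.src_ports[1]
            and f.dst_ports[0] <= dport <= f.dst_ports[1]
            and (f.protocol is None or f.protocol == proto))

def classify(filters: List[Filter], pkt) -> Optional[Filter]:
    # Brute-force reference: return the highest-priority matching filter
    matches = [f for f in filters if filter_matches(f, *pkt)]
    return min(matches, key=lambda f: f.priority) if matches else None
```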

  4. Motivation for a Benchmark
  • No benchmark currently exists in industry or the research community
  • Performance of the two most effective packet classification solutions depends on the composition of filters in the filter set
    • TCAM capacity depends on port range specifications
      • Range conversion to prefixes may cause a single filter to occupy [2(w-1)]^k TCAM slots (900 slots in the worst case for TCP & UDP source/destination ports); see the sketch below
      • w = number of bits required to represent a point in the range
      • k = number of fields specified by ranges
      • Observed expansion factors range from 40% to 520%
    • Fastest algorithms leverage heuristics and optimize average performance
      • Cutting algorithms (E-TCAMs, HiCuts, HyperCuts)
      • Tuple-space algorithms
  • Plethora of new packet classification products
    • Network processors, packet processors, traffic managers, TCAMs
    • Intel, IBM, Silicon Access, Mosaid, IDT (Solidium), SiberCore, Cypress, etc.
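The worst-case expansion figure can be reproduced with the standard greedy range-to-prefix split (used here as an illustration; real TCAM encodings may differ). A 16-bit range such as [1, 65534] splits into 2(16-1) = 30 prefixes, so two such port ranges in one filter need 30 x 30 = 900 TCAM entries.

```python
def range_to_prefixes(lo: int, hi: int, w: int = 16):
    """Split the inclusive range [lo, hi] over w-bit values into a minimal set of
    prefixes, returned as (value, prefix_length) pairs."""
    prefixes = []
    while lo <= hi:
        length = w                              # start with a /w (single value)
        while length > 0:
            block = 1 << (w - (length - 1))     # size of the next-larger aligned block
            if lo % block == 0 and lo + block - 1 <= hi:
                length -= 1                     # the larger block is aligned and fits
            else:
                break
        prefixes.append((lo, length))
        lo += 1 << (w - length)
    return prefixes

# Worst case for a 16-bit port field: 2 * (16 - 1) = 30 prefixes
assert len(range_to_prefixes(1, 65534)) == 30
# A filter with this range on both ports would occupy 30 * 30 = 900 TCAM slots
```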

  5. Motivation for a Benchmark (2)
  • Security and confidentiality concerns limit access to “real” databases for study and performance evaluation
    • Well-connected researchers have gained access but are unable to share
  • Lack of large “real” databases due to limited deployment of high-performance packet classification solutions
  • Performance evaluations with “real” databases limited by size and structure of samples
  • Goal: develop a benchmark capable of capturing relevant characteristics of “real” databases while providing structured mechanisms for augmenting database composition and analyzing performance effects
  • Should have value for three distinct communities: researchers, product vendors, product consumers

  6. Related Work
  • IETF Benchmarking Working Group (BMWG) developed benchmark methodologies for Forwarding Information Base (FIB) routers and firewalls
    • FIB methodology focuses on performance evaluation of routers at transmission interfaces
    • Firewall methodology is a high-level testing methodology with no detailed recommendations for filter composition
  • Network Processing Forum has a benchmarking initiative
    • Produced IP lookup and switch fabric benchmarks
    • Thus far, only IBM and Intel have published results for IP lookup
    • No details or announcements re: packet classification
  • Performance evaluation by researchers
    • Most randomly select prefixes from forwarding tables and use existing protocol and port range combinations
    • Baboescu & Varghese added refinements for controlling the number of zero-length prefixes and prefix nesting

  7. Related Work (2)
  • Woo [Infocom 2000] provided strong motivation for a benchmark
    • Provided a high-level overview of filter composition for various environments
      • ISP Peering Router, ISP Core Router, Enterprise Edge Router, etc.
    • Generated large synthetic databases but provided few details regarding database construction
    • No mechanisms for varying filter composition

  8. Understanding Filter Composition
  • Most complex packet filters typically appear in firewall and edge router filter sets
    • Heterogeneous applications: network address translation (NAT), virtual private networks (VPNs), and resource reservation
  • Firewall filters are created manually by a system admin using standard tools such as Cisco Firewall MC
    • Model of filter construction: specify communicating subnets, specify application (or set of applications)
  • TCP and UDP identify applications via 16-bit port numbers
    • Provide services to unknown clients via “contact ports” in the range of well-known (or system) ports assigned by IANA
      • Since 1993, the system port range is [0:1023]
    • Established sessions typically use a unique port in the ephemeral port range [1024:65535]
      • IANA manages a list of user registered ports in the range [1024:49151]
  • Limited number of protocols in use, dominated by TCP and UDP

  9. Analyzing Database Structure
  • Engaged in an iterative process of analyses in order to identify useful metrics
    • Accurately capture database structure
    • Goal: identify methods and metrics useful for constructing synthetic databases
  • Defined new metrics
    • Joint address prefix length distributions
    • Scope: metric used to assess the specificity of filters on a logarithmic scale
    • Skew: metric used to assess the number of subnets covered by a given filter set
      • Quantifies branching in the binary tree representation of address prefixes

  10. Scope Definition
  • From a geometric perspective, a filter defines a region in 5-d space
    • Volume of the region is the product of the 1-d “lengths” specified by the filter fields
      • e.g. number of addresses covered by the source address prefix
    • Points in 5-d space correspond to packet headers
  • Filter properties are commonly defined as a tuple specification, or a vector with fields (a scope-computation sketch follows this slide):
    • t[0], source address prefix length, [0…32]
    • t[1], destination address prefix length, [0…32]
    • t[2], source port range width, [0…2^16]
    • t[3], destination port range width, [0…2^16]
    • t[4], protocol specification, Boolean [specified, not specified]
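A sketch of the scope computation implied by this tuple, treating scope as the sum of the log2 field "lengths". The exact conventions (ports counted as hi - lo + 1, an unspecified protocol charged 8 bits) are assumptions chosen to reproduce scope = 0 for exact-match filters and scope = 104 for the default filter on the next slide.

```python
import math

def scope(src_prefix_len: int, dst_prefix_len: int,
          src_ports: tuple, dst_ports: tuple,
          protocol_specified: bool) -> float:
    src_bits = 32 - src_prefix_len                      # addresses covered = 2^(32 - len)
    dst_bits = 32 - dst_prefix_len
    sport_bits = math.log2(src_ports[1] - src_ports[0] + 1)
    dport_bits = math.log2(dst_ports[1] - dst_ports[0] + 1)
    proto_bits = 0 if protocol_specified else 8         # 8-bit protocol field
    return src_bits + dst_bits + sport_bits + dport_bits + proto_bits

print(scope(32, 32, (80, 80), (80, 80), True))          # exact match -> 0.0
print(scope(0, 0, (0, 65535), (0, 65535), False))       # default filter -> 104.0
```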

  11. Scope Distributions
  • Scope distribution characterizes the specificity of filters in the database
    • Exact match filters have scope = 0
    • Default filters have scope = 104
  • Notable “spikes” near low end of distribution
  • Wide variance

  12. Joint Prefix Length Distributions
  • Observe large spikes in the joint distribution along the “edges”
  • Unlike forwarding tables, /0 and /32 prefixes are common in prefix length pairs
  • Strong motivation for capturing the joint distribution
  • Observe a correlation with port range specifications (not shown)

  13. Joint Prefix Length Distributions (2)
  • For synthetic database generation, we want to:
    • Select a prefix length pair based on total prefix length
      • Total length specified by diagonals in the joint distribution
    • Allow the distribution to be modified
  • Represent the joint distribution by a collection of 1-d distributions (see the sampling sketch below)
    • Build a total length distribution [0…64]
      • bin = sum of prefix lengths
    • For each non-empty bin in the total length distribution, build a source length distribution for the prefix pairs in the bin
      • (destination address prefix length) = (total length) – (source address prefix length)
  • Allows for a high-level input parameter for address scope adjustment
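A minimal sketch of the two-stage selection described above; the dictionary-based representation of the distributions is an assumption, not the tool's actual data format.

```python
import random

def sample_prefix_pair(total_len_dist: dict, src_len_dists: dict):
    """total_len_dist: {total_length (0..64): probability}
    src_len_dists: {total_length: {source_length: probability}}"""
    totals, weights = zip(*total_len_dist.items())
    total = random.choices(totals, weights=weights)[0]
    src_lens, src_weights = zip(*src_len_dists[total].items())
    src_len = random.choices(src_lens, weights=src_weights)[0]
    dst_len = total - src_len          # destination length = total - source length
    return src_len, dst_len

# Hypothetical example: total length always 48, split either 16/32 or 24/24
dist = {48: 1.0}
src = {48: {16: 0.5, 24: 0.5}}
print(sample_prefix_pair(dist, src))
```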

  14. Skew Definition
  • Want a high-level characterization of address space coverage by filters (also want to anonymize IP addresses)
  • A complete statistical model is infeasible
    • Imagine a binary tree with a branching probability for each node
  • Employ a suitable approximation to capture important characteristics such as prefix containment
    • Build two binary trees from the source and destination address prefixes in the filters
    • At each node, define the weight of the left child and right child as the number of filters specifying a prefix reached by taking the left child and right child, respectively
    • Let heavy = max[weight of left child, weight of right child]
    • Let light = min[weight of left child, weight of right child]
    • Skew at a node is then 1 - (light / heavy): evenly weighted children give skew near 0, and a node whose filters all take a single path gives skew 1 (a per-level computation is sketched below)
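A sketch of the per-level skew computation over a binary trie of address prefixes. The per-node formula skew = 1 - light/heavy is inferred from the definitions above and the statements on the next slide, and the weighting convention (counting every filter whose prefix passes through a node) is an assumption.

```python
from collections import defaultdict

def level_skew(prefixes, max_len=32):
    """prefixes: list of (value, length) address prefixes, one per filter."""
    # weight[(depth, path)] = number of filters whose prefix passes through this node
    weight = defaultdict(int)
    for value, length in prefixes:
        for depth in range(1, length + 1):
            path = value >> (max_len - depth)
            weight[(depth, path)] += 1

    skew_per_level = {}
    for depth in range(0, max_len):
        skews = []
        nodes = {p for (d, p) in weight if d == depth} if depth > 0 else {0}
        for path in nodes:
            left = weight.get((depth + 1, path << 1), 0)
            right = weight.get((depth + 1, (path << 1) | 1), 0)
            heavy, light = max(left, right), min(left, right)
            if heavy > 0:
                skews.append(1.0 - light / heavy)
        if skews:
            skew_per_level[depth] = sum(skews) / len(skews)   # average skew per level
    return skew_per_level
```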

  15. Skew Distributions
  • For each level in the tree, compute the average skew for the nodes at that level
  • Low skew → evenly “weighted” children, doubling of address space coverage
  • High skew → asymmetrically “weighted” children, containment of address space coverage
  • Skew = 1 means a node has a single path

  16. Designing a Flexible Benchmark
  • Provide a mechanism for defining database structure
    • Structure could be based on analysis of seed databases
    • Construct a set of benchmark database structures to use as a departure point for performance evaluation
  • Provide high-level controls for augmenting database structure
    • Observe effects on search and capacity performance
    • Scale the database while preventing redundant filters
    • Adjust the specificity or scope of filters
    • Introduce “entropy” into the database
      • A structured mechanism for straying from the database structure
  • Difficult to provide meaningful adjustments for application specifications (protocol, port ranges)

  17. Benchmark Architecture

  18. Parameter Files
  • Defines the general database via requisite statistics
    • May be extracted from seed databases using an analysis tool
    • Goal: compile a set of benchmark parameter files that characterize various packet classification application environments (as proposed by Woo)
  • Protocol and port pair class distribution
    • Distribution of protocol specifications
    • For each protocol, specify a port pair class distribution for filters specifying the given protocol
    • Port pair class defines the structure of port range pairs
      • 25 port pair classes: all possible combinations of five port classes (see the classification sketch below)
      • WC = [0:65535], WR1 = [0:1023], WR2 = [1024:65535], AR = arbitrary range, EM = exact match
  • Port range distributions
    • Arbitrary range and exact port distributions
    • Limited set of arbitrary ranges observed in real databases
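A sketch of how a port range might be mapped to one of the five port classes, with the ordered pair of classes giving the port pair class (5 x 5 = 25 classes). The class boundaries follow the slide; the helper names are illustrative.

```python
def port_class(lo: int, hi: int) -> str:
    if (lo, hi) == (0, 65535):
        return "WC"      # full wildcard
    if (lo, hi) == (0, 1023):
        return "WR1"     # wildcard over the well-known (system) port range
    if (lo, hi) == (1024, 65535):
        return "WR2"     # wildcard over the ephemeral / registered port range
    if lo == hi:
        return "EM"      # exact match on a single port
    return "AR"          # arbitrary range

def port_pair_class(src_range, dst_range) -> tuple:
    return (port_class(*src_range), port_class(*dst_range))

print(port_pair_class((0, 65535), (80, 80)))      # -> ('WC', 'EM')
```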

  19. Parameter Files (2)
  • Joint prefix length distributions for each “port pair class”
    • 25 distributions, each containing a total length distribution and the associated source address prefix length distributions
    • Preserves correlation between port pair class and prefix length pairs in directional filters
  • Address skew distributions for source and destination addresses
  • Source/destination prefix “correlation” distribution (see the sketch below)
    • Specifies the “distance” between communicating subnets specified by a filter
    • Probability that the address prefixes of a filter continue to be identical at a given prefix length
      • Consider a filter with address prefix length pair (16,25)
      • Consider walking the source and destination address prefix trees in parallel
      • Assume that the prefixes are identical for the first 8 bits
      • The “correlation” probability at level 9 specifies the probability that the next bit in the prefixes will be the same
      • Once the prefixes diverge or a prefix length is reached, the distribution is irrelevant
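A sketch of applying the correlation distribution while generating an address prefix pair bit by bit. The representation of the distribution and the behavior after divergence (independent random bits) are assumptions for illustration.

```python
import random

def generate_prefix_pair(src_len: int, dst_len: int, correlation: dict):
    """correlation[level]: probability that the prefixes are still identical at bit
    `level` (1-based), given they were identical through the previous level."""
    src_bits, dst_bits = [], []
    identical = True
    for level in range(1, max(src_len, dst_len) + 1):
        s = random.getrandbits(1)
        d = random.getrandbits(1)
        if identical and level <= src_len and level <= dst_len:
            if random.random() < correlation.get(level, 0.0):
                d = s                  # bits stay identical at this level
            else:
                d = 1 - s              # bits diverge; correlation no longer applies
                identical = False
        else:
            identical = False          # past one prefix's length, or already diverged
        if level <= src_len:
            src_bits.append(s)
        if level <= dst_len:
            dst_bits.append(d)
    return src_bits, dst_bits
```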

  20. Synthetic Database Generator
  • Reads in a parameter file
    • Trivial option to generate a completely random filter database
  • Takes three high-level input parameters (a generation-loop sketch follows below)
    • size = target size for the synthetic database
      • Resulting size may be less than the target
      • Tool generates filters using the statistical model, then post-processes the database to remove redundant filters
      • Favorable for assessing scalability of parameter files
    • Smoothing (r) = number of bits by which synthetic filters may stray from points in the prefix length pair distribution
      • Structured “entropy” mechanism for introducing new prefix length pairs
      • Models aggregation and/or increased flow segregation
    • Scope (s) = bias to more or less specific filters
      • Adjusts the shape of the address length distributions without adding or removing bins
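A hypothetical sketch (not the actual tool) of the top-level loop implied by the size parameter: draw filters from the statistical model, then drop redundant (duplicate) filters, which is why the resulting database may fall short of the target size.

```python
def generate_database(target_size: int, sample_filter):
    """sample_filter() is assumed to draw one filter (as a hashable tuple)
    from the parameter-file model."""
    raw = [sample_filter() for _ in range(target_size)]
    unique, seen = [], set()
    for f in raw:
        if f not in seen:          # post-process: remove redundant filters
            seen.add(f)
            unique.append(f)
    return unique                  # may be smaller than target_size
```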

  21. Understanding Scaling Effects
  • Readily scale a seed database by 30x to 40x
  • Larger seed databases provide for larger synthetic databases
    • rules6 (~1500 filters) is approximately 6x larger than rules1 and rules5
  • As the “limit” of the seed parameter file is reached → shift in average filter scope to more specific filters

  22. Smoothing Adjustment
  • Smoothing (r) = number of bits by which synthetic filters may stray from points in the prefix length pair distribution
  • Apply a symmetric binomial spreading to each spike in the joint prefix length distribution (see the sketch below)
    • For each joint distribution in the parameter file:
      • Apply binomial spreading to each spike in the total length distribution
      • For each source prefix length distribution, apply binomial spreading to each spike in the source length distribution
    • Tricky details, like adjusting the width of the source spreading as you move away from the original spike
    • Truncate and normalize the distribution to allow for spreading of spikes at the edges
    • Let k = 2r
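A sketch of spreading a single spike with a symmetric binomial kernel of width k = 2r, truncated at the edges of the valid range and renormalized. Applying this to every spike of the total-length and source-length distributions, and narrowing the source spreading away from the original spike, is left out for brevity.

```python
from math import comb

def smooth_spike(dist: dict, position: int, weight: float, r: int, max_len: int) -> None:
    """Spread `weight` located at `position` into dist (a dict {length: weight})
    using binomial coefficients over offsets -r..+r, truncated to [0, max_len]."""
    if r == 0:
        dist[position] = dist.get(position, 0.0) + weight
        return
    k = 2 * r
    kernel = {off: comb(k, off + r) / 2 ** k for off in range(-r, r + 1)}
    # Drop offsets that fall outside the valid range, then renormalize what remains
    valid = {off: w for off, w in kernel.items() if 0 <= position + off <= max_len}
    total = sum(valid.values())
    for off, w in valid.items():
        dist[position + off] = dist.get(position + off, 0.0) + weight * w / total

# Example: spread a unit spike at total length 16 with r = 4 over [0, 64]
dist = {}
smooth_spike(dist, position=16, weight=1.0, r=4, max_len=64)
```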

  23. Smoothing Example: Single Spike
  • All prefix lengths are 16 bits
  • Database target size = 64,000 filters
  • No scope adjustment, s = 0
  • Generate databases for various values of the smoothing adjustment, r
  • [Figure: (a) r = 0; (b) r = 0, top view]

  24. Single Spike with r = 8
  • r = 8 → maximum Manhattan “distance” from the original spike
  • Observe symmetric binomial distribution across total prefix length (diagonal) and source prefix length
  • [Figure: (a) r = 8; (b) r = 8, top view]

  25. Single Spike with r = 32
  • r = 32 → maximum Manhattan “distance” from the original spike
  • Observe symmetric binomial distribution across total prefix length (diagonal) and source prefix length
  • [Figure: (a) r = 32; (b) r = 32, top view]

  26. Smoothing with Seed Parameter File
  • r = 16
    • Appears to be the sensible limit to smoothing for real databases
  • Spreading is cumulative; adjacent spikes may spread into each other, creating new dominant spikes

  27. Understanding Smoothing Effects
  • High sensitivity for small values of the smoothing adjustment, r
    • Believe that this is due to dominance of spikes at the “more specific” edges of the joint distributions in seed databases
  • Truncation causes a slight drift to a larger average scope

  28. Smoothing: Contrived Distributions
  • Constructed two contrived distributions to verify the hypothesis
    • Spikes = all joint distributions have two points, (0,0) and (32,32)
    • Uniform = uniform total length distribution
  • Observed identical drift for the spikes distribution and no drift for the uniform distribution

  29. Scope Adjustment
  • Scope (s) = bias to more or less specific filters, [-1:1]
  • Adjusts the shape of the address length distributions without adding or removing bins
    • s > 0 : decrease scope, increase specificity (prefix length)
    • s < 0 : increase scope, decrease specificity (prefix length)
  • Utilize a bias function on the random number used to select from the cumulative distributions (see the sketch below)
    • Bias function computes the area under a line whose slope is defined by s
    • Prevents laborious recomputation of each prefix length distribution
  • [Figure: bias function plotted over the unit interval for s = -1, s = 0, and s = 1]
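A sketch of the biased selection. The particular bias function used here, bias(u) = u + s*u*(1 - u) (the area under a line whose slope is set by s), is an assumption consistent with the description above, not necessarily the tool's exact function; the cumulative distribution is assumed to be ordered by increasing prefix length so that s > 0 favors longer (more specific) prefixes.

```python
import bisect
import random

def bias(u: float, s: float) -> float:
    """Map a uniform u in [0,1] to a biased value in [0,1] for s in [-1,1];
    s = 0 is the identity, s > 0 pushes samples toward 1, s < 0 toward 0."""
    return u + s * u * (1.0 - u)

def sample_length(lengths, cumulative, s: float) -> int:
    """lengths[i] has cumulative probability cumulative[i] (nondecreasing, ends at 1.0),
    assumed ordered by increasing prefix length."""
    u = bias(random.random(), s)
    return lengths[min(bisect.bisect_left(cumulative, u), len(lengths) - 1)]

# Hypothetical example: uniform over lengths 0..32, biased toward longer prefixes
lengths = list(range(33))
cdf = [(i + 1) / 33 for i in range(33)]
print(sample_length(lengths, cdf, s=1.0))
```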

  30. Scope Example: Uniform Distribution
  • Uniform distribution, r = 0, s = 1
  • Weight is pushed to more specific address prefixes

  31. Scope: Contrived Distributions
  • Maximum bias of ~12 bits longer or shorter in total prefix length
    • Provides for a 4096x increase or decrease in the average coverage of the filters in the database
  • As expected, negligible difference between the two distributions
    • No change in bins, only a shift in weight

  32. Scope: Real Distributions
  • Observed maximum bias of ~6 bits longer or shorter in total prefix length
    • Provides for a 64x increase or decrease in the average coverage of the filters in the database
  • Sensitivity is dependent upon the parameter file

  33. Synthetic Database Generation Summary
  • Solid foundation for a packet classification benchmark
  • May be beneficial to have a high-level skew adjustment or skew compensation coupled with scaling
    • Allow more branching for larger databases
  • Need more sample databases from other application environments in order to compile a benchmark suite of parameter files
    • Alternately, formulate parameter files manually from more detailed extensions of Woo’s descriptions

  34. Trace Generator
  • Problem: given a filter database, construct an input trace of packet headers that query the database at all “interesting” points, and an associated output trace of the best-matching (or all-matching) filters for each packet header
  • We can define “interesting” in various ways…
    • A point in each 5-d polyhedron formed by the intersections of the 5-d rectangles specified by the filters in the database (optimal solution)
      • Appears to be an O((n log n)^5) problem using fancy data structures
      • Optimizations may exist and amortized performance may be better
    • A random selection of points (least favorable solution)
    • A pseudo-random selection of points (most feasible solution? sketched below)
      • For each filter, choose a few random points covered by the filter
      • Might be able to develop some heuristics to choose points that are and are not likely to be overlapped by other filters
      • Post-process the input trace in order to generate the output trace
      • Could feed back the results of the post-processing in order to choose points for filters not appearing in the output trace
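A sketch of the pseudo-random option: pick a few random headers covered by each filter, then post-process the input trace against the full database to build the output trace. The filter representation and the matches() helper are assumptions, not a finished design.

```python
import random

def random_header_in(f):
    """f is assumed to be a dict with prefix, port-range, and protocol fields."""
    def addr(value, length):
        wild = 32 - length
        # Keep the prefix bits, randomize the remaining (wildcarded) bits
        return (value >> wild << wild) | random.getrandbits(wild) if wild else value
    return (
        addr(*f["src_prefix"]),
        addr(*f["dst_prefix"]),
        random.randint(*f["src_ports"]),
        random.randint(*f["dst_ports"]),
        f["protocol"] if f["protocol"] is not None else random.randint(0, 255),
    )

def build_trace(filters, matches, points_per_filter=3):
    """matches(header) is assumed to return the list of filters matching header."""
    input_trace = [random_header_in(f) for f in filters for _ in range(points_per_filter)]
    output_trace = [matches(h) for h in input_trace]   # post-processing step
    return input_trace, output_trace
```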

  35. The next step…
  • Finalize trace generator design, implement, and analyze (if necessary)
  • Run several packet classification algorithms through the benchmark
    • Use results to refine tools and develop a benchmarking methodology that extracts salient features
  • Investigate ways to generate broad interest in the benchmark
    • Publication
    • Web-based scripts
    • Pitch to the IETF
  • Comments, critiques, suggestions, questions?
