1 / 29

Algorithms for Advanced Packet Classification with TCAMs

Algorithms for Advanced Packet Classification with TCAMs. Karthik Lakshminarayanan UC Berkeley Joint work with Anand Rangarajan and Srinivasan Venkatachary (Cypress Semiconductors Inc.). Packet Processing Environment. Rule: acl-id src-addr src-port dst-addr dst-port proto

stacey
Download Presentation

Algorithms for Advanced Packet Classification with TCAMs

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Algorithms for Advanced Packet Classification with TCAMs Karthik Lakshminarayanan UC Berkeley Joint work with Anand Rangarajan and Srinivasan Venkatachary (Cypress Semiconductors Inc.)

  2. Packet Processing Environment Rule: acl-id src-addr src-port dst-addr dst-port proto (e.g. acl1231128.32.0.0/80-102332.12.1.1/161024tcp) Rule Action Hdr Payload permit/deny, update counter … … Search Key ACL Database • Packet matches a set of rules based on the header • Examples: routers, intrusion detection systems

  3. Ternary Content Addressable Memory 00100x1x001110x0x Search key 01110xxx001100xxx 011101xx001100x10 width = W bits 1111101x1101000xx width = W bits • Memory device with fixed width arrays • Each bit is 0, 1 or x (don’t care) • Search is performed against all entries in parallel and the first result is returned TCAM row1 row2 Output is row2 … rown

  4. TCAM: Benefits and Disadvantages • Benefits: • Deterministic Search Throughput—O(1) search • Disadvantages: • Cost • Power consumption • Current TCAM usage: • 6 million TCAM devices deployed • Used in multi-gigabit systems that have O(10,000) rules • TCAMs can support a table of size 128K ternary entries and 133 million searches per second for 144-bit keys

  5. Problems • Range Representation Problem • Multimatch Classification Problem Algorithms in software easier to implement

  6. Problems • Range Representation Problem • Multimatch Classification Problem

  7. Range Representation Problem • Representing prefixes in ternary is trivial • IP address prefixes present in rules • Representing arbitrary ranges is not easy though • port fields might contain ranges • e.g. some security applications may allow ports 1024-65535 only

  8. Earlier Approaches – I Prefix expansion of ranges: • express ranges as a union of prefixes • have a separate TCAM entry for each prefix • expansion: the number of entries a rules expands to • Example: the range [3,12] over a 4-bit field would expand to: • 0011 (3), 01xx (4-7), 10xx (8-11) and 1100 (12) • Worst-case expansion for a W-bit field is 2W-2 • example: [1,14] would expand to 0001, 001x, 01xx, 10xx, 110x, 1110 • 16-bit port field expands to 30 entries

  9. Earlier Approaches – II Database-dependent encoding: • observation: TCAM array has some unused bits • use these additional bits to encode commonly occurring ranges in the database • TCAMs with IP ACLs have ~ 36 extra bits • 144-bit wide TCAMs • 104-bits + 4-bits for IP ACL rules

  10. Earlier Approaches – II Database-dependent encoding: • observation: TCAM array has some unused bits • use these additional bits to encode commonly occurring ranges in the database • Example: Address Port … 12.123.0.0/16 20-24 … 32.12.13.0/24 1024- … 128.0.0.0/8 20-24 … Set extra bit to 1 Set extra bit to x Set extra bit to 1 If search key falls in 20-24, set extra bit to 1, else set it to 0

  11. Earlier Approaches – II Database-dependent encoding: • observation: TCAM array has some unused bits • use these additional bits to encode commonly occurring ranges in the database • Improved version:Region-based Range Encoding • Disadvantages: • database dependent  incremental update is hard

  12. Database-Independent Range Pre-Encoding • Key insight: use additional bits in a database independent way • wider representation of ranges • reduce expansion in the worst-case

  13. Database-Independent Range Pre-Encoding • Fence encoding (W bits): • total of 2W-1 bits • encoding of i has i ones preceded by 2W-i-1 zeros • e.g. W=3, f(0) = 0000000, f(2) = 0000011 • With 2W-1 bits, fence encoding achieves an expansion of 1 Theorem: For achieving a worst-case row expansion of 1 for a W-bit range, 2W-1 bits are necessary

  14. Database-Independent Range Pre-Encoding • Procedure: • split W-bit field into multiple chunks • encode each chunk using fence encoding • “combine” the chunks to form ternary entries k0 bits k1 bits k2 bits W bits Combining chunks: analogous to multi-bit tries

  15. Unibit view of DIRPE (Prefix expansion) • W=3 divided into 3 one-bit chunks • R=[1,6]—prefixes = {001,01x,10x,110} • Each level can contribute to at most 2 prefixes (but the top level) xxx 0xx 1xx 00x 01x 10x 11x 000 001 010 011 100 101 110 111

  16. Multi-bit view of DIRPE • Legend: • 8-bit field (W=8) • k0=2, k1=3, k2=3 • R=[11,54] • =[013, 066] • split chunk = 1 Worst case expansion = 2W/k – 1 Number of extra bits needed = (2k-1)W/k - W

  17. Comparison of Expansion Worst-case expansion Real-life expansion DIRPE + DB-dependent  Net expansion was 1.12

  18. Region-based Encoding (with r regions) DIRPE (with k-bit chunks) Prefix Expansion DIRPE + Region-based Metric (2k-1) log2r F( W(2k-1) 2n-1 k F( Extra bits 0 F(log2r + ) - W) 2n-1 k ) r + r Worst-case capacity degradation 2log2r 2W )F )F - 1 ( ( (2W-2)F (2log2r)F k k Cost of an incremental update W )F ) O(( O(N) O(WF) O(N) k Pre-computed table of size: Both pieces of logic from previous two columns W.2k Overhead on the packet processor ) O( 2n-1 ) F.2W) O((log2r+ k None r logic gates ( or ) O(nF) comparators of width W bits

  19. DIRPE: Summary • Database independent • Scales well for large databases • Good incremental update properties • Additional bits needed • Small logic needed for modifying search key

  20. Problems • Range Expansion Problem • Multimatch Classification Problem

  21. Multimatch Classification Problem • TCAM search primitive: return first matching entry for a key • Multimatch requirement: return k matches (or all matches) for a key • security applications where all signatures that match this packet need to be found • accounting applications where counters have to be updated for all matching entries

  22. Earlier Approaches match Entry Invalidation scheme: • maintain state of multimatch using an additional bit in TCAM called “valid” bit TCAM array x 00100x1x001110x0x Search key x 01110xxx001100xxx 0 011101xx001100x10 1 … valid bit 1111101x1101000xx x valid bit

  23. Earlier Approaches Entry Invalidation scheme: • maintain state of multimatch using an additional bit in TCAM called “valid” bit • Disadvantage: • ill-suited for multi-threaded environments • need one bit for each thread

  24. Earlier Approaches Geometric intersection scheme: • construct geometric intersection (cross-products) of the fields and place in TCAM • pre-processing step is expensive • search is fast • Disadvantage: • does not scale well in capacity • for router dataset: expansion of 25—100

  25. Multimatch Using Discriminators (MUD) • Observation: after index j is matched, the ACL has to be searched for all indices >j • Basic idea: • store a discriminator field with each row that encodes the index of the row • to search rows with index >j, the search key is expanded to prefixes that correspond to >j • multiple searches are then issued

  26. MUD: Example match TCAM array rule0 0000 001x 01xx 1xxx Search key rule1 0001 xxxx 011101xx00 rule2 0010 … discriminator discriminator field

  27. Geometric Intersection-based Entry Invalidation MUD Metric Multi-threading support Yes Yes No Worst-case TCAM entries for N rules N O(NF) N O(N) O(NF) Update cost O(N) 1 + d + (d-1)(k-2) Cycles for k multi-matches 7k k with DIRPE: 1 + d(k-1) r without DIRPE: d Extra bits 0 0 with DIRPE: log2(d/r) + (d-r) + (2r-1) Small state machine logic; can be implemented using a few hundred gates or a few microcode instructions Small state machine logic; can be implemented using a few hundred gates or a few microcode instructions Overhead on the packet processor None

  28. MUD: Summary • No per-search state—multi-threaded • Incremental updates fast • Scales well to large databases • Additional bits needed • Extra search cycles • Can still support Gbps speeds

  29. Conclusion • Range expansion problem: proposed DIRPE, a database independent encoding • scales to large number of ranges • good incremental update properties • Multimatch classification problem: MUD • suitable for multithreaded environments • scales to large databases • No change to hardware  easy to deploy

More Related