EaseCAM : An Energy And Storage Efficient TCAM-based IP-Lookup Architecture

V.C. Ravikumar, Rabi Mahapatra (Texas A&M University); Laxmi Bhuyan (University of California, Riverside)

Overview • Introduction • Research Goal • Proposed approach • Results • Conclusion & Future work

Introduction [Figure: header processing. Labels: Data, Hdr, IP Lookup, Packet Queue, IP Address, Next Hop, Routing Table, DRAM]

Introduction • HW and SW solutions for IP lookup • Software solutions are unable to match link speed • Hardware solutions can accommodate today’s link speeds • TCAMs are the most popular hardware device • Consume up to 15 W per chip (4-8 chips) • Increased cooling costs and fewer ports

Current Approach • Power Reduction in TCAM • Partitioning of TCAM Array [Infocom’03, Hot Interconnect’02] • Compaction (minimization) [Micro’02] • Update techniques [Micro’02] • Routing update • TCAM updates

Bottleneck with existing approaches • Power reduction • Number of entries enabled is not bounded • Does not avoid storing redundant information • Update • Minimization techniques are not incremental • Update time is not independent of routing table size

Motivation • Solution for bounded and reduced power consumption • Truly incremental Routing and TCAM update

Contributions • A pipelined architecture for IP Lookup • New prefix properties (prefix aggregation and prefix expansion) • Upper bound on number of entries enabled (256 x 3) • Novel Page filling, memory management and incremental update techniques

Solution: Prefix properties • Prefix Aggregation • Example prefixes: 128.194.1.1/32, 128.194.1.2/32, 128.194.1.8/30, 128.194.1.16/28, 128.194.1.0/24 • 128.194.1.0/24 is the LCS for the given set of prefixes (rounded to nearest octet) • Prefixes aggregated based on LCS mostly have the same next hop • Gives a bound on the number of prefixes minimized (256)
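The aggregation in the example above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function name `lcs_rounded_to_octet` is hypothetical, and it assumes that "rounded to nearest octet" means rounding the common leading bits down to an octet boundary, so that the result still covers every prefix in the set.

```python
import ipaddress

def lcs_rounded_to_octet(prefixes):
    """Common leading bits of all prefixes, rounded DOWN to an octet boundary.

    Assumption (not stated on the slide): rounding goes downwards so the
    returned prefix still covers every input prefix.
    """
    nets = [ipaddress.ip_network(p, strict=False) for p in prefixes]
    # The common part cannot be longer than the shortest prefix in the set.
    max_len = min(n.prefixlen for n in nets)
    first = int(nets[0].network_address)
    common = 0
    for bit in range(max_len):
        probe = 1 << (31 - bit)
        if all((int(n.network_address) & probe) == (first & probe) for n in nets):
            common += 1
        else:
            break
    common -= common % 8  # round down to the octet boundary
    mask = (0xFFFFFFFF << (32 - common)) & 0xFFFFFFFF
    return ipaddress.ip_network(f"{ipaddress.IPv4Address(first & mask)}/{common}")

slide_prefixes = [
    "128.194.1.1/32", "128.194.1.2/32", "128.194.1.8/30",
    "128.194.1.16/28", "128.194.1.0/24",
]
print(lcs_rounded_to_octet(slide_prefixes))  # 128.194.1.0/24
```

Because at most 8 bits vary below a /24 page id, the set covered by one such LCS stays small, which is consistent with the 256 bound quoted on the slide.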

Solution: Prefix properties TABLE I. Comparison of prefix compaction using the prefix aggregation property and Espresso II for the attcanada and bbnplanet routers

Solution: Prefix Properties • Prefix expansion • Prefixes having the same length can be minimized • To increase minimization, extend prefixes of different lengths to the nearest octet by adding don’t-cares • Extending to the nearest octet is useful for incremental update • Example: 100101 → 100101XX, 1011011 → 1011011X, 1011111 → 1011111X, 1011 → 1011XXXX
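The expansion step is small enough to show directly. The sketch below is only an illustration of padding a binary prefix with don't-care symbols up to the next octet boundary; the name `expand_to_octet` and the use of "X" strings are assumptions made for the example.

```python
def expand_to_octet(bits):
    """Pad a binary prefix string with 'X' (don't-care) up to the next octet boundary."""
    pad = (-len(bits)) % 8  # 0 when the length is already a multiple of 8
    return bits + "X" * pad

for prefix in ["100101", "1011011", "1011111", "1011"]:
    print(prefix, "->", expand_to_octet(prefix))
```

After expansion all four prefixes have the same word length (8), so they can be handed to a logic minimizer as one group.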

Solution: Prefix properties • Overlapping prefixes • Prefixes of length < 8 are not present in the routing table • Number of matching prefixes for an IP address is ≤ 25 • This property is used to selectively enable a bounded number of entries in the TCAM (256 x 3)
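The bound of 25 follows from counting prefix lengths: an address can match at most one prefix per length, and lengths 8 through 32 give 32 - 8 + 1 = 25 candidates. A short Python check, with hypothetical helper names and a made-up three-entry table:

```python
import ipaddress

MIN_LEN, MAX_LEN = 8, 32  # slide: no prefix shorter than 8 bits is present

def candidate_prefixes(addr):
    """Every prefix (one per length) that could cover addr."""
    return [ipaddress.ip_network(f"{addr}/{n}", strict=False)
            for n in range(MIN_LEN, MAX_LEN + 1)]

def matching_prefixes(addr, table):
    """Prefixes from table that actually cover addr."""
    table_set = {ipaddress.ip_network(p) for p in table}
    return [c for c in candidate_prefixes(addr) if c in table_set]

# Illustrative table, not data from the slides.
table = ["128.194.1.0/24", "128.194.1.1/32", "10.0.0.0/8"]
print(len(candidate_prefixes("128.194.1.1")))     # 25
print(matching_prefixes("128.194.1.1", table))
```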

Solution: Architecture • 2-level architecture, w1 bits in the 1st level and 32-w1 bits in the 2nd level • Segment size corresponding to the first w1 (8) bits is variable • Power is bounded by segment size [Figure: Segmented architecture for routing lookup using TCAM. The 1st level is indexed by W1 = 8 bits (entries 1.x, 2.x, ... 127.x, 128.x, ... 254.x, 255.x); each entry selects a variable-sized segment of 24-bit entries in the 2nd level.]
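As a rough software analogy of the two-level split, the address can be cut into a w1-bit segment index and a (32 - w1)-bit remainder. The function name `split_address` is hypothetical; the hardware does this with wiring rather than arithmetic.

```python
import ipaddress

W1 = 8  # first-level width used on the slide

def split_address(addr, w1=W1):
    """Return (segment index from the first w1 bits, remaining 32 - w1 bits)."""
    value = int(ipaddress.IPv4Address(addr))
    segment = value >> (32 - w1)
    rest = value & ((1 << (32 - w1)) - 1)
    return segment, rest

segment, rest = split_address("128.194.1.1")
print(segment)            # 128, i.e. the segment for 128.x
print(f"{rest:024b}")     # the 24 bits searched in the 2nd level
```

Only the segment selected by the first level needs to be enabled for the second-level search, which is why the slide ties the power bound to segment size.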

Solution: Architecture • Memory Compaction • Apply prefix properties to remove redundancies • Apply pruning, prefix aggregation and minimization in succession • Put all prefixes of length < w1 into the bucket (rarely occurring prefixes) [Caption: Total number of entries after compaction]

Solution: Architecture • Paged TCAM architecture • Group the prefixes of length > w1 based on their LCS • The LCS values (cubes) cover the prefixes • The cubes now correspond to the page id • Prefixes covered by a cube are stored in the actual pages • Note: pages formed using the LCS as page id can result in under-utilization
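A minimal sketch of the grouping step follows. Everything in it is an assumption made for illustration: the slides do not define how the LCS of a single prefix is computed, so `page_id` truncates each prefix to the largest octet boundary not exceeding its length, capped at 24 bits (which reproduces the 128.194.1.0/24 example), and prefixes of length exactly w1 are sent to the bucket.

```python
import ipaddress
from collections import defaultdict

W1 = 8  # first-level width used on the slides

def page_id(net):
    # Assumption: octet-aligned truncation, capped at /24, stands in for the LCS.
    length = min(24, (net.prefixlen // 8) * 8)
    return ipaddress.ip_network(f"{net.network_address}/{length}", strict=False)

def build_pages(prefixes, w1=W1):
    """Group prefixes longer than w1 by page id; put the rest in the bucket."""
    pages = defaultdict(list)
    bucket = []
    for p in prefixes:
        net = ipaddress.ip_network(p)
        if net.prefixlen > w1:
            pages[page_id(net)].append(net)
        else:
            bucket.append(net)  # assumption: length == w1 also goes here
    return dict(pages), bucket

pages, bucket = build_pages([
    "128.194.1.1/32", "128.194.1.8/30", "128.194.1.0/24",
    "128.195.0.0/16", "10.0.0.0/8",
])
for pid, members in pages.items():
    print(pid, "->", [str(m) for m in members])
print("bucket:", [str(b) for b in bucket])
```

In this toy input the page for 128.195.0.0/16 holds a single entry, which illustrates the under-utilization the slide warns about.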

[Figure: Architecture block diagram. Labels: IP address (32 bits), Comparator, Enable Line, Page Table 1 ... Page Table I ... Page Table N, Page I, Page I+1 ... Page I+Cmax, page word length (32 - w1) bits, page size β, Bucket (N*α entries).]

How to avoid under-utilization? • LCS aggregation • Aggregate prefixes having different LCS by modifying the cube (for example, 101 and 100 are both covered by 10*) • Set the page size to an optimal value: avoid pages that are too large or too small • Observe: the maximum size of a page can be 256, based on the above property

Solution: Page Filling Algorithm • Page Filling Heuristics (2) • Generate cubes such that each covers the maximum number of prefixes while page size < 256 • Aggregate the page IDs in the page tables and store them in comparators for a 0th-level lookup • Find the total memory consumed (pages, page tables and comparators) for different values of w1 • Get the optimal value of w1 and page size β for which total memory is the least

Solution: Page Filling • The page filling heuristics ensure: • No page has more than β*γ entries, where γ is the page fill-factor • The number of cubes that cover all the prefixes is minimal • Total memory consumption is the least for a specific value of w1 and β
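The β*γ capacity constraint can be illustrated with a naive sequential fill. This is deliberately not the cube-generating heuristic from the slides, which is not specified here; it only shows how the fill factor γ leaves headroom in each page of size β. The name `fill_pages` is hypothetical.

```python
import math

def fill_pages(entries, beta, gamma):
    """Split entries into pages holding at most floor(beta * gamma) entries each.

    A naive stand-in for the page filling heuristics: it enforces only the
    capacity constraint, not cube minimality.
    """
    capacity = math.floor(beta * gamma)
    if capacity < 1:
        raise ValueError("beta * gamma must allow at least one entry per page")
    return [entries[i:i + capacity] for i in range(0, len(entries), capacity)]

pages = fill_pages(list(range(10)), beta=4, gamma=0.75)
print([len(p) for p in pages])  # [3, 3, 3, 1]
```

A lower γ wastes more TCAM space up front but leaves free slots in every page for later insertions.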

[Figure: Architecture block diagram, highlighting the power-enabled blocks in EaseCAM]

Solution: Architecture • Bucket • Prefixes of size < w1 are stored in bucket • Word length of bucket is 32 • Either bucket or pages are searched during each lookup in the 2nd level

Solution: Architecture • Empirical model for memory • α: fraction of total entries in the bucket • α_f: bucket fill factor • γ: page fill factor • C_max: number of page ids in the page table • N: the number of entries • Page_max: total number of pages • β_w1: the optimal page size • Minimum memory requirement = β_w1 * Page_max * (32-w1)/32 + Page_max + Page_max/C_max + N*α/α_f
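The empirical model can be evaluated directly. In the sketch below the function and parameter names are hypothetical, and the mapping of the four terms to pages, page tables, comparators and bucket is an inference from the order in which the slides list those components, not something the slide states.

```python
def min_memory_entries(beta_w1, page_max, c_max, n, alpha, alpha_f, w1):
    """Evaluate the memory model term by term (units as in the slide's formula)."""
    pages = beta_w1 * page_max * (32 - w1) / 32  # pages, scaled to 32-bit words
    page_tables = page_max                       # assumed: one id per page
    comparators = page_max / c_max               # assumed: one per page table
    bucket = n * alpha / alpha_f                 # bucket entries with fill factor
    return pages + page_tables + comparators + bucket

# Illustrative numbers only, not taken from the presentation.
print(min_memory_entries(beta_w1=256, page_max=100, c_max=10,
                         n=1000, alpha=0.1, alpha_f=0.5, w1=8))
```

Sweeping `w1` and `beta_w1` over candidate values and keeping the minimum mirrors the search for the optimal w1 and β described on the page filling slide.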

Incremental Updates • 100s updates/sec and 10 updates/sec after routing flaps • Insertion • If length of prefix > w1: • Minimize the prefix and find the new cube • Number of prefixes minimized < 256 • Update the page table and comparator if required • Update the TCAM with changed entries • TCAM insertion time and minimization time are both bounded

Solution: Incremental Update • Deletion • Delete the prefix from TCAM • Update the page table entry and comparator if required • Total number of prefixes minimized < 256 • TCAM update time is also bounded
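Why insertion and deletion are bounded can be seen in a toy model where an update touches exactly one page. The class below is a sketch under stated assumptions: `PagedTable` and its methods are hypothetical, the page id is an octet-aligned truncation capped at /24 (a stand-in for the LCS), and logic minimization, page tables and comparators are not modelled.

```python
import ipaddress

class PagedTable:
    """Toy model: each update reads and writes a single page of bounded size."""

    def __init__(self, max_page_size=256):
        self.max_page_size = max_page_size
        self.pages = {}  # page id -> set of prefixes stored in that page

    @staticmethod
    def page_id(net):
        # Assumption: octet-aligned truncation, capped at /24, stands in for the LCS.
        length = min(24, (net.prefixlen // 8) * 8)
        return ipaddress.ip_network(f"{net.network_address}/{length}", strict=False)

    def insert(self, prefix):
        net = ipaddress.ip_network(prefix)
        pid = self.page_id(net)
        page = self.pages.setdefault(pid, set())
        if net not in page and len(page) >= self.max_page_size:
            # A real design would handle overflow by recomputing pages.
            raise OverflowError(f"page {pid} is full")
        page.add(net)
        return pid  # the only page that changed

    def delete(self, prefix):
        net = ipaddress.ip_network(prefix)
        pid = self.page_id(net)
        page = self.pages.get(pid)
        if page is not None:
            page.discard(net)
            if not page:
                del self.pages[pid]  # free the empty page
        return pid

table = PagedTable()
touched = table.insert("128.194.1.1/32")
table.insert("128.194.1.8/30")
print(touched, len(table.pages[touched]))
table.delete("128.194.1.1/32")
```

Since no page exceeds `max_page_size` entries, the work per update is independent of the total routing table size in this model.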

Solution: Incremental Update [Caption: Comparison of incremental update time]

Solution: Memory Management • Managing page overflow • Reason: Lower value of γ. • Pages with same cube are recomputed • Free pages available in TCAM are used • Comparators are also updated when required

Results • Power consumption per lookup [Captions: bbnplanet router; attcanada router]

Results • Case study • Memory requirements (γ=1 and α=1) [Caption: Reduction in memory requirements]

Results: Access time • Pre-estimation using Cacti 3.0 on CAM structure [Caption: Reduction in access time]

Results: Power • Pre-estimation using Cacti 3.0 on CAM structure [Caption: Reduction in power]

Conclusion • Significant reduction in memory consumption based on prefix compaction • Pipelined architecture to store prefixes to achieve bounded power consumption • Efficient memory management and incremental update techniques

Future work • Apply Cacti model to TCAM structure • Identify/design low-power TCAM cell • Consider classification together with IP-lookup • Fast on-chip logic minimization • Explore parallel architectures & algorithms for IP processing.

Thank You !! Questions?
