High-performance TCAM-based IP Lookup Engines

High-performance TCAM-based IP Lookup Engines Authors: Hui Yu, Jing Chen, Jianping Wang, and S.Q. Zheng Publisher: IEEE INFOCOM 2008 Presenter: 林呈俞 Date: 2008/9/24

Outline • Introduction • Previous works • MSMB scheme • MSMB-PT scheme • MSMB-LPT scheme • Goals of this paper • Proposed works • M-MSMB-LPT scheme • MSMB-LPT-I scheme • Experimental results

Introduction (1/3) • To achieve high IP lookup performance, TCAMs have been proposed for implementing IP lookup accelerators. • In practice, one TCAM-based routing table is shared by multiple packet streams in one line card or by multiple line cards. • Previous works reconfigure a TCAM into several independent blocks: • MSMB • MSMB-PT • MSMB-LPT

Introduction (2/3) • MSMB (Multi-Selector and Multi-Block) scheme • Proposed in [6] to reconfigure a TCAM into several independent blocks so that parallel IP lookup is possible. • With K TCAMs, instead of performing only one lookup in each cycle, all TCAMs can be used concurrently for different lookups. • One would need M parallel RDs for this system.

Introduction (3/3) • MSMB-PT (Popular-Prefix Table) scheme • This scheme is based on the temporal locality of packet destinations. • It aims to alleviate the TCAM contention problem caused by traffic bias. • Popular-Prefix Table (PT): caches some of the prefixes recently used by all inputs.

MSMB-LPT (Local PT) (1/2) • A flow is a stream of packets that are transmitted as a bursty sequence. • For a given router R, the packets of flows arriving at the same input of R exhibit a bias toward a small set of IP prefixes. • For any bursty traffic period of an input of R, this bias of IP addresses is called the temporal locality of flows. • The major differences between MSMB-LPT and MSMB-PT are as follows: • MSMB-LPT improves on MSMB-PT by up to 250% (speedup), 80% (hit ratio), 82% (TCAM contention), and 71% (TCAM power consumption). • The LPT helps to reduce the number of accesses to the TCAM blocks and the number of TCAM contentions.

MSMB-LPT (Local PT) (2/2) • Local Popular-Prefix Table (LPT): used to dynamically store recently referenced IP prefixes requested from input i. • Contention Resolver (CR): chooses one request according to a priority scheme and passes it to the TCAM.
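
A minimal sketch of how a per-input LPT might behave, assuming FIFO replacement (the policy the experiments in this deck use) and longest-prefix match among the cached entries. The class name `LocalPrefixTable`, its methods, and the next-hop strings are hypothetical and not taken from the paper. The sketch also ignores whether a cached short prefix could shadow a longer prefix held only in the TCAM.

```python
from collections import OrderedDict
import ipaddress


class LocalPrefixTable:
    """Hypothetical model of an LPT: at most n cached prefixes, FIFO eviction."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("LPT size n must be at least 1")
        self.n = n
        # Maps ip_network -> next hop; OrderedDict remembers insertion order.
        self.entries = OrderedDict()

    def lookup(self, addr):
        """Return the next hop of the longest cached prefix covering addr, or None on a miss."""
        ip = ipaddress.ip_address(addr)
        best = None
        for net, hop in self.entries.items():
            if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, hop)
        return None if best is None else best[1]

    def insert(self, prefix, next_hop):
        """Cache a prefix after a TCAM hit, evicting the oldest entry when full."""
        net = ipaddress.ip_network(prefix)
        if net in self.entries:
            return  # FIFO: a repeated reference does not refresh the entry's age
        if len(self.entries) >= self.n:
            self.entries.popitem(last=False)  # drop the oldest inserted prefix
        self.entries[net] = next_hop
```

On a miss (`lookup` returns `None`), the request would go on to the CR and the TCAM block; on a hit, no TCAM access is needed, which is how the LPT reduces contention.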

Goals of this paper • How to design a TCAM-based IP lookup engine that • improves MSMB-LPT without using more HW resources? • satisfies given performance requirements? • For large m (inputs): • How to design a scalable TCAM-based IP lookup engine? • How to find tradeoffs among cost, performance, and reliability?

Proposed work (1/5) • Definitions: • MSMB-LPT has a configuration (m, n, k) • m inputs • k TCAM blocks • LPTs of size n • Total number of prefixes M (each block contains M/k prefixes). • The parameters m and k are carefully selected to achieve optimized cost and performance. • Are there better MSMB schemes for given m and k? • Two proposed schemes: • M-MSMB-LPT • MSMB-LPT-I

Proposed work (2/5) • Multiple (M)-MSMB-LPT • For large m (inputs), we propose to use w identical copies of MSMB-LPT of configuration (m’, n, k), where m’ = m / w. • Input i*m’ + j serves as the j-th input of the (i+1)-th MSMB-LPT.
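
The input assignment above can be sketched as a small helper. This assumes inputs are numbered from 1 to m and that w divides m evenly; the function name `assign_input` is hypothetical.

```python
def assign_input(g, m, w):
    """Map global input g (1..m) to (copy index, local input index), both 1-based.

    Follows the rule that input i*m' + j is the j-th input of the
    (i+1)-th MSMB-LPT, with m' = m / w.
    """
    if m % w != 0:
        raise ValueError("this sketch assumes w divides m")
    if not 1 <= g <= m:
        raise ValueError("input index out of range")
    m_prime = m // w
    i, j_minus_1 = divmod(g - 1, m_prime)
    return i + 1, j_minus_1 + 1
```

For example, with m = 12 and w = 2 (so m’ = 6), `assign_input(7, 12, 2)` returns `(2, 1)`: input 7 is the first input of the second MSMB-LPT copy.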

Proposed work (3/5) • Multiple (M)-MSMB-LPT • The w TCAM blocks TCAMj,u have the same content as TCAMu in MSMB-LPT, where j = 1 ~ w. • We say that an M-MSMB-LPT has configuration (m, n, w, k) if it has w MSMB-LPTs of configuration (m’, n, k). • In an M-MSMB-LPT scheme, the w MSMB-LPTs operate completely independently. • [Figure: MSMB-LPTj, with k CRs and k TCAM blocks, serves Input (j-1)*m’ + 1 through Input j*m’.]

Proposed work (4/5) • MSMB-LPT-Interleaved TCAMs (MSMB-LPT-I) • An MSMB-LPT-I of configuration (m, n, w, k) has • m inputs, and LPTs of size n. • wk TCAM blocks that are partitioned into k groups, each called a TCAM bundle. • The w TCAM blocks in the j-th TCAM bundle contain the same content as that of TCAMj in the MSMB-LPT scheme. • [Figure: Input 1, Input 2, ..., Input m connected to k bundles.]

Proposed work (5/5) • [Figure: the search processes run concurrently for j = 1 ~ k and i = 1 ~ m; ni-th key from input i.] • The concurrent TCAM-search processes are coordinated by the CR, which can be implemented as a round-robin m-to-w selector.
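
One way a round-robin m-to-w selector could behave is sketched below. This is only an illustration under stated assumptions: inputs are indexed from 0 here (the slides count from 1), at most w requests are granted per cycle, and the search for the next grant resumes just after the last granted input. The class name `RoundRobinSelector` and its interface are hypothetical.

```python
class RoundRobinSelector:
    """Hypothetical m-to-w round-robin arbiter for one TCAM bundle."""

    def __init__(self, m, w):
        self.m = m            # number of inputs competing for the bundle
        self.w = w            # number of requests that can be served per cycle
        self.next_start = 0   # input that has highest priority in the next cycle

    def select(self, requests):
        """Return the inputs granted this cycle (at most w), in priority order."""
        pending = set(requests)
        granted = []
        for offset in range(self.m):
            i = (self.next_start + offset) % self.m
            if i in pending:
                granted.append(i)
                if len(granted) == self.w:
                    break
        if granted:
            # Rotate priority so the input after the last winner goes first next time.
            self.next_start = (granted[-1] + 1) % self.m
        return granted
```

Inputs that request but are not granted in a cycle are the ones that experience contention and must retry in a later cycle.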

Experimental results (1/9) • We conduct a series of simulations on M-MSMB-LPT and MSMB-LPT-I. • A first-in-first-out (FIFO) replacement policy is used for LPT updates. • Round-robin (RR) arbitration is used for TCAM contention resolution. • Two packet traces are used in the simulations: • 1. generated according to the routing table described in [17]. • 2. derived from actual packet flows given in [19]. • The performance of an M-MSMB-LPT is determined by a single component MSMB-LPT. • The performance of MSMB-LPT and M-MSMB-LPT can be derived from the performance of MSMB-LPT-I with configuration (m, n, w, k) as follows. • (m, n, 1, k) = MSMB-LPT with (m, n, k). • (m, n, 1, k) = M-MSMB-LPT with (w*m, n, w, k). • Example: • MSMB-LPT-I with (6, n, 1, 4) can be used to indicate the performance of M-MSMB-LPT with (12, n, 2, 4) as well as (18, n, 3, 4). • [Figure labels: # bundles, # blocks.]
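
The configuration equivalence used above can be written as a small helper, shown here only as an illustration. The function name `m_msmb_lpt_equivalent` is hypothetical, and n is passed as a plain number although the slides leave it symbolic.

```python
def m_msmb_lpt_equivalent(m, n, k, w):
    """Return the M-MSMB-LPT configuration represented by MSMB-LPT-I (m, n, 1, k).

    Per the slide, (m, n, 1, k) stands for M-MSMB-LPT with (w*m, n, w, k):
    w independent copies, each serving m inputs with k TCAM blocks.
    """
    if w < 1:
        raise ValueError("w must be at least 1")
    return (w * m, n, w, k)


# Reproduces the slide's example for MSMB-LPT-I (6, n, 1, 4), with n = 8 chosen arbitrarily.
examples = [m_msmb_lpt_equivalent(6, 8, 4, w) for w in (2, 3)]
```

With these inputs, `examples` holds the configurations (12, n, 2, 4) and (18, n, 3, 4) for n = 8.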

Experimental results (2/9) • Performance metrics • TCAM contention ratio • Speedup over naïve MSMB • TCAM utilization • [Formula figures; labeled quantities: number of contentions at TCAM blocks; total number of key searches; total number of parallel cycles to complete IP lookup for all packets in a trace; AMSMB-LPT-I(j): total number of cycles in which the TCAMj blocks are searched.]

Experimental results (3/9) • Power consumption

Experimental results (4/9) • Speedup • [Figure panels: 48 TCAM blocks; 16 TCAM blocks.]

Experimental results (5/9) • Power consumption

Experimental results (6/9) • Contention ratio • 36 inputs and 4 TCAM blocks in each bundle. • Increase the number of TCAM bundles. • From 1 to 2 • From 4 to 6 • Configurations: (36, n, w, 4), w = 1, 2, 4, 6

Experimental results (7/9) • Given the available TCAM resources, such as • # TCAM bundles: 2 • # TCAM blocks in each bundle: 4 • it is important to know the expected contention ratio under different numbers of inputs. • Configurations: (m, n, 2, 4), m = 6, 12, 18, 36

Experimental results (8/9) • Speedup gain from increasing the number of TCAM bundles for a given number of inputs. • Configurations: (36, n, w, 4), w = 1, 2, 4, 6

Experimental results (9/9) • The speedup changes with the number of inputs. • Configurations: (m, n, 2, 4), m = 6, 12, 18, 36
