High-performance TCAM-based IP Lookup Engines • Authors: Hui Yu, Jing Chen, Jianping Wang and S.Q. Zheng • Publisher: IEEE INFOCOM 2008 • Presenter: 林呈俞 • Date: 2008/9/24
Outline • Introduction • Previous works • MSMB scheme • MSMB-PT scheme • MSMB-LPT scheme • Goals of this paper • Proposed works • M-MSMB-LPT scheme • MSMB-LPT-I scheme • Experimental results
Introduction (1/3) • To achieve high IP lookup performance, TCAMs have been proposed for implementing IP lookup accelerators. • In practice, one TCAM-based routing table is shared by multiple packet streams, either within one line card or across multiple line cards. • Previous works reconfigure a TCAM into several independent blocks: • MSMB • MSMB-PT • MSMB-LPT
Introduction (2/3) • MSMB (Multi-Selector and Multi-Block) scheme • Proposed in [6] to reconfigure a TCAM into several independent blocks so that parallel IP lookup is possible. • With K TCAMs, instead of performing only one lookup per cycle, all TCAMs can be used concurrently for different lookups. • One would need M parallel RDs for this system.
Introduction (3/3) • MSMB-PT (Popular-Prefix Table) scheme • This scheme is based on the temporal locality of packet destinations. • It aims to alleviate the TCAM contention problem caused by traffic bias. • Popular-Prefix Table (PT): caches some of the prefixes recently used by all inputs.
MSMB-LPT (Local PT) (1/2) • A flow is a stream of packets transmitted as a bursty sequence. • For a given router R, the packets of flows arriving at the same input of R are biased toward a small set of IP prefixes. • For any bursty traffic period of an input of R, this bias of IP addresses is called the temporal locality of flows. • The major differences between MSMB-LPT and MSMB-PT are as follows: • MSMB-LPT improves on MSMB-PT by up to 250% (speedup), 80% (hit ratio), 82% (TCAM contention), and 71% (TCAM power consumption). • LPT helps reduce the number of accesses to the TCAM blocks and the number of TCAM contentions.
MSMB-LPT (Local PT) (2/2) • Local Popular-Prefix Table (LPT): dynamically stores the recently referenced IP prefixes requested from input i. • Contention Resolver (CR): chooses one request according to a priority scheme and passes it to the TCAM.
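A per-input LPT using the FIFO replacement policy that the experiments apply to LPT updates might look like the following Python sketch. It is not the authors' implementation. The class name `LocalPopularPrefixTable`, the bit-string representation of prefixes and addresses, and the next-hop values are illustrative assumptions. The sketch returns the longest *cached* match only and ignores the case where a longer matching prefix exists in the TCAM but not in the LPT.

```python
from collections import deque
from typing import Deque, Dict, Optional


class LocalPopularPrefixTable:
    """Hypothetical per-input LPT with room for n (prefix, next hop) entries.

    Prefixes and destination addresses are bit strings such as "1011".
    Entries are evicted in first-in-first-out order.
    """

    def __init__(self, n: int) -> None:
        if n <= 0:
            raise ValueError("LPT size n must be positive")
        self.n = n
        self.order: Deque[str] = deque()    # prefixes in insertion order
        self.entries: Dict[str, str] = {}   # prefix -> next hop

    def lookup(self, addr: str) -> Optional[str]:
        """Return the next hop of the longest cached prefix matching addr, or None on a miss."""
        best: Optional[str] = None
        for prefix in self.entries:
            if addr.startswith(prefix) and (best is None or len(prefix) > len(best)):
                best = prefix
        return None if best is None else self.entries[best]

    def insert(self, prefix: str, next_hop: str) -> None:
        """Cache a prefix returned by the TCAM, evicting the oldest entry if full."""
        if prefix in self.entries:
            return  # FIFO: re-inserting does not refresh an entry's position
        if len(self.order) == self.n:
            oldest = self.order.popleft()
            del self.entries[oldest]
        self.order.append(prefix)
        self.entries[prefix] = next_hop
```

On a miss, the input would forward the key to the TCAM blocks (through the CR) and then call `insert` with the prefix that comes back. FIFO is cheap to implement in hardware because it never reorders entries on a hit.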
Goals of this paper • How to design a TCAM-based IP lookup engine that • improves MSMB-LPT without using more HW resources? • satisfies given performance requirements? • For large m (number of inputs): • How to design a scalable TCAM-based IP lookup engine? • How to find tradeoffs among cost, performance and reliability?
Proposed work (1/5) • Definitions: • An MSMB-LPT has a configuration (m, n, k) with • m inputs • k TCAM blocks • LPTs of size n • Total number of prefixes M (each block contains M/k prefixes). • The parameters m and k are carefully selected to optimize cost and performance. • Are there better MSMB schemes for given m and k? • Two proposed schemes: • M-MSMB-LPT • MSMB-LPT-I
Proposed work (2/5) • Multiple MSMB-LPT (M-MSMB-LPT) • For large m (number of inputs), we propose using w identical copies of an MSMB-LPT with configuration (m', n, k), where m' = m / w. • Input i*m' + j serves as the j-th input of the (i+1)-th MSMB-LPT.
Proposed work (3/5) • Multiple MSMB-LPT (M-MSMB-LPT) • The w TCAM blocks TCAM_{j,u} have the same content as TCAM_u in MSMB-LPT, where j = 1, …, w. • We say that an M-MSMB-LPT has configuration (m, n, w, k) if it has w MSMB-LPTs of configuration (m', n, k). • In an M-MSMB-LPT scheme, the w MSMB-LPTs operate completely independently. • [Figure: inputs (j-1)*m' + 1, (j-1)*m' + 2, …, j*m' feed MSMB-LPT_j, which contains k CRs and k TCAM blocks.]
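The input-to-copy rule and the replicated TCAM contents of an M-MSMB-LPT can be sketched in Python. The function names `local_port` and `replicate_blocks`, the 1-based indexing, and the requirement that w divides m are assumptions for illustration. Prefixes are plain strings.

```python
from typing import List, Tuple


def local_port(g: int, m: int, w: int) -> Tuple[int, int]:
    """Map global input g (1-based) of an M-MSMB-LPT to (copy, local input).

    Input i*m' + j (1 <= j <= m') is the j-th input of the (i+1)-th
    MSMB-LPT, with m' = m / w. Assumes w divides m.
    """
    if m % w != 0:
        raise ValueError("this sketch assumes w divides m")
    if not 1 <= g <= m:
        raise ValueError("input index out of range")
    m_prime = m // w
    i, j0 = divmod(g - 1, m_prime)   # i selects the copy, j0 the 0-based local input
    return i + 1, j0 + 1


def replicate_blocks(blocks: List[List[str]], w: int) -> List[List[List[str]]]:
    """Give each of the w copies its own k blocks: TCAM_{j,u} holds TCAM_u's prefixes."""
    return [[list(block) for block in blocks] for _ in range(w)]
```

With m = 36 and w = 2 (so m' = 18), inputs 1 to 18 map to copy 1 and inputs 19 to 36 map to copy 2, matching the (j-1)*m' + 1 … j*m' ranges above. Because the copies share nothing at run time, each one can be analysed as an ordinary MSMB-LPT of configuration (m', n, k).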
Proposed work (4/5) • MSMB-LPT with Interleaved TCAMs (MSMB-LPT-I) • An MSMB-LPT-I of configuration (m, n, w, k) has • m inputs and LPTs of size n. • wk TCAM blocks partitioned into k groups, each called a TCAM bundle. • The w TCAM blocks in the j-th TCAM bundle contain the same content as TCAM_j in the MSMB-LPT scheme. • [Figure: inputs 1, 2, …, m connected to k TCAM bundles.]
Proposed work (5/5) • [Figure: a search process runs concurrently for each j = 1, …, k and i = 1, …, m, handling the n_i-th key from input i.] • The concurrent TCAM-search processes are coordinated by the CR, which can be implemented as a round-robin m-to-w selector.
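The round-robin m-to-w selector could drive a cycle-level loop over the k bundles, as in the following Python sketch. It is a sketch under stated assumptions, not the authors' simulator. Each input's queue holds the bundle index of keys that already missed its LPT. An input issues at most one request (its head-of-line key) per cycle. A request that is not granted in a cycle counts as one contention. `RoundRobinSelector` and `simulate` are hypothetical names.

```python
from typing import List, Tuple


class RoundRobinSelector:
    """Hypothetical round-robin m-to-w selector for one TCAM bundle.

    Each cycle it grants at most w of the m requesting inputs, scanning from
    a rotating pointer so that no input is starved.
    """

    def __init__(self, m: int, w: int) -> None:
        self.m = m
        self.w = w
        self.pointer = 0  # input index where the next scan starts

    def select(self, requests: List[bool]) -> List[int]:
        granted: List[int] = []
        for offset in range(self.m):
            i = (self.pointer + offset) % self.m
            if requests[i]:
                granted.append(i)
                if len(granted) == self.w:  # all w replicas in the bundle are busy
                    break
        if granted:
            # Start the next scan just after the last input served.
            self.pointer = (granted[-1] + 1) % self.m
        return granted


def simulate(queues: List[List[int]], k: int, w: int) -> Tuple[int, int]:
    """Return (parallel cycles, contentions) for k bundles of w replicated blocks."""
    if any(not 0 <= b < k for q in queues for b in q):
        raise ValueError("bundle index out of range")
    m = len(queues)
    selectors = [RoundRobinSelector(m, w) for _ in range(k)]
    heads = [0] * m  # index of each input's head-of-line key
    cycles = 0
    contentions = 0
    while any(heads[i] < len(queues[i]) for i in range(m)):
        cycles += 1
        served: List[int] = []
        for j in range(k):
            requests = [heads[i] < len(queues[i]) and queues[i][heads[i]] == j
                        for i in range(m)]
            granted = selectors[j].select(requests)
            contentions += sum(requests) - len(granted)  # losers wait a cycle
            served.extend(granted)
        for i in served:
            heads[i] += 1
    return cycles, contentions
```

For example, three inputs that all need bundle 0 finish in 2 cycles with 1 contention when w = 2, but in 3 cycles with 3 contentions when w = 1. This illustrates why replicating a bundle's content helps with biased traffic.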
Experimental results (1/9) • We conduct a series of simulations on M-MSMB-LPT and MSMB-LPT-I. • A first-in-first-out (FIFO) replacement policy is used for LPT updates. • Round-robin (RR) arbitration is used for TCAM contention resolution. • Two packet traces are used in the simulations: • 1. generated according to the routing table described in [17]; • 2. derived from actual packet flows given in [19]. • The performance of an M-MSMB-LPT is determined by that of a single component MSMB-LPT. • The performance of MSMB-LPT and M-MSMB-LPT can be derived from the performance of MSMB-LPT-I with configurations (m, n, w, k) as follows: • (m, n, 1, k) = MSMB-LPT with (m, n, k). • (m, n, 1, k) = M-MSMB-LPT with (w*m, n, w, k). • Example: • MSMB-LPT-I with (6, n, 1, 4) can be used to indicate the performance of M-MSMB-LPT with (12, n, 2, 4) as well as (18, n, 3, 4).
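The equivalences above can be written down as a small Python helper. It lists the configurations whose performance a single MSMB-LPT-I run with w = 1 stands for. The function name `equivalent_configs` and the concrete LPT size used in the check are hypothetical. The helper only encodes the rule stated on this slide, which rests on an M-MSMB-LPT performing like one of its component MSMB-LPTs.

```python
from typing import List, Tuple


def equivalent_configs(
    m: int, n: int, k: int, max_w: int
) -> Tuple[Tuple[int, int, int], List[Tuple[int, int, int, int]]]:
    """Configurations represented by MSMB-LPT-I (m, n, 1, k).

    Returns the MSMB-LPT configuration (m, n, k) and the M-MSMB-LPT
    configurations (w*m, n, w, k) for w = 1..max_w.
    """
    msmb_lpt = (m, n, k)
    m_msmb_lpt = [(w * m, n, w, k) for w in range(1, max_w + 1)]
    return msmb_lpt, m_msmb_lpt
```

With m = 6, k = 4 and max_w = 3, the M-MSMB-LPT list contains (12, n, 2, 4) and (18, n, 3, 4), as in the example. One simulation therefore covers a whole family of larger designs.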
Experimental results (2/9) • Performance metrics • TCAM contention ratio = (# contentions at TCAM blocks) / (total # key searches) • Speedup over naïve MSMB • TCAM utilization • Quantities used: total # parallel cycles to complete IP lookup for all packets in a trace; A_MSMB-LPT-I(j): total # cycles in which the TCAM_j blocks are searched.
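The contention ratio above is a simple quotient. The following Python helpers compute it from simulation counters. The speedup helper is an assumption: it takes speedup as the ratio of total parallel cycles for naive MSMB to those of the evaluated scheme on the same trace, which the slide implies but does not state as a formula. Both function names are hypothetical.

```python
def contention_ratio(contentions: int, key_searches: int) -> float:
    """TCAM contention ratio: # contentions at TCAM blocks / total # key searches."""
    if key_searches <= 0:
        raise ValueError("need at least one key search")
    return contentions / key_searches


def speedup_over_naive(naive_cycles: int, scheme_cycles: int) -> float:
    """ASSUMED definition: total parallel cycles of naive MSMB divided by
    those of the evaluated scheme, for the same packet trace."""
    if scheme_cycles <= 0:
        raise ValueError("scheme must take at least one cycle")
    return naive_cycles / scheme_cycles
```

For instance, 3 contentions over 12 key searches give a ratio of 0.25, and a trace that naive MSMB finishes in 120 cycles and the evaluated scheme in 40 cycles gives a speedup of 3.0 under this definition.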
Experimental results (3/9) • Power consumption
Experimental results (4/9) • Speedup • [Figure panels: 48 TCAM blocks; 16 TCAM blocks]
Experimental results (5/9) • Power consumption
Experimental results (6/9) • Contention ratio • 36 inputs and 4 TCAM blocks in each bundle. • Increasing the number of TCAM bundles: • from 1 to 2 • from 4 to 6 • [Figure: configurations (36, n, w, 4), w = 1, 2, 4, 6]
Experimental results (7/9) • Given the available TCAM resources, e.g. • # TCAM bundles: 2 • # TCAM blocks in each bundle: 4 • it is important to know the expected contention ratio for different numbers of inputs. • [Figure: configurations (m, n, 2, 4), m = 6, 12, 18, 36]
Experimental results (8/9) • Speedup gained by increasing the number of TCAM bundles for a given number of inputs. • [Figure: configurations (36, n, w, 4), w = 1, 2, 4, 6]
Experimental results (9/9) • How the speedup changes with the number of inputs. • [Figure: configurations (m, n, 2, 4), m = 6, 12, 18, 36]