Author: Masanori Bando, N. Sertac Artan, and H. Jonathan Chao
Publisher/Conf: 15th IEEE Workshop on High Performance Switching and Routing (HPSR), 2009
Speaker: Chen Deyu
Date: 2009.9.23
FlashLook: 100-Gbps hash-tuned route lookup architecture
Outline
• Introduction
• FlashLook Architecture
  A. FlashLook Hash Table
  B. Verify Bit Aggregation
  C. HashTune
  D. RAM-based Black Sheep Memory (BSM)
• Implementation of a 100-Gbps IP Lookup
• Performance Evaluation
Introduction
In this paper, we propose FlashLook, a low-cost, high-speed route lookup architecture scalable to large routing tables. FlashLook allows the use of low-cost DRAMs while achieving high throughput. Hash-based IP lookup schemes are promising, yet the following three shortcomings prevent them from meeting the requirements of future routers.
Shortcomings of Hash-based Schemes
• To use a hash function to look up routes, all routes must first be expanded to a fixed length using a technique called prefix expansion, which increases the number of routes to be stored.
• It is hard to find a perfect hash function that distributes routes evenly to the bins of a hash table, so some bins overflow; the overflowing routes are called black sheep.
• Traditionally, a small on-chip CAM is used to accommodate these black sheep. However, on-chip CAMs cannot be made very large.
FlashLook Scheme
FlashLook is also a hash-based method; however, it eliminates the three shortcomings of previous hash-based methods by introducing:
• A data-compaction method called verify bit aggregation, which counterbalances the memory increase caused by prefix expansion.
• A novel hash method called HashTune, which distributes elements evenly in a hash table.
• A RAM-based black sheep memory (BSM), which replaces the expensive on-chip CAMs.
Observations
Prefix length distribution is an important parameter for prefix expansion. In this paper, we use only the Oregon routing table, the largest of the four tables studied, as the representative routing table.
FlashLook Architecture: Hash Table
Assume a bin has a capacity of c. If at most c elements hash to this bin, the bin stores these elements back-to-back. If, on the other hand, c + v elements hash to this bin, the first c − 1 elements are stored in the bin, and the remaining part of the bin is used as a pointer to a BSM location that stores the remaining v + 1 elements.
In this paper, we assume c = 3 and v = 7 for IPv4/24, and c = 4 and v = 7 for IPv4/32, as these values provide good memory utilization for IP lookup with current DRAM technology.
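A minimal software sketch of this bin-plus-BSM organization follows (not the paper's hardware layout; the Bin class, insert helper, and list-based BSM are illustrative assumptions):

```python
# Sketch: a hash bin of capacity C. Once more than C elements map to the bin,
# the first C-1 stay in place and the last slot is reused as a pointer into the
# Black Sheep Memory (BSM), which holds up to V+1 overflow elements.

C = 3          # bin capacity assumed for IPv4/24 in the paper
V = 7          # overflow elements handled per bin beyond the in-bin entries

class Bin:
    def __init__(self):
        self.slots = []        # up to C (prefix, next_hop) entries
        self.bsm_ptr = None    # set when the bin overflows

def insert(bin_, bsm, entry):
    """Insert an entry into a bin, spilling to the BSM on overflow."""
    if bin_.bsm_ptr is None and len(bin_.slots) < C:
        bin_.slots.append(entry)
        return
    if bin_.bsm_ptr is None:
        # Overflow: keep the first C-1 entries, move the displaced entry to a
        # new BSM block, and reuse the freed slot as a pointer to that block.
        bsm.append([bin_.slots.pop()])
        bin_.bsm_ptr = len(bsm) - 1
    block = bsm[bin_.bsm_ptr]
    if len(block) < V + 1:
        block.append(entry)
    else:
        raise OverflowError("more than c + v elements hashed to one bin")

bsm = []                 # RAM-based Black Sheep Memory, modeled as a list of blocks
b = Bin()
for e in range(C + 2):   # force an overflow
    insert(b, bsm, ("prefix%d" % e, e))
print(b.slots, b.bsm_ptr, bsm)   # C-1 entries in the bin, the rest in the BSM block
```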
FlashLook Architecture: Verify Bit Aggregation
• Prefix expansion for FlashLook. In this paper, prefixes are expanded into three lengths: prefixes shorter than 18 bits are expanded to 18 bits, prefixes with lengths between 19 and 23 bits to 24 bits, and prefixes with lengths between 25 and 31 bits to 32 bits. We denote the sets of all prefixes with lengths 18, 24, and 32 bits after prefix expansion as IPv4/18, IPv4/24, and IPv4/32, respectively.
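A small sketch of this expansion rule, assuming prefixes are given as integers (function names and the sample route are illustrative):

```python
# A prefix of length L is replaced by 2**(target - L) prefixes of the target
# length (18, 24, or 32 bits), all inheriting the original next hop.

def target_length(length):
    if length <= 18:
        return 18
    if length <= 24:
        return 24
    return 32

def expand(prefix_bits, length, next_hop):
    """Expand one prefix (an integer of `length` bits) to its target length."""
    t = target_length(length)
    base = prefix_bits << (t - length)          # left-align the original bits
    return [(base | suffix, t, next_hop) for suffix in range(1 << (t - length))]

# Example: a /16 route expands into 2**(18 - 16) = 4 prefixes of length 18.
expanded = expand(0xC0A8, 16, next_hop=7)       # 192.168.0.0/16
print(len(expanded))                            # 4
```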
FlashLook Architecture: RAM-based Black Sheep Memory (BSM)
On-chip CAMs are especially useful because their content-addressable nature provides constant-time access to the prefixes. Unfortunately, since on-chip CAMs require a significant amount of resources, an on-chip CAM-based BSM must be very small. In this paper, we propose an addressing scheme that utilizes the DRAM: when a bin overflows, the scheme stores a pointer in that hash bin in the external DRAM.
Implementation of a 100-Gbps IP Lookup
For a 100-Gbps link, 250 million lookups per second are required in the worst case. In other words, each IP route lookup must finish within 4 ns. To achieve such high-speed operation, multiple copies of each Next Hop (NH) table are stored in the DRAM chips.
If the DRAM clock frequency is 200 MHz, equivalent to a 5 ns clock period, access to all four blocks can be completed in 60 ns.
We therefore need 60/4 = 15 copies of the next-hop information for each route length, which requires three of the basic 3-DRAM-chip configurations (9 DRAMs) to achieve a 100-Gbps line rate for IPv4 route lookup.
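A back-of-the-envelope check of the replication factor quoted above (figures taken from the slides; this is arithmetic only, not a DRAM timing model):

```python
LINK_RATE_LOOKUPS = 250e6   # worst-case lookups per second on a 100-Gbps link
DRAM_ACCESS_NS    = 60      # access to all four blocks at 200 MHz (5 ns period)

lookup_period_ns = 1e9 / LINK_RATE_LOOKUPS        # 4.0 ns per lookup
copies_needed = int(DRAM_ACCESS_NS / lookup_period_ns)

print(lookup_period_ns)   # 4.0  -> each lookup must finish within 4 ns
print(copies_needed)      # 15   -> 15 interleaved copies of each next-hop table
```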
Performance Evaluation
• HashTune and BSM Performance
Index Bits
Hi = −pi·log2(pi) − (1 − pi)·log2(1 − pi), where Hi is the entropy of the bit at position i (1 ≤ i ≤ 32), bit position 1 corresponds to the most-significant bit, and pi is the fraction of 1s at bit position i.
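A minimal sketch of this per-bit entropy measure, which guides HashTune's choice of index bits: bits whose entropy is close to 1 vary the most across the routing table and are natural candidates for hash index bits (the sample addresses and function names below are illustrative assumptions):

```python
import math

def bit_entropy(addresses, width=32):
    """Return H_i for each bit position i (1 = most-significant bit)."""
    n = len(addresses)
    entropies = []
    for i in range(1, width + 1):
        ones = sum((a >> (width - i)) & 1 for a in addresses)
        p = ones / n
        if p in (0.0, 1.0):
            entropies.append(0.0)   # a constant bit carries no information
        else:
            entropies.append(-p * math.log2(p) - (1 - p) * math.log2(1 - p))
    return entropies

sample = [0xC0A80101, 0xC0A80202, 0x0A000001, 0xAC100005]
H = bit_entropy(sample)
best = sorted(range(1, 33), key=lambda i: H[i - 1], reverse=True)[:8]
print(best)   # the eight highest-entropy bit positions
```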
Verify Bit Aggregation
The simulation results show that the number of aggregated elements is reduced by almost half with each additional aggregation level, which shows that verify bit aggregation significantly reduces the number of elements.