PARALLEL TABLE LOOKUP FOR NEXT GENERATION INTERNET Paper: Parallel Table Lookup for Next Generation Internet Publisher/Conf.: 32nd Annual IEEE International Computer Software and Applications Conference (COMPSAC '08), 2008 Speaker: Han-Jhen Guo Date: 2009.03.11
OUTLINE • Introduction • The Proposed Scheme • Implementation • Performance
INTRODUCTION- BINARY SEARCH AMONG PREFIX LENGTHS (1/2) • e.g. (address length = 8)
INTRODUCTION- BINARY SEARCH AMONG PREFIX LENGTHS (2/2) • e.g. (search 11101000): key = 1110 → no match; key = 11 → no match; key = 1 → match • Error! The search ends at the length-1 prefix, but E (111*) should be matched.
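To make the failure concrete, here is a minimal Python sketch (not the paper's code) of binary search over prefix lengths without markers. The toy table (a length-1 prefix 1* and E = 111*) and the 8-bit address follow the example above; everything else is assumed for illustration.

```python
# Binary search on prefix lengths WITHOUT markers.
# Per-length hash tables are modeled as plain Python sets.

tables = {1: {"1"}, 3: {"111"}}               # prefixes stored by length; E = 111*

def naive_lookup(addr, max_len=8):
    lengths = list(range(1, max_len + 1))     # candidate prefix lengths 1..8
    best, lo, hi = None, 0, len(lengths) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        key = addr[:lengths[mid]]
        if key in tables.get(lengths[mid], set()):
            best, lo = key, mid + 1           # match: continue toward longer lengths
        else:
            hi = mid - 1                      # miss: continue toward shorter lengths
    return best

print(naive_lookup("11101000"))
# probes 1110 (miss), 11 (miss), 1 (hit) -> returns '1',
# so the longer prefix E = 111* is wrongly skipped
```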
INTRODUCTION- BINARY SEARCH TREE WITH MARKERS (1/3) • Solution: markers • e.g., markers of prefix 10010* = 1*, 10*, 100*, 1001* • Meaning: a matched prefix longer than this marker should exist • Insert markers into the hash tables along the search path of the binary search tree (only the markers whose lengths actually appear in the lookup order are kept) • To avoid backtracking, each marker is stored together with its precomputed BMP
INTRODUCTION- BINARY SEARCH TREE WITH MARKERS (3/3) • e.g. (search 11101000): key = 1110 → no match; key = 11 → match (marker of 111*); key = 111 → match, so E (111*) is correctly found
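A sketch of the same search with markers added, assuming the same toy table as before: the marker 11 left behind by prefix 111* steers the search toward length 3, and each marker carries a precomputed BMP so that a failed excursion toward longer lengths needs no backtracking.

```python
# Binary search on prefix lengths WITH markers and precomputed BMPs.

prefixes = {1: {"1": "1*"}, 3: {"111": "E"}}   # length -> {key: entry}
markers  = {2: {"11": "1*"}}                   # marker 11 guides the search to length 3;
                                               # its stored BMP is the best prefix of 11 (here 1*)

def lookup_with_markers(addr, max_len=8):
    lengths = list(range(1, max_len + 1))
    bmp, lo, hi = None, 0, len(lengths) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        L, key = lengths[mid], addr[:lengths[mid]]
        if key in prefixes.get(L, {}):
            bmp, lo = prefixes[L][key], mid + 1   # real prefix: remember it, go longer
        elif key in markers.get(L, {}):
            bmp, lo = markers[L][key], mid + 1    # marker: adopt its BMP, go longer
        else:
            hi = mid - 1                          # miss: go shorter
    return bmp

print(lookup_with_markers("11101000"))
# probes 1110 (miss), 11 (marker, BMP = 1*), 111 (hit) -> returns 'E'
```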
INTRODUCTION- CONCLUSION • The lookup scheme above is scalable, with complexity O(log2 W), where W is the length of the IP address in bits • Assuming a perfect hash function, each hash table is probed at most once, so IPv4 needs at most ceil(log2 32) = 5 hash-table lookups in the worst case
THE PROPOSED SCHEME- MERGING HASH TABLES • The concept of merging hash tables (n = 1, 2, 3, 4, ...) • If either prefix P.0 or prefix P.1 is in Table2n+1 ("." denotes appending one bit), there must be a marker P in Table2n • Associate the marker P in Table2n with the entries for P.0 and P.1 • After merging, only 4 different hash tables need to be looked up in the worst case, instead of the 5 of the IPv4 case above
THE PROPOSED SCHEME- DATA STRUCTURE OF MODIFIED HASH NODE (1/2) • Data structure
THE PROPOSED SCHEME- DATA STRUCTURE OF MODIFIED HASH NODE (2/2) • e.g., after merging
THE PROPOSED SCHEME- LOOKUP ALGORITHM • e.g. (search 11101000): probe key = 11 → the merged node also holds the odd-length entries under 11, so the BMP is updated from G to E; then probe key = 1110
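The slides do not show the exact node layout, so the sketch below assumes a plausible merged-node structure: the node is keyed on the even-length prefix P and also stores the entries for P.0 and P.1. This is how a single probe on key 11 can raise the BMP from G to E; the labels G and E follow the slides, the concrete fields are assumptions.

```python
# A merged hash node covering prefix P and its one-bit extensions P.0 / P.1.

from dataclasses import dataclass
from typing import Optional

@dataclass
class MergedNode:
    bmp: Optional[str]      # best matching prefix of P (real prefix, or a marker's precomputed BMP)
    child0: Optional[str]   # entry for P.0, if that odd-length prefix exists
    child1: Optional[str]   # entry for P.1, if it exists

# toy merged table for levels {2,3}
level_2_3 = {"11": MergedNode(bmp="G", child0=None, child1="E")}

def probe(addr, even_len, table, bmp_so_far=None):
    node = table.get(addr[:even_len])
    if node is None:
        return bmp_so_far, False          # miss: binary search turns toward shorter lengths
    bmp = node.bmp if node.bmp is not None else bmp_so_far
    child = node.child1 if addr[even_len] == "1" else node.child0
    if child is not None:
        bmp = child                       # the odd-length extension is the longer match
    return bmp, True                      # hit: binary search turns toward longer lengths

print(probe("11101000", 2, level_2_3))    # -> ('E', True): one probe raises the BMP from G to E
```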
THE PROPOSED SCHEME- MAKING LOOKUP ALGORITHM PIPELINED (1/2) • Binary search tree for IPv6 without merging hash tables • Modified binary search tree for IPv6
THE PROPOSED SCHEME- MAKING LOOKUP ALGORITHM PIPELINED (2/2) • Assign each processing unit the lookup of the hash table at one level of the tree → 6 stages in total • Each stage: • hashes the destination IP address to form the key • looks up the hash table, using the hash value as the index • computes, from the lookup result: • the BMP so far • the hash table to be searched next • the skip flag telling the next processing unit whether the BMP has already been determined
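A software-only sketch of the pipeline, not microengine code: each stage owns one level of the binary search tree, and the per-packet state it forwards is exactly the three items listed above. The toy 8-bit tables give 3 stages here, whereas the IPv6 tree on the slide needs 6; all class and field names are assumptions.

```python
from dataclasses import dataclass

@dataclass
class State:
    addr: str
    bmp: object = None     # BMP found so far
    next_len: int = 4      # prefix length the next stage should probe (tree root)
    skip: bool = False     # set once the search cannot improve the BMP any further

TABLES = {1: {"1": "1*"}, 2: {"11": "G"}, 3: {"111": "E"}}   # length -> {key: entry}
TREE = {4: (2, 6), 2: (1, 3), 6: (5, 7)}                     # length -> (next on miss, next on hit)
LEVELS = [{4}, {2, 6}, {1, 3, 5, 7}]                         # lengths owned by each pipeline stage

def stage(state: State, lengths_owned: set) -> State:
    if state.skip or state.next_len not in lengths_owned:
        return state                                   # this stage just passes the packet through
    L = state.next_len
    hit = state.addr[:L] in TABLES.get(L, {})          # hash + table lookup (dict stands in for the hash table)
    if hit:
        state.bmp = TABLES[L][state.addr[:L]]          # update BMP so far
    nxt = TREE.get(L, (None, None))[1 if hit else 0]   # pick the table to be searched next
    if nxt is None:
        state.skip = True                              # search finished: set the skip flag
    else:
        state.next_len = nxt
    return state

s = State("11101000")
for owned in LEVELS:                                   # the packet traverses every stage exactly once
    s = stage(s, owned)
print(s.bmp)                                           # -> 'E'
```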
THE PROPOSED SCHEME- USING MULTI-THREADING IN THE PIPELINE STAGE
IMPLEMENTATION- IMPLEMENTATION PLATFORM • IXP2400 • 8 microengines; each microengine supports 8 hardware threads • 6 microengines are used to implement the pipeline design, and 8 threads run on each microengine to realize the multi-threading design • IXA SDK 4.1 • used to simulate the IXP2400 environment
IMPLEMENTATION- IMPLEMENTATION BRIEFS (1/4) • Maximum sizes of the three separate memories • Average latencies of reading eight 4-byte words from SRAM and DRAM when only one microengine accesses the memories
IMPLEMENTATION- IMPLEMENTATION BRIEFS (2/4) • Average latencies of reading 8 words from a given SRAM or DRAM channel when different numbers of microengines contend for that channel • The platform allows 8 simultaneous SRAM accesses (4 per channel) and 3 simultaneous DRAM accesses without increasing the average memory latency
IMPLEMENTATION- IMPLEMENTATION BRIEFS (3/4) • Distribute the hash tables across the 3 separate memories of the IXP2400
IMPLEMENTATION- IMPLEMENTATION BRIEFS (4/4) • Hashing • hash function: CRC32 • collision resolution: chaining • the hash-collision penalty is alleviated by reading 2 contiguous nodes per memory access, at the cost of only slightly higher memory latency • Fig.: Chaining in a hash table • Fig.: Chaining with 2 contiguous nodes
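A minimal sketch of the hashing choices named above: CRC32 as the hash function, chaining for collisions, and chains walked two contiguous nodes at a time to mimic the paired memory reads. The bucket count and node layout are assumptions.

```python
import zlib

N_BUCKETS = 1024                              # assumed table size
table = [[] for _ in range(N_BUCKETS)]        # each bucket: chained list of (key, entry) nodes

def bucket_index(key: str) -> int:
    return zlib.crc32(key.encode()) % N_BUCKETS   # CRC32 hash value used as the index

def insert(key, entry):
    table[bucket_index(key)].append((key, entry))

def lookup(key):
    chain = table[bucket_index(key)]
    # walk the chain two nodes at a time, mimicking one burst read
    # of 2 contiguous nodes per memory access
    for i in range(0, len(chain), 2):
        for k, e in chain[i:i + 2]:
            if k == key:
                return e
    return None

insert("111", "E"); insert("11", "G")
print(lookup("111"))   # -> 'E'
```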
PERFORMANCE • Comparison of maximum forwarding rates • Generate 10,000 random IP addresses and count the total number of cycles required to perform the lookups
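A sketch of how the measurement above converts into a forwarding rate. The 600 MHz microengine clock is a property of the IXP2400; the cycle count here is only a placeholder, since the real figure comes from the IXA SDK cycle-accurate simulation.

```python
import random

N_ADDRS = 10_000
# random 128-bit (IPv6-length) destination addresses fed through the lookup
addrs = [format(random.getrandbits(128), "0128b") for _ in range(N_ADDRS)]

total_cycles = 1_000_000      # placeholder; the real value is reported by the IXA SDK simulator
clock_hz = 600e6              # IXP2400 microengine clock

rate = len(addrs) / (total_cycles / clock_hz)        # lookups per second
print(f"max forwarding rate ~ {rate / 1e6:.1f} M lookups/s")
```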
Thank you for listening!