
Towards a Billion Routing Lookups per Second in Software


Presentation Transcript


  1. Towards a Billion Routing Lookups per Second in Software • Authors: Marko Zec, Luigi Rizzo, Miljenko Mikuc • Publisher: SIGCOMM Computer Communication Review, 2012 • Presenter: Yuen-Shuo Li • Date: 2012/12/12

  2. Outline • Idea • Introduction • Search Data Structure • Lookup Algorithm • Updating • Performance

  3. Idea Can a software routing implementation compete in a field generally reserved for specialized lookup hardware?

  4. Introduction(1/3) Software routing lookups have fallen out of focus of the research community in the past decade, as the performance of general-purpose CPUs was not on par with quickly increasing packet rates. Software vs. hardware?

  5. Introduction(2/3) The landscape has changed, however: • The performance potential of general-purpose CPUs has increased significantly over the past decade (instruction-level parallelism, shorter execution pipelines, etc.). • The increasing interest in virtualized systems and software-defined networks calls for a communication infrastructure that is more flexible and less tied to the hardware.

  6. Introduction(3/3) DXR is an efficient IPv4 lookup scheme for software running on general-purpose CPUs. • The primary goal was not to implement a universal routing lookup scheme: the data structures and algorithms are optimized exclusively for the IPv4 protocol. • DXR distills a real-world BGP snapshot with 417,000 prefixes into a structure consuming only 782 Kbytes, less than 2 bytes per prefix, and achieves 490 MLps (MLps: million lookups per second).

  7. Search Data Structure(1/11) Building of the search data structure begins by expanding all prefixes from the database into address ranges.

  8. Search Data Structure(2/11) Neighboring address ranges that resolve to the same next hop are then merged.

  9. Search Data Structure(3/11) • Each entry only needs the start address and the next hop: this is the search data we need!
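A minimal C sketch of the merging step described above, assuming the prefix database has already been expanded into an array of ranges sorted by start address; the struct and function names are illustrative, not taken from the paper:

  #include <stddef.h>
  #include <stdint.h>

  struct range {
      uint32_t start;    /* first IPv4 address covered by this range */
      uint16_t nexthop;  /* index into the next-hop table */
  };

  /* Merge neighboring ranges that resolve to the same next hop;
   * returns the number of ranges kept. */
  static size_t
  merge_ranges(struct range *r, size_t n)
  {
      size_t out = 0;

      for (size_t i = 0; i < n; i++) {
          if (out > 0 && r[out - 1].nexthop == r[i].nexthop)
              continue;        /* same next hop: absorb into the previous range */
          r[out++] = r[i];     /* different next hop: keep this entry */
      }
      return (out);
  }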

  10. Search Data Structure(4/11)

  11. Search Data Structure(5/11) Lookup table (k = 16) • The search is shortened by splitting the entire address range into chunks and using the initial k bits of the address to reach the corresponding chunk through a lookup table.

  12. Search Data Structure(6/11) Lookup table (k = 16) • Each entry in the lookup table must contain the position and size of the corresponding chunk in the range table. • A special value for size indicates that the 19 position bits are an offset into the next-hop table. • Lookup table entry (32 bits): 1 format bit, 12-bit size, 19-bit position.
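One possible way to pack that 32-bit entry in C is sketched below; the macro names and the choice of the all-ones size value as the direct-next-hop marker are assumptions made for illustration, not taken from the paper:

  #include <stdint.h>

  #define DXR_FMT_SHIFT   31              /* 1 bit: long/short chunk format */
  #define DXR_SIZE_SHIFT  19
  #define DXR_SIZE_MASK   0xFFFu          /* 12 bits: number of range entries */
  #define DXR_POS_MASK    0x7FFFFu        /* 19 bits: chunk position, or
                                             next-hop offset for direct entries */
  #define DXR_SIZE_DIRECT DXR_SIZE_MASK   /* special size value: the 19 bits
                                             are an offset into the next-hop table */

  static inline uint32_t
  dxr_entry(unsigned fmt, unsigned size, unsigned pos)
  {
      return ((uint32_t)(fmt & 1u) << DXR_FMT_SHIFT) |
             ((uint32_t)(size & DXR_SIZE_MASK) << DXR_SIZE_SHIFT) |
             ((uint32_t)pos & DXR_POS_MASK);
  }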

  13. Search Data Structure(7/11)

  14. Search Data Structure(8/11) Range table • With k bits already consumed to index the lookup table, the range table entries only need to store the remaining 32-k address bits, and the next hop index.

  15. Search Data Structure(9/11) Range table • A further optimization, which is especially effective for large BGP views where the vast majority of prefixes are /24 or less specific, is to store chunks in either a long or a short format.

  16. Search Data Structure(10/11) Range table • One bit in the lookup table entry indicates whether a chunk is stored in long or short format. • Long format (32 bits per entry): 16 address bits + 16-bit next-hop index. • Short format (16 bits per entry): 8 address bits + 8-bit next-hop index.
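In C the two entry formats described above could be sketched as follows; the struct names are illustrative, and the short format applies when 8 of the remaining address bits and an 8-bit next-hop index suffice for the chunk:

  #include <stdint.h>

  struct range_long {            /* 32 bits per entry */
      uint16_t addr_bits;        /* remaining 32-k address bits (k = 16) */
      uint16_t nexthop;          /* next-hop index */
  };

  struct range_short {           /* 16 bits per entry */
      uint8_t addr_bits;         /* upper 8 of the remaining address bits */
      uint8_t nexthop;           /* next-hop index */
  };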

  17. Search Data Structure(11/11)

  18. Lookup Algorithm(1/4) • The k leftmost bits of the key are used as an index into the lookup table. • Example: 1.2.4.0 = 00000001.00000010.00000100.00000000; with k = 16 the first two octets (00000001.00000010) form the index.

  19. Lookup Algorithm(2/4) • If the entry points to the next hop, the lookup is complete. • A special value for size indicates that the 19 bits are an offset into the next-hop table.

  20. Lookup Algorithm(3/4) • Otherwise, the (position, size) information in the entry selects the portion of the range table on which to perform a binary search on the (remaining bits of the) key.
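A condensed C sketch of this lookup path, reusing the illustrative entry layout and long-format range struct from the earlier sketches (the short-format path is omitted); this is not the authors' implementation:

  #include <stdint.h>

  extern uint32_t lookup_tbl[1 << 16];       /* one entry per k = 16 bit chunk */
  extern struct range_long range_tbl[];      /* long-format range entries */

  uint32_t
  dxr_lookup(uint32_t dst)
  {
      uint32_t ent = lookup_tbl[dst >> 16];  /* index with the k leftmost bits */
      uint32_t size = (ent >> DXR_SIZE_SHIFT) & DXR_SIZE_MASK;
      uint32_t pos = ent & DXR_POS_MASK;

      if (size == DXR_SIZE_DIRECT)           /* entry points directly to the */
          return (pos);                      /* next hop: lookup is complete */

      /* Binary search over the chunk on the remaining 32-k address bits:
       * find the last range entry whose start is <= the key. */
      uint16_t key = dst & 0xFFFF;
      uint32_t lo = pos, hi = pos + size - 1;

      while (lo < hi) {
          uint32_t mid = lo + (hi - lo + 1) / 2;
          if (range_tbl[mid].addr_bits <= key)
              lo = mid;
          else
              hi = mid - 1;
      }
      return (range_tbl[lo].nexthop);
  }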

  21. Lookup Algorithm(4/4)

  22. Updating(1/3) Lookup structures store only the information necessary for resolving next-hop lookups, so a separate database storing detailed information on all the prefixes is required for rebuilding the lookup structures.

  23. Updating(2/3) Updates are handled as follows (see the sketch after this list): • Updates covering multiple chunks (prefix length < k) are expanded. • Each affected chunk is then processed independently and rebuilt from scratch. • The rebuild finds the best matching route for the first IPv4 address belonging to the chunk and translates it into a range table entry. • If the range table heap contains only a single element when the process ends, the next hop can be encoded directly in the lookup table.
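A rough skeleton of the per-chunk rebuild, reusing the range and entry sketches above and assuming two hypothetical helpers: chunk_emit_ranges(), which regenerates a chunk's range entries from the full prefix database, and range_tbl_store(), which places them in the range table and returns their position; this is not the authors' code:

  #define MAX_CHUNK_RANGES 4096   /* illustrative bound; real chunks fit a 12-bit size */

  extern size_t chunk_emit_ranges(uint16_t chunk, struct range *out, size_t max);
  extern uint32_t range_tbl_store(const struct range *r, size_t n);

  static void
  rebuild_chunk(uint16_t chunk)
  {
      struct range heap[MAX_CHUNK_RANGES];
      size_t n;

      /* Rebuild the chunk from scratch, starting from the best matching
       * route for the first IPv4 address belonging to the chunk. */
      n = chunk_emit_ranges(chunk, heap, MAX_CHUNK_RANGES);
      n = merge_ranges(heap, n);

      if (n == 1) {
          /* A single element in the heap: encode the next hop directly
           * in the lookup table entry, no range-table chunk is needed. */
          lookup_tbl[chunk] = dxr_entry(0, DXR_SIZE_DIRECT, heap[0].nexthop);
      } else {
          uint32_t pos = range_tbl_store(heap, n);
          lookup_tbl[chunk] = dxr_entry(0, (unsigned)n, pos);
      }
  }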

  24. Updating(3/3) The rebuild time can be reduced: • by coalescing multiple updates into a single rebuild; we have implemented this by delaying the reconstruction for several milliseconds after the first update is received (sketched below) • by processing chunks in parallel, if multiple cores are available
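An illustrative sketch of the coalescing idea: route changes only mark the affected chunks dirty and arm a short timer, and the deferred callback rebuilds every dirty chunk in one pass. The schedule_after_ms() helper stands in for whatever timer facility the host system provides; this is not the authors' code:

  #include <stdbool.h>
  #include <stdint.h>
  #include <string.h>

  extern void schedule_after_ms(int ms, void (*cb)(void));   /* hypothetical timer API */

  static bool chunk_dirty[1 << 16];
  static bool rebuild_pending;

  static void
  rebuild_dirty_chunks(void)
  {
      for (uint32_t c = 0; c < (1u << 16); c++)
          if (chunk_dirty[c])
              rebuild_chunk((uint16_t)c);   /* per-chunk rebuild sketched earlier */
      memset(chunk_dirty, 0, sizeof(chunk_dirty));
      rebuild_pending = false;
  }

  void
  dxr_route_changed(uint32_t prefix, int plen)
  {
      uint32_t first = prefix >> 16;
      /* A prefix shorter than k = 16 covers several chunks: expand it. */
      uint32_t count = (plen < 16) ? (1u << (16 - plen)) : 1;

      for (uint32_t i = 0; i < count; i++)
          chunk_dirty[first + i] = true;

      if (!rebuild_pending) {
          rebuild_pending = true;
          /* Delay the reconstruction a few milliseconds so that a burst
           * of updates is coalesced into a single rebuild. */
          schedule_after_ms(5, rebuild_dirty_chunks);
      }
  }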

  25. Performance(1/11) Our primary test vectors were three IPv4 snapshots from routeviews.org BGP routers.

  26. Performance(2/11) Here we present the results obtained using the LINX snapshot, selected because it is the most challenging for our scheme: • it contains the highest number of next hops • and an unusually high number of prefixes more specific than /24

  27. Performance(3/11) Our primary test machine had a 3.6 GHz AMD FX-8150 8-core CPU and 4 GB of RAM. Each pair of cores shares a private 2 MB L2 cache block, for a total of 8 MB of L2 cache; all 8 cores share 8 MB of L3 cache. We also ran some benchmarks on a 2.8 GHz Intel Xeon W3530 4-core CPU with smaller but faster L2 caches (256 KB per core) and 8 MB of shared L3 cache. • All tests ran on the same platform/compiler (FreeBSD 8.3, amd64, gcc 4.2.2). • Each test ran for 10 seconds.

  28. Performance(4/11) We used three different request patterns (illustrated in the sketch after this list): • REP (repeated): each key is looked up 16 times before moving to the next one. • RND (random): each key is looked up only once; the test loop has no data dependencies between iterations. • SEQ (sequential): same as RND, but the timing loop has an artificial data dependency between iterations, so that requests become effectively serialized.
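The three patterns could be sketched as the following timing loops, driving the dxr_lookup() sketch from earlier with a pre-generated array of random destination addresses; this is not the authors' benchmark code:

  #include <stddef.h>
  #include <stdint.h>

  volatile uint32_t sink;   /* keeps the compiler from discarding the lookups */

  void
  run_rep(const uint32_t *keys, size_t nkeys)
  {
      /* REP: each key is looked up 16 times before moving to the next one. */
      for (size_t i = 0; i < nkeys; i++)
          for (int r = 0; r < 16; r++)
              sink = dxr_lookup(keys[i]);
  }

  void
  run_rnd(const uint32_t *keys, size_t nkeys)
  {
      /* RND: each key is looked up only once; iterations are independent,
       * so the CPU may keep several lookups in flight at a time. */
      for (size_t i = 0; i < nkeys; i++)
          sink = dxr_lookup(keys[i]);
  }

  void
  run_seq(const uint32_t *keys, size_t nkeys)
  {
      /* SEQ: feeding a bit of the previous result into the next key creates
       * an artificial data dependency, so the lookups are serialized. */
      uint32_t prev = 0;
      for (size_t i = 0; i < nkeys; i++)
          prev = dxr_lookup(keys[i] ^ (prev & 1));
      sink = prev;
  }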

  29. Performance(5/11)

  30. Performance(6/11) Average lookup throughput with D18R is consistently higher than with D16R on our test systems.

  31. Performance(7/11) The distribution of binary search steps for uniformly random queries, observed using the LINX database; n = 0 means a direct match in the lookup table.

  32. Performance(8/11) • Interference between multiple cores causes a modest slowdown of the individual threads.

  33. Performance(9/11) DIR-24-8-BASIC

  34. Performance(10/11)

  35. Performance(11/11)
