
IP Routing Processing with Graphic Processors

IP Routing Processing with Graphic Processors. Authors: Shuai Mu, Xinya Zhang, Nairen Zhang, Jiaxin Lu, Yangdong Steve Deng, Shu Zhang. Publisher: IEEE Conference on DATE, pp. 93-98, 2010. Presenter: Ye-Zhi Chen. Date: 2011/8/24.


Presentation Transcript


  1. IP Routing Processing with Graphic Processors
     Authors: Shuai Mu, Xinya Zhang, Nairen Zhang, Jiaxin Lu, Yangdong Steve Deng, Shu Zhang
     Publisher: IEEE Conference on DATE, pp. 93-98, 2010
     Presenter: Ye-Zhi Chen
     Date: 2011/8/24

  2. Introduction
     Internet traffic will grow at an accelerating rate, so routers have to deliver correspondingly increasing processing capacity.
     • Throughput and programmability
     • Hardware: higher throughput, but lower programmability
     • Software: higher programmability, but lower throughput
     • Network processors (NPs): lack mature programming models and software development tools, and have incompatible architectures
     • GPU: offers high-performance computing, and its programming is accessible through CUDA

  3. CUDA
     • CUDA program: composed of code running on both the CPU and the GPU
     • Kernel: a function called by the CPU but executed on the GPU
     • Block: threads inside a block can exchange data through shared memory and synchronize with one another

  4. Network Intrusion Detection
     • Signature matching: checks at line rate whether network payloads contain pre-supplied signatures. 60%
     Two algorithms:
     • Bloom filter
       • hash table
       • space-efficient data structure
       • errors: hash conflicts
     • Aho-Corasick (AC)
       • DFA
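The Bloom filter idea on this slide can be sketched in a few lines of Python. This is only an illustrative CPU-side sketch, not the GPU implementation from the paper: the class name `BloomFilter`, the helper `scan_payload`, the bit-vector size, the number of hash functions, and the use of SHA-256 as the hash are all assumptions made for the example. It shows why hash conflicts cause errors: a lookup can report a signature that was never inserted (a false positive), so each hit is confirmed against the exact signature set.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter; sizes are arbitrary example values."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real filter packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive several hash positions by prefixing a seed byte to the item.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True may be a false positive (hash conflict); False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))


def scan_payload(payload, bloom, signatures, sig_len):
    """Slide a fixed-length window over the payload and report confirmed hits."""
    hits = []
    for i in range(len(payload) - sig_len + 1):
        window = payload[i:i + sig_len]
        # The Bloom filter is a cheap pre-check; the set lookup removes
        # false positives caused by hash conflicts.
        if bloom.might_contain(window) and window in signatures:
            hits.append((i, window))
    return hits


signatures = {b"EVIL", b"WORM"}
bloom = BloomFilter()
for sig in signatures:
    bloom.add(sig)

hits = scan_payload(b"xxEVILyyWORM", bloom, signatures, 4)
```

Here all signatures are assumed to have the same length so that a single sliding window suffices; variable-length signatures would need one filter per length or a different matcher such as Aho-Corasick.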

  5. Implementation Input

  6. Implementation
     • Packet transfer:
       • Individually: the simplest way, but time-consuming
       • Batch: combine many small transfers into a larger one
       • Page-locked memory: can be mapped into the address space of the device
     • Store the Bloom vector and the transition table in the GPU's texture memory
     • Divide each packet into smaller chunks; every two neighboring chunks overlap by a length equal to the longest match text
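The chunking step on this slide can be illustrated with a small Python sketch. It is a sequential stand-in for what would run as parallel GPU threads, and the function names `split_with_overlap` and `find_in_chunks`, as well as the chunk size used below, are assumptions for the example. The point it shows is that an overlap as long as the longest signature guarantees that a match straddling a chunk boundary still falls entirely inside one chunk.

```python
def split_with_overlap(payload, chunk_size, overlap):
    """Return (offset, chunk) pairs; neighboring chunks share `overlap` bytes."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than the overlap")
    stride = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append((start, payload[start:start + chunk_size]))
        if start + chunk_size >= len(payload):
            break  # this chunk already reaches the end of the payload
        start += stride
    return chunks


def find_in_chunks(payload, signatures, chunk_size):
    """Search every chunk independently, as separate GPU threads could."""
    overlap = max(len(sig) for sig in signatures)
    hits = set()  # a set removes duplicates found twice in an overlap region
    for offset, chunk in split_with_overlap(payload, chunk_size, overlap):
        for sig in signatures:
            pos = chunk.find(sig)
            while pos != -1:
                hits.add((offset + pos, sig))
                pos = chunk.find(sig, pos + 1)
    return sorted(hits)


# "EVIL" starts at byte 6, so it straddles the boundary of 8-byte chunks.
payload = b"aaaaaaEVILbbbb"
hits = find_in_chunks(payload, [b"EVIL"], chunk_size=8)
```

With plain 8-byte chunks and no overlap, the signature would be split into `EV` and `IL` and missed; with the overlap, the second chunk covers bytes 4 to 11 and contains it whole.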

  7. Routing Table Lookup
     • Longest prefix match (LPM)
     • Radix tree
     • Portable Routing Table
     • Trie
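Longest prefix match over a trie can be sketched as follows. This is a simplified uncompressed binary trie for IPv4, not the radix tree or the GPU data layout used in the paper; the class names `TrieNode` and `RoutingTable`, and the next-hop labels, are assumptions for illustration. The lookup walks the address bit by bit and remembers the last node that carried a route, which is the longest matching prefix.

```python
import ipaddress


class TrieNode:
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]  # child for bit 0 and for bit 1
        self.next_hop = None          # set only if a prefix ends here


class RoutingTable:
    """Hypothetical IPv4-only binary trie for longest prefix match."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        addr = int(net.network_address)
        node = self.root
        # Walk one level per prefix bit, most significant bit first.
        for i in range(net.prefixlen):
            bit = (addr >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address):
        addr = int(ipaddress.IPv4Address(address))
        node = self.root
        best = node.next_hop  # a /0 default route lives at the root
        for i in range(32):
            bit = (addr >> (31 - i)) & 1
            node = node.children[bit]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop  # a longer prefix matched
        return best


table = RoutingTable()
table.insert("0.0.0.0/0", "default")
table.insert("10.0.0.0/8", "A")
table.insert("10.1.0.0/16", "B")
```

For example, `table.lookup("10.1.2.3")` matches all three prefixes and returns the next hop of the longest one, `10.1.0.0/16`. A radix tree compresses chains of single-child nodes in this structure to shorten the walk.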

  8. Result
     • CPU: 0.6 Gbit/s
     • GPU:
       • Kernel only: 19 Gbit/s
       • Transfer: 3.4 Gbit/s
       • Page-locked: 17 Gbit/s

  9. Result
     • CPU: 0.6 Gbit/s
     • GPU:
       • Kernel only: 3.6 Gbit/s
       • Transfer: 2.3 Gbit/s
       • Page-locked: 3.2 Gbit/s

  10. Result
     The GPU performance of the DFA improves rapidly and approaches a peak throughput of 9.2 Gbit/s, which is more than 15 times faster than the CPU.

  11. Result
