110 likes | 270 Views
IP Routing Processing with Graphic Processors. Author : Shuai Mu , Xinya Zhang , Nairen Zhang , Jiaxin Lu , Yangdong Steve Deng, Shu Zhang Publisher : IEEE Conference On DATE, pp.93-98, 2010 Presenter : Ye- Zhi Chen Date: 2011/8/24. Introduction.
E N D
IP Routing Processing with Graphic Processors Author: ShuaiMu , XinyaZhang , NairenZhang , JiaxinLu , YangdongSteve Deng, ShuZhang Publisher: IEEE Conference On DATE, pp.93-98, 2010 Presenter: Ye-Zhi Chen Date: 2011/8/24
Introduction Internet traffic will grow at an accelerated rate , routers , therefore , have to deliver increasing processing capacity accordingly. • Throughputand Programmability • Hardware - Higher Throughput , but Lower Programmability • Software – Higher Programmability , but Lower Throughput • Netowrk Processors(NPs) – The lack of mature programming models and software development tools and incompatibility of architectures • GPU – With high-performance computing and its programming is accessible with CUDA.
CUDA • CUDA Program – composed of codes running on both CPU and GPU. • Kernel - The function called by CPU but executed on GPU • Block – threads inside a block could exchange data through the shared memory and synchronize with one another.
Network Intrusion Detection • Signature matching - checks if network payloads contain pre-supplied signatures to at line rates. 60% Two Algorithms Bloom filter – • hash table • space-efficient data structure • Errors – hash conflicts Aho-Corisick (AC) – • DFA
Implementation Input
Implementation Transfer packet : • Individually : simplest way but time-consuming • Batch :batch many small transfers into a larger one. • Paged-locked memory :be mapped into the address space of the device • Store Bloom vector and transition table in GPU’s texture memory • Divide each packet into smaller chunks , and every two neighboring chunks have an overlapped content with a length equal to the largest match texts.
Rounting Table Lookup • Longest prefix match(LPM) • Radix tree • Portable Routing Table • trie
Result • CPU : 0.6 Gbit/s • GPU : • Kernel only : 19 Gbit/s • Transfer : 3.4 Gbit/s • Paged-locked : 17 Gbit/s
Result • CPU : 0.6 Gbit/s • GPU : • Kernel only : 3.6 Gbit/s • Transfer : 2.3 Gbit/s • Paged-locked : 3.2 Gbit/s
Result The GPU performance of DFA improves rapidly and approaches a peak throughput of 9.2Gbit/s, which is more than 15 times faster than CPU