
Programming Model for Network Processing on FPGAs

Programming Model for Network Processing on FPGAs. Eric Keller, M.S. Thesis Defense, October 8, 2004. Abstract: a programming model for implementing network processing applications on an FPGA, presented as an API to higher-level tools.


Presentation Transcript


  1. Programming Model for Network Processing on FPGAs Eric Keller October 8, 2004 M.S. Thesis Defense

  2. Abstract • Programming model for implementing network processing applications on an FPGA • Present an API to higher level tools • Programming Language: Presents an abstraction in terms of resources more suitable to the networking domain • Compiler: Generate hardware from this description • Demonstrate through four applications • Aurora to GigE Bridge, RPC, IP Router, NAT

  3. Outline of Talk • Background • Design Flow • User Interface • Compilation to Hardware • High Level Tools • Experiments/Results • Conclusions

  4. Outline of Talk • Background • Design Flow • User Interface • Compilation to Hardware • High Level Tools • Experiments/Results • Conclusions

  5. Tools for FPGAs • Hardware Description Languages • Verilog, VHDL • Structural High-Level Languages • JHDL, JBits • Behavioral High-Level Languages • Handel-C, Forge • Domain Specific Languages • Cliff, Snort, Ponder

  6. Cliff • Maps Click to Xilinx FPGAs • Click is a domain specific language for Networking • Modular router on Linux • Elements of common operations • e.g. Decrement TTL • Elements written in Verilog • Script to put system together [Figure: chain of elements labeled Input, Lookup, Simple op, Queue, Output]

  7. Networking on FPGAs • Routing and Switching • MIR, IP Lookup, Crossbar Switch • Protocol Boosters • Error coding, encryption, compression • Security • Virus Scanning, Firewall • Web Server • TCP/IP in Hardware • 50-300x speedup over Sun/Intel based workstations

  8. Outline of Talk • Background • Design Flow • User Interface • Compilation to Hardware • High Level Tools • Experiments/Results • Conclusions

  9. Motivation • Goal: Create a design environment that allows networking experts to use FPGAs • Several point solutions have shown FPGAs to be a good solution • Domain specific languages • There is not a standard high-level tool • Use MIR as a starting framework • Collaborating threads processing a message • Flexible architecture for memory and communication

  10. Design API • Present an API to higher level tools • No leading high-level design entry for networking domain • Presents an abstraction in terms of resources suitable to the networking domain • e.g. threads • Allow specification of architecture as well as functionality • Generate hardware from this description • Generate VHDL • rely on existing back-end tools for mapping to FPGA • Present an intermediate textual format • XML

  11. Design Hierarchy [Figure: layered stack, top to bottom: High Level Tools (Teja, Click, Novalit, ...); Programming Interface (soft architecture, mapping); Back-end tools; Platform FPGAs]

  12. Design Flow • Main Focus: XML to VHDL to bitstream [Figure: XML Description (programming language) → API (Compiler) → Hardware description → Back-end tools → Configuration Bitstream]

  13. Outline of Talk • Background • Design Flow • User Interface • Compilation to Hardware • High Level Tools • Experiments/Results • Conclusions

  14. Abstraction Primitives [Figure: primitives labeled Thread, Memory, Intellectual Property, Interface to External System, and communication/synchronization between threads]

  15. Threads • Micro-engines with instruction level parallelism • Instruction set and conditionals used to program • User defined variables • Implemented as custom hardware • Not a microprocessor with fetch, decode, execute • Synchronization • Activate, Deactivate • Communication • lightweight, channels
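
The thread abstraction on this slide can be illustrated with a small software model. This is a minimal sketch, not the thesis implementation: the class names (`Channel`, `Thread`, `Producer`, `Summer`) and the `run` scheduler are hypothetical, and stepping threads one after another only approximates hardware in which every thread runs concurrently.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Channel:
    """Lightweight one-way channel between two threads (hypothetical model)."""
    items: deque = field(default_factory=deque)

    def put(self, value):
        self.items.append(value)

    def get(self):
        # Returns None when empty, mimicking "no data this cycle".
        return self.items.popleft() if self.items else None


class Thread:
    """A thread is idle until activated, then performs one step per cycle."""

    def __init__(self, name):
        self.name = name
        self.active = False

    def activate(self):
        self.active = True

    def deactivate(self):
        self.active = False

    def step(self):
        raise NotImplementedError


class Producer(Thread):
    def __init__(self, name, out_chan, consumer, words):
        super().__init__(name)
        self.out_chan = out_chan
        self.consumer = consumer
        self.words = list(words)

    def step(self):
        if not self.active:
            return
        self.out_chan.put(self.words.pop(0))   # communication over a channel
        self.consumer.activate()               # synchronization: wake the consumer
        if not self.words:
            self.deactivate()


class Summer(Thread):
    def __init__(self, name, in_chan):
        super().__init__(name)
        self.in_chan = in_chan
        self.total = 0                         # user-defined variable

    def step(self):
        if not self.active:
            return
        value = self.in_chan.get()
        if value is None:
            self.deactivate()                  # nothing left to process
        else:
            self.total += value


def run(threads, cycles):
    """Step every thread once per simulated clock cycle."""
    for _ in range(cycles):
        for thread in threads:
            thread.step()
```

For example, activating a `Producer` that holds the words 1, 2, 3 and running it alongside a `Summer` for a few cycles leaves `total` at 6 with both threads deactivated.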

  16. Intellectual Property • Allow for users to make use of pre-designed intellectual property (also called cores) • Not all algorithms are best expressed as a finite state machine • e.g. encryption, compression • User must: • define the interface • instantiate using an “include” type statement • associate with a thread

  17. Interfaces • Perimeter of the defined system • System can be whole FPGA or part of larger design • Exists as pre-defined netlist • Gigabit Ethernet, Aurora • Interface includes: • Grouping of signals into ports • Extra functionality • e.g. perform framing and error detection • Protocol to get the message • Threads interact with the interface • Instantiation involves an “include” type statement

  18. Memory • Provide buffering of messages, tables for lookup, storage of state • Parameterizable • Selection of different memories • exists as pre-defined netlist (…for now) • each possibly being parameterizable • Instantiate through “include” type statement • Associate a memory port with a thread

  19. Memory (cont’d) • FIFO • PutGet • Queue of objects, commit mechanism • SharedMemory • Single memory shared by multiple accessors • locking mechanism via the BRAM “READ_FIRST” write mode • DPMem • Multiple memories shared by multiple accessors • Allocation mechanism
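
One way to read "queue of objects, commit mechanism" is that words written by the producer stay invisible to the consumer until the whole object is committed. The sketch below models that reading in software. The class name `PutGetQueue` and the `discard` operation are assumptions for illustration; the slides do not describe the actual PutGet interface.

```python
from collections import deque


class PutGetQueue:
    """Queue of objects whose words reach the reader only after commit."""

    def __init__(self):
        self._pending = []          # words of the object currently being written
        self._committed = deque()   # complete objects ready for the reader

    def put(self, word):
        self._pending.append(word)

    def commit(self):
        """Publish the pending object to the reader side."""
        self._committed.append(tuple(self._pending))
        self._pending = []

    def discard(self):
        """Drop the pending object (hypothetical, e.g. a frame that failed a check)."""
        self._pending = []

    def get(self):
        """Return the oldest committed object, or None if none is ready."""
        return self._committed.popleft() if self._committed else None
```

With this model a receive thread can stream a frame into the queue word by word and commit it only once the frame is known to be good, so the transmit thread never sees a partial frame.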

  20. Outline of Talk • Background • Design Flow • User Interface • Compilation to Hardware • High Level Tools • Experiments/Results • Conclusions

  21. Hardware Generation • Process of mapping between system resources to the hardware • Generate VHDL • One module per thread • Top level module hooking all components together • Memories, interfaces, channels exist as predefined netlists • Rely on back-end tools to create bitstream

  22. Top Level

      entity SYSTEM is
        port (
          -- interface
        );
      end SYSTEM;

      architecture struct of SYSTEM is
        -- signals
      begin
        -- synchronization logic
        -- instantiate each component
        -- (interfaces, memories, threads, externally defined IP, channels)
      end struct;
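
The top-level skeleton above lends itself to template-style generation. The following sketch shows how a compiler pass might walk an XML description and emit that skeleton with one placeholder per component. The XML element and attribute names are invented for illustration, since the slides do not show the real schema, and the output is a commented outline rather than synthesizable VHDL.

```python
import xml.etree.ElementTree as ET

# Hypothetical system description; the real XML schema is not shown in the slides.
SYSTEM_XML = """
<system name="SYSTEM">
  <interface name="gmac0" type="GMAC"/>
  <memory name="buf0" type="PutGet"/>
  <thread name="rx_thread"/>
  <thread name="tx_thread"/>
</system>
"""


def generate_top_level(xml_text):
    """Emit a top-level VHDL outline with one placeholder per XML child."""
    root = ET.fromstring(xml_text)
    name = root.get("name")
    lines = [
        f"entity {name} is",
        "  port (",
        "    -- interface",
        "  );",
        f"end {name};",
        "",
        f"architecture struct of {name} is",
        "  -- signals",
        "begin",
        "  -- synchronization logic",
    ]
    for child in root:
        # Fall back to the tag name when no explicit type is given.
        kind = child.get("type", child.tag)
        lines.append(f"  -- instantiate {child.tag} {child.get('name')} ({kind})")
    lines.append("end struct;")
    return "\n".join(lines)


vhdl = generate_top_level(SYSTEM_XML)
```

A real generator would also have to declare signals and port maps for each component; the sketch only shows the overall shape of the mapping from description to top-level module.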

  23. Clocks • Interfaces determine clock domains [Figure: Clock Domain 1 and Clock Domain 2, with interfaces I/F X and I/F Y, blocks A to H, and a memory with Port A and Port B]

  24. Thread

      entity THREAD is
        port (
          -- interface
        );
      end THREAD;

      architecture behavioral of THREAD is
        -- signals
      begin
        -- control logic
        -- combinatorial process
        -- synchronous process
        -- special circuitry for memory reads and channel gets
      end behavioral;

  25. Special Case Circuitry • Memory • READ(var, address) • User wants to work with var, not the memory signals • Need extra circuitry to enable this • Channels • CHAN_GET(var, address) • Extra conditional testing to see when address matches • START(thread, offset) • Extra circuitry to align the data • e.g. Ethernet header is 14 bytes
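
The alignment problem behind START(thread, offset) can be shown with byte arithmetic. A 14-byte Ethernet header does not end on a word boundary for common datapath widths, so the payload has to be shifted before a thread can treat it as whole words. The sketch below assumes a 4-byte datapath and zero padding of the final word; neither detail comes from the slides, and `align_words` is a hypothetical helper name.

```python
def align_words(words, offset, width=4):
    """Re-pack fixed-width words so the byte at `offset` starts the first output word.

    `words` is a list of bytes objects, each `width` bytes long. The width is
    an assumed datapath size; the slides do not state one.
    """
    stream = b"".join(words)[offset:]
    # Zero-pad the tail so every output word is full width.
    if len(stream) % width:
        stream += bytes(width - len(stream) % width)
    return [stream[i:i + width] for i in range(0, len(stream), width)]


ETH_HEADER_LEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)

frame = bytes(range(20))                                   # 14 header + 6 payload bytes
words = [frame[i:i + 4] for i in range(0, len(frame), 4)]  # 4-byte input words
aligned = align_words(words, ETH_HEADER_LEN)
```

Here the payload starts two bytes into the fourth input word, so each output word is assembled from the tail of one input word and the head of the next. In hardware this corresponds to a small shift register plus multiplexers rather than a buffer copy.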

  26. Outline of Talk • Background • Design Flow • User Interface • Compilation to Hardware • High Level Tools • Experiments/Results • Conclusions

  27. Click • Click is a language for creating modular software routers • CLIFF is a tool that will map to FPGAs • Using XML instead • Create a base system • each element is a thread • each thread connects to one port of a DPMem • each thread can have state storage through SharedMemory memory element • Series of optimizations • some pre-base system, some post-base system

  28. Click (cont’d) [Figure: optimization flow from a Click graph (.clk files) to system.xml. Stage labels: Sub-graph match and replace, Split Paths, Move elements, Create base System (using a Lib. of elements in XML), Merge Elements, Run Elements in parallel]

  29. Teja • Teja is a development environment for NPUs • SW Lib - define constructs • Events, Data Structures, Components (state machine) • SW Arch - instantiate constructs • HW Arch - define the hardware resources • import for fixed defined (like NPUs) • create new one for FPGA target • HW Mapping • map constructs from SW arch to resources in HW Arch

  30. Teja (cont’d) [Figure: Data Struct. Library (XML), State Machine GUI (C code), and Thread Library (XML) are compiled; the Software Arch. GUI produces a Software Arch file (internal format), which continues on the next slide]

  31. Teja (cont’d) [Figure: continuing from the previous slide, the Hardware Arch. GUI (Thread, DPMem, Aurora, etc.) produces a Hardware Arch file (internal format); the Hardware Mapping GUI maps the two, library code is inserted, and the result is System.xml]

  32. Outline of Talk • Background • Design Flow • User Interface • Compilation to Hardware • High Level Tools • Experiments/Results • Conclusions

  33. Gigabit Ethernet to Aurora Bridge • Two flows that will convert a frame from one protocol to the other • Ethernet • broadcast protocol (needs addressing) • Coarse grain flow control • Aurora • Xilinx proprietary protocol for point to point communication over multi-gigabit transceivers • Fine grain flow control

  34. Bridge Architecture [Figure: two flows. Aurora RX to Aurora RX thread to Put16Get8 Memory to GMAC TX thread to GMAC TX; GMAC RX to GMAC RX thread to Put8Get16 Memory to Aurora TX thread to Aurora TX]
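
The memory names suggest width conversion between the two sides of the bridge: 16-bit puts with 8-bit gets in one direction and the reverse in the other. The sketch below models that interpretation. The method names and the high-byte-first ordering are assumptions, not details taken from the slides.

```python
from collections import deque


class Put16Get8:
    """Accepts 16-bit words and hands out bytes (assumed meaning of the name)."""

    def __init__(self):
        self._bytes = deque()

    def put16(self, word):
        if not 0 <= word <= 0xFFFF:
            raise ValueError("word must fit in 16 bits")
        self._bytes.append(word >> 8)      # high byte first (assumed ordering)
        self._bytes.append(word & 0xFF)

    def get8(self):
        return self._bytes.popleft() if self._bytes else None


class Put8Get16:
    """Accepts bytes and hands out 16-bit words once two bytes are available."""

    def __init__(self):
        self._bytes = deque()

    def put8(self, byte):
        if not 0 <= byte <= 0xFF:
            raise ValueError("byte must fit in 8 bits")
        self._bytes.append(byte)

    def get16(self):
        if len(self._bytes) < 2:
            return None                    # not enough data for a full word yet
        high = self._bytes.popleft()
        low = self._bytes.popleft()
        return (high << 8) | low
```

Putting the width conversion inside the memory keeps each thread working at the native width of its own interface.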

  35. Bridge Test Setup

  36. Bridge Results • Compared result to VHDL code from XAPP777 • latency = time from last bit received to first bit sent

  37. Remote Procedure Call • Mechanism to invoke a procedure on a remote computer • used in NFS • Almost exclusive to workstations • Message with the parameters to the function as well as information about the function being called • Implement an RPC server with the functions add(x,y) and mult(x,y)
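
The core of such a server is a dispatch from a procedure identifier to a function. The sketch below shows that dispatch for add(x,y) and mult(x,y) using a deliberately simplified message layout. The layout, the procedure numbers, and the helper names are hypothetical; it is not the wire format of any real RPC protocol, which carries additional header fields.

```python
import struct

# Hypothetical procedure numbers; the slides name the functions but not their ids.
PROC_ADD = 1
PROC_MULT = 2

PROCEDURES = {
    PROC_ADD: lambda x, y: x + y,
    PROC_MULT: lambda x, y: x * y,
}

REQUEST = struct.Struct("!Iii")   # procedure id, x, y in network byte order
REPLY = struct.Struct("!q")       # 64-bit result so a 32x32-bit product cannot overflow


def encode_call(proc, x, y):
    """Client side: build a request message."""
    return REQUEST.pack(proc, x, y)


def handle_call(message):
    """Server side: decode the request, run the procedure, encode the reply."""
    proc, x, y = REQUEST.unpack(message)
    return REPLY.pack(PROCEDURES[proc](x, y))


def decode_reply(message):
    """Client side: extract the result from a reply message."""
    return REPLY.unpack(message)[0]
```

In the FPGA design described here, the decode, execute, and encode steps would be spread across threads and the ADD and MULT blocks instead of running as one function.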

  38. RPC Architecture [Figure: GMAC RX feeds an RX thread and a TX thread drives GMAC TX, with Put/Get Memories between threads. Other blocks shown: broadcast thread, ETH thread, IP thread, UDP thread, RPC thread, ADD, MULT]

  39. RPC Test Setup Workstation to Workstation Workstation to FPGA

  40. FPGA vs Workstation • Perform several RPC calls to each from client workstation • Each server system connected directly to the client through an optical gigabit Ethernet cable

  41. Click Based Applications • IP Router • 2 Port (shown) • 16 Port (not shown) • NAT [Figure: the 2-port IP router is built from the elements From Device, Drop Broadcasts, CheckIP Header, Lookup, DecIPTTL, To Device; the NAT is built from From Device, Drop, IPaddr rewriter, IPFilter, queue, To Device]
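
As an example of what one of these elements does, the sketch below decrements the IPv4 TTL and patches the header checksum incrementally, following the RFC 1624 update formula, instead of recomputing it over the whole header. It is a software illustration of the operation, not the thesis's hardware element, and the function names are hypothetical.

```python
def ones_complement_sum(data):
    """16-bit one's-complement sum over an even-length byte string."""
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back in
    return total


def ipv4_checksum(header):
    """Checksum of a header whose checksum field (bytes 10-11) is zeroed."""
    return ~ones_complement_sum(header) & 0xFFFF


def dec_ip_ttl(header):
    """Return a new header with TTL-1 and an updated checksum, or None to drop."""
    ttl = header[8]
    if ttl <= 1:
        return None                                # expired: packet is dropped
    out = bytearray(header)
    old_word = (header[8] << 8) | header[9]        # TTL and protocol share a word
    out[8] = ttl - 1
    new_word = (out[8] << 8) | out[9]
    old_sum = (header[10] << 8) | header[11]
    # RFC 1624: HC' = ~(~HC + ~m + m')
    total = (~old_sum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    new_sum = ~total & 0xFFFF
    out[10], out[11] = new_sum >> 8, new_sum & 0xFF
    return bytes(out)
```

The incremental form touches only two 16-bit words, which is why this operation suits a small per-packet element.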

  42. Click Results

  43. Outline of Talk • Background • Design Flow • User Interface • Compilation to Hardware • High Level Tools • Experiments/Results • Conclusions

  44. Conclusions • Presented a programming model for mapping networking applications to FPGAs • An API of abstractions (user interface) • Generate VHDL from the description (compiler) • Summary • Domain specific languages as a target design entry • FPGAs as a target for implementation • Platform based on threads and flexible memory architecture • MIR as a starting framework • Demonstrate efficient mappings/designs through four application examples
