
Hardware/Software Codesign of Embedded Systems

Hardware/Software Codesign of Embedded Systems. Reconfigurable Computing. Voicu Groza SITE Hall, Room 5017 562 5800 ext. 2159 Groza@SITE.uOttawa.ca. Outline. Introduction Enabling Technologies Fix, configurable, reconfigurable ... Reconfigurable Architectures





Presentation Transcript


  1. Hardware/Software Codesign of Embedded Systems Reconfigurable Computing Voicu Groza SITE Hall, Room 5017 562 5800 ext. 2159 Groza@SITE.uOttawa.ca

  2. Outline • Introduction • Enabling Technologies • Fixed, configurable, reconfigurable ... • Reconfigurable Architectures • Run-Time-Reconfigurable System-on-Chip • Conclusion and Future Work • References

  3. 1. Introduction • Reconfigurable computing – Definition • Why reconfigurable computing ?

  4. Reconfigurable Computing - Definition • Reconfigurable Computing (RC) = presence of hardware (HW) that can be reconfigured (reconfigware - RW) • 1960: Gerald Estrin, “The UCLA Fixed-Plus-Variable (F+V) Structure Computer” • DeHon and Wawrzynek: “computing via a postfabrication and spatially programmed connection of processing elements.” • The architecture used in the computation is determined postfabrication and can therefore adapt to the characteristics of the executed algorithms. • The computation is spatial, in contrast to the more temporal style associated with microprocessors.

  5. Re-inventing the wheel... wire your own computer

  6. Why reconfigurable computing ? • Is your belt long enough? • Embedded hand-held devices need to reduce • the power consumption targets, • the acceptable packaging and manufacturing costs, • the time-to-market • High-performance computing • Today’s computationally intensive applications require more processing power: • streaming video, • image recognition and processing, • highly interactive services • telecommunications • genes • Cray revived its latest entry-level XD1 supercomputer by combining AMD Opteron processors with FPGAs for compute acceleration in a Linux environment.

  7. Why reconfigurable computing … cont.

  8. 2. Enabling Technologies • Programmable ICs: CPLD and FPGA (Xilinx 1984) • HW Abstractions • Fine-grained Reconfiguration is at the gate and register level. • By reconfiguration of registers, gates, and their interconnections, the internal structure of functional units is changed. • 2 major technologies: • Complex Programmable Logic Devices (CPLD) – EEPROM based • Field-Programmable Gate Arrays (FPGA) – SRAM based • Coarse-grained Reconfiguration is based on a set of fixed blocks, like functional units, processor cores, and memory tiles. • The reconfiguration is merely the reprogramming of the interconnections between the fixed blocks.

  9. Complex Programmable Logic Devices (CPLD) • Supplied with no predetermined logic function. • Programmed by user to implement any digital logic function. • Requires specialized computer software for design and programming. • Complex PLD (CPLD) = A PLD that has several programmable sections with internal interconnections between the sections. • The basic building block of a CPLD is a macrocell which implements a logic function that is synthesized into a sum of product equations, followed by a D-type register. • Macrocells are grouped into logic blocks which are connected via a centralized interconnect array.
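The macrocell described above (a sum-of-products function followed by a D-type register) can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name `Macrocell`, the product-term encoding, and the XOR example are illustrative and do not come from any vendor tool.

```python
from typing import List, Sequence, Tuple

# A product term is a list of (input_index, required_value) literals;
# the term is true when every listed input has the required value.
ProductTerm = List[Tuple[int, bool]]


class Macrocell:
    """Toy CPLD macrocell: programmable AND-OR plane feeding a D flip-flop."""

    def __init__(self, product_terms: List[ProductTerm]) -> None:
        self.product_terms = product_terms
        self.q = False  # registered output

    def combinational(self, inputs: Sequence[bool]) -> bool:
        # OR together all product terms (each term is an AND of literals).
        return any(
            all(inputs[i] == want for i, want in term)
            for term in self.product_terms
        )

    def clock(self, inputs: Sequence[bool]) -> bool:
        # On a clock edge the D register captures the sum-of-products value.
        self.q = self.combinational(inputs)
        return self.q


# Hypothetical example: XOR of two inputs as a sum of products, a*!b + !a*b.
xor_cell = Macrocell([[(0, True), (1, False)], [(0, False), (1, True)]])
```

"Programming" the device corresponds here to choosing the list of product terms; the register only changes when `clock` is called.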

  10. Altera MAX 7000 macrocell

  11. Field-Programmable Gate Array (FPGA) • Reconfigurable functional units • coarse grained - ALUs and storage • fine-grained - small lookup tables Interconnection network Universal gates and/or storage elements Switches

  12. Basic ingredient: Look-Up Table (LUT) • Universal gate = look-up table = memory • Memory elements: SRAM [Figure: logic cell built from a look-up table with inputs a0 and a1 and stored data 0 0 0 1]
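The idea that a look-up table is a small memory acting as a universal gate can be illustrated with a short Python sketch. The class name `LookUpTable` and the choice of input 0 as the least significant address bit are assumptions made for the example; the stored pattern 0 0 0 1 is the one shown on the slide and behaves as a 2-input AND.

```python
class LookUpTable:
    """k-input LUT: a 2**k-entry memory addressed by the input bits."""

    def __init__(self, k: int, contents: list) -> None:
        if len(contents) != 2 ** k:
            raise ValueError("a k-input LUT needs exactly 2**k stored bits")
        self.k = k
        self.contents = list(contents)

    def evaluate(self, *inputs: int) -> int:
        if len(inputs) != self.k:
            raise ValueError("wrong number of inputs")
        # Assumption: input 0 is the least significant address bit.
        address = sum(bit << position for position, bit in enumerate(inputs))
        return self.contents[address]


# Same hardware, different stored bits, different gate.
and_gate = LookUpTable(2, [0, 0, 0, 1])
xor_gate = LookUpTable(2, [0, 1, 1, 0])
```

Reconfiguring the logic function means rewriting the stored bits; nothing else about the cell changes.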

  13. Configurable Logic Block (CLB, Xilinx) / Logic Array Block (LAB, Altera) • HW abstractions • 2 logic cells = 1 slice (Xilinx) or 1 Adaptive Logic Module (ALM, Altera) • 2 slices = 1 Configurable Logic Block (CLB, Xilinx) [Figure: Xilinx Spartan II CLB]

  14. Xilinx - Spartan II Architecture • IOBs provide the interface between the package pins and the internal logic • CLBs provide the functional elements for constructing most logic • Dedicated block RAM memories (4096 bits each) • Clock DLLs for clock-distribution delay compensation and clock domain control • Versatile multi-level interconnect structure

  15. Xilinx Virtex FPGA Model [Figure: Xilinx Virtex FPGA model; labels: logic block (CLB), IO mux, switch matrices, line segments, Programmable Interconnect Point (PIP), SRAM, buffer]

  16. Virtex-II Architecture Overview • 1 CLB = 4 slices • 1 slice contains 2 function generators F & G which are configurable as • 4-input look-up tables (LUTs), or • 16-bit shift registers, or • 16-bit distributed SelectRAM memory. • DCM = Digital Clock Manager • Block SelectRAM = 18 Kbit (2k x 9 bit of dual-port RAM) • Multiplier blocks: 18-bit x 18-bit

  17. 3. Fixed, configurable, reconfigurable ... • A simple classification: • Non-configurable computing • Configurable computing • Reconfigurable computing • Each has its own characteristics, (dis)advantages and applications

  18. 3.1. Non-Configurable Computing • Uses fixed hardware such as ASICs or custom VLSI circuits (e.g. microprocessors like x86, Sparc, DEC, PowerPC, etc.) • Long product turnaround time, usually around 3-6 months • Optimized for performance • Can be quite costly • Hardwired, thus no room for error, re-work, or improvement

  19. 3.2. Configurable Computing • Configuring host supervises FPGA reconfiguration with a new bitstream • A bitstream is a sequence of bits which represents the burn-in configuration of the Hardware Block (HB), e.g. a synthesized, placed and routed design [Figure: configuring host, a bitstream shown as a string of bits, and the executing FPGA]

  20. 3.2. Configurable Computing (Cont’d) Advantages: • Uses configurable hardware such as FPGA or CPLD • PLDs are soft wired for re-use of static hardware resources • Cost effective • Quick turnaround time • Flexible and easy design process Disadvantages: • Inefficient use of hardware resources: idle FPGA area cannot be used during run-time • Slow reconfiguration time, because the entire FPGA is reconfigured for a single Hardware Block (HB) • Thus, execution must stop while a new Hardware Block is being configured

  21. 3.3. Reconfigurable Computing • We could also use a placement algorithm to possibly fit all requested HBs into the FPGA [Figure: configuring host, several bitstreams shown as strings of bits, and the executing FPGA]

  22. 3.3. Reconfigurable Computing (Cont’d) Advantages: • Same as Configurable Computing • No need to completely stop the execution while reconfiguring the FPGA with a new HB • Efficient use of static hardware resources; can swap out or move HBs around to fit new HBs on the FPGA, no need for a larger FPGA or a second one • Fast reconfiguration times • Run-time reconfiguration on the fly • Less power consumption, as we can swap out HBs Disadvantages: • Routing HBs can be a heavy overhead for the configuring host, especially if HBs are too large or when defragmentation is necessary
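Placing, swapping out, and defragmenting HBs can be sketched with a deliberately simple one-dimensional model. Everything here is an assumption for illustration: the FPGA is reduced to a row of columns, each HB needs a contiguous run of columns, placement is first-fit, and defragmentation simply slides all HBs to the left. The slides do not say which placement algorithm the configuring host uses.

```python
from typing import Dict, Optional, Tuple


class ColumnFabric:
    """Toy 1-D model: the FPGA is a row of columns; each HB needs a contiguous run."""

    def __init__(self, columns: int) -> None:
        self.columns = columns
        self.placed: Dict[str, Tuple[int, int]] = {}  # HB name -> (start, width)

    def _free_start(self, width: int) -> Optional[int]:
        # First-fit scan: lowest start column followed by `width` free columns.
        occupied = [False] * self.columns
        for start, w in self.placed.values():
            for c in range(start, start + w):
                occupied[c] = True
        run = 0
        for c in range(self.columns):
            run = 0 if occupied[c] else run + 1
            if run == width:
                return c - width + 1
        return None

    def defragment(self) -> None:
        # Slide every HB left so that all free columns form one block.
        cursor = 0
        for name, (_, w) in sorted(self.placed.items(), key=lambda kv: kv[1][0]):
            self.placed[name] = (cursor, w)
            cursor += w

    def place(self, name: str, width: int) -> bool:
        start = self._free_start(width)
        if start is None:
            self.defragment()  # costly in practice: every moved HB is rewritten
            start = self._free_start(width)
        if start is None:
            return False
        self.placed[name] = (start, width)
        return True

    def remove(self, name: str) -> None:
        del self.placed[name]


fabric = ColumnFabric(columns=8)
for hb in ("a", "b", "c", "d"):
    fabric.place(hb, 2)
fabric.remove("a")
fabric.remove("c")
fits = fabric.place("e", 4)  # no 4-column gap exists until the HBs are compacted
```

In this example four free columns exist after the two removals, but they are split into two gaps, so the new HB only fits after defragmentation. That is the overhead the disadvantage above refers to.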

  23. What is Run-Time Reconfiguration (RTR) ? • On-the-fly flexibility • Combines characteristics of co-processors with those of reconfigurable computing • Introduces overhead to reconfigure the co-processor but offsets by increasing execution speed (faster in H/W!)
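The trade-off between reconfiguration overhead and faster hardware execution can be made concrete with a small break-even calculation. The model is an assumption, not something stated on the slide: a function costs `t_sw` per call in software and `t_hw` per call in hardware, and loading it costs `t_reconfig` once. Hardware wins when `t_reconfig + n * t_hw < n * t_sw`.

```python
import math


def break_even_calls(t_reconfig: float, t_sw: float, t_hw: float) -> float:
    """Smallest number of calls n for which reconfigure-then-run-in-HW beats SW.

    Solves t_reconfig + n * t_hw < n * t_sw for the smallest integer n.
    All three times must use the same unit.
    """
    if t_hw >= t_sw:
        return math.inf  # hardware is not faster, the overhead is never recovered
    return math.floor(t_reconfig / (t_sw - t_hw)) + 1


# Hypothetical numbers (milliseconds): 10 ms to reconfigure,
# 5 ms per call in software, 1 ms per call in hardware.
calls_needed = break_even_calls(t_reconfig=10, t_sw=5, t_hw=1)
```

With these made-up numbers the hardware version pays off from the third call onward; an HB that is used only once or twice would be better left in software.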

  24. 4. Reconfigurable Architectures • External stand-alone processing unit • Attached processing unit • Reconfigurable functional unit • Co-processor • Processor embedded in a reconfigurable fabric (Compton & Hauck)

  25. External stand-alone processing unit RPU coupled to the I/O system bus • The RECON System • John Reid Hauser • John Wawrzynek • Randy H. Katz • (University of California, Berkeley) • Consists of a SUN SparcStation host and a reconfigurable coprocessor board (The board exploits a XC4010 FPGA as the reconfigurable processor unit).

  26. Attached processing unit RPU coupled to the local bus • TKDM • Marco Platzner • ETH Zurich • FPGA module that uses the DIMM (dual inline memory module) bus for high-bandwidth communication with the host CPU. • It is integrated with the Linux host OS; • offers functions for data communication and FPGA reconfiguration.

  27. Attached processing unit (Cont.) • MorphoSys • Nader Bagherzadeh • University of California, Irvine • Consists of a combination of a RISC processor core with an array of coarse-grain reconfigurable cells; • It utilizes a DMA controller in order to load the configuration data (context) into the Context Memory • Coarse grain: MorphoSys operates on 8 / 16-bit data. • Configuration: RC array is configured by context words, which specify an instruction opcode for RC. • Depth of programmability: The Context Memory can store up to 32 planes of configuration. • Dynamic reconfiguration: Contexts are loaded into Context Memory without interrupting RC operation. • Local/Host Processor: The control processor (Tiny RISC) and RC Array are resident on the same chip. • Fast Memory Interface: Through DMA controller.

  28. Reconfigurable functional unit RPU integrated in the CPU • Chimaera • S. Hauck • University of Washington, Seattle • System treats the reconfigurable logic as a cache for RPU instructions. • Those instructions that have recently been executed, or that we can otherwise predict might be needed soon, are kept in the reconfigurable logic. • If another instruction is required, it is brought into the RPU by overwriting one or more of the currently loaded instructions.
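Treating the reconfigurable logic as a cache of RPU instructions can be sketched as follows. The slide does not name the replacement policy, so least-recently-used (LRU) eviction is an assumption here, as are the class name `RpuInstructionCache`, the fixed number of slots, and the instruction names in the trace.

```python
from collections import OrderedDict
from typing import List


class RpuInstructionCache:
    """Toy model: the reconfigurable logic holds a few RPU instructions at a time."""

    def __init__(self, slots: int) -> None:
        self.slots = slots
        self.loaded: "OrderedDict[str, None]" = OrderedDict()
        self.reconfigurations = 0

    def execute(self, instruction: str) -> bool:
        """Return True on a hit, False if the instruction had to be loaded first."""
        if instruction in self.loaded:
            self.loaded.move_to_end(instruction)  # mark as most recently used
            return True
        if len(self.loaded) >= self.slots:
            self.loaded.popitem(last=False)  # overwrite the least recently used
        self.loaded[instruction] = None
        self.reconfigurations += 1
        return False


cache = RpuInstructionCache(slots=2)
trace: List[str] = ["sad", "mac", "sad", "crc", "mac"]  # hypothetical instruction names
hits = [cache.execute(name) for name in trace]
```

Each miss stands for one partial reconfiguration of the RPU, so the hit rate of this cache directly determines how much reconfiguration overhead the processor sees.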

  29. Co-processor RPU coupled to the CPU • GARP • Hauser & Wawrzynek • University of California, Berkeley • A reconfigurable architecture that combines reconfigurable hardware with a standard MIPS processor on the same die to retain better feature performance. • Two configurations can never be active at the same time on its reconfigurable array, which can significantly reduce the overall performance of the system.

  30. 5. RTR-SoC System Architecture [Figure: RTR-SoC system architecture built around the IBM OPB; component callouts: execution unit of HBs; allows dedicated OMA-RPU access; stores program and data code; runs software instructions; stores HB bitstreams]

  31. Application and Reconfiguration Flows • While the application flow runs on AE, RE sends RTR_PREP_HB to the ICAP controller to start loading the first HB bitstream onto the RPU. • Once this HB is ready in the RPU, the ICAP sends back an RTR_ACK to the RE. • The newly implemented HB on the RPU starts to work as soon as it is ENABLEd by the reconfiguration flow on RE. • Upon completion, HB sets the flag RTR_DONE to make the application flow aware that it is ready for use. • Once the application flow on AE has prepared the data that HB needs, AE asserts the flag DATA_READY. • HB asserts EXE_DONE when it finishes its task and has prepared the results to be read by the application flow on AE. • When the application flow needs these results, it checks the flag EXE_DONE, and waits if it is not yet set. • The application flow gets the results and then asserts DATA_ACK to acknowledge to HB that it got the data.
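The order of the handshake signals listed above can be traced with a short, purely sequential Python sketch. The signal names come from the slide; everything else is an assumption for illustration: the `Flags` dataclass, the function `run_handshake`, the use of a sum as the HB's computation, and the fact that the two flows are interleaved in one thread instead of running concurrently on AE and RE.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Flags:
    RTR_ACK: bool = False
    ENABLE: bool = False
    RTR_DONE: bool = False
    DATA_READY: bool = False
    EXE_DONE: bool = False
    DATA_ACK: bool = False


def run_handshake(data: List[int]) -> Tuple[int, List[str]]:
    flags = Flags()
    log: List[str] = []

    # Reconfiguration flow (RE): request loading of the HB bitstream.
    log.append("RE -> ICAP: RTR_PREP_HB")
    flags.RTR_ACK = True  # ICAP: the HB is now loaded on the RPU
    log.append("ICAP -> RE: RTR_ACK")
    flags.ENABLE = True  # RE enables the freshly loaded HB
    log.append("RE -> HB: ENABLE")
    flags.RTR_DONE = True  # HB announces that it is ready for use
    log.append("HB -> AE: RTR_DONE")

    # Application flow (AE): hand data to the HB only once it is ready.
    assert flags.RTR_DONE
    hb_input = list(data)
    flags.DATA_READY = True
    log.append("AE -> HB: DATA_READY")

    # HB: compute (placeholder function: a sum) and publish the result.
    result = sum(hb_input)
    flags.EXE_DONE = True
    log.append("HB -> AE: EXE_DONE")

    # AE: read the result only after EXE_DONE, then acknowledge.
    assert flags.EXE_DONE
    flags.DATA_ACK = True
    log.append("AE -> HB: DATA_ACK")
    return result, log


result, log = run_handshake([1, 2, 3])
```

In the real system the application flow would poll or wait on RTR_DONE and EXE_DONE; the two `assert` statements mark the points where that waiting would happen.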

  32. Final system architecture RE AE

  33. Tasks running on AE and RE

  34. Physical Layer Overview • We have already developed a physical layer in JBits in order to evaluate RTR on a Xilinx Virtex device • The physical layer has 3 main functions • modeling the FPGA resources, • running a placement algorithm for the different Hardware Blocks, and • managing the physical resources of the FPGA and any on-board peripherals. RTR Execution Model • Bitstream(s) are read by the JBits App • The JBits App configures the Virtex RC HW located in the PCI slot using the XHWIF API. • XHWIF (Xilinx HardWare InterFace Standard): Java interface for communicating with FPGA-based boards. This enables run-time reconfiguration of the Virtex device. • JBits is a set of Java APIs and classes that provide a high-level language approach to developing reconfigurable systems, including run-time reconfiguration.

  35. Hardware Block (HB) Architecture • An HB is a functional hardware module that contains its own configuration (i.e. the bitstream), and state information (e.g. status and control registers) that define its current state. • It is divided into two major components: • The HB Dependent Unit (HBDU): encompasses several components that vary in functionality and magnitude depending on the functions supported by a particular HB. • The HB Independent Unit (HBIU): designed as a core and hence follows a standardized implementation scheme for all HBs. [Figure: HB block diagram; labels include HBDU, HBIU, CU, Packer, Dispatcher, I-Buffer, O-Buffer, MAB, register decoders, HB I/F, and arrays of PE and LM blocks]

  36. Hardware Block Reconfiguration • The HBs are partially reconfigured by the aforementioned Reconfigurable Processing Unit (RPU). • The reconfiguration process is enabled by means of a Self-Reconfiguration Platform (SRP). • It enables the FPGA to be dynamically reconfigured under the control of an embedded microprocessor. • It is divided into a H/W component and a S/W component. • The H/W component consists of four primary components: the Internal Configuration Access Port (ICAP), some control logic, a small configuration cache in Block RAM (BRAM), and an embedded processor. • The S/W component implements an API that defines methods for accessing configuration logic through the ICAP port. [Figure: SRP block diagram; labels: ICAP, FPGA configuration memory, control logic, MicroBlaze, BRAM, OPB bus]

  37. PR Methodology: Xilinx Virtex II Architecture • The Virtex II FPGA fabric is composed of an array of Configurable Logic Blocks (CLBs). • Block RAMs (BRAM). • Input/Output Blocks (IOBs). • Special function blocks such as multipliers, PLLs etc. • Each CLB contains four slices. • Each slice contains two 4-input look-up tables and 2 D-type flip-flops to implement combinational and sequential circuits.

  38. PR Methodology • Bus Macros (BMs) are required between active and static modules of the design. • The size and location of the reconfigurable module (active) is always fixed. • The reconfigurable module is always the full height of the device; • All logic resources located within the width of the module are considered part of the reconfigurable module’s bitstream frame. This includes slices, tri-state buffers (TBUFs), block RAMs (BRAMs), multipliers, input/output blocks (IOBs), and all routing resources.

  39. PR Methodology Bus Macro Block Diagram • Bus Macros (BMs) are predefined physical routing bridges that connect the active module to the static one. • Any connection from active to static logic should always go through a bus macro • We chose the slice-based bus macros (over the TBUF-based ones) as they give a higher concentration of communication bits per CLB • Bus macros allow data to move in only one direction, either left-to-right or right-to-left.

  40. PR Methodology Final Design Layout Design contains only one active module. All other logic components are on the static module.

  41. PR Methodology Xilinx Internal Configuration Access Port (ICAP) • Provides configuration interface to FPGA fabric. • Cache BRAM to hold at least one frame. • Control logic for the OPB bus interface. • API calls to allow SW to read/Write configuration memory.

  42. PR Methodology • A partial bitstream is generated for the active (dynamic) part of the FPGA • The device remains in full operation while the new partial bitstream is downloaded • The full bitstream configuration must already be programmed into the device before downloading the partial bitstream. • Multiple bitstreams can be generated, one for every partially reconfigurable module variation • Partial bitstreams are generated with the -g ActiveReconfig:Yes option • Failing to use this option will assert the global set reset (GSR) during configuration, resetting the entire design

  43. PR Methodology • Virtex-II configuration memory is arranged in vertical frames that are one bit wide and stretch from the top edge of the device to the bottom. • These frames are the smallest addressable segments of the Virtex-II configuration memory space; therefore, all operations must act on whole configuration frames. • The length of a Virtex-II frame is not fixed and depends on the size of the device. • The number of frames per column type is constant for all devices.
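Because frames are the smallest addressable unit, changing even a single configuration bit means reading, modifying, and rewriting a whole frame. The sketch below models only that constraint. The class `ConfigMemory`, the helper `set_config_bit`, and the 64-bit frame length are hypothetical; real frame lengths depend on the device, and this is not the Xilinx ICAP API.

```python
from typing import List

FRAME_BITS = 64  # hypothetical frame length; the real length depends on device size


class ConfigMemory:
    """Toy configuration memory that can only be accessed one whole frame at a time."""

    def __init__(self, frames: int, frame_bits: int = FRAME_BITS) -> None:
        self.frame_bits = frame_bits
        self._frames: List[List[int]] = [[0] * frame_bits for _ in range(frames)]
        self.frame_writes = 0

    def read_frame(self, index: int) -> List[int]:
        return list(self._frames[index])  # return a copy of the whole frame

    def write_frame(self, index: int, bits: List[int]) -> None:
        if len(bits) != self.frame_bits:
            raise ValueError("only whole frames can be written")
        self._frames[index] = list(bits)
        self.frame_writes += 1


def set_config_bit(mem: ConfigMemory, frame: int, offset: int, value: int) -> None:
    # Read-modify-write: changing one bit still costs a full frame transfer.
    bits = mem.read_frame(frame)
    bits[offset] = value
    mem.write_frame(frame, bits)


mem = ConfigMemory(frames=4)
set_config_bit(mem, frame=2, offset=5, value=1)
```

This is also why a configuration cache that holds at least one frame is useful: the frame has to be staged somewhere while it is being modified.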

  44. Reconfigurable Processing Unit The RPU high-level block diagram

  45. Preliminary Results • Xilinx Virtex-II Platform FPGAs were used to implement this system. • Preliminary results were generated using ModelSim SE 5.7f. Simulation results for the HB I/F interface. They illustrate how the I/F is used in order to enable proper synchronization among the reconfiguration flow and the application flow.

  46. 6. Conclusion and Future Work • A novel architecture of an RTR SoC is introduced • RPU and HBs are designed • This design targets adaptive embedded systems, DSP-related and low-power applications • These functions are implemented as HBs and can be exploited in a multi-purpose environment. For example, the RTR SoC may execute various tasks to perform DSP-related functions, and subsequently be reconfigured into a high-performance measurement processing system • Future designs would allow the user more flexibility by auto-reconfiguring the RPU depending on the computational and functional needs of its respective applications • Real-time applications are our future target, as idle HBs are swapped out of the RPU to save power or to allow for updates to the HBs

  47. References • Marco Platzner, “Reconfigurable Computer Architectures,” e&i Elektrotechnik und Informationstechnik, 115(3):143-148, 1998. Springer. • Y. Li, T. Callahan, E. Darnel, R. Harr, U. Kurkure and J. Stockwood, “Hardware-Software Co-Design of Embedded Reconfigurable Architectures,” 37th Design Automation Conference, 2000. Proceedings DAC, pp. 507-512, June 5-9, 2000. • J. P. Heron, R. Woods, S. Sezer, and R. H. Turner, “Development of a run-time reconfiguration system with low reconfiguration overhead,” Journal of VLSI Signal Processing, 28(1/2):97-113, May 2001. • “Xilinx Microblaze Soft Processor Core,” http://www.xilinx.com/ise/embedded/edk6_2docs/mb ref_guide.pdf, last accessed on October 19, 2004. • G. Aggarwal, N. Thaper, K. Aggarwal, M. Balakrishnan, and S. Kumar, “A Novel Reconfigurable Co-Processor Architecture,” In Proceedings of Tenth International Conference on VLSI Design, pages 370-375, January 1997. • G. Haug and W. Rosenstiel, “Reconfigurable Hardware as Shared Resource in Multipurpose Computers,” In Reiner W. Hartenstein and Andres Keevallik, editors, Field-Programmable Logic: From FPGAs to Computing Paradigm, Springer-Verlag, pages 149-158, Berlin, August/September 1998. • “Xilinx Virtex-II Platform FPGAs: Complete Data Sheet,” DS031 (14 Oct. 2003). • D. Wo and K. Forward, “Compiling to the Gate Level for a Reconfigurable Co-Processor,” In Proceeding of FPGAs for Custom Computing Machines (1994), pages 147-154. • V. Groza, R. Abielmona, M. El-Kadri, N. Sakr, and M. Elbadri, “A Reconfigurable Co-Processor for Adaptive Embedded Systems,” Workshop on Intelligent Solutions in Embedded Systems, Graz, Austria, June 2004. • “IBM On-Chip Peripheral Bus,” http://www-306.ibm.com/chips/techlib/techlib.nsf/techdocs/ 9A7AFA74DAD200D087256AB30005F0C8/$file/OpbBus.pdf, last accessed on October 19, 2004. • R. Abielmona, V. Groza, N. Sakr, and J. Ho, “Low-Level Run-Time Reconfiguration of FPGAs for Dynamic Environments,” IEEE Canadian Conference on Electrical and Computer Engineering, CCECE 2003, Niagara Falls, May 2004. • B. Blodget, P. James-Roxby, E. Keller, S. McMillian, and P. Sundararajan, “A Self-reconfiguring Platform,” Proceedings of the International Conference on Field Programmable Logic, Lisbon, Portugal, Sept. 2003.
