
eMIPS Workstation Overview for FPGA Acceleration

Explore how eMIPS processors extend their own functionality dynamically on an FPGA, enabling application speedups and real-time software debugging. Covers eMIPS, P2V verification, and M2V code optimization.


Presentation Transcript


  1. eMIPS Project Overview, i.e. ... What if your next computer was an FPGA? Alessandro Forin, Microsoft Research, Redmond. April 3, 2008

  2. Outline • How FPGAs took over the world (architecture) • Easy pickings and novelty items (eMIPS) • Simulation, debugging, profiling, … (tools)

  3. Field Programmable Gate Arrays are.. ..essentially two superimposed memory planes: • The routing plane → which signal goes where • The logic plane → what logic function a node computes Just like RAM, they are • slower (1/10 roughly) • power hungry Nonetheless, they took over the world.

  4. One way is.. • UltraSparc CPU • 2.0 GHz, 4-issue, 15 stages • Memory Hierarchy • 16KB/1MB L1/L2 • 32-byte cache line • 1/15 cycle L1/L2 latency • 160 cycle memory latency • Reconfigurable HW • Mapped to Xilinx Virtex-4 to obtain clock frequency Courtesy: K. Compton, U. Wisc.

  5. Another way is..

  6. The FPGA advantage • No code fetches • Fine-grained, spatially parallel • Data access: static prediction, reordering, variable width

  7. eMIPS

  8. Dynamically Extensible Processors • Using an FPGA, we have realized a (MIPS) processor that extends itself at runtime, using Extensions that are safe for multi-user operating systems • Applications: • Speed up execution using an Application-Specific CPU (M2V) • Unobtrusively monitor (real-time) software (P2V) • Loadable software debugging support (eBug) • Load/Unload peripherals at runtime, minimizing chip area • Load/Unload processor cores on demand • First release now available for non-commercial use

  9. The eMIPS “Workstation” • Motherboard: Xilinx ML401 evaluation board for the Virtex4 FPGA • eMIPS is on the FPGA, just add keyboard, mouse and disk!

  10. Application speedups (worst case) [Figure: speedup chart for Video Games, Spec2000, Real-Time] Binaries with Hardware Acceleration • Extended instructions are inserted into the binaries. • If the HW Extension is loaded, the instruction executes and skips the basic block. • Otherwise eMIPS interprets the instruction as a NOP and executes the block. [Figure: Other code… / New instruction: Op78 sp,ra,10 / Original basic block: Lw ra,10(sp); Jr ra; Addiu sp,sp,18 / Other code…]
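The dispatch rule on this slide (run the extension and skip the block if it is loaded, otherwise treat the new instruction as a NOP and fall through) can be illustrated with a toy interpreter. This is only a sketch under stated assumptions: the tuple encoding, the `run` function, the register and memory values, and the flat sequential execution (no MIPS branch delay slot) are all hypothetical and are not taken from eMIPS itself.

```python
def run(program, extension_loaded):
    """Toy model: returns (registers, jump_target, instructions_executed)."""
    regs = {"sp": 100, "ra": 0}   # hypothetical initial register values
    mem = {110: 4242}             # hypothetical saved return address at 10(sp)
    jump_target = None
    executed = 0
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        executed += 1
        if op == "ext":
            (block_len,) = args
            if extension_loaded:
                # The extension performs the whole basic block in one step...
                regs["ra"] = mem[regs["sp"] + 10]
                jump_target = regs["ra"]
                regs["sp"] += 18
                # ...and the software copy of the block is skipped.
                pc += block_len
            # Extension absent: the instruction is a NOP, fall through.
        elif op == "lw":
            rd, offset, base = args
            regs[rd] = mem[regs[base] + offset]
        elif op == "jr":
            (rs,) = args
            jump_target = regs[rs]
        elif op == "addiu":
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        pc += 1
    return regs, jump_target, executed


# The slide's example: one extended instruction guarding a 3-instruction block.
PROGRAM = [
    ("ext", 3),                 # Op78 sp,ra,10
    ("lw", "ra", 10, "sp"),     # Lw ra,10(sp)
    ("jr", "ra"),               # Jr ra
    ("addiu", "sp", "sp", 18),  # Addiu sp,sp,18
]

with_hw = run(PROGRAM, extension_loaded=True)
without_hw = run(PROGRAM, extension_loaded=False)
```

Both runs end in the same architectural state; only the instruction count differs, which is why the same binary stays correct whether or not the Extension is present.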

  11. Assertion Based Verification with P2V • Use the IEEE-standard hardware Property Specification Language (PSL) to verify C (real-time) programs • Implement it using a simulator, or in reconfigurable hardware • PSL-to-Verilog compiler: creates Extensions from PSL code • zero instrumentation code and zero overhead!

  12. Roles of eMIPS, P2V and GCC • C source: int foo(void){ REQ: device->CONTROL = 1; while(1) { ACK = device->STATUS; .... } } • PSL property: always(REQ -> eventually(ACK==1)) [Figure: C → GCC → ELF image → Core Datapath; debug info and PSL → P2V → Bitfile → Monitor Unit (MU)]
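Reading the PSL property on this slide as "every REQ is eventually followed by ACK == 1", a software model of the check might look like the sketch below. The event encoding, the function name `check_req_ack`, and the choice to fail when a request is still pending at the end of a finite trace are assumptions made for illustration; P2V itself generates a hardware monitor rather than code like this.

```python
def check_req_ack(trace):
    """Return True if every REQ in the trace is later followed by ACK == 1.

    trace is a list of (label, value) events, for example ("REQ", None)
    when the REQ statement executes and ("ACK", status) when the ACK
    statement reads the device status.
    """
    pending = False
    for label, value in trace:
        if label == "REQ":
            pending = True       # an obligation is now outstanding
        elif label == "ACK" and value == 1:
            pending = False      # obligation discharged
    # Assumption: a request left unanswered at the end of the trace fails.
    return not pending


ok_trace = [("REQ", None), ("ACK", 0), ("ACK", 0), ("ACK", 1)]
bad_trace = [("REQ", None), ("ACK", 0), ("ACK", 0)]
```

Here `check_req_ack(ok_trace)` holds and `check_req_ack(bad_trace)` does not, mirroring what a monitor would flag without any instrumentation in the C program.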

  13. Extensible Peripherals • Use the eMIPS extension slot for I/O peripherals • Safely load/unload peripherals on demand • Saves area, forward-compatible, bug fixes, … • Flexible interface solves performance and atomicity issues [Figure: Peripheral Configuration State Machine; labels: Absent, Not Configured, Suspended: Power Mgt., Run]

  14. Tools

  15. eBug: the extensible debugger • Safe, in-process, JTAG-style software debugger • Extensible in hw (watchpoints) • Extensible in sw (communication protocols) • Use P2V as a trigger

  16. Debugging & Verification [Figure labels: Giano: Simulated Board; ModelSim: eMIPS CPU; Giano: Oracle Processor]

  17. Optimizing the ISA with M2V [Figure labels: Hardware Designers; Software Developers; Profiling; M2V; Compiled Code; Top one or two Basic Blocks; Basic Blocks Implemented as Hardware Extensions; Original Binaries Modified to utilize Hardware Acceleration] • Same speed, half the area of hand-generated Verilog code

  18. M2V Role

  19. The BBTools • BBFIND finds the basic blocks in MIPS, PPC and ARM images (ELF and PE++) • A simulator (Giano) uses the BB info to generate profiling information. BBSORT+BBDUMP print it • BBMATCH applies the new instructions to the original executables • The simulator generates the new profile data [Figure: Execution Counts of Individual Basic Blocks in XQuake, on the Xbox360]
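To make "finding the basic blocks" concrete, here is a sketch of the textbook leader-based partitioning of an instruction list. It is not BBFIND's actual algorithm, which the slides do not describe: the `(mnemonic, target_index)` encoding, the `TERMINATORS` set, and the omission of MIPS branch delay slots are all simplifying assumptions.

```python
# Hypothetical set of instructions that end a block without a static target.
TERMINATORS = {"jr", "jalr"}


def find_basic_blocks(instrs):
    """Split instrs into basic blocks, returned as (start, end) index pairs.

    instrs is a list of (mnemonic, target_index) tuples, where target_index
    is the index of the branch destination or None for non-branches.
    The end index is exclusive.
    """
    if not instrs:
        return []
    leaders = {0}  # the first instruction always starts a block
    for i, (mnemonic, target) in enumerate(instrs):
        if target is not None or mnemonic in TERMINATORS:
            if target is not None:
                leaders.add(target)      # a branch destination starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)       # so does the instruction after a branch
    starts = sorted(leaders)
    return list(zip(starts, starts[1:] + [len(instrs)]))


code = [
    ("addiu", None),
    ("beq", 3),
    ("addiu", None),
    ("lw", None),
    ("jr", None),
    ("addiu", None),
]
blocks = find_basic_blocks(code)
```

A profiler can then keep one execution counter per `(start, end)` pair, which is the kind of per-block count the figure on this slide reports.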

  20. Real-Time Simulation: Giano • Definition of “Real-Time Simulation”: Realize a software system that matches the temporal behavior of the hardware+software system being simulated, using the same time-ordered sequence of inputs • Applicable to hybrid hw+sw simulators too • Requires: • Clock adaptation • I/O adaptation

  21. [Figure labels: ARM: At91m63200; FPGA; Xilinx: Spartan3; CPU; MEM; I/O; C-models; V-models; ModelSim; PLI plug-in (VPI); NamedPipe client; NamedPipe “GIANO”; PLI plug-in (VPI); vvp.dll; Optional: Icarus Verilog Interpreter; Optional: external devices (LabView). Verilog fragment: module test; always @(posedge clock) counter = counter + 1; C fragment: Start = TheCounter->Value; ...compute... End = TheCounter->Value;]

  22. User Interface: Visio Graphs [Figure: Atmel EB63 Evaluation Board]

  23. Clock Adaptation Problem: Output a character every second • Timer too slow/fast? → Incorrect • Host load changes? → Erratic Solution: Rate-limit the clock using introspection and adaptation

  24. Rate-limiting the Clock • Every M (10**3) clock ticks spin idle for D microseconds • Every N (10**6) clock ticks check the actual frequency against the target frequency, adjust the delay D
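The two-level scheme on this slide (idle every M ticks, re-tune the delay D every N ticks) can be sketched as follows. The slide does not say how D is adjusted, so the proportional correction below is an assumption, as are the class names, the injectable `now_us`/`idle_us` callables, and the `FakeHost` used to keep the example deterministic.

```python
class RateLimitedClock:
    """Every m ticks idle for delay_us; every n ticks re-tune delay_us."""

    def __init__(self, target_hz, now_us, idle_us, m=10**3, n=10**6):
        self.target_hz = target_hz
        self.now_us = now_us      # callable: host time in microseconds
        self.idle_us = idle_us    # callable: spin idle for that many microseconds
        self.m, self.n = m, n
        self.delay_us = 0.0
        self.ticks = 0
        self.window_start = now_us()

    def tick(self):
        self.ticks += 1
        if self.ticks % self.m == 0 and self.delay_us > 0:
            self.idle_us(self.delay_us)
        if self.ticks % self.n == 0:
            elapsed = self.now_us() - self.window_start
            wanted = self.n * 1e6 / self.target_hz
            # Assumed policy: spread the error over the n/m idle periods
            # of the next window; never let the delay go negative.
            correction = (wanted - elapsed) / (self.n / self.m)
            self.delay_us = max(0.0, self.delay_us + correction)
            self.window_start = self.now_us()


class FakeHost:
    """Deterministic stand-in for the host: each simulated tick costs cost_us."""

    def __init__(self, cost_us):
        self.t = 0.0
        self.cost_us = cost_us

    def now(self):
        return self.t

    def idle(self, us):
        self.t += us

    def work(self):
        self.t += self.cost_us


def measure(target_hz, cost_us, windows, m, n):
    """Run `windows` windows of n ticks; return (last window Hz, final delay)."""
    host = FakeHost(cost_us)
    clock = RateLimitedClock(target_hz, host.now, host.idle, m, n)
    hz = 0.0
    for _ in range(windows):
        start = host.now()
        for _ in range(n):
            host.work()
            clock.tick()
        hz = n * 1e6 / (host.now() - start)
    return hz, clock.delay_us
```

With a host that simulates a tick in 0.25 µs and a 1 MHz target, the first window runs four times too fast; after one adjustment the delay settles and the second window runs at the target rate. On a real host, `now_us` and `idle_us` would wrap a high-resolution timer and a spin loop.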

  25. Adjusting the Delay factor

  26. Effect on IPS

  27. I/O adaptation Problem: 9600 baud serial line • From disk trace? → too fast • From user? → too slow • From real serial line? → depends Solution: • Link to rate-limited clock • Adapt using events and notifications
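One way to picture "link to rate-limited clock" is a serial-line model that releases a character only after enough simulated clock ticks have passed, regardless of how fast the host source (a disk trace, say) can supply data. This is a polling simplification of the events-and-notifications approach named on the slide; the class name, the 10-bits-per-character framing (start bit, 8 data bits, stop bit), and the tick-based interface are assumptions.

```python
from collections import deque


class PacedSerialLine:
    """Deliver queued bytes no faster than the baud rate in simulated time."""

    def __init__(self, clock_hz, baud=9600, bits_per_char=10):
        # Simulated clock ticks that one character occupies on the wire.
        self.ticks_per_char = clock_hz * bits_per_char // baud
        self.next_ready_tick = 0
        self.queue = deque()

    def feed(self, data):
        """Queue input bytes, for example read at full speed from a trace."""
        self.queue.extend(data)

    def poll(self, now_tick):
        """Return the next byte if the line is ready at now_tick, else None."""
        if self.queue and now_tick >= self.next_ready_tick:
            self.next_ready_tick = now_tick + self.ticks_per_char
            return self.queue.popleft()
        return None


line = PacedSerialLine(clock_hz=9_600_000)
line.feed(b"hi")
first = line.poll(0)        # 'h' is available immediately
early = line.poll(5_000)    # too soon for the next character
second = line.poll(10_000)  # 'i' arrives one character time later
```

Because the pacing is expressed in simulated ticks, the rate-limited clock is what keeps the line at 9600 baud in wall-clock terms.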

  28. Giano: Key points • Giano is the first Real-Time Simulation Framework for hardware-software co-development • Uses Microsoft Visio as the graphing and execution UI • Configurations are Platform XML files • Nodes in the graph are separate, user-defined DLLs • Lots of functionality pre-built into the base framework • 60+ working modules, 20+ systems, 4 years internal use • Release V2 available, free for academic use

  29. Credits • Full-timers Neil Pittman, Alessandro Forin • Interns Nathaniel Lynch, Behnam Neekzad, Ping Hang Cheung, Bharat Sukhwani, Lu Hong, Karl Meier, Giovanni Busonera http://research.microsoft.com/research/EmbeddedSystems
