
FPGA Multi-core

Mid-project presentation: Asymmetric FPGA-loaded hardware accelerators for FPGA-enhanced CPU systems with Linux. Performed by: Avi Werner, William Backshi. Instructor: Evgeny Fiksman. Duration: 1 year (2 semesters). HS DSL. 30/03/2009.


Presentation Transcript


  1. FPGA Multi-core

  2. Mid-project presentation. The Project: Asymmetric FPGA-loaded hardware accelerators for FPGA-enhanced CPU systems with Linux. Performed by: Avi Werner, William Backshi. Instructor: Evgeny Fiksman. Duration: 1 year (2 semesters). HS DSL. 30/03/2009.

  3. RMI Processor

  4. RMI – SW Programming Model

  5. RMI Processor - RMIOS

  6. Agenda
     • Project description
     • Design considerations and schematics
     • System diagram and functionality
     • Preparing the demo
     • Planned future progress

  7. Project definition
     • An FPGA-based system.
     • An asymmetric multiprocessor system, with a master CPU and several slave accelerators (modified soft-core CPUs with RAM) running the same or different opcode sets.
     • The master CPU runs a single-processor Linux OS; accelerator functionality is exposed to applications in the OS through a driver API.

  8. The Platform
     • Platform: ML310 board with PPC405.
     • Accelerators: based on uBlaze soft-core microprocessors.
     • Controllers: an IRQ controller for each core.
     "Accelerator" refers to microprocessor + IRQ generator + RAM.

  9. Project Progress
     • Theoretical research
        • Found and read articles on HW accelerators, both by faculty staff and external (e.g., IBM's CELL).
        • Met with most of the MATRICS group to check their interest in our platform and their possible requirements.
        • Met with Systems Dept. members at IBM (Muli Ben-Yehuda) for a concept review.
        • The system architecture has undergone significant changes.
     • Practical achievements – loading Linux on the ML310
        • Compiled a kernel for the PPC405 with ML310 support (no PCI support).
        • Booted the ML310 from CompactFlash with Xilinx's pre-loaded Linux.
        • Introduced additional hardware into the FPGA and tested liveness.
     • Practical achievements – creating the HW system platform
        • Moved to Xilinx 10.1 to get a single system bus (PLB v4.6) with multi-port memory.
        • Created a template for the accelerator core (IRQ generator and microprocessor).
        • Designed the interconnect topology.
        • Connected the devices at the HW level; tested system liveness and independence.

  10. HW Design considerations
     • Scalability – the design is CPU-independent.
     • Accelerators work with interrupts – no polling (improved performance).
     • The OS does not work with interrupts – it polls the IRQ generators, which gives generic HW compatibility and scalability.
     • Separate register space – main memory is not used for flags, device data, etc.
     • Single-cycle transactions for checking / setting accelerator status.
     • The Data Mover stub's init includes the chunk size – no character recognition needed.

  11. Accelerator Schematics
     [Diagram: an Accelerator attached to the PLB v4.6 bus via slave and master ports. It contains an IRQ Generator with general-purpose registers and a CPU (uBlaze) with an IRQ line; instruction and data buses connect through memory controllers to a dual-port data & instruction RAM.]

  12. HW Design Schematics
     [Diagram: the PPC and MMU share the PLB v4.6 bus with DDR memory and several Accelerators, each with its own data & instruction memory.]

  13. Current System layer
     [Diagram: DDR memory, MMU, and PPC405 running a memory-test demo on the software side; on the FPGA side, an accelerator with instruction & data memory runs the software stub (data mover & executer), driving the LED accelerator demo via manual execution.]
     Manual execution: since there is no OS yet, we cannot load an executable into the DDR without JTAG. We therefore have to load it manually, and set up and execute the stub manually.

  14. Complete System layer
     [Diagram: Linux (Debian) on the PPC405 with DDR memory and MMU; a driver and a virtual communication layer (SW) connect the accelerated software platform to the FPGA accelerator, whose instruction & data memory runs the software stub (data mover & executer).]

  15. System Functionality
     • The HW is loaded on the FPGA; the demo application (in the future, the Linux kernel) runs on the central PPC core, and the accelerators are preloaded with the client software stub.
     • The SW driver is loaded into memory (in the kernel, using the insmod command).
     • Accelerator-aware SW is executed (in the kernel, it communicates with the driver API).
     • To commit a job to a specific accelerator, the SW initializes the registers of that accelerator's IRQ controller and sets the "run" flag in the status register.
     • The client stub runs in an idle loop until the accelerator's IRQ controller issues an interrupt, initiated by driver code running on the PPC core.
     • The stub reads the IRQ controller registers, which initialize the Data Mover (in the 1st stage, with the start address and length of the code).
     • The Data Mover sets a flag in the IRQ generator's status register that signals a working accelerator core.
     • The Data Mover runs transactions against main memory until the whole code segment has been brought in, then passes control to the 1st byte of the code segment.
     • The target code includes an inserted "rtid" instruction; when execution finishes, this "rtid" passes control back to the Data Mover stub.
     • The Data Mover changes the IRQ generator's status register to "complete" and returns to the idle loop (the stub can also support returning result data structures to main memory).

  16. Preparing Accelerator SW
     • Compile the accelerator target code with an execution-only segment (there is no data segment – data is inserted inline).
     • The target code should be compiled with program starting address = 0x1000, set via compiler options, using the default linker script.
     • Insert at the end a call to a "return" function whose address is taken from 0xFFC:
        asm("andi r1, r1, 0x0;\
             lwi r15, r1, 0xFFC;\
             rtsd r15, 0;");
     • Open the Xilinx EDK Shell and run the following to convert the ELF to binary code:
        mb-objcopy -O binary --remove-section=.stab --remove-section=.stabstr executable.elf target.bin

  17. Preparing the system
     • Download the bitstream to the FPGA (PPC code and uBlaze stub).
     • Launch XMD on the PPC core.
     • Download the target accelerator BIN to DRAM as data:
        dow -data target.bin 0xSTART_ADDR
     • Set the IRQ Generator parameters:
        • Base address – 0xSTART_ADDR + 0x1000.
        • Length of the BIN in DRAM.
        • Run bit.
     • Set the run bit again to re-run the job.

  18. Planned future progress
     • Load Linux on the platform.
     • Update the stub to allow data passing.
     • Finish writing the driver API for Linux.
     • Write an additional demo application for uBlaze.
     • Write a demo application for PPC (Linux).

  19. Backup slides • Hidden
