
HW/SW Co-design

HW/SW Co-design. Lecture 4: Lab 2 – Passive HW Accelerator Design. Course material designed by Professor Yarsun Hsu, EE Dept, NTHU RA: Yi-Chiun Fang, EE Dept, NTHU. Outline. Introduction to AMBA Bus System Passive Hardware Design Interrupt Service Routine Environment Configuration


Presentation Transcript


  1. HW/SW Co-design Lecture 4: Lab 2 – Passive HW Accelerator Design Course material designed by Professor Yarsun Hsu, EE Dept, NTHU RA: Yi-Chiun Fang, EE Dept, NTHU

  2. Outline • Introduction to AMBA Bus System • Passive Hardware Design • Interrupt Service Routine • Environment Configuration • Co-designed System with GHDL Simulation • Co-designed System on FPGA

  3. INTRODUCTION TO AMBA BUS SYSTEM

  4. AMBA 2.0 Bus System (1/7) • Established by ARM • Advanced High-performance Bus (AHB) • For high-performance, high clock frequency system modules such as embedded processor, DMA controller, and memory controller • Advanced Peripheral Bus (APB) • Optimized for minimal power consumption and reduced interface complexity to support peripheral functions • For more details, please refer to the following documents • AMBA 2.0 Specification • Introduction to AMBA Bus System • GRLIB AHBCTRL - AMBA AHB controller with plug&play support

  5. AMBA 2.0 Bus System (2/7) [Figure: typical AMBA system. The AHB-to-APB bridge is a slave on AHB and the only master on APB.]

  6. AMBA 2.0 Bus System (3/7) • AMBA AHB is designed to be used with a central multiplexor interconnection scheme • Avoids tri-state bus

  7. AMBA 2.0 Bus System (4/7) • An AHB transfer consists of two distinct sections • The address phase, which lasts only a single cycle • The data phase, which may require several cycles • This is achieved using the HREADY signal

  8. AMBA 2.0 Bus System (5/7) • A slave may insert wait states into any transfer • For write operations, the bus master will hold the data stable throughout the extended cycles • For read transfers, the slave does not have to provide valid data until the transfer is about to complete [Figure: timing diagram of a transfer with wait states]
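The cost of wait states can be made concrete with a small cycle count. The sketch below is not from the slides; it assumes a single master issuing back-to-back transfers with no idle or busy cycles and no arbitration delay, so that the address phase of each transfer overlaps the data phase of the previous one. The function name `ahb_total_cycles` is hypothetical.

```c
#include <stddef.h>

/* Total bus cycles for n back-to-back AHB transfers.
 * wait_states[i] is the number of extra data-phase cycles the slave
 * inserts for transfer i by holding HREADY low.
 * Assumption: one master, no idle cycles, pipelined address/data phases. */
unsigned ahb_total_cycles(const unsigned *wait_states, size_t n)
{
    if (n == 0)
        return 0;

    unsigned cycles = 1;                 /* address phase of the first transfer */
    for (size_t i = 0; i < n; i++)
        cycles += 1 + wait_states[i];    /* data phase plus its wait states */
    return cycles;
}
```

Under these assumptions, three transfers with 0, 2, and 0 wait states take 6 cycles, against 4 cycles for three zero-wait transfers. This is why the accelerator described later is designed to respond without wait states.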

  9. AMBA 2.0 Bus System (6/7) • GRLIB implements AMBA AHB with slight modifications • Please refer to the GRLIB User's Manual and GRLIB IP Cores Manual for detailed information

  10. AMBA 2.0 Bus System (7/7) • The GRLIB implementation of AHB includes a mechanism to provide plug&play support • The implementation is located at grlib-gpl-1.0.19-b3188/lib/grlib/amba/ • The configuration record from each AHB unit is sent to the AHB bus controller via the HCONFIG signal • The record covers identification of attached units, interrupt routing, and address mapping of slaves • Declared as: type ahb_config_type is array (0 to NAHBCFG-1) of amba_config_word;

  11. PASSIVE HARDWARE DESIGN

  12. Passive HW Accelerators • The accelerator (bus slave) does not actively send signals to the bus • It only responds to the master • The master gives commands to the slave via its control registers and probes its status registers [Figure: master and slave]

  13. Passive 1-D IDCT HW Acc. (1/4) • A simple 2-stage design • Gate delay • Stage 1: ~1 mult • Stage 2: ~3 add • Action register • Write ‘1’ to start, reset to 0 automatically by the accelerator when done • Mode register • Row/column mode • No wait states • Immediate response [Figure labels: action, mode]

  14. Passive 1-D IDCT HW Acc. (2/4) • Data packing • Since the 8x8 blocks are of type short (16-bit), each value occupies only half of the data bus (32-bit) • We pack two values together to increase data bus utilization and reduce the communication overhead • The action bit and mode bit are also packed together • Control word layout: bits 31..2 unused, bit 1 = mode, bit 0 = action
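The packing scheme can be sketched with explicit shifts. This is an illustration, not the lab's code: the helper names are hypothetical, and placing the first value in bits 31..16 is an assumption chosen to match what a `short *` to `long *` cast yields on a big-endian processor such as LEON3. The control word follows the layout on the slide (bit 1 = mode, bit 0 = action).

```c
#include <stdint.h>

/* Pack two 16-bit samples into one 32-bit bus word.
 * Assumption: the first sample occupies bits 31..16 (big-endian order). */
uint32_t pack_pair(int16_t first, int16_t second)
{
    return ((uint32_t)(uint16_t)first << 16) | (uint16_t)second;
}

/* Recover the two signed samples from a packed word. */
int16_t unpack_first(uint32_t word)
{
    return (int16_t)(uint16_t)(word >> 16);
}

int16_t unpack_second(uint32_t word)
{
    return (int16_t)(uint16_t)(word & 0xFFFFu);
}

/* Build the control word: bit 1 = mode, bit 0 = action (start). */
uint32_t make_control(unsigned mode)
{
    return ((uint32_t)(mode & 1u) << 1) | 1u;
}
```

With this packing, one row or column of eight 16-bit values needs four 32-bit bus transfers instead of eight.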

  15. Passive 1-D IDCT HW Acc. (3/4) • 1-D IDCT calculation • STEP1: Write Y registers (4 transfers) • STEP2: Write mode bit & action bit • STEP3: Poll the action bit • STEP4: Read x registers after action bit reset

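  16. Passive 1-D IDCT HW Acc. (4/4)
      static void hw_idct_1d(short *dst, short *src, unsigned int mode)
      {
          long *long_ptr = (long *)src;
          /* STEP1: write the packed inputs to the Y registers */
          Y_array_base[0] = long_ptr[0];
          Y_array_base[1] = long_ptr[1];
          ...
          /* STEP2: write the mode bit and set the action bit */
          *c_reg = (long)((mode << 1) | 0x1);
          /* STEP3: poll until the accelerator resets the action bit */
          while (*c_reg & 0x1) { /* busy waiting loop */ }
          /* STEP4: read the results from the x registers */
          dst[ 0] = ((short *)x_array_base)[0];
          dst[ 8] = ((short *)x_array_base)[1];
          ...
      }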
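The four-step register protocol can be tried on a host machine with a software stand-in for the device. Everything in this sketch is hypothetical: `mock_acc` and its functions are not part of the lab package, the mock copies inputs to outputs instead of computing an IDCT, and it pretends the accelerator needs two polls to finish. It only illustrates the write, start, poll, and read ordering.

```c
#include <stdint.h>

/* Software stand-in for the accelerator's registers (not real hardware). */
typedef struct {
    uint32_t y[4];   /* packed input registers */
    uint32_t x[4];   /* packed output registers */
    uint32_t ctrl;   /* bit 1 = mode, bit 0 = action */
    unsigned busy;   /* polls remaining until the mock is "done" */
} mock_acc;

void mock_write_ctrl(mock_acc *d, uint32_t value)
{
    d->ctrl = value;
    if (value & 1u)
        d->busy = 2;                      /* assumed latency: two polls */
}

uint32_t mock_read_ctrl(mock_acc *d)
{
    if ((d->ctrl & 1u) && d->busy > 0 && --d->busy == 0) {
        for (int i = 0; i < 4; i++)
            d->x[i] = d->y[i];            /* placeholder, not a real IDCT */
        d->ctrl &= ~1u;                   /* device clears the action bit */
    }
    return d->ctrl;
}

/* Runs steps 1 to 4 and returns how many polls saw the device busy. */
unsigned mock_run(mock_acc *d, const uint32_t in[4], uint32_t out[4],
                  unsigned mode)
{
    unsigned polls = 0;
    for (int i = 0; i < 4; i++)
        d->y[i] = in[i];                                   /* step 1 */
    mock_write_ctrl(d, ((uint32_t)(mode & 1u) << 1) | 1u); /* step 2 */
    while (mock_read_ctrl(d) & 1u)
        polls++;                                           /* step 3 */
    for (int i = 0; i < 4; i++)
        out[i] = d->x[i];                                  /* step 4 */
    return polls;
}
```

On real hardware the register pointers would normally be declared `volatile` so that the compiler re-reads the control register on every loop iteration.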

  17. INTERRUPT SERVICE ROUTINE

  18. GRLIB GPTIMER (1/2) • General Purpose Timer Unit • Timers are present in almost any electronic device which needs timing functions (e.g. timekeeping & time measurement) • Acts as a slave on AMBA APB • Provides a common decrementing prescaler (clocked by the system clock) and decrementing timers • Capable of asserting an interrupt on timer underflow • We initialize timer 2 for 1 ms resolution (i.e. an interrupt will be asserted every 1 ms)
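The arithmetic behind a 1 ms timer period can be sketched as follows. This is a model under stated assumptions, not the lab's initialization code: it assumes the prescaler divides the system clock by (scaler reload + 1) and the timer underflows every (timer reload + 1) prescaler ticks, which should be checked against the GRLIB IP Cores Manual. The 40 MHz clock in the usage note is a hypothetical value, and the type and function names are illustrative.

```c
#include <stdint.h>

/* Reload values for the prescaler and one timer (illustrative type). */
typedef struct {
    uint32_t scaler_reload;
    uint32_t timer_reload;
} timer_cfg;

/* Choose reload values so the prescaler ticks at 1 MHz and the timer
 * underflows every period_us microseconds.
 * Assumes clk_mhz >= 1 and period_us >= 1. */
timer_cfg timer_cfg_for_period_us(uint32_t clk_mhz, uint32_t period_us)
{
    timer_cfg c;
    c.scaler_reload = clk_mhz - 1;    /* divide the clock by clk_mhz */
    c.timer_reload  = period_us - 1;  /* count period_us ticks */
    return c;
}

/* System clock cycles between two timer underflows in this model. */
uint64_t timer_period_cycles(timer_cfg c)
{
    return ((uint64_t)c.scaler_reload + 1) * ((uint64_t)c.timer_reload + 1);
}
```

For an assumed 40 MHz clock and a 1000 µs period this gives a prescaler reload of 39 and a timer reload of 999, so the timer underflows every 40000 clock cycles, which is 1 ms.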

  19. GRLIB GPTIMER (2/2) • Please refer to the GRLIB IP Cores Manual for detailed information

  20. eCos ISR (1/3) • When an interrupt occurs, the processor jumps to a specific address for execution of the Interrupt Service Routine (ISR) • One of the key concerns in embedded systems with respect to interrupts is latency, which is the interval of time from when an interrupt occurs until the ISR begins to execute [Figure: interrupt latency]

  21. eCos ISR (2/3) • Basic API for implementing ISR • Please refer to the eCos Reference Manual for detailed information
      #include <cyg/kernel/kapi.h>
      void cyg_interrupt_create(cyg_vector_t vector, cyg_priority_t priority,
                                cyg_addrword_t data, cyg_ISR_t* isr, cyg_DSR_t* dsr,
                                cyg_handle_t* handle, cyg_interrupt* intr);
      void cyg_interrupt_delete(cyg_handle_t interrupt);
      void cyg_interrupt_attach(cyg_handle_t interrupt);
      void cyg_interrupt_detach(cyg_handle_t interrupt);
      void cyg_interrupt_acknowledge(cyg_vector_t vector);
      void cyg_interrupt_mask(cyg_vector_t vector);
      void cyg_interrupt_unmask(cyg_vector_t vector);

  22. eCos ISR (3/3) • An ISR is a C function which takes the following form • An ISR should complete as soon as possible
      cyg_uint32 isr_function(cyg_vector_t vector, cyg_addrword_t data)
      {
          ... /* do the service routine */
          return CYG_ISR_HANDLED;
      }

  23. Program Profiling (1/2) • We use GPTIMER for time measurement • Every time the timer asserts an interrupt, the timer ISR increments a global variable time_tick
      cyg_uint32 timer_isr(cyg_vector_t vector, cyg_addrword_t data)
      {
          /* data carries the address of the tick counter */
          unsigned long *time_tick = (unsigned long *) data;
          (*time_tick)++;
          cyg_interrupt_acknowledge(vector);
          return CYG_ISR_HANDLED;
      }

  24. Program Profiling (2/2) • We record the latency of every function block by monitoring the time_tick variable
      void func()
      {
          unsigned long local_timer = time_tick;
          ...
          time_elapsed += (time_tick - local_timer);
      }
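The tick-based measurement can be exercised on a host with a stand-in for the timer interrupt. In this sketch `fake_timer_isr` and `profiled_block` are hypothetical names, and each call to the fake ISR represents one 1 ms timer interrupt; on the target, the increments come from the real timer ISR instead.

```c
/* Tick counter updated from interrupt context, so it is volatile:
 * the compiler must re-read it instead of caching it in a register. */
volatile unsigned long time_tick = 0;

/* Accumulated ticks spent inside the profiled block. */
unsigned long time_elapsed = 0;

/* Stand-in for the timer ISR: one call represents one timer interrupt. */
void fake_timer_isr(void)
{
    time_tick++;
}

/* Profiled block: snapshot the tick count, do the work, add the delta. */
void profiled_block(unsigned ticks_of_work)
{
    unsigned long local_timer = time_tick;

    for (unsigned i = 0; i < ticks_of_work; i++)
        fake_timer_isr();            /* pretend one tick passes per step */

    /* Unsigned subtraction stays correct if time_tick wraps around. */
    time_elapsed += time_tick - local_timer;
}
```

The measurement cannot resolve anything shorter than one tick, so a block that finishes between two interrupts contributes zero to `time_elapsed`.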

  25. ENVIRONMENT CONFIGURATION

  26. Build SW Application • Copy the files in lab_pkg/lab2/sw to your original Lab 1 directory • Replace the Makefile and modify the path for ECOSDIR in the Makefile • Type “make” to build • The -D_HW_ACC_ flag links the co-designed version of hw_idct_2d() in idct_hw.c with the testbench • Without this flag, hw_idct_2d() is identical to sw_idct_2d() • The -D_PROFILING_ flag enables profiling using the timer interrupt and reports the results at the end
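A compile-time flag such as -D_HW_ACC_ typically works through conditional compilation. The sketch below shows one way that could look; it is an assumption about the mechanism, not the contents of idct_hw.c, and the `demo_` functions are hypothetical placeholders that only report which path was selected.

```c
/* Which implementation a call ended up using (illustrative values). */
enum { PATH_SOFTWARE = 0, PATH_ACCELERATOR = 1 };

/* Placeholder for the pure software routine. */
int demo_sw_idct(void)
{
    return PATH_SOFTWARE;
}

/* Placeholder for the co-designed routine: uses the accelerator path
 * only when the file is compiled with -D_HW_ACC_, and otherwise falls
 * back to the software routine. */
int demo_hw_idct(void)
{
#ifdef _HW_ACC_
    return PATH_ACCELERATOR;
#else
    return demo_sw_idct();
#endif
}
```

Compiled without the flag, both functions take the software path, which mirrors the slide's statement that hw_idct_2d() is then identical to sw_idct_2d().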

  27. Install IDCT Accelerator • Copy lab_pkg/lab2/hw/devices.vhd to grlib-gpl-1.0.19-b3188/lib/grlib/amba/ and replace the original file • Copy lab_pkg/lab2/hw/libs.txt and the whole lab_pkg/lab2/hw/esw folder to grlib-gpl-1.0.19-b3188/lib/ • The 1-D IDCT passive accelerator is located at lab_pkg/lab2/hw/esw/idct_acc/idct_1x8.vhd • Copy lab_pkg/lab2/hw/leon3mp.vhd to grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/ and replace the original file

  28. CO-DESIGNED SYSTEM WITH GHDL SIMULATION

  29. GHDL Simulation (1/6) • We compile our program as a virtual SDRAM for LEON3 processor • LEON3 will fetch the instructions and perform the corresponding operations • All the hardware signals can be recorded and dumped by GHDL

  30. GHDL Simulation (2/6) • To perform GHDL simulation, the program must not be linked with eCos • Remove -D__ECOS and -I$(ECOSDIR)/include from CFLAGS • Remove -Ttarget.ld, -nostdlib, and -L$(ECOSDIR)/lib from LFLAGS • Remove the -D_PROFILING_ flag • You can remove -D_VERBOSE_ for faster simulation • You can modify the NUM_BLKS macro in idct_test.c to reduce the number of testbench iterations • Type “make” to build • You should see a file named sdram.srec

  31. GHDL Simulation (3/6) • Start Cygwin • cd grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/ • make distclean • make soft • Copy the sdram.srec we built into this directory and replace the original one • make ghdl • You can check for syntax errors through GHDL

  32. GHDL Simulation (4/6) • Type “./testbench.exe --vcd=waveform.vcd” after compilation to begin simulation • You should see an AHB slave with “Unknown vendor” appear, which is our IDCT accelerator

  33. GHDL Simulation (5/6) • The dump file waveform.vcd can be viewed on-the-fly using GTKWave • Drag waveform.vcd and drop it over the gtkwave.exe icon to open • You can also use Windows cmd to open • “File → Reload Waveform” in GTKWave to update the dump file

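  34. GHDL Simulation (6/6) [Figure: GTKWave waveform annotated with stage 1 and stage 2 of the accelerator, the AHB address phase and data phase, and probing of the control register]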

  35. CO-DESIGNED SYSTEM ON FPGA

  36. Build FPGA Bitstream (1/2) • Type “make ise | tee ise_log” under grlib-gpl-1.0.19-b3188/designs/leon3-gr-xc3s-1500/ after you install the accelerator • It is strongly suggested that you verify the hardware with GHDL simulation first • It is also suggested that you take a look at ise_log for more information • Configure your FPGA with leon3mp.bit after generating the bitstream

  37. Build FPGA Bitstream (2/2) • After entering GRMON, check the system configuration using “info sys” • You should see a device with “Unknown vendor” appear

  38. Profiling Results • Build the program with the -D_PROFILING_ flag on • Compare the computation results of sw_idct_2d() and hw_idct_2d() • Compare the computation results with and without the -D_VERBOSE_ flag
