CprE / ComS 583 Reconfigurable Computing

CprE / ComS 583 Reconfigurable Computing. Prof. Joseph Zambreno, Department of Electrical and Computer Engineering, Iowa State University. Lecture #21 – HW/SW Codesign. Outline: HW/SW Codesign, Motivation, Specification, Partitioning, Automation.


Presentation Transcript


  1. CprE / ComS 583 Reconfigurable Computing
     Prof. Joseph Zambreno
     Department of Electrical and Computer Engineering
     Iowa State University
     Lecture #21 – HW/SW Codesign

  2. Outline
     • HW/SW Codesign
     • Motivation
     • Specification
     • Partitioning
     • Automation
     CprE 583 – Reconfigurable Computing

  3. Hardware/Software Codesign
     • Definition 1 – The concurrent and cooperative design of the hardware and software components of an embedded system
     • Definition 2 – A design methodology supporting the cooperative and concurrent development of hardware and software (co-specification, co-development, and co-verification) in order to achieve shared functionality and performance goals for a combined system [MicGup97A]

  4. Motivation
     • Not possible to put everything in hardware due to limited resources
     • Some code is more appropriate for sequential implementation
     • Desirable to allow for parallelization, serialization
     • Possible to modify existing compilers to perform the task

  5. Why put CPUs on FPGAs?
     • Shrink a board to a chip
     • What CPUs do best:
         • Irregular code
         • Code that takes advantage of a highly optimized datapath
     • What FPGAs do best:
         • Data-oriented computations
         • Computations with local control

  6. Computational Model
     • Most recent work addressing this problem assumes a relatively slow bus interface
     • In this model, the FPGA has a direct interface to memory
     [Figure labels: Memory, FPGA, Memory bus, General-Purpose Processor]

  7. Hardware/Software Partitioning
         if (foo < 8) {
             for (i=0; i<N; i++)
                 x[i] = y[i]*z[i];
         }
     [Figure labels: CPU, HW Accelerator]
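The split suggested by this slide can be sketched in C. This is only an illustration under assumptions: `accel_vector_mul` and `run_partitioned` are hypothetical names, and the accelerator is modeled as an ordinary function rather than real hardware. The CPU keeps the irregular control decision, and the regular data-parallel loop is the part handed off.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical software stand-in for the HW accelerator. In a real system
   this loop would be a hardware datapath, not a C function. */
static void accel_vector_mul(int *x, const int *y, const int *z, size_t n)
{
    assert(x != NULL && y != NULL && z != NULL);
    for (size_t i = 0; i < n; i++)
        x[i] = y[i] * z[i];
}

/* CPU side: evaluates the branch and dispatches the loop.
   Returns 1 if the loop was offloaded, 0 if the branch was not taken. */
int run_partitioned(int foo, int *x, const int *y, const int *z, size_t n)
{
    if (foo < 8) {
        accel_vector_mul(x, y, z, n);
        return 1;
    }
    return 0;
}
```

Calling `run_partitioned(3, x, y, z, n)` fills `x` with the elementwise products, while `run_partitioned(9, ...)` leaves `x` untouched.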

  8. Methodology
     • Separation between function and communication
     • Unified refinable formal specification model
     • Facilitates system specification
     • Implementation independent
     • Eases HW/SW trade-off evaluation and partitioning
     • From a more practical perspective:
         • Measure the application
         • Identify what to put onto the accelerator
         • Build interfaces

  9. System-Level Methodology
     [Flow diagram labels: Informal Specification, Constraints; Component profiling; System model; Performance evaluation; Architecture design; HW/SW implementation; Prototype; Test (Fail / Success); Implementation]

  10. Concurrency
     • Concurrent applications provide the most speedup
     [Figure: "No data dependencies"; CPU: if (a > b) ...; accelerator: x[i] = y[i] * z[i]]

  11. Partitioning
     • Can divide the application into several processes that run concurrently
     • Process partitioning exposes opportunities for parallelism
     [Figure labels: Process 1, Process 2, Process 3; code fragments: if (i>b) …; for (i=0; i<N; i++) …; for (j=0; j<N; j++) ...]

  12. Automating System Partitioning
     • Good partitioning mechanism:
         • Minimizes communication across the bus
         • Allows parallelism: both hardware (FPGA) and processor operating concurrently
         • Near-peak processor utilization at all times (performing useful work)
     Specification:
         process (a, b, c)
           in port a, b;
           out port c;
         { read(a); … write(c); }
     Interface:
         Line () { a = … … detach }
     [Figure labels: Capture, Model, Partition, Synthesize, FPGA, Processor]

  13. Partitioning Algorithms
     • Assume everything is initially in software
     • Select a task for swapping
     • Migrate it to hardware and evaluate cost
         • Timing, hardware resources, program and data storage, synchronization overhead
     • Cost evaluation and move evaluation are similar to what we've seen regarding mincut and simulated annealing
     [Figure labels: Software (list of tasks), Hardware (list of tasks), task]
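A minimal sketch of the "start in software, migrate, evaluate" loop, written in C. Note the assumptions: this is a plain greedy heuristic, not the mincut or simulated-annealing style evaluation the slide alludes to, and the `task_t` fields, the additive cost model, and the area budget are all hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int sw_time;  /* execution time if kept in software (assumed units) */
    int hw_time;  /* execution time in hardware, including sync overhead */
    int hw_area;  /* hardware resources the task would occupy */
    int in_hw;    /* 0 = software, 1 = hardware */
} task_t;

/* Total time under a simple additive (fully sequential) cost model. */
int total_time(const task_t *t, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += t[i].in_hw ? t[i].hw_time : t[i].sw_time;
    return sum;
}

/* Start all-software, then repeatedly migrate the task with the largest
   time saving that still fits in the area budget. Returns the area used. */
int partition_greedy(task_t *t, size_t n, int area_budget)
{
    assert(t != NULL && area_budget >= 0);
    int used = 0;
    for (size_t i = 0; i < n; i++)
        t[i].in_hw = 0;

    for (;;) {
        size_t best = n;       /* n means "no candidate found" */
        int best_gain = 0;
        for (size_t i = 0; i < n; i++) {
            int gain = t[i].sw_time - t[i].hw_time;
            if (!t[i].in_hw && used + t[i].hw_area <= area_budget &&
                gain > best_gain) {
                best = i;
                best_gain = gain;
            }
        }
        if (best == n)
            break;             /* nothing profitable fits any more */
        t[best].in_hw = 1;
        used += t[best].hw_area;
    }
    return used;
}
```

A greedy pass like this can get stuck in local optima, which is one reason iterative-improvement methods that accept occasional worse moves are attractive for this problem.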

  14. Multi-threaded Systems
     • Single thread:
     • Multi-thread:

  15. Performance Analysis
     • Single-threaded: find the longest possible execution path
     • Multi-threaded with no synchronization: find the longest of several execution paths
     • Multi-threaded with synchronization: find the worst-case synchronization conditions

  16. Multi-threaded Performance Analysis
     • Synchronization causes the delay along one path to affect the delay along another
     [Figure: segments ta and tb lead into a synchronization point; segments tc and td follow it]
     Delay = max(ta, tb) + td
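The delay formula can be checked with a few lines of C. `path_delay` implements the slide's expression directly. `completion_time` is an added assumption, not something the slide states: it supposes both threads continue past the synchronization point, with tc and td running in parallel.

```c
#include <assert.h>

static int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* Delay of the path that ends with segment td: the synchronization point
   cannot be passed until the slower of ta and tb has finished. */
int path_delay(int ta, int tb, int td)
{
    assert(ta >= 0 && tb >= 0 && td >= 0);
    return max_int(ta, tb) + td;
}

/* Assumption: both threads resume after the sync point, so the whole
   system finishes when the slower of tc and td is done. */
int completion_time(int ta, int tb, int tc, int td)
{
    assert(tc >= 0);
    return path_delay(ta, tb, 0) + max_int(tc, td);
}
```

For example, with ta = 5, tb = 9, and td = 3, the path ending in td takes 12 time units even though its own pre-synchronization segment might be the shorter one.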

  17. Control
     • Need to signal between CPU and accelerator:
         • Data ready
         • Complete
     • Implementations:
         • Shared memory
         • Handshake
     • If computation time is very predictable, a simpler communication scheme may be possible
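One way the "data ready" / "complete" signaling could look over shared memory is sketched below in C. Everything here is an assumption for illustration: the `mailbox_t` layout, the function names, and the squaring "computation" are hypothetical, and the accelerator is simulated by a function call. The `volatile` qualifiers only model flag registers; a real shared-memory design would also need memory barriers or atomics.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    volatile int data_ready; /* set by the CPU, cleared by the accelerator */
    volatile int complete;   /* set by the accelerator, cleared by the CPU */
    int operand;
    int result;
} mailbox_t;

/* CPU: write the operand, then raise data_ready. */
void cpu_post(mailbox_t *m, int operand)
{
    assert(m != NULL && !m->data_ready);
    m->operand = operand;
    m->complete = 0;
    m->data_ready = 1;
}

/* Accelerator stand-in: if data is ready, consume it, compute, and raise
   complete. Returns 1 if work was done, 0 if there was nothing to do. */
int accel_step(mailbox_t *m)
{
    assert(m != NULL);
    if (!m->data_ready)
        return 0;
    m->result = m->operand * m->operand; /* placeholder computation */
    m->data_ready = 0;
    m->complete = 1;
    return 1;
}

/* CPU: poll for completion. Returns 1 and stores the result when done. */
int cpu_collect(mailbox_t *m, int *out)
{
    assert(m != NULL && out != NULL);
    if (!m->complete)
        return 0;
    *out = m->result;
    m->complete = 0;
    return 1;
}
```

If the accelerator's latency were fixed and known, the CPU could skip the `complete` flag and simply read the result after a fixed delay, which is the simpler scheme the last bullet hints at.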

  18. Communication Levels
     • Easier to program at the application level (send, receive, wait), but difficult to predict
     • More difficult to specify at a low level: difficult to extract from the program, but timing and resources are easier to predict
     [Figure labels, software side: Application Program; Operating System; I/O driver; I/O bus]
     [Figure labels, hardware side: Application hardware (custom); I/O driver; I/O bus]
     [Figure labels, communication levels: Send, Receive, Wait; Register reads/writes; Interrupt service; Bus transactions; Interrupts]

  19. Other Interface Models
     • Synchronization through a FIFO
     • FIFO can be implemented either in hardware or in software
     • Effectively reconfigure hardware (FPGA) to allocate buffer space as needed
     • Interrupts used for software version of FIFO
     [Figure labels: p1, p2, p3; r2, r3; d1, d2, d3; FPGA; Control/Data FIFO]
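A software FIFO of the kind mentioned above can be sketched as a small ring buffer in C. The depth `FIFO_CAPACITY` and all names are hypothetical, and the sketch is single-threaded: the interrupt handling a real software FIFO would need is left out.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_CAPACITY 4 /* assumed depth; a hardware FIFO could be resized */

typedef struct {
    int buf[FIFO_CAPACITY];
    size_t head;  /* index of the next item to read */
    size_t count; /* number of items currently stored */
} fifo_t;

void fifo_init(fifo_t *f)
{
    assert(f != NULL);
    f->head = 0;
    f->count = 0;
}

/* Producer side: returns 1 on success, 0 if the FIFO is full,
   in which case the producer has to wait. */
int fifo_put(fifo_t *f, int value)
{
    assert(f != NULL);
    if (f->count == FIFO_CAPACITY)
        return 0;
    f->buf[(f->head + f->count) % FIFO_CAPACITY] = value;
    f->count++;
    return 1;
}

/* Consumer side: returns 1 and stores the oldest item, or 0 if empty. */
int fifo_get(fifo_t *f, int *out)
{
    assert(f != NULL && out != NULL);
    if (f->count == 0)
        return 0;
    *out = f->buf[f->head];
    f->head = (f->head + 1) % FIFO_CAPACITY;
    f->count--;
    return 1;
}
```

The full and empty return codes are the synchronization: a producer that sees 0 from `fifo_put` stalls, and a consumer that sees 0 from `fifo_get` waits for data.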

  20. Debugging
     • Hard to test a CPU/accelerator system:
         • Hard to control and observe the accelerator without the CPU
         • Software on the CPU may have bugs
     • Build separate test benches for CPU code and accelerator
     • Test the integrated system after the components have been tested

  21. POLIS Codesign Methodology
     [Flow diagram labels: Graphical EFSM; ESTEREL; Compilers; CFSMs; Formal Verification; Partitioning; Simulation; Sw Synthesis; Hw Synthesis; Intfc + RTOS Synthesis; Sw Code + RTOS; Logic Netlist; Rapid prototyping]

  22. Codesign Finite State Machines
     • POLIS uses an FSM model for control-dominated HW/SW specification that is:
         • Uncommitted
         • Synthesizable
         • Verifiable
     • Translators into a single FSM-based language from:
         • State diagrams
         • Esterel, ECL, ReactiveJava
         • HDLs

  23. CFSM behavior
     • Four-phase cycle:
         • Idle
         • Detect input events
         • Execute one transition
         • Emit output events
     • Software response could take a long time:
         • Unbounded delay assumption
     • Need efficient HW/SW communication primitive:
         • Event-based point-to-point communication
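The four-phase cycle can be illustrated with a toy C model. This is a sketch under assumptions, not POLIS code: `cfsm_t`, the one-place event buffers, and the "add the event value to the state" transition are hypothetical choices made only to show the phases in order.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int present; /* 1 if an event is waiting in the buffer */
    int value;
} event_buf_t;

typedef struct {
    int state;
    event_buf_t in;  /* assumed one-place input buffer */
    event_buf_t out; /* assumed one-place output buffer */
} cfsm_t;

/* Sender side of the point-to-point channel. Assumption: a new event
   overwrites an unconsumed one, since the receiver's delay is unbounded. */
void cfsm_post(cfsm_t *c, int value)
{
    assert(c != NULL);
    c->in.present = 1;
    c->in.value = value;
}

/* One pass through the cycle. Returns 0 if the machine stayed idle,
   1 if it executed a transition. */
int cfsm_step(cfsm_t *c)
{
    assert(c != NULL);
    if (!c->in.present)        /* phase 1: idle */
        return 0;
    int ev = c->in.value;      /* phase 2: detect input events */
    c->in.present = 0;
    c->state += ev;            /* phase 3: execute one transition (toy rule) */
    c->out.present = 1;        /* phase 4: emit output events */
    c->out.value = c->state;
    return 1;
}
```

Because `cfsm_step` may be called arbitrarily late after `cfsm_post`, the sender never blocks, which mirrors the unbounded delay assumption for software responses.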

  24. Network of CFSMs
     • Globally Asynchronous, Locally Synchronous (GALS) model
     [Figure: network of CFSM1, CFSM2, and CFSM3 exchanging events A, B, C, F, G; transition labels include B=>C, C=>F, C=>G, C=>A, C=>B, (A==0)=>B, F^(G==1)]

  25. Summary
     • Hardware/software codesign is complicated and limited by performance estimates
     • Algorithms are generally not as good as human partitioning
     • Other interesting issues include dual processors and special memory interfaces
     • Will likely evolve at a faster rate as compilers evolve
