
Hardware-Software Cosynthesis for Microcontrollers






Presentation Transcript


  1. Hardware-Software Cosynthesis for Microcontrollers

  2. Cosyma – a software-oriented cosynthesis approach. Software-oriented: initially everything is implemented in software; external hardware is generated only when timing constraints are violated (except for basic I/O).

  3. Cosyma – a software-oriented cosynthesis approach. Software-oriented: initially everything is implemented in software; external hardware is generated only when timing constraints are violated (except for basic I/O). COSYMA = COSYnthesis for eMbedded Architecture

  4. Partitioning of software • Partitioning is based on the software, so the software description must have constructs that support partitioning • C is often used for embedded systems, but it has no support for partitioning • A superset of C was therefore defined, called CX • These extensions of C include: timing (minimum and maximum delays and durations between C labels of a task), a task concept, and task intercommunication

  5. Partitioning problems • The goal of partitioning is to identify where constraints are violated • Partitioning is done at different levels of granularity: • Coarse: task- and function-level partitioning • Fine: basic-block and statement-level partitioning • Using a finer granularity is more difficult because of: • communication time overhead • communication area overhead • interlocks (waiting time) • compiler effects • The paper focuses on fine-grain partitioning, with the basic block as the smallest partitioning unit (see the sketch below)
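  To make the two granularities concrete, here is a small invented C fragment (not from the paper): coarse-grain partitioning would treat the whole function filter() as one unit, while fine-grain partitioning considers its individual basic blocks, marked roughly in the comments.

      /* Invented illustration of partitioning granularity.
       * Coarse grain: the whole function filter() is one candidate.
       * Fine grain: each basic block BB1..BB4 is a separate candidate. */
      int filter(const int *buf, int n, int gain)
      {
          int acc = 0;                     /* BB1: entry block (initialisation)    */

          for (int i = 0; i < n; i++)      /* BB2: loop test                       */
              acc += buf[i] * gain;        /* BB3: loop body, the hot block and a  */
                                           /*      typical fine-grain HW candidate */

          return acc;                      /* BB4: exit block                      */
      }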

  6. The Cosyma system • For partitioning purposes, the system description (in CX) is translated into an internal graph representation • Requirements on this graph representation: • A complete representation of all input constructs shall be possible • User influence on the syntactic structure shall be retained • The representation shall support partitioning and the generation of a hardware description for the parts moved to hardware • Estimation techniques shall be possible

  7. The Extended Syntax Graph • A typical control/dataflow graph cannot handle the first two requirements • ES graph: a syntax graph extended with: • a symbol table • local data and control dependencies

  8. The Extended Syntax Graph • Each identifier has a pointer to its definition • Each definition has pointers to all of its instances • The cost of communication can be determined using the symbol table • The symbol table can be used to get an upper bound on the communication cost (see the sketch below)
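  As a rough illustration of how such a symbol table can bound the communication cost, the following C sketch links each definition to all of its instances and sums the sizes of the definitions referenced in a block that is moved to hardware. The structure names and fields are assumptions for illustration, not the actual Cosyma data structures.

      #include <stddef.h>

      /* Illustrative sketch only; names and layout are assumptions. */
      struct instance;                     /* a use of an identifier in the ESG    */

      struct definition {                  /* one symbol-table entry               */
          const char      *name;           /* identifier                           */
          int              size_bits;      /* storage size, needed for comm. cost  */
          struct instance *instances;      /* list of all uses of this definition  */
      };

      struct instance {
          struct definition *def;          /* pointer back to the definition       */
          struct instance   *next_of_def;  /* next use of the same definition      */
      };

      /* Upper bound on communication for a block moved to hardware: every
       * distinct definition referenced in the block might have to be transferred.
       * `refs` holds those distinct definitions. */
      long comm_upper_bound_bits(struct definition **refs, size_t n_refs)
      {
          long bits = 0;
          for (size_t i = 0; i < n_refs; i++)
              bits += refs[i]->size_bits;
          return bits;                     /* multiply by cycles per bit externally */
      }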

  9. The Extended Syntax Graph [Figure: Extended Syntax Graph for the loop example below, with def/inst links into the symbol table.]
      from lab1 to lab2: max 5 us
      int i;
      f = 1;
      lab1: for (i = 1; i < n; i++)
                f *= i;
      lab2: ;

  10. Data Dependency
      int x; int y; int z;
      x = a + b;
      y = b * 3;
      z = x + y;

  11. [Figure: Extended Syntax Graph of the three assignments, showing the def/inst links for x, y, and z in the symbol table.]
      int x; int y; int z;
      x = a + b;
      y = b * 3;
      z = x + y;

  12. [Figure: the same Extended Syntax Graph as on the previous slide.]
      int x; int y; int z;
      x = a + b;
      y = b * 3;
      z = x + y;
      The "symbol table" contains no information about the data dependencies between operations; that information is required to perform scheduling on the graph in order to estimate HW execution times.

  13. [Figure: the Extended Syntax Graph overlaid with a Basic Scheduling Block covering the three assignments.]
      The ESG is overlaid with a second graph consisting of cross-linked blocks, the Basic Scheduling Blocks (BSBs).
      x = a + b;
      y = b * 3;
      z = x + y;
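  For the three assignments in this BSB the dependencies make the potential concurrency easy to see: x = a + b and y = b * 3 are independent, while z = x + y needs both results. The small C program below illustrates one simple way to derive control steps from such a dependency graph (ASAP levelling; the slides do not say which scheduling algorithm Cosyma actually uses, and the graph encoding is an assumption for illustration).

      #include <stdio.h>

      /* Illustrative ASAP scheduling over the dependency graph of
       *   x = a + b;  y = b * 3;  z = x + y;
       * Node numbering and representation are assumptions. */
      enum { OP_X, OP_Y, OP_Z, N_OPS };

      /* preds[i][j] != 0 means operation i depends on operation j */
      static const int preds[N_OPS][N_OPS] = {
          [OP_X] = { 0 },                      /* x = a + b : no predecessors */
          [OP_Y] = { 0 },                      /* y = b * 3 : no predecessors */
          [OP_Z] = { [OP_X] = 1, [OP_Y] = 1 }, /* z = x + y : needs x and y   */
      };

      int main(void)
      {
          int level[N_OPS];

          /* ASAP: an operation runs one step after its latest predecessor. */
          for (int i = 0; i < N_OPS; i++) {    /* nodes are in topological order */
              level[i] = 0;
              for (int j = 0; j < N_OPS; j++)
                  if (preds[i][j] && level[j] + 1 > level[i])
                      level[i] = level[j] + 1;
          }

          /* Prints: x and y in control step 0, z in control step 1. */
          const char *name[N_OPS] = { "x=a+b", "y=b*3", "z=x+y" };
          for (int i = 0; i < N_OPS; i++)
              printf("%s : control step %d\n", name[i], level[i]);
          return 0;
      }

  With one operation per functional unit and per control step, the block needs two control steps in hardware instead of three sequential instructions in software, which is the kind of concurrency estimate that the scheduling step mentioned on slide 18 is meant to provide.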

  14. HW/SW partitioning on the ES graph • Mark the node that is to be moved to HW • Generate C code by reconstructing the CX code preserved in the ES graph • Insert the HW/SW communication protocol; analyze the dataflow in the graph • Generate object code for the SW part and check the constraints by simulation • Generate the HW using the Olympus synthesis system (a sketch of the resulting SW stub follows below)
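  The effect of the code-generation and protocol-insertion steps can be pictured with a hedged sketch: the block that leaves software is replaced by a stub that ships the live-in variables to the co-processor, starts it, and reads the live-out variables back. All function names below (cop_write, cop_start, cop_read) are invented for illustration; the slides do not specify the actual protocol Cosyma inserts.

      /* Hypothetical SW stub replacing a block moved to hardware.
       * cop_write/cop_start/cop_read are invented names, not a real API. */
      extern void cop_write(int reg, int value);  /* transfer a live-in variable */
      extern void cop_start(int block_id);        /* trigger the co-processor    */
      extern int  cop_read(int reg);              /* fetch a live-out variable   */

      /* Example: the whole three-assignment block from the running example
       * is assumed to have been moved to hardware. */
      void run_block_on_coprocessor(int a, int b, int *x, int *y, int *z)
      {
          /* dataflow analysis determined that a and b are live-in ... */
          cop_write(0, a);
          cop_write(1, b);

          cop_start(/* block_id = */ 1);          /* x = a + b; y = b * 3; z = x + y */

          /* ... and that x, y, z are live-out of the block */
          *x = cop_read(0);
          *y = cop_read(1);
          *z = cop_read(2);
      }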

  15. Iterative approach for partitioning Since it would take too much effort to evaluate the whole system (synthesis, compilation, and runtime analysis) for every candidate partitioning, an "inner loop" based on cost estimation is used instead. Initially, the chosen solution is infeasible with regard to the time constraints, so a cost function with a high penalty for solutions whose runtime exceeds the time constraint is used. [Figure: the inner loop: ES Graph -> Partitioning -> Cost Estimation.]
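  The inner loop can be pictured with a minimal greedy sketch: move the block whose estimated gain is largest, re-estimate, and repeat until the constraint is met. This is a deliberate simplification for illustration (the slides do not spell out the actual search strategy), and every name below is an assumption.

      /* Greedy sketch of the cost-driven inner loop; illustration only,
       * not the actual Cosyma partitioning algorithm. */
      #include <stdbool.h>

      #define N_BLOCKS 64      /* assumed number of candidate basic blocks */

      /* estimated runtime of the whole HW/SW system for a given partitioning */
      extern double estimate_runtime(const bool in_hw[N_BLOCKS]);

      /* estimated runtime reduction if `block` moves to HW; a stand-in for
       * the cost function defined on the later slides */
      extern double estimated_gain(int block, const bool in_hw[N_BLOCKS]);

      void partition(bool in_hw[N_BLOCKS], double time_constraint)
      {
          for (int b = 0; b < N_BLOCKS; b++)   /* start from the all-software */
              in_hw[b] = false;                /* solution                    */

          while (estimate_runtime(in_hw) > time_constraint) {
              int best = -1;
              double best_gain = 0.0;

              /* pick the software block whose move to HW looks most profitable */
              for (int b = 0; b < N_BLOCKS; b++) {
                  if (!in_hw[b] && estimated_gain(b, in_hw) > best_gain) {
                      best_gain = estimated_gain(b, in_hw);
                      best = b;
                  }
              }
              if (best < 0)
                  break;                       /* no profitable move is left   */
              in_hw[best] = true;              /* re-estimate on the next pass */
          }
      }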

  16. Hardware Extraction Process • A cost function that favors implementations in hardware is used. This cost function takes into consideration: • knowledge of synthesis • knowledge of compilers • pre-defined HW libraries • Several specialized cost functions can work in parallel to extract different types of hardware.

  17. Hardware Extraction Process • A cost function for extracting computationally intensive parts • Simulation and profiling are used to identify these parts and to determine: • the number of times a node has been executed • the potential speedup through HW synthesis • the communication time penalty

  18. Hardware Extraction Process • The speedup is estimated using: • an operator table holding the execution times of the functional units used in synthesis • scheduling of the operations in the ES graph to estimate the potential concurrency • The communication time penalty is estimated from: • dataflow analysis (the number of variables to be transferred) • the number of clock cycles per variable transfer (see the sketch below)
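  A compact sketch of those two estimates, with every constant and table entry invented for illustration: the hardware time comes from the operator table combined with the scheduled control steps, and the communication penalty from the number of variables found by dataflow analysis times the cycles per transfer.

      /* Illustrative estimate of speedup and communication penalty.
       * All numbers and table entries are invented for this sketch. */
      #include <stdio.h>

      #define CYCLE_NS            50.0      /* assumed co-processor clock period   */
      #define CYCLES_PER_TRANSFER 2         /* assumed cycles to move one variable */

      /* operator table: execution time of each functional unit (assumed values) */
      enum op_kind { OP_ADD, OP_MUL, N_OP_KINDS };
      static const double op_time_ns[N_OP_KINDS] = { 25.0, 100.0 };

      int main(void)
      {
          /* block:  x = a + b;  y = b * 3;  z = x + y;
           * The ASAP sketch earlier gives two control steps:
           * step 0: {add, mul}, step 1: {add}; each step is as long as its
           * slowest operation. */
          double t_hw = op_time_ns[OP_MUL]      /* step 0 limited by the multiplier */
                      + op_time_ns[OP_ADD];     /* step 1: the final addition       */

          int n_vars = 5;                       /* a, b live-in; x, y, z live-out   */
          double t_com = n_vars * CYCLES_PER_TRANSFER * CYCLE_NS;

          double t_sw = 600.0;                  /* assumed profiled software time   */

          printf("t_hw    = %.0f ns\n", t_hw);  /* 125 ns                           */
          printf("t_com   = %.0f ns\n", t_com); /* 500 ns                           */
          printf("speedup = %.2f\n", t_sw / (t_hw + t_com));   /* ~0.96            */
          return 0;
      }

  With these invented numbers the transfer of five variables eats almost all of the gain of the faster datapath, which is exactly the effect the later slides on communication overhead point to.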

  19.–26. The Cost Function (built up across these slides) Uses data from the pre-processing stages (for example, the number of clock cycles needed for variable transfer to the co-processor). When a basic block B is moved to hardware, the cost increment dc is defined as:

      dc(B) = a(TC, TS) * [t_neff(B) + t_com(B) - t_HW/SW(B) - t_SW(B)] * It(B)

      where
      a(TC, TS) = sign(TC - TS) * exp[(TC - TS)/T], the exponential weighting of runtime above the given constraint, with
          TC = time constraint
          TS = resulting time needed by the hardware-software system
          T  = constant factor
      t_neff(B)  = effective hardware timing
      t_com(B)   = communication overhead
      t_HW/SW(B) = time overlap between hardware and software in the case of parallel execution (equals zero in the paper)
      t_SW(B)    = runtime when B is implemented in software
      It(B)      = number of times B was executed during profiling
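  A direct transcription of this cost increment into C, useful as a sanity check of the formula; the units and the constant T are assumptions, and the function names are illustrative.

      #include <math.h>

      /* Cost increment for moving basic block B to hardware, transcribed from
       * the slide.  Units (e.g. clock cycles) and the constant T are assumed. */
      static double a(double t_constraint, double t_system, double T)
      {
          double diff = t_constraint - t_system;
          double sgn  = (diff > 0) - (diff < 0);     /* sign(TC - TS) */
          return sgn * exp(diff / T);
      }

      double dc(double t_neff,   /* effective hardware timing of B              */
                double t_com,    /* communication overhead for B                */
                double t_hw_sw,  /* HW/SW overlap, zero in the paper            */
                double t_sw,     /* software runtime of B                       */
                double it,       /* number of executions of B during profiling  */
                double t_constraint, double t_system, double T)
      {
          return a(t_constraint, t_system, T)
               * (t_neff + t_com - t_hw_sw - t_sw)
               * it;
      }

  Note that a(TC, TS) changes sign once the system time TS exceeds the constraint TC, and its magnitude grows exponentially with the size of the violation, which is exactly the "exponential weighting of runtime above the given constraints" named on the slide.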

  27. [Figure: timing diagram over clock steps 0 to 16 comparing a software-only schedule of the example x = a + b; y = b * 3; z = x + y with the schedule in which block B (marked at y = b * 3) runs on the co-processor (CoP); the CoP schedule includes the transfers sto(b), sto(3), and sto(y). A second plot shows the cost function as a function of t_neff. Example parameters: Tcomm = 3, TC = 5, THW-SW = 0, It(B) = 1, TSW = 10.]
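  Plugging the example parameters into the cost increment shows why t_neff is the natural free variable for the plotted curve (a worked substitution, assuming all quantities are in the same time unit):

      t_neff(B) + t_com(B) - t_HW/SW(B) - t_SW(B) = t_neff + 3 - 0 - 10 = t_neff - 7

  With It(B) = 1, the bracket term is the only part that depends on the block's hardware timing, so dc is linear in t_neff, and moving B to the co-processor saves time per execution exactly when t_neff < 7.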

  28. Selection of BSBs • A large number of HW partitions is possible • Costs are estimated for adjacent BSBs in the control flow to reduce the preprocessing effort for the communication costs • Several BSBs can be moved to HW; to avoid redundant variable exchange, the communication between BSBs must be considered (see the sketch below) • The finer the granularity, the larger the impact of communication
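  A small invented illustration of why the communication between BSBs matters: if two adjacent blocks are both moved to hardware, a value produced by the first and consumed by the second never has to cross the HW/SW boundary.

      /* Invented example: two adjacent basic scheduling blocks.
       *
       *   BSB1:  t = a * a;          produces t
       *   BSB2:  r = t + b;          consumes t
       *
       * If only BSB1 moves to hardware, t must be transferred back to software,
       * and again to hardware if BSB2 is moved later.  If BSB1 and BSB2 are
       * moved together, t stays in the co-processor and only a, b (in) and
       * r (out) cross the boundary: three transfers instead of five. */
      int adjacent_bsbs(int a, int b)
      {
          int t = a * a;    /* BSB1 */
          int r = t + b;    /* BSB2 */
          return r;
      }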

  29. Results from experiments • In the examples, the speedup was 1.3 – 3 • The communication overhead was a more important factor than the number of iterations • It is important to consider compiler optimization

  30. About this paper • When it was written, only one other partitioning system supported automatic partitioning • CX, an extension to C for system design • An extended syntax graph for system analysis and partitioning • A hardware extraction process using a cost function • Examples showing the importance of communication overhead
