Processor Architectures and Program Mapping

Presentation Transcript


  1. Processor Architectures and Program Mapping: Application domain specific processors (ADSP or ASIP)
     5kk10, TU/e
     Henk Corporaal, Jef van Meerbergen, Bart Mesman

  2. Application domain specific processors (ADSP or ASIP)
     [Figure: implementation styles placed between flexibility and efficiency. Labels: DSP, programmable CPU, programmable DSP, application domain specific processor, application specific processor]
     Processor Architectures and Program Mapping, H. Corporaal, J. van Meerbergen, and B. Mesman

  3. Application domain specific processors (ADSP or ASIP)
     • takes a well defined application domain as a starting point
     • exploits characteristics of the domain (computation kernels)
     • still programmable within the domain
     • e.g. MPEG2 coding uses 8*8 DCT transform, DECT, GSM etc.
     [Figure: application domain versus implementation for a general purpose (GP) processor and for an ADSP]
     • performance: clock speed + ILP (GP) vs. ILP + tuning to domain (ADSP)
     • flexible development (new apps.) vs. cost effective (high volume)
     • problems: specification; manual design; design time and effort; large effort => synthesized cores

  4. www.adelantetech.com

  5. Outline
     • design process
     • retargetable code generation (problem statement)
     • ADSP/VLIW architectures (Mistral 2 /A|RT designer)
     • instructive demo (Adelante)
     • application examples
     • low power aspects (Mistral 2 /A|RT designer)
     • discussion
     • conclusion

  6. Design process
     • inputs: processor model (e.g. VLIW with shared RFs), application(s), instance parameters
     • 3 phases
       1. exploration
       2. hw design (layout) + processing
       3. design appl. sw
     • SW (code generation) gives estimations: cycles/alg, occupation
     • HW design gives estimations: nsec/cycle, area, power/instr
     • fast, accurate and early feedback
     [Flowchart: decisions "OK?" and "more appl.?" with yes/no branches; the exit is "go to phase 2"]

  7. Problem statement
     A compiler is retargetable if it can generate code for a ‘new’ processor architecture specified in a machine description file.
     A guarded register transfer pattern (GRTP) is a register transfer pattern (RTP) together with the control bits of the instruction word that control the RTP, e.g.
         a := b + c | instr = xxxx0101
     GRTPs contain all inter-RT-conflict information.
     Instruction set extraction (ISE) is the process of generating all possible GRTPs for a specific processor.
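The GRTP definition above can be made concrete with a small sketch. This is only an illustration under stated assumptions: the class `GRTP`, the helpers `conflict` and `combine`, and the second pattern `d := e | 01xxxxxx` are hypothetical and not taken from the slides; only `a := b + c | instr = xxxx0101` comes from the source. The idea shown is that two register transfers can share one instruction word only if their control bits agree wherever both are specified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GRTP:
    """A register transfer pattern plus the instruction bits that guard it."""
    rtp: str    # e.g. "a := b + c"
    guard: str  # one character per instruction bit: '0', '1' or 'x' (don't care)


def conflict(a: GRTP, b: GRTP) -> bool:
    """True if some instruction bit is required to be 0 by one GRTP and 1 by the other."""
    if len(a.guard) != len(b.guard):
        raise ValueError("guards must cover the same instruction word")
    return any(p != q and "x" not in (p, q) for p, q in zip(a.guard, b.guard))


def combine(a: GRTP, b: GRTP) -> str:
    """Instruction bit pattern that activates both transfers in the same cycle."""
    if conflict(a, b):
        raise ValueError("the two register transfers cannot share an instruction")
    return "".join(q if p == "x" else p for p, q in zip(a.guard, b.guard))


add = GRTP("a := b + c", "xxxx0101")    # pattern from the slide
move = GRTP("d := e", "01xxxxxx")       # hypothetical second pattern
sub = GRTP("a := b - c", "xxxx0110")    # hypothetical, clashes with add

print(combine(add, move))   # both transfers fit in one instruction word
print(conflict(add, sub))   # the low bits disagree
```

A scheduler built on this representation would only need the guards to decide which transfers may be packed together, which is the sense in which GRTPs carry the conflict information.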

  8. Problem statement
     [Flow diagram: algorithm spec, FE, CDFG; processor spec (instance), ISE, GRTP; code generation; machine code. Annotation: in ch. 4 this is part of the code generator]

  9. Example: Simple processor [Leupers]
     [Figure: datapath with Inp, RAM, REG, outp, PC (+1) and instruction memory IM; the instruction word I.(20:0) supplies the control fields I.(20:13), I.(12:5), I.(4), I.(3:2) and I.(1:0)]

  10. Example: Simple processor [Leupers]

  11. ASIP/VLIW architectures
      A|RT designer template as an example (= set of rules, a model)
      • Differences with VLIW processors of ch. 4
        1. Parallel FUs
           • ASUs = complex application specific FUs (beyond subword parallelism)
           • e.g. biquad, median, DCT etc.
           • larger grain size, more heterogeneous, more pipelines
        2. Register files
           • many register files (>5 vs. 1 or 2)
           • limited number of ports (3 vs. 15)
           • limited size (<16 vs. 128)
        3. Issue slots
           • all in parallel vs. 5

  12. [Figure: VLIW template with register files RF1 to RF8, functional units FU1 to FU4, flags, instruction registers IR1 to IR4, instruction memory and control]

  13. ASIP/VLIW architectures
      • Additional characteristics of the A|RT designer template
        • interconnect network: busses + input multiplexers
          • mux control is part of the instruction
          • control can change every clock cycle
          • network can be incomplete
          • busses can be merged
        • memories are modeled as FUs
          • separate data in and data out
          • 2 inputs (data in and address) and 1 output
        • each FU can generate one or more flags
        • instruction format (per issue slot)
      [Figure: instruction format fields: read address RF 1, control FU, mux 1, write address RF 1, read address RF 2, mux 2, write address RF 2, output drivers]

  14. ASIP/VLIW architectures: example
      [Figure: datapath with RF1, RF2, RF3, RF4, an ALU and a MAC on bus1 and bus2. Instruction word with bit positions 19, 10, 9 and 0 marked and the fields: read RF1, write RF1, read RF2, write RF2, ALU instr., mux 2, read RF3, write RF3, read RF4, write RF4, MAC instr., mux 3]

  15. ASIP/VLIW architectures: example

  16. ASIP/VLIW architectures: design flow
      [Flowchart: algorithm spec, datapath synthesis, RTs, controller synthesis, estimations (area, power, timing), decision "OK?" with yes/no branches, and a "Change pragmas" step]
      • example pragmas: assign ( a+b, ALU, fu_alu1), assign ( a+_, ALU, fu_alu2), assign ( _+_, ALU, fu_alu3)
      • example RT: RF1 : x = RF2 : y, RF3 : z | ALU = ADD Inmux = bus2
      • VLIW makes relatively simple code selection possible

  17. ASIP/VLIW architectures: feedback

  18. Outline
      • design process
      • retargetable code generation (problem statement)
      • ASIP/VLIW architectures (Mistral 2 /A|RT designer)
      • instructive demo (Adelante)
      • application examples
      • low power aspects (Mistral 2 /A|RT designer)
      • discussion
      • conclusion

  19. Application examples: adaptive filter
      Minimizes the difference between x and e (reference signal)
      [Figure: input x, filter with coefficients c0, c1, ..., c63, output y, signal r, subtraction giving e, control unit]
      • Many applications are possible
        • echo cancelling for TV
          • e = flyback signal (known without echoes)
        • automatic equalization of cables in data transmission
        • acoustic echo cancelling

  20. Application examples: adaptive filter
      [Figure: adaptive filter between a speaker and a microphone. Labels: speech, x, speaker, y, filter c0, c1, ..., c63, microphone, r, control unit, subtraction, speech + noise, e, noise]

  21. Application examples: adaptive filter
      [Figure: adaptive filter in a hearing aid. Labels: noise (e.g. radio), x, y, filter c0, c1, ..., c63, r, control unit, subtraction, speech + noise, e, speech]

  22. Application examples: adaptive filter
      [Figure: signal-flow graph of the 64-tap filter. Labels: delay line x[n], x[n-1], ..., x[n-i], ..., x[n-63] with Z-1 elements; coefficients c0, c1, ..., ci, ..., c63; nodes A0, A1, ..., Ai, ..., An; sums S0[n], S1[n], ..., Si[n], ..., S63[n]; ê[n]; r[n]; e[n]; mu; t[n]]

  23. Application examples: adaptive filter
      [Figure: detail of node Ai. Labels: x[n-i], Ci[n], Ci[n-1], Z-1, multiplier, adder, t[n]]

  24. Application examples: adaptive filter
      [Figure: data-flow graph of the loop body. Labels: sum[i], t, r, x@i, c[i]@1, w, two multiplications, two additions, sum[i+1]]
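The filter slides above read like a standard least-mean-squares (LMS) adaptive FIR filter with 64 taps. The sketch below is written under that assumption; the slides themselves give only figure labels, so the update rule, the function name `lms_adapt`, and the variable `e_hat` (for ê[n]) are illustrative choices rather than the authors' specification. `mu` and `t` follow the labels on the slides.

```python
import random


def lms_adapt(x, r, n_taps=64, mu=0.01):
    """Adapt FIR coefficients so that the filtered x tracks r (textbook LMS).

    x: input samples, r: reference samples, mu: step size.
    Returns the final coefficients and the error signal per sample.
    """
    c = [0.0] * n_taps        # coefficients c0 .. c(n_taps-1)
    delay = [0.0] * n_taps    # delay line: x[n], x[n-1], ..., x[n-n_taps+1]
    errors = []
    for xn, rn in zip(x, r):
        delay = [xn] + delay[:-1]                          # shift in the new sample
        e_hat = sum(ci * xi for ci, xi in zip(c, delay))   # filter output
        err = rn - e_hat                                   # difference to reference
        t = mu * err                                       # scaled error, shared by all taps
        c = [ci + t * xi for ci, xi in zip(c, delay)]      # per-tap coefficient update
        errors.append(err)
    return c, errors


# Usage: identify a short, made-up echo path from noise-free data.
rng = random.Random(0)
echo_path = [0.5, -0.3, 0.2]
x = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
r = [sum(h * x[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
     for n in range(len(x))]
coeffs, errors = lms_adapt(x, r, n_taps=4, mu=0.05)
print([round(ci, 3) for ci in coeffs])
```

Each sample costs one multiply-accumulate per tap for the output and one for the update, which is the kind of inner loop the implementations on the following slides map onto RAM, ALU, MULT and ACU units.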

  25. Application examples: adaptive filter, implementation 1
      [Figure: datapath with RAM, ALU, MULT, ACU and ROM on bus1 and bus2]
      • 266 clock cycles
      • 1.1 mm2

  26. Application examples: adaptive filter, implementation 2
      [Figure: datapath with RAM, ALU, ACU and ROM on bus1 and bus2]
      • 2250 clock cycles
      • 0.7 mm2

  27. Application examples: adaptive filter, implementation 3
      [Figure: datapath with RAM1, ACU1, ALU, MULT, RAM2, ROM and ACU2]
      • 202 clock cycles
      • 1.4 mm2

  28. [Plot: clock cycles (axis marks at 1000 and 2000) against area in mm2 (axis marks at 1 and 2)]
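The cycle counts and areas quoted for the three adaptive-filter implementations (266 cycles at 1.1 mm2, 2250 cycles at 0.7 mm2, 202 cycles at 1.4 mm2) can be compared with a few lines of code. This is a hedged sketch: the function names and the use of an area-time product as a figure of merit are assumptions made for illustration, not something the slides prescribe.

```python
# (clock cycles, area in mm2) as quoted on the implementation slides
implementations = {
    "implementation 1": (266, 1.1),
    "implementation 2": (2250, 0.7),
    "implementation 3": (202, 1.4),
}


def pareto_optimal(points):
    """Names of design points that no other point beats in both cycles and area."""
    optimal = []
    for name, (cycles, area) in points.items():
        dominated = any(
            c <= cycles and a <= area and (c < cycles or a < area)
            for other, (c, a) in points.items()
            if other != name
        )
        if not dominated:
            optimal.append(name)
    return optimal


def area_time_product(points):
    """Cycles multiplied by area for each design point (lower is better)."""
    return {name: cycles * area for name, (cycles, area) in points.items()}


print(pareto_optimal(implementations))
for name, at in area_time_product(implementations).items():
    print(f"{name}: {at:.1f} cycle*mm2")
```

With these three points none dominates another, so each is a valid trade-off; which one to pick depends on the cycle budget of the application.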

  29. Outline
      • design process
      • retargetable code generation (problem statement)
      • ADSP/VLIW architectures (Mistral 2 /A|RT designer)
      • instructive demo (Adelante)
      • application examples
      • low power aspects (Mistral 2 /A|RT designer)
      • discussion
      • conclusion

  30. Low power aspects
      • Estimation: area + speed, power
      [Figure: blocks labelled Implementation Independent Design Database, Mistral2, Estimation Database and Architecture]

  31. GSM viterbi decoder: default solution
      EXU          ACTIV   AREA    POWER
      alu_1        96%     3469    46196
      romctrl_1    48%     39      259
      acu_1        26%     327     1209
      ipb_1        5%      131     105
      opb_1        23%     1804    5801
      ctrl                 9821    135035
      total                15591   188605
      • cycle count: 13750
      • controller responsible for 70% of power consumption
      • maximum resource-sharing
      • heavy decision-making: “main” loop with 16 metrics-computations per iteration
      • EXU-numbers include registers for local storage

  32. GSM viterbi decoder: no loop-folding
      EXU          ACTIV   AREA    POWER
      alu_1        92%     3411    45073
      romctrl_1    45%     39      255
      acu_1        25%     294     1087
      ipb_1        5%      107     86
      opb_1        22%     1661    5340
      ctrl                 4919    70087
      total                10431   121928
      • cycle count: 14247
      • area down by 33%
      • power down by 35%
      • next step: reduce # of program-steps with second ALU

  33. GSM viterbi decoder: 2 ALUs
      EXU          ACTIV   AREA    POWER
      alu_1        69%     1797    12248
      alu_2        65%     1393    8916
      romctrl_1    67%     39      255
      acu_1        37%     294     1087
      ipb_1        8%      149     119
      opb_1        33%     2136    6871
      ctrl                 8957    87235
      total                14766   116731
      • cycle count: 9739
      • cycle count down 30%
      • area up 42%
      • power down by 5%
      • next step: introduce ASU to reduce ALU-load

  34. GSM viterbi decoder: 1 x ACS-ASU
          func ACS ( M1, M2, d ) MS, MS8 =
          begin
            MS  = if ( M1+d > M2-d ) -> ( M1+d ) || ( M2-d ) fi;
            MS8 = if ( M1-d > M2+d ) -> ( M1-d ) || ( M2+d ) fi;
          end;
      EXU          ACTIV   AREA    POWER
      alu_1        20%     261     105
      acs_asu_1    83%     2382    3816
      or_asu_1     10%     611     122
      romctrl_1    16%     65      21
      acu_1        36%     294     205
      ipb_1        20%     107     43
      opb_1        11%     163     35
      ctrl                 1864    3597
      total                5747    7944
      • cycle count: 1930
      • cycle count down 5X
      • power down 20X!
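The ACS (add-compare-select) function on this slide selects, for each of its two outputs, the larger of two candidate path metrics. A minimal Python rendering follows, assuming integer metrics and the argument order used on the slide; the lowercase names `acs`, `ms` and `ms8` are just Python spellings of the slide's `ACS`, `MS` and `MS8`.

```python
def acs(m1, m2, d):
    """Add-compare-select: update two path metrics from two predecessors.

    m1, m2: path metrics of the two predecessor states
    d:      branch metric, added on one branch and subtracted on the other
    """
    ms = max(m1 + d, m2 - d)    # MS  = if (M1+d > M2-d) -> (M1+d) || (M2-d) fi
    ms8 = max(m1 - d, m2 + d)   # MS8 = if (M1-d > M2+d) -> (M1-d) || (M2+d) fi
    return ms, ms8


print(acs(10, 7, 2))
```

On a general ALU each call costs four additions or subtractions plus two compare-and-select steps; folding all of that into one ASU operation is what removes most of the ALU load and controller activity reported in the table.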

  35. GSM viterbi decoder: 4 x ACS-ASU
      EXU           ACTIV   AREA    POWER
      alu_1         94%     243     97
      acs_asu_1     95%     1041    420
      acs_asu_2     95%     1041    420
      acs_asu_3     95%     1041    420
      acs_asu_4     95%     1041    420
      split_asu_1   47%     90      18
      or_asu_1      47%     592     118
      romctrl_1     28%     48      6
      acu_1         98%     212     85
      ipb_1         23%     60      6
      opb_1         50%     369     80
      ctrl                  1306    555
      total                 7084    2645
      • cycle count: 425
      • cycle count down another 5X
      • area up 23%
      • power down another 3X!

  36. GSM viterbi example: summary
      [Figure: Mistral2 with the Implementation Independent Design Database; annotation: 72x !]
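The step-by-step gains quoted on the Viterbi slides can be recomputed from the table totals. The sketch below assumes that the unlabeled number on each slide (13750, 14247, 9739, 1930, 425) is the cycle count, which matches the "cycle count down" bullets; the function names are hypothetical. Area and power are the "total" rows, in whatever units the tool reports.

```python
# (design, cycles, total area, total power) taken from the five Viterbi slides
designs = [
    ("default", 13750, 15591, 188605),
    ("no loop-folding", 14247, 10431, 121928),
    ("2 ALUs", 9739, 14766, 116731),
    ("1 x ACS-ASU", 1930, 5747, 7944),
    ("4 x ACS-ASU", 425, 7084, 2645),
]


def step_changes(rows):
    """For each design after the first: (name, cycle speed-up, area change in %, power reduction factor)."""
    changes = []
    for (_, c0, a0, p0), (name, c1, a1, p1) in zip(rows, rows[1:]):
        changes.append((name, c0 / c1, 100.0 * (a1 - a0) / a0, p0 / p1))
    return changes


def overall(rows):
    """First design versus last: (cycle speed-up, area reduction factor, power reduction factor)."""
    _, c0, a0, p0 = rows[0]
    _, c1, a1, p1 = rows[-1]
    return c0 / c1, a0 / a1, p0 / p1


for name, speedup, area_pct, power_factor in step_changes(designs):
    print(f"{name}: cycles /{speedup:.2f}, area {area_pct:+.1f}%, power /{power_factor:.2f}")
print("overall (cycles, area, power):", tuple(round(v, 1) for v in overall(designs)))
```

The plain ratios land close to most of the rounded figures on the slides; where they differ (for example the 20X and 72x annotations), the slides may be using a different baseline or metric than the one assumed here.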

  37. Discussion: phase 3
      [Flowchart: exploration phase (processor model, application(s), SW (code generation), HW design, decisions "OK?" and "more appl.?" with yes/no branches); "Freeze processor model"; application software development: constraint driven compilation (application(s), SW (code generation), decision "OK?")]

  38. Discussion: problems with VLIWs
      code size and instruction bandwidth
      • code compaction = reduce code size after scheduling
        • possible compaction ratio?
        • e.g. p0 = 0.9 and p1 = 0.1
        • information content (entropy) = $-\sum_i p_i \log_2 p_i$ = 0.47 bits per bit
        • maximum compression factor ≈ 2
      • control parallelism during scheduling = switch between different processor models (10% of code = 90% runtime)
      • architecture
        • reduce number of control bits for operand addresses
        • e.g. 128 reg (TM) -> 28 bits/issue slot for addresses only
        • => use stacks and fifos
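Both numeric examples on this slide can be checked in a few lines. The entropy calculation uses exactly the probabilities given (p0 = 0.9, p1 = 0.1). For the address-bit example, the slide only states 128 registers and 28 bits per issue slot; the assumption of four register fields per operation (for instance two sources, one destination and one guard) is mine, chosen because 4 x 7 = 28. The function names are hypothetical.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy in bits per symbol: -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def address_bits(num_registers, register_fields):
    """Instruction bits spent on register addresses in one issue slot."""
    bits_per_field = (num_registers - 1).bit_length()  # ceil(log2(num_registers))
    return bits_per_field * register_fields


h = entropy_bits([0.9, 0.1])
print(f"entropy: {h:.2f} bits per instruction bit")
print(f"upper bound on compression factor: {1 / h:.2f}")
print(f"address bits per issue slot: {address_bits(128, 4)}")
```

The bound 1/H of roughly 2.1 is where the slide's "factor 2" comes from: if each instruction bit carries only about 0.47 bits of information, an ideal coder could at best shrink the code to about 47% of its size.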

  39. [Figure: VLIW template with register files RF1 to RF4, functional units FU1 to FU4, flags, instruction registers IR1 to IR4, instruction memory and control]

  40. Discussion: clustered VLIW architectures
      [Figure: register files RF1 to RF4 and functional units FU1 to FU4]

  41. Conclusions
      • ASIPs provide efficient solutions for well-defined application domains (2 orders of magnitude higher efficiency).
      • The methodology is interesting for IP creation.
      • The key problem is retargetable compilation.
      • A (distributed) VLIW model is a good compromise between HW and SW.
      • Although an automatic process can generate a default solution, the process usually is interactive and iterative for efficiency reasons. The key is fast and accurate feedback.

  42. Imagine assignment
      • For the coming 3 weeks:
        • Install the tools (VisualC package will be sent by mail)
        • Read the beginners’ guide
        • Experiment with the compiler on a few examples
        • http://www.ics.ele.tue.nl/~hfatemi/5kk10/
      • Further information on Imagine:
        • www.cva.stanford.edu/projects/imagine/
