1 / 54

Design Automation Methodologies for Extensible Processors Platform

Design Automation Methodologies for Extensible Processors Platform. Newton Cheung School of Computer Science & Engineering The University of New South Wales Sydney, Australia. Outline. System-On-Chips Design Challenges Extensible Processors Platform Background – Xtensa

noura
Download Presentation

Design Automation Methodologies for Extensible Processors Platform

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Design Automation Methodologies for Extensible Processors Platform Newton Cheung School of Computer Science & Engineering The University of New South Wales Sydney, Australia

  2. Outline • System-On-Chips Design Challenges • Extensible Processors Platform • Background – Xtensa • Problems in Extensible Processors Platform • The Goal of the research • Proposed Solution • INSIDE • MINCE • Conclusions ncheung@cse.unsw.edu.au

  3. SoC Design Challenges • Application functionality • Ever changing nature of embedded product Increasing software content Flexibility • Power consumption • Chip area • Cost Customizing the architecture to the specific application (application domain) Efficiency ncheung@cse.unsw.edu.au

  4. Flexibility vs. Efficiency (Energy) Application Specific Integrated Circuit (ASIC) 500-1000 MIPS/mw Application Specific Instruction-set Processor (ASIP) 50-100 MIPS/mw Energy Efficiency Flexibility Flexibility Domain Specific Processor (DSP) 1-10 MIPS/mw General Purpose Processor 0.1-1 MIPS/mw Source: Fei Sun @ ICCAD 2002 ncheung@cse.unsw.edu.au

  5. Flexibility vs. Efficiency (Performance) Performance ASIC ASIP FPGA GPP Flexibility ncheung@cse.unsw.edu.au

  6. General Purpose Processor Application Specific Processor ncheung@cse.unsw.edu.au

  7. Optimized Processor Design (ASIP) • Cost • Small size • Performance • Application extensions • Productivity • Rapid hardware and software ncheung@cse.unsw.edu.au

  8. Extensible Processors Platform • Represents the state-of-the-art in application specific instruction-set processor (ASIP) • Consists of a base processor containing a base instruction set, plus the capability to customize their architecture to replace computationally intensive code segments • The goal of designing extensible processors is typically to maximize the performance of an application, while meeting design constraints ncheung@cse.unsw.edu.au

  9. I:Instruction Fetch R:Instruction Decode / Register Fetch E: Execute / Effective Address M: Memory Access / Branch Complete W: Write Back Instruction ROM / RAM / Cache Decode / General Registers ALU / Address Generation Data ROM / RAM / Cache Write Back / Exception Resolution Custom / Specific Instruction Predefined Block Register (i.e. 64-bit) Predefined Blocks (e.g. DSP – Digital Signal Processing, Floating-Point Unit) Extensible Processors Platform • Enables to address three architectural levels on the base processor: • Inclusion/Exclusion of predefined blocks • Instructions extension • Parameterizations ncheung@cse.unsw.edu.au

  10. Outline • System-On-Chips Design Challenges • Extensible Processors Platform • Background – Xtensa • Problems in Extensible Processors Platform • The Goal of the research • Proposed Solution • INSIDE • MINCE • Conclusions ncheung@cse.unsw.edu.au

  11. Background (Xtensa) Source: www.tensilica.com ncheung@cse.unsw.edu.au

  12. Xtensa Architecture Source: www.tensilica.com ncheung@cse.unsw.edu.au

  13. Xtensa Custom Instructions (TIE) Source: www.tensilica.com ncheung@cse.unsw.edu.au

  14. Xtensa’s Design Flow Source: www.tensilica.com ncheung@cse.unsw.edu.au

  15. Extensible Processor (Why?) ncheung@cse.unsw.edu.au

  16. Example: Digital Video Source: www.tensilica.com ncheung@cse.unsw.edu.au

  17. Example: Multiple Processors for TOE Source: www.tensilica.com ncheung@cse.unsw.edu.au

  18. Example: Voice Gateway Source: www.tensilica.com ncheung@cse.unsw.edu.au

  19. Outline • System-On-Chips Design Challenges • Extensible Processors Platform • Background – Xtensa • Problems in Extensible Processors Platform • The Goal of the research • Proposed Solution • INSIDE • MINCE • Conclusions ncheung@cse.unsw.edu.au

  20. Design Expertise Required!! Problems in Automation?? Source: www.tensilica.com ncheung@cse.unsw.edu.au

  21. Generic Design Flow of Extensible Processor Application in C/C++ Compilation Analysis and Profiling Synthesizable RTL of: Base processor Predefined blocks Extensible instructions Instruction Library Identify: 1) Predefined blocks 2) Extensible instructions 3) Parameter settings Generate extensible processor Explore extensible processor design space Define/Select: 1) Predefined blocks 2) Extensible instructions 3) Parameter settings Proto-typing Synthesis and tape out Evaluate: Compiler, Linker, Assembler, Debugger, Instruction Set Simulator No Yes OK? ncheung@cse.unsw.edu.au

  22. Previous Works • Profiling/Identification • [Binh 1998], [Yang 2002], ARC, Xtensa • Design methodology for different aspects • [Gupta 2000], [Jain 2001] • Instruction generation/selection • [Brisk 1998], [Kastner 2001], [Sun 2003], [Zhao 2002] • Overall design flow for extensible processors • [Kathail 2002], [Lee 2002], [Sun 2002] • Vendor and academic • ARC, Lisatek, Xtensa, ASIP-Meister ncheung@cse.unsw.edu.au

  23. Outline • System-On-Chips Design Challenges • Extensible Processors Platform • Background – Xtensa • Problems in Extensible Processors Platform • The Goal of the research • Proposed Solution • INSIDE • MINCE • Conclusions ncheung@cse.unsw.edu.au

  24. Goal of the Research • To automate the design flow of extensible processors platform. • Given an application and design constraints, the system configures an extensible processor that maximizes the performance of an application while satisfying the design constraints. ncheung@cse.unsw.edu.au

  25. Application in C/C++ Application in C/C++ Compilation Compilation Analysis and Profiling Analysis and Profiling Synthesizable RTL of: Base processor Predefined blocks Extensible instructions Synthesizable RTL of: Base processor Predefined blocks Extensible instructions Instruction Library Instruction Library Identify: 1) Predefined blocks 2) Extensible instructions 3) Parameter settings Identify: 1) Predefined blocks 2) Extensible instructions 3) Parameter settings Fitting Function: - Finds suitable code segment to implement as extensible instruction INSIDE system Use libraries of pre-configured processor and instructions to limit the design space Generate extensible processor Generate extensible processor Define: 1) P.B. 2) E.I. 3) P.S. Select: 1) P.B. 2) E.I. 3) P.S. Explore extensible processor design space Explore extensible processor design space • MINCE tool • Match code • segment to • instruction Define/Select: 1) Predefined blocks 2) Extensible instructions 3) Parameter settings Proto-typing Proto-typing Synthesis and tape out Synthesis and tape out Evaluate: Compiler, Linker, Assembler, Debugger, Instruction Set Simulator Evaluate: Compiler, Linker, Assembler, Debugger, Instruction Set Simulator Performance Estimation Model 1) Information from profiling 2) Synthesized information of instruction No No Yes Yes OK? OK? Proposed Solution ncheung@cse.unsw.edu.au

  26. INSIDE / MINCE • INSIDE system • Identifies sections of code segments which are suitable for translation to instructions. • A two-level hierarchy approach. • A performance estimator. Reduces the design turnaround time significantly. • MINCE tool • Match code segment with pre-synthesized extensible instruction using combinational equivalence. Enhances reusability of the extensible instructions. ncheung@cse.unsw.edu.au

  27. Outline • System-On-Chips Design Challenges • Extensible Processors Platform • Background – Xtensa • Problems in Extensible Processors Platform • The Goal of the research • Proposed Solution • INSIDE • MINCE • Conclusions ncheung@cse.unsw.edu.au

  28. Processor 1 Processor 2 Processor 3 Base Processor Base Processor FPU Base Processor DSP INSIDE system (Overview) ncheung@cse.unsw.edu.au

  29. Phase I: Heuristic Algorithm (Part I) • For selecting pre-configured processor • The area delay product, EPiof processor i for a certain application ncheung@cse.unsw.edu.au

  30. Phase II: Identify Code Segments • Problem: • Application has millions of lines of C code, how do we know which code segment is good for converting to extensible instruction • Fitting function • Identifies code segments which are suitable for translation to extensible instructions • Extracts characteristics of the code segment ncheung@cse.unsw.edu.au

  31. Fitting Function • The four characteristics: • The frequency of use of a code segment • The number of operands in a code segment • The percentage of integer (short) type operands in all the operands • The percentage of bit operations in all the operands • The fitting function: α – the ideal number of operands in the code segment. ncheung@cse.unsw.edu.au

  32. Relationship • Relationship between the fitting function and the speedup/area ratio of the instruction ncheung@cse.unsw.edu.au

  33. Phase III: Heuristic Algorithm (Part II) • For selecting extensible instructions • The potential speedup/area ratio, PSAR, of extensible instruction in processor: ncheung@cse.unsw.edu.au

  34. Phase IV: Performance Estimation • For rapidly estimating the execution time • The execution time estimation, ETE, for an extensible processor with a set of selected extensible instruction: ncheung@cse.unsw.edu.au

  35. Experimental Methodology ncheung@cse.unsw.edu.au

  36. Results ncheung@cse.unsw.edu.au

  37. Results (Design Turnaround Time) ncheung@cse.unsw.edu.au

  38. Results (Pareto Points) ncheung@cse.unsw.edu.au

  39. Results of INSIDE system • Mediabench applications • Speedup of the applications: • On average 2.03x (up to 7x) • Hardware overheads: • On average 25% (up to 72%) • Pareto points: • Obtained on average 83% (up to 100%) • On average within 5% of the execution time ncheung@cse.unsw.edu.au

  40. Outline • System-On-Chips Design Challenges • Extensible Processors Platform • Background – Xtensa • Problems in Extensible Processors Platform • The Goal of the research • Proposed Solution • INSIDE • MINCE • Conclusions ncheung@cse.unsw.edu.au

  41. Functional Equivalent The Aim of the MINCE • Matching INstructions using Combinational Equivalence • Automatically matching code segments to pre-synthesized specific instructions. // Application in C int main() { … … // Code segment for (x = 0; x < 100000; x++) { tmp = a[x] * b[x] << 4; total += tmp; } … … } // Pre-synthesized specific instructions state total 32 iclass ei EI {out arr, in art, in ars} {in state} reference EI { wire [31;0] tmp assign tmp = TIEmul(art, ars, 1’b0) >> 4; assign arr = tmp + state; } ncheung@cse.unsw.edu.au

  42. MINCE tool • Matching designed extensible instruction to the code segment Motivation Application in C/C++ Design Constraints (Performance, Area, Power Consumption) Compilation Analysis and Profiling Identifying the “hotspots” of the application through simulation, profiling, and trace Designed Extensible Instruction Library Synthesizable RTL of base processor and extensible instructions Designing extensible instructions for the “hotspots” Explore extensible processor design space Testing/Verifying the functionality, speedup, area, power consumption of extensible instruction Generate extensible processor Selecting the extensible instructions based on the design constraints Proto-typing Synthesis and tape out No Yes OK? ncheung@cse.unsw.edu.au

  43. MINCE Tool • An automated tool for matching pre-synthesized extensible instruction to the functional equivalence of code segments using combinational equivalence checking in the extensible processors platform. ncheung@cse.unsw.edu.au

  44. II MINCE Tool • MINCE consists: • A translator • A filtering algorithm • A combinational equivalence checking tool ncheung@cse.unsw.edu.au

  45. Related Works • Simulation techniques: [Stadler 1999] • Graph matching techniques: [Corazao 1993] [Kang 1995] [Liem 1994] [Shu 1996] • Equivalence verifications: [Clarke 2003], [Pnueli 1998], [Semeria 2002] ncheung@cse.unsw.edu.au

  46. Our Contributions • Enhances reusability of the extensible instructions. • MINCE tool is superior to computation-intensive and error-prone simulation approaches. • The usage of functional equivalence checking ensures that the results are largely independentof the programming style of the application. ncheung@cse.unsw.edu.au

  47. Phase I: Translator • The goal of the translator is to convert the application written in C/C++ to a set of code segment in Verilog HDL using a systematic approach. • To reduce the granularity of the application written in C/C++. • The translator consists of four steps: • Separate the application into code segments; • Compile code segments; • Convert to register transfer list; • Map to Verilog HDL file. ncheung@cse.unsw.edu.au

  48. Translator ncheung@cse.unsw.edu.au

  49. Phase II: Filtering Algorithm • The goal of the filtering algorithm is to eliminate the unnecessary and complex Verilog HDL file into the combinational equivalence checking model. • Verilog HDL files can be pruned as non-match: • Differing number of ports; • Differing port sizes; • Insufficient number of base hardware modules to present complex module. ncheung@cse.unsw.edu.au

  50. Combinational Equivalence Checking • Binary Decision Diagram (BDD) • For example, • Using Verification Interfacing with Synthesis (VIS) which was jointly developed by the University of California, in Berkeley and the University of Colorado, Boulder. ncheung@cse.unsw.edu.au

More Related