
Optimising Transformations for Hardware Compilation



  1. Optimising Transformations for Hardware Compilation • Contributions • Transformation language for restructuring and optimisation of Handel-C supporting data-integrity conditions. • Prototype transformation engine for the language. • Automatic transformations giving a 35-70% reduction in execution time. • An insight into the interaction of transformations: variability between platforms, difficulty of prediction. Ashley Brown, Department of Computing, Imperial College London. Final Project Presentation, 21st June 2005

  2. Introduction

  3. What would we like to do? • Take an algorithm written in C. • Generate an efficient hardware design, run it on an FPGA. • Fast design cycle, easy to maintain code. • C programmers should be able to create fast hardware!

  4. Background: Handel-C • C-based programming language for digital system design. • One clock cycle per statement. • Explicit parallelism. • Compiler generates hardware design from Handel-C source. • Handel-C code example: while (j != 3) { par { t0 = aa[0] * bb[0]; t1 = aa[1] * bb[1]; } par { cc[i][j] = t0 + t1; j++; } }
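The effect of par on the cycle count can be sketched with a small cost model. This is only an illustration in Python, not part of the original slides: it takes the "one clock cycle per statement" rule from the slide and assumes that a par block takes as long as its slowest branch. The tuple encoding and the name cycles are hypothetical.

```python
def cycles(node):
    """Estimate clock cycles for one pass through a block.

    Assumed model: a statement costs one cycle, a sequence costs the sum
    of its children, and a par block costs as much as its slowest branch.
    """
    kind = node[0]
    if kind == "stmt":
        return 1
    if kind == "seq":
        return sum(cycles(child) for child in node[1:])
    if kind == "par":
        return max(cycles(child) for child in node[1:])
    raise ValueError(f"unknown node kind: {kind}")


stmt = ("stmt",)

# Loop body from the slide written without any par blocks: four statements.
sequential_body = ("seq", stmt, stmt, stmt, stmt)

# Loop body as shown on the slide: two par blocks of two statements each.
parallel_body = ("seq", ("par", stmt, stmt), ("par", stmt, stmt))

print(cycles(sequential_body), cycles(parallel_body))
```

Under this model the loop body on the slide needs two cycles per iteration instead of four.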

  5. Problems • Software programmers: Bad Handel-C, poor hardware. • No exploitation of statement-level parallelism. • Long expressions. • Lots of for loops! • Experienced Handel-C designers: good hardware, hard to read code. • Trickery to reduce clock cycles, increase clock rate. • Finding the “optimal” solution is not easy. • Optimisation effectiveness depends on the target architecture (see the results later!)

  6. Solutions • Restructure Handel-C code to optimise. • Can parallelise if desired. • Duplicate hardware if necessary. • Apply transformations to the original source, leaving it intact. • The original readable description is still available. • A more efficient version is used for hardware generation. • Allow the user to define custom transformations with a transformation language. • Generate a whole design-space of solutions, with different optimisations.

  7. Current Solutions • ROSE, Stratego, CTT. • CTT has straightforward syntax. • Others are more complicated, not intuitive. • Stratego supports strategies. • Strategies in the hardware world are difficult to decide. • Need a different strategy for each architecture. • Haydn-C: restructuring of code similar to Handel-C. • But no user-specified transformations.

  8. What’s New? • Previous work with user-specified transformations has been: • For software-based C. • Aimed at parallelising/optimising for microprocessors. • Can’t duplicate microprocessor hardware on the fly: it’s either there or not. We can duplicate hardware, pipeline: FASTER DESIGN! • Previous work on hardware language transformations does not allow the user to describe transformations (Haydn-C). We do: the user can target their code explicitly. • Exploring an entire design-space is usually done at the hardware level, not in a high-level language (although not always, e.g. ASC). We generate a full design-space: find *the* best solution.

  9. The Transformation Language

  10. Cobble-CML • Cobble: compiler framework for Handel-C. • CML: partially defined proposal for a transformation language for Cobble, builds on CTT. • Cobble-CML: our solution. • 0-constant elimination defined in original CML: custom_transform { pattern { 0 * expr(0) } generate { 0 } }

  11. Why choose CML? • Familiar syntax to Handel-C users. • Only partially defined, but showed potential. • Problems: • No data flow conditions, so can’t check that transformations won’t destroy data integrity. • Transformations don’t have names.

  12. Changes to CML • New conditions field for data-integrity conditions; automatic parallelisation is not safe without it. • Naming of transformations. • Wildcard matches named rather than numbered. • Conditions allow more powerful transformations. • 0-constant elimination defined in the revised CML: transform zero_elim { pattern { cmlexpr(l) * cmlexpr(r) } generate { 0 } conditions { eval(cmlexpr(l) == 0 || cmlexpr(r) == 0); } }
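The zero_elim transformation pairs a structural pattern (a multiplication of two arbitrary expressions) with an evaluated condition (one side is zero). A rough Python sketch of the same idea follows. It is not how Cobble-CML is implemented: the Num, Var and Mul classes and both function names are hypothetical, and the condition is simplified to "an operand is the literal 0".

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Var, Mul]


def is_zero(expr):
    """Simplified stand-in for eval(expr == 0): true only for the literal 0."""
    return isinstance(expr, Num) and expr.value == 0


def zero_elim(expr):
    """Rewrite l * r to 0 wherever l or r is zero, working bottom-up."""
    if isinstance(expr, Mul):
        left = zero_elim(expr.left)
        right = zero_elim(expr.right)
        # Pattern: cmlexpr(l) * cmlexpr(r); condition: l == 0 || r == 0.
        if is_zero(left) or is_zero(right):
            return Num(0)  # generate { 0 }
        return Mul(left, right)
    return expr


print(zero_elim(Mul(Mul(Var("a"), Num(0)), Var("b"))))
```

Because the children are rewritten before the parent is tested, a zero produced deep in the tree propagates upwards: (a * 0) * b collapses to 0 in a single pass.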

  13. Basic Components • CML transformations are defined within transform blocks. • Each transformation can have a name to identify it for reporting. • The optional always keyword indicates that this transformation should always be applied where it can. • The pattern section describes the format of the code to match for this transformation. • Wildcards, such as cmlexpr, allow a pattern to be matched and substituted into the new tree. • The generate section describes the code which should replace the pattern. • Wildcard matching: • cmlexpr - matches any expression • cmlstmt - matches any statement • cmlstmtlist - matches a list of statements • Example (1 * x = x): always transform std_times1_elim { pattern { 1 * cmlexpr(operand) } generate { cmlexpr(operand) } }

  14. Ensuring Data Integrity • Three types of condition are defined to ensure data integrity: • Data-flow sets. • Expression evaluation. • Constant validation. • Transformations have a conditions section to define these.

  15. Data Dependencies • Can’t modify source trees at will (we could … but we shouldn’t). • Ideal: full data-dependency analysis. • We can get away with less. • Solution: Data-flow set manipulation.

  16. Data Dependencies

  17. Data Dependencies

  18. Worked Matching Example • Transformation: transform auto_par { pattern { cmlstmtlist(preamble); cmlstmt(par1); cmlstmt(par2); cmlstmtlist(postamble); } generate { cmlstmtlist(preamble); par { cmlstmt(par1); cmlstmt(par2); } cmlstmtlist(postamble); } conditions { /* don't assign to the same place */ defs(cmlstmt(par1);) & defs(cmlstmt(par2);) == {}; /* second statement not waiting on first */ defs(cmlstmt(par1);) & uses(cmlstmt(par2);) == {}; } } • Code to match: q = a << 1; qp = q + 1; qm = q - 1;

  19. Match Option #1 • Pattern: transform auto_par { pattern { cmlstmtlist(preamble); cmlstmt(par1); cmlstmt(par2); cmlstmtlist(postamble); } } • Code to match: q = a << 1; qp = q + 1; qm = q - 1;

  20. Match Option #1 • par1: q = a << 1; par2: qp = q + 1; • defs(cmlstmt(par1);) & defs(cmlstmt(par2);) == {}: { q } & { qp } is empty, so the condition holds. • defs(cmlstmt(par1);) & uses(cmlstmt(par2);) == {}: { q } & { q } = { q }, so the condition fails and the match is rejected. • Without the check we would generate par { q = a << 1; qp = q + 1; } qm = q - 1; Disaster if we did not check!

  21. Match Option #2 • Pattern: transform auto_par { pattern { cmlstmtlist(preamble); cmlstmt(par1); cmlstmt(par2); cmlstmtlist(postamble); } } • Code to match: q = a << 1; qp = q + 1; qm = q - 1;

  22. Match Option #2 • par1: qp = q + 1; par2: qm = q - 1; • defs(cmlstmt(par1);) & defs(cmlstmt(par2);) == {}: { qp } & { qm } is empty, so the condition holds. • defs(cmlstmt(par1);) & uses(cmlstmt(par2);) == {}: { qp } & { q } is empty, so the condition holds. • This match satisfies both conditions.
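The two match options can be checked mechanically. The following Python sketch computes defs and uses sets for the three statements of the worked example and evaluates the two auto_par conditions. It is an illustration only: the string-based parsing is a deliberate simplification that handles plain assignments like these, and the function names are hypothetical rather than taken from the engine.

```python
import re


def defs(stmt):
    """Variables written by a simple assignment such as 'q = a << 1;'."""
    target = stmt.split("=", 1)[0].strip()
    return {target}


def uses(stmt):
    """Variables read on the right-hand side of a simple assignment."""
    rhs = stmt.split("=", 1)[1]
    return set(re.findall(r"[A-Za-z_]\w*", rhs))


def can_parallelise(par1, par2):
    """Evaluate the two conditions of the auto_par transformation."""
    no_shared_target = not (defs(par1) & defs(par2))
    second_not_waiting = not (defs(par1) & uses(par2))
    return no_shared_target and second_not_waiting


code = ["q = a << 1;", "qp = q + 1;", "qm = q - 1;"]

option1 = can_parallelise(code[0], code[1])  # par1 defines q, par2 reads q
option2 = can_parallelise(code[1], code[2])  # no shared definitions or reads

print(option1, option2)
```

Option #1 is rejected because qp = q + 1 reads the q that the first statement defines, while option #2 passes both checks.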

  23. The Transformation Engine

  24. Integrating with Cobble

  25. Tree Matching • Transformation: pattern { 0 + cmlexpr(a) } generate { cmlexpr(a) } • Code: b = 5 * (0 + 1)
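Wildcard tree matching of this kind can be sketched as follows. This Python version is a minimal illustration under assumed conventions, not the engine's algorithm: ASTs are nested tuples of the form (operator, child, ...), a wildcard is written ("cmlexpr", name), and the functions match, substitute and rewrite are hypothetical names.

```python
WILDCARD = "cmlexpr"


def match(pattern, tree, bindings):
    """Match tree against pattern, recording wildcard bindings by name."""
    if isinstance(pattern, tuple) and pattern[0] == WILDCARD:
        bindings[pattern[1]] = tree  # a wildcard matches any subtree
        return True
    if isinstance(pattern, tuple) and isinstance(tree, tuple):
        return (len(pattern) == len(tree)
                and pattern[0] == tree[0]
                and all(match(p, t, bindings)
                        for p, t in zip(pattern[1:], tree[1:])))
    return pattern == tree


def substitute(template, bindings):
    """Build the generated tree, replacing wildcards with bound subtrees."""
    if isinstance(template, tuple) and template[0] == WILDCARD:
        return bindings[template[1]]
    if isinstance(template, tuple):
        return (template[0],) + tuple(substitute(t, bindings)
                                      for t in template[1:])
    return template


def rewrite(tree, pattern, template):
    """Apply pattern -> template everywhere in the tree, bottom-up."""
    if isinstance(tree, tuple):
        tree = (tree[0],) + tuple(rewrite(t, pattern, template)
                                  for t in tree[1:])
    bindings = {}
    if match(pattern, tree, bindings):
        return substitute(template, bindings)
    return tree


pattern = ("+", 0, (WILDCARD, "a"))    # pattern { 0 + cmlexpr(a) }
template = (WILDCARD, "a")             # generate { cmlexpr(a) }
tree = ("=", "b", ("*", 5, ("+", 0, 1)))  # b = 5 * (0 + 1)

result = rewrite(tree, pattern, template)
print(result)
```

On the slide's example the subtree 0 + 1 matches with a bound to 1, so the assignment becomes b = 5 * 1.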

  26. Tree Matching

  27. Just Handel-C? • No need to limit to Handel-C. • Tree-matching algorithm will work with any compatible ASTs. • Any language we can turn into a Handel-C AST can be used. • Automatic parallelisation: source language need not support it explicitly.

  28. Factors in Hardware Design • Speed • Power • Area

  29. Design-Space Exploration • Difficult to decide which transformation is best. • Don’t guess, produce several solutions. • Branch the AST whenever a transformation is applied. • In-place branches: small AST. • Propagate branches when no more transformations can be applied. • Repeat transformation process on each new solution.

  30. Design-Space Exploration • Transform, creating a branch point.

  31. Design-Space Exploration • Propagate branches to root, creating several distinct solutions.
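The branch-and-repeat idea can be sketched as a search over designs. Note that this Python sketch swaps in a simpler technique than the slides describe: instead of in-place branches inside one AST that are later propagated to the root, it stores every solution as a whole design and explores them breadth-first. The toy design encoding (a tuple of tokens) and the two transformations are hypothetical stand-ins.

```python
from collections import deque


def explore(root, transformations):
    """Return every design reachable by applying transformations in any order."""
    seen = {root}
    queue = deque([root])
    while queue:
        design = queue.popleft()
        for transform in transformations:
            # Each application site yields a new branch; the original is kept.
            for variant in transform(design):
                if variant not in seen:
                    seen.add(variant)
                    queue.append(variant)
    return seen


def drop_token(token):
    """Build a toy transformation that removes one occurrence of token."""
    def transform(design):
        return [design[:i] + design[i + 1:]
                for i, item in enumerate(design) if item == token]
    return transform


# Toy design: x with a redundant '*1' and a redundant '+0' applied to it.
root = ("x", "*1", "+0")
solutions = explore(root, [drop_token("*1"), drop_token("+0")])

print(sorted(solutions))
```

Two independent transformations give four solutions here (neither applied, either one, or both), which hints at how quickly the design space grows as transformations are added.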

  32. Test Transformations • Generic, applicable to all programs: • autopar – parallelise sequential statements with no dependencies. • fortowhile – convert for loops into corresponding while loops. • lttoeq – convert for loops with < in the loop condition to ==. • Application specific, targeted at the test programs: • matrixpar – parallelisation of an inner loop.

  33. More Transformations • Various mathematical rearrangements: • Factorise to reduce multiplies. • Remove *1, *0, +0 etc. • More interesting: • Dead-code elimination (remember data conditions!) • Variable replacement: remove dependencies in code by replacing variables with the expressions most recently assigned to them (again, remember data conditions!)

  34. Results

  35. Live Demo • We take two blocks of sequential division code, one parallelised, one not. • This should be a live demo, unless something breaks!

  36. Hand-coded Parallel • Hand-coded by Matt Aubury, VP Engineering of Celoxica Ltd and former project student of Wayne Luk.

  37. Pure Sequential • Same code, modified for Cobble but with no parallelism.

  38. Tool-Generated • This should look familiar!

  39. Execution Time Improvement • lttoeq increases fmax on Altera, but decreases it on Xilinx. • [Chart: execution time (s) against optimisation applied; optimisations are cumulative.]

  40. Platform Variance

  41. Platform Variance

  42. Cycle Count Improvements

  43. Design Space Exploration

  44. Design Space Exploration • Assume a design with an fmax of 104 MHz; we must match that. • Many solutions match. • We should consider other factors such as area, power or number of cycles. • Being brief: look at solutions 139 and 232. • Only partially parallelised. Solution with most parallelism (239) does not meet the fmax requirement.

  45. Future Work • Extensions to the language to allow additional matching. • expr replicator, complex expression matching. • Preservation of structure – e.g. a++; does not become a = a + 1; • Heuristics for selecting transformations to apply. • Genetic algorithms for transformation selection? “Breed” good transformation solutions.

  46. Future Applications • Aspect-oriented concepts: automatically inserting debugging signals. • Power-signature-masking code to avoid attacks in cryptographic applications.

  47. Conclusion • Matching method can achieve good results on naïve C code. • Targeting domain- or application-specific constructs can provide large performance gains at the expense of resources. • Scope to produce a much more powerful system with changes to the transformation language, heuristics and more efficient algorithms.

  48. Contributions • The first transformation language for parallelising hardware languages with data integrity conditions. • A prototype transformation engine for implementing the language. • Automatic transformations capable of achieving a 35-70% reduction in execution time. • An insight into the interaction of transformations, both with each other and with the platform their output runs on.

  49. Questions • This presentation, the final report, outsourcing report and source code are available from: https://www.doc.ic.ac.uk/~awb01/project/
