Politecnico di Milano
HIGH LEVEL SYNTHESIS WITH AREA CONSTRAINTS FOR FPGA DESIGNS: AN EVOLUTIONARY APPROACH
Thesis by: Christian Pilato, student ID no. 674373
Advisor: Prof. Fabrizio FERRANDI
Co-advisor: Ing. Antonino TUMEO

Outline
Summary
• High-Level Synthesis
• Proposed methodology
• Experimental results
• Some further extensions
• Conclusion and future work

High-Level Synthesis
High-Level Synthesis: problem description
• Three main sub-tasks:
  • operation scheduling: when operations start their execution
  • resource allocation and binding: where operations are executed, where values are stored and how elements are interconnected
  • controller synthesis: which operations are issued
• Inputs:
  • behavioral description (in C language)
  • library of different types of resources
  • set of constraints
“High-Level Synthesis means going from an algorithmic level specification of a behaviour of a digital system to a register level structure that implements that behavior”. McFarland, et al., Proc. IEEE, February 1990.
• Output: register-transfer level (RTL) design in a hardware description language (e.g. SystemC, VHDL or Verilog)
• Goal: minimize objectives (area, latency, etc.)
[Figure: High-Level Synthesis tool. Inputs: behavioral specification, resource library, design constraints, objectives. Steps: scheduling, allocation, binding, controller synthesis. Output: datapath and controller.]

What are the problems?
High-Level Synthesis: problem description
• All the sub-tasks are NP-complete: no efficient exact algorithms are known
• Interconnections have to be considered: they can account for up to 80% of the final area
• All the tasks are closely interdependent
• Most of the information is available only at the end of the synthesis
Approach: try non-deterministic methods with feedback information
• Genetic algorithms
• Multi-objective optimization: reducing the problem to a single objective (weighted average) is not efficient
• Non-dominated Sorting Genetic Algorithm (NSGA-II)
K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan, “A Fast and Elitist Multi-Objective Genetic Algorithm: NSGA-II,” Proceedings of the Parallel Problem Solving from Nature VI Conference, pp. 849–858, 2000.
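
To illustrate why a weighted average can fall short for multi-objective problems, here is a minimal Python sketch. The (area, latency) pairs are hypothetical values chosen for illustration, not results from the thesis, and the helper names are assumptions. It shows a design that is Pareto-optimal but is never selected by any weighted sum, because it lies on a non-convex part of the front.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


def weighted_sum_winners(points, steps=100):
    """Collect the best point for each weight w in {0, 1/steps, ..., 1}."""
    winners = set()
    for i in range(steps + 1):
        w = i / steps
        winners.add(min(points, key=lambda p: w * p[0] + (1 - w) * p[1]))
    return winners


# Hypothetical (area, latency) pairs, both to be minimized.
designs = [(1, 10), (10, 1), (6, 6), (8, 8)]

front = pareto_front(designs)          # (8, 8) is dominated by (6, 6)
found = weighted_sum_winners(designs)  # (6, 6) never minimizes any weighted sum

print("Pareto front:", front)
print("Weighted-sum winners:", sorted(found))
```

In this toy case the balanced design (6, 6) is a legitimate trade-off, yet the weighted sum only ever returns the two extreme designs. A ranking based on dominance, as in NSGA-II, keeps all three.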

The proposed methodology
High-Level Synthesis and Design Space Exploration

Experimental results
• Development framework
  • Integrated in the PandA framework, an open-source C++ framework covering different aspects of the hardware-software design of embedded systems
  • Evolutionary computation with the Open BEAGLE framework
• Functional validation
  • Comparison between Verilog and C simulations
• Estimation model validation
  • Comparison between estimations and logic synthesis values
  • average error: 4.02%
  • standard deviation: 2.82%
  • maximum error: less than 10%
  • The estimated values can therefore be used effectively as fitness values

Experimental results
• Design Space Exploration validation
  • Population size of 1,000 individuals, evolving up to a maximum of 200 generations
  • This setting gave the best trade-off between overall execution time and solution quality
• Considerations:
  • The approach takes into account all elements in the design solution
  • It can cover a good number of trade-offs between the fastest solution and the minimal-area solution
  • It deals with area constraints better than existing tools
Paper accepted for publication at the International Symposium on Systems, Architectures, MOdeling and Simulation (SAMOS), Samos, Greece, July 2007. Title: “An Evolutionary Approach to Area-Time Optimization of FPGA designs”

Some recently added features
Some extensions
• Weighted clique covering: used in register allocation to reduce interconnections
  • A higher weight is assigned to a compatibility edge when the two values involve the same functional units
  • Clique covering on a weighted graph; results show a further reduction of the overall area of up to 10%
• Fitness inheritance: used to reduce the overall execution time
  • A fraction of the expensive real evaluations is replaced with an estimation based on similar, already evaluated individuals
  • It reduces the overall execution time by over 25%
  • No substantial difference in the final Pareto-optimal solutions
Paper submitted to the IEEE Congress on Evolutionary Computation (CEC) 2007, Singapore, September 2007. Title: “Fitness Inheritance In Evolutionary and Multi-Objective High Level Synthesis”
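
The fitness inheritance idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say how similarity is measured or how estimates are combined, so the Hamming distance, the inverse-distance weighting, the parameter `k` and all function names below are hypothetical choices, and the fitness values are made up.

```python
import random


def hamming(a, b):
    """Number of genes that differ between two chromosomes of equal length."""
    return sum(x != y for x, y in zip(a, b))


def inherited_fitness(chromosome, evaluated, k=2):
    """Estimate the objectives as a distance-weighted average of the k most similar
    individuals that already went through a real evaluation (assumed scheme)."""
    nearest = sorted(evaluated, key=lambda item: hamming(chromosome, item[0]))[:k]
    weights = [1.0 / (1 + hamming(chromosome, genes)) for genes, _ in nearest]
    total = sum(weights)
    n_objectives = len(nearest[0][1])
    return tuple(
        sum(w * fit[i] for w, (_, fit) in zip(weights, nearest)) / total
        for i in range(n_objectives)
    )


def evaluate(chromosome, evaluated, real_evaluation, inherit_probability, rng):
    """With probability inherit_probability reuse neighbours, otherwise run the real flow."""
    if evaluated and rng.random() < inherit_probability:
        return inherited_fitness(chromosome, evaluated)
    fitness = real_evaluation(chromosome)
    evaluated.append((chromosome, fitness))
    return fitness


# Hypothetical archive of (chromosome, (area, latency)) pairs.
evaluated = [((0, 0, 0), (10.0, 4.0)), ((1, 1, 1), (20.0, 2.0))]
estimate = inherited_fitness((0, 0, 1), evaluated)
print(estimate)
```

Only individuals that went through the real evaluation are added to the archive, so inherited estimates are never themselves used as a basis for further estimates.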

Conclusion and future work
• The main contributions of this thesis are:
  • A high-level synthesis flow from C specifications to HDL descriptions
  • Integration of a model for fast estimation of synthesis results
  • Design space exploration with a genetic algorithm:
    • It takes into account all elements composing the design solution
    • The estimates fit the real values closely
    • Multi-objective concurrent optimization
• Future work:
  • Optimize the results coming from the synthesis flow
  • Further reduce the overall execution time of the proposed methodology
  • Refine the estimation model and specialize it for different targets

Thank you!
Christian PILATO, student ID no. 674373

High-Level Synthesis Flow
The proposed High-Level Synthesis flow
The proposed flow is organized as follows:
• From C to intermediate representation
  • a graph representation is produced starting from GIMPLE
• High-Level Synthesis flow
  1. Partial binding and scheduling
  2. Finite State Machine creation
  3. Register allocation
  4. Interconnection allocation
  5. Performance and area estimations
• From data structures to an intermediate representation in the form of a graph
• From the intermediate representation to a Hardware Description Language (e.g. Verilog) ready for logic synthesis

Partial Binding and Scheduling
1. Partial binding and scheduling
• Partial binding: force an operation to be executed on a selected functional unit instance, e.g. β(+1) = <plus; 0>
  • A technique introduced to partially control the final area occupation
  • It can affect scheduling, register allocation and interconnection allocation
• Scheduling: assign a starting control step to each operation to be executed
  • Many scheduling algorithms can support partial binding (Integer Linear Programming formulation, list-based algorithms, etc.)
  • Different solutions are obtained depending on the selected algorithm
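
As a rough illustration of how a partial binding interacts with scheduling, the following Python sketch implements a simple list-based scheduler. It is a toy under several assumptions that are not in the slides: every operation takes one control step, priority is simply the order in which operations are listed, and the data layout and the name `list_schedule` are hypothetical.

```python
def list_schedule(ops, deps, units, binding):
    """Toy list scheduler with unit-latency operations.

    ops:     dict operation name -> functional unit type (insertion order = priority)
    deps:    dict operation name -> list of predecessor names (must form a DAG)
    units:   dict functional unit type -> number of available instances
    binding: dict operation name -> forced instance index (the partial binding)
    """
    for name, idx in binding.items():
        assert 0 <= idx < units[ops[name]], "binding must refer to an existing instance"
    start, assigned = {}, {}
    step = 0
    while len(start) < len(ops):
        busy = set()  # (unit type, instance) pairs already used in this control step
        for name, kind in ops.items():
            if name in start:
                continue
            # Ready only if every predecessor started in an earlier control step.
            if any(p not in start or start[p] >= step for p in deps.get(name, [])):
                continue
            candidates = [binding[name]] if name in binding else range(units[kind])
            for idx in candidates:
                if (kind, idx) not in busy:
                    busy.add((kind, idx))
                    start[name] = step
                    assigned[name] = (kind, idx)
                    break
        step += 1
    return start, assigned


# Hypothetical specification: three additions and one multiplication.
ops = {"a1": "plus", "a2": "plus", "m1": "mult", "a3": "plus"}
deps = {"a3": ["a1", "m1"]}
units = {"plus": 2, "mult": 1}

free_start, free_units = list_schedule(ops, deps, units, {})
bound_start, bound_units = list_schedule(ops, deps, units, {"a1": 0, "a2": 0, "a3": 0})
print(free_start, bound_start)
```

In this example, forcing all additions onto adder instance 0 means the second adder is never used, which saves area, but the schedule grows from two to three control steps. This is the kind of area-time trade-off that the partial binding is meant to expose.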

FSM and Register allocation
2-3. Finite State Machine creation and Register allocation
• Scheduling gives information about concurrent operations
  • This information is useful for controller synthesis and register allocation
• A State Transition Graph (STG), based on the Moore FSM model, is created from the scheduled specification
  • It represents the control flow and the concurrent operations
  • Conditional operations create branches based on the evaluated condition values
• Register allocation: allocate elements to store values across control step boundaries. A compiler approach has been implemented on the STG:
  • Liveness analysis based on dataflow equations
  • Interference graph based on liveness information
  • Different heuristics to minimize the number of registers
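
The compiler-style register allocation described above can be sketched as follows. This Python example is a simplified illustration, not the implementation used in the thesis: the STG is a hypothetical four-state graph with one branch, two values are assumed to interfere when both are live at the exit of the same state, and the greedy largest-degree-first coloring stands in for the unspecified heuristics.

```python
def liveness(states, succ, use, define):
    """Iterate the dataflow equations to a fixed point:
    live_out[s] = union of live_in[t] for each successor t
    live_in[s]  = use[s] | (live_out[s] - define[s])
    """
    live_in = {s: set() for s in states}
    live_out = {s: set() for s in states}
    changed = True
    while changed:
        changed = False
        for s in reversed(states):
            out = set().union(*(live_in[t] for t in succ.get(s, [])))
            inn = use[s] | (out - define[s])
            if out != live_out[s] or inn != live_in[s]:
                live_out[s], live_in[s] = out, inn
                changed = True
    return live_in, live_out


def interference_graph(live_out):
    """Values that must be stored across the same state boundary interfere."""
    graph = {}
    for values in live_out.values():
        for v in values:
            graph.setdefault(v, set()).update(values - {v})
    return graph


def color(graph):
    """Greedy coloring, highest degree first; each color is a register index."""
    registers = {}
    for v in sorted(graph, key=lambda v: (-len(graph[v]), v)):
        used = {registers[n] for n in graph[v] if n in registers}
        registers[v] = next(r for r in range(len(graph)) if r not in used)
    return registers


# Hypothetical STG: S0 branches to S1 or S2, which both reach S3.
states = ["S0", "S1", "S2", "S3"]
succ = {"S0": ["S1", "S2"], "S1": ["S3"], "S2": ["S3"]}
define = {"S0": {"a", "b", "d"}, "S1": {"c"}, "S2": {"c"}, "S3": set()}
use = {"S0": set(), "S1": {"a"}, "S2": {"b"}, "S3": {"c", "d"}}

live_in, live_out = liveness(states, succ, use, define)
registers = color(interference_graph(live_out))
print(registers)
```

Here four values fit into three registers, because `c` is produced only after `a` has been consumed and can reuse its register.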

The final steps
4-5. Interconnection allocation and result estimations
• Interconnection allocation: allocate elements to interconnect the hardware components
  • Mux-based architecture: port swapping for commutative operations
  • Glue logic: a logic netlist that decodes commands and selects inputs
  • Truth tables based on signals from the controller
• The RTL structural description is now available and it considers all elements. Objective values could be retrieved from logic synthesis, but this is too slow
• Estimation model: perform fast estimations of objective values
  • Area is difficult to estimate
  • An existing area model* has been updated and used
*: C. Brandolese, W. Fornaciari, and F. Salice, “An Area Estimation Methodology for FPGA Based Designs at SystemC-Level,” DAC '04: Proceedings of the 41st Annual Conference on Design Automation, pp. 129–132, 2004.
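
To show what port swapping for commutative operations buys in a mux-based architecture, here is a small Python sketch. The greedy rule, the function names and the register names are assumptions made for illustration; the slides do not describe the actual heuristic.

```python
def mux_inputs(operand_pairs):
    """Total number of mux inputs needed in front of a two-port functional unit:
    one input per distinct source feeding each port."""
    left = {a for a, _ in operand_pairs}
    right = {b for _, b in operand_pairs}
    return len(left) + len(right)


def swap_ports(operations):
    """Greedy port swapping (assumed heuristic).

    operations: list of (left_source, right_source, is_commutative)
    Swaps the operands of a commutative operation when that adds fewer
    new sources to the two port multiplexers.
    """
    left, right = set(), set()
    result = []
    for a, b, commutative in operations:
        keep_cost = (a not in left) + (b not in right)
        swap_cost = (b not in left) + (a not in right)
        if commutative and swap_cost < keep_cost:
            a, b = b, a
        left.add(a)
        right.add(b)
        result.append((a, b))
    return result


# Hypothetical additions bound to the same adder; sources are register names.
operations = [("r1", "r2", True), ("r2", "r1", True), ("r1", "r3", True)]
original = [(a, b) for a, b, _ in operations]
swapped = swap_ports(operations)

print(mux_inputs(original), "->", mux_inputs(swapped))
```

In this example the adder needs five mux inputs with the original operand order and three after swapping, since the second addition reuses the connections of the first one.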

Problem-dependent elements
Design Space Exploration by Genetic Algorithm
• Chromosome encoding
  • Each operation in the specification has a gene that represents a feasible partial binding
  • Additional genes represent the algorithms used to perform the high-level synthesis steps: scheduling, register allocation and interconnection optimization
• Fitness evaluation
  • The information in the chromosome about partial binding and algorithms is used to run a synthesis flow
  • Objective values are estimated using the proposed model
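
One possible way to picture this encoding is the Python sketch below. It is a hypothetical data structure, not the one used in PandA or Open BEAGLE: the field names, the algorithm labels and the choice to store each binding gene as a functional unit instance index are all assumptions.

```python
import random
from dataclasses import dataclass

# Hypothetical labels for the algorithm genes; the real option lists are not given in the slides.
SCHEDULERS = ("list_based", "ilp")
REGISTER_ALLOCATORS = ("heuristic_a", "heuristic_b")
INTERCONNECT_OPTIMIZERS = ("none", "port_swapping")


@dataclass(frozen=True)
class Chromosome:
    binding: tuple               # one gene per operation: functional unit instance index
    scheduler: str               # gene selecting the scheduling algorithm
    register_allocator: str      # gene selecting the register allocation heuristic
    interconnect_optimizer: str  # gene selecting the interconnection optimization


def random_chromosome(op_types, units, rng):
    """Draw each binding gene only among the instances that exist for that operation type."""
    binding = tuple(rng.randrange(units[kind]) for kind in op_types)
    return Chromosome(
        binding,
        rng.choice(SCHEDULERS),
        rng.choice(REGISTER_ALLOCATORS),
        rng.choice(INTERCONNECT_OPTIMIZERS),
    )


def is_feasible(chromosome, op_types, units):
    """Every gene must lie in its own domain."""
    return (
        len(chromosome.binding) == len(op_types)
        and all(0 <= g < units[kind] for g, kind in zip(chromosome.binding, op_types))
        and chromosome.scheduler in SCHEDULERS
        and chromosome.register_allocator in REGISTER_ALLOCATORS
        and chromosome.interconnect_optimizer in INTERCONNECT_OPTIMIZERS
    )


# Hypothetical specification with four operations and a small resource library.
op_types = ["plus", "plus", "mult", "plus"]
units = {"plus": 2, "mult": 1}
individual = random_chromosome(op_types, units, random.Random(42))
print(individual, is_feasible(individual, op_types, units))
```

Because each gene has its own independent domain, any combination of valid gene values is again a valid chromosome.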

Problem-independent elements
Design Space Exploration by Genetic Algorithm
• Generic operators
  • Common operators (crossover and mutation) are used without modifications: no infeasible chromosomes can be created
  • If the gene changed by the operators is related to:
    • an operation: a new binding constraint is set for that operation
    • an algorithm: a different algorithm is used to solve the related synthesis step
• Initial population
  • created randomly, or starting from some interesting points in order to explore around them
• Solution ranking
  • solutions are ranked into different levels according to their fitness values
  • ranking is accelerated using the fast-non-dominated-sort algorithm available in NSGA-II
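
The ranking step can be illustrated with a compact Python version of the fast non-dominated sort from the NSGA-II paper. This is a sketch for explanation, not the Open BEAGLE implementation, and the (area, latency) values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(objectives):
    """Return a list of fronts; each front is a list of indices into objectives.
    Front 0 holds the non-dominated solutions, front 1 those dominated only by front 0, and so on."""
    n = len(objectives)
    dominated = [[] for _ in range(n)]  # solutions that solution p dominates
    counts = [0] * n                    # how many solutions dominate p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objectives[p], objectives[q]):
                dominated[p].append(q)
            elif dominates(objectives[q], objectives[p]):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated[p]:
                counts[q] -= 1
                if counts[q] == 0:  # all dominators of q are already ranked
                    next_front.append(q)
        fronts.append(next_front)
        i += 1
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical (area, latency) estimates for five individuals.
population = [(1, 10), (10, 1), (6, 6), (8, 8), (12, 12)]
levels = fast_non_dominated_sort(population)
print(levels)
```

Each individual's level can then drive selection: individuals in lower-numbered fronts are preferred, which is how the multi-objective ranking avoids collapsing area and latency into a single number.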