NASA/ESA Conference on Adaptive Hardware and Systems (AHS-2013) Torino, Italy – June 25-27, 2013 Ant Colony Optimization for Mapping, Scheduling and Placing in Reconfigurable Systems Fabrizio Ferrandi, Pier Luca Lanzi, Christian Pilato, Donatella Sciuto Politecnico di Milano – Dip. di Elettronica, Informazione e Bioingegneria Antonino Tumeo Pacific Northwest National Laboratory – Richland, WA, U.S.A.
Outline • Motivation • Related Work • Preliminaries and Motivation • Proposed Exploration Methodology • Experimental Results • Conclusions and Future Work
Heterogeneous Systems • Mapping and scheduling of partitioned applications are crucial, in particular for heterogeneous MPSoCs • Different design constraints and overheads must be considered to provide feasible and efficient solutions (e.g., limited area for hardware devices, interconnection topology, …) • Constructive methods are definitely required • Ant Colony Optimization is a promising constructive method to produce very efficient solutions for the combined problem • With FPGAs, the possibility of dynamic reconfiguration introduces several challenges to be taken into account
Reconfigurable Systems • Partial Dynamic Reconfiguration (PDR) allows changing a portion of the FPGA configuration at run time • reuse of the device area to accelerate even more sections of an application • Additional constraints and overheads are introduced • reconfiguration latencies, number of reconfiguration ports and processing elements to drive the reconfiguration • Accurate placement of the hardware components is critical • Concurrent exploration of the design space for mapping, scheduling and placing of the tasks
Related Work • [Niemann and Marwedel 1997] Exact solutions for the combined problem with an ILP formulation on DAGs. • Different heuristic methods have been proposed to approach the problem • [Pilato et al. 2010] Ant Colony Optimization (ACO) has been demonstrated to produce good solutions, limiting the number of unfeasible ones • [Banerjee et al. 2006] Optimization method based on Kernighan-Lin/Fiduccia-Mattheyses (KLFM) adopts heuristics for the scheduling and the placing of the tasks • Reduced exploration of the design space
Target Architectural Template • Generic architectural template composed of processing and communication elements. A valid test case is the following one: • Number of pre-defined blocks where the tasks can be placed • Granularity and occupation for each task have to be defined in advance [Figure: architecture diagram with labels Shared Memory, Shared bus, ARM, PowerPC and DSP (each with a Local Memory), and blocks C0, C1, …, Cn]
Preliminaries: Problem Definition • Job: generic activity (task or communication) to be completed in order to execute the specification • Implementation point: the mode for the execution of a job. It represents a combination of latency and resource requirements on the related target component • Mapping: assign each job to an admissible implementation point, respecting the architectural constraints (e.g., the limited resources of the components) • Scheduling: determine the order of execution of all the jobs of the specification in terms of priorities • Placing: determine the physical position of all the tasks that have to be executed in hardware • Objective: minimize the overall execution time of the application on the target architecture
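The definitions above can be made concrete with a minimal Python sketch. The names (`ImplementationPoint`, `Job`, `area`, `is_feasible_mapping`) and all numbers are illustrative assumptions, not taken from the slides; resources are reduced to a single area value per component, and the check models a static area budget without reconfiguration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ImplementationPoint:
    component: str  # target component, e.g. "ARM" or "FPGA" (hypothetical names)
    latency: int    # execution time of the job on that component
    area: int = 0   # resources required on the component (0 for software)


@dataclass
class Job:
    name: str
    points: list                               # admissible implementation points
    preds: list = field(default_factory=list)  # names of the jobs it depends on


def is_feasible_mapping(mapping, capacity):
    """Check that the area used on each component stays within its capacity.

    mapping:  dict job name -> chosen ImplementationPoint
    capacity: dict component name -> available area (missing means unlimited)
    """
    used = {}
    for point in mapping.values():
        used[point.component] = used.get(point.component, 0) + point.area
    return all(used[c] <= capacity.get(c, float("inf")) for c in used)


# Illustrative implementation points for a job that can run in software or hardware
arm = ImplementationPoint("ARM", latency=10)
hw_small = ImplementationPoint("FPGA", latency=3, area=4)
hw_big = ImplementationPoint("FPGA", latency=2, area=7)
```

With an FPGA capacity of 8, mapping one job to `arm` and one to `hw_small` is accepted, while mapping two jobs to `hw_small` and `hw_big` exceeds the area and is rejected.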
ACO Principles • The Ant Colony Optimization (ACO) heuristic is a constructive approach that limits as much as possible the generation of unfeasible solutions • Constructive approach, based on a decision tree, that generates each part of the solution based on the decisions taken in the previous parts • Analysis and evaluation of different combinations of mapping, scheduling and placing • Each decision is based on a combination of local and global information, through a roulette wheel mechanism • Stochastic principles guarantee the exploration • Heuristic principles and feedback guarantee the exploitation of good parts of the solutions
Design Space Exploration with ACO • ACO: initialize pheromones • Colony: prepare N ants • Ant: compute the set C of candidates, perform a decision, update the set C of candidates • Evaluate design solution • Update pheromones
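The flow above can be sketched as a small Python loop. This is a toy illustration under stated assumptions, not the authors' implementation: the function `aco`, the evaporation rate `rho`, the deposit rule, and the `LATENCY` table are all hypothetical, and the candidate set is simplified to "the next job in a fixed order with every option allowed".

```python
import random

# Hypothetical latencies of four jobs on two components
LATENCY = {"cpu": [4, 3, 5, 2], "fpga": [2, 2, 1, 1]}


def makespan(solution):
    """Toy cost: jobs on the same component run back to back."""
    load = {"cpu": 0, "fpga": 0}
    for job, option in enumerate(solution):
        load[option] += LATENCY[option][job]
    return max(load.values())


def aco(n_jobs, options, cost, n_iters=30, n_ants=10, rho=0.1, seed=0):
    rng = random.Random(seed)
    # Initialize pheromones: one value per (job, option) decision
    tau = {(j, o): 1.0 for j in range(n_jobs) for o in options}
    best, best_cost = None, float("inf")
    for _ in range(n_iters):
        colony = []
        for _ in range(n_ants):  # each ant builds one complete solution
            solution = []
            for j in range(n_jobs):
                weights = [tau[(j, o)] for o in options]
                solution.append(rng.choices(options, weights=weights)[0])
            colony.append((cost(solution), solution))  # evaluate the solution
        # Update pheromones: evaporate, then reinforce the best solution so far
        for key in tau:
            tau[key] *= 1.0 - rho
        it_cost, it_best = min(colony, key=lambda entry: entry[0])
        if it_cost < best_cost:
            best, best_cost = it_best, it_cost
        for j, o in enumerate(best):
            tau[(j, o)] += 1.0 / (1.0 + best_cost)
    return best, best_cost


best, best_cost = aco(4, ["cpu", "fpga"], makespan)
```

Reinforcing only the best solution found so far is one common choice; other deposit rules (for example, rewarding every ant in proportion to its quality) fit the same loop.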
Stochastic Selection Process • At each decision point d, the probability to assign a candidate job j to a proper implementation point i is: [equation not recovered from the slide; its terms are labeled "global heuristic" and "local heuristic"] • Global information G: feedback information • Probability that the decision leads to a good solution • Local heuristic L: problem-specific hint • “Adjusted” by the global heuristic if wrong • Roulette wheel and extraction of a combination i, j • The ant does not generate the probability if the decision leads to a constraint violation
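A roulette wheel selection of this kind can be sketched in Python as follows. Since the slide's formula is not available here, the sketch assumes the common ACO form in which a candidate's weight is the global term raised to `alpha` times the local term raised to `beta`, normalized over the feasible candidates. The names `tau`, `eta`, `alpha`, `beta`, and `feasible` are assumptions introduced for illustration.

```python
import random


def selection_probabilities(candidates, tau, eta, feasible, alpha=1.0, beta=1.0):
    """Return a dict candidate -> probability over the feasible candidates.

    tau: global (feedback) information per candidate
    eta: local, problem-specific hint per candidate
    """
    weights = {}
    for cand in candidates:
        if not feasible(cand):
            continue  # no probability is generated for a constraint violation
        weights[cand] = (tau[cand] ** alpha) * (eta[cand] ** beta)
    total = sum(weights.values())
    if total <= 0:
        return {}
    return {cand: w / total for cand, w in weights.items()}


def roulette_wheel(probs, rng):
    """Draw one candidate with probability proportional to its slice."""
    if not probs:
        raise ValueError("no feasible candidate")
    r = rng.random()
    acc = 0.0
    chosen = None
    for cand, p in probs.items():
        chosen = cand
        acc += p
        if r < acc:
            break
    return chosen  # the last candidate absorbs any rounding error


# Hypothetical candidates: (job, implementation point) pairs
cands = [("t1", "ARM"), ("t1", "FPGA"), ("t2", "FPGA")]
tau = {c: 1.0 for c in cands}
eta = {("t1", "ARM"): 0.25, ("t1", "FPGA"): 0.75, ("t2", "FPGA"): 1.0}
probs = selection_probabilities(cands, tau, eta, lambda c: c != ("t2", "FPGA"))
pick = roulette_wheel(probs, random.Random(1))
```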
Decision Methods for Combined Problem • Scheduling can include both reconfiguration and execution tasks • Execution tasks are eligible only if their dependences are satisfied • Reconfiguration tasks are always available (implicit hardware assignment) • When a reconfiguration task is selected, a candidate choice is also generated with respect to its position in the FPGA • The latency of a reconfiguration task depends on where the task is assigned (i.e., on whether the reconfiguration actually takes place or not) • Scheduling of the reconfiguration also takes into account the availability of the reconfiguration port (ICAP) and of the processor driving the reconfiguration
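The candidate rules above might look as follows in Python. This is a hedged sketch: the tuple encoding of candidates, the `loaded` map from block index to configured task, and the function names are assumptions made for illustration, and the slides do not specify how these checks are implemented.

```python
def candidates(tasks, done, hw_tasks, n_blocks):
    """Build the candidate set for the next decision.

    tasks:    dict task name -> list of predecessor names
    done:     set of task names already executed
    hw_tasks: tasks that may still be configured in hardware
    n_blocks: number of pre-defined FPGA blocks
    """
    cands = []
    for task, preds in tasks.items():
        # Execution tasks are eligible only when all dependences are satisfied
        if task not in done and all(p in done for p in preds):
            cands.append(("exec", task, None))
    for task in hw_tasks:
        # Reconfiguration tasks are always available, one choice per position
        for block in range(n_blocks):
            cands.append(("reconf", task, block))
    return cands


def reconf_latency(task, block, loaded, full_latency):
    """No latency if the block already holds this task's configuration."""
    return 0 if loaded.get(block) == task else full_latency


def reconf_start(now, icap_free_at, cpu_free_at):
    """A reconfiguration waits for both the ICAP port and the driving processor."""
    return max(now, icap_free_at, cpu_free_at)


cands = candidates({"a": [], "b": ["a"]}, set(), ["b"], 2)
```

Here task `b` cannot yet execute because `a` is not done, but it can already be configured into either of the two blocks.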
Solution Evaluation for a Task Graph • The decisions performed by the ant give a trace • Sequence of jobs, where each of them is assigned to an implementation point • The position in the trace represents the priority for the scheduling (jobs selected earlier have higher priority) • List-based scheduler based on the trace (i.e., the implementation points and the priority values) • Different decisions performed by the ant correspond to exploring different design solutions (combinations of mapping and scheduling) • Return the overall execution time of the application • Feedback to compare different solutions (reinforcement/penalty of the global heuristic for the corresponding decisions)
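A list-based evaluation of a trace can be sketched in Python as below. The trace encoding as `(job, component, latency)` tuples and the function name `evaluate_trace` are assumptions for illustration; communication jobs, placement, and reconfiguration are left out to keep the example short.

```python
def evaluate_trace(trace, preds):
    """Return the overall execution time of a trace.

    trace: list of (job, component, latency), earlier entries have higher priority
    preds: dict job -> list of jobs that must finish first
    """
    finish = {}   # job -> finish time
    free_at = {}  # component -> time at which it becomes free
    pending = list(trace)
    while pending:
        # Pick the highest-priority job whose dependences are all scheduled
        for idx, (job, comp, latency) in enumerate(pending):
            if all(p in finish for p in preds.get(job, [])):
                break
        else:
            raise ValueError("cyclic or unsatisfiable dependences")
        pending.pop(idx)
        ready = [finish[p] for p in preds.get(job, [])]
        start = max([free_at.get(comp, 0)] + ready)
        finish[job] = start + latency
        free_at[comp] = finish[job]
    return max(finish.values(), default=0)


deps = {"c": ["a", "b"]}
mixed = evaluate_trace([("a", "ARM", 4), ("b", "FPGA", 2), ("c", "ARM", 3)], deps)
all_sw = evaluate_trace([("a", "ARM", 4), ("b", "ARM", 2), ("c", "ARM", 3)], deps)
```

In this example, moving `b` to the FPGA lets it overlap with `a`, so the mixed mapping finishes at time 7 while the all-software mapping finishes at time 9. The returned value is what would be fed back to reinforce or penalize the decisions in the trace.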
Experimental Setup • Target architecture composed of an ARM processor, a Digital Signal Processor and an FPGA that also embeds a PowerPC processor • It allows exploring both hardware and software solutions • Synthetic benchmark to evaluate the scalability of the approach • We compared the ACO solutions with other search methods • [Pilato et al. 2010] ACO where PDR is not supported: tasks can be allocated to the FPGA as long as they fit into the available area • Advantages of the PDR technique • [Banerjee et al. 2006] KLFM with support for PDR • Advantages of the ACO method
Experimental Results • [9] corresponds to [Pilato et al. 2010]: ACO without support for PDR • Great advantages in introducing PDR • [10] corresponds to [Banerjee et al. 2006]: KLFM with support for PDR • ACO performs better in terms of quality of the solutions • Better exploration of the design space • Much more scalable • KLFM gets stuck when approaching larger benchmarks
Conclusions and Future Work • Ant Colony Optimization is very attractive for generating solutions when designing heterogeneous MPSoCs • Handling of design constraints is very simple and efficient • Constructive approach that limits unfeasible solutions • Support for different architectural templates can be easily provided • Results show that it is able to outperform most of the existing search methods • More robust and scalable • Future work: • Closer integration with estimation methods and/or high-level synthesis for creating the implementations