Bastian Knerr June 6th, 2008. HW/SW Co-design System Partitioning in HW/SW Co-Design. Christian Doppler Laboratory for Design Methodology of Signal Processing Algorithms. Outline. HW/SW Codesign for Embedded Systems System Partitioning Heterogeneous Platforms
Bastian Knerr, June 6th, 2008. HW/SW Co-design: System Partitioning in HW/SW Co-Design. Christian Doppler Laboratory for Design Methodology of Signal Processing Algorithms.
Outline • HW/SW Codesign for Embedded Systems • System Partitioning • Heterogeneous Platforms • Mapping Graphs to Platforms • Heuristic Optimisation Methods for Multiple Objectives • Summary
Embedded System Design • An embedded system is a computing device that is in general dedicated to a specific purpose. Its implementation is predominantly determined by this purpose, which usually entails complete encapsulation in the environment in which that purpose arises. • Examples: Automotive • Phones/PDAs • Transceivers (WiFi, WLAN, xDSL, ...)
Outline • Embedded System Design • System Partitioning • Heterogeneous Platforms • Mapping Graphs to Platforms • Heuristic Optimisation for Multiple Objectives • Summary
Heterogeneous Platforms • Classical HW/SW codesign platform • Has been around for ~20 years • Served well to get a first grip on partitioning • Has not gained any relevance for industrial design flows
Heterogeneous Platforms • Modern rapid prototyping platforms • Prototyping board for real-time MIMO OFDM • DSP+Microcontroller • FPGAs • Busses and Bridges • RAM and Registers • Interfaces
Heterogeneous Platforms • Modern SoC/embedded platforms • UMTS baseband transceiver chip (2003) • DSP+Microcontroller • ASICs • Busses and Bridges • RAM and Registers • Interfaces
Heterogeneous Platforms • Library for • DSPs: Cache/RAM, Schedules • FPGAs: RAM/Flash, Slices/Gates • ASICs: Registers/Gates • Channels: FIFO/Direct/Bus • Memory: Schedules, Parallel read/write access
Mapping Graphs to Platforms • System Graphs
Mapping Graphs to Platforms • NP-hard multi-objective optimisation problem • Proven to be NP-complete by restriction to the classical graph partitioning problem
Heuristic Optimisation • Multi-objective optimisation problem • A mapping of a problem instance I is called valid iff […], with […] being objective functions and […] being constraints. • […]: the mapping relation of a vertex i to the jth implementation alternative A on resource r. • Objective functions: • Area for 'HW' in gates/slices/NAND2 equivalents ([…]): […], with […] for ASICs, […] for FPGAs • Code size for 'SW' in bytes ([…]): […], with […] for code size on DSPs. • ...
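As a rough illustration of the terms on this slide, the Python sketch below evaluates an area objective and a code-size objective for a mapping and checks the result against upper bounds. The validity rule used here (every objective stays at or below its bound), the `LIBRARY` layout, and all names and numbers are assumptions made for illustration only.

```python
from typing import Dict, Tuple

# Hypothetical implementation library: (vertex, resource) -> cost figures.
# "area" is in NAND2-equivalent gates, "code" is in bytes.
LIBRARY: Dict[Tuple[str, str], Dict[str, int]] = {
    ("fir", "ASIC"): {"area": 1200, "code": 0},
    ("fir", "DSP"): {"area": 0, "code": 300},
    ("fft", "FPGA"): {"area": 5000, "code": 0},
    ("fft", "DSP"): {"area": 0, "code": 900},
}


def objectives(mapping: Dict[str, str]) -> Dict[str, int]:
    """Sum area and code size over all vertices of the mapping."""
    area = sum(LIBRARY[(v, r)]["area"] for v, r in mapping.items())
    code = sum(LIBRARY[(v, r)]["code"] for v, r in mapping.items())
    return {"area": area, "code": code}


def is_valid(mapping: Dict[str, str], limits: Dict[str, int]) -> bool:
    """Assumed rule: a mapping is valid if no objective exceeds its limit."""
    values = objectives(mapping)
    return all(values[name] <= bound for name, bound in limits.items())


mapping = {"fir": "ASIC", "fft": "DSP"}
limits = {"area": 2000, "code": 1000}
print(objectives(mapping), is_valid(mapping, limits))
```

Moving `fft` to the FPGA in this toy library raises the area to 6200 gates, which violates the assumed 2000-gate limit.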
Heuristic Optimisation • Objective function f_T: system delay (makespan) • Multi-core scheduling is NP-hard as well
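Because finding the optimal schedule is hard, the makespan of a given mapping is usually estimated with a fixed scheduling rule. The sketch below is one such simplification, not the method from the talk: tasks are processed in a given topological order, each resource runs one task at a time, and communication delays are ignored. All names and figures are hypothetical.

```python
from typing import Dict, List


def makespan(durations: Dict[str, int],
             preds: Dict[str, List[str]],
             mapping: Dict[str, str],
             order: List[str]) -> int:
    """Makespan of a list schedule; `order` must be a topological order."""
    finish: Dict[str, int] = {}
    resource_free: Dict[str, int] = {}
    for task in order:
        res = mapping[task]
        # A task may start once all predecessors are done and its resource is idle.
        ready = max((finish[p] for p in preds.get(task, [])), default=0)
        start = max(ready, resource_free.get(res, 0))
        finish[task] = start + durations[task]
        resource_free[res] = finish[task]
    return max(finish.values(), default=0)


durations = {"a": 2, "b": 3, "c": 1, "d": 2}
preds = {"c": ["a", "b"], "d": ["c"]}
order = ["a", "b", "c", "d"]

two_cores = {"a": "cpu0", "b": "cpu1", "c": "cpu0", "d": "cpu1"}
one_core = {t: "cpu0" for t in order}
print(makespan(durations, preds, two_cores, order))  # 6
print(makespan(durations, preds, one_core, order))   # 8
```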
Heuristic Optimisation • Definition: A heuristic is a robust technique for the design of (randomised) algorithms for optimisation problems. It provides (randomised) algorithms for which one cannot guarantee both the efficiency and the quality of the computed feasible solutions at once, not even with any bounded constant probability P > 0.
Heuristic Optimisation • Partitioning analytically not solvable • Use heuristic methods • Simulated Annealing • Tabu Search • Kernighan-Lin min-cut • Genetic Algorithm • Particle Swarm • Custom Heuristics (GCLP, RRES, etc.) • ...
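To make one of the listed methods concrete, here is a minimal simulated annealing sketch for a partitioning-style problem. The cooling schedule, the single-vertex move, the parameter defaults, and the toy cost function (cut edges plus load imbalance on a six-vertex chain) are all illustrative assumptions rather than settings from the talk.

```python
import math
import random


def simulated_annealing(n_tasks, n_resources, cost, steps=2000,
                        t0=10.0, alpha=0.995, seed=0):
    """Return (best_mapping, best_cost); a mapping is a list of resource ids."""
    rng = random.Random(seed)
    current = [rng.randrange(n_resources) for _ in range(n_tasks)]
    current_cost = cost(current)
    best, best_cost = current[:], current_cost
    temp = t0
    for _ in range(steps):
        # Neighbour move: reassign one randomly chosen task.
        candidate = current[:]
        candidate[rng.randrange(n_tasks)] = rng.randrange(n_resources)
        cand_cost = cost(candidate)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        temp *= alpha  # geometric cooling
    return best, best_cost


EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]


def toy_cost(mapping):
    """Hypothetical cost: number of cut edges plus load imbalance."""
    cut = sum(1 for u, v in EDGES if mapping[u] != mapping[v])
    imbalance = abs(mapping.count(0) - mapping.count(1))
    return cut + imbalance


best, best_cost = simulated_annealing(6, 2, toy_cost)
print(best, best_cost)
```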
Heuristic Optimisation • Classical Kernighan-Lin min-cut • Modifications • More than two partitions • Unbalanced partitions allowed • Multiple objectives • Omit change list • ...
Summary • Scheduling/Partitioning is a hard optimisation problem • Heuristic methods have to be applied • Highly dependent on platform model and high level estimation techniques • Many questions yet unsolved • Execution time profiles for processes (control flow) • Estimation uncertainties • Automated platform composition • ...
Thank you for your attention
Typical Graphs • Industry design for xDSL transceiver
Graph Properties • Degree of parallelism: γ = |V_CP| / |V| • Density: ρ = |E| / |V| • Rank-locality: r_loc = (1 / |E|) Σ_{e ∈ E} (rank(v_head) − rank(v_tail))
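The three properties can be computed directly from an edge list. The sketch below follows the formulas as written on the slide and rests on two assumptions: rank(v) is the length of the longest path from any source to v, and V_CP is the vertex set of one longest path. It also assumes the vertices are given in topological order and that the graph has at least one edge.

```python
def graph_properties(vertices, edges):
    """vertices: topologically ordered list; edges: list of (tail, head)."""
    rank = {v: 0 for v in vertices}
    succs = {v: [] for v in vertices}
    for tail, head in edges:
        succs[tail].append(head)
    # Topological order guarantees rank[v] is final when v is visited.
    for v in vertices:
        for w in succs[v]:
            rank[w] = max(rank[w], rank[v] + 1)
    critical_path_len = max(rank.values()) + 1  # vertices on a longest path
    gamma = critical_path_len / len(vertices)
    rho = len(edges) / len(vertices)
    r_loc = sum(rank[h] - rank[t] for t, h in edges) / len(edges)
    return gamma, rho, r_loc


vertices = ["a", "b", "c", "d"]
edges = [("a", "c"), ("b", "c"), ("c", "d"), ("a", "d")]
print(graph_properties(vertices, edges))  # (0.75, 1.0, 1.25)
```

In this example the edge from `a` to `d` skips one rank, which is what pushes the rank-locality above 1.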
Restricted Range Exhaustive Search • Create task graph • Create ordered vector of processes • Create initial mapping • Start exhaustive search on subset of processes (window) • Move window along the vector • Finally map process that leaves the window • Strong performance for typical graphs • Degree of parallelism • Density • Locality
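The window steps listed above can be sketched as follows. This is a minimal reading of the slide, not the published algorithm: the initial mapping simply puts every process on the first resource, the window moves by one position per step, and the cost function (per-process cost plus a penalty for neighbouring processes on different resources) and all figures are hypothetical.

```python
from itertools import product


def rres(order, resources, cost, window=3):
    """Sliding-window exhaustive search over an ordered process vector."""
    mapping = {p: resources[0] for p in order}  # assumed initial mapping
    last_start = max(0, len(order) - window)
    for start in range(last_start + 1):
        win = order[start:start + window]
        best_choice, best_cost = None, float("inf")
        # Try every assignment of the processes inside the window.
        for choice in product(resources, repeat=len(win)):
            trial = dict(mapping)
            trial.update(zip(win, choice))
            trial_cost = cost(trial)
            if trial_cost < best_cost:
                best_choice, best_cost = choice, trial_cost
        mapping.update(zip(win, best_choice))
        # When the window moves on, order[start] leaves it and stays fixed.
    return mapping


ORDER = ["p0", "p1", "p2", "p3", "p4"]
HW_COST = {"p0": 4, "p1": 1, "p2": 5, "p3": 2, "p4": 6}
SW_COST = {"p0": 2, "p1": 3, "p2": 1, "p3": 4, "p4": 2}


def toy_cost(mapping):
    """Hypothetical cost: per-process cost plus 3 per HW/SW boundary."""
    own = sum(HW_COST[p] if mapping[p] == "HW" else SW_COST[p] for p in ORDER)
    comm = sum(3 for a, b in zip(ORDER, ORDER[1:]) if mapping[a] != mapping[b])
    return own + comm


result = rres(ORDER, ["HW", "SW"], toy_cost, window=2)
print(result, toy_cost(result))
```

Each window search includes the current assignment, so the cost never increases from one step to the next. A window as long as the vector degenerates into a full exhaustive search.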
Results • [Plots: averaged cost, averaged validity, normalised relative cost; axis: window length] • κ = f(parallelism, locality)
The Genome Coding • Arrange vertices on a string • String elements (alleles) indicate implementation alternative • What about the order of the vertices? Does it matter?
Recombination with chromosomes • 1-point crossover • Multi-point crossover • Uniform crossover • Why does it work? • Fundamental schema theorem and the building block hypothesis • Schema theorem • Short, low-order, above-average schemata (building blocks) proliferate • Below-average schemata die off • What makes schemata fit in system partitioning?
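The three crossover operators named above can be sketched on list-encoded chromosomes, where position i holds the implementation alternative of vertex i. The function names are illustrative, the parents are assumed to have equal length of at least 2, and `k` must not exceed the number of interior cut positions.

```python
import random


def one_point(a, b, rng):
    """Swap the tails of both parents behind one random cut."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def multi_point(a, b, k, rng):
    """Alternate the source parent at k distinct random cuts."""
    cuts = sorted(rng.sample(range(1, len(a)), k))
    c1, c2, swapped, prev = [], [], False, 0
    for cut in cuts + [len(a)]:
        src1, src2 = (b, a) if swapped else (a, b)
        c1 += src1[prev:cut]
        c2 += src2[prev:cut]
        swapped, prev = not swapped, cut
    return c1, c2


def uniform(a, b, rng):
    """Decide for every gene independently which child inherits from which parent."""
    c1, c2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        c1.append(x)
        c2.append(y)
    return c1, c2


rng = random.Random(1)
parent_a = ["HW"] * 8
parent_b = ["SW"] * 8
print(one_point(parent_a, parent_b, rng))
print(multi_point(parent_a, parent_b, 3, rng))
print(uniform(parent_a, parent_b, rng))
```

One-point crossover keeps long runs of neighbouring genes together, while uniform crossover breaks them up. This is why the ordering of vertices on the chromosome interacts with the choice of operator.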
Combinatorial vs. structural fitness • Combinatorial (area, code size, time) • Low resource consumption is ensured for any single vertex • Combination of assignments utilises resources optimally • Structural (time) • Exact graph matching between task and architecture subgraphs • Parallel execution of processes and data transfers • Structural fitness requires a representation in the chromosome • Building blocks are short, low-order, and fit schemata
Coding for structural exploitation • Locality-preserving chromosome coding • Adjacent vertices in the task graph shall be adjacent in the chromosome • Use two schedules • As soon as possible (ASAP) • As late as possible (ALAP) • Arrange vertices v_i in order of increasing average start time: st_avg(v_i) = st_asap(v_i) + st_alap(v_i)
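A small sketch of this ordering, assuming unlimited resources for both schedules, a vertex list given in topological order, and known execution times. It sorts by the sum of the ASAP and ALAP start times as written on the slide. Halving the sum to obtain a true average would yield the same order.

```python
def locality_order(vertices, edges, duration):
    """vertices: topologically ordered list; edges: list of (tail, head)."""
    preds = {v: [] for v in vertices}
    succs = {v: [] for v in vertices}
    for tail, head in edges:
        preds[head].append(tail)
        succs[tail].append(head)

    # ASAP: start as soon as all predecessors have finished.
    asap = {}
    for v in vertices:
        asap[v] = max((asap[p] + duration[p] for p in preds[v]), default=0)
    total = max(asap[v] + duration[v] for v in vertices)

    # ALAP: start as late as possible without extending the total length.
    alap = {}
    for v in reversed(vertices):
        latest_finish = min((alap[s] for s in succs[v]), default=total)
        alap[v] = latest_finish - duration[v]

    return sorted(vertices, key=lambda v: asap[v] + alap[v])


vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "d"), ("c", "d")]
duration = {"a": 1, "b": 1, "c": 1, "d": 1}
print(locality_order(vertices, edges, duration))  # ['a', 'c', 'b', 'd']
```

Vertex `c` has slack (ASAP start 0, ALAP start 1), so it lands between `a` and `b` in the chromosome, next to the region of the graph where it can actually execute.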
Results • Impact of genome coding • [Plot: cost; legend: new, rank, random]
More results • Structural mutation • 1-gene mutation (M1g) • Swap mutation (Msw) • Multi-swap mutation (Mbb)
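For illustration, the three mutation operators might look as follows on a list-encoded chromosome. The slide does not define them, so the readings here are assumptions: 1-gene mutation changes one gene to a different alternative, swap mutation exchanges two positions, and multi-swap mutation is taken to be several swaps in a row.

```python
import random


def one_gene(chrom, alternatives, rng):
    """Replace one randomly chosen gene by a different alternative."""
    child = chrom[:]
    i = rng.randrange(len(child))
    child[i] = rng.choice([a for a in alternatives if a != child[i]])
    return child


def swap(chrom, rng):
    """Exchange the genes at two distinct random positions."""
    child = chrom[:]
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def multi_swap(chrom, n_swaps, rng):
    """Assumed reading of multi-swap: apply several swaps in sequence."""
    child = chrom[:]
    for _ in range(n_swaps):
        child = swap(child, rng)
    return child


rng = random.Random(2)
chrom = [0, 1, 2, 0, 1]
print(one_gene(chrom, [0, 1, 2], rng))
print(swap(chrom, rng))
print(multi_swap(chrom, 3, rng))
```

Swap-based operators preserve how often each alternative occurs and only rearrange where it occurs, whereas 1-gene mutation changes the resource usage itself.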
More results • Comparison with other heuristics • Penalty reward tabu search (pwTS) • Simulated annealing (SA) • Global criticality/local phase (GCLP) • [Plots: averaged cost Ω, averaged validity Ψ]
Conclusion • 3-operator GA has been implemented and analysed • Structural problem components (time) have been exposed • Genome coding: locality-preserving ordering • Mutation: multi-swap mutation • Crossover: depends heavily on building block size • Comparison with heuristics from literature showed superior performance of GA over pwTS • In contrast to published work
Results • Related to crossover recombination • Uniform • 10-point • 5-point • 1-point • [Plot legend: new, random]
More results • Selection over mutation probability • Binary tournament (BT) • Survival of the fittest (SOTF) • Roulette wheel (RW)
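The three selection schemes can be sketched for a cost-minimising GA as below. Two choices are assumptions for this example: the roulette wheel uses the weight 1 / (1 + cost), which requires non-negative costs, and survival of the fittest is read as keeping the n cheapest individuals.

```python
import random


def binary_tournament(pop, cost, rng):
    """Draw two distinct individuals and keep the cheaper one."""
    a, b = rng.sample(pop, 2)
    return a if cost(a) <= cost(b) else b


def roulette_wheel(pop, cost, rng):
    """Draw one individual with probability proportional to 1 / (1 + cost)."""
    weights = [1.0 / (1.0 + cost(ind)) for ind in pop]
    return rng.choices(pop, weights=weights, k=1)[0]


def survival_of_the_fittest(pop, cost, n):
    """Keep the n individuals with the lowest cost."""
    return sorted(pop, key=cost)[:n]


rng = random.Random(3)
population = [[1, 1], [0, 0], [1, 0]]
print(binary_tournament(population, sum, rng))
print(roulette_wheel(population, sum, rng))
print(survival_of_the_fittest(population, sum, 2))  # [[0, 0], [1, 0]]
```

Tournament selection depends only on the ranking of the two contestants, whereas roulette wheel selection is sensitive to how the costs are scaled.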