High Level Synthesis CSE 237D: Spring 2008 Topic #6 Professor Ryan Kastner
Ant System Optimization: Overview • Ants work cooperatively on the graph • Each creates a feasible solution • Ants leave pheromones on their trails • Ants make decisions based partially on the amount of pheromone • Global Optimizations • Evaporation: Pheromones dissipate over time • Reinforcement: Update pheromones from good solutions • Quickly converges to good solutions
Solving Design Problems using AS • Problem model • Define the solution space: create decision variables • Pheromone model • Global heuristic: Provides history of search space traversal • Ant search strategy • Local heuristic: Deterministic strategy for individual ant decision making • Solution construction • Probabilistically derive solution from local and global heuristics • Feedback • Evaluate solution quality, Reinforce good solutions (pheromones), Slightly evaporate all decisions (weakens poor solutions)
Max-Min Ant System (MMAS) Scheduling • Problem: Some pheromones can overpower others, leading to local minima (premature convergence) • Solution: Bound the strength of the pheromones • If $\tau_{min} > 0$, there is always a chance to make any decision • If $\tau_{min} = \tau_{max}$, the decision is based solely on local heuristics, i.e. no past information is taken into account
MMAS RCS Formulation • Idea: Combine ACO and List Scheduling • Ants determine the priority list • List scheduling framework evaluates the “goodness” of the list • Global heuristic – permutation index • Local heuristic – can use different properties • Instruction mobility (IM) • Instruction depth (ID) • Latency weighted instruction depth (LWID) • Successor number (SN)
RCS: List Scheduling • A simple scheduling algorithm based on greedy strategies • List scheduling algorithm: • Construct a priority list based on some metric (operation mobility, number of successors, etc.) • While not all operations are scheduled • For each available resource, select an operation from the ready list in descending priority order • Assign these operations to the current clock cycle • Update the ready list • Clock cycle ++ • Solution quality depends on the benchmark and the particular metric
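The loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: it assumes unit-latency operations, a single resource type, and an acyclic dependency graph, and the names `list_schedule`, `latency`, and `preds` are hypothetical.

```python
from typing import Dict, List


def list_schedule(preds: Dict[str, List[str]],
                  priority: List[str],
                  num_resources: int) -> Dict[str, int]:
    """Greedy list scheduling for unit-latency ops on identical resources.

    preds maps an operation to its predecessors (assumed acyclic).
    priority lists every operation from highest to lowest priority.
    Returns a map from operation to its clock cycle (starting at 1).
    """
    if num_resources < 1:
        raise ValueError("need at least one resource")
    start: Dict[str, int] = {}
    cycle = 1
    while len(start) < len(priority):
        # Ready list: unscheduled ops whose predecessors all finished
        # in an earlier cycle. Iterating over `priority` keeps it sorted.
        ready = [op for op in priority
                 if op not in start
                 and all(p in start and start[p] < cycle
                         for p in preds.get(op, []))]
        # One operation per available resource, highest priority first.
        for op in ready[:num_resources]:
            start[op] = cycle
        cycle += 1
    return start


def latency(schedule: Dict[str, int]) -> int:
    """Schedule length in clock cycles."""
    return max(schedule.values())


# Toy data-flow graph: c needs a and b; e needs c and d.
preds = {"c": ["a", "b"], "e": ["c", "d"]}
good = list_schedule(preds, ["a", "b", "c", "d", "e"], num_resources=2)
poor = list_schedule(preds, ["d", "a", "b", "c", "e"], num_resources=2)
print(latency(good), latency(poor))
```

On this toy graph the two priority lists give different latencies, which is the reason the quality of list scheduling depends on how the list is built.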
MMAS RCS: Global and Local Heuristics • Global heuristic: pheromones $\tau_{ij}$, the favorableness of selecting operation i to position j • Global pheromone matrix • Local heuristic: local metrics $\eta_j$, e.g. instruction mobility, number of successors, etc. • Local decision making: a probabilistic decision • Evaporate pheromone and reinforce good solutions
Pheromone Model for Instruction Scheduling • Each instruction $op_i \in I$ is associated with n pheromone trails $\tau_{ij}$, where j = 1, …, n • Each $\tau_{ij}$ indicates the favorableness of assigning instruction i to position j • Each instruction also has a dynamic local heuristic [Figure: instructions op1–op6 mapped to priority list positions 1–6]
Ant Search Strategy • Each run has multiple iterations • In each iteration, multiple ants independently create their own priority lists • Fill one instruction at a time [Figure: instructions op1–op6 and example priority lists over positions 1–6]
Ant Search Strategy • Each ant has memory of the instructions already selected • At step j the ant has already selected j-1 instructions • The jth instruction is selected probabilistically [Figure: instructions op1–op6 and a partially filled priority list, positions 1–6]
Ant Search Strategy • $\tau_{ij}(k)$: global heuristic (pheromone) for selecting instruction i at position j • $\eta_j(k)$: local heuristic – can use different properties • Instruction mobility (IM) • Instruction depth (ID) • Latency weighted instruction depth (LWID) • Successor number (SN) • $\alpha$, $\beta$ control the influence of the global and local heuristics
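The selection formula itself did not survive on this slide. The sketch below assumes the usual ACO random-proportional rule, where the chance of placing an instruction at a position is proportional to its pheromone raised to `alpha` times its local heuristic raised to `beta`. The function names are hypothetical, and the local heuristic is kept static here for brevity even though the slides describe it as dynamic.

```python
import random
from typing import Dict, List, Optional


def selection_probabilities(candidates: List[str], position: int,
                            tau: Dict[str, List[float]],
                            eta: Dict[str, float],
                            alpha: float = 1.0,
                            beta: float = 1.0) -> List[float]:
    """Probability of each candidate for `position` (assumed ACO rule).

    tau[op][position] is the pheromone for putting op at that position;
    eta[op] is a positive local heuristic value (e.g. successor number).
    """
    weights = [(tau[op][position] ** alpha) * (eta[op] ** beta)
               for op in candidates]
    total = sum(weights)
    return [w / total for w in weights]


def build_priority_list(ops: List[str],
                        tau: Dict[str, List[float]],
                        eta: Dict[str, float],
                        alpha: float = 1.0, beta: float = 1.0,
                        rng: Optional[random.Random] = None) -> List[str]:
    """One ant fills positions 0..n-1, one instruction at a time."""
    rng = rng or random.Random()
    remaining = list(ops)      # the ant's memory of what is still unplaced
    order: List[str] = []
    for position in range(len(ops)):
        probs = selection_probabilities(remaining, position,
                                        tau, eta, alpha, beta)
        chosen = rng.choices(remaining, weights=probs, k=1)[0]
        order.append(chosen)
        remaining.remove(chosen)
    return order


tau = {"a": [1.0, 1.0], "b": [1.0, 1.0]}
eta = {"a": 1.0, "b": 3.0}
probs = selection_probabilities(["a", "b"], 0, tau, eta)
ant_list = build_priority_list(["a", "b"], tau, eta, rng=random.Random(0))
```

With uniform pheromones the decision is driven entirely by the local heuristic, so `b` is three times as likely as `a` to take the first position.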
Pheromone Update • Lists constructed are evaluated with List Scheduling • Latency $L_h$ for the result from ant h • Evaporation – prevents stagnation and punishes “useless” trails • Reinforcement – rewards trails with better quality
Pheromone Update • Evaporation happens on all trails to avoid stagnation • Reward the used trails based on the solution’s quality [Figure: instructions op1–op6 and example priority lists over positions 1–6]
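The update equations are missing from the extracted slides. A minimal sketch of the two steps, under stated assumptions: every trail is multiplied by a persistence factor `rho` (evaporation), and each ant h adds `q / L_h` to the trails it used, so shorter latencies earn larger rewards. The names `update_pheromones`, `rho`, and `q` are illustrative, not taken from the source.

```python
from typing import Dict, List


def update_pheromones(tau: Dict[str, List[float]],
                      ant_lists: List[List[str]],
                      latencies: List[float],
                      rho: float = 0.98,
                      q: float = 1.0) -> Dict[str, List[float]]:
    """Evaporate all trails, then reinforce the ones the ants used.

    tau[op][pos] is the pheromone for placing op at list position pos.
    ant_lists[h] is the priority list built by ant h and latencies[h]
    is the schedule latency that list scheduling produced for it.
    """
    # Evaporation: applies to every trail, used or not.
    for op in tau:
        tau[op] = [rho * value for value in tau[op]]
    # Reinforcement: reward (op, position) pairs in proportion to quality.
    for order, latency in zip(ant_lists, latencies):
        for pos, op in enumerate(order):
            tau[op][pos] += q / latency
    return tau


tau = {"a": [1.0, 1.0], "b": [1.0, 1.0]}
update_pheromones(tau, [["a", "b"], ["b", "a"]], [2.0, 4.0], rho=0.5)
```

After the call, the trails used by the latency-2 ant are stronger than those used by the latency-4 ant, which biases later ants toward the better list.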
Max-Min Ant System (MMAS) • Risks of Ant System optimization • Positive feedback • Dynamic range of pheromone trails can increase rapidly • Unused trails can be repeatedly punished, which reduces their likelihood even more • Premature convergence • MMAS is designed to address this problem • Built upon the original AS • Idea is to limit the pheromone trails within an evolving bound so that broader exploration is possible • Better balance between exploration and exploitation • Prevent premature convergence
Max-Min Ant System (MMAS) • Limit $\tau(t)$ within $\tau_{min}(t)$ and $\tau_{max}(t)$ • $S^{gb}$ is the best global solution found so far at t-1 • f(·) is the quality evaluation function, i.e. latency in our case • avg is the average size of decision choices • $P_{best} \in (0,1]$ is the controlling parameter • Conditional probability of $S^{gb}$ being selected when all trails in $S^{gb}$ have $\tau_{max}$ and all others have $\tau_{min}$ • Smaller $P_{best}$ ⇒ tighter range for more emphasis on exploration • When $P_{best} \to 0$, we set $\tau_{min} \to \tau_{max}$
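The bound formulas were images and are not in the extracted text. The sketch below assumes the commonly cited MMAS bounds, with `rho` as the pheromone persistence factor, `n` as the number of decisions in a solution, and `avg` as the average number of choices per decision; treat the exact expressions as an assumption rather than the slide's content. The names `mmas_bounds` and `clamp_trails` are hypothetical.

```python
from typing import Dict, List, Tuple


def mmas_bounds(best_quality: float, rho: float, p_best: float,
                n: int, avg: float) -> Tuple[float, float]:
    """Return (tau_min, tau_max) for the current iteration.

    best_quality is f(S_gb), e.g. the best latency found so far.
    Assumed formulas:
      tau_max = 1 / ((1 - rho) * f(S_gb))
      tau_min = tau_max * (1 - p_best**(1/n)) / ((avg - 1) * p_best**(1/n))
    """
    if not 0.0 < p_best <= 1.0:
        raise ValueError("p_best must be in (0, 1]")
    if avg <= 1.0:
        raise ValueError("avg must exceed 1")
    tau_max = 1.0 / ((1.0 - rho) * best_quality)
    root = p_best ** (1.0 / n)
    tau_min = tau_max * (1.0 - root) / ((avg - 1.0) * root)
    # For very small p_best the formula can exceed tau_max; cap it there.
    return min(tau_min, tau_max), tau_max


def clamp_trails(tau: Dict[str, List[float]],
                 tau_min: float, tau_max: float) -> Dict[str, List[float]]:
    """Force every trail into [tau_min, tau_max]."""
    for op in tau:
        tau[op] = [min(tau_max, max(tau_min, v)) for v in tau[op]]
    return tau


lo_wide, hi_wide = mmas_bounds(10.0, rho=0.9, p_best=1.0, n=5, avg=3.0)
lo_mid, hi_mid = mmas_bounds(10.0, rho=0.9, p_best=0.5, n=5, avg=3.0)
lo_tight, hi_tight = mmas_bounds(10.0, rho=0.9, p_best=1e-9, n=5, avg=3.0)
trails = clamp_trails({"a": [0.0, 5.0]}, lo_mid, hi_mid)
```

The three calls show the effect of the controlling parameter: `p_best = 1` leaves the lower bound at zero, an intermediate value gives a band strictly inside `(0, tau_max)`, and a tiny value collapses the band so that only local heuristics matter.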
Other Algorithmic Refinements • Dynamically evolving local heuristics • Example: dynamically adjust Instruction Mobility • Benefit: reduce search space progressively • Taking advantage of topological sorting of the DFG when constructing the priority list • At each step, ants select from the ready instructions instead of from all unscheduled instructions • Benefit: greatly reduce the search space
Benchmarks: ExpressDFG • A comprehensive benchmark for TCS/RCS • Classic samples and more modern cases • Comprehensive coverage • Problem sizes • Complexities • Applications • Downloadable from http://express.ece.ucsb.edu/benchmark/
RCS Experimental Results • Heterogeneous RCS – multiple types of resources (e.g. fast and normal multiplier) • ILP (optimal) using CPLEX • List scheduling • Instruction mobility (IM), instruction depth (ID), latency weighted instruction depth (LWID), successor number (SN) • Ant scheduling results using different local heuristics (averaged over 5 runs, each run 100 iterations with 5 ants)
RCS Experimental Results • Homogeneous RCS – all resources have unit delay • New benchmarks (compared to the previous slide) are too large for ILP
MMAS RCS: Results • Consistently generates better results across all test cases • Up to 23.8% better than list scheduler • Average 6.4%, and up to 15% better than force-directed scheduling • Quantitatively closer to known optimal solutions
MMAS TCS Formulation • Idea: Combine ACO and Force Directed Scheduling (FDS) • Quick FDS review • Uniformly distribute the operations onto the available resources • Operation probability • Distribution graph (DG) • Self force: change in the DG caused by scheduling an operation • Predecessor/successor force: implicit effects on the DG • Schedule an operation to the step with the minimum force
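The FDS quantities in this review can be made concrete with a small sketch. It assumes the textbook definitions: an operation is equally likely to land on any step of its ASAP to ALAP time frame, the distribution graph sums those probabilities per step, and the self force of a tentative assignment is the DG-weighted change in the operation's probabilities. Predecessor and successor forces are left out, all operations are assumed to share one resource type, and the function names are hypothetical.

```python
from typing import Dict, Tuple

Frames = Dict[str, Tuple[int, int]]  # op -> (ASAP step, ALAP step)


def op_probability(asap: int, alap: int, step: int) -> float:
    """Uniform probability of an op occupying `step` within its frame."""
    if asap <= step <= alap:
        return 1.0 / (alap - asap + 1)
    return 0.0


def distribution_graph(frames: Frames, num_steps: int) -> Dict[int, float]:
    """Expected resource usage per time step (steps are 1-based)."""
    return {s: sum(op_probability(a, l, s) for a, l in frames.values())
            for s in range(1, num_steps + 1)}


def self_force(op: str, step: int, frames: Frames,
               dg: Dict[int, float]) -> float:
    """Force of fixing `op` at `step`; lower means better balance."""
    asap, alap = frames[op]
    force = 0.0
    for s in range(asap, alap + 1):
        new_prob = 1.0 if s == step else 0.0
        force += dg[s] * (new_prob - op_probability(asap, alap, s))
    return force


frames = {"a": (1, 1), "b": (1, 2), "c": (1, 3)}
dg = distribution_graph(frames, num_steps=3)
force_b1 = self_force("b", 1, frames, dg)
force_b2 = self_force("b", 2, frames, dg)
```

Step 1 is already crowded in this example, so placing `b` there yields a positive force while step 2 yields a negative one, and FDS would pick step 2.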
ACO Formulation for TCS • Initialize pheromone model • While (termination not satisfied) • Create ants • Each ant finds a solution • Evaluate solutions and update pheromone • Report the best result found • Pheromone trail $\tau_{ij}$ indicates the favorableness of assigning instruction i to position j [Figure: example data-flow graph with operations v1–v11 between source S and sink E, and its schedule over time steps 1–4]
ACO Formulation for TCS • Initialize pheromone model • While (termination not satisfied) • Create ants • Each ant finds a solution • Evaluate solutions and update pheromone • Report the best result found • Select operation $op_h$ probabilistically • Select its time step as follows: • Global heuristic: tied to the search experience • Local heuristic: use the inverse of the distribution graph, $1/q_k(j)$ • Here $\alpha$ and $\beta$ are constants
ACO Formulation for TCS • Initialize pheromone model • While (termination not satisfied) • Create ants • Each ant finds a solution • Evaluate solutions and update pheromone • Report the best result found • Pheromone evaporation • Rewarding good partial solutions based on solution quality
MMAS TCS: Results • MMAS TCS is more stable than FDS, especially when the solution is highly unconstrained • 258 out of 263 test cases are equal to or better than FDS results • 16.4% fewer resources
Design Space Exploration • DSE challenges to the designer • Ever increasing design options • Closely related to NP-hard problems • Resource allocation • Scheduling • Conflicting objectives (speed, cost, power, …) • Increasing time-to-market pressure
Our Focus: Timing/Cost • Timing/Cost Tradeoffs • Known application • Known resource types • Known operation/resource mapping • Question: find the optimal timing/cost tradeoffs • Most commonly faced problem • Fundamental to other design considerations
Common Strategies • Usually done in an ad-hoc way • Experience dependent • Or scanning the design space with Resource Constrained (RCS) or Time Constrained (TCS) scheduling • What’s the problem? • RCS and TCS are dual problems • Can we effectively use information from one to guide the other?
Key Observations • A feasible configuration C covers a beam starting from (tmin, C) • tmin is the RCS result for C
Key Observations • A feasible configuration C covers a beam starting from (tmin, C) • Optimal tradeoff curve L is monotonically non-increasing as deadline increases
Theorem • If C is the optimal TCS result at time t1, then the RCS result t2 of C satisfies t2 <= t1. • More importantly, there is no configuration C′ with a smaller cost that can produce an execution time within [t2, t1].
What does it give us? • It implies that we can construct L: • Start from the rightmost t • Find the TCS solution C • Push it leftwards using the RCS solution of C • Do this iteratively (switch between TCS + RCS)
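The iteration can be sketched as a loop that alternates the two schedulers. This is an illustration under stated assumptions, not the authors' code: `tcs` and `rcs` stand in for any time-constrained and resource-constrained scheduler, and the toy problem (ten independent unit-latency operations, where the configuration is just a count of identical units and that count is the cost) is invented for the example.

```python
import math
from typing import Callable, List, Tuple


def tradeoff_curve(tcs: Callable[[int], int],
                   rcs: Callable[[int], int],
                   t_max: int, t_min: int = 1) -> List[Tuple[int, int]]:
    """Build the time/cost curve by switching between TCS and RCS.

    tcs(deadline) returns the cheapest configuration meeting the deadline.
    rcs(config) returns the shortest latency that configuration achieves.
    Each recorded point (t, config) covers every deadline from t upward
    until the previous point.
    """
    curve: List[Tuple[int, int]] = []
    t = t_max                        # start from the rightmost deadline
    while t >= t_min:
        config = tcs(t)              # cheapest configuration for deadline t
        t_fast = rcs(config)         # push it leftwards
        curve.append((t_fast, config))
        # Skip deadlines already covered; min() guards against a
        # heuristic RCS that returns a latency above the deadline.
        t = min(t_fast, t) - 1
    return curve


NUM_OPS = 10                         # toy workload, hypothetical
calls = {"tcs": 0}


def toy_tcs(deadline: int) -> int:
    calls["tcs"] += 1
    return math.ceil(NUM_OPS / deadline)


def toy_rcs(units: int) -> int:
    return math.ceil(NUM_OPS / units)


curve = tradeoff_curve(toy_tcs, toy_rcs, t_max=10)
print(curve, calls["tcs"])
```

On this toy problem the loop needs six TCS runs to cover ten deadlines, because each RCS step skips every deadline that the current configuration already dominates.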
Experiments • Three DSE approaches • FDS: Exhaustively scanning for TCS • MMAS-TCS: Exhaustively scanning for TCS • MMAS-D: Proposed method leveraging duality * Scanning means that we perform TCS on each deadline of interest
Real Design Complications • Heterogeneous mapping • One operation has many implementations • Different bit-width, e.g. 32-bit multiplier good for mul(24) and mul(32) • Different area and delay • Real technology library extremely sophisticated • Hard to estimate final timing and total area • Sharing depends on the cost of multiplexers • Downstream tools may not generate what we expect • Resource sharing, register sharing • Downstream tools break components’ boundaries • Logic synthesis, placement and routing