ECE 506 Reconfigurable Computing Lecture 7 FPGA Placement
Placement • VLSI Design Flow • Objective: • Minimize total chip area, • Sustain routable circuit within timing budget • FPGA Flow • Area fixed • Objective: • Assign LUTs in the netlist to available logic blocks in the array within utilization and performance constraints (Interconnect) • Locate functional blocks such that the interconnect required to route the signals between them is minimized. • Target Architecture determines the cost function
Placement algorithm • two basic inputs: • netlist with functional blocks and connections between them • device map (architecture) • algorithm selects a legal location for each block such that the circuit wiring is optimized.
Significance of Placement • Good placement is extremely important • sets constraints for routability • even if the circuit does route, a poor placement will still lead to a lower maximum operating speed and increased power consumption. • Finding a good placement is challenging • A large commercial FPGA contains over 500,000 functional blocks, • giving on the order of 500,000! (500,000 factorial) possible placements. • Exhaustive evaluation is therefore impossible. • Placement is a computationally hard problem: • there is no known algorithm that produces optimal results in practical CPU time. • Development of fast and effective heuristic placement algorithms is a critical research area.
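To get a feel for why exhaustive evaluation is ruled out, the size of 500,000! can be estimated without ever building the number. The sketch below is illustrative only: it assumes every block may go in any location, and the helper name `log10_factorial` is made up for this example. Under that assumption the count runs to roughly 2.6 million decimal digits.

```python
import math

def log10_factorial(n: int) -> float:
    """Return log10(n!) via the log-gamma function, avoiding huge integers."""
    return math.lgamma(n + 1) / math.log(10)

# Number of decimal digits needed to write 500,000! (the number of ways to
# order 500,000 blocks, assuming any block may occupy any location).
digits = math.floor(log10_factorial(500_000)) + 1
print(f"500,000! has about {digits:,} decimal digits")
```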
Device Legality Constraints • All resources are prefabricated in an FPGA • leads to a variety of placement legality constraints: • A legal placement must place a functional block only in a location on the chip that can accommodate it. • RAM block must be placed in a RAM location, and a lookup table (LUT) must be placed in a LUT location. • Some groups of functional blocks must be placed in a specific relative orientation to make use of special, dedicated routing resources. • arithmetic logic cells—to use the dedicated carry-chain hardware, the logic cells forming a carry chain must be placed adjacent to each other in the sequence required by the carry structure.
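The legality rules above can be sketched as a simple checker. This is a minimal illustration, not how any particular tool does it: the data layout (`placement`, `block_type`, `site_type`, `carry_chains`) is hypothetical, and it assumes a carry chain runs upward within one column, which real devices may define differently.

```python
from typing import Dict, List, Tuple

Loc = Tuple[int, int]  # (x, y) grid coordinate

def is_legal(placement: Dict[str, Loc],
             block_type: Dict[str, str],
             site_type: Dict[Loc, str],
             carry_chains: List[List[str]]) -> bool:
    """Check type compatibility, exclusivity, and carry-chain adjacency."""
    # No two blocks may share a location.
    if len(set(placement.values())) != len(placement):
        return False
    # Each block must sit on a site of its own type (RAM on RAM, LUT on LUT).
    for blk, loc in placement.items():
        if site_type.get(loc) != block_type[blk]:
            return False
    # Assumption: consecutive carry-chain cells occupy vertically adjacent
    # sites in the same column, in chain order.
    for chain in carry_chains:
        for a, b in zip(chain, chain[1:]):
            (xa, ya), (xb, yb) = placement[a], placement[b]
            if xa != xb or yb != ya + 1:
                return False
    return True
```

A move generator that calls such a check before committing a swap never creates an illegal placement in the first place.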
FPGA Placement Constraints • FPGA interconnect is prefabricated, • Amount of interconnect in each region of a device is fixed • Routing congestion • When the interconnect demand approaches or exceeds the fabricated wiring capacity in some part of the FPGA. • A placement that requires more interconnect in a device region than that region contains cannot be routed
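The congestion condition amounts to comparing estimated demand against the fixed capacity of each region. A minimal sketch, assuming demand and capacity are given as track counts per named region (the region names and the function `overused_regions` are hypothetical):

```python
from typing import Dict, List

def overused_regions(demand: Dict[str, int], capacity: Dict[str, int]) -> List[str]:
    """Return the regions whose wiring demand exceeds the fabricated capacity."""
    return sorted(r for r, d in demand.items() if d > capacity.get(r, 0))

# A placement is unroutable if this list is non-empty.
print(overused_regions({"r0": 10, "r1": 30}, {"r0": 20, "r1": 24}))
```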
FPGA Placement Constraints • [Figure: routing segments of length 1, 2, and 4 along the X and Y directions] • Stratix-II is an island-style FPGA that contains routing segments that span 4, 16, and 24 logic blocks. • Programmable switches allow routing segments in the same direction (horizontal or vertical) to be connected at their endpoints to create longer routes. • Other programmable switches allow some horizontal routing segments to connect to vertical routing segments where they cross and vice versa.
Placement Objective– Routability Driven • Create a placement that minimizes the total interconnect required, • Increase the probability of successful routing • Consequently, some routability-driven placement algorithms minimize not only the total wiring required by the design but also the amount of routing congestion.
Placement Objective – Timing Driven • In addition to optimizing for routability, timing-driven algorithms use timing analysis • to identify critical paths and/or connections • to optimize the delay of those connections. • Most delays in an FPGA are due to the programmable interconnect • timing-driven placement can achieve a large improvement in circuit speed over routability-driven approaches.
Level of Control on Placement • Commercial FPGA placement tools allow designers to control the placement • Common types of placement directives. • 1) Exact location of a block • The most restrictive • Typical uses • to lock down the design I/Os at the locations required by the circuit board or to lock down the elements of a performance-critical intellectual property (IP) core. • 2) Area specific • less restrictive • forces blocks to go into a specific 2D area, • allows a designer to guide the placement tool
Level of Control on Placement • 3) Relative location • specify the relative location of several blocks, • placement tool chooses exactly where to locate the block group. • Typical use • for library components where a designer knows a good placement of the component blocks relative to each other. • 4) Floating region • specifies that some logic should be placed within a tight region • placement tool can choose where that region should be on the device.
Placement Algorithms • Constructive methods: • Begin from netlist and generate an initial placement. • Partitioning method: Mincut • First address placement of partitions individually • Significant amount of reduction in search space • Then address placement of partitions relative to each other • Not suitable for FPGAs • Especially island style FPGA with limited routing resources • Method postpones the impact of inter-partition connections • Leads to increased demand on routing tracks
Placement • [Figure: example circuit with blocks A to F, LUT1, and LUT2] • Placement has a set of competing goals. • Can’t optimize locally and globally simultaneously. • Use heuristic approaches to evaluate quality.
Getting Stuck with Local Minima • pick a random starting point • repeatedly swap pairs of blocks, • if the new state has a lower cost, it is accepted, • otherwise the current state is retained. • greedily accept good moves • Problem: large number of local minima • the circuit placed as shown at left is in a local minimum. • No swap of logic or I/O functions will reduce the total wirelength.
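The greedy procedure above can be sketched as follows. This is an illustrative version only: it assumes wirelength is estimated by the half-perimeter of each net's bounding box, and the names `wirelength` and `greedy_place` are invented for the example.

```python
import random
from typing import Dict, List, Tuple

Loc = Tuple[int, int]

def wirelength(placement: Dict[str, Loc], nets: List[List[str]]) -> int:
    """Sum of bounding-box half-perimeters over all nets (an assumed estimate)."""
    total = 0
    for net in nets:
        xs = [placement[b][0] for b in net]
        ys = [placement[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def greedy_place(placement: Dict[str, Loc], nets: List[List[str]],
                 attempts: int, seed: int = 0):
    """Try random pairwise swaps; keep a swap only if it lowers the cost."""
    rng = random.Random(seed)
    place = dict(placement)
    cost = wirelength(place, nets)
    blocks = sorted(place)
    for _ in range(attempts):
        a, b = rng.sample(blocks, 2)
        place[a], place[b] = place[b], place[a]
        new_cost = wirelength(place, nets)
        if new_cost < cost:
            cost = new_cost                          # accept the improving move
        else:
            place[a], place[b] = place[b], place[a]  # undo: retain current state
    return place, cost
```

Because a cost-increasing swap is never kept, the loop halts at the first placement from which no single swap helps, which is exactly the local-minimum trap described here.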
Technology Mapping to Placement Mapping onto 5-LUT
Iterative Placement Algorithms • Iterative improvement • Begin with random or constructive placement. • Iterate to improve it. • Pairwise interchange • Hill climbing • To avoid getting trapped in local minima, consider a “hill-climbing” approach • Need to accept worse solutions, or make “bad” moves, to reach the global minimum. • Acceptance is probabilistic. Only accept cost-increasing moves some of the time.
Iterative Placement Algorithms • Methods • Force-directed methods (classical mechanics) • Force vector computed on each module corresponding to all nets • Solve set of non-linear differential equations. • FD relaxation • FD pairwise exchange • Simulated annealing (statistical mechanics) • Model a physical annealing process which optimizes energy. • Analogous to slowly cooling metal (rapid “quenching” corresponds to a purely greedy search). • Generates best results • Can be time consuming • Macro-based approaches • Genetic algorithms
Physical Annealing • Take a metal and heat it to a high temperature • Allow it to cool slowly; the metal is annealed to a low temperature • Atoms in the metal are at lower energy states after annealing • The higher the initial temperature and the slower the cooling, the tougher the metal becomes. • Atoms transition to high energy states and then move to low energy.
Simulated Annealing • Optimization strategy based on physical annealing process • Generate random moves. • Initially, accept both moves that decrease cost and moves that increase cost. • As temperature decreases, the probability of accepting bad moves decreases. • Eventually, default to greedy algorithm • Only accept cost-reducing moves • Determine when to terminate.
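The acceptance rule can be sketched with the usual Metropolis criterion, in which a cost-increasing move is accepted with probability exp(-ΔC/T). The slides do not name the exact rule, so treat this as an assumption; `accept_move` is a hypothetical helper.

```python
import math
import random

def accept_move(delta_cost: float, temperature: float, rng: random.Random) -> bool:
    """Always accept improving moves; accept worsening moves with
    probability exp(-delta_cost / temperature) (assumed Metropolis rule)."""
    if delta_cost <= 0:
        return True
    if temperature <= 0:
        return False  # zero temperature: behave as a pure greedy algorithm
    return rng.random() < math.exp(-delta_cost / temperature)
```

At high temperature the exponential is close to 1, so nearly every move is accepted; as the temperature approaches zero the rule degenerates into the greedy acceptance described in the last bullets.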
Bounding Box and Cost Function • Bounding box underestimates wirelength • q(n) is a compensation factor • q is 1 for 2- and 3-terminal nets • increases to 2.79 for 50-terminal nets • C_av is the channel capacity (tracks) in the x and y directions over the bounding box of net n • penalizes placements which require more routing in areas of the FPGA that have narrower channels. • However, C_av is constant since channel width is fixed for an island-style FPGA
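With a constant channel capacity, the cost reduces to a sum of bounding-box spans scaled by q(n). The sketch below makes two assumptions that the slides do not state: q is interpolated linearly between the two given points (1 at 3 terminals, 2.79 at 50), and it is clamped at 2.79 beyond 50 terminals. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Loc = Tuple[int, int]

def q(num_terminals: int) -> float:
    """Compensation factor. Only q(<=3) = 1 and q(50) = 2.79 come from the
    slides; the linear interpolation and the clamp are illustrative assumptions."""
    if num_terminals <= 3:
        return 1.0
    if num_terminals >= 50:
        return 2.79
    return 1.0 + (2.79 - 1.0) * (num_terminals - 3) / (50 - 3)

def bounding_box_cost(placement: Dict[str, Loc], nets: List[List[str]]) -> float:
    """Sum over nets of q(n) * (x-span + y-span), channel capacity taken as constant."""
    total = 0.0
    for net in nets:
        xs = [placement[b][0] for b in net]
        ys = [placement[b][1] for b in net]
        total += q(len(net)) * ((max(xs) - min(xs)) + (max(ys) - min(ys)))
    return total
```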
Manhattan Wire length measures • Estimate wire length by distance between components. • Possible distance measures: • Euclidean distance (sqrt(x^2 + y^2)); • Manhattan distance (|x| + |y|). • Multi-point nets must be broken up into trees for good estimates.
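The two measures differ only in how the horizontal and vertical separations are combined. A short sketch, with points given as hypothetical (x, y) grid coordinates:

```python
import math
from typing import Tuple

Point = Tuple[float, float]

def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance between two points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def manhattan(a: Point, b: Point) -> float:
    """Distance along horizontal and vertical segments only."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

print(euclidean((0, 0), (3, 4)), manhattan((0, 0), (3, 4)))
```

Since FPGA wires run only horizontally and vertically, the Manhattan measure is the closer match to what a route actually consumes.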
Weighted Graph -> Distance Table • Geometric Distance NOT Accurate !!! • Need Weighted Graph • Cost of Routing Resources • Finding Shortest Path at Each Step of Annealing costly • Need for Lookup Table
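One way to avoid a shortest-path search at every annealing step is to run the search once and store the result by offset. The sketch below is an assumption-heavy illustration: it models routing resources as a uniform grid with one cost per horizontal hop and one per vertical hop, which is far simpler than a real device graph, and the names `build_distance_table` and `lookup` are invented.

```python
import heapq
from typing import Dict, Tuple

def build_distance_table(width: int, height: int,
                         cost_x: float, cost_y: float) -> Dict[Tuple[int, int], float]:
    """Dijkstra from (0, 0) over a width x height grid; the result maps
    each offset (dx, dy) to its cheapest routing cost."""
    dist = {(0, 0): 0.0}
    heap = [(0.0, 0, 0)]
    while heap:
        d, x, y = heapq.heappop(heap)
        if d > dist[(x, y)]:
            continue  # stale heap entry
        for nx, ny, w in ((x + 1, y, cost_x), (x - 1, y, cost_x),
                          (x, y + 1, cost_y), (x, y - 1, cost_y)):
            if 0 <= nx < width and 0 <= ny < height:
                nd = d + w
                if nd < dist.get((nx, ny), float("inf")):
                    dist[(nx, ny)] = nd
                    heapq.heappush(heap, (nd, nx, ny))
    return dist

def lookup(table: Dict[Tuple[int, int], float],
           a: Tuple[int, int], b: Tuple[int, int]) -> float:
    """Constant-time cost query during annealing."""
    return table[(abs(a[0] - b[0]), abs(a[1] - b[1]))]
```

The table is built once before the anneal; each move then costs only a dictionary lookup per affected connection.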
Simulated Annealing – Moves per iteration • Moves_per_iteration = B · N^(4/3) • N = # of logic blocks and I/O pads • B = scaling factor
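As a quick worked example of the formula, the sketch below evaluates B · N^(4/3) for a given design size. The default B = 10 is an assumption chosen for illustration; the slides leave B unspecified.

```python
def moves_per_iteration(num_blocks: int, scale: float = 10.0) -> int:
    """Number of moves attempted at each temperature: B * N^(4/3).
    The default scale (B) is an illustrative assumption."""
    return round(scale * num_blocks ** (4.0 / 3.0))

# A 1,000-block design with the assumed B = 10.
print(moves_per_iteration(1000))
```

The superlinear exponent means larger circuits get more moves per block, at the price of longer run time.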
Simulated Annealing – Swapping Range • Swap distance is adjusted based on the acceptance rate as well. • Initially set to entire FPGA • As T drops, distance drops.
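A sketch of such a range update is given below. The specific rule, scaling the limit by (1 - 0.44 + acceptance rate) and clamping it to the device size, is the one published for the VPR placer and is an assumption here, since the slides only say that the distance follows the acceptance rate.

```python
def update_range_limit(current: float, acceptance_rate: float,
                       max_dimension: int, target: float = 0.44) -> float:
    """Grow the swap range when more than `target` of moves are accepted,
    shrink it otherwise; keep it between 1 and the device dimension.
    The 0.44 target is taken from the VPR literature (assumption)."""
    new_limit = current * (1.0 - target + acceptance_rate)
    return max(1.0, min(new_limit, float(max_dimension)))
```

Early in the anneal almost everything is accepted, so the limit stays at the full device; as the temperature and acceptance rate drop, swaps become increasingly local.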
Simulated Annealing • New T depends on the fraction of attempted moves that were accepted. • Reduces rapidly when acceptance rate is high • When the temperature is less than a small fraction of the average cost of a net, it is unlikely that any move that results in a cost increase will be accepted, so we terminate the anneal.
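The temperature update and exit test can be sketched as below. The multipliers and acceptance-rate thresholds follow the schedule published for the VPR placer, which is an assumption on my part because the slides give no numbers for the update; the 0.005 factor in the exit test is the one quoted under the annealing criteria.

```python
def next_temperature(temperature: float, acceptance_rate: float) -> float:
    """Scale T by a factor chosen from the acceptance rate
    (thresholds assumed from the VPR schedule)."""
    if acceptance_rate > 0.96:
        alpha = 0.5    # almost everything accepted: cool quickly
    elif acceptance_rate > 0.8:
        alpha = 0.9
    elif acceptance_rate > 0.15:
        alpha = 0.95   # the productive range: cool slowly
    else:
        alpha = 0.8
    return alpha * temperature

def should_stop(temperature: float, total_cost: float, num_nets: int,
                epsilon: float = 0.005) -> bool:
    """Terminate once T is a small fraction of the average cost per net."""
    return temperature < epsilon * total_cost / num_nets
```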
Annealing Criteria • Contemporary FPGA packages use the following parameters: • Starting temp – 20 * stand_dev(cost of N swaps) • Cost function – weighted sum of wire length and delay • Inner loop – B * N^(4/3) • Beta cost function • Stopping criteria – • T < 0.005 * Cost / N_nets
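The starting-temperature rule can be sketched as follows, given the costs observed after each of N trial swaps. Whether the standard deviation is the population or the sample form is not stated in the slides; the population form is assumed here, and `initial_temperature` is a hypothetical name.

```python
import statistics
from typing import Sequence

def initial_temperature(costs_after_swaps: Sequence[float]) -> float:
    """Starting temperature: 20 times the standard deviation of the cost
    observed over N trial swaps (population standard deviation assumed)."""
    return 20.0 * statistics.pstdev(costs_after_swaps)

# Hypothetical costs recorded after four trial swaps.
print(initial_temperature([100.0, 104.0, 98.0, 102.0]))
```

Tying the starting temperature to the spread of the cost makes the schedule adapt to the circuit: a design whose cost swings widely starts hotter, so that early moves are nearly always accepted.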
Strengths of SA making it suitable for FPGA • Can enforce all the legality constraints imposed by the FPGA architecture fairly directly • By forbidding the creation of illegal placements in the move generator • By adding a penalty cost to illegal placements. • Can directly model the impact of the FPGA routing architecture on circuit delay and routing congestion • By creating an appropriate cost function