Temporal placement for reconfigurable devices: avoiding wasted resources, off-line placement algorithms (first fit, best fit, packing, simulated annealing), and on-line placement, which must manage free space, select the best site for a new component, and manage communication within a fraction of a millisecond.
Wasted Resources
• Wasted resource $wr(v_i)$ of a node $v_i$:
  • Unused area occupied by the node $v_i$ during the computation of a partition:
    $wr(v_i) = (t(P_i) - t_i) \times a_i$
    • $t(P_i)$: run-time of partition $P_i$
    • $t_i$: run-time of the component $v_i$
    • $a_i$: area of $v_i$
• Wasted resource avoidance:
  • A component is placed on the chip only when its computation is required and remains on the device only for the time it is active.
  • Idle components can be replaced by new ones.
• Assumptions:
  • There is partial reconfiguration support.
  • Blocks are hard.
  • For soft library modules, temporal partitioning + 2D placement may suffice.
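The wasted-resource formula can be evaluated directly. The sketch below is illustrative only: the function names and the sample values are hypothetical, and it assumes that the partition run-time equals the longest node run-time in the partition, which the slides do not state.

```python
def wasted_resource(partition_runtime, node_runtime, node_area):
    """wr(v_i) = (t(P_i) - t_i) * a_i for a single node."""
    if node_runtime > partition_runtime:
        raise ValueError("a node cannot run longer than its partition")
    return (partition_runtime - node_runtime) * node_area


def partition_waste(nodes):
    """Total waste of one partition, given (run-time, area) pairs.

    Assumption: t(P_i) is the longest node run-time in the partition.
    """
    t_partition = max(runtime for runtime, _ in nodes)
    return sum(wasted_resource(t_partition, t, a) for t, a in nodes)


nodes = [(10, 4), (6, 3), (2, 5)]   # hypothetical (t_i, a_i) values
print(partition_waste(nodes))       # (10-10)*4 + (10-6)*3 + (10-2)*5 = 52
```

The short-lived node with the largest area dominates the total, which is why removing idle components early pays off.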
Temporal Placement
• Temporal placement: time-dependent placement
  • Management of task execution at run-time
• Graphical representation:
  • Z axis: lifetime of a configuration.
  • Overlaps in 2-D (resource sharing) are allowed, provided they do not occur at the same time.
  • A horizontal cut: RFU configuration at a given time
Temporal Placement
• Off-line (compile time)
  • Pre-defined placement sequence
  • In DSP applications, program flow can be predicted.
• On-line (during execution)
  • Computation sequence is not known in advance.
  • Placement is decided dynamically at run time.
  • Needs to solve computationally intensive problems in a fraction of a millisecond:
    • Efficient management of the free space
    • Selection of the best site for a new component
    • Management of communication
Off-Line Temporal Placement
• Off-line temporal placement:
  • Given a DFG $G = (V, E)$ and a reconfigurable device $H$ (with length $H_x$ and width $H_y$), a temporal placement is a 3-dimensional vector function
    $p = (p_x, p_y, p_t) : V \to \mathbb{R}^3$
  • $p_t$ defines a feasible schedule ($p_t(v_i)$: starting time of $v_i$).
  • $p_x(v_i)$, $p_y(v_i)$: coordinates of $v_i$ on $H$.
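One way to read this definition is as a set of boxes in (x, y, t) space that must stay on the device, must not intersect, and must respect the DFG edges. The sketch below checks those three conditions. The data layout (`placement`, `sizes`, `edges`) and all names are assumptions made for illustration, not part of the original formulation.

```python
from itertools import combinations


def overlaps_1d(a_start, a_len, b_start, b_len):
    """True if half-open intervals [a, a+len) and [b, b+len) intersect."""
    return a_start < b_start + b_len and b_start < a_start + a_len


def is_feasible(placement, sizes, edges, hx, hy):
    """placement[v] = (px, py, pt); sizes[v] = (width, height, duration)."""
    for v, (px, py, pt) in placement.items():
        w, h, _ = sizes[v]
        if px < 0 or py < 0 or pt < 0 or px + w > hx or py + h > hy:
            return False                      # component leaves the device
    for u, v in combinations(placement, 2):
        (ux, uy, ut), (uw, uh, ud) = placement[u], sizes[u]
        (vx, vy, vt), (vw, vh, vd) = placement[v], sizes[v]
        if (overlaps_1d(ux, uw, vx, vw) and overlaps_1d(uy, uh, vy, vh)
                and overlaps_1d(ut, ud, vt, vd)):
            return False                      # same area at the same time
    for u, v in edges:                        # u must finish before v starts
        if placement[u][2] + sizes[u][2] > placement[v][2]:
            return False
    return True


sizes = {"a": (2, 2, 3), "b": (2, 2, 2)}      # hypothetical components
edges = [("a", "b")]
print(is_feasible({"a": (0, 0, 0), "b": (0, 0, 3)}, sizes, edges, 4, 4))  # True
print(is_feasible({"a": (0, 0, 0), "b": (0, 0, 1)}, sizes, edges, 4, 4))  # False
```

In the first call both components use the same area, but at different times, so the placement is accepted.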
Off-Line Temporal Placement [Bazargan]
Off-Line Temporal Placement
• Algorithms:
  • First fit
  • Best fit
  • Packing
  • Simulated annealing (KAMER)
First/Best Fit Algorithm
• First fit:
  • Select a node among ready nodes,
    • i.e. nodes whose predecessors have been placed.
  • Place it in the first free location.
• Advantages:
  • Fast (linear w.r.t. number of free locations)
• Disadvantages:
  • Very high proportion of unused resources
  • Fragmentation
First/Best Fit Algorithm
• Best fit:
  • Select a node among ready nodes.
  • Place it in the best free location.
• Advantages:
  • Better area utilization
• Disadvantages:
  • Much slower:
    • Complete list of free locations must be searched.
  • Fragmentation
• Simplifying the problem:
  • Restricting the modules to columns (Virtex II) or part of a column (Virtex 4 / Virtex 5 / Virtex 6).
[Figure labels: CLBs, Frames]
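The difference between the two strategies shows up in how the list of free locations is scanned. The following sketch assumes free locations are rectangles stored as `(x, y, width, height)` and that "best" means the smallest leftover area; both are illustrative assumptions, since the slides do not fix a criterion.

```python
def first_fit(free_rects, w, h):
    """Return the first free rectangle that can hold a w x h component."""
    for rect in free_rects:
        if rect[2] >= w and rect[3] >= h:
            return rect
    return None


def best_fit(free_rects, w, h):
    """Scan every free rectangle; return the fitting one with least leftover area."""
    fitting = [r for r in free_rects if r[2] >= w and r[3] >= h]
    if not fitting:
        return None
    return min(fitting, key=lambda r: r[2] * r[3] - w * h)


free = [(0, 0, 8, 8), (8, 0, 3, 4), (0, 8, 4, 2)]   # hypothetical free list
print(first_fit(free, 3, 2))   # (0, 0, 8, 8)
print(best_fit(free, 3, 2))    # (0, 8, 4, 2)
```

First fit stops at the large 8 x 8 rectangle and fragments it, while best fit pays for a full scan and keeps the large rectangle intact.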
First Fit Algorithm
• Algorithm: First-fit temporal placement of clusters
  1. While not all clusters are placed do
     1.1. Select one cluster C_act from the list of ready-to-run clusters
     1.2. From the clusters already placed, select the cluster C_top with the smallest finishing-time such that C_top ≤ C_act
     1.3. Place the cluster C_act on top of C_top
  2. end while
First Fit Algorithm
• Configuration sequence produced:
  • {0, 1, 2, 3, 4}
  • {5, 1, 2, 3, 4}
  • {5, 1, 6, 7, 4}
  • {5, 1, 6, 7, 8}
  • {5, 9, 6, 7, 8}
  • {10, 9, 6, 7, 8}
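The loop of the first-fit cluster algorithm can be sketched as follows. Several points are assumptions made only for this illustration: the device is modeled as a row of column slots, the size condition between C_top and C_act is read as "C_act is no wider than the slot", the input list is already in ready-to-run order, and the widths and durations are hypothetical.

```python
def first_fit_clusters(clusters, slot_widths):
    """clusters: list of (name, width, duration) in ready-to-run order.

    Returns {name: (slot_index, start_time)}.
    """
    finish = [0] * len(slot_widths)          # finishing time on top of each slot
    schedule = {}
    for name, width, duration in clusters:
        candidates = [i for i, sw in enumerate(slot_widths) if sw >= width]
        if not candidates:
            raise ValueError(f"cluster {name} fits in no slot")
        top = min(candidates, key=lambda i: finish[i])   # smallest finishing time
        schedule[name] = (top, finish[top])
        finish[top] += duration              # cluster now sits on top of the slot
    return schedule


# Hypothetical clusters: (name, width, duration) on five equal slots.
clusters = [(0, 1, 4), (1, 1, 9), (2, 1, 6), (3, 1, 6), (4, 1, 7),
            (5, 1, 3), (6, 1, 2)]
schedule = first_fit_clusters(clusters, [1] * 5)
print(schedule[5])   # (0, 4): cluster 5 replaces cluster 0
print(schedule[6])   # (2, 6): cluster 6 replaces cluster 2
```

With these durations, cluster 5 takes over the slot of cluster 0 and cluster 6 the slot of cluster 2, the same kind of replacement pattern as in the configuration sequence listed above.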
ILP
• ILP (integer linear programming):
  • Can construct a constraint set
  • An objective function
• Advantage:
  • Exact solution
• Limitation:
  • Only for small problem sizes.
Packing Approach
• Variations of the packing problem:
  • Base Minimization Problem (BMP):
    • Given a set of boxes B and a height H, find a container with minimal size (x, y, H) that can accommodate the set of boxes B.
    • i.e. in temporal placement:
      • Find the device with minimum size (x, y) on which a set of components can be implemented, given an overall run-time t = H.
  • Strip Packing Problem (SPP):
    • Given a set of boxes and a base (X, Y), find the minimum-height container with size (X, Y, h) that can hold all the boxes.
    • i.e. in temporal placement:
      • Find the minimum run-time t = h for a set of components, given a reconfigurable device with size (X, Y).
• The device is usually fixed, so SPP is more common.
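For the strip packing view, a simple lower bound on the run-time h follows from two observations: no schedule can be shorter than the longest component, and the total area-time volume must fit into X × Y × h. The sketch below computes that bound. It is not a packing algorithm, and the function name and sample boxes are hypothetical.

```python
def spp_lower_bound(boxes, X, Y):
    """boxes: list of (width, height, duration); returns a lower bound on h."""
    for w, h, _ in boxes:
        if w > X or h > Y:
            raise ValueError("a box does not fit on the device base")
    longest = max(d for _, _, d in boxes)            # no schedule is shorter
    volume = sum(w * h * d for w, h, d in boxes)     # total area-time demand
    return max(longest, volume / (X * Y))


boxes = [(4, 4, 3), (4, 4, 3), (2, 2, 5)]   # hypothetical components
print(spp_lower_bound(boxes, 4, 4))          # 116 / 16 = 7.25
```

Any placement heuristic for this instance can be compared against the bound: a result close to 7.25 leaves little room for improvement.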
KAMER Model of an RCS
• Larger picture:
  • An RFU operation r_i can be either accepted or rejected
    • based on availability of RFU real estate
  • If rejected, it runs on the CPU
    • running-time penalty
• Assumption:
  • No communication among RFUs
  • After running an operation on the RFU, results are saved in CPU registers
KAMER Model of an RCS
• On-line temporal placement
  • $RFUOPS = \{r_1, \dots, r_n \mid r_i = (w_i, h_i, s_i, e_i)\}$
  • $e_i - s_i$: time span the operation is resident in the system
• Placement of RFUOPS:
  • Modeled as 3D template placement
• Empty Rectangle (ER):
  • A rectangle that does not overlap a placed module on the chip
• Maximum Empty Rectangle (MER):
  • An ER that is not contained in any other ER
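The two definitions can be checked mechanically on a small occupancy grid. The sketch below assumes the chip is a list of rows with 1 for an occupied cell and 0 for a free one, and that a rectangle is given as `(x, y, w, h)`; this representation is an illustrative choice, not the data structure used by KAMER. An empty rectangle is maximal exactly when it cannot grow by one cell in any of the four directions.

```python
def is_empty(grid, x, y, w, h):
    """True if the w x h rectangle at (x, y) lies on the chip and covers no module."""
    rows, cols = len(grid), len(grid[0])
    if x < 0 or y < 0 or w <= 0 or h <= 0 or x + w > cols or y + h > rows:
        return False
    return all(not grid[r][c] for r in range(y, y + h) for c in range(x, x + w))


def is_maximal(grid, x, y, w, h):
    """True if the rectangle is empty and cannot grow in any direction."""
    if not is_empty(grid, x, y, w, h):
        return False
    grown = [(x - 1, y, w + 1, h), (x, y, w + 1, h),     # left, right
             (x, y - 1, w, h + 1), (x, y, w, h + 1)]     # up, down
    return not any(is_empty(grid, *g) for g in grown)


grid = [[1, 1, 0, 0],      # hypothetical chip: one 2 x 2 module in a corner
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
print(is_maximal(grid, 2, 0, 2, 4))   # True: the two free columns
print(is_maximal(grid, 0, 2, 4, 2))   # True: the two free rows
print(is_maximal(grid, 2, 2, 2, 2))   # False: an ER, but it can still grow
```

The two MERs overlap in the 2 x 2 corner region, which is itself an ER but not maximal.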
Example
• Non-ER:
  • (E,F)
• ER:
  • (E,D): MER
  • (A,D): MER
  • (E,C): not MER
• KAMER method:
  • Keeping all maximum empty rectangles:
    • Permanently keeps track of all MERs
    • Whenever a request for placing a component v arrives, the list of MERs is searched for a rectangle that can accommodate v.
    • There may be many MERs in which v can fit
    • Strategies: first-fit or best-fit
KAMER
• Once a rectangle is chosen:
  • Candidate points are those at which the component does not extend beyond the rectangle.
• Problem:
  • The number of empty rectangles does not grow linearly with the number of components placed
KAMER
• Run-time of free rectangle placement: $O(n^2)$
• Heuristic:
  • Keep only the non-overlapping empty rectangles
  • Linear time, lower quality
• Problem 1:
  • The same free space has several non-overlapping rectangle representations
Keeping Non-Overlapping ERs
• Problem 2:
  • Non-overlapping empty rectangles are not necessarily maximal
  • A module may exist that could fit onto the device but cannot be placed
[Figure: a bad decomposition in which the module can't fit, next to a decomposition in which it can fit]
Keeping Non-Overlapping ERs
• Problem 2 (continued):
[Figure: one decomposition in which the module can fit, one in which it can't fit]
Keeping Non-Overlapping ERs
• Incremental update:
  • Whenever a new component v1 is placed in a rectangle, there are two possibilities:
    • Horizontal splitting
    • Vertical splitting
  • The choice can have a negative impact on the placement of subsequent components
Keeping Non-Overlapping ERs
• Horizontal
• Vertical
• Solution:
  • Delay the split decision until a number of steps later
[Figure labels: Can't fit, Can fit]
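The effect of the split decision can be reproduced with a few lines. The sketch assumes the component is placed at the bottom-left corner of the free rectangle, that rectangles are `(x, y, width, height)`, and that "horizontal" means the cut runs along the top edge of the component; the naming convention and sample sizes are assumptions.

```python
def split(free_rect, w, h, horizontal):
    """Place a w x h component bottom-left in free_rect; return the two leftovers."""
    x, y, W, H = free_rect
    if w > W or h > H:
        raise ValueError("component does not fit")
    if horizontal:   # cut along y + h: full-width strip above the component
        parts = [(x + w, y, W - w, h), (x, y + h, W, H - h)]
    else:            # cut along x + w: full-height strip beside the component
        parts = [(x + w, y, W - w, H), (x, y + h, w, H - h)]
    return [p for p in parts if p[2] > 0 and p[3] > 0]


def fits(rects, w, h):
    """True if some rectangle in the list can hold a w x h module."""
    return any(r[2] >= w and r[3] >= h for r in rects)


free = (0, 0, 6, 6)                              # hypothetical free rectangle
hor = split(free, 4, 2, horizontal=True)         # [(4, 0, 2, 2), (0, 2, 6, 4)]
ver = split(free, 4, 2, horizontal=False)        # [(4, 0, 2, 6), (0, 2, 4, 4)]
print(fits(hor, 5, 4), fits(ver, 5, 4))          # True False
print(fits(hor, 2, 6), fits(ver, 2, 6))          # False True
```

Neither split is better in general: the next 5 x 4 module needs the horizontal split and the next 2 x 6 module needs the vertical one, which is the motivation for delaying the decision.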
Keeping Non-Overlapping ERs
• KAMER always finds a rectangle in which to place a new component, if one exists.
• But the position of the component within the rectangle must be selected from a set of points
  • whose number equals the area of the rectangle in the worst case
• In [Bazargan00]:
  • Bottom-left position
• Note: no communication is assumed between the component to be placed and those already placed.
• If communication exists:
  • An optimal algorithm must consider every single point
  • Solution:
    • [Ahmadinia04]
KAMER Off-Line Placement
• Uses simulated annealing in a 3D space:
  • Places the X% largest modules
    • X is a parameter of the algorithm
  • Idea: remove small modules
    • Opens space for larger ones
    • Small modules cause more fragmentation
• Initial placement: from KAMER best fit
• Perturb:
  • Can displace a module on the RFU
    • No change in start or end time
  • Can move an operation from CPU to RFU and vice versa
• Use zero-temperature SA for fine tuning
• Place as many of the remaining (100−X)% modules as possible
References
• [Bobda07] C. Bobda, "Introduction to Reconfigurable Computing: Architectures, Algorithms and Applications," Springer, 2007.
• [Hauck08] S. Hauck, A. DeHon, "Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation," Morgan-Kaufmann, 2007.
• [Mehdipour06] F. Mehdipour, M. Saheb Zamani, M. Sedighi, "An integrated temporal partitioning and physical design framework for static compilation of reconfigurable computing systems," Journal of Microprocessors and Microsystems, Elsevier, v30, 2006, pp. 52–62.
• [Bazargan00] K. Bazargan, R. Kastner, and M. Sarrafzadeh, "Fast Template Placement for Reconfigurable Computing Systems," IEEE Design and Test - Special Issue on Reconfigurable Computing, January-March 2000.
• [Bazargan] Bazargan, "Fast Placement Methods for Reconfigurable Computing Systems," http://www.ece.umn.edu/~kia/Research/Pre2000/RCSPlace/
• [Ahmadinia04] A. Ahmadinia, C. Bobda, M. Bednara, and J. Teich, "A new approach for on-line placement on reconfigurable devices," in Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS) / Reconfigurable Architectures Workshop (RAW), 2004.