
Temporal Placement

Temporal placement reduces wasted resources on partially reconfigurable devices by keeping a component on the chip only while it is active. On-line placement must manage free space, select the best site for each new component, and manage communication within a fraction of a millisecond. Off-line placement uses algorithms such as first fit, best fit, packing, and simulated annealing.

denger

Presentation Transcript


  1. Temporal Placement

  2. Wasted Resources • Wasted resource wr(v_i) of a node v_i: • Unused area occupied by the node v_i during the computation of a partition: wr(v_i) = (t(P_i) − t_i) × a_i, where t(P_i) is the run-time of partition P_i, t_i is the run-time of component v_i, and a_i is the area of v_i. • Wasted resource avoidance: • A component is placed on the chip only when its computation is required • and remains on the device only for the time it is active. • Idle components can be replaced by new ones. • Assumptions: • There is partial reconfiguration support. • Blocks are hard. • For soft library modules, temporal partitioning + 2-D placement may suffice.
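
The wasted-resource formula on this slide can be illustrated with a short Python sketch. The component names, run-times, areas, and the partition run-time of 10 time units are hypothetical values chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    run_time: float  # t_i: how long the component actually computes
    area: int        # a_i: area the component occupies on the device


def wasted_resource(partition_run_time: float, comp: Component) -> float:
    """wr(v_i) = (t(P_i) - t_i) * a_i: area held idle until the partition ends."""
    return (partition_run_time - comp.run_time) * comp.area


# Hypothetical partition whose run-time t(P) is 10 time units.
t_P = 10.0
partition = [
    Component("v1", run_time=4.0, area=10),
    Component("v2", run_time=10.0, area=6),
    Component("v3", run_time=7.0, area=8),
]

waste = {c.name: wasted_resource(t_P, c) for c in partition}
total_waste = sum(waste.values())
print(waste, total_waste)
```

In this example v1 finishes early and holds 10 area units idle for 6 time units, which is exactly the waste that removing idle components is meant to avoid.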

  3. Temporal Placement • Temporal placement: time-dependent placement • Management of task execution at run-time • Graphical representation: • Z axis: lifetime of a configuration • Overlaps (resource sharing) in 2-D are allowed if they do not occur at the same time. • A horizontal cut: RFU configuration at a given time

  4. Temporal Placement • Off-line (compile time) • Pre-defined placement sequence • In DSP applications, program flow can be predicted. • On-line (during execution) • Computation sequence is not known in advance. • Placement is decided dynamically at run time. • Needs to solve computationally intensive problems in a fraction of a millisecond: • Efficient management of the free space • Selection of the best site for a new component • Management of communication

  5. Off-Line Temporal Placement • Off-line temporal placement: • Given a DFG G = (V, E) and a reconfigurable device H (with length H_x and width H_y), a temporal placement is a 3-dimensional vector function p = (p_x, p_y, p_t) : V → R^3, where • p_t defines a feasible schedule (p_t(v_i): starting time of v_i), • p_x(v_i), p_y(v_i): coordinates of v_i on H.
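
A minimal sketch of how such a placement could be checked, assuming each node is an axis-aligned box in (x, y, t) with integer width, height, and duration. The `Placed` type, the `is_feasible` function, and the example values are hypothetical and not taken from the slides. The check covers three conditions: every box lies on the device, every DFG edge is respected by the schedule, and no two boxes overlap in space and time simultaneously.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Placed:
    name: str
    x: int  # p_x(v)
    y: int  # p_y(v)
    t: int  # p_t(v), starting time
    w: int  # width on the device
    h: int  # height on the device
    d: int  # duration (time the node stays on the device)


def overlaps_1d(a0, a_len, b0, b_len):
    """Half-open intervals [a0, a0+a_len) and [b0, b0+b_len) intersect."""
    return a0 < b0 + b_len and b0 < a0 + a_len


def boxes_overlap(a, b):
    """Two nodes conflict only if they overlap in x, y AND time."""
    return (overlaps_1d(a.x, a.w, b.x, b.w)
            and overlaps_1d(a.y, a.h, b.y, b.h)
            and overlaps_1d(a.t, a.d, b.t, b.d))


def is_feasible(placement, edges, Hx, Hy):
    by_name = {p.name: p for p in placement}
    # 1. Every node must lie on the device and start at a non-negative time.
    for p in placement:
        if p.x < 0 or p.y < 0 or p.t < 0 or p.x + p.w > Hx or p.y + p.h > Hy:
            return False
    # 2. Schedule: a successor may start only after its predecessor ends.
    for u, v in edges:
        if by_name[v].t < by_name[u].t + by_name[u].d:
            return False
    # 3. No two nodes may share area at the same time.
    return not any(boxes_overlap(a, b) for a, b in combinations(placement, 2))


a = Placed("a", x=0, y=0, t=0, w=2, h=2, d=3)
b = Placed("b", x=2, y=0, t=0, w=2, h=2, d=2)
c = Placed("c", x=0, y=0, t=3, w=4, h=2, d=2)  # reuses the area of a and b
edges = [("a", "c"), ("b", "c")]
print(is_feasible([a, b, c], edges, Hx=4, Hy=4))
```

Node c reuses the area of a and b, which is allowed because it starts only after both have finished.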

  6. Off-Line Temporal Placement [Bazargan]

  7. Off-Line Temporal Placement • Algorithms: • First fit • Best fit • Packing • Simulated annealing (KAMER)

  8. First/Best Fit Algorithm • First fit: • Select a node among ready nodes. • i.e. nodes whose predecessors have been placed • Place it in the first free location. • Advantages: • Fast (linear w.r.t. number of free locations) • Disadvantages: • Unused resources very high • Fragmentation

  9. First/Best Fit Algorithm • Best fit: • Select a node among ready nodes. • Place it in the best free location. • Advantages: • Better area utilization • Disadvantages: • Much slower: • Complete list of free locations must be searched. • Fragmentation • Simplifying the problem: • Restricting the modules to columns (Virtex II) or part of a column (Virtex 4 / Virtex 5 / Virtex 6). • [Figure labels: CLBs, Frames]
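
The difference between the two strategies can be sketched in Python over a list of free locations. This is an illustrative sketch, not the slides' implementation: free locations are modelled as (width, height) rectangles, and "best" is assumed to mean the location that leaves the least unused area, since the slides do not define the criterion.

```python
def first_fit(free_rects, w, h):
    """Return the index of the first free rectangle that can hold a w x h module."""
    for i, (fw, fh) in enumerate(free_rects):
        if fw >= w and fh >= h:
            return i  # stop at the first hit
    return None


def best_fit(free_rects, w, h):
    """Return the index of the fitting rectangle with the least leftover area.

    Assumption: 'best' = minimum leftover area. The whole list is scanned,
    which is why best fit is slower than first fit.
    """
    best_index, best_leftover = None, None
    for i, (fw, fh) in enumerate(free_rects):
        if fw >= w and fh >= h:
            leftover = fw * fh - w * h
            if best_leftover is None or leftover < best_leftover:
                best_index, best_leftover = i, leftover
    return best_index


free = [(6, 6), (3, 2), (4, 3)]   # hypothetical free locations
print(first_fit(free, 3, 2))      # picks the large 6 x 6 location
print(best_fit(free, 3, 2))       # picks the exact-size 3 x 2 location
```

First fit consumes part of the large rectangle even though an exact-size location exists, which illustrates the fragmentation noted on the slides.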

  10. First Fit Algorithm • Algorithm: First-fit temporal placement of clusters • 1. While not all clusters are placed, do: • 1.1. Select one cluster C_act from the list of ready-to-run clusters. • 1.2. From the clusters already placed, select the cluster C_top with the smallest finishing time such that C_top ≤ C_act. • 1.3. Place the cluster C_act on top of C_top. • 2. End while

  11. First Fit Algorithm • Configuration sequence produced: • {0, 1, 2, 3, 4} • {5, 1, 2, 3, 4} • {5, 1 , 6, 7, 4} • {5, 1 , 6, 7, 8} • {5, 9 , 6, 7, 8} • {10, 9 , 6, 7, 8}

  12. ILP • ILP: • Can construct a constraint set • An objective function • Advantage: • Exact solution • Limitation: • Only for small size problems.

  13. Packing Approach • Variations of the packing problem: • Base Minimization Problem (BMP): • Given a set of boxes B and a height H, find a container with minimal size (x, y, H) that can accommodate the set of boxes B. • i.e. in temporal placement: • Find the device with minimum size (x, y) on which a set of components can be implemented given an overall run-time t = H. • Strip Packing Problem (SPP): • Given a set of boxes and a base (X, Y), find the minimum-height container with size (X, Y, h) that can hold all the boxes. • i.e. in temporal placement: • Find the minimum run-time t = h for a set of components given a reconfigurable device with size (X, Y). • The device is usually fixed, so SPP is the more common formulation.
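
As a small worked example of the SPP view, the sketch below computes two simple lower bounds on the run-time h for a fixed device (X, Y): the total box volume divided by the device area, and the longest single box. These bounds are standard packing arguments, not something stated on the slides. The function name and the box values are hypothetical, and integer durations are assumed.

```python
from math import ceil


def spp_lower_bound(boxes, X, Y):
    """Lower bound on the run-time h needed to pack (w, h, duration) boxes on X x Y.

    Any packing needs at least volume / base-area time steps, and at least
    as long as the longest single box. A real packing may need more.
    """
    for w, h, _ in boxes:
        if w > X or h > Y:
            raise ValueError("a box does not fit on the device at all")
    volume = sum(w * h * d for w, h, d in boxes)
    longest = max((d for _, _, d in boxes), default=0)
    return max(ceil(volume / (X * Y)), longest)


boxes = [(2, 2, 3), (2, 2, 2), (4, 2, 2)]  # (width, height, duration)
print(spp_lower_bound(boxes, X=4, Y=4))
```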

  14. KAMER

  15. KAMER Model of an RCS • Larger picture: • An RFU operation r_i can be either accepted or rejected • based on availability of RFU real estate • If rejected, the operation runs on the CPU, which incurs a running-time penalty. • Assumption: • No communication among RFUs • After running an operation on the RFU, results are saved in CPU registers.

  16. KAMER Model of an RCS • On-line temporal placement • RFUOPS = {r_1, …, r_n | r_i = (w_i, h_i, s_i, e_i)} • e_i − s_i: time span the operation is resident in the system • Placement of RFUOPS: • Modeled as 3-D template placement • Empty Rectangle (ER): • A rectangle that does not overlap a placed module on the chip • Maximum Empty Rectangle (MER): • An ER that is not contained in any other ER (the only larger enclosing rectangle is the device bounding box)
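
The ER and MER definitions can be made concrete with a brute-force Python sketch over a small occupancy grid. This is only an illustration of the definitions, assuming a cell-based device model. It is not the incremental bookkeeping KAMER uses, and the 4 x 4 grid is a hypothetical example.

```python
def empty_rectangles(grid):
    """All rectangles (x, y, w, h) that cover no occupied cell (brute force)."""
    rows, cols = len(grid), len(grid[0])
    rects = []
    for y in range(rows):
        for x in range(cols):
            for h in range(1, rows - y + 1):
                for w in range(1, cols - x + 1):
                    if all(not grid[yy][xx]
                           for yy in range(y, y + h)
                           for xx in range(x, x + w)):
                        rects.append((x, y, w, h))
    return rects


def contains(outer, inner):
    """True if rectangle `outer` fully covers rectangle `inner`."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return ox <= ix and oy <= iy and ox + ow >= ix + iw and oy + oh >= iy + ih


def maximal_empty_rectangles(grid):
    """ERs that are not contained in any other ER."""
    ers = empty_rectangles(grid)
    return [r for r in ers if not any(o != r and contains(o, r) for o in ers)]


# 1 = occupied cell. A 2 x 2 module sits in the corner at rows 0-1, columns 0-1.
grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
mers = maximal_empty_rectangles(grid)
print(sorted(mers))
```

Here the two MERs overlap in the free 2 x 2 corner opposite the module, which is why a MER list can hold more rectangles than a non-overlapping decomposition of the same free space.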

  17. Example • Non-ER: • (E,F) • ER: • (E,D): MER • (A,D): MER • (E,C): not MER • KAMER method: • keeping all maximum empty rectangles: • Permanently keeps track of all MERs • Whenever a request for placing a component v arrives, the list of MERs is searched for a rectangle that can accommodate v. • Possible to have many MERs in which v can fit •  Strategies: first-fit or best-fit

  18. KAMER • Once a rectangle is chosen: • Candidate points are those at which the component does not overlap the area outside the rectangle. • Problem: • The number of empty rectangles does not grow linearly with the number of components placed.

  19. KAMER • Run-time of free rectangle placement: O(n^2) • Heuristic: • Keep only the non-overlapping empty rectangles • Linear time, lower quality • Problem 1: • Several non-overlapping rectangle representations

  20. Keeping Non-Overlapping ERs • Problem 2: • Non-overlapping empty rectangles are not necessarily maximal. • A module may exist that could fit onto the device but cannot be placed. • [Figure labels: Can’t fit (bad decomposition); Can fit]

  21. Keeping Non-Overlapping ERs • Problem 2: • [Figure labels: Can fit; Can’t fit]

  22. Keeping Non-Overlapping ERs • Incremental update: • Whenever a new component v1 is placed in a rectangle, two possibilities: • Horizontal splitting • Vertical splitting • Negative impact on next components

  23. Keeping Non-Overlapping ERs • Horizontal • Vertical • Solution: • Delay the split decision for a number of steps. • [Figure labels: Can’t fit; Can fit]
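
The effect of the split decision can be sketched in Python. The sketch assumes the new component is placed in the bottom-left corner of a free rectangle and that "horizontal" means cutting along the component's top edge across the full width, while "vertical" means cutting along its right edge across the full height. This naming, the function names, and the sizes are assumptions made for illustration.

```python
def split_horizontal(free, mod):
    """Cut along the module's top edge: a short strip beside it, a full-width strip above."""
    (W, H), (w, h) = free, mod
    parts = [(w, 0, W - w, h), (0, h, W, H - h)]  # (x, y, width, height)
    return [p for p in parts if p[2] > 0 and p[3] > 0]


def split_vertical(free, mod):
    """Cut along the module's right edge: a full-height strip beside it, a narrow strip above."""
    (W, H), (w, h) = free, mod
    parts = [(w, 0, W - w, H), (0, h, w, H - h)]
    return [p for p in parts if p[2] > 0 and p[3] > 0]


def fits(rects, mod):
    """True if the module fits into at least one rectangle of the decomposition."""
    w, h = mod
    return any(rw >= w and rh >= h for _, _, rw, rh in rects)


free, first = (6, 6), (2, 2)     # hypothetical free rectangle and first module
horiz = split_horizontal(free, first)
vert = split_vertical(free, first)

tall, wide = (3, 6), (6, 3)      # two possible next modules
print(fits(horiz, tall), fits(vert, tall))
print(fits(horiz, wide), fits(vert, wide))
```

Neither split is safe for both follow-up modules: the tall module fits only after the vertical split and the wide one only after the horizontal split, which is the motivation for delaying the decision.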

  24. Keeping Non-Overlapping ERs • KAMER always finds a rectangle to place a new component, if one exists. • But the position of the component within the rectangle must be selected from a set of points • whose number is the area of the rectangle in the worst case. • In [Bazargan00]: • Bottom-left position • Note: No communication is assumed between the component to be placed and those already placed. • If communication exists: • An optimal algorithm would have to consider every candidate point. • Solution: • [Ahmadinia04]

  25. KAMER Off-Line Placement • Uses simulated annealing in a 3-D space: • Places the X% largest modules first • X is a parameter of the algorithm • Idea: Remove small modules • Opens space for larger ones • Small modules cause more fragmentation • Initial placement: from KAMER best fit • Perturb: • Can displace a module on the RFU • No change in start or end time • Can move an operation from CPU to RFU and vice versa • Use zero-temperature SA for fine tuning • Then place as many of the remaining (100−X)% modules as possible

  26. References • [Bobda07] C. Bobda, “Introduction to Reconfigurable Computing: Architectures, Algorithms and Applications,” Springer, 2007. • [Hauck08] S. Hauck, A. DeHon, “Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation,” Morgan-Kaufmann, 2007. • [Mehdipour06] F. Mehdipour, M. Saheb Zamani, M. Sedighi, “An integrated temporal partitioning and physical design framework for static compilation of reconfigurable computing systems,” Journal of Microprocessors and Microsystems, Elsevier, vol. 30, 2006, pp. 52–62. • [Bazargan00] K. Bazargan, R. Kastner, and M. Sarrafzadeh, “Fast Template Placement for Reconfigurable Computing Systems,” IEEE Design and Test, Special Issue on Reconfigurable Computing, January–March 2000. • [Bazargan] K. Bazargan, “Fast Placement Methods for Reconfigurable Computing Systems,” http://www.ece.umn.edu/~kia/Research/Pre2000/RCSPlace/ • [Ahmadinia04] A. Ahmadinia, C. Bobda, M. Bednara, and J. Teich, “A new approach for on-line placement on reconfigurable devices,” in Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS) / Reconfigurable Architectures Workshop (RAW), 2004.
