Physical Design for Reconfigurable Computing Systems using Firm Templates • K. Bazargan, R. Kastner, M. Sarrafzadeh • Department of Electrical & Computer Engineering, Northwestern University
Outline • FPGA: What and why? • What is Reconfigurable Computing System (RCS)? • Application example • RCS: System components • Online placement: problem definition and our approach • Offline placement and scheduling • Flexible modules and firm templates • Conclusion and future work
The Architecture of a Reconfigurable System [Block diagram labels: CPU, RFU, Data Memory, Instruction Memory (Program); signals: CPU instructions, RFUOPs, Control, Data]
Execution of a Sample Program • Code (each "=>" maps a statement to a node of the DFG): … • x = 3*a - b; (on CPU) • c = RFUOP1(x,5); (on RFU) • y = 4*x - c; • for (i=0;i<3;i++){ x+=RFUOP2(y); ++y; } (No room on RFU to run all in parallel ==> run in sequence) • z = RFUOP1(x,3); • a = z - y; • (in parallel) b = RFUOP3(a,b); • c = a - b; • … [Figure: DFG and RFU occupancy drawn on x, y, t axes]
Application Example: Image Restoration • The value of the center pixel in the next iteration: x_{k+1} = [?]*y + x_k - [?]*(d ** x_k), where [?] marks a coefficient symbol lost in extraction • y: the pixel value from the original degraded image • x_k: the pixel value from the previous iteration • d ** x_k denotes the weighted sum r1 * (eight neighbor pixels) + r0 * center pixel • 3x3 weight mask, row by row: [r1 r1 r1] [r1 r0 r1] [r1 r1 r1]
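A minimal Python sketch of the per-pixel update described on this slide. The coefficients multiplying y and (d ** x_k) are not legible in the slide text, so they appear here as two hypothetical parameters, `gain_y` and `gain_d`; the function names and the list-of-lists image layout are also assumptions made for illustration.

```python
def weighted_sum(img, r, c, r0, r1):
    """d ** x at interior pixel (r, c): r1 * (eight neighbours) + r0 * centre."""
    neighbours = sum(
        img[r + dr][c + dc]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    return r1 * neighbours + r0 * img[r][c]


def update_pixel(y, x_k, r, c, r0, r1, gain_y, gain_d):
    """One iteration for an interior pixel.

    gain_y and gain_d are hypothetical stand-ins for the two
    coefficients that are missing from the slide text.
    """
    return gain_y * y[r][c] + x_k[r][c] - gain_d * weighted_sum(x_k, r, c, r0, r1)
```

Border pixels are not handled here; in the segmented scheme on the following slides, the overlap around each segment would supply the missing neighbours.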
Image Restoration (cont.) • Incentive: • Processing of large images using FPGAs with limited resources • Strategy: • Segmentation of the image into smaller images suitable for the FPGA • Segments of size m x n are surrounded by an overlap of o. [Figure labels: m, n, o]
Image Restoration: Data Flow Strategy • Data flow strategy • Pixels of individual segments are restored in parallel by hardware. • Restored segments are written back after the overlap is discarded. [Figure labels: MEMORY, RFU, m, n, o]
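The segment-and-discard data flow can be sketched in Python as below. This is only an illustration under stated assumptions: m counts rows and n counts columns, the image is a list of lists, and `restore_segment` is a hypothetical callback standing in for the RFU, which receives a padded segment and returns a restored segment of the same size.

```python
def segments(height, width, m, n, o):
    """Yield (core, padded) boxes as (top, left, bottom, right), end-exclusive."""
    for top in range(0, height, m):
        for left in range(0, width, n):
            bottom, right = min(top + m, height), min(left + n, width)
            core = (top, left, bottom, right)
            # Grow the core by the overlap o, clipped at the image border.
            padded = (max(top - o, 0), max(left - o, 0),
                      min(bottom + o, height), min(right + o, width))
            yield core, padded


def restore_image(image, m, n, o, restore_segment):
    """Restore each padded segment, then write back only its core."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for (t, l, b, r), (pt, pl, pb, pr) in segments(height, width, m, n, o):
        padded = [row[pl:pr] for row in image[pt:pb]]
        restored = restore_segment(padded)  # hypothetical stand-in for the RFU
        for i in range(t, b):
            for j in range(l, r):
                # Discard the overlap: copy only pixels inside the core box.
                out[i][j] = restored[i - pt][j - pl]
    return out
```

Segments are processed sequentially here for clarity; the slide describes the hardware restoring the pixels of a segment in parallel.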
Image Restoration Example [Figure panels: Degraded Image, Restored Image]
System Components [Block diagram labels: CPU, RFU, Data Memory, Instruction Mem. (Prog.), Configuration Memory, Program Manager, RFU Manager, Cache Manager, Prefetch/Branch Prediction Unit, Placement Engine; signals: CPU instructions, RFUOPs, Control, Config. Bits, Data]
Online Placement: Problem Definition • Input: • RFU dimensions (W, H) • List of RFUOP events: (w, h, arrival, departure) • Output: • For each module, either • Rejected (not able to place) [penalty?] • Accepted: (x,y) [Figure labels: arrival, departure, accepted, rejected]
Online Placement [Figure: current placement + new module to be inserted = ?] • When a new RFUOP arrives, • Is there enough room? • If yes, which location is best? • Previous work • Bin-packing heuristics (1-D) - O(n^2) • First Fit, Best Fit, Shelf, Look ahead, … • [Chazelle’83] The Bottom-Left heuristic. O(n^2) • [Healy-Creavin’97] O(n^2 lg n)
Our Online Placement • Our approach: • Divide the empty space into explicit “empty rectangles” • When a new RFUOP arrives • Is there enough room? (any ER large enough?) • If yes, which location is best? (which ER is best?) • Packing rule • Best Fit, Bottom Left, First Fit
Heuristics for Choosing an Empty Rectangle [Figure: current placement with empty rectangles A and B and points P1, P2, + new module to be inserted = ?] • FF (First Fit): Any of A or B could be chosen for placing the new module. • BL (Bottom Left): Chooses the empty rectangle that is closer to the bottom left. In the figure, y(P2) < y(P1), so choose B. • BF (Best Fit): Places the new module in the empty rectangle that causes less wasted space. In the figure, Area( ) < Area( ) [arguments lost in extraction], so choose A.
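The three packing rules can be illustrated with a short Python sketch. The `Rect` type, the function name, and the tie-breaking choices are assumptions for this example: BL compares y first and then x, and BF measures wasted space as empty-rectangle area minus module area. It scans a plain list rather than the range tree mentioned on the next slide.

```python
from typing import List, NamedTuple, Optional


class Rect(NamedTuple):
    x: int  # bottom-left corner
    y: int
    w: int
    h: int


def choose_empty_rect(empty_rects: List[Rect], w: int, h: int,
                      rule: str) -> Optional[Rect]:
    """Pick an empty rectangle for a w x h module, or None if it is rejected."""
    candidates = [er for er in empty_rects if er.w >= w and er.h >= h]
    if not candidates:
        return None  # no ER is large enough: the module is rejected
    if rule == "FF":  # First Fit: first rectangle that is large enough
        return candidates[0]
    if rule == "BL":  # Bottom Left: lowest, then leftmost (tie-break assumed)
        return min(candidates, key=lambda er: (er.y, er.x))
    if rule == "BF":  # Best Fit: least wasted area
        return min(candidates, key=lambda er: er.w * er.h - w * h)
    raise ValueError(f"unknown rule: {rule}")
```

With A = Rect(0, 5, 4, 4) and B = Rect(6, 0, 10, 10) and a 3 x 3 module, BL picks B because it is lower, while BF picks A because it is smaller.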
Our Online Placement • Our approach: • Divide the empty space into explicit “empty rectangles” • When a new RFUOP arrives • Is there enough room? (any ER large enough?) • If yes, which location is best? (which ER is best?) • Managing the empty space • Keep empty rectangles explicitly, use “range tree” to store/access empty rects. • Efficient use of RFU real estate • KAMER: Keep all O(n^2) maximal empty rectangles • Fast but sub-optimal • Keep only O(n) empty rectangles • Shorter Seg. (SSEG), Square Empty Rects. (SQR), ...
Heuristics for Choosing a Segment [Figure: the two candidate segments S1 and S2; S1 creates empty rectangles A and B, S2 creates C and D] • SSEG (Shorter Seg): Chooses the shorter of the two segments. Figure condition: S1 < S2. • BER (Balanced Empty Rects): Chooses the segment which creates less area difference. Figure condition: Area(B) - Area(A) > Area(D) - Area(C). • LSQR (Larger Rect Square): Chooses the segment which creates the larger rectangle closer to square. Figure condition: AspectRatio(B) > AspectRatio(D). • LSEG (Longer Seg): Chooses the longer of the two segments. Figure condition: S1 < S2. • LER (Large Empty Rects): Chooses the segment which creates the larger empty rectangle. Figure condition: Area(B) > Area(D). • SQR (Square Rects): Chooses the segment which creates empty rectangles closer to squares. Figure condition: Max{AR(A),AR(B)} < Max{AR(C),AR(D)}, where AR = AspectRatio.
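A hedged Python sketch of three of these rules (SSEG, LSEG, LER). It assumes the module sits in the bottom-left corner of the chosen empty rectangle and that the leftover L-shaped region is cut by a segment running from the module's top-right corner either upward ("vertical") or rightward ("horizontal"). That geometry, the `Rect` type, and the function names are assumptions for illustration and are not taken from the slides.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int  # bottom-left corner
    y: int
    w: int
    h: int


def area(r):
    return r.w * r.h


def split_options(er, w, h):
    """Both ways to cut the L-shape left by a w x h module in er's corner."""
    vertical = [Rect(er.x, er.y + h, w, er.h - h),        # above the module
                Rect(er.x + w, er.y, er.w - w, er.h)]     # right, full height
    horizontal = [Rect(er.x + w, er.y, er.w - w, h),      # right of the module
                  Rect(er.x, er.y + h, er.w, er.h - h)]   # above, full width
    # Each option is (segment length, resulting empty rectangles).
    return {"vertical": (er.h - h, vertical),
            "horizontal": (er.w - w, horizontal)}


def choose_split(er, w, h, rule="SSEG"):
    options = split_options(er, w, h)
    if rule == "SSEG":    # shorter segment
        key = lambda name: options[name][0]
    elif rule == "LSEG":  # longer segment
        key = lambda name: -options[name][0]
    elif rule == "LER":   # split that yields the largest empty rectangle
        key = lambda name: -max(area(r) for r in options[name][1])
    else:
        raise ValueError(f"unknown rule: {rule}")
    best = min(options, key=key)
    # Drop degenerate rectangles produced by an exact fit.
    return best, [r for r in options[best][1] if area(r) > 0]
```

Keeping only the two rectangles of one split is what bounds the number of empty rectangles at O(n), at the cost of losing some placements that the maximal empty rectangles would still allow.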
How Good is a Placement? • Acceptance rate • Percentage of modules accepted (placed) • Volume penalty • Area: complexity • Time-span in the system: loop iterations • Penalty of rejecting a module: penalty = volume = area * time • Input data • Randomly generated dimensions • Randomly generated enter/leave times
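As a small worked example of the two metrics, the Python sketch below computes the acceptance rate and the total volume penalty. It assumes events are (w, h, arrival, departure) tuples and that time is departure minus arrival; the function name and the `accepted` flag list are hypothetical.

```python
def placement_metrics(events, accepted):
    """Return (acceptance rate in percent, total volume penalty).

    events:   list of (w, h, arrival, departure) tuples
    accepted: list of booleans, one per event
    """
    rate = 100.0 * sum(accepted) / len(events)
    # A rejected module costs its volume: area * time spent in the system.
    penalty = sum(w * h * (departure - arrival)
                  for (w, h, arrival, departure), ok in zip(events, accepted)
                  if not ok)
    return rate, penalty
```

For instance, rejecting a 5 x 5 module that would have stayed for 2 time units adds 50 to the penalty, while rejecting a 1 x 1 module that stays 1 unit adds only 1.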
Program snapshot
Online Placement Results Percentage of accepted modules using different bin-packing and empty space partitioning rules
3-D Floorplanning [Figure: a DFG, its schedule on the CPU and the RFU, and the RFU drawn as a volume with axes x, y (RFU area) and t (time)]
3-D Floorplanning (cont.) • By deleting this RFUOP (the CPU performs the operation)... • ...this RFUOP can be moved on the RFU. • These RFUOPs can be performed earlier...
Our Current 3-D Floorplanners • No change in the schedule • Fixed insertion and deletions of RFUOPs • Annealing based • Move set • Move operation from CPU set to RFU set • Move operation from RFU set to CPU set • Displace an already placed RFUOP on the RFU • Cost function • Penalty for rejecting modules (sum of volumes of the RFUOPs in the CPU set) • No overlap allowed during annealing • Greedy • Sort the modules by decreasing volume, apply KAMER
Our Current 3-D Floorplanners (cont.) • KAMER-BF-Decreasing • Sort the modules on their volumes • Use KAMER to find a fast placement of the modules • Low-temp. annealing (LTSA) • Similar to KAMER-BFD, but use KAMER to place only the X% largest modules • Use low-temp annealing to place the rest • Zero-temp. annealing (ZTSA) -- Greedy • Use KAMER to place as many modules as you can • Use only displace and move from CPU to RFU annealing moves.
Our Current 3-D Floorplanners (cont.) • BFOP - Best Fit Online Placement • Sort the RFUOPs on volume (decreasing) • For each RFUOP, find candidate “corners” • Choose the corner which results in min wasted area (similar to the well-studied 2-D Bin Packing problem) [Figure: candidate corners on floor A, the floor corresponding to time t1, with axes x, y, t]
Annealing-Based Offline vs. Online • Percentage of accepted modules and penalties using two offline parameters. The higher the RFU acceptance rate and the lower the penalty, the better the algorithm.
Flexible Modules • Library of soft templates • Flexible shapes • Constant area, different width and height • Problem? Hard to build (physical design (PD) must be done for each shape) • Median • Use the same area, but square shape • Rotation • Placement method • Use best shape (min wasted area)
Using Flexible Modules in BFOP Median uses a square module with the same area
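The shape options above can be sketched in Python as follows. This is an illustration under assumptions: shapes are restricted to integer width and height, the median square rounds its side up when the area is not a perfect square, and "best shape" is read as the (shape, empty rectangle) pair with the least wasted area. None of these details is spelled out in the slides, and all function names are hypothetical.

```python
import math


def shape_variants(w, h):
    """All integer (width, height) shapes with the same area, rotations included."""
    a = w * h
    return sorted((d, a // d) for d in range(1, a + 1) if a % d == 0)


def median_shape(w, h):
    """Square with the same area; the side is rounded up (assumption)."""
    a = w * h
    side = math.isqrt(a)
    if side * side < a:
        side += 1
    return side, side


def best_shape_and_rect(empty_rects, shapes):
    """Return (waste, shape, empty_rect) with minimum waste, or None if nothing fits."""
    best = None
    for w, h in shapes:
        for W, H in empty_rects:  # empty rectangles given as (width, height)
            if w <= W and h <= H:
                waste = W * H - w * h
                if best is None or waste < best[0]:
                    best = (waste, (w, h), (W, H))
    return best
```

A 2 x 6 module has six integer shapes of area 12; given empty rectangles of 5 x 3 and 2 x 7, the least waste comes from keeping the 2 x 6 shape in the 2 x 7 rectangle.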
Flexible Modules (cont.) • “Firm” templates • Slice the module into x horizontal or vertical strips • If the module cannot be placed whole, try the 2-split, 3-split, … until it fits. • Problem? • Routing! • Only limited module types can be split (like carry chains, etc., with min communication between stages) [Figure: vertical 3-split]
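The "keep splitting until it fits" idea can be sketched in Python as below. This is a simplified illustration, not the authors' placer: strips are made as equal as integer sizes allow, each strip is placed first-fit, and each empty rectangle holds at most one strip. The function names and the `max_split` limit are hypothetical.

```python
def split_module(w, h, k, vertical):
    """Slice a w x h module into k strips; vertical strips divide the width."""
    size = w if vertical else h
    if k > size:
        return None  # cannot cut more strips than there are unit columns/rows
    base, extra = divmod(size, k)
    parts = [base + (1 if i < extra else 0) for i in range(k)]
    return [(p, h) if vertical else (w, p) for p in parts]


def try_place(strips, empty_rects):
    """First-fit each strip into its own empty rectangle (simplifying assumption)."""
    free = list(empty_rects)  # empty rectangles given as (width, height)
    chosen = []
    for sw, sh in strips:
        for i, (W, H) in enumerate(free):
            if sw <= W and sh <= H:
                chosen.append(free.pop(i))
                break
        else:
            return None  # this strip fits nowhere
    return chosen


def place_firm(w, h, empty_rects, max_split=6):
    """Try no split, then the 2-split, 3-split, ... until the module fits."""
    for k in range(1, max_split + 1):
        for vertical in (True, False):
            strips = split_module(w, h, k, vertical)
            if strips is None:
                continue
            rects = try_place(strips, empty_rects)
            if rects is not None:
                return k, vertical, list(zip(strips, rects))
    return None  # rejected even after max_split strips
```

The sketch ignores the routing cost between strips, which the slide names as the main problem with firm templates.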
Conclusion • Which online algorithm? • If speed is an issue, SSEG; otherwise KAMER • Online or offline? • If you have the schedule => offline • Which offline algorithm? • BFOP is the best (faster + better quality) • Median? Flexibility? Firm templates? • Surprisingly, median gives little improvement • If flexible shapes are available, they are better than splitting (no additional routing problem) • How many splits? • no-split to 2-split: 23% improvement • 5-split to 6-split: 3% improvement