350 likes | 508 Views
Congestion-Driven Re-Clustering for Low-cost FPGAs. MASc Examination Darius Chiu Supervisor: Dr. Guy Lemieux University of British Columbia Department of Electrical and Computer Engineering Vancouver, BC, Canada. Outline. Motivation and background Algorithm Results Conclusion
E N D
Congestion-Driven Re-Clustering for Low-cost FPGAs MASc Examination Darius Chiu Supervisor: Dr. Guy Lemieux University of British Columbia Department of Electrical and Computer Engineering Vancouver, BC, Canada
Outline • Motivation and background • Algorithm • Results • Conclusion • Future Work
Example: Unroutable • Situation: Run circuit through VPR • Circuit is unroutable at specified target channel-width
Example: Unroutable • Situation: Run circuit through VPR • Circuit is unroutable at specified target channel-width • Only localized area is actually unroutable • Routing congestion happens locally
Motivation • Goal: Must meet hard channel-width constraint • Number of routing tracks is fixed at manufacture time • Must meet channel-width constraint everywhere on the FPGA • Presented with an unroutable circuit • Increase available interconnect (use larger device) • More interconnect everywhere = more expensive FPGA device • Decrease local interconnect demand • Create more aggregate interconnect for congested regions only
Reclustering Congested Regions • Find congested regions and reduce routing demand • Increase CLB usage to spread interconnect usage • Controlled tradeoff between CLB and interconnect usage • Cost savings: can use the same FPGA, just need to recluster
Un/DoPack CAD Flow • Previous work by Marvin Tom [ICCAD2006] • Target a channel width constraint • Spread regional logic to reduce local routing demand • Identify congested local regions • Iteratively recluster, replace, reroute • Whitespace insertion: recluster with reduced cluster size • Leave uncongested regions alone
Background: Un/DoPack VPR Cluster Place Route Target CW Met? Re-cluster Identify NO YES Place Route Un/DoPack
Contributions • Improve Reduce/Area tradeoff of Un/DoPack Flow • Simultaneous area and runtime savings • Use congestion information to perform better reclustering • New approach to selecting congested regions • Use of interconnect-demand model to determine how much to spread logic • Findings • Up to 5x runtime speedup versus Baseline • Up to 25% area savings versus Baseline
Contributions • Recently accepted to FPT 2009 • D. Chiu, G. Lemieux, S. Wilton, “Congestion-Driven Re-Clustering for Low-cost FPGAs”
Previous Depopulation Schemes • Single versus Multiregion: Region Selection: • Select all CLBs in area centered on most congested CLB (Single Region) • Select all CLBs in area centered on most congested CLB, not already chosen (Multiregion) • Whitespace insertion: • Baseline: insert CLB in 1 row and 1 column in FPGA • Fine-grained : insert CLB in 1 row and 1 column in region • Excellent area tradeoff, but slow • Multiregion: insert CLBs proportional to congestion • Good runtime performance
Algorithm • Region Selection: Try to select regions more intelligently • Capture congested regions instead of just CLBs • Whitespace Insertion: Try to estimate appropriate cluster size • Use interconnect demand model to predict outcome for depopulation
Un/DoPack VPR Cluster Place Route Whitespace Insertion Region selection Target CW Met? Re-cluster Identify NO YES Place Route Un/DoPack
Benchmark Circuits • Metacircuits designed to emulate large SOC circuits • Cluster size 16 • Built using benchmark generator GNL • Large circuit composed of smaller subcircuits (SoC style) • Each subcircuit emulates the interconnect complexity (Rent parameter) of individual MCNC circuits • The standard deviation of the rent parameter is varied to create benchmark suite
Region Selection • Find congested regions • Post Routing congestion information • Center region on most congested CLB
Region Selection • Use congestion values to generate direction to move region
Region Selection • Binary Search • Find region with highest average congestion
Region Selection • Mark Next Region • Sort by average congestion and depopulate in sorted order
Budgeted Multiregion Un/DoPack (BMR) • Multiple region approach • Grow number of CLBs according to budget at each iteration • Number of CLBs in a row and column of the FPGA • Each region grows equal to 1 row and 1 column of region
Adding Whitespace • Congestion-Model Driven • Use interconnect demand information to estimate how much whitespace to add • Interconnect Model • Estimate post clustering channel width for region
Regional Interconnect • Adapt demand model for regions of CLBs instead of whole FPGA • Most wiring is from inside the region • Cannot affect wiring across region directly through depopulation
Regional Interconnect • Assume external interconnect demand stays fixed • Solve for internal interconnect demand region interconnect demand = Internal demand + external demand
Interconnect-Demand Model where • W. Fang and J. Rose. “Modeling routing demand for early-stage FPGA architecture development”
Interconnect-Demand Model • Use congestion map to determine equation constants • Calibrate equation separately for each region • Solve for lambda that gives desired channel width • Re-cluster region with lower cluster size until lambda target is met
Congestion-Model Multiregion Un/DoPack (CMR) • Same region selection method as BMR • No constraint on new CLBs in each iteration • Whitespace insertion using model
Results • Typical results • Stdev004 • CMR Speedup comparable to Multiregion Un/DoPack • BMR Slightly faster than Baseline
Results • Typical results • Stdev004 • BMR area better than Multiregion • CMR area better than Multiregion
Runtime / Area Tradeoff • Previous Multiregion Approach (Fast) • Previous Fine-Grained Approach (Good Area) • Speed-Area Tradeoff
Runtime / Area Tradeoff • BMR • Improved runtime • Good area performance
Runtime / Area Tradeoff • CMR • Better runtime • Good area
Congestion Driven Placement • We can further improve area performance using a congestion driven placer
Conclusions • Use congestion information to perform better re-clustering • Up to 5x runtime speedup versus baseline • Up to 25% area savings versus baseline • Improve Reduce/Area tradeoff of Un/DoPack Flow • Simultaneous area and runtime savings
Future Work • Consider effect of neighboring Regions • Other congestion-driven tools • Fast Placement