1 / 35

Congestion-Driven Re-Clustering for Low-cost FPGAs

Congestion-Driven Re-Clustering for Low-cost FPGAs. MASc Examination Darius Chiu Supervisor: Dr. Guy Lemieux University of British Columbia Department of Electrical and Computer Engineering Vancouver, BC, Canada. Outline. Motivation and background Algorithm Results Conclusion

akira
Download Presentation

Congestion-Driven Re-Clustering for Low-cost FPGAs

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Congestion-Driven Re-Clustering for Low-cost FPGAs MASc Examination Darius Chiu Supervisor: Dr. Guy Lemieux University of British Columbia Department of Electrical and Computer Engineering Vancouver, BC, Canada

  2. Outline • Motivation and background • Algorithm • Results • Conclusion • Future Work

  3. Example: Unroutable • Situation: Run circuit through VPR • Circuit is unroutable at specified target channel-width

  4. Example: Unroutable • Situation: Run circuit through VPR • Circuit is unroutable at specified target channel-width • Only localized area is actually unroutable • Routing congestion happens locally

  5. Motivation • Goal: Must meet hard channel-width constraint • Number of routing tracks is fixed at manufacture time • Must meet channel-width constraint everywhere on the FPGA • Presented with an unroutable circuit • Increase available interconnect (use larger device) • More interconnect everywhere = more expensive FPGA device • Decrease local interconnect demand • Create more aggregate interconnect for congested regions only

  6. Reclustering Congested Regions • Find congested regions and reduce routing demand • Increase CLB usage to spread interconnect usage • Controlled tradeoff between CLB and interconnect usage • Cost savings: can use the same FPGA, just need to recluster

  7. Un/DoPack CAD Flow • Previous work by Marvin Tom [ICCAD2006] • Target a channel width constraint • Spread regional logic to reduce local routing demand • Identify congested local regions • Iteratively recluster, replace, reroute • Whitespace insertion: recluster with reduced cluster size • Leave uncongested regions alone

  8. Background: Un/DoPack VPR Cluster Place Route Target CW Met? Re-cluster Identify NO YES Place Route Un/DoPack

  9. Contributions • Improve Reduce/Area tradeoff of Un/DoPack Flow • Simultaneous area and runtime savings • Use congestion information to perform better reclustering • New approach to selecting congested regions • Use of interconnect-demand model to determine how much to spread logic • Findings • Up to 5x runtime speedup versus Baseline • Up to 25% area savings versus Baseline

  10. Contributions • Recently accepted to FPT 2009 • D. Chiu, G. Lemieux, S. Wilton, “Congestion-Driven Re-Clustering for Low-cost FPGAs”

  11. Previous Depopulation Schemes • Single versus Multiregion: Region Selection: • Select all CLBs in area centered on most congested CLB (Single Region) • Select all CLBs in area centered on most congested CLB, not already chosen (Multiregion) • Whitespace insertion: • Baseline: insert CLB in 1 row and 1 column in FPGA • Fine-grained : insert CLB in 1 row and 1 column in region • Excellent area tradeoff, but slow • Multiregion: insert CLBs proportional to congestion • Good runtime performance

  12. Algorithm • Region Selection: Try to select regions more intelligently • Capture congested regions instead of just CLBs • Whitespace Insertion: Try to estimate appropriate cluster size • Use interconnect demand model to predict outcome for depopulation

  13. Un/DoPack VPR Cluster Place Route Whitespace Insertion Region selection Target CW Met? Re-cluster Identify NO YES Place Route Un/DoPack

  14. Benchmark Circuits • Metacircuits designed to emulate large SOC circuits • Cluster size 16 • Built using benchmark generator GNL • Large circuit composed of smaller subcircuits (SoC style) • Each subcircuit emulates the interconnect complexity (Rent parameter) of individual MCNC circuits • The standard deviation of the rent parameter is varied to create benchmark suite

  15. Region Selection • Find congested regions • Post Routing congestion information • Center region on most congested CLB

  16. Region Selection • Use congestion values to generate direction to move region

  17. Region Selection • Binary Search • Find region with highest average congestion

  18. Region Selection • Mark Next Region • Sort by average congestion and depopulate in sorted order

  19. Budgeted Multiregion Un/DoPack (BMR) • Multiple region approach • Grow number of CLBs according to budget at each iteration • Number of CLBs in a row and column of the FPGA • Each region grows equal to 1 row and 1 column of region

  20. Adding Whitespace • Congestion-Model Driven • Use interconnect demand information to estimate how much whitespace to add • Interconnect Model • Estimate post clustering channel width for region

  21. Regional Interconnect • Adapt demand model for regions of CLBs instead of whole FPGA • Most wiring is from inside the region • Cannot affect wiring across region directly through depopulation

  22. Regional Interconnect • Assume external interconnect demand stays fixed • Solve for internal interconnect demand region interconnect demand = Internal demand + external demand

  23. Interconnect-Demand Model where • W. Fang and J. Rose. “Modeling routing demand for early-stage FPGA architecture development”

  24. Interconnect-Demand Model • Use congestion map to determine equation constants • Calibrate equation separately for each region • Solve for lambda that gives desired channel width • Re-cluster region with lower cluster size until lambda target is met

  25. Congestion-Model Multiregion Un/DoPack (CMR) • Same region selection method as BMR • No constraint on new CLBs in each iteration • Whitespace insertion using model

  26. Results • Typical results • Stdev004 • CMR Speedup comparable to Multiregion Un/DoPack • BMR Slightly faster than Baseline

  27. Results • Typical results • Stdev004 • BMR area better than Multiregion • CMR area better than Multiregion

  28. Runtime / Area Tradeoff • Previous Multiregion Approach (Fast) • Previous Fine-Grained Approach (Good Area) • Speed-Area Tradeoff

  29. Runtime / Area Tradeoff • BMR • Improved runtime • Good area performance

  30. Runtime / Area Tradeoff • CMR • Better runtime • Good area

  31. Critical Path

  32. Congestion Driven Placement • We can further improve area performance using a congestion driven placer

  33. Conclusions • Use congestion information to perform better re-clustering • Up to 5x runtime speedup versus baseline • Up to 25% area savings versus baseline • Improve Reduce/Area tradeoff of Un/DoPack Flow • Simultaneous area and runtime savings

  34. Future Work • Consider effect of neighboring Regions • Other congestion-driven tools • Fast Placement

  35. Questions?

More Related