Performance-Optimal Clustering with Retiming for Sequential Circuits

Tzu-Chieh Tien and Youn-Long Lin, Department of Computer Science, National Tsing Hua University, Hsin-Chu, Taiwan, R.O.C.

Presentation Transcript


  1. Performance-Optimal Clustering with Retiming for Sequential Circuits Tzu-Chieh Tien and Youn-Long Lin Department of Computer Science National Tsing Hua University Hsin-Chu, Taiwan, R.O.C. NTHU-CS

  2. Outline • Introduction • Previous Work • Proposed Approach • Experimental Results • Conclusion and Future Research

  3. Retiming [Figure: example circuit with gate delays 3, 5, 2, 1; critical path delay = 8 before retiming and 7 after retiming]
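The effect of retiming on the critical path can be sketched in a few lines. The circuit below is an assumption, since the slide's figure is not recoverable from the transcript: four gates with delays 3, 5, 2, 1 connected in a ring with two flip-flops, chosen only because it reproduces the stated critical path delays of 8 and 7. The helper name `clock_period` and the edge format `(u, v, ffs)` are hypothetical.

```python
from functools import lru_cache


def clock_period(delay, edges):
    """Longest gate-delay sum along any path that crosses no flip-flop.

    delay: dict mapping gate name -> gate delay
    edges: list of (u, v, ffs), where ffs is the number of FFs on edge u -> v
    Assumes the FF-free part of the circuit is acyclic.
    """
    comb_fanout = {v: [] for v in delay}
    for u, v, ffs in edges:
        if ffs == 0:
            comb_fanout[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(u):
        # Delay of u plus the longest FF-free continuation, if any.
        return delay[u] + max((longest_from(v) for v in comb_fanout[u]), default=0)

    return max(longest_from(v) for v in delay)


# Hypothetical ring A -> B -> C -> D -> A with two flip-flops.
delay = {"A": 3, "B": 5, "C": 2, "D": 1}
before = [("A", "B", 0), ("B", "C", 1), ("C", "D", 0), ("D", "A", 1)]
# Retimed: each FF is moved backward across one gate; the ring still holds two FFs.
after = [("A", "B", 1), ("B", "C", 0), ("C", "D", 1), ("D", "A", 0)]

print(clock_period(delay, before))  # 8 (path A -> B)
print(clock_period(delay, after))   # 7 (path B -> C)
```

Retiming leaves the number of flip-flops on every cycle unchanged, which is why both edge lists carry two flip-flops around the ring.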

  4. Performance-Driven Clustering • Minimize the clock period under a cluster-size constraint [Figure: example circuit with gate delays 3, 5, 2, 1]

  5. Combining Clustering and Retiming • Inter-cluster delay = 2 • Clustering without retiming consideration: critical path delay = 8 • Clustering with retiming consideration: critical path delay = 7 [Figure: the two clusterings of the example circuit with gate delays 3, 5, 2, 1]

  6. Problem Definition • Given: a sequential circuit G, a target clock period c, and an area bound M • Find: a clustered, retimed, node-replicated circuit Gr such that its clock period is less than or equal to c and each cluster is of size M or less

  7. Previous Work • P. Pan, A. K. Karandikar, and C. L. Liu, “Optimal Clock Period Clustering for Sequential Circuits with Retiming,” IEEE T-CAD, June 1998. • Optimal under the unit gate delay model • Near-optimal for the general gate delay model • J. Cong, H. Li, and C. Wu, “Simultaneous Circuit Partitioning/Clustering with Retiming for Performance Optimization,” DAC’99. • 100X more efficient but still near-optimal

  8. This Work • Optimal for the general gate delay model • 2X more efficient than Pan’s approach

  9. Pan’s Approach • Label each node v with an l-value, l(v) • Find a clustered, retimed circuit such that the l-values of all POs are less than or equal to c • Derive a retiming solution • The resulting clock period is less than c + max. gate delay

  10. Pan’s l-value of a Node • l(v): total w1 edge weight of the longest path from the PIs to the node • w1 weight of an edge e from u to v: w1(e) = -c * w(e) + d(v) • w(e): number of FFs along e • d(v): delay of node v [Figure: example with target c = 6 and gate delays 2, 5, 3; w1(e): 2, -1, 3, 0; l(v): 0, 2, 1, 4; 4 < 6]
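The edge-weight formula on this slide can be checked with a short worked example. This is a minimal sketch of the slide's simplified definition only (it ignores clustering and inter-cluster delay). The circuit is assumed to be a single chain PI -> gate(2) -> gate(5) -> gate(3) -> PO with one flip-flop on the second edge; that placement is an assumption chosen because it reproduces the w1 values shown on the slide. The function names are hypothetical.

```python
def w1(c, ffs, sink_delay):
    """Edge weight from the slide: w1(e) = -c * w(e) + d(v)."""
    return -c * ffs + sink_delay


def l_values_on_chain(c, edges):
    """Accumulate w1 along a chain; the primary input starts at l = 0.

    edges: list of (ffs_on_edge, delay_of_sink_node) in path order.
    On a chain the longest path to a node is simply the running sum.
    """
    labels = [0]
    for ffs, sink_delay in edges:
        labels.append(labels[-1] + w1(c, ffs, sink_delay))
    return labels


c = 6
# (FFs on edge, delay of the node the edge enters); the PO has delay 0.
chain = [(0, 2), (1, 5), (0, 3), (0, 0)]

weights = [w1(c, ffs, d) for ffs, d in chain]
labels = l_values_on_chain(c, chain)

print(weights)         # [2, -1, 3, 0]
print(labels)          # [0, 2, 1, 4, 4]
print(labels[-1] < c)  # True: the PO label 4 is below the target 6
```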

  11. Pan’s l-value Labeling • Traverse the whole circuit, updating l-values, until no node’s l-value changes • Time complexity
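The "sweep until nothing changes" idea can be sketched as a fixed-point iteration over all edges. This is an illustration of the traversal pattern only, using the simplified weight w1(e) = -c * w(e) + d(v) and ignoring clustering; it is not Pan's actual procedure. The sweep limit of |V| is an assumption borrowed from standard longest-path relaxation: if labels still change after |V| sweeps, some cycle has positive total weight and the target c cannot be met. The function name and the example circuit are hypothetical.

```python
def label_by_sweeps(nodes, edges, delay, c, primary_inputs):
    """Relax every edge repeatedly until no l-value changes.

    edges: list of (u, v, ffs). Returns a dict of l-values,
    or None if the labels keep growing (target c is infeasible).
    """
    neg_inf = float("-inf")
    l = {v: neg_inf for v in nodes}
    for pi in primary_inputs:
        l[pi] = 0

    for _ in range(len(nodes)):
        changed = False
        for u, v, ffs in edges:
            if l[u] == neg_inf:
                continue  # u not reached from any PI yet
            candidate = l[u] - c * ffs + delay[v]
            if candidate > l[v]:
                l[v] = candidate
                changed = True
        if not changed:
            return l
    return None  # still changing after |V| sweeps: positive-weight cycle


# Hypothetical circuit: PI a feeds g1 -> g2 -> g3, with feedback g3 -> g1.
nodes = ["a", "g1", "g2", "g3"]
delay = {"a": 0, "g1": 2, "g2": 5, "g3": 3}
edges = [("a", "g1", 0), ("g1", "g2", 1), ("g2", "g3", 0), ("g3", "g1", 1)]

print(label_by_sweeps(nodes, edges, delay, 6, ["a"]))
# {'a': 0, 'g1': 2, 'g2': 1, 'g3': 4}
print(label_by_sweeps(nodes, edges, delay, 3, ["a"]))
# None: the loop has delay 10 and 2 FFs, so c = 3 is too small
```

Every sweep visits every edge, including edges whose source label did not change in the previous sweep.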

  12. Our Approach • Modified l-value definition • Optimal for the general gate delay model • Based on W.-J. Chen, “A Study on the Relationship Between Retiming and Loop Folding,” Master’s thesis, National Tsing Hua Univ., Taiwan, R.O.C., Aug. 1994. • FIFO queue to guide circuit traversal during labeling • Improves run time • Time complexity
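One way a FIFO can guide the traversal is a worklist: only the fanouts of nodes whose label just changed are revisited, instead of sweeping the whole circuit. The sketch below is an assumption about how such a queue might be used, not the authors' implementation. It uses the simplified weight w1(e) = -c * w(e) + d(v) rather than the modified l-value definition, and the enqueue-count limit used to detect an infeasible target is borrowed from standard queue-based longest-path relaxation. All names are hypothetical.

```python
from collections import deque


def label_with_fifo(nodes, edges, delay, c, primary_inputs):
    """Queue-driven labeling: revisit only fanouts of updated nodes.

    edges: list of (u, v, ffs). Returns a dict of l-values,
    or None if some node is enqueued more than |V| times
    (labels keep growing, so target c is infeasible).
    """
    fanout = {v: [] for v in nodes}
    for u, v, ffs in edges:
        fanout[u].append((v, ffs))

    l = {v: float("-inf") for v in nodes}
    enqueued = {v: 0 for v in nodes}
    queue = deque()
    in_queue = set()

    for pi in primary_inputs:
        l[pi] = 0
        queue.append(pi)
        in_queue.add(pi)
        enqueued[pi] += 1

    while queue:
        u = queue.popleft()
        in_queue.discard(u)
        for v, ffs in fanout[u]:
            candidate = l[u] - c * ffs + delay[v]
            if candidate > l[v]:
                l[v] = candidate
                if v not in in_queue:
                    enqueued[v] += 1
                    if enqueued[v] > len(nodes):
                        return None
                    queue.append(v)
                    in_queue.add(v)
    return l


# Hypothetical circuit: PI a feeds g1 -> g2 -> g3, with feedback g3 -> g1.
nodes = ["a", "g1", "g2", "g3"]
delay = {"a": 0, "g1": 2, "g2": 5, "g3": 3}
edges = [("a", "g1", 0), ("g1", "g2", 1), ("g2", "g3", 0), ("g3", "g1", 1)]

print(label_with_fifo(nodes, edges, delay, 6, ["a"]))
# {'a': 0, 'g1': 2, 'g2': 1, 'g3': 4}
print(label_with_fifo(nodes, edges, delay, 3, ["a"]))
# None
```

On this small circuit with c = 6, each node is dequeued exactly once, so each edge is examined once.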

  13. Modified l-value Labeling • If an FF’s position is occupied by a gate v, • detected by [Figure: example with target c = 6 and gate delays 2, 5, 3; l(v): 0, 2, 1, 5, 8; 8 > 6]

  14. Example (target c = 7, inter-cluster delay = 2) [Figure: example circuit annotated with gate delays and l(v) labels]

  15. Example (Cont’) (target c = 7, inter-cluster delay = 2) [Figure: the example circuit after clustering, after connecting and retiming, and after merging]

  16. Example (target c = 6, inter-cluster delay = 2) [Figure: example circuit annotated with gate delays and l(v) labels]

  17. Example of Pan’s Approach (target c = 6, inter-cluster delay = 2) [Figure: example circuit annotated with gate delays and l(v) labels]

  18. Example of Pan’s Approach (Cont’) (target c = 6, inter-cluster delay = 2) [Figure: the example circuit after clustering, after connecting and retiming, and after merging]

  19. Experimental Results • 26 ISCAS-89 benchmark circuits • Pan’s approach produces suboptimal results for 11 circuits • Our approach produces an optimal result for every circuit • Our CPU time is 50% of Pan’s

  20. Conclusion and Future Research • First exact algorithm for performance-optimal clustering with retiming under the general gate delay model • Twice as fast as Pan’s near-optimal heuristic • Future research: further improve run-time efficiency

  24. Experimental Results

  25. Experimental Results (Cont’)
