This paper presents a new bipartitioning algorithm that aims to minimize both cutsize and path delay in performance-driven partitioning. The algorithm incorporates the concept of locally unidirectional partitioning to reduce implementation overheads. Experimental results demonstrate the effectiveness of the proposed approach.
Local Unidirectional Bias for Smooth Cutsize-delay Tradeoff in Performance-driven Partitioning Andrew B. Kahng and Xu Xu UCSD CSE and ECE Depts. Work supported in part by MARCO GSRC
Outline • Motivation • Performance driven bipartition problem • New bipartitioning algorithm • Experimental results • Conclusion and future work
Partitioning and Performance The hypergraph partitioning problem is to divide the nodes of a hypergraph into roughly equal parts; the traditional objective is to minimize cutsize. In performance-driven partitioning, we also seek to minimize path delay on timing paths.
Previous Work (I)
• [Cong et al. ISPD-2002]
• Global clustering based algorithm with retiming
• Flow: min-delay clustering w/ retiming → min-cutsize clustering → de-clustering and refinement
• Reduces delay by 16% while increasing cutsize by 17%
• Requires substantial gate replication
Previous Work (II)
• [Ababei et al. ICCAD-2002]
• Reweighting based method
• Steps: global timing analysis, find critical paths, path based or net based reweighting of the input, then a cutsize oriented partitioner such as hMetis or MLPart
• 14% reduction of delay with 10% increase in cutsize
• 139% increase in runtime compared with hMetis
Motivating Questions • Can we avoid global timing analysis? • Global timing analysis is extremely time-consuming • Can we improve path delay without significantly degrading cutsize? • Need a smooth tradeoff between delay and cutsize • Can we reduce implementation overheads? • Previous methods store thousands of critical paths and continuously update them
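Delay Model
• Delay = hop_delay + node_delay, accumulated along a path: each hop (a crossing of the cut between Part 0 and Part 1) adds hop_delay, and each node adds node_delay
• [Cong et al. ISPD-2002]: hop_delay = 5, node_delay = 1; for the pictured path with 3 hops and 5 nodes, Delay = 3×5 + 5×1 = 20
• [Ababei et al. ICCAD-2002]: hop_delay = Elmore delay, node_delay = constant
• Figure: a path of FF nodes and combinational nodes crossing the cut between Part 0 and Part 1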
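As a minimal sketch of the delay arithmetic on this slide, the snippet below recomputes the 3-hop, 5-node example using the constants attributed to Cong et al. The function name `path_delay` and its parameter names are illustrative assumptions, not part of the original work.

```python
def path_delay(num_hops, num_nodes, hop_delay=5, node_delay=1):
    """Delay of one path: every cut crossing costs hop_delay,
    every node on the path costs node_delay (defaults: slide values)."""
    return num_hops * hop_delay + num_nodes * node_delay


# Slide example: 3 hops and 5 nodes.
print(path_delay(num_hops=3, num_nodes=5))  # 20
```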
Performance Driven Bipartition Problem
• Given:
  • Hypergraph H=(V,E)
  • Area balance tolerance s (0<s<1), a parameter that controls the allowable slack in the area constraint
  • a, a given parameter that captures the tradeoff between cutsize and path delay (hopcount)
• Find:
  • A bipartition (V0|V1) that satisfies the area constraint under tolerance s and minimizes a(cutsize) + (1-a)(Max_hopcount)
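To make the tradeoff concrete, here is a small sketch of the objective a(cutsize) + (1-a)(Max_hopcount). The function name `objective` is hypothetical, and the check that a lies in [0, 1] is an assumption suggested by the a / (1-a) weighting rather than something the slide states.

```python
def objective(cutsize, max_hopcount, a):
    """Weighted cost a*cutsize + (1 - a)*max_hopcount.
    Assumes 0 <= a <= 1 (not stated explicitly on the slide)."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a is assumed to lie in [0, 1]")
    return a * cutsize + (1.0 - a) * max_hopcount


# Two made-up candidate solutions: a small a favors the low-hopcount one.
low_cut = objective(cutsize=100, max_hopcount=5, a=0.05)   # 9.75
low_hops = objective(cutsize=105, max_hopcount=3, a=0.05)  # 8.1
print(low_hops < low_cut)
```

With a = 1 the objective reduces to plain cutsize minimization, and with a = 0 only the maximum hopcount matters.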
Unidirectional Partition
• Path delay is minimized, with hopcount = 1, if the partition is unidirectional (“acyclic”), that is, all cuts are in the same direction
• Problem:
  • High cutsize
  • No unidirectional solution
• Can we achieve a “locally unidirectional” partition?
• Figure: example partitions into Part 0 and Part 1 with Max hopcount = 5 and Max hopcount = 3
V-Shaped Nodes
• If a combinational node v satisfies: there exist vj, vt in the other part and a path from vj to vt that includes only v, then v is a V-shaped node
• Figure: path vj → v → vt crossing between Part 0 and Part 1
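The definition can be checked directly on a directed graph: a combinational node is V-shaped when it has at least one predecessor and at least one successor in the other part. The sketch below assumes a successor-list graph and a node-to-part dictionary; `v_shaped_nodes` and the tiny example graph are illustrative, not taken from the original work.

```python
def v_shaped_nodes(succ, part, combinational):
    """Return the combinational nodes v for which some path vj -> v -> vt
    exists with vj and vt both in the part that v is not in."""
    # Build predecessor lists from the successor lists.
    pred = {}
    for u, outs in succ.items():
        for w in outs:
            pred.setdefault(w, []).append(u)

    found = set()
    for v in combinational:
        other = 1 - part[v]
        has_in = any(part[u] == other for u in pred.get(v, []))
        has_out = any(part[w] == other for w in succ.get(v, []))
        if has_in and has_out:
            found.add(v)
    return found


# Hypothetical example: x -> v -> y and x -> u -> y.
succ = {"x": ["v", "u"], "v": ["y"], "u": ["y"]}
part = {"x": 0, "y": 0, "u": 0, "v": 1}
print(v_shaped_nodes(succ, part, {"x", "y", "u", "v"}))  # {'v'}
```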
V-Shaped Nodes in Critical Paths Empirical observations from study of partitioning solutions: • there are V-shaped nodes in the partitioning solutions • every V-shaped node is included in many critical paths • every critical path contains several V-shaped nodes For testcase 1: • Number of nets: 16377 • Number of critical paths: 26772 • On average, one critical path contains 27.6 nodes • On average, one critical path contains 3.4 V-nodes • On average, one V-node belongs to 233.7 critical paths
Key Idea: V-Shaped Nodes Elimination
• Move V-shaped node “b” to reduce path hopcount
• Before the move (a, c, f in Part 0; b, d, e in Part 1): PATH abc hopcount=2, PATH dbc hopcount=1, PATH ebc hopcount=1
• After moving b to Part 0: PATH abc hopcount=0, PATH dbc hopcount=1, PATH ebc hopcount=1
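The hopcounts on this slide can be reproduced by counting, for each path, how many consecutive node pairs lie in different parts. This is a minimal sketch; the helper name `hopcount` is an assumption, and the node placement (a, c in Part 0; b, d, e in Part 1) is read off the slide figure.

```python
def hopcount(path, part):
    """Number of cut crossings along a path given as a node sequence."""
    return sum(1 for u, w in zip(path, path[1:]) if part[u] != part[w])


paths = ["abc", "dbc", "ebc"]
before = {"a": 0, "c": 0, "b": 1, "d": 1, "e": 1}
after = dict(before, b=0)  # move the V-shaped node b to Part 0

print([hopcount(p, before) for p in paths])  # [2, 1, 1]
print([hopcount(p, after) for p in paths])   # [0, 1, 1]
```

The maximum hopcount drops from 2 to 1 while no path gets worse, which is the effect the slide illustrates.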
Distance-k V-Shaped Nodes Elimination
• k = 2: moving the V2 nodes “b, c” reduces path hopcount from 2 to 0
• Figure: a and d in Part 0, b and c in Part 1; b and c are moved to Part 0
• Problems with large k:
  • Cutsize may be greatly increased
  • Delay of one path is reduced while the delay of other paths is increased
New Gain Function
• Gain(v) = δ(0)·g(v) + δ(1)·r1(v), with a further term δ(j)·rj(v) for each distance j up to k
• g(v): traditional FM gain
• rj(v): reduction of Vj nodes after moving v
• Figure: node v before and after the move
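A hedged sketch of how such a biased gain could be evaluated, assuming it is the weighted sum δ(0)·g(v) + Σ δ(j)·rj(v) over j = 1..k. The function name `biased_gain` and the list-based encoding of δ and r are assumptions made for illustration; the weights in the example are the ones reported in the experiments (δ(0)=1, δ(1)=10).

```python
def biased_gain(g, r, delta):
    """g: traditional FM gain of the node.
    r: r[j-1] is the reduction of Vj nodes if the node is moved (j = 1..k).
    delta: delta[0] weights g, delta[j] weights r[j-1]."""
    if len(r) != len(delta) - 1:
        raise ValueError("need one r value per delta[1..k]")
    return delta[0] * g + sum(delta[j] * r[j - 1] for j in range(1, len(delta)))


# A move that worsens cutsize by 1 but removes one V1 node is still
# attractive under delta(0)=1, delta(1)=10.
print(biased_gain(g=-1, r=[1], delta=[1, 10]))  # 9
```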
Distance-k Unidirectional Algorithm
Calculate initial gains for all nodes and store the gains
Select the node v with maximum gain
/* CLIP-like method: move the cluster that v belongs to */
Reset the gains of all nodes to zero
Move v and update the gains of v and its neighbors
While (some node has not been moved)
    Select one node v with the maximum updated gain
    Move v and update the related gains
Find the point in the move sequence at which the sum of gains is maximum; undo all moves after this point
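The final step of the pass (keep the best prefix of the move sequence and undo the rest) can be sketched as follows. This covers only that step, not the gain updates or the CLIP-like cluster selection; `best_prefix`, `undo_after`, and the sample gains are hypothetical.

```python
from itertools import accumulate


def best_prefix(gains):
    """Return (moves_to_keep, total_gain) for the prefix of the move
    sequence with the largest running sum; (0, 0) keeps no moves."""
    best_len, best_sum = 0, 0
    for i, total in enumerate(accumulate(gains), start=1):
        if total > best_sum:
            best_len, best_sum = i, total
    return best_len, best_sum


def undo_after(part, moves, keep):
    """Flip back every node moved after the first `keep` moves."""
    for v in reversed(moves[keep:]):
        part[v] = 1 - part[v]
    return part


moves = ["a", "b", "c", "d", "e"]
gains = [3, -1, 2, -4, 1]              # gain recorded for each move
part = {v: 1 for v in moves}           # state after all five moves
keep, total = best_prefix(gains)       # running sums: 3, 2, 4, 0, 1
print(keep, total)                     # 3 4
print(undo_after(part, moves, keep))   # d and e return to Part 0
```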
Experimental Setup • Four industry testcases obtained as LEF/DEF • Model of Ababei et al. (ICCAD-2002) used to calculate delay • Partitioning solutions compared to results of MLPart • strongest multilevel netlist partitioning code • website: http://nexus6.cs.ucla.edu/GSRC/bookshelf/Slots/Partitioning/MLPart • All tests on 600MHz Intel Pentium-III Xeon
Biasing against V1 Nodes vs. MLPart • δ(0)=1, δ(1)=10 • Reduction of delay: 4.5%-24.4%, average: 15.1% • Increase of cutsize: 3.0%-10.0%, average: 4.9% • Increase of runtime: 6.3%-11.4%, average: 9.7% • Using the delay model in Cong et al. ISPD-2002 • Reduction of delay: 4.3%-21.2%, average: 14.7%
Biasing against V2 Nodes vs. MLPart • δ(0)=1, δ(1)=30, δ(2)=3 • Reduction of delay: 8.9%-30.0%, average: 18.7% • Increase of cutsize: 3.1%-7.2%, average: 3.5% • Increase of runtime: 11.9%-15.9%, average: 13.1% • Using the delay model in Cong et al. ISPD-2002 • Reduction of delay: 8.3%-28.7%, average: 17.3%
Conclusions • Simple yet efficient timing-driven partitioning that does not require global timing analysis • Negligible implementation and runtime overhead • Significantly reduces path delay, with cutsize and runtime almost the same as leading-edge MLPart • Similar improvements observed with different path delay metrics
Future Work • Impact of new partitioner on placement • Efficient methods for biasing δ(k) k>2
Why Performance Driven Partitioning? • Achieving timing closure becomes increasingly difficult in deep-submicron technologies due to non-ideal scaling of interconnect delay • Routing alone can no longer solve the timing problem, even with aggressive optimizations (buffer insertion, buffer/wire sizing, …) • Timing needs to be addressed at all design stages • Partitioning is a critical step in defining interconnect timing properties, but is traditionally driven by the cutsize objective
Previous Work (I) • With Logic Replication • Retiming • Replication graph • Without Logic Replication • Net based reweighting • Path based reweighting
FM Partitioning and Gain Function
• Start with random partition
• Gain(v) = Reduction of cutsize after moving v
• Move the node with the max gain and lock it
• Keep moving until all nodes are locked
• Find the best point in the move sequence
• Figure: node v between Part 0 and Part 1 before and after a move, with Gain(v) = -1
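As an illustration of "Gain(v) = reduction of cutsize after moving v", the sketch below recomputes the cutsize of a small hypergraph before and after a tentative move. It is a brute-force version for clarity (a real FM implementation updates gains incrementally); `cutsize`, `fm_gain`, and the example nets are assumptions chosen so that the gain is -1, as in the slide figure.

```python
def cutsize(nets, part):
    """Number of nets with nodes in both parts."""
    return sum(1 for net in nets if len({part[v] for v in net}) > 1)


def fm_gain(v, nets, part):
    """Reduction of cutsize if node v is moved to the other part."""
    moved = dict(part)
    moved[v] = 1 - part[v]
    return cutsize(nets, part) - cutsize(nets, moved)


# v shares two uncut nets with Part 0 nodes and one cut net with c.
nets = [("v", "a"), ("v", "b"), ("v", "c")]
part = {"v": 0, "a": 0, "b": 0, "c": 1}
print(fm_gain("v", nets, part))  # -1: moving v cuts two nets, uncuts one
```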
Procedure to Calculate rj(v)
Delete all FF nodes and their related edges
In the remaining graph, BFS from v
Foreach level j from 1 to k
    If v is a Vj node before moving, rj’=1 (otherwise rj’=0)
    If v is a Vj node after moving, rj’’=1 (otherwise rj’’=0)
    rj = rj’ - rj’’, so that rj is the reduction of Vj nodes
CLIP • Algorithm is reminiscent of CLIP (Deng et al. DAC 1996) in how it induces movement of clusters across the cutline.
Distance-k V-Shaped Nodes
• Distance-k V-shaped nodes (Vk-node): if k combinational nodes vi,1 … vi,k satisfy:
  • vi,1 … vi,k are in the same part
  • vj, vt are in the other part
  • there is a path from vj to vt that passes only through vi,1 … vi,k
  then vi,1 … vi,k are distance-k V-shaped nodes
• Figure: path vj → vi,1 → … → vi,k → vt crossing between Part 0 and Part 1
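One way to enumerate distance-k V-shaped node groups is a depth-limited search: start at a combinational node with a predecessor in the other part, extend through k-1 same-part combinational successors, and accept the chain if its last node has a successor in the other part. This is a sketch under those assumptions; `distance_k_groups` is a hypothetical helper, and the example is the a → b → c → d path with b, c in Part 1.

```python
def distance_k_groups(succ, part, combinational, k):
    """Return chains (v1, ..., vk) of same-part combinational nodes lying
    on a path vj -> v1 -> ... -> vk -> vt with vj, vt in the other part."""
    pred = {}
    for u, outs in succ.items():
        for w in outs:
            pred.setdefault(w, []).append(u)

    groups = []

    def extend(chain):
        last = chain[-1]
        if len(chain) == k:
            # The chain must leave toward the other part.
            if any(part[w] != part[last] for w in succ.get(last, [])):
                groups.append(tuple(chain))
            return
        for w in succ.get(last, []):
            if w in combinational and part[w] == part[last] and w not in chain:
                extend(chain + [w])

    for v in combinational:
        # The chain must be entered from the other part.
        if any(part[u] != part[v] for u in pred.get(v, [])):
            extend([v])
    return groups


succ = {"a": ["b"], "b": ["c"], "c": ["d"]}
part = {"a": 0, "d": 0, "b": 1, "c": 1}
print(distance_k_groups(succ, part, {"a", "b", "c", "d"}, k=2))  # [('b', 'c')]
```

With k = 1 the same search reduces to the plain V-shaped node test, and on this example it finds nothing because neither b nor c alone is entered from and exits to the other part.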
Notation • H(V,E) = circuit hypergraph • V = set of nodes representing components of the circuit • E = set of signal nets • A bipartition (V0|V1) of H(V,E) divides V into two disjoint subsets s.t. V = V0 ∪ V1, which are called Part 0 and Part 1 • A = the total area of all the nodes in V • A0 = the area of all the nodes in V0