Design-Space Exploration of Resource-Sharing Solutions for Custom Instruction Set Extensions. IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 28, NO. 12, DECEMBER 2009. Marcela Zuluaga and Nigel Topham. 2010/05/20, Presenter: 陳俊霖.
Outline
• Introduction
• Merging two DFGs
• Parametric Resource-Sharing Heuristic
  • A. Multioperation Vertices
  • B. Controlling the Area-Latency Tradeoff
  • C. Controlling the Execution-Time Impact of Merging
  • D. Vertex Grouping
• Results
Introduction
Resource sharing can reduce the die area and energy consumption of a customized processor. However, merging instruction set extensions (ISEs) may increase their latencies.
Introduction
• Assumptions:
  • ISEs have been identified by a previous compiler phase.
  • ISEs are represented as a collection of directed acyclic graphs (DAGs) annotated with execution frequencies.
  • All inputs of an ISE arrive together, and all outputs leave together.
• Problem:
  • How to merge such a collection of graphs to reduce the overall die area while minimizing the increase in execution latency.
Merging two DFGs
[Figure: example merge of G1 and G2 into G′]
• Global common strings: G1(2,4) ↔ G2(0,2); G1(3) ↔ G2(4); G1(1) ↔ G2(1)
• MaxStrGlobal: G1(2,4) ↔ G2(0,2)
• Local common strings in G′: G′(1)G′(5); G′(3)G′(7)
• MaxStrLocal: G′(3)G′(7), then G′(1)G′(5)
Parametric Resource-Sharing Heuristic
Resource sharing is driven by a search for maximum-area common substrings between two paths belonging to different graphs. Candidate substrings are ranked by the area their merge is expected to save, rather than simply by their length.
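The difference between ranking by length and ranking by area can be illustrated with a classic longest-common-substring dynamic program whose score is area instead of length. This is a minimal sketch, not the authors' implementation: it assumes a path is a list of operator names, that merging two identical operators saves the area of one copy, and that the area table `AREA` and function `max_area_common_substring` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical per-operator area costs (arbitrary units), for illustration only.
AREA: Dict[str, int] = {"add": 1, "sub": 1, "xor": 1, "shl": 1, "mul": 8}


def max_area_common_substring(
    a: List[str], b: List[str], area: Dict[str, int]
) -> Tuple[int, int, int, int]:
    """Return (saved_area, start_in_a, start_in_b, length) of the contiguous
    run of identical operators shared by paths a and b that saves the most
    area, assuming a merge saves the area of one copy of each operator."""
    best = (0, 0, 0, 0)
    # prev[j]: area of the common run ending at a[i-2] and b[j-1].
    prev = [0] * (len(b) + 1)
    prev_len = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        cur_len = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the diagonal run by one matching operator.
                cur[j] = prev[j - 1] + area[a[i - 1]]
                cur_len[j] = prev_len[j - 1] + 1
                if cur[j] > best[0]:
                    n = cur_len[j]
                    best = (cur[j], i - n, j - n, n)
        prev, prev_len = cur, cur_len
    return best


g1_path = ["add", "add", "add", "sub", "mul", "shl"]
g2_path = ["add", "add", "add", "xor", "mul", "shl"]
print(max_area_common_substring(g1_path, g2_path, AREA))  # (9, 4, 4, 2)
```

The longest shared run here is the three `add` operators (area 3), but the area-weighted search prefers the shorter `mul, shl` run (area 9), which is the behavior the slide describes.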
Parametric Resource-Sharing Heuristic
• The algorithm takes five parameters, so that varying them yields many alternative solutions.
• αT, βT, θT: threshold parameters, real values in the range [0, 1].
  • They limit the increase in ISE execution delay relative to the area saved by merging operators.
• MultiOp: binary value.
  • Controls the creation of multioperation vertices from similar operators.
• Grouping: binary value.
  • Determines whether certain operator groupings are recognized and exploited during the merging process.
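One way to picture the resulting design space is to enumerate parameter combinations. This is a sketch under assumptions: the threshold grid `GRID` is hypothetical (the slides do not say which values were explored), and `MergeConfig` and `sweep` are illustrative names; each configuration would be handed to the merging heuristic, which is not shown.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, Sequence


@dataclass(frozen=True)
class MergeConfig:
    alpha_t: float  # αT, threshold in [0, 1]
    beta_t: float   # βT, threshold in [0, 1]
    theta_t: float  # θT, threshold in [0, 1]
    multi_op: bool  # MultiOp: allow multioperation vertices
    grouping: bool  # Grouping: recognize operator groups

    def __post_init__(self) -> None:
        # Reject thresholds outside the [0, 1] range stated on the slide.
        for value in (self.alpha_t, self.beta_t, self.theta_t):
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"threshold {value} is outside [0, 1]")


def sweep(grid: Sequence[float]) -> Iterator[MergeConfig]:
    """Yield every combination of the five parameters over one threshold grid."""
    for a, b, t, m, g in product(grid, grid, grid, (False, True), (False, True)):
        yield MergeConfig(a, b, t, m, g)


# Hypothetical threshold grid; the slides do not say which values were used.
GRID = (0.25, 0.5, 0.75, 1.0)
configs = list(sweep(GRID))
print(len(configs))  # 4 * 4 * 4 * 2 * 2 = 256
```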
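A. Multioperation Vertices
• Vertices that perform similar but different operations can be merged with a small area overhead.
• The creation of multioperation vertices is governed by the parameter MultiOp.
• Area saving of a multioperation vertex that merges x and y:
  • $\mathrm{Area}(x, y) = A_x + A_y - A_{xy}$, where $A_x$ and $A_y$ are the areas of the separate operators and $A_{xy}$ is the area of the combined unit.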
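The area-saving formula can be evaluated directly. A minimal sketch, assuming hypothetical area tables `UNIT_AREA` and `MULTI_AREA` and assuming that a pair is merged into a multioperation vertex only when MultiOp is enabled and the saving is positive (the slide does not state the exact acceptance rule):

```python
from typing import Dict, FrozenSet

# Hypothetical areas (arbitrary units) of single-operation units.
UNIT_AREA: Dict[str, float] = {"add": 10.0, "sub": 10.0, "mul": 80.0}
# Hypothetical areas of combined units that can perform either operation.
MULTI_AREA: Dict[FrozenSet[str], float] = {
    frozenset({"add", "sub"}): 12.0,
    frozenset({"add", "mul"}): 95.0,
}


def multiop_saving(x: str, y: str) -> float:
    """Area(x, y) = A_x + A_y - A_xy."""
    return UNIT_AREA[x] + UNIT_AREA[y] - MULTI_AREA[frozenset({x, y})]


def merge_as_multiop(x: str, y: str, multi_op: bool) -> bool:
    """Decide whether distinct operators x and y become one multioperation vertex."""
    key = frozenset({x, y})
    if not multi_op or x == y or key not in MULTI_AREA:
        return False
    return multiop_saving(x, y) > 0


print(multiop_saving("add", "sub"))           # 8.0
print(merge_as_multiop("add", "sub", True))   # True
print(merge_as_multiop("add", "mul", True))   # False: saving is -5.0
print(merge_as_multiop("add", "sub", False))  # False: MultiOp = 0
```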
B. Controlling the Area-Latency Tradeoff
• The heuristic must decide whether the increased function-unit latency resulting from a merge is sufficiently offset by the area savings to make the merge beneficial.
• θ is introduced to quantify this area-latency tradeoff.
• θk, for each k ∈ {x, y}, combines two terms:
  • First term: the relative decrease in latency obtained by not performing the merge.
  • Second term: the area savings that result from merging.
• If θx or θy exceeds the threshold θT, the merged graph G′ is discarded from Gout, the offending common substring is stored in the forbidden set S*, and merging restarts.
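B. Controlling the Area-Latency Tradeoff
[Figure: example of the θ check]
• Common substrings: G1(1)G3(3) with area = 6; G1(3)G3(2) with area = 5.
• Calculate θ1 and θ2. If θ1 or θ2 > θT, discard the merge, add the substring to the forbidden set S*, and restart.
  • Forbidden sets shown: S* = {G1(1)G3(3)}; S* = {G1(0)G2(3)}.
• If θ1 and θ2 < θT, the merge is kept.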
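The discard-and-restart rule can be sketched as a selection loop over candidate substrings. This is a simplified model, not the paper's algorithm: it covers only the choice of one substring, takes θx and θy from a caller-supplied function because the θ formula is not reproduced here, and uses hypothetical θ values chosen to mirror the example above.

```python
from typing import Callable, List, Optional, Set, Tuple

Candidate = Tuple[str, float]  # (common-substring label, area saved)


def select_substring(
    candidates: List[Candidate],
    theta: Callable[[str], Tuple[float, float]],
    theta_t: float,
    forbidden: Set[str],
) -> Optional[str]:
    """Pick the max-area common substring whose merge passes the θ check.

    A candidate whose θx or θy exceeds θT is added to the forbidden set S*
    and the search restarts without it. Returns None if nothing passes."""
    while True:
        allowed = [c for c in candidates if c[0] not in forbidden]
        if not allowed:
            return None
        best, _area = max(allowed, key=lambda c: c[1])
        theta_x, theta_y = theta(best)
        if theta_x > theta_t or theta_y > theta_t:
            forbidden.add(best)  # store in S* and restart the search
            continue
        return best


# Hypothetical θ values; the slides do not give the θ formula.
THETA = {"G1(1)G3(3)": (0.7, 0.2), "G1(3)G3(2)": (0.1, 0.3)}
s_star: Set[str] = set()
chosen = select_substring(
    [("G1(1)G3(3)", 6.0), ("G1(3)G3(2)", 5.0)], THETA.__getitem__, 0.5, s_star
)
print(chosen, s_star)  # G1(3)G3(2) {'G1(1)G3(3)'}
```

The loop always terminates because each restart permanently removes one candidate.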
C. Controlling the Execution-Time Impact of Merging
• Although θT prevents poor tradeoffs between area savings and increased latency, it is not sufficient on its own.
• If G1 is a frequently executed graph, even a small latency increase caused by merging it is paid on every execution, so the resulting Gout may not be a good solution.
C. Controlling the Execution-Time Impact of Merging
• A metric αi is computed for each input graph Gi to counteract this effect, based on:
  • Fi: the normalized execution frequency of Gi.
  • Mi: the percentage of the area of Gi made up of operations that may be merged.
• Each αi is compared with the threshold αT before merging. If αi exceeds αT, Gi is excluded from the set of input graphs.
• The effect of αT is to leave Gi unmerged if the merging process would increase its latency beyond an acceptable threshold.
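The αT filter itself is a simple partition of the input graphs. Because the formula combining Fi and Mi into αi is not reproduced here, this sketch takes each αi as a precomputed input; the function name `split_by_alpha` and the α values are hypothetical.

```python
from typing import Dict, List, Tuple


def split_by_alpha(
    alpha: Dict[str, float], alpha_t: float
) -> Tuple[List[str], List[str]]:
    """Split input graphs into (kept for merging, left unmerged).

    `alpha` maps each graph name to a precomputed αi. A graph whose αi
    exceeds αT is excluded from the set of input graphs."""
    kept = [g for g, a in alpha.items() if a <= alpha_t]
    excluded = [g for g, a in alpha.items() if a > alpha_t]
    return kept, excluded


# Hypothetical αi values: G1 stands in for a hot graph with a high αi.
kept, excluded = split_by_alpha({"G1": 0.9, "G2": 0.3, "G3": 0.1}, alpha_t=0.5)
print(kept, excluded)  # ['G2', 'G3'] ['G1']
```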
C. Controlling the Execution-Time Impact of Merging
• Another case:
  • It is handled by another metric, in which n is the number of input graphs.
D. Vertex Grouping
• Certain operator sequences can be combined into an atomic unit during logic synthesis, yielding solutions that are smaller and faster than their individual components.
• Grouping controls whether such operator groups are identified and retained, instead of merging each operator independently.
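Vertex grouping can be pictured as collapsing known operator sequences into single atomic vertices before merging. This is a sketch under assumptions: the slides do not list which groups are recognized, so the multiply-then-add pattern in the hypothetical table `GROUPS` is only an example, and `apply_grouping` is an illustrative name.

```python
from typing import Dict, List, Tuple

# Hypothetical operator group that synthesis could implement as one unit.
GROUPS: Dict[Tuple[str, ...], str] = {("mul", "add"): "mac"}


def apply_grouping(path: List[str], grouping: bool) -> List[str]:
    """Replace each occurrence of a known operator group with one atomic vertex.

    Scans left to right; the first matching pattern wins. With grouping
    disabled, the path is returned unchanged (as a copy)."""
    if not grouping:
        return list(path)
    out: List[str] = []
    i = 0
    while i < len(path):
        for pattern, name in GROUPS.items():
            if tuple(path[i:i + len(pattern)]) == pattern:
                out.append(name)  # the whole group becomes one vertex
                i += len(pattern)
                break
        else:
            out.append(path[i])  # no group starts here; keep the operator
            i += 1
    return out


print(apply_grouping(["mul", "add", "shl", "mul", "sub"], True))
# ['mac', 'shl', 'mul', 'sub']
```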
Results
• The specific effect of varying αT, βT, and θT.
• MultiOp = 0 and Grouping = 0.
• As θT is reduced, the resulting solutions are pushed to the left.
Results
[Figure legend] x's: MultiOp = 0, Grouping = 0; circles: MultiOp = 1, Grouping = 0; squares: MultiOp = 0, Grouping = 1; crosses: MultiOp = 1, Grouping = 1.
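A common way to read a cloud of area-latency design points like these is to keep only the Pareto-optimal ones, the points that no other point beats on both area and latency. The slides do not say the paper applies this step; it is shown only as a hedged sketch, with hypothetical (area, latency) points.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (area, latency), both to be minimized


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points not dominated by any other point, sorted by area."""
    front: List[Point] = []
    best_latency = float("inf")
    # After sorting by area, a point is non-dominated only if its latency
    # is strictly lower than that of every point with no larger area.
    for area, latency in sorted(points):
        if latency < best_latency:
            front.append((area, latency))
            best_latency = latency
    return front


print(pareto_front([(100, 5), (80, 7), (120, 4), (90, 9), (80, 6)]))
# [(80, 6), (100, 5), (120, 4)]
```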