Design-Space Exploration of Resource-Sharing Solutions for Custom Instruction Set Extensions

Design-Space Exploration of Resource-Sharing Solutions for Custom Instruction Set Extensions. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 28, No. 12, December 2009. Marcela Zuluaga and Nigel Topham. 2010/05/20. Presenter: 陳俊霖

Outline • Introduction • Merging two DFGs • Parametric Resource-Sharing Heuristic • A. Multioperation Vertices • B. Controlling the Area-Latency Tradeoff • C. Controlling the Execution-Time Impact of Merging • D. Vertex Grouping • Results

Introduction • Resource sharing can reduce the die area and energy consumption of a customized processor, but the latencies of instruction set extensions (ISEs) may increase after merging.

Introduction • Assumptions: • ISEs have been identified by a previous compiler phase. • ISEs are represented as a collection of directed acyclic graphs (DAGs) annotated with execution frequency. • All inputs arrive together and all outputs leave together. • Problem: • How to merge such a collection of graphs to reduce the overall die area while minimizing the increase in execution latency.

Merging two DFGs [Figure: G1 and G2 merged step by step into G’. Labels from the figure follow.] • Global common strings: G1(2,4) G2(0,2); G1(3) G2(4); G1(1) G2(1) • MaxStrGlobal: G1(2,4) G2(0,2) • Local common strings: G’(1) G’(5); G’(3) G’(7) • MaxStrLocal: G’(1) G’(5) • Local common string: G’(3) G’(7) • MaxStrLocal: G’(3) G’(7)

Parametric Resource-Sharing Heuristic • Resource sharing is induced by the search for maximum-area common substrings between two paths belonging to different graphs. • Candidate substrings are ranked by the expected area saved, not simply by their length.
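The idea of ranking common substrings by area rather than by length can be illustrated with a minimal sketch. It assumes each path is a list of operator names and that `area` is a hypothetical table of per-operator areas; neither the data layout nor the numbers come from the paper.

```python
from typing import Dict, List, Tuple


def max_area_common_substring(
    path_a: List[str], path_b: List[str], area: Dict[str, float]
) -> Tuple[float, int, int, int]:
    """Return (saved_area, start_a, start_b, length) of the common
    contiguous operator run with the largest total area."""
    best = (0.0, 0, 0, 0)
    # prev[j] holds (length, area) of the common run ending at
    # path_a[i-2] and path_b[j-1]; cur[j] is the same for path_a[i-1].
    prev = [(0, 0.0)] * (len(path_b) + 1)
    for i in range(1, len(path_a) + 1):
        cur = [(0, 0.0)] * (len(path_b) + 1)
        for j in range(1, len(path_b) + 1):
            if path_a[i - 1] == path_b[j - 1]:
                length = prev[j - 1][0] + 1
                saved = prev[j - 1][1] + area[path_a[i - 1]]
                cur[j] = (length, saved)
                if saved > best[0]:
                    best = (saved, i - length, j - length, length)
        prev = cur
    return best


# Hypothetical operator areas (arbitrary units).
area = {"add": 1.0, "mul": 6.0}
path_a = ["add", "add", "add", "mul"]
path_b = ["mul", "add", "add", "add"]
best = max_area_common_substring(path_a, path_b, area)
```

Here the longest common substring is the run of three adders (area 3), but the single shared multiplier (area 6) is selected because it saves more area.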

Parametric Resource-Sharing Heuristic • Five parameters control the algorithm so that it can produce many alternative solutions. • αT, βT, θT: threshold parameters, real values in the range [0,1]. • They limit the increase in ISE execution delay in relation to the area saved by merging operators. • MultiOp: binary value. • Controls the creation of multioperation vertices from similar operators. • Grouping: binary value. • Determines whether certain operator groupings are recognized and exploited during the merging process.
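Exploring the design space amounts to running the heuristic once per parameter setting. The sketch below only enumerates settings; the class name `SharingParams`, the function `parameter_grid`, and the uniform sampling of the thresholds are assumptions made for illustration, not the authors' sweep.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class SharingParams:
    """One point in the five-parameter space (hypothetical container)."""
    alpha_t: float   # threshold in [0, 1]
    beta_t: float    # threshold in [0, 1]
    theta_t: float   # threshold in [0, 1]
    multi_op: bool   # allow multioperation vertices
    grouping: bool   # recognize operator groups


def parameter_grid(steps: int) -> List[SharingParams]:
    """Sample each threshold at `steps` evenly spaced values in [0, 1]
    (steps >= 2) and combine them with both settings of each binary flag."""
    thresholds = [k / (steps - 1) for k in range(steps)]
    flags = [False, True]
    return [
        SharingParams(a, b, t, m, g)
        for a, b, t, m, g in product(thresholds, thresholds, thresholds, flags, flags)
    ]


grid = parameter_grid(3)  # thresholds sampled at 0.0, 0.5, 1.0
```

Each element of `grid` would be passed to one run of the merging heuristic, and the resulting area and latency pairs form the explored design space.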

A. Multioperation Vertices • Vertices that perform similar but different operations can be merged with a small overhead. • The creation of multioperation vertices is governed by the parameter MultiOp. • Area saving of a multioperation vertex: • Area(x,y) = Ax + Ay - Axy
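As a small worked example of the area-saving formula, the sketch below evaluates Area(x,y) = Ax + Ay - Axy for an adder and a subtractor merged into one add/subtract unit. The area values are invented for illustration.

```python
def multiop_area_saving(area_x: float, area_y: float, area_xy: float) -> float:
    """Area saved by replacing operators x and y with one multioperation
    vertex of area `area_xy`: Area(x, y) = Ax + Ay - Axy."""
    return area_x + area_y - area_xy


# Hypothetical areas (arbitrary units): adder 10, subtractor 11,
# combined add/subtract unit 13.
saving = multiop_area_saving(10.0, 11.0, 13.0)
```

With these numbers the merge saves 8 units; a non-positive result would mean the multioperation vertex is no smaller than the two separate operators.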

B. Controlling the Area-Latency Tradeoff • The heuristic must decide whether the increased functional-unit latency resulting from a merge is sufficiently offset by the area savings to make the merge beneficial. • θ is introduced to quantify the area-latency tradeoff. • θk, k ∈ {x, y} = [formula not preserved in this transcript] • First term: the relative decrease in latency obtained by not performing the merge. • Second term: the area savings that result from merging. • If θx or θy exceeds the threshold θT, the merged graph G’ is discarded from Gout, the rejected merge is stored in the set S*, and merging restarts.

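B. Controlling the Area-Latency Tradeoff [Figure: worked example of the threshold check. Labels from the figure follow.] • Common substrings: G1(1)G3(3), area = 6; G1(3)G3(2), area = 5 • After each merge, calculate θ1 and θ2. • If θ1 and θ2 < θT, the merge is kept. • If θ1 or θ2 > θT, discard and restart, with the substring forbidden. • Forbidden sets shown: S* = {G1(1)G3(3)}; S* = {G1(0)G2(3)}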
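The discard-and-restart rule can be sketched as a loop over candidate merges. This is a simplified illustration, not the authors' algorithm: the θ formula is not preserved in this transcript, so it is passed in as a hypothetical callback `theta_of`, and the θ values in the example are invented.

```python
from typing import Callable, List, Set, Tuple

Pair = Tuple[str, str]  # hypothetical: labels of the two matched substrings


def merge_with_threshold(
    candidates: List[Tuple[Pair, float]],
    theta_of: Callable[[List[Pair]], Tuple[float, float]],
    theta_t: float,
) -> Tuple[List[Pair], Set[Pair]]:
    """Accept candidate merges in decreasing order of area. When a merge
    pushes either theta above theta_t, forbid it (add it to S*) and
    restart from scratch. Returns (accepted merges, S*)."""
    forbidden: Set[Pair] = set()  # the set S*
    while True:
        accepted: List[Pair] = []
        restarted = False
        for pair, _area in sorted(candidates, key=lambda c: -c[1]):
            if pair in forbidden:
                continue
            trial = accepted + [pair]
            theta_x, theta_y = theta_of(trial)
            if theta_x > theta_t or theta_y > theta_t:
                forbidden.add(pair)  # each restart forbids one more pair
                restarted = True
                break
            accepted = trial
        if not restarted:
            return accepted, forbidden


candidates = [(("G1(1)", "G3(3)"), 6.0), (("G1(3)", "G3(2)"), 5.0)]


def theta_of(merged: List[Pair]) -> Tuple[float, float]:
    # Invented values: merging G1(1) with G3(3) hurts latency too much.
    return (0.9, 0.1) if ("G1(1)", "G3(3)") in merged else (0.2, 0.1)


accepted, s_star = merge_with_threshold(candidates, theta_of, theta_t=0.5)
```

The loop terminates because every restart adds a new pair to S*, so at most one restart per candidate can occur.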

C. Controlling the Execution-Time Impact of Merging • Although θT prevents a poor tradeoff between area savings and increased latency, it is not sufficient. • If G1 is a frequently executed graph, any increase in its latency has a large effect on total execution time, so the resulting Gout is not a good solution.

C. Controlling the Execution-Time Impact of Merging • A metric αi is defined for each Gi to counteract this effect. • Fi: the normalized execution frequency of Gi. • Mi: the percentage of the area of Gi taken up by operations that could be merged. • Each αi is compared with the threshold αT before merging. If αi exceeds αT, Gi is excluded from the set of input graphs. • The effect of αT is to leave Gi unmerged if the merging process would increase its latency beyond an acceptable threshold.
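The pre-merge filter described above can be sketched as follows. The formula that combines Fi and Mi into αi is not preserved in this transcript, so the sketch takes the αi values as given; the function name and the numbers are hypothetical.

```python
from typing import Dict, List


def select_mergeable(alphas: Dict[str, float], alpha_t: float) -> List[str]:
    """Keep only the graphs whose alpha_i does not exceed alpha_t.
    Graphs above the threshold are left unmerged."""
    return [name for name, alpha in alphas.items() if alpha <= alpha_t]


# Invented alpha values for three input graphs.
alphas = {"G1": 0.8, "G2": 0.3, "G3": 0.1}
inputs = select_mergeable(alphas, alpha_t=0.5)
```

In this example G1 is excluded before merging starts, so its latency cannot be increased by resource sharing.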

C. Controlling the Execution-Time Impact of Merging • Another case. • Another metric is introduced, in which n is the number of input graphs.

D. Vertex Grouping • Certain operator sequences can be combined as an atomic unit during logic synthesis to yield smaller and faster solutions than their individual components. • Grouping controls whether operator groups should be identified and retained instead of trying to merge each operator independently.

Results • The specific effect of varying αT, βT, and θT. • MultiOp = 0 and Grouping = 0. • As θT is reduced, the resulting solutions are pushed to the left.

Results [Figure legend] • x’s: MultiOp = 0, Grouping = 0 • circles: MultiOp = 1, Grouping = 0 • squares: MultiOp = 0, Grouping = 1 • crosses: MultiOp = 1, Grouping = 1
