Explore area-efficient instruction set synthesis techniques for reconfigurable system-on-chip designs, highlighting algorithm description, resource sharing, and experimental results.
Area-Efficient Instruction Set Synthesis for Reconfigurable System on Chip Designs. Philip Brisk (philip@cs.ucla.edu), Adam Kaplan (kaplan@cs.ucla.edu), Majid Sarrafzadeh (majid@cs.ucla.edu). Embedded and Reconfigurable Systems Lab, Computer Science Department, University of California, Los Angeles. DAC ’04, June 9, 2004, San Diego Convention Center, San Diego, CA.
Outline • Custom Instruction Generation and Selection • Resource Sharing • Algorithm Description with Examples • Datapath Synthesis Techniques • Experimental Methodology and Results • Summary
Custom Instruction Generation and Selection • Custom Instruction Generation • Compiler Profiles Application Code • Extracts Favorable IR Patterns • Synthesizes Patterns as Hardware Datapaths • Custom Instruction Selection • Area Constraints Limit on-Chip Functionality • NP-Hard 0-1 Knapsack Problem • Formulated as an Integer Linear Program (ILP)
ILP Formulation for Instruction Selection Problem • For each custom instruction i: • Gain(i): Estimated Performance Gain of i • Area(i): Estimated Area of i • Selected(i): 1 if i is Selected; 0 Otherwise • Goal: Maximize Gain of Selected Instructions • Constraint: Area of Selected Instructions < FPGA Area
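Written out symbolically, the formulation on this slide can be read as the 0-1 program below. The symbols $g_i$, $a_i$, $x_i$, and $A_{\mathrm{FPGA}}$ are shorthand introduced here for Gain(i), Area(i), Selected(i), and the FPGA area; they do not appear in the slides.

```latex
\begin{aligned}
\text{maximize}\quad   & \sum_{i} g_i \, x_i \\
\text{subject to}\quad & \sum_{i} a_i \, x_i < A_{\mathrm{FPGA}}, \\
                       & x_i \in \{0, 1\} \quad \text{for each custom instruction } i
\end{aligned}
```

The constraint sums per-instruction areas, which is the additive estimate the following slides call into question.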
What About Resource Sharing? [Figure: two DFGs with areas 17 and 25, and a shared datapath with area 28; the area-cost legend lists 8, 5, 3, 1, and 1.5. ILP area estimate = 17 + 25 = 42.]
Analysis • 0-1 Knapsack Problem Formulation Estimates 150% of the Shared Datapath Area (42 vs. 28), a 50% Over-Estimate • ILP Solvers Do Not Consider Resource Sharing • How to Remedy This • Develop a Resource Sharing Algorithm • Avoid Additive Area Estimates Based on per-Instruction Costs
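The effect of the additive estimate can be illustrated with a small selection sketch. The areas 17 and 25 and the figure 28 (read here as the area of the shared datapath) come from the preceding slide; the gains, the budget of 30, and the names `dfg_a` and `dfg_b` are hypothetical. The exhaustive search below stands in for an ILP solver and is only practical for a handful of instructions.

```python
from itertools import combinations

def select_additive(instructions, budget):
    """Pick the subset with maximal total gain whose summed
    (additive) area stays below the budget, by exhaustive search."""
    best_gain, best_subset = 0, ()
    names = list(instructions)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            area = sum(instructions[n]["area"] for n in subset)
            gain = sum(instructions[n]["gain"] for n in subset)
            if area < budget and gain > best_gain:
                best_gain, best_subset = gain, subset
    return best_subset, best_gain

# Areas are taken from the slide; gains and budget are hypothetical.
instructions = {
    "dfg_a": {"area": 17, "gain": 10},
    "dfg_b": {"area": 25, "gain": 12},
}
budget = 30
shared_area = 28  # area of the shared datapath, as read from the slide

chosen, gain = select_additive(instructions, budget)
additive_area = sum(i["area"] for i in instructions.values())

print("selected under additive estimate:", chosen, "gain:", gain)
print("additive area of both:", additive_area)
print("both fit when shared:", shared_area < budget)
```

Under these assumed numbers the additive model selects only one instruction, although a shared datapath implementing both would fit within the same budget.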
Resource Sharing for DFGs • Given: • A Set of DFGs G* = {G1, …, Gn} • Goal: • Construct a Consolidation Graph GC of Minimal Cost • Constraints: • GC Must be Acyclic • GC Must be a Supergraph of each Gi in G* • That’s Life: • The Problem is NP-Hard
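The constraints on the consolidation graph can be made concrete with a small validity check. This is a minimal sketch, not the authors' method: it assumes each graph is a dictionary with an `ops` map (node to operation name) and an `edges` set, that the embedding of each Gi into GC is supplied as an explicit node mapping, and that cost is the sum of per-operation areas. The operation names and their areas are hypothetical. It relies on `graphlib`, available in Python 3.9 and later.

```python
from graphlib import TopologicalSorter, CycleError

def is_acyclic(graph):
    """True if the directed graph contains no cycle."""
    preds = {node: set() for node in graph["ops"]}
    for src, dst in graph["edges"]:
        preds[dst].add(src)
    try:
        TopologicalSorter(preds).prepare()  # raises CycleError on a cycle
    except CycleError:
        return False
    return True

def covers(dfg, gc, mapping):
    """True if `mapping` embeds `dfg` in `gc`: distinct nodes stay
    distinct, operations match, and every edge is preserved."""
    if len(set(mapping.values())) != len(mapping):
        return False
    for node, op in dfg["ops"].items():
        if gc["ops"].get(mapping[node]) != op:
            return False
    return all((mapping[s], mapping[d]) in gc["edges"] for s, d in dfg["edges"])

def cost(graph, area):
    """Sum of the areas of all operations in the graph."""
    return sum(area[op] for op in graph["ops"].values())

AREA = {"mul": 8, "add": 3, "sub": 3}  # hypothetical per-operation areas

g1 = {"ops": {"m": "mul", "a": "add"}, "edges": {("m", "a")}}
g2 = {"ops": {"m": "mul", "s": "sub"}, "edges": {("m", "s")}}
gc = {"ops": {"M": "mul", "A": "add", "S": "sub"},
      "edges": {("M", "A"), ("M", "S")}}

valid = (is_acyclic(gc)
         and covers(g1, gc, {"m": "M", "a": "A"})
         and covers(g2, gc, {"m": "M", "s": "S"}))
separate = cost(g1, AREA) + cost(g2, AREA)
shared = cost(gc, AREA)
print(valid, separate, shared)
```

Checking a given GC is easy; the NP-hard part noted on the slide is finding a GC of minimal cost.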
Resource Sharing Overview • Decompose Patterns into Input-Output Paths • Path Based Resource Sharing (PBRS) [Figure: DFGs G1, G2, G3, and G4.]
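The decomposition into input-output paths can be sketched as a depth-first enumeration over a DAG. The representation (a dictionary from node to successor list) and the node names are assumptions made for illustration; the slides do not specify a data structure.

```python
def io_paths(succ):
    """Enumerate every path from an input (a node with no predecessor)
    to an output (a node with no successor) of a DAG."""
    nodes = set(succ) | {d for ds in succ.values() for d in ds}
    has_pred = {d for ds in succ.values() for d in ds}
    inputs = sorted(nodes - has_pred)
    paths = []

    def walk(node, prefix):
        prefix = prefix + [node]
        successors = succ.get(node, [])
        if not successors:          # reached an output
            paths.append(prefix)
            return
        for nxt in successors:
            walk(nxt, prefix)

    for src in inputs:
        walk(src, [])
    return paths

# Hypothetical DFG: a and b feed c, which feeds d and e.
dfg = {"a": ["c"], "b": ["c"], "c": ["d", "e"]}
paths = io_paths(dfg)
for p in paths:
    print(" -> ".join(p))
```

The number of input-output paths can grow exponentially with the size of the DAG, so a practical implementation would likely bound or prune the enumeration.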
Resource Sharing Overview • Use Substring Matching to Share Resources • Merge DFGs Along Matched Nodes [Figure: DFGs G1, G2, G3, and G4.]
Resource Sharing Overview • Synthesize GC • Requires Less Area than Synthesizing G1…G4 Separately [Figure: GC alongside G1, G2, G3, and G4.]
Path-Based Resource Sharing [Figure: two paths, P1 and P2, drawn as sequences of operations; area costs 8, 5, 3, 1.]
Maximum Area Common Substring (MACStr) • MACStr: O(L), where L is the Length of the String • Area of MACStr = 26 [Figure: paths P1 and P2 and their MACStr; area costs 8, 5, 3, 1.]
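A maximum-area common substring can be computed with a simple dynamic program over the two paths. The sketch below is a plain quadratic-time version, not the O(L) method cited on the slide. The operation names and the assignment of the costs 8, 5, 3, and 1 to them are hypothetical, since the extracted slide does not show which operation carries which cost.

```python
def mac_substring(p1, p2, area):
    """Return (substring, area) for a contiguous run of operations
    common to both paths with maximal total area."""
    best_area, best_end, best_len = 0, 0, 0
    # prev[j] holds (area, length) of the common suffix of p1[:i-1], p2[:j]
    prev = [(0, 0)] * (len(p2) + 1)
    for i in range(1, len(p1) + 1):
        cur = [(0, 0)] * (len(p2) + 1)
        for j in range(1, len(p2) + 1):
            if p1[i - 1] == p2[j - 1]:
                a, n = prev[j - 1]
                cur[j] = (a + area[p1[i - 1]], n + 1)
                if cur[j][0] > best_area:
                    best_area, best_end, best_len = cur[j][0], i, cur[j][1]
        prev = cur
    return p1[best_end - best_len:best_end], best_area

AREA = {"mul": 8, "add": 5, "shl": 3, "not": 1}  # hypothetical assignment
P1 = ["mul", "add", "shl", "mul", "not"]
P2 = ["add", "shl", "mul", "add", "mul"]

substring, total = mac_substring(P1, P2, AREA)
print(substring, total)
```

Because matched operations must be contiguous in both paths, a single mismatch ends the shared run, which limits how much area can be shared.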
Maximum Area Common Subsequence (MACSeq) • MACSeq: O(L^2/log L), where L is the Length of the String • Area of MACSeq = 43 [Figure: paths P1 and P2 and their MACSeq; area costs 8, 5, 3, 1.]
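Dropping the contiguity requirement turns the problem into a weighted longest-common-subsequence computation. The sketch below uses the textbook quadratic dynamic program rather than the O(L^2/log L) method cited on the slide; the operation names and their areas are again hypothetical.

```python
def mac_subsequence(p1, p2, area):
    """Return (subsequence, area) for a common subsequence of the
    two paths with maximal total area."""
    n, m = len(p1), len(p2)
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i - 1][j], best[i][j - 1])
            if p1[i - 1] == p2[j - 1]:
                best[i][j] = max(best[i][j],
                                 best[i - 1][j - 1] + area[p1[i - 1]])
    # Walk back through the table to recover one optimal subsequence.
    seq, i, j = [], n, m
    while i > 0 and j > 0:
        if (p1[i - 1] == p2[j - 1]
                and best[i][j] == best[i - 1][j - 1] + area[p1[i - 1]]):
            seq.append(p1[i - 1])
            i, j = i - 1, j - 1
        elif best[i - 1][j] >= best[i][j - 1]:
            i -= 1
        else:
            j -= 1
    seq.reverse()
    return seq, best[n][m]

AREA = {"mul": 8, "add": 5, "shl": 3, "not": 1}  # hypothetical assignment
P1 = ["mul", "add", "shl", "mul", "not"]
P2 = ["add", "shl", "mul", "add", "mul"]

subsequence, total = mac_subsequence(P1, P2, AREA)
print(subsequence, total)
```

On these example paths the subsequence shares more area than any contiguous run could, which matches the direction of the slide figures (MACSeq area 43 versus MACStr area 26).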
Resource Sharing Algorithm • Global Phase • Determine Which DFGs to Merge • Determine an Initial Path to Merge • Local Phase • Aggressively Apply PBRS to Share Resources Between the DFGs Selected by the Global Phase • Repeat Until all DFGs are Merged, or no Further Resource Sharing is Possible
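One way to picture the global phase is as a search over all pairs of DFGs and all pairs of their paths. The sketch below assumes the selection criterion is the largest maximum-area common subsequence; the slides only say that the global phase decides which DFGs to merge and picks an initial path, so this criterion, the path lists, and the operation areas are all assumptions.

```python
from itertools import combinations

def common_area(p1, p2, area):
    """Area of a maximum-area common subsequence of two paths."""
    prev = [0] * (len(p2) + 1)
    for op1 in p1:
        cur = [0]
        for j, op2 in enumerate(p2, start=1):
            val = max(prev[j], cur[j - 1])
            if op1 == op2:
                val = max(val, prev[j - 1] + area[op1])
            cur.append(val)
        prev = cur
    return prev[-1]

def global_phase(dfgs, area):
    """Return (shared_area, dfg_a, dfg_b, path_a, path_b) for the pair
    of DFGs whose best path pair shares the most area, or None."""
    best = None
    for a, b in combinations(sorted(dfgs), 2):
        for pa in dfgs[a]:
            for pb in dfgs[b]:
                shared = common_area(pa, pb, area)
                if shared > 0 and (best is None or shared > best[0]):
                    best = (shared, a, b, pa, pb)
    return best

AREA = {"mul": 8, "add": 5, "shl": 3, "not": 1}  # hypothetical assignment
# Each DFG is represented only by its input-output paths (hypothetical).
dfgs = {
    "G1": [["mul", "add"], ["shl", "add"]],
    "G2": [["mul", "add", "not"]],
    "G3": [["shl", "not"]],
    "G4": [["add", "not"]],
}

choice = global_phase(dfgs, AREA)
print(choice)
```

A return value of `None` corresponds to the stopping condition on the slide: no pair of DFGs has anything left to share. The local phase, which merges the selected DFGs along further paths, is not modeled here.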
Resource Sharing Algorithm [Figure: DFGs G1, G2, G3, and G4; area costs 8, 5, 3, 1.]
Global Phase [Figure: DFGs G1, G2, G3, and G4; area costs 8, 5, 3, 1; MACSeq/MACStr.]
Entering Local Phase [Figure: G1 and G2; area costs 8, 5, 3, 1; MACSeq/MACStr.]
Local Phase [Figure: animation frames in which G1 and G2 are merged into G12, with nodes labeled 1 or 2; area costs 8, 5, 3, 1, and 0; MACSeq/MACStr.]
Returning To Global Phase [Figure: G12, G3, and G4; area costs 8, 5, 3, 1.]
Global Phase [Figure: G12, G3, and G4; area costs 8, 5, 3, 1; MACSeq/MACStr.]
Entering Local Phase [Figure: G12 and G4; area costs 8, 5, 3, 1; MACSeq/MACStr.]
Local Phase [Figure: animation frames in which G12 and G4 are merged into G124, with nodes labeled 12 or 4; area costs 8, 5, 3, 1, and 0; MACSeq/MACStr.]
A Local Decision [Figure: G12, G4, and the partially merged G124; area costs 8, 5, 3, 1, and 0; MACSeq/MACStr.]
Cycles are Illegal [Figure: two candidate versions of G124, one marked ILLEGAL and one marked LEGAL; area costs 8, 5, 3, 1, and 0; MACSeq/MACStr.]
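The legality test behind "Cycles are Illegal" can be sketched as a reachability check. In an acyclic graph, merging two distinct nodes into one creates a cycle exactly when one of them can already reach the other. The code below is an illustration under that model, with a hypothetical partially merged graph; it is not taken from the slides.

```python
def reachable(succ, start, goal):
    """True if `goal` can be reached from `start` along one or more edges."""
    stack, seen = list(succ.get(start, ())), set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(succ.get(node, ()))
    return False

def merge_creates_cycle(succ, u, v):
    """True if collapsing u and v into a single node would make the
    (currently acyclic) graph cyclic."""
    return reachable(succ, u, v) or reachable(succ, v, u)

# Hypothetical partially merged graph: "bx" is an already-shared node,
# so the edges a -> bx and bx -> y come from two different DFGs.
succ = {"a": ["bx"], "bx": ["y"], "z": []}

illegal = merge_creates_cycle(succ, "a", "y")  # a already reaches y
legal = merge_creates_cycle(succ, "a", "z")    # a and z are unrelated
print(illegal, legal)
```

Rejecting such merges during the local phase keeps the consolidation graph acyclic, as the problem statement requires.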
Local Phase [Figure: G12, G4, and G124; area costs 8, 5, 3, 1, and 0.]
Returning To Global Phase [Figure: G124 and G3; area costs 8, 5, 3, 1.]
Global Phase [Figure: G124 and G3, then G124 and G3 merged into G1234 with nodes labeled 124 or 3; area costs 8, 5, 3, 1; MACSeq/MACStr.]
Local Phase [Figure: G124, G3, and G1234; area costs 8, 5, 3, 1, and 0.]