A Graph Based Algorithm for Data Path Optimization in Custom Processors
J. Trajkovic, M. Reshadi, B. Gorjiara, D. Gajski
Center for Embedded Computer Systems, University of California, Irvine
Outline
• Introduction
• Design Methodology
• Initial Allocation
• Architecture Wizard
• Critical Path Extraction
• Spill Algorithm
• Results
• Conclusion
Introduction (1 of 2)
• Complexity of SoC rising
• Short time to market
• Need for processors specialized for different application domains
  • General purpose processors
    • Often slow and power hungry
  • Full HW design
    • Expensive and rigid for debugging and feature extension
  • Custom processor
    • Adapt the data path to a given application
• Need for automatic generation of application specific architectures
Introduction (2 of 2)
• Previous work in High Level Synthesis
  • Integer linear programming [Landwehr et al.]
  • Force-directed scheduling [Paulin and Knight]
  • Finding minimal cliques [Tseng and Siewiorek]
  • Branch-and-bound [Marwedel]
• Proposed methodology separates the allocation from scheduling and binding
Design Methodology
• Define the application's maximum requirements
  • ALAP schedule
• Initial Allocation chooses from the Component DB (CDB)
  • Select as many units as needed for ALAP
• Architecture Wizard (AW) analyzes component utilization
  • Based on the schedule and profiling data
• Optimized Architecture
  • Using the design constraints
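As a rough illustration of the first two steps above, the sketch below computes an ALAP schedule for a small operation graph and counts how many units of each type that schedule needs at once. The function names (`alap_schedule`, `max_units`), the unit-latency assumption, and the sample operations are hypothetical and not taken from the slides.

```python
from collections import Counter


def alap_schedule(ops, deps):
    """ALAP step for each operation, assuming every operation takes one step.

    ops:  dict mapping operation name -> functional unit type
    deps: list of (producer, consumer) pairs forming a DAG
    """
    succs = {op: [] for op in ops}
    for src, dst in deps:
        succs[src].append(dst)

    # depth[op] = number of operations on the longest chain from op to a sink
    depth = {}

    def tail(op):
        if op not in depth:
            depth[op] = 1 + max((tail(s) for s in succs[op]), default=0)
        return depth[op]

    length = max(tail(op) for op in ops)
    # Place each operation as late as possible without lengthening the schedule
    return {op: length - depth[op] for op in ops}


def max_units(ops, schedule):
    """Peak number of units of each type used in any single step."""
    per_step = Counter((schedule[op], ops[op]) for op in ops)
    need = {}
    for (_step, kind), count in per_step.items():
        need[kind] = max(need.get(kind, 0), count)
    return need


ops = {"m1": "mul", "m2": "mul", "a1": "alu", "a2": "alu", "a3": "alu"}
deps = [("m1", "a1"), ("m2", "a1"), ("a1", "a3"), ("a2", "a3")]
schedule = alap_schedule(ops, deps)
print(schedule)
print(max_units(ops, schedule))
```

In this sample, `a2` has no predecessors but is still pushed to the step just before `a3`, which is what makes the schedule "as late as possible".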
Initial Allocation and Component Selection
• Define max requirement
  • Based on the statistics for operators and data transfer
• Finding "the best fit" in CDB for given requirements
  • Storage (RF and Memory)
    • Min difference in number of ports
  • Functional units:
    • The most general unit executing given operation
  • Buses:
    • Source buses:
      • N, if N is even
      • (N+1), if N is odd
      • Where N = # RF output ports
    • Destination buses = # RF input ports
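The bus rule and the storage "best fit" rule above can be sketched as follows. The tuple layout for CDB entries, the component names, and the requirement that a candidate must have at least the needed ports are assumptions made for illustration; the slides only state "min difference in number of ports".

```python
def source_buses(rf_out_ports):
    """Source buses: N if N is even, N + 1 if N is odd (N = # RF output ports)."""
    return rf_out_ports if rf_out_ports % 2 == 0 else rf_out_ports + 1


def destination_buses(rf_in_ports):
    """Destination buses equal the number of RF input ports."""
    return rf_in_ports


def best_fit_storage(candidates, needed_in, needed_out):
    """Pick the storage component whose port counts are closest to the need.

    candidates: list of (name, in_ports, out_ports) tuples (hypothetical layout)
    Assumption: only components with at least the needed ports are eligible.
    Returns None when no candidate is large enough.
    """
    feasible = [c for c in candidates
                if c[1] >= needed_in and c[2] >= needed_out]
    return min(feasible,
               key=lambda c: (c[1] - needed_in) + (c[2] - needed_out),
               default=None)


cdb = [("RF_2r1w", 1, 2), ("RF_4r2w", 2, 4), ("RF_6r3w", 3, 6)]
chosen = best_fit_storage(cdb, needed_in=2, needed_out=3)
print(chosen, source_buses(chosen[2]), destination_buses(chosen[1]))
```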
Architecture Wizard - Overview
• Goal of Phase II
  • Reducing number of used resources
  • Under performance and utilization constraints
• Inputs:
  • Schedule for the Max Configuration
  • Execution frequencies (Profiler)
  • Utilization and performance constraints (Designer)
  • Component Data Base (CDB)
• Outputs:
  • Architecture Net-List
  • Report
Architecture Wizard: Tool Flow
• Histograms for
  • A functional unit type
  • Group of in/out ports of a storage unit
• For the basic blocks (BB) in the critical path, for each histogram
  • Vary number of units
  • Estimate execution and utilization
• Allocate data path
  • When constraints are satisfied
  • Use the same heuristics as for the initial allocation
Critical Path Extraction
• Critical Path:
  • A sequence of BBs from start to end that contributes the most to the execution time
• Start with the graph of the application
• Create a directed acyclic graph
• Create the dual graph
  • For each edge e_x, create a node E_x
  • For each node B_y, create (inputs × outputs) edges
• Transform to the shortest path problem
  • Compute weights as 1/w_i or W_max − w_i
• Find the shortest path
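A minimal sketch of the steps above, assuming the loops have already been removed so the basic-block graph is a DAG. Each DAG edge becomes a dual node, each block becomes (inputs × outputs) dual edges with cost W_max − w_i, and Dijkstra's algorithm finds the shortest path. The virtual entry and exit edges, the function name `critical_path`, and the sample weights are assumptions added for illustration; the slides do not say how the start and end blocks are handled, nor whether this weighting always recovers the heaviest path.

```python
import heapq


def critical_path(weights, edges, start, end):
    """weights: block -> w_i; edges: list of (src, dst) pairs of a DAG."""
    w_max = max(weights.values())
    # Assumption: virtual entry/exit edges so start and end also get dual edges
    all_edges = [(None, start)] + list(edges) + [(end, None)]
    incoming, outgoing = {}, {}
    for idx, (src, dst) in enumerate(all_edges):
        if dst is not None:
            incoming.setdefault(dst, []).append(idx)
        if src is not None:
            outgoing.setdefault(src, []).append(idx)

    # Dual graph: one node per DAG edge, (inputs x outputs) edges per block
    dual = {i: [] for i in range(len(all_edges))}
    for block, w in weights.items():
        for i in incoming.get(block, []):
            for o in outgoing.get(block, []):
                dual[i].append((o, w_max - w, block))

    # Dijkstra from the entry edge to the exit edge
    source, target = 0, len(all_edges) - 1
    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, cost, block in dual[node]:
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = (node, block)
                heapq.heappush(heap, (nd, nxt))

    if target not in prev:
        return []  # end is not reachable from start
    path, node = [], target
    while node != source:
        node, block = prev[node]
        path.append(block)
    return path[::-1]


weights = {"A": 10, "B": 100, "C": 5, "D": 10}
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
print(critical_path(weights, edges, "A", "D"))
```

Here the weight of a block stands for its contribution to execution time (for example, cycles multiplied by execution frequency), so the path through `B` is selected over the path through `C`.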
"Spill" - Flattening Algorithm
• Utilization profile for each
  • FU type and in/out port of storage unit
• Type and number of instances of other components is unchanged
• For the chosen number of FUs
  • Estimate extra cycles (Δ) by postponing operations into empty slots
  • Maximize component utilization
  • Utilization = Σ Used FUs / (chosen # × Exec. Time)
• Compute global Δ and utilization
  • Per block estimation
  • Execution frequencies
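The estimation above can be sketched as a single pass over a per-cycle usage histogram: operations that exceed the chosen unit count spill into later cycles, and whatever is still pending at the end of the block adds extra cycles. The function names, the way per-block results are combined with execution frequencies, the search order in `choose_units`, and the sample histograms are assumptions for illustration. Data dependencies are ignored, so this is only an estimate.

```python
def spill(histogram, k):
    """Return (extra_cycles, utilization) for one block with k units.

    histogram[i] = number of units of this type used in cycle i
    of the schedule for the maximum configuration.
    """
    carry = 0
    for used in histogram:
        # Postponed operations compete with this cycle's own operations
        carry = max(0, used + carry - k)
    extra = -(-carry // k)  # ceiling division: cycles to drain what is left
    exec_time = len(histogram) + extra
    return extra, sum(histogram) / (k * exec_time)


def global_estimate(blocks, k):
    """blocks: list of (histogram, execution_frequency).

    Assumption: global values are frequency-weighted sums over the blocks.
    Returns (relative delta, utilization).
    """
    base = sum(freq * len(hist) for hist, freq in blocks)
    extra = sum(freq * spill(hist, k)[0] for hist, freq in blocks)
    used = sum(freq * sum(hist) for hist, freq in blocks)
    return extra / base, used / (k * (base + extra))


def choose_units(blocks, max_k, max_delta, min_util):
    """Smallest unit count meeting both constraints, or None."""
    for k in range(1, max_k + 1):
        delta, util = global_estimate(blocks, k)
        if delta <= max_delta and util >= min_util:
            return k
    return None


blocks = [([3, 1, 0, 2, 3], 10), ([2, 2], 1)]
print(spill([3, 1, 0, 2, 3], 2))
print(choose_units(blocks, max_k=3, max_delta=0.20, min_util=0.75))
```

With two units, the first sample block gains one cycle and reaches a utilization of 9 / (2 × 6) = 0.75, matching the utilization formula on the slide.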
Results
• Applications: bdist2 (MPEG2 encoder), OnesCounter, Sort (bubble sort), dct32 (MP3)
• Δ = 20%, Utilization = 75%
Conclusion
• Automatic generation of data path
  • Separate allocation from scheduling and binding
  • Initial Allocation: creates dense architecture
  • Architecture Wizard: refines architecture for given constraints
• Future work and issues
  • Reduce area
    • Reduce complexity of FU
    • Further reduce interconnect
  • Features
    • Pipelining, chaining, forwarding, special function units