Exploiting Operation Level Parallelism through Dynamically Reconfigurable Datapaths. Zhining Huang and Sharad Malik DAC 2002. Outline. Introduction Methodology overview Datapaths for kernel loops Reconfigurable Datapath Benchmark studies Conclusions and future work. Introduction.
Exploiting Operation Level Parallelism through Dynamically Reconfigurable Datapaths Zhining Huang and Sharad Malik DAC 2002
Outline • Introduction • Methodology overview • Datapaths for kernel loops • Reconfigurable Datapath • Benchmark studies • Conclusions and future work
Introduction • Programmable platforms • Bit-level programmable • Coarse-grained FPGA • Word-level programmable • Dynamically reconfigurable co-processor • Fixed hardware blocks • Programmable interconnect • Accelerate loops
Methodology overview • Master processor • Reconfigurable co-processor • Reconfigurable datapath • ASIC-like function units • Reconfigurable interconnections • Control logic • State machine • Control datapath execution
Datapaths for kernel loops • Extract kernel loops • Direct mapping of kernel loop datapath • Branch condition transforms • Pipelining the execution • Estimation of the pipeline execution time
Extract kernel loops • Use IMPACT compiler • Profiling • Loop detection (only inner loops) • Data dependence analysis • Register live-in/out • Data dependence between instructions • Within a loop • In different loops
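The register live-in/out step can be illustrated with a minimal sketch. It assumes a straight-line loop body given as a list of (destination, sources) tuples; the function names `live_in` and `live_out` and the register names are hypothetical and are not part of the IMPACT compiler.

```python
def live_in(body):
    """Registers read in the body before the body writes them."""
    defined = set()
    result = set()
    for dest, sources in body:
        for src in sources:
            if src not in defined:
                result.add(src)
        if dest is not None:
            defined.add(dest)
    return result


def live_out(body, used_after_loop):
    """Registers written in the body that are still needed after the loop."""
    defined = {dest for dest, _ in body if dest is not None}
    return defined & set(used_after_loop)


# Hypothetical loop body: r3 = r1 + r2; r4 = r3 * r5; r1 = r1 + 1
body = [("r3", ["r1", "r2"]), ("r4", ["r3", "r5"]), ("r1", ["r1"])]
inputs = live_in(body)                 # values the master processor must supply
outputs = live_out(body, ["r4", "r9"])  # values to write back after the loop
```

In this sketch, live-in registers are the values that must be passed to the co-processor before the loop starts, and live-out registers are the ones written back afterwards.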
Branch condition transforms • Into different datapaths • Selected by multiplexer
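A minimal sketch of the branch transform, assuming a simple two-way branch inside the loop body: both sides are computed as separate datapaths and a multiplexer picks the result. The functions `mux`, `loop_body_branching`, and `loop_body_datapath` are illustrative names, not taken from the paper.

```python
def mux(select, if_true, if_false):
    """Two-input multiplexer: forwards one of two already-computed values."""
    return if_true if select else if_false


def loop_body_branching(a, b):
    """Original form: control flow decides which operation runs."""
    if a > b:
        return a - b
    return b - a


def loop_body_datapath(a, b):
    """Transformed form: both datapaths always compute, the mux selects."""
    condition = a > b     # branch condition becomes the mux select signal
    taken = a - b         # datapath for the taken side
    not_taken = b - a     # datapath for the not-taken side
    return mux(condition, taken, not_taken)
```

Both forms return the same value for every input; the second has no control flow inside the loop body, which is what allows it to be laid out as a fixed datapath.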
Pipelining the execution • Insert registers in the datapath • Data dependence • Delay or bypass • From registers or memory operations • Four data dependence cases and solutions • Tstore<Tload : delay store • Tstore1<Tstore2 : eliminate store1 • Two loads : do nothing • Tload<Tstore : delay or bypass
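The four data dependence cases can be written as a small lookup, sketched below. It assumes both operations access the same location and that each T is the time at which the operation is scheduled in the pipeline; that reading of T, the function name `remedy`, and the returned strings are assumptions for illustration.

```python
def remedy(kind_a, t_a, kind_b, t_b):
    """Return the slide's remedy for two memory operations on one location."""
    kinds = {kind_a, kind_b}
    if not kinds <= {"load", "store"}:
        raise ValueError("operation kinds must be 'load' or 'store'")
    if kinds == {"load"}:
        return "do nothing"                   # two loads never conflict
    if t_a == t_b:
        raise ValueError("equal times are not covered by the four cases")
    if kinds == {"store"}:
        return "eliminate the earlier store"  # Tstore1 < Tstore2
    # One load and one store: work out which time belongs to which.
    t_store, t_load = (t_a, t_b) if kind_a == "store" else (t_b, t_a)
    if t_store < t_load:
        return "delay store"                  # Tstore < Tload
    return "delay or bypass"                  # Tload < Tstore
```

For example, `remedy("load", 1, "store", 3)` falls into the Tload<Tstore case and returns "delay or bypass".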
Estimation of the pipeline execution time • T=[S+D*(N-1)]+O+W cycles • S: total number of pipeline stages of a datapath • D: delay of the consecutive loop iteration • N: loop iteration number • O: switch overhead • W: write back cycles
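The estimate can be evaluated directly. The sketch below implements the formula as stated on the slide; the function name `pipeline_cycles` and the example numbers are hypothetical and are not results from the paper.

```python
def pipeline_cycles(stages, iteration_delay, iterations,
                    switch_overhead, write_back):
    """Evaluate T = [S + D*(N-1)] + O + W, in cycles."""
    if iterations < 1:
        raise ValueError("the loop must run at least one iteration")
    fill_and_run = stages + iteration_delay * (iterations - 1)
    return fill_and_run + switch_overhead + write_back


# Hypothetical values: S=5, D=1, N=100, O=4, W=2
total = pipeline_cycles(stages=5, iteration_delay=1, iterations=100,
                        switch_overhead=4, write_back=2)
# total is (5 + 1*99) + 4 + 2 = 110 cycles
```

The first iteration costs S cycles to fill the pipeline and each later iteration adds D cycles, so for large N the estimate is dominated by D*N and the fixed costs O and W are amortized.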
Reconfigurable Datapath • Datapath mapping • Routing box • Critical path and clock speed • Reconfiguration overhead
Critical path and clock speed • T_routing box + T_function unit + T_wire delay • Sophisticated control and function units • Two or more cycles for longer timing paths
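A minimal sketch of how the critical-path sum relates to cycle counts, assuming the clock period is set by a typical routing box plus function unit plus wire path and that a slower unit is simply given enough whole cycles to cover its delay. The delays (in picoseconds) and the function names are hypothetical.

```python
def clock_period(t_routing_box, t_function_unit, t_wire_delay):
    """Critical path of one stage: routing box + function unit + wire."""
    return t_routing_box + t_function_unit + t_wire_delay


def cycles_needed(path_delay, period):
    """Whole cycles required to cover a path delay (integer ceiling)."""
    if period <= 0:
        raise ValueError("clock period must be positive")
    return max(1, -(-path_delay // period))


# Hypothetical delays in picoseconds
period = clock_period(t_routing_box=300, t_function_unit=1500, t_wire_delay=200)
slow_unit_cycles = cycles_needed(3500, period)  # a slower unit gets 2 cycles
```

Integer picoseconds are used here so the ceiling division is exact and free of floating-point rounding.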
Reconfiguration overhead • Overhead • Reconfiguration context switch • Execution switching to the datapath • Execution switching back • Number of loops selected • 8 or 16 • Control bits are stored in a distributed fashion in the co-processor • Register file bandwidth in master processor
Conclusions and future work • Methodology • Dynamically reconfigurable datapath • For a specific application • The co-processor can be viewed as a VLIW • Loop restructuring techniques • Reduce data dependencies