Aristotle University of Thessaloniki
The Effect of Data-Reuse Transformations on Multimedia Applications for Different Processing Platforms
N. Vassiliadis, A. Chormoviti, N. Kavvadias, S. Nikolaidis
Section of Electronics and Computers, Department of Physics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
nivas@physics.auth.gr
Scope
• A methodology for implementing an ASIP from a hardware-software perspective is followed
• An ASIP for multimedia applications is designed
• The effect of data-reuse transformations on the energy and performance of a multimedia application executed on an ASIP and on a GPP is studied
Motive
• Popularity of portable multimedia applications
• Great need for power optimization strategies, especially at higher design levels
• Code transformations targeting the memory hierarchy provide significant power savings
• Since ASICs lack flexibility and GPPs are prohibitively expensive in terms of energy and performance, the embedded systems industry has an increasing interest in ASIPs
Data-Reuse Transformations
• In data-dominated applications, significant power savings can be achieved by developing a custom memory organization
• The two-dimensional Three-Step Search (TSS) block-matching motion estimation algorithm is used as the benchmark
• A custom memory organization is designed for this benchmark
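As a rough illustration of the benchmark, the following C sketch implements a textbook Three-Step Search for one block, using the frame size, block size B = 16, and search range [-7, 7] from the experiments. The function names (`sad`, `tss_block`), the SAD matching criterion, the step sequence 4, 2, 1, and the skipping of candidates that fall outside the frame are assumptions of this sketch; it is not the authors' code and applies none of the data-reuse transformations.

```c
#include <assert.h>
#include <stdlib.h>

/* Parameters from the experiments: M x N = 144 x 176 frames,
   B = 16 blocks, search window [-P, P] = [-7, 7]. */
#define M 144
#define N 176
#define B 16
#define P 7

/* Sum of absolute differences between the B x B block of `cur` at (y, x)
   and the block of `ref` displaced by (dy, dx). */
static int sad(unsigned char cur[M][N], unsigned char ref[M][N],
               int y, int x, int dy, int dx)
{
    int s = 0;
    for (int i = 0; i < B; i++)
        for (int j = 0; j < B; j++)
            s += abs(cur[y + i][x + j] - ref[y + dy + i][x + dx + j]);
    return s;
}

/* Three-Step Search for the block at (y, x): evaluate the 8 neighbours of
   the current centre at distance `step`, move to the best match, then halve
   the step (4, 2, 1 for P = 7). Candidates outside the frame are skipped. */
static void tss_block(unsigned char cur[M][N], unsigned char ref[M][N],
                      int y, int x, int *mvy, int *mvx)
{
    assert(y >= 0 && y + B <= M && x >= 0 && x + B <= N);
    int by = 0, bx = 0;
    int best = sad(cur, ref, y, x, 0, 0);
    for (int step = (P + 1) / 2; step >= 1; step /= 2) {
        int cy = by, cx = bx;            /* centre for this step */
        for (int dy = -step; dy <= step; dy += step)
            for (int dx = -step; dx <= step; dx += step) {
                int vy = cy + dy, vx = cx + dx;
                if (dy == 0 && dx == 0)
                    continue;            /* centre already evaluated */
                if (y + vy < 0 || y + vy + B > M || x + vx < 0 || x + vx + B > N)
                    continue;            /* candidate leaves the frame */
                int s = sad(cur, ref, y, x, vy, vx);
                if (s < best) { best = s; by = vy; bx = vx; }
            }
    }
    *mvy = by;
    *mvx = bx;
}
```

A caller would loop `tss_block` over every 16 x 16 block of the current frame. Because each step re-centres on the best candidate so far, at most 25 positions are evaluated per block instead of the 225 of an exhaustive search over [-7, 7].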
ASIP Design Flow: Architecture Template
• A RISC, MIPS-like machine is used as the base processor
• The target is instruction-set extensions that benefit both performance and power consumption
ASIP Design Flow: Front-End Compilation
• The TSS variants implementing the different data-reuse transformations, written in C, were compiled
• GNU GCC for embedded architectures, configured as a cross-compiler for the MIPS architecture, was used
ASIP Design Flow: Dynamic Profiling
• Dynamic profiling was performed with the GNU tools (gcc, binutils, gdb) configured for the MIPS processor
• Heavily executed portions of the code were identified
• Loop iteration overhead accounts for 24% of the total execution cycles
• Address generation instructions account for 62% of the total execution cycles
• Only 14% of the execution time is spent on pure computational micro-operations
• New candidate instructions from which the application can benefit were revealed
ASIP Design Flow: Instruction-Set Extensions
• An “Increment and Branch” instruction to reduce loop iteration overhead
• Store/Load Word with addition, folding the address calculation into the memory access (one cycle, using the pipeline)
• Direct support of the custom memory hierarchy
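To make the semantics of the proposed extensions concrete, here is a hypothetical C model. The mnemonics `incbr` and `lwa`, their operand order, the register-plus-register addressing of the load, and the `cpu_t` state are all assumptions; the slides name the instructions but do not define their encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical processor state for illustration only. */
typedef struct {
    uint32_t r[32];    /* general-purpose registers     */
    uint32_t mem[256]; /* word-addressed data memory    */
    uint32_t pc;       /* index of the next instruction */
} cpu_t;

/* Hypothetical "increment and branch": rs <- rs + 1; branch if rs != rt.
   Replaces the addi + bne pair that closes a counted loop. */
static void op_incbr(cpu_t *c, int rs, int rt, uint32_t target)
{
    c->r[rs] += 1;
    c->pc = (c->r[rs] != c->r[rt]) ? target : c->pc + 1;
}

/* Hypothetical "load word with addition": rd <- mem[rs + rt].
   The address add is folded into the load. */
static void op_lwa(cpu_t *c, int rd, int rs, int rt)
{
    uint32_t addr = c->r[rs] + c->r[rt];
    assert(addr < 256);
    c->r[rd] = c->mem[addr];
    c->pc += 1;
}

/* Sums n words starting at `base` with the fused loop body
   (lwa; addu; incbr) and counts the executed instructions. */
static uint32_t sum_fused(cpu_t *c, uint32_t base, uint32_t n, unsigned *icount)
{
    assert(n > 0 && base + n <= 256);   /* body runs at least once */
    c->r[1] = 0;      /* r1: index        */
    c->r[2] = n;      /* r2: trip count   */
    c->r[3] = base;   /* r3: base address */
    c->r[5] = 0;      /* r5: running sum  */
    c->pc = 0;
    *icount = 0;
    do {
        op_lwa(c, 4, 3, 1);             /* pc 0: lwa   r4, r3, r1   */
        c->r[5] += c->r[4]; c->pc += 1; /* pc 1: addu  r5, r5, r4   */
        op_incbr(c, 1, 2, 0);           /* pc 2: incbr r1, r2, pc 0 */
        *icount += 3;
    } while (c->pc == 0);
    return c->r[5];
}
```

In this model each iteration takes three instructions, whereas the unextended MIPS-like sequence would need an explicit `addu` for the address and separate `addi`/`bne` instructions for the loop counter, i.e. five.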
ASIP Design Flow: Code Re-Generation
• The original code is parsed and its micro-operations (MOPs) are reordered to form the instruction-extension patterns
• Patterns are replaced by the newly defined instructions
• MOPs are reordered to keep the pipeline as full as possible
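A minimal sketch of the substitution step, assuming a hypothetical MOP encoding (`mop_t`) and two hypothetical fused opcodes, `OP_INCBR` (increment and branch) and `OP_LWA` (load word with address addition). It only matches adjacent pairs, so the reordering that brings a pattern's MOPs together, and the final pipeline scheduling, are not shown. The load pattern is accepted only when the loaded register is also the address register, so the intermediate address is known to be dead and folding it is safe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical micro-operation encoding for illustration only. */
typedef enum { OP_ADDI, OP_ADDU, OP_LW, OP_BNE, OP_INCBR, OP_LWA } opcode_t;

typedef struct {
    opcode_t op;
    int rd, rs, rt; /* register operands (meaning depends on op) */
    int imm;        /* immediate, load offset, or branch target  */
} mop_t;

/* Rewrites the MOP sequence in place and returns its new length:
     addi rX, rX, 1 ; bne rX, rY, L   ->  incbr rX, rY, L
     addu rA, rB, rC ; lw rA, 0(rA)   ->  lwa   rA, rB, rC          */
static size_t substitute_patterns(mop_t *m, size_t n)
{
    assert(m != NULL || n == 0);
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n) {
            const mop_t *a = &m[i], *b = &m[i + 1];
            if (a->op == OP_ADDI && a->rd == a->rs && a->imm == 1 &&
                b->op == OP_BNE && b->rs == a->rd) {
                mop_t f = { OP_INCBR, 0, a->rd, b->rt, b->imm };
                m[out++] = f;
                i++;            /* two MOPs consumed */
                continue;
            }
            if (a->op == OP_ADDU && b->op == OP_LW && b->imm == 0 &&
                b->rs == a->rd && b->rd == a->rd) {
                mop_t f = { OP_LWA, a->rd, a->rs, a->rt, 0 };
                m[out++] = f;
                i++;            /* two MOPs consumed */
                continue;
            }
        }
        m[out++] = m[i];        /* no pattern: copy unchanged */
    }
    return out;
}
```

Writing in place is safe because the output index never overtakes the input index, and each fused MOP is built before it is stored.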
ASIP Design Flow: Cycle-Accurate and Hardware Models
• A cycle-accurate simulation model was built in SystemC
• A hardware model was designed in VHDL
• Instruction execution frequencies and access counts for crucial hardware components were collected
• Specifications were determined by synthesis on a popular standard-cell technology
Experimental Results
• The different versions of the TSS application code were compiled for the ARM9TDMI and the ASIP cores
• Cycle-accurate simulations were performed on the ARMulator and on the SystemC simulator, respectively
• TSS was executed on digital pictures of M x N = 144 x 176 pixels, with block size B = 16 and search window [-p, p] = [-7, 7]
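Under these parameters, a quick count gives the search workload per frame. It assumes the standard TSS step sequence 4, 2, 1 for p = 7 and one absolute difference per pixel per candidate; border blocks evaluate fewer candidates, so the TSS figures are upper bounds.

```latex
\begin{align*}
\text{blocks per frame} &= \frac{M}{B}\cdot\frac{N}{B}
  = \frac{144}{16}\cdot\frac{176}{16} = 9 \cdot 11 = 99 \\
\text{TSS candidates per block} &\le 9 + 8 + 8 = 25
  \qquad\text{(full search: } (2p+1)^2 = 225\text{)} \\
\text{absolute differences per frame} &\le 99 \cdot 25 \cdot B^2
  = 99 \cdot 25 \cdot 256 = 633\,600
  \qquad\text{(full search: } 5\,702\,400\text{)}
\end{align*}
```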
Performance Results
• Performance gain of 29% (ARM9TDMI) and 54% (ASIP) for P4 compared to the original TSS
• The ASIP delivers a 54% performance gain compared to the ARM9TDMI core
• The ASIP runs at 250 MHz in STM 0.18 µm technology
• The ARM9TDMI, implemented in the same technology process, runs at 200 MHz
Energy Results
• SRAM memories of appropriate size were used for each layer of the data memory
• ROM was used for the instruction memory
• Because the ASIP code is 50% smaller than the ARM code, ROM instruction memories of 4 KB (ARM) and 2 KB (ASIP) were used
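One common way to realize a layered data memory for block matching is to copy each block's search window from the frame memory into a small buffer once, then serve every candidate comparison from the buffer. The C sketch below assumes a two-layer organization (frame memory plus a (B + 2p) x (B + 2p) buffer), clamps pixels outside the frame to the border, and counts accesses per layer. The names `fill_window`, `sad_win`, and `counters_t` are hypothetical; this is a generic data-reuse illustration, not the authors' P4 transformation.

```c
#include <assert.h>

#define M 144
#define N 176
#define B 16
#define P 7
#define W (B + 2 * P)   /* 30 x 30 search-window buffer */

/* Access counters for a hypothetical two-layer data memory. */
typedef struct { long frame_reads; long buf_reads; long buf_writes; } counters_t;

/* Copies the W x W search window around block (y, x) from the frame
   (layer 0) into the buffer (layer 1); out-of-frame pixels are clamped. */
static void fill_window(unsigned char ref[M][N], unsigned char win[W][W],
                        int y, int x, counters_t *k)
{
    for (int i = 0; i < W; i++)
        for (int j = 0; j < W; j++) {
            int r = y - P + i, c = x - P + j;
            r = r < 0 ? 0 : (r >= M ? M - 1 : r);
            c = c < 0 ? 0 : (c >= N ? N - 1 : c);
            win[i][j] = ref[r][c];
            k->frame_reads++;
            k->buf_writes++;
        }
}

/* SAD for candidate (dy, dx), reading reference pixels from the buffer
   only. Reads of the current block are not counted in this sketch. */
static int sad_win(unsigned char cur[M][N], unsigned char win[W][W],
                   int y, int x, int dy, int dx, counters_t *k)
{
    assert(dy >= -P && dy <= P && dx >= -P && dx <= P);
    int s = 0;
    for (int i = 0; i < B; i++)
        for (int j = 0; j < B; j++) {
            int d = cur[y + i][x + j] - win[P + dy + i][P + dx + j];
            s += d < 0 ? -d : d;
            k->buf_reads++;
        }
    return s;
}
```

With this organization a block costs (B + 2p)^2 = 900 frame-memory reads regardless of how many candidates are evaluated, whereas reading the frame directly costs B^2 = 256 reads per candidate, e.g. up to 25 x 256 = 6400 for TSS.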
Energy Results
• Energy consumption is dominated by accesses to the instruction memory
• P4 delivers 32% (ARM) and 54% (ASIP) energy savings compared with the original TSS
• Applying the P4 transformation on the ASIP achieves 42% energy savings compared to the ARM9TDMI
• These savings come from fewer accesses to the instruction memory and from the smaller instruction memory of the ASIP
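The two sources of saving can be read off a first-order access-count model, assumed here rather than taken from the slides: each memory m contributes its number of accesses N_m times an energy per access e_m that grows with its size S_m. Because the instruction-memory term dominates, the ASIP/ARM ratio of that term factors into an access-count ratio and a per-access-energy ratio for the 2 KB and 4 KB ROMs.

```latex
\begin{align*}
E_{\text{total}} &\approx \sum_{m} N_m \, e_m(S_m),
  \qquad m \in \{\text{instruction memory}, \text{data memory layers}\} \\
\frac{E_{\text{IM}}^{\text{ASIP}}}{E_{\text{IM}}^{\text{ARM}}}
  &\approx \frac{N_{\text{IM}}^{\text{ASIP}}}{N_{\text{IM}}^{\text{ARM}}}
  \cdot \frac{e_{\text{IM}}(2\,\text{KB})}{e_{\text{IM}}(4\,\text{KB})}
\end{align*}
```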
Conclusions
• Both solutions, ASIP and GPP, benefit in performance and energy consumption from selecting the appropriate custom data memory hierarchy
• With this hierarchy, the ASIP achieves higher performance and a larger energy reduction than the GPP
Performance Results [figure]
Energy Results [figure]