Join our team for summer projects in system optimization at every level, from general architecture down to the compiler and OS. Experiment with accelerators, processing elements, and I/O devices to improve performance. Tasks include integrating a DSP simulator, building communication facilities, and studying the impact of CPU architecture on program performance.
Performance Tuning Team: Summer Projects
Chia-heng Tu
June 30, 2009
Optimization Levels in General System Architecture
• Design & Source code level
• Compile level (Compiler)
• Library level
• OS level
• Bus Architecture level
Hardware components shown in the figure:
• Accelerators (DSPs, FPGA, ASICs, etc.)
• Processing Elements (ARM, PPC, etc.)
• I/O Devices (UART, USB, LCD, etc.)
List of Summer Projects (MOEA Project)
• Performance Evaluation of CUDA Programs on Multicore Platforms
  (Compiler, Architecture, Parallel Computing, Performance Tools)
• Establishing Heterogeneous Multicore Environment (QEMU)
  • (System software) Integrate an existing DSP simulator
  • (System software) Communication facility (MSG library) on the environment
  • (Compiler) Write a DSP emulator
• Performance Analysis Infrastructure (QEMU)
  • (System software) Port PAPI onto QEMU (ARM processor)
  • (Architecture) Add Hardware Performance Monitoring Events
  • (Performance tool) Tracing tool library porting on QEMU
• Embedded Development Platform (TI Davinci)
  • Port Tracing tool onto TI Davinci platform
  • Port MSG Library onto TI Davinci platform
  • Integrate PAPI onto TI Davinci platform
• Study of the Impact of CPU Architecture on Program Performance
  • (Architecture, performance tools) Memory opportunity
Performance Evaluation of CUDA Programs on Multicore Platforms
• Programming model vs. CPU architectures
Figure (software stack):
  Application Layer: Real Apps. (written in CUDA program model), Code translator, Real Apps. (Parallel C program), Cell compiler, Binaries (PPE+SPE)
  OS Layer: Red Hat or Fedora 9 Linux
  Platform Layer: IBM Cell Broadband Engine Architecture vs. Intel Nehalem Architecture
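One way to picture what a code translator from the CUDA program model to parallel C has to do: a CUDA kernel is written for a single thread, and a CPU version can run that same body inside loops over the block and thread indices. The sketch below only illustrates that idea. The names `dim1`, `vec_add_kernel`, and `launch_on_cpu` are hypothetical and are not taken from the project's translator or from the CUDA toolkit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for CUDA's 1-D blockIdx / blockDim / threadIdx. */
typedef struct { unsigned x; } dim1;

/* Kernel body as it would be written for one CUDA thread:
 * c[i] = a[i] + b[i], where i is the global thread index. */
static void vec_add_kernel(dim1 block_idx, dim1 block_dim, dim1 thread_idx,
                           const float *a, const float *b, float *c, size_t n)
{
    size_t i = (size_t)block_idx.x * block_dim.x + thread_idx.x;
    if (i < n)              /* guard: the last block may be only partly used */
        c[i] = a[i] + b[i];
}

/* Serialized "launch": the grid of blocks and the threads inside each block
 * become two nested loops. Blocks are independent of each other, so the
 * outer loop is the natural candidate for distribution across cores. */
static void launch_on_cpu(unsigned grid_dim, unsigned block_dim,
                          const float *a, const float *b, float *c, size_t n)
{
    assert(block_dim > 0);
    for (unsigned bx = 0; bx < grid_dim; ++bx) {
        for (unsigned tx = 0; tx < block_dim; ++tx) {
            dim1 b_idx = { bx }, b_dim = { block_dim }, t_idx = { tx };
            vec_add_kernel(b_idx, b_dim, t_idx, a, b, c, n);
        }
    }
}
```

Calling `launch_on_cpu(3, 4, a, b, c, 10)` covers 10 elements with 3 blocks of 4 threads. Kernels that use shared memory or barriers need more care than this loop nest shows.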
Establishing Heterogeneous Multicore Environment
• Integrate an existing DSP simulator
• Build communication facility (MSG library) on the environment
• Write a DSP emulator
Figure (layers of the virtual platform):
  Application Layer: Real Apps. (Crypto, Multimedia, etc.), ARM Binary, DSP Binary
  Library Layer: High-level Communication Interface, Communication Library
  OS Layer: OS (Linux)
  Platform Layer (Virtual Platform): QEMU with ARM, Memory, Accelerator (PAC DSP), Bus, I2C Bus
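The slides do not describe the interface of the MSG library, so the following is only a guess at the kind of facility it could provide: a fixed-size mailbox through which one processor model (say, the ARM) hands messages to another (the DSP). Every name here (`mailbox_t`, `msg_send`, `msg_recv`, `MSG_SLOTS`, `MSG_PAYLOAD`) is hypothetical, and the sketch is single-threaded, with no locking or interrupt signalling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MSG_SLOTS    4   /* hypothetical queue depth */
#define MSG_PAYLOAD 32   /* hypothetical maximum payload size in bytes */

typedef struct {
    size_t len;
    unsigned char data[MSG_PAYLOAD];
} msg_t;

/* One-directional mailbox, e.g. ARM -> DSP. On a virtual platform this
 * would sit in memory that both processor models can access. */
typedef struct {
    msg_t slots[MSG_SLOTS];
    size_t head;    /* index of the oldest queued message */
    size_t count;   /* number of queued messages */
} mailbox_t;

static void mailbox_init(mailbox_t *mb) { mb->head = 0; mb->count = 0; }

/* Returns 0 on success, -1 if the mailbox is full or the payload too large. */
static int msg_send(mailbox_t *mb, const void *buf, size_t len)
{
    assert(mb != NULL && buf != NULL);
    if (mb->count == MSG_SLOTS || len > MSG_PAYLOAD)
        return -1;
    msg_t *slot = &mb->slots[(mb->head + mb->count) % MSG_SLOTS];
    memcpy(slot->data, buf, len);
    slot->len = len;
    mb->count++;
    return 0;
}

/* Copies the oldest message into buf and returns its length, or -1 if the
 * mailbox is empty or buf (capacity cap) is too small for the message. */
static long msg_recv(mailbox_t *mb, void *buf, size_t cap)
{
    assert(mb != NULL && buf != NULL);
    if (mb->count == 0 || mb->slots[mb->head].len > cap)
        return -1;
    msg_t *slot = &mb->slots[mb->head];
    memcpy(buf, slot->data, slot->len);
    mb->head = (mb->head + 1) % MSG_SLOTS;
    mb->count--;
    return (long)slot->len;
}
```

Messages come out in the order they were sent, and a full mailbox rejects further sends instead of overwriting. A real port would also need synchronization between the two processors.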
Performance Analysis Infrastructure
• Port Tracing tool library on QEMU (performance tools)
  • Frequency and time of function calls, and call graph
• Integrate Performance Application Programming Interface (PAPI)
• Add Hardware Performance Events
Figure (layers of the virtual platform):
  Application Layer: Real Apps. (Crypto, Multimedia, etc.), Source code instrumentor, PMU_dijkstra.c
  Library Layer: High-level Performance Analysis Interface, Tracing lib., Performance Application Programming Interface Library
  OS Layer: OS (Linux), Perfctr (PMU Driver)
  Platform Layer (Virtual Platform): QEMU with ARM, Accelerator (PAC DSP ISS), Bus, I2C Bus, Timing Facility, Logical Time Stamp Counter (TSC), Performance Monitoring Unit (PMU) (cache miss rate, etc.)
Instrumented source, PMU_dijkstra.c (read_PMU() is the slide's helper; its definition is not shown):
    int main(int argc, char **argv)
    {
        /* The data structure recording the performance data */
        struct perfctr_sum_ctrs before, after;

        /* ... Prolog: set up the environment. */
        read_PMU(&before);
        dijkstra();   /* placeholder for the original dijkstra.c code being measured */
        read_PMU(&after);
        /* ... Epilog: dump the performance data (instruction counts). */
        return 0;
    }
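The tracing tool is meant to report the frequency and time of function calls. As a rough sketch of that bookkeeping, and not the project's actual tracing library, the code below keeps a call count and accumulated CPU time per function name. `trace_enter`, `trace_exit`, `trace_lookup`, and `MAX_FUNCS` are hypothetical names, timing uses the standard `clock()` in place of a PMU or time stamp counter, and recursive calls are not handled.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

#define MAX_FUNCS 16   /* hypothetical table size */

typedef struct {
    const char *name;
    unsigned long calls;   /* how many times the function was entered */
    clock_t total;         /* accumulated CPU time in clock ticks */
    clock_t started;       /* time stamp of the current activation */
} trace_entry;

static trace_entry trace_table[MAX_FUNCS];
static int trace_used = 0;

/* Find or create the record for a function name; NULL if the table is full. */
static trace_entry *trace_lookup(const char *name)
{
    for (int i = 0; i < trace_used; ++i)
        if (strcmp(trace_table[i].name, name) == 0)
            return &trace_table[i];
    if (trace_used == MAX_FUNCS)
        return NULL;
    trace_entry *e = &trace_table[trace_used++];
    e->name = name;
    e->calls = 0;
    e->total = 0;
    e->started = 0;
    return e;
}

/* Call at function entry: count the call and remember when it started. */
static void trace_enter(const char *name)
{
    trace_entry *e = trace_lookup(name);
    assert(e != NULL);
    e->calls++;
    e->started = clock();
}

/* Call at function exit: add the elapsed ticks to the running total. */
static void trace_exit(const char *name)
{
    trace_entry *e = trace_lookup(name);
    assert(e != NULL);
    e->total += clock() - e->started;
}
```

A source code instrumentor would insert the `trace_enter`/`trace_exit` pair around each function body. Building a call graph would additionally require recording the caller at each entry, which this sketch omits.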
Embedded Development Platform (TI Davinci)
• Port MSG Library onto TI Davinci platform
• Port Tracing tool onto TI Davinci platform
• Integrate PAPI onto TI Davinci platform
Figure (layers of the TI-Davinci platform):
  Application Layer: Real Apps. (Crypto, Multimedia, etc.), Call graph
  Library Layer: High-level Communication Interface, High-level Performance Analysis Interface, Communication Library, Tracing lib., Performance Application Programming Interface Library
  OS Layer: OS (Micro-kernel), OS (Micro-kernel)
  Platform Layer: TI-Davinci with ARM, C64x DSP, Bus, I2C Bus, Performance Monitoring Unit (PMU)
Study of the Impact of CPU Architecture on Program Performance
• Performance comparison of parallel programs on different multicore architectures
• Impact factors: cache size, cache hierarchy, interconnection among cores, etc.
Figure (software stack):
  Application Layer: Real Apps. (Parallel C Programs)
  OS Layer: Linux
  Platform Layer: IBM Cell Broadband Engine Architecture, Intel Nehalem Architecture, AMD Quad Core Architecture
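To make the cache-size factor concrete, a small microbenchmark can time memory accesses over working sets of different sizes and compare the cost per access across machines. The sketch below is one simple way to do that and is not part of the project. `walk_working_set` is a hypothetical name, and `LINE_BYTES` is an assumed cache-line size of 64 bytes, which varies by CPU.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define LINE_BYTES 64   /* assumed cache-line size; varies by CPU */

/* Touch one byte per assumed cache line across a working set of `bytes`,
 * repeated `passes` times. Returns the average CPU time per access in
 * nanoseconds and stores the number of accesses in *accesses. */
static double walk_working_set(size_t bytes, int passes, unsigned long *accesses)
{
    assert(bytes >= LINE_BYTES && passes > 0 && accesses != NULL);
    volatile unsigned char *buf = calloc(bytes, 1);
    assert(buf != NULL);

    unsigned long n = 0;
    clock_t t0 = clock();
    for (int p = 0; p < passes; ++p) {
        for (size_t i = 0; i < bytes; i += LINE_BYTES) {
            buf[i]++;   /* volatile keeps the access from being optimized away */
            n++;
        }
    }
    clock_t t1 = clock();

    free((void *)buf);
    *accesses = n;
    return 1e9 * (double)(t1 - t0) / CLOCKS_PER_SEC / (double)n;
}
```

Running it for sizes from a few kilobytes up to tens of megabytes and plotting time per access against size would typically show steps near the cache capacities. This sequential walk is friendly to hardware prefetchers, so a serious study would more likely use a randomized pointer-chasing pattern and many more passes.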
Practical, system-wide, and up-to-date research projects. Everyone is welcome to join us!