This study by the PASCAL Lab at UC Irvine explores parallelizing the Telemedicine Benchmark on the Xbox 360. It covers the methodology (compiler, data sets, test platform), the initial results, and their analysis. The findings indicate that speedup depends on load size and that thread creation, thread management overhead, and contention limit the gains, leaving room for improvement in future work.
Howard Wong, SURF-IT Fellow; Professor Jean-Luc Gaudiot, EECS. August 29, 2008. Parallelization of the Telemedicine Benchmark for the Xbox 360 Architecture. PASCAL: PArallel Systems and Computer Architecture Lab, University of California, Irvine.
Outline: Background (Benchmark, Platform); Current Work; Methodology (Compiler, Data Set); Results; Conclusions; Future Work
Background: Why Parallel Programming? (advent of everyday multicomputers; ultimate goal: auto-parallelization); Basic concepts (problems; programming primitives); Telemedicine Benchmark; Platform: Xbox 360 (3 cores; graphics engine; vector processing). [Figure: work divided among Core 1, Core 2, ..., Core n]
Current Work: Goal: identify the parallelization process (efficiency measured in performance; performance in relation to load); POSIX threads (pthreads) and OpenMP; Sorting routines ('fallbackSort'; making search 'brackets'; 'mainSort'; dependencies between loop iterations)
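As an illustration of the pthreads approach and of removing a dependency between loop iterations, here is a minimal sketch, not the benchmark's actual code. It assumes a byte-frequency count as the loop to parallelize. The names `slice_t`, `count_slice`, `parallel_histogram`, and `NUM_THREADS` are hypothetical. Each thread counts into a private histogram, so no iteration touches shared state until the final merge.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define NUM_THREADS 3   /* assumption: one worker per Xbox 360 core */

/* Hypothetical per-thread work descriptor. */
typedef struct {
    const unsigned char *data;   /* start of this thread's slice */
    size_t len;                  /* number of bytes in the slice */
    unsigned long counts[256];   /* private histogram: no locking needed */
} slice_t;

/* Thread body: count byte values in one slice only. */
void *count_slice(void *arg)
{
    slice_t *s = arg;
    memset(s->counts, 0, sizeof s->counts);
    for (size_t i = 0; i < s->len; i++)
        s->counts[s->data[i]]++;
    return NULL;
}

/* Fill total[256] with the byte frequencies of data[0..len).
 * Returns 0 on success, -1 if a thread could not be created
 * (in which case total covers only the slices that did run). */
int parallel_histogram(const unsigned char *data, size_t len,
                       unsigned long total[256])
{
    pthread_t tid[NUM_THREADS];
    slice_t slice[NUM_THREADS];
    size_t chunk = len / NUM_THREADS;
    int started = 0, rc = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        slice[t].data = data + (size_t)t * chunk;
        /* The last thread also takes the remainder. */
        slice[t].len = (t == NUM_THREADS - 1) ? len - (size_t)t * chunk : chunk;
        if (pthread_create(&tid[t], NULL, count_slice, &slice[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    memset(total, 0, 256 * sizeof total[0]);
    for (int t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);
        for (int b = 0; b < 256; b++)   /* serial merge of private results */
            total[b] += slice[t].counts[b];
    }
    return rc;
}
```

With gcc this would typically be built with `-std=c99 -pthread`. A loop whose iterations read results of earlier iterations cannot be split this way without restructuring it first.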
Methodology: Compilation (gcc or g++ version 4.2); Data sets (monkey brain image in PPM format; derived data via netpbm); Test platform (Xbox 360 with Ubuntu Linux). Images courtesy of Neuroscience Center, UC Davis, and Joerg Meyer, Center of GRAVITY, Calit2, UC Irvine.
Initial Results
Analysis: Possible thread contention ('bitmap' of data as former optimization; optimized for long runs of 0's or 1's; extra mutex locks required); Thread creation (sorting algorithm called at least 300 times for the large image; thread creation efficiency; thread management structures)
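One plausible reading of why a shared bitmap needs extra mutex locks: neighbouring bits live in the same machine word, so two threads setting different bits perform a read-modify-write on the same word and can lose each other's update. The sketch below is an assumption about the general pattern, not the benchmark's data structure. The names `locked_bitmap_t`, `bitmap_init`, `bitmap_set`, and `bitmap_test` are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bitmap guarded by a single mutex. */
typedef struct {
    uint32_t *words;         /* caller-provided storage, 32 bits per word */
    size_t nbits;            /* number of valid bits */
    pthread_mutex_t lock;    /* serializes every access to words[] */
} locked_bitmap_t;

/* Clear the storage and set up the lock. Returns 0 on success. */
int bitmap_init(locked_bitmap_t *bm, uint32_t *storage, size_t nbits)
{
    bm->words = storage;
    bm->nbits = nbits;
    for (size_t w = 0; w < (nbits + 31) / 32; w++)
        storage[w] = 0;
    return pthread_mutex_init(&bm->lock, NULL);
}

/* Set bit i. The |= is a read-modify-write of a whole word, which is
 * why it runs under the lock. */
void bitmap_set(locked_bitmap_t *bm, size_t i)
{
    pthread_mutex_lock(&bm->lock);
    bm->words[i >> 5] |= (uint32_t)1 << (i & 31);
    pthread_mutex_unlock(&bm->lock);
}

/* Return 1 if bit i is set, 0 otherwise. */
int bitmap_test(locked_bitmap_t *bm, size_t i)
{
    pthread_mutex_lock(&bm->lock);
    int bit = (int)((bm->words[i >> 5] >> (i & 31)) & 1u);
    pthread_mutex_unlock(&bm->lock);
    return bit;
}
```

A single lock is the simplest correct choice, but every thread that touches the bitmap then queues on it, which is consistent with the contention the slide describes.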
Results (Cont’d)
Conclusions & Discussion: Speedup dependent on the load size; Possible improvements (use a 'threadpool'; create other important compression functions; examine alternative algorithms with a parallel mindset); End result (thread creation; thread management overhead; heavy contention)
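The 'threadpool' improvement can be sketched as follows: worker threads are created once and then reused for every task, so repeated calls to a routine do not each pay for thread creation. This is a minimal illustration under assumed names (`pool_t`, `pool_init`, `pool_submit`, `pool_shutdown`, `mark_done`), not the authors' implementation.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical queued unit of work. */
typedef struct task {
    void (*fn)(void *);
    void *arg;
    struct task *next;
} task_t;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t ready;    /* signalled when a task arrives or on shutdown */
    task_t *head, *tail;     /* FIFO queue of pending tasks */
    int shutting_down;
    int nthreads;
    pthread_t *threads;
} pool_t;

/* Worker loop: sleep until there is a task, run it, repeat. */
static void *worker(void *p)
{
    pool_t *pool = p;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (pool->head == NULL && !pool->shutting_down)
            pthread_cond_wait(&pool->ready, &pool->lock);
        if (pool->head == NULL) {        /* shutting down, queue drained */
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        task_t *t = pool->head;
        pool->head = t->next;
        if (pool->head == NULL)
            pool->tail = NULL;
        pthread_mutex_unlock(&pool->lock);
        t->fn(t->arg);                   /* run the task outside the lock */
        free(t);
    }
}

/* Start up to nthreads workers. Returns 0 if at least one started. */
int pool_init(pool_t *pool, int nthreads)
{
    pool->head = pool->tail = NULL;
    pool->shutting_down = 0;
    pool->nthreads = 0;
    if (nthreads <= 0)
        return -1;
    pool->threads = malloc(sizeof(pthread_t) * (size_t)nthreads);
    if (pool->threads == NULL)
        return -1;
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->ready, NULL);
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&pool->threads[i], NULL, worker, pool) != 0)
            break;
        pool->nthreads++;
    }
    if (pool->nthreads == 0) {
        pthread_cond_destroy(&pool->ready);
        pthread_mutex_destroy(&pool->lock);
        free(pool->threads);
        return -1;
    }
    return 0;
}

/* Queue fn(arg) for execution. Returns 0 on success, -1 on allocation failure. */
int pool_submit(pool_t *pool, void (*fn)(void *), void *arg)
{
    task_t *t = malloc(sizeof *t);
    if (t == NULL)
        return -1;
    t->fn = fn;
    t->arg = arg;
    t->next = NULL;
    pthread_mutex_lock(&pool->lock);
    if (pool->tail)
        pool->tail->next = t;
    else
        pool->head = t;
    pool->tail = t;
    pthread_cond_signal(&pool->ready);
    pthread_mutex_unlock(&pool->lock);
    return 0;
}

/* Let workers finish all queued tasks, then join them and release resources. */
void pool_shutdown(pool_t *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->shutting_down = 1;
    pthread_cond_broadcast(&pool->ready);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < pool->nthreads; i++)
        pthread_join(pool->threads[i], NULL);
    free(pool->threads);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->ready);
}

/* Example task (hypothetical): flag one array slot as processed. */
void mark_done(void *arg)
{
    *(int *)arg = 1;
}
```

A caller would run `pool_init(&pool, 3)` once, call `pool_submit` for each piece of work, and call `pool_shutdown` at the end. The queue itself is still protected by one mutex, so a pool removes creation cost but not contention.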
Questions for Future Work: What is the impact of thread creation? Do the other TMB programs have the same features? Can vector instructions improve program performance? Are new, more efficient parallel programming primitives needed for our application?
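One way to start on the first question is to time thread creation in isolation. The sketch below is an assumption about how such a measurement might look and produces no numbers from the study. The names `noop` and `thread_create_join_us` are hypothetical. It creates and joins `n` threads that do no work and reports the average wall-clock cost of each create/join pair.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Thread body that does nothing, so only the overhead is timed. */
static void *noop(void *arg)
{
    return arg;
}

/* Average microseconds to create and join one empty thread, measured
 * over n sequential create/join pairs. Returns -1.0 on any failure. */
double thread_create_join_us(int n)
{
    struct timespec t0, t1;

    if (n <= 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1.0;
    for (int i = 0; i < n; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, noop, NULL) != 0)
            return -1.0;
        pthread_join(tid, NULL);
    }
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1.0;

    double elapsed_us = (double)(t1.tv_sec - t0.tv_sec) * 1e6
                      + (double)(t1.tv_nsec - t0.tv_nsec) / 1e3;
    return elapsed_us / n;
}
```

Comparing this per-thread cost with the running time of one call to the parallelized routine would indicate the load size below which creating threads costs more than it saves.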
Acknowledgments: Professor Jean-Luc Gaudiot and the PASCAL group; UC Davis Neuroscience Center; Professor Joerg Meyer, Center of GRAVITY, Calit2; Calit2 UROP