Re-engineering Applications for Data-Parallel Hardware: Opportunities and Challenges. Santonu Sarkar, SETLabs, Infosys. The Chemistry of Concurrent and Distributed Programming, Mysore Park Workshop, Mysore, India, Feb 16-19, 2011
Distributed Computing Paradigm Shift in Computing
GPGPU as Computing Platform • General-Purpose computing on Graphics Processing Units • Using GPUs for computation-intensive, non-graphical applications • Why GPU Computing? • GPUs offer high throughput on data-parallel workloads, and they are programmable, easily available and cheap • Change in Computing Paradigm • Traditional superscalar architectures have reached their limits for intensive workloads • Parallel computing is becoming commonplace, but it cannot be leveraged automatically
Desktop “Super”computing: Linpack benchmark • CPU server: 2x Intel Xeon X5550, 2.66 GHz, 48 GB memory, $7K, 0.55 kW • CPU-GPU server: 2x Tesla C2050 + 2x Intel Xeon X5550, 48 GB memory, $11K, 1 kW • Source: http://www.vpac.org/files/GPU-Slides/01.tesla_introduction.pdf • 37 TFlop Top 150 System
The new platform offers high performance per dollar and low power usage per unit of performance, BUT… programming for parallelism is not easy. CCDP 2011, Mysore Park Workshop
Why is Parallel Programming Difficult? Parallel programming needs an entirely different way of thinking. For example, calculating the value of π: • Sequential approach: start with 1, add -1/3, add +1/5, …, add (-1)^n/(2n+1); then $\pi \approx 4 \sum_{n=0}^{N} \frac{(-1)^n}{2n+1}$ • Parallel approach: generate a large number of random points (x, y) with x and y in (-1, +1); check which points fall within the unit circle and count them; then $\pi \approx 4 \times \frac{\text{number of points within circle}}{\text{total number of points}}$ • D. Patterson, “The Trouble with Multi-Core”, IEEE Spectrum 2010
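As a rough illustration of the two formulations above, the C sketch below computes both estimates of π. The function names, the chunking scheme and the use of the splitmix64 generator are assumptions made for this sketch, not part of the original slides; the "parallel" version runs its chunks one after another and only shows that they are independent of each other.

```c
#include <assert.h>
#include <stdint.h>

/* Sequential approach: 4 * (1 - 1/3 + 1/5 - ... + (-1)^k/(2k+1)),
   accumulated term by term in a single loop. */
double leibniz_pi(long terms) {
    assert(terms > 0);
    double sum = 0.0;
    double sign = 1.0;
    for (long k = 0; k < terms; k++) {
        sum += sign / (2.0 * (double)k + 1.0);
        sign = -sign;
    }
    return 4.0 * sum;
}

/* splitmix64 generator (assumed choice); each chunk of work owns its own
   state, so chunks never share random-number state. */
static uint64_t splitmix64(uint64_t *state) {
    uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* Uniform double in [-1, 1), built from the top 53 bits of a 64-bit draw. */
static double uniform_pm1(uint64_t *state) {
    double u = (double)(splitmix64(state) >> 11) * (1.0 / 9007199254740992.0);
    return 2.0 * u - 1.0;
}

/* Parallel approach, one chunk (hypothetical helper): count random points
   of the square that land inside the unit circle. */
long count_in_circle(uint64_t seed, long points) {
    uint64_t state = seed;
    long inside = 0;
    for (long p = 0; p < points; p++) {
        double x = uniform_pm1(&state);
        double y = uniform_pm1(&state);
        if (x * x + y * y < 1.0)
            inside++;
    }
    return inside;
}

/* Combine the chunks with a single reduction: pi ~ 4 * inside / total.
   The chunks run sequentially here; because they share nothing, a parallel
   version could hand each chunk to its own thread. */
double monte_carlo_pi(int chunks, long points_per_chunk) {
    assert(chunks > 0 && points_per_chunk > 0);
    long inside = 0;
    for (int c = 0; c < chunks; c++)
        inside += count_in_circle((uint64_t)c, points_per_chunk);
    return 4.0 * (double)inside / ((double)chunks * (double)points_per_chunk);
}
```

For example, leibniz_pi(1000000) and monte_carlo_pi(8, 250000) both return values close to 3.14. The Monte Carlo estimate converges far more slowly (its error shrinks roughly as 1/√N), which is the price paid for work items that need no coordination until the final sum.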
HPC – Crossing the Chasm • New infrastructure: more and more raw compute power (GPU/many-core/cloud) • Business/scientific computation: ever-increasing demand • New design challenges • Architecture-aware design • GPU memory hierarchy, thread model • Elastic infrastructure • Data-driven computation (functional programming paradigm) • Software engineering support • Design assistance • Programming assistance • Verification, validation • Transformation, refactoring • Building a parallel algorithm is 5 to 10 times harder than building a sequential one • Existing applications are not meant for parallel infrastructure
Why is a Parallel Workbench Important? Challenges/Research Questions: • How do I refactor my application to exploit multiple cores on the CPU and GPU? • How do I simplify the design and implementation of parallel applications? • How do I optimally use the computing power: optimal usage of threads, optimal usage of memory, optimal usage of clusters? • Faster to build • Faster to refactor • Helps hide architectural complexity • Better portability • Better code • Better usage of hardware resources
Source Code Parallelization Assistant
Approach to Loop Analysis • Express the loop conditions as affine constraints (i.e., linear expressions plus constants) • Together these constraints form a polytope. For example, the nest for(i=0;i<n;i++){ for(j=0; j<i; j++){ S; } } gives the constraints 0 ≤ i ≤ n-1 and 0 ≤ j ≤ i-1. Each integer point of the polytope is a combination of loop-variable values for which the loop conditions are satisfied, i.e., one execution of S. An integral polytope has an associated Ehrhart polynomial, which encodes the relationship between the volume of the polytope and the number of integer points it contains (its leading coefficient is the volume). Use a polytope solver to approximate the total number of iterations. Barvinok, A. I. (2006). Computing the Ehrhart quasi-polynomial of a rational simplex. Math. Comp. 75, 1449–1466
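The loop-counting idea can be checked on the small nest above with a brute-force sketch: it writes the loop bounds as four affine inequalities, enumerates the integer points of the resulting polytope, and compares the count with a direct run of the loop and with the closed form n(n-1)/2. The Constraint type and all function names are hypothetical; a polytope library such as Barvinok obtains the count symbolically instead of by enumeration.

```c
#include <assert.h>

/* One affine constraint a_i*i + a_j*j <= b over the loop variables (i, j).
   Hypothetical representation for this sketch, not Barvinok's API. */
typedef struct {
    long a_i, a_j, b;
} Constraint;

/* Constraints of  for (i = 0; i < n; i++) for (j = 0; j < i; j++) S;
     i >= 0  ->  -i     <= 0
     i <  n  ->   i     <= n - 1
     j >= 0  ->  -j     <= 0
     j <  i  ->   j - i <= -1                                          */
static void triangular_nest_constraints(long n, Constraint c[4]) {
    c[0] = (Constraint){-1, 0, 0};
    c[1] = (Constraint){1, 0, n - 1};
    c[2] = (Constraint){0, -1, 0};
    c[3] = (Constraint){-1, 1, -1};
}

/* Count integer points of the polytope by scanning the bounding box
   [0, n-1] x [0, n-1] and testing every constraint. */
long count_polytope_points(long n) {
    Constraint c[4];
    triangular_nest_constraints(n, c);
    long count = 0;
    for (long i = 0; i < n; i++) {
        for (long j = 0; j < n; j++) {
            int inside = 1;
            for (int k = 0; k < 4; k++) {
                if (c[k].a_i * i + c[k].a_j * j > c[k].b) {
                    inside = 0;
                    break;
                }
            }
            count += inside;
        }
    }
    return count;
}

/* Execute the loop nest itself and count how often S would run. */
long count_loop_iterations(long n) {
    long count = 0;
    for (long i = 0; i < n; i++)
        for (long j = 0; j < i; j++)
            count++;
    return count;
}

/* Closed form for this nest: sum_{i=0}^{n-1} i = n(n-1)/2. */
long triangular_count(long n) {
    assert(n >= 0);
    return n * (n - 1) / 2;
}
```

All three functions agree for this nest; only the closed form avoids touching every iteration, which is what makes a symbolic polytope count useful on large or parametric loop bounds.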
Volume Computation • Volume computation is performed by Barvinok, an open-source polytope library. • Given a polytope represented by a set of affine inequalities, we can determine its volume by subdividing it into simplexes. • Simplexes are a generalization of the triangle to N dimensions; the volume of a simplex with vertices v_0, …, v_N is easily computed with linear algebra as $\frac{1}{N!}\left|\det(v_1 - v_0, \ldots, v_N - v_0)\right|$. • The final result is obtained by summing together the number of points inside all the simplexes.
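A minimal sketch of the simplex-volume step, assuming vertices are given as coordinate rows: the volume of a d-simplex is |det(v1 - v0, …, vd - v0)| / d!, with the determinant computed here by Gaussian elimination with partial pivoting. MAX_DIM, determinant and simplex_volume are illustrative names chosen for this sketch, not Barvinok functions.

```c
#include <assert.h>
#include <math.h>

#define MAX_DIM 8 /* hypothetical dimension limit for this sketch */

/* Determinant of a d x d matrix by Gaussian elimination with partial
   pivoting; the matrix is overwritten. */
static double determinant(int d, double m[MAX_DIM][MAX_DIM]) {
    double det = 1.0;
    for (int col = 0; col < d; col++) {
        int pivot = col;
        for (int r = col + 1; r < d; r++)
            if (fabs(m[r][col]) > fabs(m[pivot][col]))
                pivot = r;
        if (m[pivot][col] == 0.0)
            return 0.0; /* degenerate (flat) simplex */
        if (pivot != col) {
            for (int c = 0; c < d; c++) {
                double t = m[col][c];
                m[col][c] = m[pivot][c];
                m[pivot][c] = t;
            }
            det = -det; /* a row swap flips the sign */
        }
        det *= m[col][col];
        for (int r = col + 1; r < d; r++) {
            double f = m[r][col] / m[col][col];
            for (int c = col; c < d; c++)
                m[r][c] -= f * m[col][c];
        }
    }
    return det;
}

/* Volume of the d-simplex with vertices v[0..d]:
   |det(v1 - v0, ..., vd - v0)| / d! */
double simplex_volume(int d, double v[][MAX_DIM]) {
    assert(d >= 1 && d <= MAX_DIM);
    double m[MAX_DIM][MAX_DIM];
    for (int r = 0; r < d; r++)
        for (int c = 0; c < d; c++)
            m[r][c] = v[r + 1][c] - v[0][c];
    double fact = 1.0;
    for (int k = 2; k <= d; k++)
        fact *= k;
    return fabs(determinant(d, m)) / fact;
}
```

For instance, the triangle with vertices (0,0), (10,0), (10,10), a continuous approximation of the iteration space of the loop nest on the previous slide with n = 10, has volume 50 (n²/2), while the nest runs 45 times (n(n-1)/2): the two agree in their leading term, as the Ehrhart relation suggests. A larger polytope is handled by splitting it into simplexes and summing their volumes, e.g. a square as two triangles.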
Example Code – dcraw.c
Barvinok Equation
Future Work • Enabling developers to supply domain-specific knowledge • Devising usable parameters • Use of source code annotations • Program slicing to enable quicker analysis • Loop iteration dependency analysis • Generation of GPU-specific code for the identified parts