
Dilemma of Parallel Programming



Presentation Transcript


  1. Dilemma of Parallel Programming Xinhua Lin (林新华) HPC Lab of SJTU @XJTU, 17th Oct 2011

  2. Disclaimers • I am not funded by CRAY • Slides marked with the Chapel logo are taken from Brad Chamberlain’s talk ‘The Mother of All Chapel Talks’, with his permission • Funny pictures are from the Internet

  3. About Me and the HPC Lab at SJTU • Directing the HPC Lab • Co-translator of PPP • Co-founder of the HMPP CoC for AP & Japan • One of the MS HPC invitation institutes @SH • Support for the HPC Center of SJTU • Hold a monthly SJTU HPC Seminar http://itis.grid.sjtu.edu.cn/blog

  4. Three Challenges for ParaProg in the Multi/Many-Core Era • Revolution vs. Evolution • Low level vs. High level • Performance vs. Programmability • Performance vs. Performance Portability For more details: Paper version: <中国教育网络>, special issue on HPC and Cloud, Sep 2011 Online version: http://itis.grid.sjtu.edu.cn/blog

  5. Outline • Right level to expose parallelism • ParaProg languages review • Multiresolution and Chapel

  6. Right Level to Expose Parallelism

  7. Can we stop water/parallel? [Figure: a layered stack, top to bottom: Language, Library, OS, ISA, Hardware]

  8. Performance vs. Programmability [Figure: a spectrum from high level to low level. Higher-level abstractions (ZPL, HPF) sit far above the target machine; OpenMP exposes some implementing mechanisms; MPI and pthreads expose the target machine directly. The high-level user asks “Why don’t I have more control?”; the low-level user asks “Why is everything so tedious?”]

  9. ParaProg Education • Tired of teaching yet another specific lang. • MPI for clusters • OpenMP for SMP, then multi-core CPUs • CUDA for GPUs, and now OpenCL • More on the way… • Have to explain the same concepts with different tools • A single lang. to explain them all? • Similar situation in OS education • Production OSes: Linux, Unix and Windows • An OS built only for education: Minix

  10. ParaProg Languages Review

  11. Hybrid Programming Model • MPI is insufficient in the multi/many-core era • OpenMP for multi-core • CUDA/OpenCL for many-core* • So-called hybrid programming was invented as a temporary solution: workable but ugly • MPI+OpenMP for multi-core clusters • MPI+CUDA/OpenCL for GPU clusters like Tianhe-1A • A similar idea is used in CUDA for threads and thread-blocks, and in OpenCL for work-items and work-groups * We will wait and see how OpenMP works on Intel MIC

  12. ParaProg from Different Angles • Low level (exposes implementation mechanisms) • MPI, CUDA and OpenCL • OpenMP • High level • PGAS: CAF, UPC and Titanium • Global view: NESL, ZPL • APGAS: Chapel, X10 • Directive based • HMPP, PGI, CRAY directives

  13. Multiresolution and Chapel

  14. What is Multiresolution? Structure the language in a layered manner, permitting it to be used at multiple levels as required/desired • support high-level features and automation for convenience • provide the ability to drop down to lower, more manual levels • use appropriate separation of concerns to keep these layers clean [Figure: language concepts as a stack, top to bottom: Distributions, Data Parallelism, Task Parallelism, Locality Control, Base Language, Target Machine]
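As an illustration (not from the slides), the multiresolution idea can be sketched in Chapel itself: the same program can stay at the high, data-parallel level or drop down to explicit tasks and locales. The procedure names below are hypothetical, and the code uses the 2011-era `def` syntax that appears later in the talk.

```chapel
// Hypothetical sketch of Chapel's multiresolution design
// (illustrative only; not taken from the original slides).

// High level: a data-parallel forall; how iterations map to
// tasks and locales is left to the compiler and runtime.
def highLevel() {
  var A: [1..1000] real;
  forall i in 1..1000 do
    A(i) = i * 2.0;
}

// Lower level: explicit task creation and locality control;
// the programmer places one task on each locale by hand.
def lowLevel() {
  coforall loc in Locales do
    on loc do
      writeln("task running on locale ", loc.id);
}
```

The point of the layering is that both styles live in one language, so a programmer can mix them instead of switching tools.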

  15. Where Chapel Was Born: HPCS HPCS: High Productivity Computing Systems (DARPA et al.) • Goal: raise the productivity of high-end computing users by 10× • Productivity = Performance + Programmability + Portability + Robustness • Phase II: Cray, IBM, Sun (July 2003 – June 2006) • Evaluated the entire system architecture’s impact on productivity… • processors, memory, network, I/O, OS, runtime, compilers, tools, … • …and new languages: Cray: Chapel; IBM: X10; Sun: Fortress • Phase III: Cray, IBM (July 2006 – 2010) • Implement the systems and technologies resulting from Phase II • (Sun also continues work on Fortress, without HPCS funding)

  16. Global-view vs. Fragmented Problem: “Apply 3-pt stencil to vector” [Figure: in the global-view model the whole vector is updated at once as ( + )/2 = ; in the fragmented model the vector is split into per-processor chunks, each applying ( + )/2 to its own piece and handling the boundary elements separately]

  17. Global-View vs. SPMD Code

  SPMD:
  def main() {
    var n: int = 1000;
    var locN: int = n/numProcs;
    var a, b: [0..locN+1] real;
    if (iHaveRightNeighbor) {
      send(right, a(locN));
      recv(right, a(locN+1));
    }
    if (iHaveLeftNeighbor) {
      send(left, a(1));
      recv(left, a(0));
    }
    forall i in 1..locN {
      b(i) = (a(i-1) + a(i+1))/2;
    }
  }

  Global-View:
  def main() {
    var n: int = 1000;
    var a, b: [1..n] real;
    forall i in 2..n-1 {
      b(i) = (a(i-1) + a(i+1))/2;
    }
  }

  18. Chapel Overview • A design principle for HPC: “Support the general case, optimize for the common case” • Data parallel (ZPL) + task parallel (CRAY MTA) + scripting lang. • The latest version, 1.3.0, is available as OSS: http://sourceforge.net/projects/chapel [Figure: the same multiresolution stack of language concepts: Distributions, Data Parallelism, Task Parallelism, Locality Control, Base Language, Target Machine]

  19. Chapel Example: Heat Transfer [Figure: an n×n grid A; repeat until the maximum change < ε; each interior point is replaced by the average of its four neighbors (sum / 4); boundary values held at 1.0]

  20. Chapel Code For Heat Transfer
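The code on this slide was an image and did not survive the transcript. Below is a sketch of the well-known Chapel heat-transfer (Jacobi) example from Chamberlain's Chapel talks, reconstructed from memory; it may differ from the exact code on the slide, and it uses Chapel 1.3-era syntax (square-bracket domain literals).

```chapel
// Sketch of the classic Chapel heat-transfer (Jacobi) example;
// reconstructed, not the literal slide contents.
config const n = 6,               // interior points per dimension
             epsilon = 1.0e-5;    // convergence threshold

const BigD = [0..n+1, 0..n+1],    // grid including the boundary
      D    = [1..n, 1..n];        // interior points only

var A, Temp: [BigD] real;
A[n+1, 1..n] = 1.0;               // fixed boundary condition

var delta = epsilon + 1.0;
while (delta > epsilon) {
  // each interior point becomes the average of its 4 neighbors
  forall (i,j) in D do
    Temp(i,j) = (A(i-1,j) + A(i+1,j) + A(i,j-1) + A(i,j+1)) / 4;
  delta = max reduce abs(A[D] - Temp[D]);
  A[D] = Temp[D];
}
```

Note how the global-view style from slide 17 carries over: there is no per-processor decomposition or boundary exchange in the source; distribution is left to the language.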

  21. Chapel as the Minix of ParaProg • If I were to offer a ParaProg class, I’d want to teach about: • data parallelism • task parallelism • concurrency • synchronization • locality/affinity • deadlock, livelock, and other pitfalls • performance tuning • …
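To give a flavor of how several of these topics (task parallelism, synchronization) can be shown in one short program in a single language, here is a hypothetical classroom sketch in Chapel, not taken from the slides:

```chapel
// Hypothetical classroom example (not from the slides): four
// tasks updating a shared counter through a sync variable.
var total: sync int = 0;   // sync var: full/empty semantics

coforall tid in 1..4 {     // task parallelism: spawn four tasks
  const t = total;         // reading empties the variable (acquire)
  total = t + tid;         // writing fills it again (release)
}
writeln(total.readFF());   // read-and-leave-full; prints 1+2+3+4 = 10
```

The same program touches tasking, mutual exclusion, and a producer/consumer-style primitive, which is the talk's argument for a single teaching language over one tool per concept.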

  22. Conclusion: Major Points • Programmability and performance are the perennial dilemma of ParaProg • Multiresolution sounds perfect in theory but is not yet mature enough for production • However, Chapel could serve as the Minix of ParaProg

  23. Q&A
