
GAMMA: An Efficient Distributed Shared Memory Toolbox for MATLAB

Rajkiran Panuganti (1), Muthu Baskaran (1), Jarek Nieplocha (2), Ashok Krishnamurthy (3), Atanas Rountev (1), P. Sadayappan (1). (1) The Ohio State University; (2) PNNL; (3) Ohio Supercomputer Center.


Presentation Transcript


  1. GAMMA: An Efficient Distributed Shared Memory Toolbox for MATLAB • Rajkiran Panuganti (1), Muthu Baskaran (1), Jarek Nieplocha (2), Ashok Krishnamurthy (3), Atanas Rountev (1), P. Sadayappan (1) • (1) The Ohio State University • (2) PNNL • (3) Ohio Supercomputer Center

  2. Overview • Motivation • GAMMA Programming Model • Implementation Overview • Experimental Evaluation • Conclusions

  3. High Productivity Computing • Programmers’ productivity is extremely important • C/Fortran – Good performance but poor productivity • Parallel Programming in C/Fortran even harder • MATLAB, Python etc. – Good programmer productivity • Poor performance and inability to run large scale problems (memory limitations)

  4. MATLAB and High Productivity • Numerous features resulting in high programmer productivity: • Array-based semantics • Copy/value-based semantics • Debugging and profiling support • Integrated development environment • Numerous domain-specific libraries (Toolboxes) • Visualization • And a lot more... • Need to retain the above features while addressing performance issues

  5. Problem • Out-of-memory errors • Performance [Figure labels: "Out-Of-Memory!" (twice), "Performance!", and timings of 199 sec and 10.19 s]

  6. ParaM: ‘Parallel MATLAB’ [Diagram labels: User; DParaM; GAMMA; Specialized Libraries; mexMPI; Library Writers; Compiler; MATLAB; GA + MVAPICH]

  7. Overview • Motivation • GAMMA Programming Model • Implementation Overview • Experimental Evaluation • Conclusions

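  8. Programming Model • Global shared view of the distributed array [Figure: physical view and logical view of a 1024 x 1024 array spread over processes P0, P1, P2, P3, with corners (1,1) and (1024,1024) and the section from (250,75) to (700,610) marked] • A = GA([1024, 1024], distr); • Block = A(250:700, 75:610);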
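The slide's section request can be illustrated with a small sketch of how a global index range breaks into per-process pieces. This is written in Python rather than MATLAB and is not the GAMMA API; the function name `block_owners`, the 2 x 2 process grid, and the 512 x 512 block size are assumptions made only for illustration, since the transcript does not say how `distr` is defined on this slide.

```python
def block_owners(dims, block, lo, hi):
    """Split the 1-based inclusive section lo..hi into per-process pieces.

    dims  : (rows, cols) of the global array
    block : (rows, cols) of each process's block (assumed regular block layout)
    Returns {(proc_row, proc_col): ((r0, r1), (c0, c1))} in global indices.
    """
    pieces = {}
    grid_rows = -(-dims[0] // block[0])  # ceiling division
    grid_cols = -(-dims[1] // block[1])
    for pr in range(grid_rows):
        # Intersect the requested rows with the rows held by process row pr.
        r0 = max(lo[0], pr * block[0] + 1)
        r1 = min(hi[0], (pr + 1) * block[0], dims[0])
        if r0 > r1:
            continue
        for pc in range(grid_cols):
            # Same intersection along the column dimension.
            c0 = max(lo[1], pc * block[1] + 1)
            c1 = min(hi[1], (pc + 1) * block[1], dims[1])
            if c0 <= c1:
                pieces[(pr, pc)] = ((r0, r1), (c0, c1))
    return pieces


# Section A(250:700, 75:610) of a 1024 x 1024 array, assumed 512 x 512 blocks.
pieces = block_owners((1024, 1024), (512, 512), (250, 75), (700, 610))
for owner, ((r0, r1), (c0, c1)) in sorted(pieces.items()):
    print(owner, f"rows {r0}:{r1}, cols {c0}:{c1}")
```

Under these assumed block sizes the section touches all four processes, which is the situation a global shared view is meant to hide from the programmer.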

  9. Programming Model (Contd..) • Get-Compute-Put computation model [Diagram: Process 0 and Process 1 each perform Get(), Compute, Put()]
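The Get-Compute-Put pattern can be sketched with a single-process stand-in for a distributed array. Everything here is hypothetical: `ToyGlobalArray`, `worker`, the row-band ownership, and the sequential loop that simulates two processes are illustrative assumptions in Python, not GAMMA calls.

```python
class ToyGlobalArray:
    """Single-process stand-in for a distributed array (not the GAMMA API)."""

    def __init__(self, rows, cols):
        self._data = [[0.0] * cols for _ in range(rows)]

    def get(self, r0, r1, c0, c1):
        # Return a private copy of rows r0..r1-1, cols c0..c1-1 (0-based, half-open).
        return [row[c0:c1] for row in self._data[r0:r1]]

    def put(self, r0, c0, block):
        # Write a local block back into the shared array starting at (r0, c0).
        for i, row in enumerate(block):
            self._data[r0 + i][c0:c0 + len(row)] = row


def worker(ga, rank, nprocs, rows, cols):
    # Each simulated process works on a contiguous band of rows.
    band = rows // nprocs
    r0 = rank * band
    r1 = rows if rank == nprocs - 1 else r0 + band
    local = ga.get(r0, r1, 0, cols)                        # Get
    local = [[rank + 1.0 for _ in row] for row in local]   # Compute (on the copy)
    ga.put(r0, 0, local)                                   # Put


ga = ToyGlobalArray(4, 3)
for rank in range(2):  # run the two "processes" one after the other
    worker(ga, rank, 2, 4, 3)
print(ga.get(0, 4, 0, 3))
```

The point of the sketch is that computation happens only on local copies; the shared array changes only at the explicit put step.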

  10. Other features in the Programming Model enabling Efficiency • Pass-by-reference semantics for distributed arrays • Intended for Library writers • Management of Data Locality (NUMA) • Distribution information can be retrieved by the programmer • Reference based access to the local data • Data replication • Support for replicating near-neighbor data

  11. Other features in the Programming Model enabling Efficiency Contd.. • Asynchronous operations • Support for Library Writers • Interoperable with ‘Message Passing’ • Message Passing support using ‘mexMPI’ • Interoperable with some other ‘Parallel MATLAB’ projects • Interoperable with pMATLAB, Mathworks DCT

  12. Illustration by Example (FFT2) – 2D FFT

      [rank, nprocs] = Begin();
      dims = [N N];
      distr = [N N/nprocs];
      A = GA(dims, distr);
      tmp = local(A);          % GET()
      tmp = fft(tmp);          % Compute()
      Put(A, tmp);             % PUT()
      Sync();
      ATmp = GA(A);
      Transpose(A, ATmp);      % Collective Ops
      Tmp = local(ATmp);
      Put(ATmp, fft(Tmp));
      Sync();
      Transpose(ATmp, A);
      GA_End();

      [Figure label: Transpose]
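The FFT2 example relies on the fact that a 2D transform can be computed as 1D transforms along one dimension, a transpose, 1D transforms again, and a final transpose. The following Python sketch checks that identity on a small matrix with a naive DFT; it is a serial illustration under stated assumptions, not the GAMMA code. The helper names are hypothetical, and it transforms rows, whereas MATLAB's `fft` on a matrix transforms columns.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a 1D sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def fft2_by_transpose(a):
    rows_done = [dft(row) for row in a]    # 1D transform of each row
    t = transpose(rows_done)               # old columns become rows
    cols_done = [dft(row) for row in t]    # transform along the old columns
    return transpose(cols_done)            # restore the original layout


def dft2_direct(a):
    """Reference 2D DFT evaluated straight from the double sum."""
    n, m = len(a), len(a[0])
    return [[sum(a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / n + v * c / m))
                 for r in range(n) for c in range(m))
             for v in range(m)] for u in range(n)]


a = [[1, 2, 3], [4, 5, 6]]
fast = fft2_by_transpose(a)
ref = dft2_direct(a)
max_err = max(abs(fast[u][v] - ref[u][v]) for u in range(2) for v in range(3))
print(max_err < 1e-9)
```

In the distributed version on the slide, each 1D pass touches only locally held data, so the transposes are the only steps that need communication between processes.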

  13. Software Architecture [Diagram labels: User; MATLAB Front-End; GAMMA; mexMPI; MATLAB Computation Engine; GA; MPI; ScaLAPACK]

  14. Overview • Motivation • GAMMA Programming Model • Implementation Overview • Experimental Evaluation • Conclusions

  15. Evaluation • OSC Pentium 4 Cluster • Two 2.4 GHz Intel P4 processors per node, Linux kernel 2.6.6, 4 GB RAM • MVAPICH 0.9.4 • InfiniBand • MATLAB Version 7.01 • Fully distributed environment • Evaluation using NAS Benchmarks

  16. Programmability [Figure annotations: "Moderate Increase in SLOC" (three times), "Slight Increase in SLOC" (once)]

  17. Performance Analysis

  18. Performance Analysis

  19. Speedup on Large Problem Sizes

  20. Related Work • Early ’90s – MPI & Cluster Programming • 1995 – ‘Why there isn’t a Parallel MATLAB?’ – Cleve Moler • Embarrassingly Parallel • Paralize (’98); Multi (’00); PLab (’00); Parmatlab (’01) • Message Passing • MultiMatlab (’96); PT (’96); DPToolbox (’99); MATmarks (’99); PMI (’99); MPITB/PVMTB (’00); CMTM (’01) • Compilation Based • Conlab (’93); Falcon (’95); ParAL (’95); Otter (’98); Menhir (’98); MaJIC (’98); MATCH (’00); RTExpress (’00) • Backend Support • Matpar (’98); DLab (’99); Netsolve (’01); Paramat (’01)

  21. Related Work (Currently Active) • Star-P (’97) – MIT • MatlabMPI (’98); pMATLAB (’02) – MIT-LL • File-based Message Passing Communication • MATLAB_D (’00) – Rice • Telescoping Compilation + HPF + JIT Compilation • ParaM (’04) – OSU & OSC • MathWorks (’04) – MDCE/MDCT

  22. Conclusions • Discussed an efficient Distributed Shared Memory Toolbox for MATLAB • Programming Model and Efficiency features of the toolbox • Demonstrated efficiency using NAS Benchmarks • Download available upon request

  23. Questions? Contact: panugant@cse.ohio-state.edu

  24. Backup • NAS FT – A • NAS EP – A • Implementation Issues

  25. Performance Analysis Contd…

  26. Implementation Issues • Different Memory managers • Automated Book Keeping • Data layout inconsistencies • In-Place Operations • Data movement between different workspaces • Out-of-order and irregular accesses
