
Parallel computing


Presentation Transcript


  1. Parallel computing Petr Štětka Jakub Vlášek Department of Applied Electronics and Telecommunications, Faculty of Electrical Engineering, University of West Bohemia, Czech Republic

  2. About the project • Laboratory of Information Technology of JINR • Project supervisors • Sergey Mitsyn, Alexander Ayriyan • Topics • Grids – gLite • MPI • NVIDIA CUDA

  3. Grids – introduction

  4. Grids II • Loose federation of shared resources • More efficient usage • Security • Grid provides • Computational resources (Computing Elements) • Storage resources (Storage Elements) • Resource broker (Workload Management System)

  5. gLite framework • Middleware • EGEE (Enabling Grids for E-sciencE) • User Management (security) • Users, Groups, Sites • Certificate based • Data Management • Replication • Workload Management • Matching requirements against resources

  6. gLite – User management • Users • Each user needs a certificate • Accepts the AUP (Acceptable Use Policy) • Membership in a Virtual Organization • Proxy certificates • Applications use them on the user's behalf • Proxy certificate initialization: voms-proxy-init --voms edu

  7. gLite – jobs • Write the job in the Job Description Language (JDL) • Submit job: glite-wms-job-submit -a myjob.jdl • Check status: glite-wms-job-status <job_id> • Retrieve output: glite-wms-job-output <job_id>
Executable = "myapp";
StdOutput = "output.txt";
StdError = "stderr.txt";
InputSandbox = {"myapp", "input.txt"};
OutputSandbox = {"output.txt", "stderr.txt"};
Requirements = …

  8. Algorithmic parallelization • Embarrassingly parallel • A set of independent data • Hard to parallelize • Interdependent data; performance depends on the interconnect • Amdahl's law – example • A program takes 100 hours • A particular portion of 5 hours cannot be parallelized • The remaining portion of 95 hours (95 %) can be parallelized • => Execution cannot be shorter than 5 hours, no matter how many resources we allocate • Speedup is limited to at most 20×
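The 20× bound follows from the standard form of Amdahl's law (the formula itself is not spelled out on the slide): with parallelizable fraction p and n processors,

\[
S(n) = \frac{1}{(1 - p) + \frac{p}{n}}, \qquad p = 0.95 \;\Rightarrow\; \lim_{n \to \infty} S(n) = \frac{1}{1 - 0.95} = 20.
\]

The serial 5 hours always remain, so even with unlimited resources the run cannot finish in less than 5 hours.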

  9. Message Passing Interface • API (Application Programming Interface) • De facto standard for parallel programming • Multi processor systems • Clusters • Supercomputers • Abstracts away the complexity of writing parallel programs • Available bindings • C • C++ • Fortran • Python • Java

  10. Message Passing Interface II • Process communication • Master slave model • Broadcast • Point to point • Blocking or non-blocking • Process communication topology • Cartesian • Graph • Requires specification of data type • Provides interface to shared file system • Every process has a “view” of a file • Locking primitives
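A minimal sketch (not taken from the slides) of the two communication patterns listed above: the master broadcasts an input parameter, then each worker returns its result with a blocking point-to-point send. The rank-0 "master" role and the dummy work are illustrative.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double step = 0.0;
    if (rank == 0) step = 0.001;                 /* master owns the input parameter */
    MPI_Bcast(&step, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);   /* broadcast to every rank */

    double partial = rank * step;                /* stand-in for the real computation */
    if (rank != 0) {
        /* blocking point-to-point send of the worker's result to the master */
        MPI_Send(&partial, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    } else {
        double sum = partial, recv;
        for (int src = 1; src < size; ++src) {   /* master collects one value per worker */
            MPI_Recv(&recv, 1, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            sum += recv;
        }
        printf("sum = %f\n", sum);
    }

    MPI_Finalize();
    return 0;
}

Non-blocking variants (MPI_Isend / MPI_Irecv) follow the same pattern but return immediately and are completed later with MPI_Wait.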

  11. MPI – Test program
Someone@vps101:~/mpi# mpirun -np 4 ./mex 1 200 10000000
Partial integration ( 2 of 4) (from 1.000000000000e+00 to 1.000000000000e+02 in 2500000 steps) = 1.061737467015e+01
Partial integration ( 3 of 4) (from 2.575000000000e+01 to 1.000000000000e+02 in 2500000 steps) = 2.439332078942e-15
Partial integration ( 1 of 4) (from 7.525000000000e+01 to 1.000000000000e+02 in 2500000 steps) = 0.000000000000e+00
Partial integration ( 4 of 4) (from 5.050000000000e+01 to 1.000000000000e+02 in 2500000 steps) = 0.000000000000e+00
Numerical Integration result: 1.061737467015e+01 in 0.79086 seconds
Numerical integration by one process: 1.061737467015e+01
• Numerical integration – rectangle method, top-left • Input parameters: beginning, end, number of steps • The integrated function is compiled into the program • gLite script • Runs on the grid
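The source of ./mex is not included in the slides; the sketch below shows one way such a rectangle-rule integration could be split across MPI processes. The integrand f and all names are illustrative, and the final sum is collected with MPI_Reduce rather than explicit sends.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double f(double x) { return 100.0 * exp(-x); }   /* placeholder integrand */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double a = atof(argv[1]), b = atof(argv[2]);   /* beginning, end             */
    long   n = atol(argv[3]);                      /* total number of steps      */
    long   n_local = n / size;                     /* steps handled per process  */
    double h = (b - a) / (double)n;                /* width of one rectangle     */
    double a_local = a + rank * n_local * h;       /* start of this sub-interval */

    double partial = 0.0;
    for (long i = 0; i < n_local; ++i)             /* top-left rectangle rule    */
        partial += f(a_local + i * h) * h;

    double total = 0.0;
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("Numerical Integration result: %.12e\n", total);

    MPI_Finalize();
    return 0;
}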

  12. Test program evaluation – 4 core CPU

  13. CUDA • Programmed in the C++ language • Gridable (CUDA jobs can also be submitted to the grid) • GPGPU (general-purpose computing on GPUs) • Parallel architecture • Proprietary NVIDIA technology • Supported on GeForce 8000 series and newer • Single- and double-precision floating point • PFLOPS range (Tesla)

  14. CUDA II • An enormous part of the GPU die is dedicated to execution units, unlike the CPU • Blocks × threads per block gives the total number of threads that will be processed by the kernel
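A minimal CUDA sketch (not from the slides) of that launch arithmetic: the kernel is launched as <<<blocks, threadsPerBlock>>>, so blocks * threadsPerBlock threads run, and each one derives its own global index. The kernel name and sizes are illustrative.

#include <cuda_runtime.h>
#include <cstdio>

__global__ void fill(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n)                                       // guard the last, partially used block
        out[i] = 2.0f * i;
}

int main()
{
    const int n = 1 << 20;
    float *d_out;
    cudaMalloc((void **)&d_out, n * sizeof(float));

    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // ceil(n / threadsPerBlock)
    fill<<<blocks, threadsPerBlock>>>(d_out, n);               // blocks * threadsPerBlock threads total
    cudaDeviceSynchronize();

    cudaFree(d_out);
    return 0;
}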

  15. CUDA Test program
CUDA CLI output:
Integration (CUDA) = 10.621515274048 in 1297.801025 ms (SINGLE)
Integration (CUDA) = 10.617374518106 in 1679.833374 ms (DOUBLE)
Integration (CUDA) = 10.617374518106 in 1501.769043 ms (DOUBLE, GLOBAL)
Integration (CPU) = 10.564660072327 in 30408.316406 ms (SINGLE)
Integration (CPU) = 10.617374670093 in 30827.710938 ms (DOUBLE)
Press any key to continue . . .
• Numerical integration – rectangle method, top-left • Ported version of the MPI test program • 23× faster on a notebook NVIDIA NVS 4200M than one core of a Sandy Bridge i5 CPU @ 2.5 GHz • 160× faster on a desktop GeForce GTX 480 than one core of an AMD 1055T CPU @ 2.7 GHz
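The ported kernel itself is not shown on the slides; the sketch below illustrates one common way to structure it, with each thread accumulating a strided subset of the rectangles into a per-thread partial sum that the host then reduces. The integrand f, the limits, and all names are illustrative.

#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <cmath>

__device__ double f(double x) { return 100.0 * exp(-x); }   // placeholder integrand

__global__ void integrate(double a, double h, long n, double *partials)
{
    long i      = blockIdx.x * (long)blockDim.x + threadIdx.x;
    long stride = (long)gridDim.x * blockDim.x;

    double sum = 0.0;
    for (long k = i; k < n; k += stride)     // each thread sums a strided subset
        sum += f(a + k * h) * h;             // of the top-left rectangles
    partials[i] = sum;                       // one partial result per thread
}

int main()
{
    const double a = 1.0, b = 100.0;                     // illustrative limits
    const long   n = 10000000;                           // number of rectangles
    const double h = (b - a) / (double)n;

    const int blocks = 64, threads = 256;
    const int total  = blocks * threads;

    double *d_partials;
    cudaMalloc((void **)&d_partials, total * sizeof(double));

    integrate<<<blocks, threads>>>(a, h, n, d_partials);

    double *partials = (double *)malloc(total * sizeof(double));
    cudaMemcpy(partials, d_partials, total * sizeof(double), cudaMemcpyDeviceToHost);

    double result = 0.0;
    for (int i = 0; i < total; ++i)          // final reduction on the CPU
        result += partials[i];
    printf("Integration (CUDA) = %.12f\n", result);

    free(partials);
    cudaFree(d_partials);
    return 0;
}

A device-side reduction (for example in shared memory) would avoid copying every partial sum back to the host, but the CPU-side sum keeps the sketch short.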

  16. Conclusion • We familiarized ourselves with parallel computing technologies • Grid with the gLite middleware • The MPI API • CUDA technology • Wrote a program for numerical integration • Running on the grid • With MPI support • Also ported to the graphics card using CUDA • It works!

  17. Thank you for your attention

  18. Distributed Computing • CPU scavenging • 1997 distributed.net – RC5 cipher cracking • Proof of concept • 1999 SETI@home • BOINC • Clusters • Cloud computing • Grids • LHC

  19. MPI – functions • Initialization • Data type creation • Data exchange between processes
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &procnum);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Type_contiguous(int count, MPI_Datatype oldtype, MPI_Datatype *newtype)
MPI_Type_commit(MPI_Datatype *datatype)
MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype, void *recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)
MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
…
MPI_Finalize();
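A minimal sketch (not from the slides) that ties the calls listed above together: each rank commits a contiguous derived datatype of three doubles and the root gathers one such element from every process. The variable names mirror those in the prototypes.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int procnum, numprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &procnum);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

    MPI_Datatype triple;                          /* derived type: 3 contiguous doubles */
    MPI_Type_contiguous(3, MPI_DOUBLE, &triple);
    MPI_Type_commit(&triple);

    double mine[3] = { 1.0 * procnum, 2.0 * procnum, 3.0 * procnum };
    double *all = NULL;
    if (procnum == 0)                             /* only the root needs a receive buffer */
        all = (double *)malloc(3 * numprocs * sizeof(double));

    /* every rank contributes one "triple"; the root receives one per rank */
    MPI_Gather(mine, 1, triple, all, 1, triple, 0, MPI_COMM_WORLD);

    if (procnum == 0) {
        for (int r = 0; r < numprocs; ++r)
            printf("rank %d sent %.1f %.1f %.1f\n", r, all[3*r], all[3*r+1], all[3*r+2]);
        free(all);
    }

    MPI_Type_free(&triple);
    MPI_Finalize();
    return 0;
}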
