
Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough


  1. Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough

  2. Objectives • Introductory level MATLAB course for people who want to learn parallel and GPU computing in MATLAB. • Help participants determine when to use parallel computing and how to use MATLAB parallel & GPU computing on their local computer & on the Research Computing clusters (Killdevil/Kure)

  3. Logistics • Course Format • Overview of MATLAB topics with Lab Exercises • UNC Research Computing • http://its.unc.edu/research

  4. Agenda • Parallel computing (1hr 10min) • What is it? • Why use it? • How to write MATLAB code in parallel (1hr) • GPU computing (20 min) • What is it & why use it? • How to write MATLAB code for GPU computing (15 min) • How to run MATLAB parallel & GPU code on the UNC cluster (20 min) • Quick introduction to the UNC cluster (Kure) • bsub commands and what they mean • Questions (10 min)

  5. Parallel Computing

  6. What is Parallel Computing? • Generally, computer code is written in serial • 1 task is completed after another until the script is finished, with only 1 task running at any time • Conceptually, the computer has only 1 CPU Source: https://computing.llnl.gov/tutorials/parallel_comp/

  7. What is Parallel Computing? (cont.) • Parallel Computing: Using multiple central processing units (CPUs) to solve a problem at the same time • The compute resources might be: a computer with multiple processors or networked computers Source: https://computing.llnl.gov/tutorials/parallel_comp/

  8. Why use Parallel Computing • Save time & money (commodity components) • Provide concurrency • Solve larger problems • Use non-local resources • UNC compute cluster • SETI: 2.9 million computers • Folding (Stanford): 450,000 CPUs Source: https://computing.llnl.gov/tutorials/parallel_comp/

  9. How to write code in parallel • The computational problem should be able to: • Be broken into discrete parts that can be solved simultaneously and independently • Be solved in less time with multiple compute resources than with a single compute resource.

  10. Parallel Computing in MATLAB

  11. Parallel Computing in MATLAB • MATLAB Parallel Computing Toolbox (available for use at UNC) • Provides up to twelve workers (MATLAB computational engines) to execute applications on a multicore system. • Built-in functions for parallel computing • parfor loop (for running task-parallel algorithms on multiple processors) • spmd (handles large datasets and data-parallel algorithms)

  12. MATLAB Distributed Computing Server • Allows MATLAB to run as many workers on a remote cluster of computers as licensing allows. • OR run more than 12 workers on a local machine. • UNC does not have a license for this product; it is extremely expensive • More information: http://www.mathworks.com/products/distriben/ • Course will not go over this product

  13. Primary Parallel Commands • findResource • matlabpool • open • close • size • parfor (for loop) • spmd (distributed computing for datasets) • batch jobs (run job in background)

  14. findResource • Find available parallel computing resources • out = findResource()

  15. findResource Examples • lsf_sched = findResource('scheduler','type','LSF') • Find the Platform LSF scheduler on the network. • local_sched = findResource('scheduler','type','local') • Create a local scheduler that will start workers on the client machine for running your job. • jm1 = findResource('scheduler','type','jobmanager','Name','ClusterQueue1'); • Find a particular job manager by its name.
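
For readers on a newer MATLAB release, the same kind of resource query can be sketched as follows. This assumes Parallel Computing Toolbox R2012a or later, where parcluster took over the role of findResource. The default cluster profile is used, so no profile name has to be guessed.

```matlab
% Assumption: R2012a or later, where parcluster replaces findResource.
c = parcluster();                                  % cluster object for the default profile
fprintf('Profile: %s\n', c.Profile);               % name of the profile in use
fprintf('Workers available: %d\n', c.NumWorkers);  % upper limit for a pool on this cluster
```

The NumWorkers value is the largest pool this profile will allow, which is the limit behind the "too many workers" error covered a few slides later.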

  16. More Resources for findResource • http://www.mathworks.com/help/toolbox/distcomp/findresource.html

  17. Matlabpool • matlabpool open • Begins a parallel work session • Options for open matlab pool

  18. Matlabpool open • These three examples of matlabpool open each have the same result: a local pool of 4 workers is opened • [Examples 1 to 3 were shown as images on the slide]

  19. Matlabpool • matlabpool(x) • Requests the number of workers you'd like, e.g. matlabpool(4) • matlabpool('size') • Tells you the number of workers in the currently open matlabpool (0 if no pool is open)

  20. Matlabpool • Requesting more workers than the machine allows produces an error • In the slide's example, only 4 workers can be requested on this machine

  21. Matlabpool Close • Use matlabpool close to end parallel session • Options • matlabpool close force • deletes all pool jobs for current user in the cluster specified by default profile (including running jobs) • matlabpool close force <profilename> • deletes all pool jobs run in the specified profile
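
The open, size, and close steps from the last few slides can be put together in one short script. This is a sketch only: nWorkers is a hypothetical value, and the first branch assumes a newer release (R2013b or later) in which parpool and gcp replace matlabpool, while the second branch uses the matlabpool syntax shown in the slides.

```matlab
nWorkers = 2;   % hypothetical pool size; must not exceed what the machine allows
if exist('parpool', 'file')
    % Newer releases (assumed R2013b or later): parpool/gcp replace matlabpool.
    pool = gcp('nocreate');          % reuse an open pool if there is one
    if isempty(pool)
        pool = parpool(nWorkers);    % otherwise start a new pool
    end
    poolSize = pool.NumWorkers;
    delete(pool);                    % end the parallel session
else
    % Release targeted by the slides: matlabpool syntax.
    if matlabpool('size') == 0
        matlabpool(nWorkers);        % open a pool only if none is open
    end
    poolSize = matlabpool('size');
    matlabpool('close');             % end the parallel session
end
fprintf('The pool had %d workers\n', poolSize);
```

Checking for an existing pool first avoids the error raised when a second pool is requested while one is still open.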

  22. Parallel for Loops (parfor) • parfor loops execute for-loop-like code in parallel, which can significantly improve performance • The loop body must consist of discrete parts that can be solved simultaneously and independently (i.e. it can't be serial)

  23. Parfor example • Will work in parallel, because the loop iterations do not depend on each other:
      matlabpool open local 2
      j=zeros(100,1); %pre-allocate vector
      parfor i=2:100 %parfor makes the loop run in parallel
          j(i,1)=5*i;
      end;
      matlabpool close

  24. Serial Loop example • Won't work in parallel, because it is serial:
      j=zeros(100,1); %pre-allocate vector
      j(1)=5;
      for i=2:100
          j(i,1)=j(i-1)+5; %j(i-1) is needed to calculate j(i,1), so the loop is serial
      end;
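
A serial loop can sometimes be restated so that each iteration stands alone. The sketch below is an illustration rather than part of the original slides: it computes the recurrence from the serial example and then the closed form 5*i, which needs no earlier element and can therefore go in a parfor loop. It is written so that it also runs when no pool is open, in which case parfor simply does not run in parallel.

```matlab
n = 100;

% Serial version: each element needs the previous one.
serialJ = zeros(n,1);
serialJ(1) = 5;
for i = 2:n
    serialJ(i) = serialJ(i-1) + 5;
end

% Independent version: j(i) = 5*i uses only the loop index.
parJ = zeros(n,1);
parfor i = 1:n
    parJ(i) = 5*i;
end

sameResult = isequal(serialJ, parJ);   % true when both loops agree
disp(sameResult);
```

Not every recurrence has a closed form like this one; when none exists, the loop has to stay serial.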

  25. Parallel for Loops (parfor) • Cannot nest parfor loops within parfor loops; the following is not allowed:
      parfor i=1:10
          parfor j=1:10
              x(i,j)=1;
          end;
      end;
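
One common way around the nesting restriction is to keep parfor on the outer loop and use an ordinary for loop inside. The sketch below is one such arrangement, not the slide author's code. The temporary variable row is introduced here for illustration, so that each parfor iteration writes one complete row of x.

```matlab
x = zeros(10,10);
parfor i = 1:10
    row = zeros(1,10);      % temporary, private to this iteration
    for j = 1:10            % ordinary for loop inside parfor is allowed
        row(j) = 1;
    end
    x(i,:) = row;           % one sliced assignment per iteration
end
```

Parallelising the outer loop is usually the better choice, since each worker then receives a larger piece of work per iteration.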

  26. Parallel for Loops (parfor) • If a function with multiple outputs assigns directly into indexed variables within a parfor loop, MATLAB will have difficulty figuring out how to run the parfor loop, e.g.
      parfor i=1:10
          [x{i}(:,1), x{i}(:,2)]=functionName(z,w);
      end

  27. Parallel for Loops (parfor) • Use this instead:
      parfor i=1:10
          [x1, x2]=functionName(z,w);
          x{i}=[x1 x2];
      end
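
A runnable version of the same pattern, using the built-in two-output form of max in place of the placeholder functionName. The input cell array data is made up for illustration.

```matlab
data = {[3 1 2], [9 8], [4 7 5 6]};   % hypothetical inputs, one vector per iteration
res  = cell(1, numel(data));          % pre-allocate the output cell array
parfor i = 1:numel(data)
    [m, k] = max(data{i});            % capture both outputs in temporaries
    res{i} = [m k];                   % then store them with a single sliced assignment
end
```

Each res{i} holds the maximum and its position, for example res{3} is [7 2].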

  28. Parallel for Loops (parfor) For parallel computing to be worth your time: the task must be solved in less time with multiple compute resources than with a single compute resource.

  29. Test the efficiency of your parallel code • Use MATLAB's tic & toc functions • tic starts a timer • toc tells you the number of seconds since the tic function was called

  30. Tic & Toc Simple Example
      tic;
      parfor i=1:10
          z(i)=10;
      end;
      toc

  31. Check efficiency of simple parfor loop
      clear; clc;
      matlabpool(4) %open the pool once, before any timing
      k=zeros(8,3); %one row per problem size: [size, parfor time, for time]
      m=1; i=1;
      while i<1e8
          [time1, time2]=testParfor(i);
          k(m,:)=[i time1 time2];
          m=m+1;
          i=i*10;
      end;
      matlabpool close

  32. Check efficiency of simple parfor loop
      function [t1e, t2e]=testParfor(x)
      %uses the pool opened by the calling script, so pool start-up is not timed
      A=ones(x,1).*4;
      B=zeros(x,1);
      t1s=tic;
      parfor i = 1:length(A)
          B(i) = sqrt(A(i));
      end
      t1e=toc(t1s); %parfor time
      B=zeros(x,1);
      t2s=tic;
      for i = 1:length(A)
          B(i) = sqrt(A(i));
      end
      t2e=toc(t2s); %for time

  33. Result of Check Efficiency of parfor • In this test the for loop is much more efficient than the parfor loop; more resources do not necessarily equate to a faster run time!

  34. Parfor Efficiency • Previous example is not an effective use of a parfor loop because it takes more time to evaluate than a for loop. • Data transfer is the issue • Parfor is more effective with long running calculations within the loop • Generally more iterations increase the efficiency of a parfor loop
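
To see the effect of a longer-running loop body, the timing test can be repeated with a deliberately costly calculation in each iteration. The sketch below is illustrative only: the eigenvalue computation on magic squares is an arbitrary stand-in for real work, the sizes are hypothetical, and whether parfor comes out ahead depends on the machine and on whether a pool is open.

```matlab
n = 16;                         % hypothetical number of iterations
serialOut = zeros(n,1);
parOut    = zeros(n,1);

t = tic;
for i = 1:n
    serialOut(i) = max(abs(eig(magic(200 + i))));   % costly body, run serially
end
tSerial = toc(t);

t = tic;
parfor i = 1:n
    parOut(i) = max(abs(eig(magic(200 + i))));      % same body, run with parfor
end
tParfor = toc(t);

fprintf('for: %.3f s, parfor: %.3f s\n', tSerial, tParfor);
```

Compared with the sqrt test, each iteration here does far more computation per byte of data sent to the workers, which is the situation in which parfor is more likely to pay off.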

  35. Lab Exercise with parfor • Lab exercise: • Turn a non-parallel function into a function that can run in parallel • Go through each section of the function and determine whether it can be written in parallel and, if so, how (%% denotes a new section)

  36. Lab Exercise
      function N=calcNeighNp(neighPoly,manzPoly,manzPop93, manzPop05, manzID)
      %Find the manzanas which don't have an associated population in 93, but a population in 05
      j=1;
      for i=1:length(manzPop93)
          if manzPop93(i,1)==0 && manzPop05(i,1)>0
              no93manzPopID(j,1)=manzID(i,1);
              j=j+1;
          end;
      end;
      %%
      matlabpool(x) %start matlabpool
      %parfor can't be used here because it's serial

  37. Lab Exercise
      %parfor can't be used here because it's serial
      %%
      %Calculate the average monthly population change (excluding the data pts with no pop in 1993);
      MonthsC=(2005-1993);
      j=1; count=0; TotalPopC=0;
      for i=1:length(manzPop93)
          if manzID(i,1)~=no93manzPopID(j,1)
              TotalPopC=TotalPopC+((manzPop05(i,1)-manzPop93(i,1))/MonthsC);
              count=count+1;
          else
              j=j+1;
          end;
      end;
      %%

  38. Lab Exercise
      %%
      meanPopChangeM=TotalPopC/count;
      PopChangeMmanz=zeros(length(manzPop05),1);
      %Calculate the monthly population change for all the manzanas
      for i=1:length(manzPop05)
          for j=1:length(no93manzPopID)
              if manzID(i,1)==no93manzPopID(j,1)
                  PopChangeMmanz(i,1)=meanPopChangeM;
                  break;
              else
                  PopChangeMmanz(i,1)=(manzPop05(i,1)-manzPop93(i,1))/MonthsC;
              end;
          end;
      end;
      %%
      %parallel version: replace the outer loop with parfor i=1:length(manzPop05)
      %break must be deleted, not permitted in parfor
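
One possible way to drop the break, sketched on made-up data: the inner search loop is replaced by a single any() test, so each parfor iteration decides independently which value to store. The variable names follow the lab code. The six-element inputs are hypothetical, and the mean is computed directly from them rather than with the lab's earlier serial loop.

```matlab
% Hypothetical toy inputs (the lab's real data are not part of this transcript).
manzID    = (1:6)';
manzPop93 = [0; 10; 20; 0; 40; 50];
manzPop05 = [5; 22; 20; 8; 52; 62];
MonthsC   = 2005 - 1993;

no93manzPopID  = manzID(manzPop93 == 0 & manzPop05 > 0);   % IDs with no 1993 population
hasPop93       = ~ismember(manzID, no93manzPopID);
meanPopChangeM = mean((manzPop05(hasPop93) - manzPop93(hasPop93)) / MonthsC);

n = numel(manzID);
PopChangeMmanz = zeros(n,1);
parfor i = 1:n
    if any(manzID(i) == no93manzPopID)      % replaces the inner loop and its break
        PopChangeMmanz(i) = meanPopChangeM;
    else
        PopChangeMmanz(i) = (manzPop05(i) - manzPop93(i)) / MonthsC;
    end
end
disp(PopChangeMmanz');
```

The any() test gives the same result as the original inner loop whenever no93manzPopID is non-empty, because a match anywhere in the list ends with the mean being stored.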

  39. Lab Exercise
      %%
      %Now calculate the midpoint population
      midPop=manzPop93+(PopChangeMmanz*9.5);
      %turn the neighs and manz clockwise to calc pop
      for i=1:length(neighPoly)
          [neighClock{i}(:,1) neighClock{i}(:,2)] = poly2cw(neighPoly{i}(:,1),neighPoly{i}(:,2));
      end;
      for i=1:length(manzPoly)
          [manzClock{i}(:,1) manzClock{i}(:,2)] = poly2cw(manzPoly{i}(:,1),manzPoly{i}(:,2));
      end;
      %%
      %parallel version:
      parfor i=1:length(neighPoly)
          [temp1, temp2] = poly2cw(neighPoly{i}(:,1),neighPoly{i}(:,2));
          neighClock{i}=[temp1 temp2];
      end;
      parfor i=1:length(manzPoly)
          [temp1, temp2] = poly2cw(manzPoly{i}(:,1),manzPoly{i}(:,2));
          manzClock{i}=[temp1 temp2];
      end;

  40. Lab Exercise
      %%
      %calculate the areas of the manzanas;
      polyAreaR=zeros(length(manzClock),1);
      for i=1:length(manzClock)
          polyAreaR(i,1)=calcArea(manzClock{i}(:,1), manzClock{i}(:,2));
      end;
      %%
      %parallel version:
      parfor i=1:length(manzClock)
          polyAreaR(i,1)=calcArea(manzClock{i}(:,1), manzClock{i}(:,2));
      end;

  41. Lab Exercise
      %calculate the population for each of the neighs as function of the manzanas & sum calculated pop
      N=zeros(length(neighClock),1); %pre-allocate the vector;
      for i=1:length(neighClock)
          m=0;
          Ntemp=zeros(length(manzClock),1);
          for j=1:length(manzClock)
              [tempx, tempy]=polybool('intersection', neighClock{i}(:,1),neighClock{i}(:,2), manzClock{j}(:,1),manzClock{j}(:,2));
              if isempty(tempx)==0
                  m=m+1;
                  Ntemp(m,1)=(calcArea(tempx,tempy)/polyAreaR(j,1))*midPop(j);
              end;
          end;
          N(i,1)=(sum(Ntemp));
      end;
      %parallel version: replace the outer loop with parfor i=1:length(neighClock)

  42. More parfor resources • Loren Shure's blog entry on parfor • http://blogs.mathworks.com/loren/2009/10/02/using-parfor-loops-getting-up-and-running/ • Advanced parfor topics (MATLAB online help) • http://www.mathworks.com/help/toolbox/distcomp/brdqtjj-1.html#bq_of7_-1 • Loren Shure (MATLAB engineer)

  43. Functions to support parfor performance • All functions are included in the online Parallel MATLAB program files • Parfor progress monitor (user created) • http://www.mathworks.com/matlabcentral/fileexchange/24594-parfor-progress-monitor • Parallel Profiler (built-in function) • http://www.mathworks.com/help/toolbox/distcomp/bra51qt-1.html#brcrm_t

  44. Functions to support parfor performance • All functions are included in the online Parallel MATLAB program files • User-created codes • Parfor progress monitor (user created) • http://www.mathworks.com/matlabcentral/fileexchange/24594-parfor-progress-monitor

  45. Functions to support parfor performance • Parallel Profiler (built-in function) • http://www.mathworks.com/help/toolbox/distcomp/bra51qt-1.html#brcrm_t • partictoc • You can also use the user-created function partictoc to examine the efficiency of your parallel code • Download at: http://www.mathworks.com/matlabcentral/fileexchange/27472-partictoc

  46. Spmd • Used to Partition large data sets • Excellent when you want to work with an array too large for your computer’s memory

  47. Spmd • Spmd distributes the array among MATLAB workers (each worker contains a part of the array) • However, you can still operate on the entire array as 1 entity • Workers automatically transfer data between themselves when necessary, e.g. for matrix multiplication.

  48. Spmd Format • Format
      matlabpool(4)
      spmd
          statements
      end
      • Simple Example
      matlabpool(4)
      spmd
          j=zeros(1e7,1);
      end;

  49. Spmd Examples • Result j is a Composite with 4 parts!
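
A small sketch of how the parts of a Composite can be read back on the client. It assumes a pool is already open (matlabpool(4) in the release the slides target, parpool(4) in newer ones). The variable name part and the 3-by-1 block size are chosen here for illustration. labindex is the worker number inside spmd; newer releases also offer it under the name spmdIndex.

```matlab
% Assumption: a pool is already open before this runs.
spmd
    part = labindex * ones(3,1);   % each worker builds its own 3x1 block
end

% On the client, part is a Composite with one entry per worker.
for w = 1:length(part)
    disp(part{w}');                % part{w} copies worker w's block to the client
end
```

Indexing with curly braces transfers only the requested worker's data, so a large Composite can be inspected one part at a time.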
