Parallel & GPU computing in MATLAB ITS Research Computing Lani Clough
Objectives • Introductory level MATLAB course for people who want to learn parallel and GPU computing in MATLAB. • Help participants determine when to use parallel computing and how to use MATLAB parallel & GPU computing on their local computer & on the Research Computing clusters (Killdevil/Kure)
Logistics • Course Format • Overview of MATLAB topics with Lab Exercises • UNC Research Computing • http://its.unc.edu/research
Agenda • Parallel computing (1hr 10min) • What is it? • Why use it? • How to write MATLAB code in parallel (1hr) • GPU computing (20 min) • What is it & why use it? • How to write MATLAB code for GPU computing (15 min) • How to run MATLAB parallel & GPU code on the UNC cluster (20 min) • Quick introduction to the UNC cluster (Kure) • bsub commands and what they mean • Questions (10 min)
What is Parallel Computing? • Generally, computer code is written in serial • Tasks are completed one after another until the script is finished, with only 1 task executing at any time • Conceptually, the computer has only 1 CPU Source: https://computing.llnl.gov/tutorials/parallel_comp/
What is Parallel Computing? (cont.) • Parallel Computing: Using multiple central processing units (CPUs) to solve a problem at the same time • The compute resources might be: a computer with multiple processors or networked computers Source: https://computing.llnl.gov/tutorials/parallel_comp/
Why use Parallel Computing • Save time & money (commodity components) • Provide concurrency • Solve larger problems • Use non-local resources • UNC compute cluster • SETI: 2.9 million computers • Folding (Stanford): 450,000 CPUs Source: https://computing.llnl.gov/tutorials/parallel_comp/
How to write code in parallel • The computational problem should be able to: • Be broken into discrete parts that can be solved simultaneously and independently • Be solved in less time with multiple compute resources than with a single compute resource.
Parallel Computing in MATLAB • MATLAB Parallel Computing Toolbox (available for use at UNC) • Provides up to twelve workers (MATLAB computational engines) to execute applications on a multicore system • Built-in functions for parallel computing • parfor loop (for running task-parallel algorithms on multiple processors) • spmd (handles large datasets and data-parallel algorithms)
MATLAB Distributed Computing Server • Allows MATLAB to run as many workers on a remote cluster of computers as licensing allows • OR run more than 12 workers on a local machine • UNC does not have a license for this product; it's extremely $$$$$$$$ • More information: http://www.mathworks.com/products/distriben/ • Course will not go over this product
Primary Parallel Commands • findResource • matlabpool • open • close • size • parfor (for loop) • spmd (distributed computing for datasets) • batch jobs (run job in background)
findResource • Find available parallel computing resources • out = findResource()
findResource Examples • lsf_sched = findResource('scheduler','type','LSF') • Find the Platform LSF scheduler on the network. • local_sched = findResource('scheduler','type','local') • Create a local scheduler that will start workers on the client machine for running your job. • jm1 = findResource('scheduler','type','jobmanager','Name','ClusterQueue1'); • Find a particular job manager by its name.
More Resources for findResource • http://www.mathworks.com/help/toolbox/distcomp/findresource.html
Matlabpool • matlabpool open • Begins a parallel work session • Options for open matlab pool
Matlabpool open • These three examples of open matlabpool each have the same result: opens a local pool of 4 workers • 1: • 2: • 3:
Matlabpool • matlabpool(x) • Request the number of workers you'd like, e.g. matlabpool(4) • matlabpool('size') • Tells you the number of workers available in matlabpool
Matlabpool • Requesting too many workers produces an error • Can only request 4 workers on this machine!
Matlabpool Close • Use matlabpool close to end parallel session • Options • matlabpool close force • deletes all pool jobs for current user in the cluster specified by default profile (including running jobs) • matlabpool close force <profilename> • deletes all pool jobs run in the specified profile
Parallel for Loops (parfor) • parfor loops can execute for-loop-like code in parallel to significantly improve performance • Must consist of code broken into discrete parts that can be solved simultaneously (i.e. it can't be serial)
Parfor example • Will work in parallel, because the loop iterations do not depend on each other:
    matlabpool open local 2   %start a local pool of 2 workers
    j=zeros(100,1); %pre-allocate vector
    parfor i=2:100   %parfor makes the loop run in parallel
        j(i,1)=5*i;
    end;
    matlabpool close
Serial Loop example • Won't work in parallel, because it's serial:
    j=zeros(100,1); %pre-allocate vector
    j(1)=5;
    for i=2:100
        j(i,1)=j(i-1)+5; %j(i-1) is needed to calculate j(i,1): serial!
    end;
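For this particular recurrence a closed form exists, which is what turns the serial loop into independent iterations. The sketch below is an illustration, not part of the original slides: it checks that the closed form j(i) = 5*i matches the serial recurrence. The names jSerial, jParallel, and sameResult are made up for the example.

```matlab
% Serial recurrence from the slide: each step needs the previous one.
n = 100;
jSerial = zeros(n, 1);
jSerial(1) = 5;
for i = 2:n
    jSerial(i) = jSerial(i-1) + 5;
end

% Closed form: every iteration depends only on i, so parfor is allowed.
jParallel = zeros(n, 1);
parfor i = 1:n
    jParallel(i) = 5 * i;
end

% The two vectors should agree element by element.
sameResult = isequal(jSerial, jParallel);
```

Not every recurrence has such a closed form; when none exists, the loop has to stay serial.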
Parallel for Loops (parfor) • Cannot nest parfor loops within parfor loops:
    parfor i=1:10
        parfor j=1:10   %not allowed
            x(i,j)=1;
        end;
    end;
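One common way around the nesting restriction is to keep parfor on the outer loop only and use an ordinary for loop inside it. The following is a minimal sketch under that assumption; the temporary variable row is introduced only for this illustration.

```matlab
n = 10;
x = zeros(n, n);            % pre-allocate the result
parfor i = 1:n              % only the outer loop is parallel
    row = zeros(1, n);      % temporary variable, local to each iteration
    for j = 1:n             % ordinary for loop inside parfor is allowed
        row(j) = 1;
    end
    x(i, :) = row;          % each iteration writes its own row of x
end
```

Building the row in a temporary and assigning it once keeps x indexed in a single, consistent way by the parfor loop variable.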
Parallel for Loops (parfor) • If a function with multiple outputs assigns directly into indexed variables within a parfor loop, MATLAB will have difficulty figuring out how to run the parfor loop, e.g.
    parfor i=1:10
        [x{i}(:,1), x{i}(:,2)]=functionName(z,w);
    end
Parallel for Loops (parfor) • Use this instead:
    parfor i=1:10
        [x1, x2]=functionName(z,w);
        x{i}=[x1 x2];
    end
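To make the pattern concrete, here is a small runnable sketch in which the built-in max (which returns a value and an index) stands in for the slide's placeholder functionName. The variables data, colMax, and rowIdx are assumptions made for the illustration.

```matlab
data = magic(4);                 % small example matrix
n = size(data, 2);
x = cell(1, n);                  % pre-allocate the output cell array
parfor i = 1:n
    % Capture both outputs in plain temporaries first...
    [colMax, rowIdx] = max(data(:, i));
    % ...then store them with one simple indexed assignment.
    x{i} = [colMax rowIdx];
end
```

Each x{i} ends up holding the column maximum and the row where it occurs, and the loop only ever indexes x as x{i}.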
Parallel for Loops (parfor) For parallel computing to be worth your time: the task must be solved in less time with multiple compute resources than with a single compute resource.
Test the efficiency of your parallel code • Use MATLAB's tic & toc functions • tic starts a timer • toc tells you the number of seconds since the tic function was called
Tic & Toc Simple Example
    tic;
    parfor i=1:10
        z(i)=10;
    end;
    toc
Check efficiency of simple parfor loop
    clear; clc;
    matlabpool(4)   %open the pool once, outside the timed code
    k=zeros(8,3);   %one row per problem size: [n parforTime forTime]
    m=1; i=1;
    while i<1e8
        [time1, time2]=testParfor(i);
        k(m,:)=[i time1 time2];
        m=m+1;
        i=i*10;
    end;
    matlabpool close
Check efficiency of simple parfor loop
    function [t1e, t2e]=testParfor(x)
    %Assumes a matlabpool is already open, so pool start-up is not timed
    A=ones(x,1).*4;
    B=zeros(x,1);
    t1s=tic;
    parfor i = 1:length(A)
        B(i) = sqrt(A(i));
    end
    t1e=toc(t1s);   %parfor time
    B=zeros(x,1);
    t2s=tic;
    for i = 1:length(A)
        B(i) = sqrt(A(i));
    end
    t2e=toc(t2s);   %for time
Result of Check Efficiency of parfor • The for loop is much more efficient than the parfor loop here: more resources do not necessarily equate to a faster run time!
Parfor Efficiency • The previous example is not an effective use of a parfor loop because it takes more time to evaluate than a for loop • Data transfer to and from the workers is the issue • Parfor is more effective with long-running calculations within the loop • Generally, more iterations increase the efficiency of a parfor loop
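As a contrast to the sqrt test, the sketch below times a loop whose iterations each do a larger amount of work. The workload (eigenvalues of a 200x200 symmetric matrix) is an arbitrary choice made for this illustration and does not come from the slides. It assumes a pool of workers is already open; actual timings depend on the machine, so none are quoted.

```matlab
% Illustrative workload (an assumption, not from the slides).
nIter = 8;
base = magic(200);
base = base + base';             % symmetric, so eig returns real values

resPar = zeros(nIter, 1);
tPar = tic;
parfor k = 1:nIter
    % Each iteration is independent and computationally heavy.
    resPar(k) = max(abs(eig(base + k * eye(200))));
end
parTime = toc(tPar);

resSer = zeros(nIter, 1);
tSer = tic;
for k = 1:nIter
    resSer(k) = max(abs(eig(base + k * eye(200))));
end
serTime = toc(tSer);

fprintf('parfor: %.3f s, for: %.3f s\n', parTime, serTime);
```

Only the small matrix base is sent to the workers and only one number per iteration comes back, so the ratio of computation to data transfer is much higher than in the sqrt example.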
Lab Exercise with parfor • Lab exercise: • Turn a non-parallel function into a function that can run in parallel • Go through each section of the function and determine whether it can be written in parallel and, if so, how (%% denotes a new section)
Lab Exercise
    function N=calcNeighNp(neighPoly,manzPoly,manzPop93,manzPop05,manzID)
    %Find the manzanas which don't have an associated population in 93, but a population in 05
    j=1;
    for i=1:length(manzPop93)
        if manzPop93(i,1)==0 && manzPop05(i,1)>0
            no93manzPopID(j,1)=manzID(i,1);
            j=j+1;
        end;
    end;
    %%
    matlabpool(x) %start matlabpool
    %parfor can't be used here because it's serial (the counter j carries over between iterations)
Lab Exercise
    %parfor can't be used here because it's serial (j depends on earlier iterations)
    %%
    %Calculate the average monthly population change (excluding the data pts with no pop in 1993);
    MonthsC=(2005-1993); %note: 2005-1993 = 12 is a difference in years
    j=1; count=0; TotalPopC=0;
    for i=1:length(manzPop93)
        %guard on j avoids indexing past the end of no93manzPopID
        if j>length(no93manzPopID) || manzID(i,1)~=no93manzPopID(j,1)
            TotalPopC=TotalPopC+((manzPop05(i,1)-manzPop93(i,1))/MonthsC);
            count=count+1;
        else
            j=j+1;
        end;
    end;
    %%
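Although these two sections cannot use parfor, the counter can be avoided altogether with logical indexing. The sketch below is one possible vectorized alternative, not the course's solution. It assumes the IDs in manzID are unique, and it uses small made-up data in place of the real manzana inputs; isNew and keep are names introduced for the illustration.

```matlab
% Made-up sample data standing in for the real inputs.
manzID    = (101:106)';
manzPop93 = [10; 0; 20; 0; 30; 0];
manzPop05 = [22; 5; 20; 7; 54; 0];
MonthsC   = 2005 - 1993;

% Manzanas with no population in 93 but a population in 05.
isNew = (manzPop93 == 0) & (manzPop05 > 0);
no93manzPopID = manzID(isNew);

% Average change over all remaining manzanas, with no loop counter.
keep = ~isNew;
count = sum(keep);
TotalPopC = sum((manzPop05(keep) - manzPop93(keep)) / MonthsC);
meanPopChangeM = TotalPopC / count;
```

Because no value is carried from one element to the next, this form has none of the serial dependency that rules out parfor in the loop version.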
Lab Exercise
    %%
    meanPopChangeM=TotalPopC/count;
    PopChangeMmanz=zeros(length(manzPop05),1);
    %Calculate the monthly population change for all the manzanas
    for i=1:length(manzPop05)
        for j=1:length(no93manzPopID)
            if manzID(i,1)==no93manzPopID(j,1)
                PopChangeMmanz(i,1)=meanPopChangeM;
                break;
            else
                PopChangeMmanz(i,1)=(manzPop05(i,1)-manzPop93(i,1))/MonthsC;
            end;
        end;
    end;
    %%
    parfor i=1:length(manzPop05)
    %break must be deleted, not permitted in parfor; without it a later non-matching j would overwrite a match, so the match test must be restructured
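One way to drop the break without changing the result is to replace the inner loop with a single membership test. The sketch below is a hedged illustration of that idea rather than the course's official answer; the sample values are made up so that the block runs on its own.

```matlab
% Made-up sample data standing in for the real inputs.
manzID         = (101:106)';
manzPop93      = [10; 0; 20; 0; 30; 0];
manzPop05      = [22; 5; 20; 7; 54; 0];
no93manzPopID  = [102; 104];
meanPopChangeM = 0.75;
MonthsC        = 2005 - 1993;

PopChangeMmanz = zeros(numel(manzPop05), 1);
parfor i = 1:numel(manzPop05)
    % any(...) replaces the inner loop and its break statement.
    if any(manzID(i) == no93manzPopID)
        PopChangeMmanz(i) = meanPopChangeM;
    else
        PopChangeMmanz(i) = (manzPop05(i) - manzPop93(i)) / MonthsC;
    end
end
```

Each iteration reads only its own elements of the input vectors plus the shared list no93manzPopID, and writes only PopChangeMmanz(i), so the iterations are independent.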
Lab Exercise
    %%
    %Now calculate the midpoint population
    midPop=manzPop93+(PopChangeMmanz*9.5);
    %turn the neighs and manz clockwise to calc pop
    for i=1:length(neighPoly)
        [neighClock{i}(:,1), neighClock{i}(:,2)] = poly2cw(neighPoly{i}(:,1),neighPoly{i}(:,2));
    end;
    for i=1:length(manzPoly)
        [manzClock{i}(:,1), manzClock{i}(:,2)] = poly2cw(manzPoly{i}(:,1),manzPoly{i}(:,2));
    end;
    %%
    parfor i=1:length(neighPoly)
        [temp1, temp2] = poly2cw(neighPoly{i}(:,1),neighPoly{i}(:,2));
        neighClock{i}=[temp1 temp2];
    end;
    parfor i=1:length(manzPoly)
        [temp1, temp2] = poly2cw(manzPoly{i}(:,1),manzPoly{i}(:,2));
        manzClock{i}=[temp1 temp2];
    end;
Lab Exercise
    %%
    %calculate the areas of the manzanas;
    polyAreaR=zeros(length(manzClock),1);
    for i=1:length(manzClock)
        polyAreaR(i,1)=calcArea(manzClock{i}(:,1),manzClock{i}(:,2));
    end;
    %%
    parfor i=1:length(manzClock)
        polyAreaR(i,1)=calcArea(manzClock{i}(:,1),manzClock{i}(:,2));
    end;
Lab Exercise
    %calculate the population for each of the neighs as function of the manzanas & sum calculated pop
    N=zeros(length(neighClock),1); %pre-allocate the vector;
    for i=1:length(neighClock)
        m=0;
        Ntemp=zeros(length(manzClock),1);
        for j=1:length(manzClock)
            [tempx, tempy]=polybool('intersection',neighClock{i}(:,1),neighClock{i}(:,2),manzClock{j}(:,1),manzClock{j}(:,2));
            if ~isempty(tempx)
                m=m+1;
                Ntemp(m,1)=(calcArea(tempx,tempy)/polyAreaR(j,1))*midPop(j);
            end;
        end;
        N(i,1)=(sum(Ntemp));
    end;
    %%
    parfor i=1:length(neighClock)
More parfor resources • Loren Shure's blog entry on parfor (Loren Shure is a MathWorks engineer) • http://blogs.mathworks.com/loren/2009/10/02/using-parfor-loops-getting-up-and-running/ • Advanced parfor topics (MATLAB online help) • http://www.mathworks.com/help/toolbox/distcomp/brdqtjj-1.html#bq_of7_-1
Functions to support parfor performance • All functions are included in the online Parallel MATLAB program files • User-created codes • Parfor progress monitor (user created) • http://www.mathworks.com/matlabcentral/fileexchange/24594-parfor-progress-monitor
Functions to support parfor performance • Parallel Profiler (built-in function) • http://www.mathworks.com/help/toolbox/distcomp/bra51qt-1.html#brcrm_t • partictoc • You can also use this user-created function, partictoc, to examine the efficiency of your parallel code • Download at: http://www.mathworks.com/matlabcentral/fileexchange/27472-partictoc
Spmd • Stands for single program, multiple data • Used to partition large data sets • Excellent when you want to work with an array too large for your computer's memory
Spmd • Spmd distributes the array among MATLAB workers (each worker contains a part of the array) • However, you can still operate on the entire array as 1 entity • Workers automatically transfer data between themselves when necessary, e.g. for matrix multiplication
Spmd Format • Format
    matlabpool(4)
    spmd
        statements
    end
• Simple Example
    matlabpool(4)
    spmd
        j=zeros(1e7,1);
    end;
Spmd Examples • Result j is a Composite with 4 parts!