R and HPC: Scaling up and out with R on shared systems. Gareth Williams, IMT Advanced Scientific Computing
What is CSIRO Advanced Scientific Computing? • CSIRO IMT (Information Management and Technology) • Focus on High Performance Computing • Focus on eResearch • Strategy of Economies of Utilization – shared infrastructure • Support • Software • Hardware • Interfacing with partners CSIRO.
ASC information • http://intra.hpsc.csiro.au • Externally visible copy at http://www.hpsc.csiro.au • User Guides • http://intra.hpsc.csiro.au/userguides • Software information • http://intra.hpsc.csiro.au/software/ • http://nf.nci.org.au/facilities/software/ • hpchelp@csiro.au
ASC resources • Storage – Tape backed petabyte store • Capability and capacity clusters • Cherax – 128 core NUMA ia64 • Burnet – commodity cluster • **new GPU cluster** • Partners • NCI http://nf.nci.org.au • iVEC http://www.ivec.org • TPAC http://www.tpac.org.au • ARCS http://www.arcs.org.au • QFAB http://qfab.org (emerging) • MASSIVE (emerging)
General cluster schematic • Nodes and connecting/shared infrastructure [diagram: users connect to a login node; a network switch links the login node, management/admin node, batch server, compute nodes and shared storage]
Accessing the cluster • Register with ASC • Login with ssh/PuTTY – nexus ident (and passwd) > ssh -X <ident>@<headnode> • Or start PuTTY and connect to the headnode • Gets you onto the 'login/head node': cherax/burnet/gpu01/xe • Simple commands to try: > cat /etc/motd > uname -a > ls -al > ps -fHu $USER > man
Making R and other software available • Support staff compile R versions • Commercial compilers • Tuned BLAS/LAPACK • Installed in a shared area • Extra packages of software are available as 'environment modules' • What is loaded now? > module list • What is available? > module avail > module avail R • The search path for commands: > echo $PATH • Load a module: > module load R • Where a command is: > which R
Interactive vs non-interactive R • Choose a version, load the module and go… • Type in instructions at the R interpreter prompt • But you only get to do one thing at a time. • Scripting R tasks • Save your R instructions in a text file • Make sure you don't need interaction • Read/write/plot to files • There are a few options for how to run R: > Rscript myscript.R [ARGS] > R CMD BATCH --no-restore myscript.R > R --slave --no-restore < myscript.R • One file (or set of arguments) per task • Watch out for over-writing results! • This defines separate tasks but doesn't get them distributed and managed…
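The "one file per task" pattern above can be sketched in shell: derive a distinct output name from each input so concurrent tasks never overwrite each other's results. This is a dry-run sketch (the `.in`/`.out` naming and `myscript.R` are illustrative, not from the slides; `echo` shows the command a real wrapper would execute):

```shell
# Dry run of per-task output naming: each input file maps to its own
# output file, so parallel tasks cannot clobber one another.
for IN in sample1.in sample2.in; do
  OUT=${IN%.in}.out                      # strip .in, append .out
  echo "Rscript myscript.R $IN > $OUT"   # a real wrapper would run this
done
```

Running the real command instead of `echo` gives one independent, restartable task per input file.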
Batch queueing system • On shared ASC systems most work must be done as batch jobs • Queueing system • Distribute work over the system • Avoid contention – jobs need dedicated resources • Provide equitable access – scheduling policy • Torque and Moab
Using the queueing system • Submit jobs, specifying the resources required (using qsub) • See > man pbs_resources • walltime • nodes (and processes per node) • vmem • gpus • software • What happens? • Job script gets saved • Scheduler assesses priority and blocks out the resource • Script is copied to the first allocated node and run for you in the batch environment • Job is terminated if the resources specified are exceeded • Screen output is copied back at the end of the job • Can query status in the meantime • More info: read the user guide
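As a concrete sketch, the resource requests listed above are just `#PBS` directive comments at the top of the job script; the values below are placeholders to adapt, not recommendations from the slides:

```shell
#!/bin/bash
#PBS -l walltime=2:00:00    # wall-clock limit; the job is killed if exceeded
#PBS -l nodes=1:ppn=4       # one node, four processors per node
#PBS -l vmem=4GB            # virtual memory limit for the whole job
cd $PBS_O_WORKDIR           # batch jobs start in $HOME; return to submit dir
```

Anything after the directives is an ordinary shell script run on the first allocated node.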
Scheduling • Jobs are only started when resources can be dedicated. • Only ask for what you need! • Jobs that request too much memory will prevent other jobs from running • And take longer than necessary to start • The PBS stdout file summarises resource usage • Long-running jobs are unfriendly • Save state for restarting • Chain jobs together • The cluster may not stay up that long… • Submitting lots of jobs is OK • But not extremely short ones, please.
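The "chain jobs together" advice can be sketched with Torque's dependency option, `-W depend=afterok:<jobid>`, which starts the second job only after the first exits cleanly. Here a stub `qsub` function stands in for the real command purely so the pattern is visible without a scheduler (job id and script names are made up):

```shell
# Illustrative only: stub qsub so the chaining pattern can be shown offline.
# A real qsub prints the new job's id (e.g. 12345.batchserver) on stdout.
qsub() { echo "12345.batchserver"; }

JOBID=$(qsub part1.q)                        # submit the first leg
echo "qsub -W depend=afterok:$JOBID part2.q" # second leg waits for clean exit
```

With real `qsub`, each leg restarts from state saved by the previous one, keeping individual jobs short and scheduler-friendly.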
Example R job • First look at the man pages for qsub, qdel, qstat • Then write commands in a script (myjob.q):
#!/bin/bash
#PBS -l nodes=1:ppn=1,vmem=1GB,walltime=1:00:00
cd $PBS_O_WORKDIR
Rscript R-benchmark-25.R
• And submit the job: > qsub myjob.q > qstat > module load moab > showq • When complete, view the output in myjob.o**** and myjob.e**** • But this job has no input and each run will be more-or-less identical (equivalent)
Interactive batch job • When you need to have a resource dedicated for interactive use, e.g. • Intensive development • Debugging • Run qsub with the -I option (capital i) and an appropriate resource specification, and wait for the prompt > qsub -I -l walltime=1:00:00,vmem=1GB • The scheduler still won't start the job until resources are available – the accounting will record the resource dedicated • Log out (exit) as soon as you're done to allow others to use the resources • Your session will be killed if you exceed the limits
Optimized R • In general, for optimising you need to benchmark representative test cases – but an established benchmark is a good start • http://r.research.att.com/benchmarks R-benchmark-25.R • Other benchmarks • http://www.revolution-computing.com/products/benchmarks.php • Nathan Watson-Haigh • Perform the cross-product of the transpose of matrix m • cp1 <- crossprod(t(m)) • cp2 <- tcrossprod(m) (equivalent result; tcrossprod avoids the explicit transpose) • Run on a dedicated system • Compare systems • Compare versions • Compare build options • Parallel scaling
Transpose cross product [benchmark chart]
R-benchmark-25 [benchmark chart]
Extras
Optimized R summary • Optimized BLAS/LAPACK can make a very big difference • Shared-memory parallel BLAS can also be effective (Intel MKL) • ATLAS would also be good • The compiler may not be so critical for R • The Windows binary distribution does not have a good BLAS • Performance differences are not uniform across the board • Algorithms or problem size • You should benchmark code that you actually want to run • The ASC group can help! • Pre-requisites – general R performance tips • Pre-allocate memory • Minimize I/O • Fit in memory (don't swap) • Have dedicated resources
Parallelism • Scaling up vs scaling out • Motivation: run faster, or use distributed memory (more total memory) • Shared-memory parallel BLAS/LAPACK • Rmpi • R package to use MPI (Message Passing Interface) • Must explicitly code sends and receives of messages to transfer data • Hard work! > qsub -l nodes=5:ppn=2 Rmpijob.q • Revolution (Enterprise edition) • NetWorkSpaces/sleigh • Or break up your work into independent tasks and aggregate the results
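A job script for the Rmpi submission above might look like the sketch below; the module name, process count, and mpirun invocation are assumptions to adapt for the cluster at hand, not details from the slides:

```shell
#!/bin/bash
#PBS -l nodes=5:ppn=2,vmem=10GB,walltime=4:00:00
cd $PBS_O_WORKDIR
module load R
# Launch R under MPI across the 10 allocated slots; whether the script
# spawns slaves or runs in SPMD style depends on how it uses Rmpi.
mpirun -np 10 R --slave --no-restore -f my-rmpi-script.R
```

The resource request (`nodes=5:ppn=2`) and the launcher's process count must agree, or ranks will land on unallocated cores.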
Ensemble of jobs – scaling out • Write one job script for each task • Use a scripting framework of your choice to automate creating the files • Submit the jobs in a loop • Write one job script and pass it environment variables • Use the qsub '-v' option • Use the qsub array job '-t' option • Write job scripts on-the-fly • Use Nimrod • Write a template • Iterate or search over a parameter space
Examples for ensembles • N.b. bash syntax here for the loops – but use what you prefer! • Submit scripts matching *.q: > for SCRIPT in *.q; do qsub $SCRIPT; done • Submit myjob.q with X set to 2.1, 2.4, 2.7 .. 3.3: > for X in $(seq 2.1 0.3 3.4); do qsub -v X=$X myjob.q; done • Submit myjob.q with IN set to files matching *.in: > for FILE in *.in; do qsub -v IN=$FILE myjob.q; done • Submit myjob.q with LINE set to each line in paramset.in: > while read L; do qsub -v LINE="$L" myjob.q; done < paramset.in • Submit myjob.q as an array job (PBS_ARRAYID will be set to 1..20): > qsub -t 1-20 myjob.q • Vary the cpus requested/used by myjob.q – to test scaling: > for N in 1 2 4 8; do qsub -v OMP_NUM_THREADS=$N -l nodes=1:ppn=$N myjob.q; done
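The sweep loops above can be dry-run by printing each submission instead of queueing it, which is a cheap way to check a parameter sweep before burning queue time (a sketch, not from the slides):

```shell
# Print, rather than submit, each job in the sweep.
# seq 2.1 0.3 3.4 yields the parameter values 2.1 2.4 2.7 3.0 3.3.
for X in $(seq 2.1 0.3 3.4); do
  echo qsub -v X=$X myjob.q
done
```

Once the printed commands look right, delete the `echo` and the same loop submits the real jobs.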
CSIRO IM&T – Gareth Williams, Outreach Manager, Advanced Scientific Computing Email: Gareth.Williams@csiro.au / hpchelp@csiro.au Web: http://intranet.csiro.au/intranet/imt http://www.hpsc.csiro.au/contact Helpdesk: (03) 9669 8103 Thank you. Contact Us – Phone: 1300 363 400 or +61 3 9545 2176 Email: Enquiries@csiro.au Web: www.csiro.au