Astronomy Applications in the TeraGrid Environment Roy Williams, Caltech with thanks for material to: Sandra Bittner, ANL; Sharon Brunett, Caltech; Derek Simmel, PSC; John Towns, NCSA; Nancy Wilkins-Diehr, SDSC
The TeraGrid Vision: distributing the resources is better than putting them at one site
• Build new, extensible, grid-based infrastructure to support grid-enabled scientific applications
• New hardware, new networks, new software, new practices, new policies
• Expand centers to support cyberinfrastructure
• Distributed, coordinated operations center
• Exploit unique partner expertise and resources to make the whole greater than the sum of its parts
• Leverage homogeneity to make distributed computing easier and to simplify initial development and standardization
• Run a single job across the entire TeraGrid
• Move executables between sites
NVO Summer School Sept 2004
What is the Grid, Really?
• A set of powerful Beowulf clusters
• Lots of disk storage
• Fast interconnects
• Unified account management
• Interesting software
The Grid is not:
• Magic
• Infinite
• Simple
• A universal panacea
• The hype that you have read
NVO Summer School Sept 2004
Grid as Federation
• TeraGrid as a federation of independent centers
• independent centers give flexibility
• a unified interface gives power and strength
• a large-state/small-state compromise
NVO Summer School Sept 2004
TeraGrid Wide Area Network NVO Summer School Sept 2004
Quasar Science – an NVO-TeraGrid project (Penn State, CMU, Caltech)
• 60,000 quasar spectra from the Sloan Sky Survey
• Each is 1 cpu-hour: submit to the grid queue
• Fits a complex model (173 parameters) and derives black-hole mass from line widths
• Pipeline components (diagram): NVO data services, a globusrun manager, and the compute clusters
NVO Summer School Sept 2004
N-point Galaxy Correlation – an NVO-TeraGrid project (Pitt, CMU)
• Finding the triple correlation in the 3D SDSS galaxy catalog (RA/Dec/z)
• Lots of large parallel jobs
• kd-tree algorithms
NVO Summer School Sept 2004
Palomar-Quest Survey (Caltech, NCSA, Yale)
• Transient pipeline: a computing reservation at sunrise for immediate follow-up of transients
• Synoptic survey: massive resampling (Atlasmaker) for ultrafaint detection
• Data flow (diagram): the P48 Telescope delivers 50 GB/night to Caltech, Yale, and NCSA over the TeraGrid, with a 5 TB store and ALERTs issued on transients
• NCSA, Caltech, and Yale run different pipelines on the same data
NVO Summer School Sept 2004
Transient from PQ from catalog pipeline NVO Summer School Sept 2004
PQ stacked images from image pipeline NVO Summer School Sept 2004
Wide-area Mosaicking (Hyperatlas) – an NVO-TeraGrid project (Caltech)
• DPOSS, 15° field
• High quality: flux-preserving, spatially accurate
• Stackable Hyperatlas pages; edge-free; pyramid; weight
• Mining AND outreach: the Griffith Observatory "Big Picture"
NVO Summer School Sept 2004
2MASS Mosaicking Portal – an NVO-TeraGrid project (Caltech, IPAC)
NVO Summer School Sept 2004
TeraGrid Components • Compute hardware • Intel/Linux Clusters, Alpha SMP clusters, POWER4 cluster, … • Large-scale storage systems • hundreds of terabytes for secondary storage • Very high-speed network backbone • bandwidth for rich interaction and tight coupling • Grid middleware • Globus, data management, … • Next-generation applications NVO Summer School Sept 2004
Overview of Distributed TeraGrid Resources (diagram): four sites – NCSA/PACI (10.3 TF, 240 TB), SDSC (4.1 TF, 225 TB), Caltech, and Argonne – each with its own site resources and archival storage (HPSS; UniTree at NCSA), connected to each other and to external networks.
NVO Summer School Sept 2004
Compute Resources – NCSA: 2.6 TF growing to ~10.6 TF with 230 TB (diagram)
• 30 Gbps to the TeraGrid network; GbE fabric
• 8 TF Madison on 667 nodes plus 2.6 TF Madison on 256 nodes
• Typical node: 2p Madison, 4 GB memory, 2x73 GB disk (older nodes: 2p 1.3 GHz, 4 or 12 GB memory, 73 GB scratch)
• Storage I/O over Myrinet and/or GbE at 250 MB/s per node
• Myrinet fabric; Brocade 12000 switches with 256 and 92 2x FC links; 230 TB disk
• Interactive/spare nodes: 8 4p Madison nodes for login and FTP
NVO Summer School Sept 2004
Compute Resources – SDSC: 1.3 TF growing to ~4.3 + 1.1 TF with 500 TB (diagram)
• 30 Gbps to the TeraGrid network; GbE fabric
• 3 TF Madison on 256 nodes plus 1.3 TF Madison on 128 nodes
• Typical node: 2p Madison, 4 GB memory, 2x73 GB disk (older nodes: 2p 1.3 GHz, 4 GB memory, 73 GB scratch)
• 250 MB/s per node; Myrinet fabric; Brocade 12000 switches; 256 2x FC links to 500 TB disk
• Interactive/spare nodes: 6 4p Madison nodes for login and FTP
NVO Summer School Sept 2004
Compute Resources – Caltech: ~100 GF with 100 TB (diagram)
• 30 Gbps to the TeraGrid network; GbE fabric
• 72 GF Madison on 36 IBM/Intel nodes plus 34 GF Madison on 17 HP/Intel nodes (2p Madison, 6 GB memory, 73 GB scratch or 2x73 GB disk)
• Datawulf storage cluster: 33 IA32 storage nodes (2p, 6 GB memory) serving 100 TB of /pvfs, plus 6 Opteron nodes (4p, 8 GB memory) with 66 TB RAID5
• HPSS archive: 13 tape drives, 1.2 PB raw silo capacity
• Myrinet fabric; 250 MB/s per node; 13 2x FC links
• Interactive node: one 2p IBM Madison node for login and FTP
NVO Summer School Sept 2004
Wide Variety of Usage Scenarios
• Tightly coupled jobs storing vast amounts of data, performing visualization remotely, and making data available through online collections (ENZO)
• Thousands of independent jobs using data from a distributed data collection (NVO)
• Science Gateways – "not a Unix prompt"!
• from a web browser, with security
• from an application, e.g. IRAF or IDL
NVO Summer School Sept 2004
Traditional Parallel Processing
• A single executable runs on a single remote machine
• Big assumption: runtime necessities (e.g. executables, input files, shared objects) are available on the remote system!
• Log in to a head node and choose a submission mechanism
• Direct, interactive execution: mpirun -np 16 ./a.out
• Through a batch job manager: qsub my_script
• where my_script describes the executable location, runtime duration, redirection of stdout/err, the mpirun specification, …
NVO Summer School Sept 2004
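As a concrete illustration of the batch route, here is a minimal sketch of generating and submitting such a my_script from Python, in the same spirit as the DPOSS driver later in this deck. The job name, node count, and executable are hypothetical; the #PBS directives and the mpirun line simply restate the items listed above.

import os

# a hypothetical my_script: executable location, runtime duration,
# stdout/err redirection, and the mpirun specification
my_script = """#!/bin/sh
#PBS -N hello_mpi
#PBS -l nodes=8:ppn=2
#PBS -l walltime=0:30:00
#PBS -o hello_mpi.out
#PBS -e hello_mpi.err
cd $PBS_O_WORKDIR
mpirun -np 16 ./a.out
"""
open("my_script", "w").write(my_script)
os.system("qsub my_script")   # prints the job id assigned by PBS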
Traditional Parallel Processing II
• Through Globus: globusrun -r [some-teragrid-head-node].teragrid.org/jobmanager -f my_rsl_script
• where my_rsl_script describes the same details as in the qsub my_script!
• Through Condor-G: condor_submit my_condor_script
• where my_condor_script describes the same details as the Globus my_rsl_script!
NVO Summer School Sept 2004
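To make the Globus route concrete, here is a minimal sketch of an RSL description and its submission with globusrun, again in the deck's driver-script style. The head node, executable path, and values are placeholders; the globusrun command line and the maxtime attribute come from these slides, while the other RSL attributes (executable, count, jobtype, stdout, stderr) are the common GRAM ones.

import os

# a hypothetical my_rsl_script: the same information as a PBS script,
# expressed as Globus RSL attribute/value pairs
my_rsl = """& (executable = "/home/roy/a.out")
  (count = 16)
  (jobtype = mpi)
  (maxtime = 30)
  (stdout = "a.stdout")
  (stderr = "a.stderr")"""
open("my_rsl_script", "w").write(my_rsl)

# substitute a real TeraGrid head node for the placeholder below
os.system("globusrun -r some-teragrid-head-node.teragrid.org/jobmanager"
          " -f my_rsl_script")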
Distributed Parallel Processing
• Decompose the application over geographically distributed resources
• functional or domain decomposition fits well
• take advantage of load-balancing opportunities
• think about the impact of latency
• Improved utilization of many resources
• Flexible job management
NVO Summer School Sept 2004
Pipelined/Dataflow Processing
• Suited to problems that can be divided into a series of sequential tasks, where
• multiple instances of the problem need executing, or
• a series of data needs processing, with multiple operations on each item, and
• information from one processing phase can be passed to the next phase before the current phase is complete
NVO Summer School Sept 2004
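A toy sketch of the dataflow idea (not from the slides): each phase is a Python generator, so an item flows to the next phase as soon as it is ready rather than after the whole series is finished. The arithmetic stands in for real processing steps such as cropping, flattening, and source detection.

def crop(items):
    for x in items:
        yield x + 1          # stand-in for the first processing phase

def flatten(items):
    for x in items:
        yield x * 2          # starts on item 0 before crop() has seen item 1

def detect(items):
    for x in items:
        yield x - 3          # stand-in for the final phase

for result in detect(flatten(crop(range(5)))):
    print(result)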
Security
• ssh with password
• Too much password-typing
• Not very secure: a big break-in hit the TeraGrid in April 2004
• One failure is a big failure: all of TG!
• Caltech and Argonne no longer allow it
• SDSC does not allow password change
NVO Summer School Sept 2004
Security
• ssh with public key: single sign-on!
• use ssh-keygen on Unix or PuTTYgen on Windows; this gives you
• a public key file (e.g. id_rsa.pub) AND
• a private key file (e.g. id_rsa) AND
• a passphrase
• on the remote machine, put the public key in .ssh/authorized_keys
• on the local machine, combine the private key and passphrase
• the ATM-card model (card + PIN)
• On TG, you can put the public key on the application form: immediate login, no snailmail
NVO Summer School Sept 2004
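A sketch of the key setup from a Unix workstation, using only the standard ssh tools named above; ssh asks for the remote password once while the public key is being installed, and for the passphrase thereafter. The login node shown is the SDSC one used elsewhere in this deck.

import os

login = "tg-login.sdsc.teragrid.org"

# 1. generate the keypair (prompts for a passphrase); skip if one already exists
if not os.path.exists(os.path.expanduser("~/.ssh/id_rsa.pub")):
    os.system("ssh-keygen -t rsa")

# 2. append the public key to .ssh/authorized_keys on the remote machine
os.system("cat ~/.ssh/id_rsa.pub | "
          "ssh " + login + " 'mkdir -p .ssh && cat >> .ssh/authorized_keys'")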
Security
• X.509 certificates: single sign-on!
• issued by a Certificate Authority (e.g. Verisign, US Navy, DOE, …). A certificate consists of:
• a Distinguished Name (DN), e.g. /C=US/O=National Center for Supercomputing Applications/CN=Roy Williams, AND
• a private file (usercert.p12) AND
• a passphrase
• The remote machine needs an entry in its gridmap file (mapping the DN to an account): use the gx-map command
• You can create a certificate with ncsa-cert-request, etc.
• Certificates can be lodged in a web browser
NVO Summer School Sept 2004
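Once the certificate is installed and the DN is in the gridmap file, day-to-day single sign-on looks like the following sketch. grid-cert-info, grid-proxy-init, and grid-proxy-info are standard Globus commands; the gridmap update itself is done with whatever the site provides (gx-map, per the slide).

import os

# show the Distinguished Name that must appear in the remote gridmap file
os.system("grid-cert-info -subject")

# create a short-lived proxy from the certificate and passphrase; all
# subsequent Globus commands (globusrun, globus-url-copy, ...) use it
os.system("grid-proxy-init")

# check how much lifetime the current proxy has left
os.system("grid-proxy-info")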
3 Ways to Submit a Job 1. Directly to PBS Batch Scheduler • Simple, scripts are portable among PBS TeraGrid clusters 2. Globus common batch script syntax • Scripts are portable among other grids using Globus 3. Condor-G • Nice interface atop Globus, monitoring of all jobs submitted via Condor-G • Higher-level tools like DAGMan NVO Summer School Sept 2004
PBS Batch Submission
• ssh tg-login.[caltech|ncsa|sdsc|uc].teragrid.org
• qsub flatten.sh -v "FILE=f544"
• qstat or showq
• ls *.dat
• pbs.out, pbs.err files
NVO Summer School Sept 2004
globus-job-submit
• For running batch/offline jobs
• globus-job-submit: submit a job (same interface as globus-job-run; returns immediately)
• globus-job-status: check job status
• globus-job-cancel: cancel a job
• globus-job-get-output: get job stdout/err
• globus-job-clean: clean up after a job
NVO Summer School Sept 2004
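A minimal sketch of driving these commands from a Python script: submit a trivial job, poll its status, then collect the output and clean up. The resource string is a placeholder (the jobmanager name varies by site), and the states tested for are the standard GRAM end states DONE and FAILED.

import os, time

resource = "tg-login.sdsc.teragrid.org/jobmanager-pbs"   # placeholder

# globus-job-submit prints a job contact string and returns immediately
contact = os.popen("globus-job-submit " + resource + " /bin/hostname").read().strip()
print("job contact: " + contact)

# poll until the job reaches an end state
while True:
    state = os.popen("globus-job-status " + contact).read().strip()
    print("state: " + state)
    if state in ("DONE", "FAILED"):
        break
    time.sleep(30)

# retrieve stdout/stderr, then remove the cached job files
os.system("globus-job-get-output " + contact)
os.system("globus-job-clean " + contact)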
Condor-G Job Submission (diagram): Condor-G on the user's workstation (mickey.disney.edu) talks through the Globus API to the Globus job manager on tg-login.sdsc.teragrid.org, which hands the job to PBS. The submit description looks like:
  executable = /wd/doit
  universe = globus
  globusscheduler = <…>
  globusrsl = (maxtime=10)
  queue
NVO Summer School Sept 2004
Condor-G • Combines the strengths of Condor and the Globus Toolkit • Advantages when managing grid jobs • full featured queuing service • credential management • fault-tolerance • DAGman (== pipelines) NVO Summer School Sept 2004
Condor DAGMan • Manages workflow interdependencies • Each task is a Condor description file • A DAG file controls the order in which the Condor files are run NVO Summer School Sept 2004
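A minimal sketch of the idea: a two-task workflow in which B runs only after A succeeds. The DAG file syntax (JOB, PARENT ... CHILD) is standard DAGMan; stageA.sub and stageB.sub are hypothetical Condor-G submit description files like the one sketched on the Condor-G slide.

import os

dag = """JOB A stageA.sub
JOB B stageB.sub
PARENT A CHILD B
"""
open("pipeline.dag", "w").write(dag)

# DAGMan itself runs as a Condor job and enforces the ordering
os.system("condor_submit_dag pipeline.dag")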
Where's the disk?
• Home directory: $TG_CLUSTER_HOME, e.g. /home/roy
• Shared writeable global areas: $TG_CLUSTER_PFS, e.g. /pvfs/MCA04N009/roy
NVO Summer School Sept 2004
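A small sketch of using these environment variables from a script, so paths are not hard-wired to one site; the data file name is hypothetical.

import os

home = os.environ["TG_CLUSTER_HOME"]    # e.g. /home/roy
scratch = os.environ["TG_CLUSTER_PFS"]  # e.g. /pvfs/MCA04N009/roy

# keep scripts and logs in home; stage large data into the shared
# parallel filesystem, which every compute node can see
datafile = os.path.join(scratch, "source", "f544.fits")
print(datafile)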
GridFTP
• Moving a test file (the -s option gives the certificate subject the remote server is expected to present):
  % globus-url-copy -s "`grid-cert-info -subject`" \
      gsiftp://localhost:5678/tmp/file1 \
      file:///tmp/file2
• Also uberftp and scp
NVO Summer School Sept 2004
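A sketch of staging a file with globus-url-copy from a script (it needs a valid proxy from grid-proxy-init); the GridFTP host and path here are placeholders, not real TeraGrid endpoints.

import os

src = "gsiftp://some-teragrid-gridftp-node.teragrid.org/pvfs/mydata/f544.fits"
dst = "file:///tmp/f544.fits"
os.system("globus-url-copy " + src + " " + dst)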
Storage Resource Broker (SRB) • Single logical namespace while accessing distributed archival storage resources • Effectively infinite storage (first to 1TB wins a t-shirt) • Data replication • Parallel Transfers • Interfaces: command-line, API, web/portal. NVO Summer School Sept 2004
Storage Resource Broker (SRB): Virtual Resources, Replication (diagram): an SRB client (command line or API) on a workstation sees one logical namespace spanning physical resources (hpss-sdsc, sfs-tape-sdsc, hpss-caltech, the local workstation, …) across the participating sites (SDSC, NCSA, …).
NVO Summer School Sept 2004
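For the command-line interface, a sketch using the SRB Scommands (Sinit, Sput, Sls, Sget, Sexit); the file name is hypothetical, and Sinit assumes the SRB environment files in ~/.srb are already configured.

import os

os.system("Sinit")                           # connect to SRB
os.system("Sput f544.fits")                  # store into the current collection
os.system("Sls")                             # list the logical collection
os.system("Sget f544.fits local_copy.fits")  # fetch it back
os.system("Sexit")                           # disconnect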
Allocations Policies
• TG resources are allocated via the PACI allocations and review process, modeled after the NSF process
• TG is considered a single resource for grid allocations
• Different levels of review for different sizes of allocation request:
• DAC: up to 10,000 SUs (minimal review, fast turnaround)
• PRAC/AAB: <200,000 SUs/year
• NRAC: 200,000+ SUs/year
• Policies/procedures posted at http://www.paci.org/Allocations.html
• Proposal submission through the PACI On-Line Proposal System (POPS): https://pops-submit.paci.org/
NVO Summer School Sept 2004
Requesting a TeraGrid Allocation: http://www.paci.org
NVO Summer School Sept 2004
24/7 Consulting Support • help@teragrid.org • advanced ticketing system for cross-site support • staffed 24/7 • 866-336-2357, 9-5 Pacific Time • http://news.teragrid.org/ • Extensive experience solving problems for early access users • Networking, compute resources, extensible TeraGrid resources NVO Summer School Sept 2004
Links • www.teragrid.org/userinfo • getting an account • help@teragrid.org • news.teragrid.org • site monitors NVO Summer School Sept 2004
DPOSS flattening (diagram): 2650 x 1.1 GB source files are mapped to target files by cropping borders, then fitting a quadratic and subtracting it (virtual data).
NVO Summer School Sept 2004
Driving the Queues
Here is the driver that makes and submits the jobs, one qsub per input file:

import os

def filetime(path):
    # modification time, for comparing source and target
    return os.path.getmtime(path)

inputDirectory = "/pvfs/mydata/source"    # as used in the PBS script
outputDirectory = "/pvfs/mydata/target"

for f in os.listdir(inputDirectory):
    ifile = inputDirectory + "/" + f
    ofile = outputDirectory + "/" + f
    # if the target exists, with the right size and age, then we keep it
    if os.path.exists(ofile):
        osize = os.path.getsize(ofile)
        if osize != 1109404800:           # expected size of a finished target (~1.1 GB)
            print(" -- wrong target size, remaking " + str(osize))
        elif filetime(ofile) < filetime(ifile):
            print(" -- target older than source, remaking")
        else:
            print(" -- already have target file")
            continue
    cmd = "qsub flat.sh -v \"FILE=" + f + "\""
    print(" -- submitting batch job: " + cmd)
    os.system(cmd)
NVO Summer School Sept 2004
PBS script
A PBS script; submit it with qsub script.sh -v "FILE=f345"

#!/bin/sh
#PBS -N dposs
#PBS -V
#PBS -l nodes=1
#PBS -l walltime=1:00:00
cd /home/roy/dposs-flat/flat
./flat \
  -infile /pvfs/mydata/source/${FILE}.fits \
  -outfile /pvfs/mydata/target/${FILE}.fits \
  -chop 0 0 1500 23552 \
  -chop 0 0 23552 1500 \
  -chop 0 22052 23552 23552 \
  -chop 22052 0 23552 23552 \
  -chop 18052 0 23552 4000
NVO Summer School Sept 2004
Atlasmaker – a service-oriented application on TeraGrid (diagram): federated images across wavelength and time are located through the VO Registry and SIAP services, resampled with SWarp onto Hyperatlas pages, and then used for source detection, average/max stacking, and subtraction.
NVO Summer School Sept 2004
Hyperatlas (TAN and SIN projections)
• Standard naming for atlases and pages: e.g. TM-5-SIN-20, page 1589
• Standard scales: scale s means 2^(20-s) arcseconds per pixel
• Standard layouts: e.g. the TM-5 layout, the HV-4 layout
• Standard projections: e.g. TAN, SIN
NVO Summer School Sept 2004
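A quick check of the scale rule against the getChart output on the next slide: for TM-5-SIN-20, s = 20, so the pixel scale is 2^0 = 1 arcsecond, i.e. about 2.7778E-4 degrees, which is exactly the second field returned by the service.

s = 20                                   # the scale in TM-5-SIN-20
arcsec_per_pixel = 2 ** (20 - s)         # = 1 arcsecond
deg_per_pixel = arcsec_per_pixel / 3600.0
print(deg_per_pixel)                     # 0.000277777..., matching 2.77777778E-4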
Hyperatlas is a Service
All pages: <baseURL>/getChart?atlas=TM-5-SIN-20
  0 2.77777778E-4 'RA---SIN' 'DEC--SIN' 0.0 -90.0
  1 2.77777778E-4 'RA---SIN' 'DEC--SIN' 0.0 -85.0
  2 2.77777778E-4 'RA---SIN' 'DEC--SIN' 36.0 -85.0
  ...
  1731 2.77777778E-4 'RA---SIN' 'DEC--SIN' 288.0 85.0
  1732 2.77777778E-4 'RA---SIN' 'DEC--SIN' 324.0 85.0
  1733 2.77777778E-4 'RA---SIN' 'DEC--SIN' 0.0 90.0
Best page: <baseURL>/getChart?atlas=TM-5-SIN-20&RA=182&Dec=62
  1604 2.77777778E-4 'RA---SIN' 'DEC--SIN' 184.61538 60.0
Numbered page: <baseURL>/getChart?atlas=TM-5-SIN-20&page=1604
  1604 2.77777778E-4 'RA---SIN' 'DEC--SIN' 184.61538 60.0
Replicated implementations:
  baseURL = http://mercury.cacr.caltech.edu:8080/hyperatlas (try services)
  baseURL = http://virtualsky.org/servlet
NVO Summer School Sept 2004
GET services from Python
This code uses the service to find the best Hyperatlas page for a given sky location. It is a method of a mosaicking class, so self.hyperatlasServer holds the baseURL, and the module needs import urllib:

  hyperatlasURL = self.hyperatlasServer + "/getChart?atlas=" + atlas \
      + "&RA=" + str(center1) + "&Dec=" + str(center2)
  stream = urllib.urlopen(hyperatlasURL)
  # the result is a tab-separated line, so use split() to tokenize
  tokens = stream.readline().split('\t')
  print "Using page ", tokens[0], " of atlas ", atlas
  self.scale = float(tokens[1])
  self.CTYPE1 = tokens[2]
  self.CTYPE2 = tokens[3]
  rval1 = float(tokens[4])
  rval2 = float(tokens[5])
NVO Summer School Sept 2004