
Astronomy Applications in the TeraGrid Environment



Presentation Transcript


  1. Astronomy Applications in the TeraGrid Environment Roy Williams, Caltech, with thanks for material to: Sandra Bittner, ANL; Sharon Brunett, Caltech; Derek Simmel, PSC; John Towns, NCSA; Nancy Wilkins-Diehr, SDSC. NVO Summer School, September 2004

  2. The TeraGrid Vision: distributing the resources is better than putting them at one site
  • Build new, extensible, grid-based infrastructure to support grid-enabled scientific applications
  • New hardware, new networks, new software, new practices, new policies
  • Expand centers to support cyberinfrastructure
  • Distributed, coordinated operations center
  • Exploit unique partner expertise and resources to make the whole greater than the sum of its parts
  • Leverage homogeneity to make distributed computing easier and to simplify initial development and standardization
  • Run a single job across the entire TeraGrid
  • Move executables between sites

  3. What is the Grid, Really?
  The Grid is:
  • A set of powerful Beowulf clusters
  • Lots of disk storage
  • Fast interconnection
  • Unified account management
  • Interesting software
  The Grid is not:
  • Magic
  • Infinite
  • Simple
  • A universal panacea
  • The hype that you have read

  4. Grid as Federation
  • TeraGrid as a federation
  • Independent centers → flexibility
  • Unified interface → power and strength
  • Large/small state compromise

  5. TeraGrid Wide Area Network

  6. Grid Astronomy

  7. Quasar Science: an NVO-TeraGrid project (Penn State, CMU, Caltech)
  • 60,000 quasar spectra from the Sloan Sky Survey
  • Each is 1 CPU-hour: submit to the grid queue
  • Fits a complex model (173 parameters) to derive black hole mass from line widths
  [Diagram: NVO data services and a globusrun manager feeding jobs to the clusters]

  8. N-point Galaxy Correlation: an NVO-TeraGrid project (Pitt, CMU)
  • Finding the triple correlation in the 3D SDSS galaxy catalog (RA/Dec/z)
  • Lots of large parallel jobs
  • kd-tree algorithms

  9. Palomar-Quest Survey (Caltech, NCSA, Yale)
  • Transient pipeline: computing reservation at sunrise for immediate follow-up of transients
  • Synoptic survey: massive resampling (Atlasmaker) for ultrafaint detection
  • NCSA, Caltech, and Yale run different pipelines on the same data
  [Diagram: the P48 telescope produces 50 Gbyte/night (5 Tbyte archive); data flow to Caltech and Yale, and over the TeraGrid to NCSA; ALERT messages go out for transients]

  10. Transient from PQ, from the catalog pipeline

  11. PQ stacked images, from the image pipeline

  12. Wide-area Mosaicking (Hyperatlas): an NVO-TeraGrid project (Caltech)
  • DPOSS 15° mosaic
  • High quality: flux-preserving, spatially accurate
  • Stackable Hyperatlas pages: edge-free, pyramid weight
  • Mining AND outreach: the Griffith Observatory "Big Picture"

  13. 2MASS Mosaicking Portal: an NVO-TeraGrid project (Caltech IPAC)

  14. TeraGrid hardware

  15. TeraGrid Components
  • Compute hardware: Intel/Linux clusters, Alpha SMP clusters, POWER4 cluster, …
  • Large-scale storage systems: hundreds of terabytes of secondary storage
  • Very high-speed network backbone: bandwidth for rich interaction and tight coupling
  • Grid middleware: Globus, data management, …
  • Next-generation applications

  16. Overview of Distributed TeraGrid Resources
  [Diagram: each site's resources (HPSS or UniTree archives) and external networks; NCSA/PACI 10.3 TF, 240 TB; SDSC 4.1 TF, 225 TB; plus Caltech and Argonne]

  17. Compute Resources – NCSA: 2.6 TF → ~10.6 TF, with 230 TB
  [Cluster diagram: 30 Gbps to the TeraGrid network; GbE fabric; 8 TF Madison (667 nodes) plus 2.6 TF Madison (256 nodes); typical node: 2p Madison, 4 GB memory, 2x73 GB disk (older nodes: 2p 1.3 GHz, 4 or 12 GB memory, 73 GB scratch); storage I/O over Myrinet and/or GbE at 250 MB/s per node; Myrinet fabric and Brocade 12000 FC switches to 230 TB of disk; interactive and spare nodes; 8 4p Madison nodes for login and FTP]

  18. Compute Resources – SDSC: 1.3 TF → ~4.3 + 1.1 TF, with 500 TB
  [Cluster diagram: 30 Gbps to the TeraGrid network; GbE fabric; 3 TF Madison (256 nodes) plus 1.3 TF Madison (128 nodes); typical node: 2p Madison, 4 GB memory, 2x73 GB disk (older nodes: 2p 1.3 GHz, 4 GB memory, 73 GB scratch); 250 MB/s per node to the Myrinet fabric; Brocade 12000 FC switches to 500 TB of disk; interactive and spare nodes; 6 4p Madison nodes for login and FTP]

  19. Compute Resources – Caltech: ~100 GF, with 100 TB
  [Cluster diagram: 30 Gbps to the TeraGrid network; GbE fabric; 72 GF Madison (36 IBM/Intel nodes) plus 34 GF Madison (17 HP/Intel nodes), each 2p Madison with 6 GB memory and 73 GB scratch; 33 IA32 storage nodes (2p, 6 GB memory) serving 100 TB of /pvfs; 6 Opteron Datawulf nodes (4p, 8 GB memory, 66 TB RAID5) in front of HPSS with 13 tape drives and a 1.2 PB silo raw capacity; Myrinet fabric at 250 MB/s per node; a 2p IBM Madison interactive node for login and FTP]

  20. Using the TeraGrid

  21. Wide Variety of Usage Scenarios
  • Tightly coupled jobs storing vast amounts of data, performing visualization remotely, and making data available through online collections (ENZO)
  • Thousands of independent jobs using data from a distributed data collection (NVO)
  • Science Gateways: "not a Unix prompt"!
  • from a web browser, with security
  • from an application, e.g. IRAF or IDL

  22. Traditional Parallel Processing
  • A single executable to be run on a single remote machine
  • Big assumptions: runtime necessities (e.g. executables, input files, shared objects) are available on the remote system!
  • Log in to a head node, choose a submission mechanism:
  • Direct, interactive execution: mpirun -np 16 ./a.out
  • Through a batch job manager: qsub my_script, where my_script describes executable location, runtime duration, redirection of stdout/err, the mpirun specification… (see the sketch below)
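  A minimal sketch of what such a batch script might look like, assuming a PBS-style scheduler and an MPI executable ./a.out; the job name, node counts, and walltime are illustrative:

    #!/bin/sh
    #PBS -N mpi_example           # job name
    #PBS -l nodes=8:ppn=2         # 8 nodes, 2 processors per node = 16 processes
    #PBS -l walltime=2:00:00      # maximum run time
    #PBS -o mpi_example.out       # redirect stdout
    #PBS -e mpi_example.err       # redirect stderr

    cd $PBS_O_WORKDIR             # start in the directory the job was submitted from
    mpirun -np 16 ./a.out         # launch the MPI executable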

  23. Traditional Parallel Processing II
  • Through Globus: globusrun -r [some-teragrid-head-node].teragrid.org/jobmanager -f my_rsl_script, where my_rsl_script describes the same details as in the qsub my_script! (see the sketch below)
  • Through Condor-G: condor_submit my_condor_script, where my_condor_script describes the same details as the Globus my_rsl_script!
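  A minimal sketch of what my_rsl_script might contain, written in the Globus RSL syntax; the executable, paths, process count, and wall time are illustrative:

    & (executable  = /home/roy/a.out)
      (directory   = /home/roy)
      (count       = 16)
      (jobType     = mpi)
      (maxWallTime = 60)
      (stdout      = /home/roy/a.out.stdout)
      (stderr      = /home/roy/a.out.stderr)

  For example, this could be submitted with: globusrun -r tg-login.sdsc.teragrid.org/jobmanager-pbs -f my_rsl_script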

  24. Distributed Parallel Processing
  • Decompose the application over geographically distributed resources
  • Functional or domain decomposition fits well
  • Take advantage of load-balancing opportunities
  • Think about the impact of latency
  • Improved utilization of many resources
  • Flexible job management

  25. Pipelined/Dataflow Processing
  • Suited for problems that can be divided into a series of sequential tasks, where:
  • multiple instances of the problem need executing
  • a series of data needs processing, with multiple operations on each item
  • information from one processing phase can be passed to the next phase before the current phase is complete

  26. Security: ssh with password
  • Too much password-typing
  • Not very secure: there was a big break-in at TeraGrid in April 2004
  • One failure is a big failure: all of TG!
  • Caltech and Argonne no longer allow this
  • SDSC does not allow password changes

  27. Security: ssh with public key (single sign-on!)
  • Use ssh-keygen on Unix or puttygen on Windows
  • You get a public key file (e.g. id_rsa.pub) AND a private key file (e.g. id_rsa) AND a passphrase
  • On the remote machine, put the public key in .ssh/authorized_keys (example below)
  • On the local machine, combine the private key and passphrase
  • ATM card model
  • On TeraGrid, you can put the public key on the application form: immediate login, no snailmail
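  A sketch of the setup from a Unix workstation, assuming the OpenSSH tools; the account name and login host are illustrative:

    # generate a keypair protected by a passphrase
    # (creates ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub)
    ssh-keygen -t rsa

    # append the public key to authorized_keys on the remote TeraGrid login node
    cat ~/.ssh/id_rsa.pub | ssh roy@tg-login.sdsc.teragrid.org \
        "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"

    # from now on, logins ask for the local passphrase, not the remote password
    ssh roy@tg-login.sdsc.teragrid.org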

  28. Security: X.509 certificates (single sign-on!)
  • Issued by a Certificate Authority (e.g. Verisign, the US Navy, DOE, etc.)
  • It is: a Distinguished Name (DN), e.g. /C=US/O=National Center for Supercomputing Applications/CN=Roy Williams, AND a private file (usercert.p12) AND a passphrase
  • The remote machine needs an entry in the gridmap file (which maps DN to account): use the gx-map command
  • Can create a certificate with ncsa-cert-request etc. (example of day-to-day use below)
  • Certificates can be lodged in a web browser
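  A sketch of day-to-day use with the standard Globus command-line tools (the DN shown is the example from this slide):

    # show the Distinguished Name in your certificate
    grid-cert-info -subject
    # -> /C=US/O=National Center for Supercomputing Applications/CN=Roy Williams

    # create a short-lived proxy credential; you type the passphrase once, and
    # globusrun, globus-job-submit, GridFTP, etc. then use the proxy (single sign-on)
    grid-proxy-init

    # check how long the current proxy remains valid
    grid-proxy-info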

  29. Three Ways to Submit a Job
  1. Directly to the PBS batch scheduler: simple, and scripts are portable among the PBS TeraGrid clusters
  2. Globus common batch script syntax: scripts are portable to other grids using Globus
  3. Condor-G: a nice interface atop Globus, monitoring of all jobs submitted via Condor-G, and higher-level tools like DAGMan

  30. PBS Batch Submission
  • ssh tg-login.[caltech|ncsa|sdsc|uc].teragrid.org
  • qsub flatten.sh -v "FILE=f544"
  • qstat or showq to watch the job
  • ls *.dat for results; stdout/stderr appear in the pbs.out and pbs.err files

  31. globus-job-submit
  For running batch/offline jobs (example session below):
  • globus-job-submit: submit a job (same interface as globus-job-run; returns immediately)
  • globus-job-status: check job status
  • globus-job-cancel: cancel a job
  • globus-job-get-output: get the job's stdout/stderr
  • globus-job-clean: clean up after the job
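  A sketch of a typical session; the job contact URL that globus-job-submit prints is illustrative:

    # submit a simple job to the PBS jobmanager; a job contact URL is returned immediately
    globus-job-submit tg-login.caltech.teragrid.org/jobmanager-pbs /bin/hostname
    # -> https://tg-login.caltech.teragrid.org:64001/12345/1094152000/

    # poll until the state is DONE
    globus-job-status https://tg-login.caltech.teragrid.org:64001/12345/1094152000/

    # retrieve stdout/stderr, then remove the cached job files
    globus-job-get-output https://tg-login.caltech.teragrid.org:64001/12345/1094152000/
    globus-job-clean https://tg-login.caltech.teragrid.org:64001/12345/1094152000/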

  32. Condor-G Job Submission
  [Diagram: Condor-G on a local machine (mickey.disney.edu) speaks through the Globus API to the Globus jobmanager on tg-login.sdsc.teragrid.org, which hands the job to PBS]
  The Condor-G submit description:
    executable = /wd/doit
    universe = globus
    globusscheduler = <…>
    globusrsl = (maxtime=10)
    queue

  33. Condor-G
  • Combines the strengths of Condor and the Globus Toolkit
  • Advantages when managing grid jobs: a full-featured queuing service, credential management, fault tolerance, and DAGMan (== pipelines)

  34. Condor DAGMan
  • Manages workflow interdependencies
  • Each task is a Condor description file
  • A DAG file controls the order in which the Condor files are run (sketch below)
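  A minimal sketch of a two-step pipeline; the submit-file names (flatten.sub, mosaic.sub) and the DAG file name are hypothetical:

    # pipeline.dag: run the flattening job first, then the mosaicking job
    JOB  Flatten  flatten.sub
    JOB  Mosaic   mosaic.sub
    PARENT Flatten CHILD Mosaic

  The whole workflow is then submitted with condor_submit_dag pipeline.dag, and DAGMan releases each Condor job only when its parents have finished.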

  35. Where's the Disk?
  • Home directory: $TG_CLUSTER_HOME, for example /home/roy
  • Shared writeable global areas: $TG_CLUSTER_PFS, for example /pvfs/MCA04N009/roy (usage sketch below)
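  A sketch of how these variables might be used inside a batch job, staging data into the shared parallel filesystem and copying results home; the directory names and the flat invocation are illustrative:

    # work in the parallel filesystem, visible to all compute nodes
    WORKDIR=$TG_CLUSTER_PFS/run001
    mkdir -p $WORKDIR
    cp $TG_CLUSTER_HOME/data/input.fits $WORKDIR/

    cd $WORKDIR
    ./flat -infile input.fits -outfile output.fits

    # copy the (smaller) result back to the home directory
    cp output.fits $TG_CLUSTER_HOME/results/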

  36. GridFTP
  • Moving a test file (here from a GridFTP server on localhost port 5678 to local disk; the server's expected certificate subject is passed with -s):
    % globus-url-copy -s "`grid-cert-info -subject`" \
          gsiftp://localhost:5678/tmp/file1 \
          file:///tmp/file2
  • Also uberftp and scp (remote-copy sketch below)
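  On the TeraGrid the more common pattern is copying to or from a remote GridFTP server under a grid-proxy-init credential; a sketch with illustrative host names and paths:

    # pull a file from the NCSA login node to local disk
    globus-url-copy \
        gsiftp://tg-login.ncsa.teragrid.org/home/roy/f544.fits \
        file:///tmp/f544.fits

    # third-party transfer: stream directly between two GridFTP servers
    globus-url-copy \
        gsiftp://tg-login.ncsa.teragrid.org/home/roy/f544.fits \
        gsiftp://tg-login.sdsc.teragrid.org/users/roy/f544.fits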

  37. Storage Resource Broker (SRB)
  • A single logical namespace while accessing distributed archival storage resources
  • Effectively infinite storage (first to 1 TB wins a t-shirt)
  • Data replication
  • Parallel transfers
  • Interfaces: command line, API, web/portal (command-line sketch below)
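  A sketch of the SRB command-line interface (the "Scommands"), assuming an SRB account and a configured ~/.srb environment; the collection name is illustrative:

    Sinit                            # start an SRB session
    Scd /home/roy.caltech            # change to a collection (SRB's directory)
    Sput f544.fits f544.fits         # store a local file into the collection
    Sls                              # list the collection
    Sget f544.fits /tmp/f544.fits    # retrieve a copy to local disk
    Sexit                            # end the session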

  38. Storage Resource Broker (SRB): Virtual Resources, Replication
  [Diagram: an SRB client (command line or API) on a workstation sees one logical namespace backed by physical resources at several sites: hpss-sdsc, sfs-tape-sdsc, hpss-caltech, and resources at NCSA and SDSC]

  39. Allocations Policies
  • TG resources are allocated via the PACI allocations and review process, modeled after the NSF process
  • TG is considered a single resource for grid allocations
  • Different levels of review for different sizes of allocation request:
  • DAC: up to 10,000 SUs/year (minimal review, fast turnaround)
  • PRAC/AAB: up to 200,000 SUs/year
  • NRAC: 200,000+ SUs/year
  • Policies/procedures are posted at http://www.paci.org/Allocations.html
  • Proposal submission is through the PACI On-Line Proposal System (POPS): https://pops-submit.paci.org/

  40. Requesting a TeraGrid Allocation: http://www.paci.org

  41. 24/7 Consulting Support
  • help@teragrid.org: advanced ticketing system for cross-site support, staffed 24/7
  • 866-336-2357, 9-5 Pacific Time
  • http://news.teragrid.org/
  • Extensive experience solving problems for early-access users: networking, compute resources, extensible TeraGrid resources

  42. Links
  • www.teragrid.org/userinfo: getting an account
  • help@teragrid.org
  • news.teragrid.org: site monitors

  43. Demo: Data-intensive computing with NVO services

  44. DPOSS Flattening
  • 2650 files of 1.1 Gbyte each, source → target (virtual data)
  • Cropping borders
  • Quadratic fit and subtract

  45. Driving the Queues
  Here is the driver that makes and submits jobs (Python 2; filetime() is a helper, defined elsewhere, that returns a file's modification time, and inputDirectory and outputDirectory are set earlier in the script):

    import os

    for f in os.listdir(inputDirectory):
        file = inputDirectory + "/" + f     # source image
        ofile = outputDirectory + "/" + f   # target image
        # if the target exists, with the right size and age, then we keep it
        if os.path.exists(ofile):
            osize = os.path.getsize(ofile)
            if osize != 1109404800:
                print " -- wrong target size, remaking", osize
            else:
                time_tgt = filetime(ofile)
                time_src = filetime(file)
                if time_tgt < time_src:
                    print " -- target too old, remaking"
                else:
                    print " -- already have target file"
                    continue
        # otherwise submit a batch job to (re)make this file
        cmd = "qsub flat.sh -v \"FILE=" + f + "\""
        print " -- submitting batch job: ", cmd
        os.system(cmd)

  46. PBS Script
  A PBS script; submit it with qsub script.sh -v "FILE=f345":

    #!/bin/sh
    #PBS -N dposs
    #PBS -V
    #PBS -l nodes=1
    #PBS -l walltime=1:00:00

    cd /home/roy/dposs-flat/flat
    ./flat \
        -infile /pvfs/mydata/source/${FILE}.fits \
        -outfile /pvfs/mydata/target/${FILE}.fits \
        -chop 0 0 1500 23552 \
        -chop 0 0 23552 1500 \
        -chop 0 22052 23552 23552 \
        -chop 22052 0 23552 23552 \
        -chop 18052 0 23552 4000

  47. Atlasmaker: a service-oriented application on the TeraGrid
  [Diagram: federated images (different wavelengths, epochs, ...) are located through the VO Registry and SIAP services, resampled with SWarp onto Hyperatlas pages, then combined by average/max, subtraction, and source detection]

  48. Hyperatlas
  • Standard naming for atlases and pages, e.g. atlas TM-5-SIN-20, page 1589
  • Standard scales: scale s means 2^(20-s) arcseconds per pixel; for example, scale 20 is 1 arcsecond per pixel
  • Standard layouts, e.g. the TM-5 and HV-4 layouts
  • Standard projections, e.g. TAN and SIN

  49. Hyperatlas is a Service
  All pages: <baseURL>/getChart?atlas=TM-5-SIN-20
    0     2.77777778E-4  'RA---SIN'  'DEC--SIN'    0.0  -90.0
    1     2.77777778E-4  'RA---SIN'  'DEC--SIN'    0.0  -85.0
    2     2.77777778E-4  'RA---SIN'  'DEC--SIN'   36.0  -85.0
    ...
    1731  2.77777778E-4  'RA---SIN'  'DEC--SIN'  288.0   85.0
    1732  2.77777778E-4  'RA---SIN'  'DEC--SIN'  324.0   85.0
    1733  2.77777778E-4  'RA---SIN'  'DEC--SIN'    0.0   90.0
  Best page: <baseURL>/getChart?atlas=TM-5-SIN-20&RA=182&Dec=62
    1604  2.77777778E-4  'RA---SIN'  'DEC--SIN'  184.61538  60.0
  Numbered page: <baseURL>/getChart?atlas=TM-5-SIN-20&page=1604
    1604  2.77777778E-4  'RA---SIN'  'DEC--SIN'  184.61538  60.0
  Replicated implementations:
  • baseURL = http://mercury.cacr.caltech.edu:8080/hyperatlas (try services)
  • baseURL = http://virtualsky.org/servlet

  50. GET Services from Python
  This code uses a service to find the best Hyperatlas page for a given sky location (Python 2; self.hyperatlasServer, atlas, center1, and center2 are set elsewhere in the class):

    import urllib

    # build the getChart query for this atlas and sky position
    hyperatlasURL = self.hyperatlasServer + "/getChart?atlas=" + atlas \
        + "&RA=" + str(center1) + "&Dec=" + str(center2)
    stream = urllib.urlopen(hyperatlasURL)

    # the result is a tab-separated line, so use split() to tokenize
    tokens = stream.readline().split('\t')
    print "Using page ", tokens[0], " of atlas ", atlas
    self.scale = float(tokens[1])
    self.CTYPE1 = tokens[2]
    self.CTYPE2 = tokens[3]
    rval1 = float(tokens[4])
    rval2 = float(tokens[5])
