The UNC ADCIRC Application on the SURAGrid
Steve Thorpe, MCNC, thorpe@mcnc.org
SURAGrid Application Workshop, February 22, 2006, Washington, D.C.
You can grab a copy of these slides by following links from http://www.mcnc.org/thorpe/presentations
Outline
• What is the UNC ADCIRC Application?
• Why ADCIRC is Important to Grid-Enable
• Current Status of Grid-Enabling
• Next Steps Towards SURAGrid Deployment
• Steps for Day 2
• Future Plans
• Conclusions
Motivation: Weather Strikes Daily!
• Hurricane Season 2005: 26 named storms, 14 hurricanes, 3 with major impact; billions of dollars in economic losses
• We need to:
  • provide early, accurate, and frequent forecasts and dissemination of information
  • provide infrastructure to solve interdisciplinary problems
  • interact in real time, i.e., evaluate and adapt
  • innovate new methodologies for prediction
Source: NOAA
Part of SURA's SCOOP Program
• The Southeastern Universities Research Association's (SURA) Coastal Ocean Observing and Prediction (SCOOP) Program
• SCOOP goals:
  • improve predictions and mitigate the impact of coastal phenomena such as extra-tropical storms and hurricanes
  • implement a comprehensive observing system that will validate accurate and timely short- and long-term predictions
  • provide open web portal access to basic and analyzed data and linked numerical models, available in real time
  • provide a "plug and play" model for the next generation
(Image from www.openioos.org)
SCOOP Participant Institutions
• Gulf of Maine Ocean Observing System (GoMOOS)
• Louisiana State University
• MCNC
• National Oceanic and Atmospheric Administration
• SURA
• Texas A&M University
• University of Alabama, Huntsville (UAH)
• University of Florida
• University of Maryland
• University of Miami
• University of North Carolina, Chapel Hill / RENCI
• Virginia Institute of Marine Science (VIMS)
ADCIRC Circulation Model
• Developed by:
  • Dr. Rick Luettich, UNC Chapel Hill's Institute of Marine Sciences
  • Dr. Joannes Westerink, University of Notre Dame's Department of Civil Engineering and Geological Sciences
• ADCIRC is a finite element method (FEM) shallow water model for computing tidal and storm surge water levels and depth-averaged currents (schematic governing equations below)
• Primary model used by the NC SCOOP team
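As background (schematic only, not taken from the slides): depth-averaged shallow water models of this type solve a continuity and momentum system like the following, and ADCIRC specifically recasts the continuity equation as the Generalized Wave Continuity Equation (GWCE) before the finite element discretization. Here ζ is the free-surface elevation, H = h + ζ the total water depth, (U, V) the depth-averaged velocities, f the Coriolis parameter, and τ_s, τ_b the surface (wind) and bottom stresses.

```latex
% Schematic depth-averaged shallow water equations (x-momentum shown);
% ADCIRC reformulates the continuity equation as the GWCE before the
% finite element discretization.
\begin{align}
  \frac{\partial \zeta}{\partial t}
    + \frac{\partial (UH)}{\partial x}
    + \frac{\partial (VH)}{\partial y} &= 0, \\
  \frac{\partial U}{\partial t}
    + U \frac{\partial U}{\partial x}
    + V \frac{\partial U}{\partial y}
    - fV &= -\,g \frac{\partial \zeta}{\partial x}
    + \frac{\tau_{sx} - \tau_{bx}}{\rho_0 H}.
\end{align}
```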
NC SCOOP Team
• UNC Marine Sciences: Brian Blanton, Rick Luettich, Larry Mason (ITS)
  • ADCIRC development/implementation
• Renaissance Computing Institute (RENCI): Lavanya Ramakrishnan, Brad Viviano, Howard Lander, Dan Reed
  • Joint institute spanning UNC-CH, NCSU, and Duke
  • National grid efforts such as Linked Environments for Atmospheric Discovery (LEAD) and TeraGrid
• MCNC: Michael Garvin, Steve Thorpe, Chuck Kesler
  • Non-profit organization committed to advancing education, innovation, and economic development throughout NC by delivering next-generation IT services
  • Subcontractor to RENCI
SCOOP Tasks*
• Data Standards
• Data Translation, Transport, & Management
• Modeling
• Configuration Tool Set
• Visualization Services
• Verification Services
• Storage and Computing Services
• Security
• Grid Management Middleware
*The NC team has focused especially on the areas shown in blue on the original slide
NC SCOOP Project Goals
• Large scale model availability via a web-based portal
  • users can access the model easily
  • enable model runs with different input datasets (e.g., meteorological data)
  • facilitate model output distribution
• Distributed data management for collecting, archiving, and providing access to model output products
• Model execution in a Grid computing environment
  • automatically find available computational resources
High Level View of Where ADCIRC Fits In
[Diagram: wind inputs (UF analytical/GFDL winds; NCEP NAM/POLAR winds; others) arrive via LDM; the computational grid executes the ADCIRC model; results flow via LDM and OPeNDAP to the SCOOP archive and on to distribution, processing, and visualization.]
NC SCOOP V1 - Hindcast Data Flow
1. Manually specify model run parameters in the MCNC Grid Portal
2. Make a tarball of the needed archived files
3. Third-party transfer between the Portal host and the Compute host
4. Execution of the requested simulation on the Compute host
[Diagram: daily model runs feed an OPeNDAP server, with NCEP input and LDM distribution to UAH; the MCNC Grid Portal (with MyProxy) drives Globus Gatekeepers, GridFTP servers, and an LSF queue across the UNC production system, the NFS-mounted UNC experimental SCOOP machine, and RENCI/UNC mass storage.]
NC SCOOP Portal (1/4)
[Screenshot: portal home page showing GridFTP transfer, data access via OPeNDAP, and models (ADCIRC).]
NC SCOOP Portal (2/4): OPeNDAP Access
• Access to operational (daily) ADCIRC output via OPeNDAP and LDM
  • Global Elevation
  • Global Velocity
NC SCOOP Portal (3/4): Hindcast
[Screenshot: submitting a compute job by setting run dates (example: Hurricane Ivan); the current ADCIRC grid with a 16-CPU decomposition.]
NC SCOOP Portal (4/4): Solution Display
• Hurricane Ivan: 14-day simulation, 16 CPUs, 8 minutes
• Note: storm surge and tides are reflected in the water levels
NCEP and WANAF Wind Ensemble
[Figure: two panels of Hurricane Emily storm tracks, both for forecasts with initial time 2005071800: the UFL-SCOOP analytic forecast storm tracks and the NCEP ensemble perturbation forecast storm tracks.]
We need lots of compute resources, as each wind forecast can drive a different ADCIRC simulation. We refer to a set of such runs as an ensemble.
Two ADCIRC Input Wind Predictions for Tropical Storm Arlene
[Figure: two wind fields, the standard NCEP model and NAH, NCEP's hurricane model.]
The NAH model's improved results can substantially affect the skill of ADCIRC's storm surge forecast.
Automated Execution
• ADCIRC is automatically triggered upon arrival of input data sets
[Diagram: observations and winds flow through translation, data management, archive, and catalog services into the ADCIRC model and application environment; model results feed verification/validation, visualization, OpenIOOS, and the user interface, with resource selection sitting above the resource access layer.]
Since jobs are not manually initiated in this scenario, we need grid technologies to help find compute resources to run the jobs.
Benefits of Grid-Enabling ADCIRC
• Take advantage of "compute on demand" cycles provided by grids
• Prediction results have greater value because we get them sooner
• Allows us to run more models, and at higher resolutions
• Can adjust compute location and model configuration dynamically based on:
  • compute load on available resources
  • number of CPUs available
  • resolution of the model's finite element mesh
  • number of ensemble members
  • etc.
NC SCOOP Team Has Developed a Storm Surge Ensemble Prediction System
• The National Hurricane Center issues hurricane forecast track winds
• The University of Florida computes the forecast wind ensemble
• UNC creates tarballs of winds, mesh, and initial conditions
• ADCIRC computes coastal water levels for each ensemble member across the SCOOP grid (UNC, MCNC, UAH, UFL, LSU, VIMS, …) and SURAGrid, with jobs migrating to available resources
• UNC collects the ensemble solutions, then develops and publishes the water level ensemble forecast
Ensemble System Can Improve Forecast Results
Left: ADCIRC maximum water level for a 72-hour Hurricane Katrina forecast starting 29 Aug 2005, driven by the "usual, always-available" ETA winds.
Right: ADCIRC maximum water level over ALL of the UFL ensemble wind fields for the same 72-hour Hurricane Katrina forecast starting 29 Aug 2005.
Images credit: Brian O. Blanton, Dept. of Marine Sciences, UNC Chapel Hill
Real-Time Resource Selection API
• We developed a simple Java API
• We want to use more than just MCNC and UNC resources for the multiple runs required by the ensemble system
• The API answers the question, "What is the best place for me to run this job, right now?"
• Bases its choice on the current availability of compute and data resources
• Allows arbitrary ranking algorithm(s) through user-supplied Java plug-ins (see the sketch below)
• Uses the Java CoG Kit plus the usual GT 3.2.1 tools:
  • MDS for information sharing
  • GridFTP for file transfer
  • Pre-WS GRAM for job submission
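As a rough illustration only (the interface and class names here are hypothetical, not the actual API), a user-supplied ranking plug-in might look something like this:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical snapshot of one resource's state, e.g. gathered from MDS. */
class ResourceSnapshot {
    final String gatekeeperContact; // e.g. "host.example.org:2119/jobmanager-lsf"
    final int freeCpus;             // e.g. Mds-Computer-Total-Free-nodeCount
    final boolean servicesUp;       // did GRAM/MDS/GridFTP probes succeed?

    ResourceSnapshot(String contact, int freeCpus, boolean servicesUp) {
        this.gatekeeperContact = contact;
        this.freeCpus = freeCpus;
        this.servicesUp = servicesUp;
    }
}

/** Hypothetical plug-in interface: order candidate resources best-first. */
interface ResourceRanker {
    List<ResourceSnapshot> rank(List<ResourceSnapshot> candidates);
}

/** Example policy: only resources whose services are up, most free CPUs first. */
class MostFreeCpusRanker implements ResourceRanker {
    public List<ResourceSnapshot> rank(List<ResourceSnapshot> candidates) {
        List<ResourceSnapshot> live = new ArrayList<ResourceSnapshot>();
        for (ResourceSnapshot r : candidates) {
            if (r.servicesUp) live.add(r);
        }
        Collections.sort(live, new Comparator<ResourceSnapshot>() {
            public int compare(ResourceSnapshot a, ResourceSnapshot b) {
                return b.freeCpus - a.freeCpus; // descending by free CPU count
            }
        });
        return live;
    }
}
```

A job-priority or CPU-speed policy would be another implementation of the same interface, which is the point of the plug-in design: the chooser stays fixed while the ranking policy varies.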
Real-Time Resource Selection (continued)
[Diagram: given a set of remote resources, each with a Gatekeeper, GridFTP server, and MDS, the resource chooser combines a policy with meta-scheduling and monitoring to answer, "What is the best one for me to run on, right now?" If MDS is available, it asks how many resources it can get (queue state) and whether the services are up; otherwise it runs a probe job to find the number of CPUs and a rough time estimate.]
Additional Thoughts on the Resource Chooser
• We considered other meta-scheduler technologies
  • Too complex, proprietary components, non-Java, etc.
  • The basics that come with GT 3.2.1 plus our simple API work for us
• Possible future directions for the API:
  • speed up resource choosing by enabling threading
  • add plug-ins that take into account job priority and additional resource characteristics (e.g., CPU speed, prior reliability, …)
  • apply it to models other than ADCIRC
  • revisit whether to use other meta-scheduler technologies
SCOOP Partner Grid Resources
• We've established grid resources at MCNC and RENCI
  • GT 3.2.1 based
• Connected to non-NC SCOOP partner resources also (UAH, UFL, LSU, VIMS, …)
  • Not always easy!
[Diagram: users reach the RENCI portal at www.scoop.unc.edu; scoop.ncgrid.org and dante1.renci.org each run GridFTP, MDS, and GRAM in front of LSF/PBS clusters at MCNC and UNC, backed by 16 TB of storage and a mass storage system.]
Grid Testbed Experiences
• Sites: UNC, MCNC, UAH, TAMU, UF, LSU
• Components at every site: Globus gatekeeper, GridFTP, PBS/LSF, MDS
• Globus setup issues at compute sites:
  • firewall problems
  • CA trust problems
  • "old" style cert problems
  • MDS not set up
  • job manager (PBS or LSF) not set up
  • tools sometimes not installed (e.g., uudecode)
• This is NOT easy; it takes a LOT of effort!
SURAGrid Partner Grid Resources
• Initial progress has been made at:
  • Louisiana State University (Tevfik Kosar, Hartmut Kaiser, Gabrielle Allen)
  • Texas A&M University (Steve Johnson)
  • University of Alabama Huntsville (Sandi Redman)
  • University of Kentucky (Vikram Gazula)
  • University of Southern California (Nirmal Seenu)
  • TACC (Ashok Adiga)
  • Have I missed any others?
• Status varies by site:
  • the "machine ordering" stage
  • the "GT3 installation" stage
  • the "configure scheduler" stage
  • the "GT3 connectivity / firewall problem resolution" stage
Next Steps Toward SURAGrid Deployment (1 of 2)
• Ensure your site has met the basic requirements (tomorrow, day 2 of the workshop, and beyond if necessary)
• Howard, Steve, and Lavanya will then work to test ADCIRC on your system:
  • first, "standalone" from the command line
  • next, through the Globus gatekeeper and using the resource-choosing API (a submission sketch follows)
• This assumes you've first completed the steps outlined at http://www.ccs.uky.edu/SCOOP (see upcoming slides)
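For orientation, a pre-WS GRAM submission through the Java CoG Kit looks roughly like the sketch below. The contact string, RSL, and script path are placeholders, and the API details are recalled from the GT 3.x-era CoG Kit, so treat this as illustrative rather than authoritative:

```java
import org.globus.gram.GramJob;
import org.globus.gram.GramJobListener;

/** Rough sketch of submitting a job through a Globus gatekeeper (pre-WS GRAM). */
public class SubmitSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder contact string and RSL; real values depend on your site.
        String contact = "gatekeeper.example.edu:2119/jobmanager-lsf";
        String rsl = "&(executable=/path/to/run_adcirc.sh)(count=16)(jobType=mpi)";

        GramJob job = new GramJob(rsl);
        job.addListener(new GramJobListener() {
            public void statusChanged(GramJob j) {
                System.out.println("GRAM status changed: " + j.getStatus());
            }
        });
        job.request(contact); // submit; assumes a valid proxy credential exists
        // A real run would wait for the DONE/FAILED status callback,
        // then pull the output files back with GridFTP.
    }
}
```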
Next Steps Toward SURAGrid Deployment (2 of 2)
• If/when the testing is successful, your resource(s) will be added to those chosen from for the regular ADCIRC runs
• We hope to have further testing done over the next couple of weeks to two months
• If all goes well, there will be many more CPUs available in time for hurricane season (June 1)!
Ensure Your Site Has Met the Basic Requirements for ADCIRC (1 of 2)
• Pre-Web Services versions of the Globus Toolkit services
  • We presume you're using GT 3.2.1, although in theory GT versions 2.4 through 4.x should work
  • See www.globus.org
• You need these three Globus services (a file-transfer sketch follows):
  • GridFTP server for file transfer
  • GRAM server for job submission
  • MDS for information sharing
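As one small example of exercising the first of these from Java, a GridFTP transfer via the CoG Kit's UrlCopy might look like the sketch below. The class and method names are recalled from the GT 3.x-era Java CoG Kit and the URLs are placeholders, so verify against your installed version:

```java
import org.globus.io.urlcopy.UrlCopy;
import org.globus.util.GlobusURL;

/** Rough sketch of a GridFTP transfer using the Java CoG Kit's UrlCopy. */
public class TransferSketch {
    public static void main(String[] args) throws Exception {
        UrlCopy copier = new UrlCopy();
        // Placeholder URLs; a portal-to-compute-host move would use two
        // gsiftp URLs (a third-party transfer), as in the hindcast data flow.
        copier.setSourceUrl(new GlobusURL("gsiftp://portal.example.edu/data/winds.tar"));
        copier.setDestinationUrl(new GlobusURL("gsiftp://compute.example.edu/scratch/winds.tar"));
        copier.copy(); // performs the transfer; assumes a valid proxy credential
    }
}
```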
Ensure Your Site Has Met the Basic Requirements for ADCIRC (2 of 2)
• Also you'll need:
  • CA trust in both directions: RENCI's and MCNC's installations must trust your CA, and your installation must trust RENCI's and MCNC's CAs
  • a back-end queuing system such as LSF or PBS
    • the queuing system should have an mpich-based mpirun behind it
  • an adapter that allows Globus to submit jobs to the back-end queuing system and to publish information about it through MDS
  • a Linux x86-based system, preferably a cluster (a single node would almost certainly never be chosen for an ADCIRC run)
Follow the Steps at http://www.ccs.uky.edu/SCOOP
• This site has some (basic) setup instructions
  • Thanks to the University of Kentucky's Vikram Gazula for this web space!
• Letting your GT installation trust the RENCI and MCNC certificate authorities
• Account setup for Lavanya, Howard, and Steve
  • also their /etc/grid-security/grid-mapfile entries
• Setting up Globus with your back-end scheduler
• How to test MDS using grid-info-search (the same query from Java is sketched below)
  • a grid-info-search should return an "Mds-Computer-Total-Free-nodeCount" entry
• testHosts.sh, a script that can do limited checks of your GRAM, MDS, and GridFTP installations
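Since pre-WS MDS is an LDAP service, the check grid-info-search performs can also be done with plain Java/JNDI. A minimal sketch, assuming the usual GT defaults (port 2135 and base DN "mds-vo-name=local, o=grid"); the hostname is a placeholder:

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.*;

/** Query an MDS GRIS over LDAP for the free node count, like grid-info-search. */
public class MdsProbeSketch {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<String, String>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://gris.example.edu:2135"); // placeholder host

        DirContext ctx = new InitialDirContext(env);
        SearchControls sc = new SearchControls();
        sc.setSearchScope(SearchControls.SUBTREE_SCOPE);
        sc.setReturningAttributes(new String[] { "Mds-Computer-Total-Free-nodeCount" });

        NamingEnumeration<SearchResult> results =
                ctx.search("mds-vo-name=local, o=grid", "(objectclass=*)", sc);
        while (results.hasMore()) {
            SearchResult r = results.next();
            Attribute a = r.getAttributes().get("Mds-Computer-Total-Free-nodeCount");
            if (a != null) System.out.println(r.getName() + ": " + a.get());
        }
        ctx.close();
    }
}
```

If this prints nothing at all, that matches the failure mode on the previous slide: MDS is either down or not publishing scheduler information.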
Setup-Related Questions
• Before contacting us, check whether your answer is buried somewhere in the web page: http://www.ccs.uky.edu/SCOOP
• As a next try, we recommend email to "scoop-support at renci dot org"
  • Howard Lander, Steve Thorpe, and Lavanya Ramakrishnan are on this list; we'll try to check email frequently tomorrow
• If that doesn't produce a satisfactory response, you can try IM'ing us (we could try to set up a group chat). Our AIM addresses are:
  • Steve: thorpe682
  • Howard: howardlander
  • Lavanya: lavanyaRamakrish
Last Resort
• If you're still striking out, you can try calling Howard and/or Steve:
  • Howard's office: (919) 445-9651
  • Steve's home office: (610) 866-3286
    • Note: I usually pick up only after I hear a friendly voice on the machine!
  • Steve's cell: (919) 724-9654
Future NC SCOOP Plans (1 of 2)
• Resource Chooser API extensions:
  • threading to speed up choosing
  • additional plug-ins (CPU speed, reliability, data location, priorities based on urgency, …)
  • applying it to models other than ADCIRC
• Improve the fault tolerance of the ADCIRC workflow (sketched below):
  • e.g., if a job is high priority, submit the same job multiple times
  • or if a job fails, restart it ASAP
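One hypothetical shape for that restart-on-failure idea (every name here is illustrative, not the actual workflow code):

```java
import java.util.List;

/**
 * Hypothetical sketch of "if job fails, restart ASAP": resubmit a failed
 * ensemble member to the next-best resource, up to a retry limit.
 */
class RetryingSubmitter {
    private final int maxAttempts;

    RetryingSubmitter(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    boolean submitWithRetry(String jobRsl, List<String> rankedContacts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Walk the ranked contact list best-first; a real system would
            // re-rank between attempts, since availability changes over time.
            String contact = rankedContacts.get(attempt % rankedContacts.size());
            if (submit(jobRsl, contact)) {
                return true;
            }
            System.err.println("Attempt " + (attempt + 1) + " on " + contact
                    + " failed; resubmitting");
        }
        return false;
    }

    private boolean submit(String rsl, String contact) {
        // Placeholder for a pre-WS GRAM submission plus wait-for-completion
        // (see the GramJob sketch earlier in the deck).
        return false;
    }
}
```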
Future NC SCOOP Plans (2 of 2)
• Better integration with other SCOOP efforts:
  • UFL's Virtual Cluster setup
  • data management
  • catalog and archive access
• "Coupling" different models (e.g., ADCIRC and SWAN)
• Model verification activities
Concluding Thoughts
• It can be time consuming to establish grid connectivity among organizations…
  • configuration of firewalls, NAT, the Globus Toolkit, certificate authorities, and job scheduler adapters
  • systems administrator nervousness
  • etc.
• …but it's worth it!
  • SCOOP partners are creating a distributed, scalable, modular resource that empowers scientists at multiple institutions
  • This will advance the science of predicting surge and wave impacts during storms, and other environmental hazards
Special Thanks
… to Brian Blanton and Lavanya Ramakrishnan for their help with these slides!
… to Philip Bogden, Joanne Bintz, Mary Fran Yafchak, and others from SURA for their SCOOP project leadership
… to Art Vandenberg for his tireless gridification efforts
… to you, the SURAGridsters who are generously sharing your resources!!
Questions? Also, don’t forget about scoop-support at renci dot org