Pittsburgh Supercomputing Center RP Update, July 16, 2009. Bob Stock, Associate Director, stock@psc.edu
Center for Analysis & Prediction of Storms (CAPS)
• Oklahoma/NOAA Spring Severe Weather Forecast Experiment for 2009
• CAPS used NICS (1 km) and PSC (4 km)
• At PSC:
  • Ran from 4/20 to 6/5
  • Sunday-Thursday: reservations of 2000 cores for 10-12 hours starting at 10:30 a.m. (Eastern)
  • Lots of data generated, e.g., 66 terabytes ingested into the archive during May
2009 CAPS Spring Experiment on PSC BigBen
• Data access and screening
• Create input files
• Create job scripts
• Remap radar data [800 processors, 20 processors per radar]
• Process initial and boundary conditions
• Run weather analysis [80 processors]
• Create ensemble perturbations
• Run WRF & ARPS forecast models [18 x 80 processors]
• Extraction and reformatting of 2-D output
• Archive of 3-D results, over 50 TB of data
• Generate derived products
• Data display and interrogation
• Analysis and verification
• Publication
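
The processor counts quoted in the workflow can be checked against the 2000-core reservation with a little arithmetic. The sketch below assumes that "18 x 80" means 18 forecast runs of 80 processors each running at the same time, and that the radar remapping handles all radars concurrently; the slides do not state either point, and the variable names are illustrative only.

```python
# Figures taken from the slides.
RESERVED_CORES = 2000        # daily reservation at PSC
RADAR_REMAP_CORES = 800      # total processors for radar remapping
CORES_PER_RADAR = 20         # processors per radar
ENSEMBLE_RUNS = 18           # "18 x 80 processors"
CORES_PER_RUN = 80

# Assumption: all radars are remapped at once.
radars_in_parallel = RADAR_REMAP_CORES // CORES_PER_RADAR

# Assumption: all forecast runs execute concurrently.
ensemble_cores = ENSEMBLE_RUNS * CORES_PER_RUN

# Cores left idle in the reservation during the forecast step.
headroom = RESERVED_CORES - ensemble_cores

print(radars_in_parallel, ensemble_cores, headroom)  # 40 1440 560
```

Under these assumptions the forecast step is the largest consumer, and it fits inside the reservation with room to spare.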
Sample 4-km Ensemble Forecast Products
[Figure: 18 h forecasts valid 1800 UTC, May 8, 2009; all ensemble forecast members; Midwest zoom. Panels: actual observed radar reflectivity; predicted spaghetti diagram of 35 dBZ reflectivity; predicted probability of reflectivity > 35 dBZ; predicted probability-matched reflectivity.]
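
A "probability of reflectivity > 35 dBZ" product is commonly built as the fraction of ensemble members that exceed the threshold at each grid point. The slides do not say how CAPS computes it, so the following is only a minimal sketch of that common unweighted definition; the function name and the tiny sample grid are hypothetical.

```python
def exceedance_probability(members, threshold_dbz=35.0):
    """Fraction of ensemble members exceeding the threshold at each grid point.

    `members` is a list of 2-D grids (lists of rows) of reflectivity in dBZ,
    all with the same shape. Every member gets equal weight (an assumption).
    """
    n = len(members)
    rows = len(members[0])
    cols = len(members[0][0])
    return [
        [sum(1 for m in members if m[i][j] > threshold_dbz) / n
         for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical 4-member ensemble on a 1 x 2 grid.
ensemble = [
    [[40.0, 10.0]],
    [[30.0, 10.0]],
    [[36.0, 10.0]],
    [[20.0, 50.0]],
]
prob = exceedance_probability(ensemble)
print(prob)  # [[0.5, 0.25]]
```

The spaghetti diagram in the same figure is the complementary view: instead of counting members, it draws each member's 35 dBZ contour on one map.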
Enhancing Operations on Pople
• Automatic Performance Measurement: utilize the Performance Monitor Unit (PMU)
• Backfilling using Predictive Walltimes
Automatic Performance Measurement
• Goal: collect Intel Itanium 2 PMU stats for each job in order to
  • Identify underperforming codes (MFLOPS)
  • Provide users with PMU stats for their runs
• Based on the open source package Perfmon2: http://perfmon2.sourceforge.net/
• Collection is started for each job using pfmon
• Counters collected: CPU_OP_CYCLES_ALL, FP_OPS_RETIRED, L3_REFERENCES, L3_MISSES
• Counter detail is collected for each process and thread
• A report is issued from the digested stats
• Currently testing and evaluating the load on the system
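
To illustrate how the four counters could be digested into a per-job report, here is a minimal sketch. It is not PSC's reporting code: the `PmuCounters` class, the `digest` function, the clock rate, and the sample values are all hypothetical, and elapsed time is approximated as cycles divided by a nominal clock rate.

```python
from dataclasses import dataclass


@dataclass
class PmuCounters:
    """Counter totals for one process or thread (names follow the slide)."""
    cpu_op_cycles_all: int
    fp_ops_retired: int
    l3_references: int
    l3_misses: int


def digest(c: PmuCounters, clock_hz: float) -> dict:
    """Derive MFLOPS and the L3 miss ratio from raw counter totals.

    `clock_hz` is an assumed nominal clock rate supplied by the caller.
    """
    seconds = c.cpu_op_cycles_all / clock_hz
    mflops = c.fp_ops_retired / seconds / 1e6 if seconds > 0 else 0.0
    miss_ratio = c.l3_misses / c.l3_references if c.l3_references else 0.0
    return {"seconds": seconds, "mflops": mflops, "l3_miss_ratio": miss_ratio}


# Hypothetical counter values and clock rate, chosen for easy arithmetic.
sample = PmuCounters(
    cpu_op_cycles_all=3_000_000_000,
    fp_ops_retired=1_000_000_000,
    l3_references=2_000_000,
    l3_misses=500_000,
)
report = digest(sample, clock_hz=1.5e9)
print(report)
```

A job whose MFLOPS figure sits far below what the hardware can sustain, or whose L3 miss ratio is high, would be a candidate for the "underperforming codes" follow-up named in the goal.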
Backfilling using Predictive Walltimes
• Goal: maximize backfilling while the machine drains for larger jobs
• Problem: backfilling for large jobs idles the machine because users overestimate job run times
• Solution: store estimated and actual run times for each job and statistically predict job run times
• The statistically calculated run time is used to optimize backfilling opportunities
• A database stores the actual and estimated walltimes for each job
• A lightweight database engine, SQLite, is used to store the data
• 70,000 jobs in the database
• Database uses only 87 Kbytes!
• Scheduler uses data from the database to select jobs for backfill
• Still studying impact and benefits; shows promise
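
The approach above can be sketched with Python's standard `sqlite3` module. The slides do not describe the schema or the statistic PSC uses, so everything here is an assumption: the `jobs` table layout, the per-user median of actual-to-requested ratios as the predictor, and the function names are all hypothetical.

```python
import sqlite3
import statistics


def open_db(path=":memory:"):
    """Open the walltime database (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "user TEXT NOT NULL, requested_s INTEGER NOT NULL, "
        "actual_s INTEGER NOT NULL)"
    )
    return conn


def record_job(conn, user, requested_s, actual_s):
    """Store the requested and actual walltime of a finished job."""
    conn.execute("INSERT INTO jobs VALUES (?, ?, ?)",
                 (user, requested_s, actual_s))
    conn.commit()


def predict_walltime(conn, user, requested_s, min_history=3):
    """Scale the request by the user's median actual/requested ratio.

    Falls back to the request itself when there is too little history,
    and never predicts more than the user asked for.
    """
    rows = conn.execute(
        "SELECT requested_s, actual_s FROM jobs "
        "WHERE user = ? AND requested_s > 0", (user,)
    ).fetchall()
    if len(rows) < min_history:
        return requested_s
    ratios = [actual / requested for requested, actual in rows]
    predicted = round(statistics.median(ratios) * requested_s)
    return min(requested_s, max(1, predicted))


def fits_backfill(predicted_s, window_s):
    """True if the job is predicted to finish inside the backfill window."""
    return predicted_s <= window_s


conn = open_db()
for requested, actual in [(4000, 1000), (4000, 2000), (4000, 3000)]:
    record_job(conn, "alice", requested, actual)

alice_prediction = predict_walltime(conn, "alice", 7200)
bob_prediction = predict_walltime(conn, "bob", 7200)
print(alice_prediction, bob_prediction)  # 3600 7200
```

In this sketch the prediction only decides whether a job is eligible for a backfill window. A real scheduler would also have to decide what happens when a job outlives its prediction, which the slides do not address.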
PSC at TG09: Organization
• Shawn Brown: Science Track Co-Chair
• Pallavi Ishwad: EOT Track Chair
• Laura McGinnis: Student Program Chair
• Shandra Williams: Communications Committee member in charge of signage
• Mike Schneider: wrote news items about the conference
PSC at TG09: Participation
• Phil Blood and Robin Flaus: presented a paper on the Computation Exploration (Comp Ex) program in the EOT Track
• Greg Foss: presented visualizations in the Visualization Showcase
• Ed Hanna and Rob Light, with Dave Hart (SDSC): presented a paper on RDR in the Technology Track
• Anirban Jana and Sergiu Sanielevici, with several people from other institutions: presented the tutorial "Preparing Your Application for TeraGrid Beyond 2010"
• Nick Nystrom, with several people from other institutions: presented the tutorial "Using Tools to Understand Performance Issues on TeraGrid Machines: IPM and the POINT Project"
• Josephine Palencia: presented the poster "JWAN: PSC's Secure, Federated, Distributed Lustre Filesystem on the WAN (TeraGrid)"