
NCSA TG RP Update 1Q07


Presentation Transcript


  1. NCSA TG RP Update 1Q07
     TG Quarterly Meeting, Breckenridge, CO, Apr 11, 2007

  2. CSE-Online Science Gateway
     • Production date: Mar 9, 2007
     • Developed under ITR program
     • DAC Community Allocation
     • MRAC Community Allocation just awarded
     • Dedicated 4 nodes on Mercury
     • Results from first 30 days (next slide)
       – Gaussian jobs running in restricted shell
     • Changing reservation to 1 node based on results; will continue to monitor usage

  3. CSE Online Utilization
     • Dedicated 4 nodes initially, now 1 node
     • Goal: improved turnaround for a large number of small jobs submitted through the gateway

  4. LEAD Science Gateway
     • Supported Spring Weather Challenge (www.wxchallenge.com), a forecasting contest for undergraduate atmospheric science students
     • Feb 19-26: daily testing, 80 processors, 12pm-5pm
     • Feb 26-April 27: 160 processors, 12pm-5pm, Monday through Thursday
     • Actual contest submissions started the week of March 26

  5. LEAD Gateway Statistics
     • 250 jobs per week, consuming 1800 SUs/week
     • Each workflow is 5 jobs
       – 250 jobs correspond to 50 workflows
     • Expect this to increase once issues are resolved and reliability improves
     • LEAD Gateway is typically the most or 2nd most active gateway in terms of resources used
       – BIRN or GridChem are often ahead
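The per-workflow arithmetic on the LEAD statistics slide can be checked with a short sketch. It uses only the three figures quoted on the slide; the variable names and the derived per-job and per-workflow SU costs are illustrative and were not reported in the presentation.

```python
# Figures quoted on the LEAD Gateway Statistics slide.
JOBS_PER_WEEK = 250
JOBS_PER_WORKFLOW = 5
SUS_PER_WEEK = 1800

# Derived values (not stated on the slide, computed here for illustration).
workflows_per_week = JOBS_PER_WEEK // JOBS_PER_WORKFLOW
sus_per_job = SUS_PER_WEEK / JOBS_PER_WEEK
sus_per_workflow = SUS_PER_WEEK / workflows_per_week

print(f"workflows per week: {workflows_per_week}")
print(f"average SUs per job: {sus_per_job:.1f}")
print(f"average SUs per workflow: {sus_per_workflow:.1f}")
```

This reproduces the 50 workflows per week stated on the slide and suggests an average of roughly 7 SUs per job, assuming usage is spread evenly across jobs.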

  6. Issues Uncovered by Both Science Gateways
     • Remote job submission
       – Great when jobs run
       – Hard to learn about problems, even simple things such as planned downtime
     • Reservation issues
       – Can’t overflow past the end of a reservation when many jobs stack up (LEAD)
     • If a user assigns an obsolete project, no useful error message comes back
     • GridFTP striped server: if one server fails, all fail

  7. SG Next Steps
     • Meetings with teams to understand usage modes and issues
     • CSE Online
       – NCSA contingent visiting CSE Online group at Univ of Utah, Apr 23-25
     • LEAD
       – NCSA and IU RPs setting up a date to visit LEAD group at IU

  8. New Resource - Abe
     • Abe: 1955 blade cluster
       – 2.33 GHz Clovertown quad-core
       – 1,200 blades / 9,600 cores
       – 89.5 TF; 9.6 TB RAM; 120 TB disk
     • Perceus management; diskless boot
     • Cisco InfiniBand, 2:1 oversubscribed
     • Lustre over IB: 8.4 GB/s sustained
     • Power/cooling: 500 kW / 140 tons
     • TG software deployment
       – CTSS
       – Inca
     • Production date: May 2007 (anticipated)
     • User environment
       – Torque/Moab
       – Softenv
       – Intel compiler
       – MPI: evaluating Intel MPI, MPICH, MVAPICH, VMI-2, etc.
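The headline numbers on the Abe slide are mutually consistent, which a rough sketch can show. The slide does not say how the 89.5 TF figure was derived; the sketch assumes 4 double-precision floating-point operations per core per clock cycle, and uses the standard conversion of 1 ton of refrigeration to about 3.517 kW. Both constants and all variable names are assumptions introduced here, not values from the presentation.

```python
# Figures quoted on the Abe slide.
BLADES = 1200
CORES = 9600
CLOCK_HZ = 2.33e9
RAM_TB = 9.6
COOLING_TONS = 140

# Assumptions (not stated on the slide).
FLOPS_PER_CYCLE = 4   # assumed double-precision flops per core per cycle
KW_PER_TON = 3.517    # 1 ton of refrigeration = 12,000 BTU/h, about 3.517 kW

cores_per_blade = CORES // BLADES
peak_tflops = CORES * CLOCK_HZ * FLOPS_PER_CYCLE / 1e12
ram_gb_per_core = RAM_TB * 1000 / CORES
cooling_kw = COOLING_TONS * KW_PER_TON

print(f"cores per blade: {cores_per_blade}")
print(f"theoretical peak: {peak_tflops:.1f} TF")
print(f"RAM per core: {ram_gb_per_core:.1f} GB")
print(f"cooling capacity: {cooling_kw:.0f} kW")
```

Under these assumptions the peak works out to about 89.5 TF, matching the slide, with 8 cores and 8 GB of RAM per blade, and the 140 tons of cooling comes to a little under the quoted 500 kW power figure.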

  9. March Allocations
     • 25.1M SUs (672M NUs) awarded to NCSA systems
       – 34% of allocated resources
     • Several large supplements coming in after the meeting
     • Several 1M+ SU allocations at NCSA
       – Silas Beane: 2.0M on Tungsten
       – Ali Uzun: 2.0M on Abe
       – Adrian Roitberg: 1.5M on Abe
       – Thom Cheatham: 1.0M on Abe
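A few derived quantities follow from the allocation figures on this slide. The sketch below computes the blended NU-per-SU ratio implied by the two totals and the share of the NCSA award taken by the four named 1M+ allocations. The variable names are illustrative, and the ratio is an average across NCSA systems implied by the totals, not an official per-machine conversion factor.

```python
# Totals quoted on the March Allocations slide (in millions).
SUS_AWARDED_M = 25.1
NUS_AWARDED_M = 672.0

# Named 1M+ SU allocations from the slide (in millions of SUs).
large_awards_m = {
    "Silas Beane (Tungsten)": 2.0,
    "Ali Uzun (Abe)": 2.0,
    "Adrian Roitberg (Abe)": 1.5,
    "Thom Cheatham (Abe)": 1.0,
}

# Blended ratio implied by the two totals; individual systems may differ.
nu_per_su = NUS_AWARDED_M / SUS_AWARDED_M

large_total_m = sum(large_awards_m.values())
large_share = large_total_m / SUS_AWARDED_M

print(f"implied NUs per SU: {nu_per_su:.2f}")
print(f"named large awards: {large_total_m:.1f}M SUs")
print(f"share of NCSA award: {large_share:.1%}")
```

The four named allocations total 6.5M SUs, or roughly a quarter of the 25.1M SUs awarded to NCSA systems.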
