NCSA TG RP Update 1Q07
TG Quarterly Meeting
Breckenridge, CO
Apr 11, 2007

CSE-Online Science Gateway
• Production Date: Mar 9, 2007
• Developed under ITR program
• DAC Community Allocation
• MRAC Community Allocation just awarded
• Dedicated 4 nodes on Mercury
• Results from first 30 days (next slide)
  – Gaussian jobs running in restricted shell
• Changing reservation to 1 node based on results; will continue to monitor usage

CSE Online Utilization
• Dedicated 4 nodes initially, now one node
• Goal: improved turnaround for a large number of small jobs submitted through the gateway

LEAD Science Gateway
• Supported Spring Weather Challenge (www.wxchallenge.com) forecasting contest for undergraduate atmospheric science students
• Feb 19-26: daily testing, 80 processors, 12pm-5pm
• Feb 26-April 27th: 160 processors, 12pm-5pm Monday through Thursday
• Actual contest submissions started week of March 26

LEAD Gateway Statistics
• 250 jobs per week, consuming 1800 SUs/week
• Each workflow is 5 jobs
  – 250 jobs corresponds to 50 workflows
• Expect this to increase once issues are resolved and reliability improves
• LEAD Gateway is typically the most or 2nd most active gateway in terms of resources used
  – (BIRN or GridChem are often ahead)

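The workflow count above follows directly from the job count. A minimal Python sketch of that arithmetic, using only the figures on this slide; the function and variable names are hypothetical, and the per-job and per-workflow SU figures are simple averages derived here, not numbers reported on the slide:

```python
# Figures taken from the LEAD Gateway Statistics slide.
JOBS_PER_WEEK = 250
SUS_PER_WEEK = 1800
JOBS_PER_WORKFLOW = 5


def gateway_weekly_summary(jobs_per_week, sus_per_week, jobs_per_workflow):
    """Derive workflow count and average SU cost from weekly totals."""
    workflows = jobs_per_week // jobs_per_workflow
    return {
        "workflows_per_week": workflows,
        # Averages only: individual jobs in a workflow may differ widely.
        "avg_sus_per_job": sus_per_week / jobs_per_week,
        "avg_sus_per_workflow": sus_per_week / workflows,
    }


summary = gateway_weekly_summary(JOBS_PER_WEEK, SUS_PER_WEEK, JOBS_PER_WORKFLOW)
print(summary)
```

Under these figures a workflow averages 36 SUs, which gives a rough basis for sizing the allocation if weekly workflow counts grow.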
Issues Uncovered by both Science Gateways
• Remote job submission
  – great when jobs run
  – hard to find out about problems, even simple ones such as planned downtime
• Reservation issues: can’t overflow end of reservation when many jobs stack up (LEAD)
• If a user assigns an obsolete project, no useful error message comes back
• GridFTP striped server: if one fails, all fail

SG Next Steps
• Meetings with teams to understand usage modes and issues
• CSE Online
  – NCSA contingent visiting CSE Online group at Univ of Utah Apr 23-25
• LEAD
  – NCSA and IU RPs setting up a date to visit LEAD group at IU

New Resource - Abe
Abe: 1955 blade cluster
• 2.33 GHz Clovertown Quad-Core
• 1,200 blades/9,600 cores
• 89.5 TF; 9.6 TB RAM; 120 TB disk
• Perceus management; diskless boot
• Cisco InfiniBand, 2 to 1 oversubscribed
• Lustre over IB, 8.4 GB/s sustained
• Power/Cooling: 500 kW / 140 tons
TG Software deployment
• CTSS
• Inca
Production date: May 2007 (anticipated)
User Environment
• Torque/Moab
• Softenv
• Intel Compiler
• MPI: evaluating Intel MPI, MPICH, MVAPICH, VMI-2, etc.

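The 89.5 TF figure is consistent with a standard peak-rate estimate of cores × clock × floating-point operations per cycle. A minimal Python sketch, assuming 4 double-precision operations per cycle per core; that value is an assumption about the processor and is not stated on the slide, and the function name is hypothetical:

```python
# Figures taken from the Abe slide.
BLADES = 1200
CORES = 9600
CLOCK_GHZ = 2.33
RAM_TB = 9.6

# Assumption (not on the slide): 4 double-precision flops per cycle per core.
FLOPS_PER_CYCLE = 4


def peak_tflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak in TF: cores * GHz * flops/cycle gives GF; divide by 1000."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


peak = peak_tflops(CORES, CLOCK_GHZ, FLOPS_PER_CYCLE)
cores_per_blade = CORES // BLADES
ram_gb_per_core = RAM_TB * 1000 / CORES  # decimal units, as on the slide

print(f"peak: {peak:.1f} TF")
print(f"cores per blade: {cores_per_blade}")
print(f"RAM per core: {ram_gb_per_core:.1f} GB")
```

This is a theoretical peak only; sustained application performance will be lower and depends on the interconnect and the MPI stack still under evaluation.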
March Allocations
• 25.1 M SUs (672M NUs) awarded to NCSA systems
• 34% of allocated resources
• Several large supplements coming in after the meeting
• Several 1M+ SU allocations @ NCSA
  – Silas Beane: 2.0M on Tungsten
  – Ali Uzun: 2.0M on Abe
  – Adrian Roitberg: 1.5M on Abe
  – Thom Cheatham: 1.0M on Abe
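The allocation figures above can be cross-checked with a little arithmetic. A minimal Python sketch using only the numbers on this slide; the variable names are hypothetical, and the NU-per-SU ratio is a blended average across NCSA systems derived here, not an official conversion factor:

```python
# Figures taken from the March Allocations slide (millions of SUs / NUs).
NCSA_AWARDED_MSU = 25.1
NCSA_AWARDED_MNU = 672.0

# The 1M+ SU allocations named on the slide: (PI, millions of SUs, system).
LARGE_ALLOCATIONS = [
    ("Silas Beane", 2.0, "Tungsten"),
    ("Ali Uzun", 2.0, "Abe"),
    ("Adrian Roitberg", 1.5, "Abe"),
    ("Thom Cheatham", 1.0, "Abe"),
]

named_total = sum(msu for _, msu, _ in LARGE_ALLOCATIONS)
abe_total = sum(msu for _, msu, system in LARGE_ALLOCATIONS if system == "Abe")

# Share of the NCSA award accounted for by the named allocations.
named_share_pct = 100 * named_total / NCSA_AWARDED_MSU

# Blended average only: each system has its own NU conversion factor.
avg_nu_per_su = NCSA_AWARDED_MNU / NCSA_AWARDED_MSU

print(f"named 1M+ allocations: {named_total} M SUs ({named_share_pct:.1f}% of NCSA award)")
print(f"of which on Abe: {abe_total} M SUs")
print(f"average NUs per SU: {avg_nu_per_su:.1f}")
```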