Simulation Production at UTD
Shuwei YE, UT-Dallas
DOE Visit, Nov. 14, 2003
Outline
• Upgrade on computing farm (see Xinchou’s talk)
• SP4 → SP5 migration
  – Installation
  – Troubleshooting
  – Fine tuning
• Operation challenges
• UTD production rate
Computing Farm Upgrade
Original system:
• 16 dual-CPU nodes (P-III 1.0 GHz), 512 MB to 1 GB memory/node
• 500 GB RAID + 1.5 TB soft RAID
• Capacity: 5-6 million events/month
Current system:
+ 64 dual-CPU nodes (P4 2.66 GHz), 1 GB memory/node
+ 1.8 TB RAID
• Designed capacity: 30 million events/month
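The capacity figures on this slide can be put side by side with a little arithmetic. The sketch below only restates the quoted numbers; the function names are illustrative, the 30-day month is an assumption, and whether the 30 million events/month design figure covers the full 80-node farm or only the 64 added nodes is not stated on the slide, so the node count is left as an argument.

```python
# Figures quoted on the slide; capacities are in million events per month.
CPUS_PER_NODE = 2
ORIGINAL_NODES = 16
ADDED_NODES = 64
ORIGINAL_CAPACITY = (5.0, 6.0)   # low and high end of the quoted range
DESIGNED_CAPACITY = 30.0


def per_cpu_rate(capacity, nodes):
    """Million events per month per CPU for a farm of `nodes` dual-CPU nodes."""
    return capacity / (nodes * CPUS_PER_NODE)


def daily_rate(capacity, days_per_month=30):
    """Convert a monthly capacity to a daily one (30-day month is an assumption)."""
    return capacity / days_per_month


def scale_up(new_capacity, old_capacity_range):
    """Return (min, max) ratio of new capacity to the old capacity range."""
    low, high = old_capacity_range
    return new_capacity / high, new_capacity / low
```

For example, `scale_up(DESIGNED_CAPACITY, ORIGINAL_CAPACITY)` gives the factor by which the designed capacity exceeds the original one, and `daily_rate(DESIGNED_CAPACITY)` converts the design figure to million events per day.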
SP4 → SP5 Migration: Installation
• OS: RedHat-6.2 → 7.2
• PBS installation
• Objy-6.2 → 7.1
• AFS, CERNLIB, Perl, Tcl, CVS, ROOT …
• SP5 installation and validation
SP4 → SP5 Migration: Troubleshooting
• AFS behind campus firewall
• Non-standard software in SP:
  – Downgraded compiler (gcc), libtcl, Perl modules
• NFS not recognized by Objy:
  – fixed with an insecure export
• bbftp behind firewall:
  – passive mode, bug fixed
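The "insecure export" item most likely refers to the `insecure` option in `/etc/exports`, which lets the NFS server accept requests from non-privileged client ports; the slide does not show the actual export line. As a minimal sketch under that assumption, the helper below parses one exports line and reports which clients lack the option. The path and host names in the usage note are hypothetical.

```python
import re


def export_options(line):
    """Parse one /etc/exports line into {client: set_of_options}."""
    line = line.split("#", 1)[0].strip()      # drop trailing comments
    if not line:
        return {}
    result = {}
    for spec in line.split()[1:]:             # first field is the exported path
        match = re.fullmatch(r"([^()]*)\(([^()]*)\)", spec)
        if match:
            client, opts = match.group(1), match.group(2)
            result[client or "*"] = {o for o in opts.split(",") if o}
        else:
            result[spec] = set()              # client listed without options
    return result


def clients_missing_insecure(line):
    """Return the clients on this exports line that lack the `insecure` option."""
    options = export_options(line)
    return sorted(c for c, opts in options.items() if "insecure" not in opts)
```

Calling `clients_missing_insecure("/data/objy farm*.example.edu(rw,insecure) login.example.edu(rw)")` would flag only `login.example.edu`.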
SP4 → SP5 Migration: Fine Tuning
• Hyper-threading tests
• Conditions and background databases in Objy set read-only
• High RPC failure rate – improved by release upgrade
• Automatic job submission improved – smarter, benefits all sites
• Weekend monitoring (benefit of a laptop)
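The slide does not describe how the improved automatic job submission works. One common pattern with PBS, shown here purely as an illustrative sketch and not as the SP tools' actual logic, is to top up the queue to a target number of active jobs. All function names and parameters below are hypothetical; only the `qsub` command itself is a standard PBS tool.

```python
import subprocess


def jobs_to_submit(target_in_queue, queued, running, remaining):
    """How many new jobs to submit so that queued + running reaches the target."""
    deficit = max(0, target_in_queue - (queued + running))
    return min(deficit, remaining)


def submit(script_path):
    """Submit one job script with PBS `qsub` and return the job id it prints."""
    done = subprocess.run(
        ["qsub", script_path], check=True, capture_output=True, text=True
    )
    return done.stdout.strip()


def top_up(scripts, target_in_queue, queued, running, submit_fn=submit):
    """Submit as many of the pending scripts as needed to refill the queue."""
    count = jobs_to_submit(target_in_queue, queued, running, len(scripts))
    return [submit_fn(script) for script in scripts[:count]]
```

Passing `submit_fn` explicitly makes the refill logic testable without a PBS server; in real use the default `submit` would be called, with the `queued` and `running` counts obtained from the batch system.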
Hardware Problems
• A/C problem (addressed in Xinchou’s talk)
• Old RAID problem (spare disks available)
• Rare unexpected power outages (could damage databases)
Production Rate: Before Upgrade
Total by UTD: 66.5M events
Production Rate: After Upgrade
Total by UTD: 111.7M events
Summary
Million events/day after upgrade
Average: 30-40 million events/month