Comprehensive overview of the AGLT2 site's hardware, networking, storage, and software access, presented at the USATLAS Tier-2 meeting in June 2007, detailing resource allocation, rack layout, and performance monitoring. Includes a summary of system composition, job scheduling policies, and available monitoring tools.
ATLAS Great Lakes Tier-2 Site Report
Bob Ball
USATLAS Tier2 Meeting, Bloomington, IN
June 22, 2007
AGLT2 Overview
• Dedicated hardware for AGLT2 ran first jobs in April 2007
  • Cycles from a mix of dual quad-core Xeons and dual dual-core Opterons
  • 236 cores available at peak
• MSU is upgrading its computing space and is acquiring some Tier-3 equipment that will be used to help prototype Tier-2 purchases
  • Network equipment ordered, with on-time delivery expected
  • Expect space to be ready ~August-September 2007
  • Population with disk and compute servers will follow quickly
• Primary disk storage on 3 servers plus dCache plus “reserve” disk
• Network connectivity via dual 10Gb links over MiLR
• aglt2.org address space encompasses both MSU and UM sites
  • With MSU online, panda submission will transparently distribute across both
UM Cluster Composition
• Compute nodes
  • 26 dual quad-core Xeons in Dell 1950, 16GB RAM, two 750GB disks
    • 20 with Tier-2 priority, 6 with muon calibration priority
  • 10 dual dual-core Opteron 285, 8GB RAM, four 250GB disks
    • All with Tier-3 priority, but…
    • Per MOU, Tier-2 uses half of this resource
    • Prioritized access handled by Condor configuration
  • Dual Gb NICs (1 private, 1 public)
• Disk storage
  • 21TB RAID50 in Dell 2950 chassis with dual MD1000 enclosures
  • 11TB RAID6
  • 18TB dCache (non-resilient)
  • 9.5TB dCache (resilient)
  • 7.5TB for muon calibration files
  • 16TB RAID6 held in reserve
  • Dual NICs (1Gb private, 10Gb public)
More Composition
• High speed network within the server room
  • Dell 6248 switch stack
  • Gb connectivity for compute nodes and public access to disk servers
  • 10Gb “private” network access to NFS servers from compute nodes
  • 10Gb stack access to Cisco router for MiLR WAN
  • Remote switch management
• Raritan KVM for console access to non-Dell machines
• DRAC access to Dell consoles
UM Rack Layout, ~September 2007
• Rack diagram; labels: May 2007, Tier-2 August 2007, Tier-3 August 2007
Software Access
• pacman kit installation to AFS space
  • Mirror for Tier-3 access elsewhere
• OSG installed to AFS where read-only access is required
  • OSG 0.6.0 with VDT 1.6.1i
  • Soft links back to local disk for logs and needed write access
• Gatekeeper is gate01.aglt2.org
  • Condor v6.8.5 collector located on cluster head node umopt1 (see the sketch after this list)
  • gate02.aglt2.org is the backup gatekeeper
• OS is SLC 4.4 or 4.5
  • Kernel is 2.6.20-10 (UltraLight) because of NFS issues in earlier versions
  • 64-bit kernel with 32-bit compatibility libraries for code
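As a rough illustration of the pool layout just described, here is a minimal sketch of the kind of condor_config entries that point a worker node at the collector on the head node. Only the host roles (collector on umopt1, gatekeepers gate01/gate02) come from the slide above; the specific macros used and the umopt1.aglt2.org FQDN are assumptions, not the actual AGLT2 configuration.

  # Sketch only: assumed condor_config fragment for an AGLT2 worker node.
  # The FQDN umopt1.aglt2.org is an assumption.
  CONDOR_HOST    = umopt1.aglt2.org      # collector/negotiator on the cluster head node
  COLLECTOR_HOST = $(CONDOR_HOST)
  DAEMON_LIST    = MASTER, STARTD        # compute nodes run only the master and startd
  # Jobs arrive via the primary gatekeeper, with gate02 as the backup
  HOSTALLOW_WRITE = gate01.aglt2.org, gate02.aglt2.org, $(CONDOR_HOST)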
Resource Allocation
• We use the Condor 6.8.5 job scheduler. Policy on the Tier-2 is (an illustrative sketch of this ordering follows the sample configuration on the next slide):
  • Highest priority for usatlas2 (admin, software install)
  • Next highest priority for usatlas1 (ATLAS production)
  • Next highest priority for usatlas3 (US ATLAS users)
  • Lowest priority for OSG (including usatlas4)
• The Tier-3 is “integrated” with the Tier-2 provisioning (ROCKS) but has different priorities, reflecting its 50% contribution to the Tier-2
  • If either Tier-2 or Tier-3 is idle, the other expands to fill
  • Equilibrium when all is busy is about 80% of job slots to the Tier-2
• OSG/usatlas4 limited to, at most, 16 cores
Sample condor configuration

  # On a compute node
  T2_HIGH_GROUP = (Owner == "usatlas2") || (Owner == "osg") || (Owner == "mis")
  T2_MID_GROUP  = (Owner == "usatlas1")
  ….
  # Exclude usatlas4 and non-ATLAS VOs from running here.  Allow all else.
  START = $(NT2_VLOW_GROUP)

  ========================================================
  # On cluster head node, the Condor Negotiator knows about group quotas
  GROUP_NAMES = group_osg, group_local
  GROUP_QUOTA_group_osg   = 204
  GROUP_QUOTA_group_local = 50
  GROUP_AUTOREGROUP = TRUE

  ========================================================
  # On OSG gateway, automagically assign the correct accounting group
  AcGrp = group_osg
  AccountingGroup = "$(AcGrp)"
  SUBMIT_EXPRS = AccountingGroup
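The compute-node sample above is truncated, so the expressions that actually enforce the priority ordering from the previous slide are not shown. Purely as an illustration, and not the actual AGLT2 configuration, one common Condor idiom is a weighted startd RANK expression over the job Owner:

  # Illustration only: one way to express the Tier-2 priority ordering
  # (usatlas2 > usatlas1 > usatlas3 > everything else, including usatlas4/OSG).
  # This is an assumed completion of the truncated sample, not AGLT2's actual config.
  RANK = 4*(Owner == "usatlas2") + 3*(Owner == "usatlas1") + 2*(Owner == "usatlas3")
  # Jobs matching none of the above (OSG, usatlas4) fall through with RANK = 0

Note also that the group quotas shown above (204 slots for group_osg versus 50 for group_local) appear to implement the roughly 80%/20% Tier-2/Tier-3 equilibrium described on the previous slide, while GROUP_AUTOREGROUP = TRUE is what lets either side expand into the other's idle slots.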
Local Monitors
• Environmental
  • APC for power, temperature, humidity
• Ganglia for an overall, fast view of performance
• Perl script for details of running processes and immediate log access
• Hardware health and performance via snmp/cacti
• OS logs to a central, searchable syslog-ng server
• Email notification of most “out of range” violations
• The following slides are a sampling of available plots
Ganglia and Perl Job Monitors
Cacti
Syslog-ng
Performance – AGLT2
Problems
• Non ATLAS-wide AGLT2 outages from a variety of sources:
  • Gatekeeper crash
    • Had a few of these
  • Head node crash
    • Condor collector is here
  • Network outage
    • Two unexpected instances of this (plus one scheduled)
  • Disk crash, or NFS server crash
    • Disruptive, but not fatal; no disk crashes (so far)
  • Software configuration
    • Missing files, usually quickly resolved
    • Local OSG/VDT mis-configurations (most recent outage)
Performance
Performance (cont.)
Summary
• AGLT2 is up and running
• Occasional, non-serious problems common to all sites
  • Some are AGLT2-specific, but sporadic and fixable
• OSG/VDT issues since the start of June
  • Should be entirely resolved soon
  • Temporary use of local pilots as a workaround
• Lots of monitors allowing quick response to problems
• Expansion scheduled for early Fall 2007
  • MSU systems will come online as well
Supplementary Slides
Current Equipment
• We want to purchase identical equipment for AGLT2, independent of its physical location (UM or MSU)
• UM went out for bids in January. Best proposal was from Dell:
  • Compute nodes: Dell 1950 dual quad-core Intel systems, 2.66 GHz, 16GB (2GB/core), two 750GB disks, about $6.2K/node. Ordered 20.
  • Storage node: Dell 2950 with two MD1000 shelves (see next slide). Delivers 21TB RAID50 for around $15.4K. Ordered 1.
  • Network switches: Dell PowerConnect 6248 has stacking modules (48Gbps), 48 copper 10/100/1000 ports, and two 10Gb module slots. Cost for 3 units with all cables, stacking modules, and four 10Gb CX-4 ports: $10.3K.
New Storage Node
This system (with a single Perc 5/E RAID card) performs at around 750 MB/sec. We have not confirmed another online posting that claimed more than 1.2 GB/sec with two Perc 5/E cards.
Network Architecture
• Good pricing from Dell for access-layer switches:
  • Managed, with 10GE ports and lots of 10/100/1000 copper ports
  • Good QoS and layer-3 capabilities, redundant power supply
Capacity Plans
• The numbers for the AGLT2 (revised in January) are summarized in a table on the original slide
• The plan was based upon dual dual-core Opteron 285s (1714 SI2K per core) and about $1K/TB of disk storage
• Spent $139K on Dell systems in February. Got 20 x 8 x 2150 SI2K = 344 kSI2K and 21TB of disk. (Total with Tier-3 contribution is around 444 kSI2K, 50TB.)
• August 2007: UM/MSU combined will spend around $350K on storage/compute
  • Seven storage nodes, ~196 TB; 38 compute nodes for ~730 kSI2K additional CPU