ATLAS Great Lakes Tier-2 Site Report
Shawn McKee
USATLAS Tier2 Meeting, UTA, December 8th, 2006
AGLT2 Overview
• AGLT2 cycles currently come from 59 dedicated CPUs (see below).
• UM has a significant amount of equipment (mostly T3) already in place.
• MSU is upgrading its computing space and will acquire some T3 equipment, which it will use to help prototype T2 purchases.
• UM equipment is currently housed in Physics: mostly Athlon/Opteron systems, a combination of i386 (32-bit) and x86_64 (64-bit, with 32-bit compatibility) systems running an RHEL4 clone (Scientific Linux, SLC4.3 or 4.4).
• We have 29.5 "node" years of Opteron time on Michigan's Center for Advanced Computing Nyx cluster, and have requested a dedicated allocation of nodes (2 processors/node) until it expires on April 9th, 2007.
• The Nyx cluster is our primary Tier2 contribution until we purchase Tier2 equipment sometime between now and March 2007.
AGLT2 Personnel & New Hires
• UM has Bob Ball (1 FTE) and Shawn McKee.
• We have had four different undergraduates (all aerospace juniors or seniors) working hourly; currently only one is working, at under 8 hours/week.
• We are exploring two possible Tier2 hires: a junior technician and a senior cluster/systems manager. Jobs will be posted next week; hires for the new year?
• MSU has Tom Rockwell, Philippe Laurens, Mike Nila and Chip Brock:
  • Tom Rockwell (will be about 3/4 time): experienced systems manager with a physics background; hardware and systems software.
  • Philippe Laurens (will be about 1/2 time): talented electronics and software engineer; QoS and monitoring, possibly resource balancing.
  • Mike Nila (will be about 1/2 time): HEP group technician and networking specialist; hardware servicing.
AGLT2 Design Principles
• Use dual dual-core Opteron systems (unless Woodcrest tests show otherwise):
  • Good CPU power per unit of electrical power (heat); scalable: 4 cores -> 4 times the throughput.
  • Very good I/O capabilities.
  • Good pricing.
  • Requires x86_64 to be most effective, though it works with 32-bit.
• Incorporate management and monitoring from the start:
  • IPMI cards (or equivalent) and a control network.
  • SNMP, syslog, hardware and performance monitoring as part of provisioning.
• Design for two networks, Public and Private (Data/Management), and provide excellent network connectivity (1 GE workers, 10 GE-capable servers and WAN).
• Support 10GE networks as an option for "data pumps" and storage servers.
• Utilize 2.6 kernels and SLC4 (perhaps a bit premature!).
• Build inexpensive high-capacity storage systems with large partitions based upon XFS (chosen by FNAL and CERN), using the large file-system support in 2.6 kernels (see the sketch below).
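A minimal sketch of the XFS piece of this plan, assuming a hypothetical RAID6 virtual disk at /dev/sdb1 and a /data mount point (neither path appears in the slides); it just drives the stock mkfs.xfs and mount tools from Python.

```python
# Minimal sketch: format and mount a large RAID6-backed partition with XFS.
# The device path, label, and mount point are illustrative assumptions.
import subprocess

DEVICE = "/dev/sdb1"    # hypothetical RAID6 virtual disk exported by the controller
MOUNT_POINT = "/data"   # hypothetical mount point for the large storage partition

def make_xfs(device: str, label: str) -> None:
    """Create an XFS filesystem; XFS handles multi-TB partitions on 2.6 kernels."""
    subprocess.run(["mkfs.xfs", "-f", "-L", label, device], check=True)

def mount_xfs(device: str, mount_point: str) -> None:
    """Mount the filesystem; noatime reduces metadata writes on a busy data server."""
    subprocess.run(["mount", "-t", "xfs", "-o", "noatime", device, mount_point], check=True)

if __name__ == "__main__":
    make_xfs(DEVICE, "aglt2data")
    mount_xfs(DEVICE, MOUNT_POINT)
```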
10GE Protected Network
• We will have a single /23 network for the AGL-Tier2.
• Internally, each site (UM/MSU) will have a /24 (see the sketch below).
• Our network will have three 10GE wavelengths on MiLR arranged in a "triangle".
• Loss of any one of the three waves does not impact connectivity for either site.
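A minimal sketch of this addressing plan using Python's ipaddress module; the 10.10.0.0/23 prefix is a placeholder, not the actual AGLT2 allocation.

```python
# One /23 for the whole Tier-2, split into a /24 per campus (placeholder prefix).
import ipaddress

tier2_block = ipaddress.ip_network("10.10.0.0/23")       # placeholder /23
um_net, msu_net = tier2_block.subnets(new_prefix=24)      # one /24 per site

print(f"AGLT2 block: {tier2_block} ({tier2_block.num_addresses} addresses)")
print(f"UM subnet  : {um_net}")
print(f"MSU subnet : {msu_net}")
```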
Network Details (Almost Operational)
• Addresses are in the aglt2.org domain.
(Figure: bandwidth plot comparing the Official BWC and Hyper BWC results.)
SC06: Full Speed with FDT
• Used the full 10GE bandwidth between Florida and Michigan during SC06.
Current UMROCKS Cluster
• We have a 5-rack AMD Athlon cluster with 70 operational nodes (2000/2400/2600 dual-processor, 2 GB RAM):
  • Two 100+ GB disks per node.
• Plan to have ~90 nodes operational.
• ROCKS V4.1.
Ganglia Info on UMROCKS (Athlon) Cluster
• Currently ~71 operational nodes; plan for ~90 built from parts.
• Used for alignment and calibration work as well as Lambda-B studies.
Non-Production Work
• Two tasks at UM are not USATLAS Production:
  • Muon Calibration and Alignment Center:
    • Gaining experience with Athena installations.
    • 12.0.31 and multiple kits are now successfully installed.
    • Athena tests successfully run.
    • Participating in the calibration "closed loop" test in January 2007.
  • Targeting CSC analysis (muon ID) as well:
    • Installing dq2_get and related tools.
• Gaining experience needed for Tier2 work:
  • AFS, NFS, network tuning/configuration and problem resolution.
  • Systems monitoring, configuration and testing.
  • ROCKS configuration and deployment.
  • dCache/storage deployment.
Existing Servers/Services (Mostly T3)
In addition to the UMROCKS cluster we have a number of servers/services:
• Two gatekeepers: dual Xeon 3.6 GHz (2MB cache), 4GB RAM, Intel SE7520AF2 motherboards, IMM card (IPMI), named gate01/gate02.grid.umich.edu.
• AFS cell atlas.umich.edu hosted on linat02/linat03/linat04.grid.umich.edu, with new file servers linat06/linat07/linat08/atums1/atums2 (about 6TB).
• NFS data servers umfs01-04/linat09/linat10/linat11 hosting about 55TB total.
• Hypnos.grid.umich.edu is the dCache head node for UMROCKS.
• MonALISA node at ml-um.ultralight.org, plus other monitoring services.
• Oracle server on one of the "prototype" systems for Calibration/Alignment DB replication.
• DQ2 is running on umfs02.grid.umich.edu.
• Planned additional servers: NDT node, GridFTP, DQ2.
Cacti Graphing/Monitoring
• Cacti uses SNMP to easily monitor and graph our hosts (a polling sketch is shown below).
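A minimal sketch of the kind of poll Cacti performs, calling net-snmp's snmpget from Python; the community string "public" and the choice of host and interface index are assumptions for illustration.

```python
# Poll one interface byte counter the way a Cacti data source would.
import subprocess

HOST = "umfs02.grid.umich.edu"            # example host from the server list above
COMMUNITY = "public"                       # placeholder read-only community string
IF_IN_OCTETS = "1.3.6.1.2.1.2.2.1.10.1"    # IF-MIB::ifInOctets for interface index 1

def snmp_get(host: str, community: str, oid: str) -> str:
    """Return the raw value reported by snmpget (-Oqv prints the value only)."""
    out = subprocess.run(
        ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

if __name__ == "__main__":
    print(f"{HOST} ifInOctets.1 = {snmp_get(HOST, COMMUNITY, IF_IN_OCTETS)}")
```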
Central Logging
• We have found it very useful to create a central logging service for all our equipment; one node is dedicated to this task.
• This node runs syslog-ng, stores the messages in MySQL, and makes the results available on the web via php-syslog-ng (a client-side forwarding sketch is shown below).
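A minimal sketch of a client forwarding messages to the central syslog-ng host over the classic UDP/514 syslog transport, using Python's standard SysLogHandler; the host name loghost.aglt2.org and the facility are assumptions, since the slides do not name the logging node.

```python
# Forward application messages to the central syslog-ng collector.
import logging
import logging.handlers

handler = logging.handlers.SysLogHandler(
    address=("loghost.aglt2.org", 514),                 # placeholder central logger
    facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
)
handler.setFormatter(logging.Formatter("aglt2-monitor: %(levelname)s %(message)s"))

log = logging.getLogger("aglt2")
log.setLevel(logging.INFO)
log.addHandler(handler)

log.info("RAID controller check passed on umfs02")      # example message
```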
OSG & ATLAS Software Status/Plans
• As shown, we have both AFS and NFS storage at our Tier-2.
• We plan to install software on AFS (good for read-only data). OSG (0.4.1) and ATLAS software are already in AFS (/afs/atlas.umich.edu).
• ATLAS software is mirrored via Pacman on our AFS cell at: http://gate01.grid.umich.edu/am-UM/ATLAS.mirror
• All users have their home space in AFS. Our system is set up to get a Kerberos TGT (and AFS tokens) at login via gssklog (instructions on the TWiki); a login check is sketched below.
• All OSG accounts are created with "uniqname" IDs.
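A minimal sketch of a login sanity check for this setup, using the standard klist and AFS tokens commands; the suggested remedy (kinit followed by gssklog) is assumed from the description above rather than taken from the slides.

```python
# Verify a user landed with a Kerberos TGT and AFS tokens after login.
import subprocess

def has_kerberos_tgt() -> bool:
    """klist -s exits 0 when a valid ticket cache is present."""
    return subprocess.run(["klist", "-s"]).returncode == 0

def show_afs_tokens() -> None:
    """'tokens' lists AFS tokens held, e.g. for the atlas.umich.edu cell."""
    subprocess.run(["tokens"], check=True)

if __name__ == "__main__":
    if has_kerberos_tgt():
        show_afs_tokens()
    else:
        print("No Kerberos TGT found; run kinit (and gssklog) first.")
```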
Planning for Equipment
• We want to purchase identical equipment for AGLT2, independent of its physical location (UM or MSU).
• UM has used RackMountPro and purchased dual dual-core Opteron systems and SATA-II RAID6 storage servers (1U, 4U and 5U) to date. Servers cost ~$18K/18TB (raw).
• Sun has expressed an interest and has provided quotes.
• Intel is trying to get us a Woodcrest server for evaluation.
• We need to minimize power/heat while meeting the CPU/storage requirements in the proposal. Cost is also a factor.
Prototype Building Block (T3) Details
• We have purchased 5 dual dual-core Opteron 280 systems and an NFS storage server to test with:
  • Worker nodes use Supermicro H8DAR-T (1U) motherboards (AMD 8132 chipset), 4GB of RAM, dual dual-core Opteron 280, three 250GB SATA-II hot-swappable drives, and a CDROM (4 cores/1U).
  • The disk server is a dual dual-core Opteron 280, 5U, with 24 SATA-II (500GB) drives, dual 250GB system disks, 8GB of RAM, dual 1GE NICs, and an Areca 1170 RAID6 controller (11TB).
• New purchase: 8 worker nodes, 3 interactive nodes and 3 NFS servers:
  • Worker and interactive nodes use Supermicro H8DAR-T (1U) motherboards (AMD 8132 chipset), 8GB of RAM, dual dual-core Opteron 285, four 500GB SATA-II hot-swappable drives, and a DVDROM (4 cores/1U).
  • Two disk servers are dual dual-core Opteron 285, 5U, with 24 SATA-II (750GB) drives, dual 250GB system disks, 12GB of RAM, dual 1GE NICs, and an Areca 1280 RAID6 controller (16.5TB); good performance, see the benchmarking slide below.
  • One disk server has 16 500GB disks, a 16-port RAID6 controller, 16GB RAM, and dual Opteron 285s.
(The quoted RAID6 capacities are checked in the sketch after this list.)
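A quick check of the quoted usable capacities: RAID6 reserves two disks' worth of space for parity, so usable capacity is roughly (N - 2) times the disk size in vendor decimal units. The helper below is illustrative arithmetic, not site tooling.

```python
# Verify the 11 TB / 16.5 TB RAID6 figures quoted for the Areca-based servers.
def raid6_capacity_tb(n_disks: int, disk_gb: int) -> float:
    """Usable RAID6 capacity in TB (vendor decimal units, 1 TB = 1000 GB)."""
    return (n_disks - 2) * disk_gb / 1000.0

for n, size in [(24, 500), (24, 750), (16, 500)]:
    print(f"{n} x {size} GB RAID6 -> {raid6_capacity_tb(n, size):.1f} TB usable")
# 24 x 500 GB -> 11.0 TB, 24 x 750 GB -> 16.5 TB, 16 x 500 GB -> 7.0 TB
```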
AGLT2 Storage Server IOZONE Tests
• Uses an Areca ARC-1280 RAID6 controller employing an Intel I/O processor.
• 24 Seagate 750GB SATA-II disks.
• System has dual dual-core Opteron 285s with 12 GB of RAM on a Tyan S2892 motherboard.
• Sequential read/write throughput tested by Kyu Park (University of Florida) as part of the UltraLight work.
• Shows very good read and write performance:
  • Achieved ~780 Mbytes/sec read.
  • Achieved ~560 Mbytes/sec write.
• Best results at queue_depth 256, nr_requests 1024, and readahead 4MB/2MB (a tuning sketch follows).
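A minimal sketch of the block-device tuning named above, written against the standard Linux sysfs knobs (nr_requests, read_ahead_kb, queue_depth); the device name sdb is an assumption, and the values are simply the quoted best-case settings.

```python
# Apply the queue/readahead settings that gave the best IOZONE results.
from pathlib import Path

DEV = "sdb"  # hypothetical device name of the Areca RAID6 volume

def set_sysfs(path: Path, value: str) -> None:
    """Write a tuning value into sysfs (requires root)."""
    path.write_text(value + "\n")

if __name__ == "__main__":
    queue = Path(f"/sys/block/{DEV}/queue")
    set_sysfs(queue / "nr_requests", "1024")      # deeper request queue
    set_sysfs(queue / "read_ahead_kb", "4096")    # 4 MB readahead for sequential reads
    # queue_depth lives under the SCSI device node when the controller exposes it:
    qd = Path(f"/sys/block/{DEV}/device/queue_depth")
    if qd.exists():
        set_sysfs(qd, "256")
```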
Prototype Opteron Cluster
• Testbed for dual dual-core systems (Opteron 280s/285s, 4-12GB RAM).
• Supporting Muon combined performance studies and Tier2 prototype development and testing.
Status WRT Delivering USATLAS Cycles
• We have had dedicated access to 59 nodes in CAC since October 11th, but have not yet run many USATLAS production jobs on this cluster (it has lots of other users, though!).
• The initial problem was installing/configuring the DDM system; our site is the first to install and run "production" on a 64-bit SLC4 system.
• DQ2 validation (data movement) was completed by BNL on October 15.
• Problems with httpd/mod_python (64-bit) led us to split this part off onto its own 32-bit/SLC3 server; this is finished and tested.
AGLT2 Current Issues
• USATLAS Production has not yet been effective at our site:
  • SLC4 x86_64 systems were initially part of the problem.
  • Having a remote PBS cluster (not locally administered) is also an issue.
  • Requirements for USATLAS production jobs were not well documented and had to be discovered and fixed. KV works, modulo MooEvent compilation (which is not required).
  • Hopefully we are now functional. The individual components test out (DQ2 server, OSG gatekeeper, PBS cluster); the challenge is overall integration.
• Building a robust, high-performance storage architecture:
  • Use of XFS, SATA-II disks on RAID6 controllers, and SMART monitoring (a SMART check is sketched below).
  • Integrate NFSv4 + dCache to provide load balancing and performance.
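A minimal sketch of the SMART monitoring piece, polling drive health with smartctl; the device list is a placeholder and the script is illustrative, not the site's actual monitoring.

```python
# Report the overall SMART health assessment of each data disk.
import subprocess

DISKS = ["/dev/sda", "/dev/sdb"]   # placeholder device list

def smart_health(device: str) -> bool:
    """smartctl -H exits 0 when the drive reports a passing health assessment."""
    result = subprocess.run(["smartctl", "-H", device],
                            capture_output=True, text=True)
    return result.returncode == 0

if __name__ == "__main__":
    for disk in DISKS:
        status = "OK" if smart_health(disk) else "CHECK"
        print(f"{disk}: {status}")
```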
Planning for AGL-Tier2 Space
• MSU and UM are both working on high-quality server spaces.
• Michigan will utilize server room space in LS&A (the college housing Physics):
  • Power for cooling and equipment (flywheels and generators).
  • 9 racks of space (1 network rack, 8 equipment racks, 8kW/rack).
  • Lots of fiber access.
• The UM LS&A space is scheduled to be ready by the end of this month; we plan to occupy it at the end of January 2007.
• MSU space will be ready in spring 2007.
• We will have ~$140K in equipment money to spend before March 1, 2007 to help provide the CPU and storage listed in our proposal. CAC cycles run out April 9, 2007 (59 dedicated nodes).
Conclusion
• The AGL-Tier2 is very close to being able to provide production cycles for US ATLAS. We have to resolve the final configuration and usage issues ASAP.
• The AGL-Tier2 should truly be a single "site" from US ATLAS's point of view, even though our equipment and services are distributed between two campuses 60 miles apart. This is possible because of MiLR (10GE) and a common network address block.
• We have options to consider for hardware as we near having our permanent locations ready for occupancy.