A Regional Analysis Center at the University of Florida
Jorge L. Rodriguez, University of Florida
September 27, 2004
jorge@phys.ufl.edu
Outline
• Facility’s Function and Motivation
• Physical Infrastructure
• Facility Systems Administration
• Future Plans
Facility’s Function & Motivation
Operational support for various organizations
• Experimental High Energy Physics
• CMS (Compact Muon Solenoid @ the LHC)
• CDF (Collider Detector at Fermilab)
• CLEO (e+e- collider experiment @ Cornell)
• Computer Science: US Grid computing projects
• GriPhyN
• iVDGL
• “Friends and Neighbors”
High Energy Physics Activities
• US-CMS Tier2 Center
• Monte Carlo (MC) production
• Small number of expert, dedicated users
• Primary consumer of computing resources
• Support for the US-CMS regional analysis community
• Larger number (~40) of less expert users
• Large consumer of disk resources
• CLEO & CDF analysis and MC simulation
• Local CDF activities in the recent past
• Expect a ramp-up of local CLEO-C activities
Computer Science Research
• Grid computing research projects
• iVDGL & Grid3 production cluster(s)
• Grid3 production site: provides resources to hundreds of users via grid “logins”
• GriPhyN & iVDGL Grid development
• Middleware infrastructure development: grid3dev and other testbeds
• Middleware and Grid application development
• GridCat: http://www.ivdgl.org/gridcat
• Sphinx: http://www.griphyn.org/sphinx
• CAVES: http://caves.phys.ufl.edu
• Need to test on real cluster environments
The Primary Challenge
All of this needs to be supported with minimal staffing, “cheap” hardware, and moderate expertise
Physical Infrastructure
• Server room: dedicated floor space from the department
• Feed into the campus backbone; FLR upgrade to 10 Gbps by 2005
• Hardware: currently 75 servers
• Servers: mix of dual PIII & P4 Xeon
• LAN: mix of FastE and GigE
• Total of 9.0 TB of storage, 5.4 TB on dCache
• 4U dual-Xeon fileservers w/ dual 3ware RAID controllers …
• Sun Enterprise with FC and RAID enclosures
• More storage and servers on order
• New dedicated analysis farm
• Additional 4 TB dual-Opteron system, 10 GigE-ready (S2io cards)
Video Conferencing
• Two Polycom-equipped conference rooms
• Polycom ViewStations, H.323 and H.320
• Windows XP PCs
• Access Grid: broadcast and participate in lectures from our large conference room
Visitor’s Offices and Services
• Visitor work spaces: 6 Windows & Linux desktops
• Expanding visitor workspace: workspaces, printers, LCD projector …
• Espresso machine: Cuban coffee and lighthearted conversation
Facility Systems Administration: Design Overview
Facility Design Considerations
• Centralization of systems management
• A single repository of server, cluster, and meta-cluster configuration and description
• Significantly simplifies maintenance and upgrades
• Allows for easy resource deployment and reassignment
• Organization is a very, very good thing!
• Support multiple versions of the RedHat dist.
• Keep up with HEP experiments’ expectations
• Currently we support RH7.3 and RHEL 3
• The future is Scientific Linux (based on RHEL)?
Cluster Management Technology
We are pretty much a ROCKS shop!
• ROCKS is an open-source cluster distribution
• Its main function is to deploy an OS on a cluster (see the sketch after this slide)
• ROCKS is layered on top of the RedHat dist.
• ROCKS is extensible
ROCKS provides us with the framework and tools necessary to meet our design requirements simply and efficiently
http://www.rocksclusters.org
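To make “deploy an OS on a cluster” concrete, here is a minimal sketch of the usual ROCKS 3.x admin flow on a frontend; the compute node name is an example, not one of our actual hostnames.

```
# Rebuild the local distribution after changing RPMs or XML files
# (run on the frontend; /home/install is the standard ROCKS dist area)
cd /home/install
rocks-dist dist

# Discover a new node: insert-ethers listens for DHCP requests and
# records the node's MAC address and identity in the MySQL database
insert-ethers

# Push a fresh OS install onto an already-registered node
shoot-node compute-0-0
```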
ROCKS the 5 Minute Tour
ROCKS builds on top of RH kickstart technology
• A simplistic kickstart digression
• A kickstart is a single ASCII script
• Lists single RPMs and/or groupings of RPMs to be installed
• Provides staged installation sections: the main command section, %pre, %post … (a minimal sketch follows)
• Anaconda, the installer, parses and processes kickstart commands and installs the system
This is in fact what you interact with when you install RedHat on your desktop
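For concreteness, a minimal kickstart sketch; all values are placeholders, not our production configuration.

```
# Main command section: install-time directives
install
lang en_US
rootpw --iscrypted $1$placeholder
part / --size 4096 --grow

# RPM groups (@) and individual RPMs to install
%packages
@ Base
openssh-server

# Shell script run before partitioning and installation
%pre
echo "pre-install stage"

# Shell script run inside the freshly installed system
%post
echo "installed by kickstart" > /etc/motd
```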
ROCKS the 5 Minute Tour cont.
ROCKS enhances kickstart by
• Providing machinery to push installations to servers
• DHCP: node identity assigned to a specific MAC address
• https: protocol used to exchange data (kickstarts, images, RPMs)
• cgi script: generates the kickstart on the fly
• ROCKS also provides a kickstart management system
• The kickstart generator parses user-defined XML spec files and combines them with node-specific information stored in a MySQL database (a sketch of such a file follows)
• The complete system description is packaged in components grouped into logical object modules
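A hypothetical node file in the style of the ROCKS XML fragments the generator merges into a kickstart; the description, package, and post contents are invented for illustration.

```
<?xml version="1.0" standalone="no"?>
<!-- Hypothetical node file: <package> entries become %packages lines,
     and the <post> body is spliced into the generated %post section -->
<kickstart>
  <description>Extra packages for analysis nodes</description>
  <package>openssh-server</package>
  <post>
    echo "configured from a uflorida node file" &gt;&gt; /etc/motd
  </post>
</kickstart>
```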
ROCKS the 5 Minute Tour cont.
The standard ROCKS graph describes a single cluster with service nodes
Note: use of “rolls”
ROCKS’ and MySQL/XML
• XML graph appliances are like classes; physical servers are the objects
• The ROCKS MySQL database stores global variables, MAC addresses, node names, distribution, membership, appliances … (an illustrative query follows)
[Diagram: appliance types (frontend, gatekeeper-pg, gatekeeper-grid01, gatekeeper-dgt, compute-pg, compute-grid01, compute-dgt) mapped to physical servers (uflorida-frontend, ufloridaPG, ufgrid01, ufloridaDGT, grinux01 … grinux40, grinuxN, grinuxM)]
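To make the class/object analogy concrete, an illustrative query in the spirit of the ROCKS database; the table and column names are assumptions, not the exact ROCKS 3.x schema.

```
-- Illustrative only: each physical server row joins through its
-- membership to an appliance, the "class" defined in the XML graph
SELECT n.name AS server,
       a.name AS appliance
FROM   nodes n
JOIN   memberships m ON n.membership = m.id
JOIN   appliances  a ON m.appliance  = a.id
WHERE  a.name LIKE 'compute%';
```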
Facility Systems Administration: Implementation
ROCKS at UFlorida
• The current installation is based on ROCKS 3.2.0
• Basic graph architecture modified to meet our requirements
• A single frontend manages multiple clusters and service nodes
• Supports dual distributions: RH7.3 & RHEL 3
• Direct interaction with the ROCKS MySQL database
• Extensive modification to the XML trees
• The XML tree is based on an older ROCKS version, 2.3.2
• Our own “uflorida” graphs, one for each distribution
• Many changes & additions to the standard XMLs
The uflorida XML graphs: uflorida.xml for RHEL 3 servers and uflorida.xml for RH7.3 servers
The Servers
[Diagram: RHEL 3 servers (GriPhyN: archer, micanopy, alachua; ufgrid03, ufgrid04; grinux41 … grinuxN+1 … grinuxM) and RH7.3 servers (ufgrid01, ufgrid02, ufloridaPG, ufloridaDGT; grinux01 … grinux40 … grinuxN), with service nodes (ufdcache, gatoraid1, gatoraid2, nfs-homes, uflorida-frontend) on a private LAN behind the WAN]
• Analysis servers: all users, limited access from the WAN
• Gatekeepers: grid users only, limited access from the WAN
• Service nodes: no user logins, very limited or no access from the WAN
The Services
• GriPhyN & analysis servers: user login machines, user environment
• User Interface machines: grid user login, access to the grid via Condor-G (a sample submit file follows)
• NFS servers: users’ home areas, user data, grid users
• The big kahuna, uflorida-frontend: our stripped-down version of the ROCKS frontend
• ROCKS administration node: RPM server, ROCKS DB server, kickstart generation, DHCP server, admin NFS server, etc.
• No users, no logins, no home …
• Strict firewall rules
• Primary DNS server
• Other services: cvs pserver, webserver
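As an example of the Condor-G path from a User Interface machine, a hypothetical submit file; the gatekeeper hostname and jobmanager name are placeholders.

```
# Hypothetical Condor-G submit file (hostname/jobmanager are examples)
universe        = globus
globusscheduler = ufloridapg.phys.ufl.edu/jobmanager-condor
executable      = analyze.sh
output          = job.out
error           = job.err
log             = job.log
queue
```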
Batch System Services
• Gatekeeper / master nodes
• Grid access only: GRAM, GSIFTP … (examples below)
• Other services kept to a minimum
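What “grid access only” looks like from the outside, sketched with standard Globus clients; the fully qualified hostname is an assumption for illustration.

```
# GRAM: run a test job through the gatekeeper's jobmanager
globus-job-run ufloridapg.phys.ufl.edu/jobmanager /bin/hostname

# GSIFTP: pull a file from the site with globus-url-copy
globus-url-copy gsiftp://ufloridapg.phys.ufl.edu/tmp/in.dat \
                file:///tmp/in.dat
```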
dCache Disk Storage Service
• dCache with SRM: virtual file system (pnfs) with caching (usage sketch below)
• dCache pool nodes: 40 x 50 GB partitions
• dCache administrator “admin door”
• SRM & dCache webserver
• RAID fileserver: its entire 2 TB of disk on dCache
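For a sense of how users would reach the pools, a sketch using the standard dCache clients; the door hostname and pnfs paths are placeholders.

```
# Local access: dccp copies through the pnfs virtual namespace
dccp /pnfs/phys.ufl.edu/data/cms/hits.root /tmp/hits.root

# Wide-area access: srmcp negotiates the transfer via the SRM door
srmcp srm://ufdcache.phys.ufl.edu:8443/pnfs/phys.ufl.edu/data/cms/hits.root \
      file:////tmp/hits.root
```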
The uflorida XML graphs
• RHEL 3 graph: the analysis cluster (NIS server and clients, analysis environment)
• RH7.3 graph: the production cluster, with dCache pool compute nodes
UFlorida Website
Currently just the default ROCKS website, including the Ganglia pages. Ours is accessible only from the .ufl.edu domain.
Facility’s Status
• Participated in all major CMS data challenges
• Since fall 2001: a major US contributor of events; since the end of 2002: grid-only MC production
• Contributed to CDF analysis
• UFlorida group: 4 faculty & staff and 2-3 students
• Recently re-deployed infrastructure, motivated mostly by security concerns
• Added new clusters for additional user communities
• Ramp up to support a full US-CMS Tier2 center
Facility’s Future
• Plan to redeploy with ROCKS 3.3.x
• Recast our own 2.3.2-based XMLs as real 3.3.x
• Provides access to new features
• Recast our tweaked and tuned services in terms of rolls
• Make use of the new ROCKS-on-the-WAN features
• Improve collaboration with our peers
• With US-CMS FNAL (Tier1)
• Statewide: FIU, FSU (Tier3); campus-wide: HPC
• Other Tier2 centers, US CMS & ATLAS
Facility’s Future
• New hardware is always coming
• New dedicated analysis and file servers on order: 6 new dual-Xeon servers, 8.0 TB of disk
• New Opteron systems on order
• Participate in the SC ’04 bandwidth challenge, CIT to JAX
• Connect to new FLR 10 GigE network equipment
• Official Tier2 center will bring new hardware: approximately 120 servers and an additional 200 TB of storage
Summary
We have successfully built, from scratch, a computing facility at the University of Florida
• Supports HEP experiments (LHC, FNAL …)
• Supports Computer Science activities
• Expect a much more active analysis/user community
The infrastructure is designed to support a large increase in hardware, serving a larger community of users, in anticipation of LHC turn-on and beyond.