
A Regional Analysis Center at the University of Florida



  1. A Regional Analysis Center at the University of Florida
  Jorge L. Rodriguez, University of Florida
  September 27, 2004
  jorge@phys.ufl.edu

  2. Outline
  • Facility’s Function and Motivation
  • Physical Infrastructure
  • Facility Systems Administration
  • Future Plans

  3. Facility’s Function & Motivation
  Operational support for various organizations:
  • Experimental High Energy Physics
    • CMS (Compact Muon Solenoid @ the LHC)
    • CDF (Collider Detector at Fermilab)
    • CLEO (e+e- collider experiment @ Cornell)
  • Computer Science: US Grid computing projects
    • GriPhyN
    • iVDGL
  • “Friends and Neighbors”

  4. High Energy Physics Activities
  • US-CMS Tier2 Center
    • Monte Carlo (MC) production
      • Small number of expert, dedicated users
      • Primary consumer of computing resources
    • Support for the US-CMS regional analysis community
      • Larger number (~40) of less expert users
      • Large consumer of disk resources
  • CLEO & CDF analysis and MC simulation
    • Local CDF activities in the recent past
    • Expect a ramp-up of local CLEO-c activities

  5. Computer Science Research
  • Grid computing research projects
    • iVDGL & Grid3 production cluster(s)
      • Grid3 production site
      • Provides resources to hundreds of users via grid “logins”
    • GriPhyN & iVDGL Grid development
      • Middleware infrastructure development
      • grid3dev and other testbeds
  • Middleware and Grid application development
    • GridCat: http://www.ivdgl.org/gridcat
    • Sphinx: http://www.griphyn.org/sphinx
    • CAVES: http://caves.phys.ufl.edu
    • These need to be tested in real cluster environments

  6. The Primary Challenge
  All of this needs to be supported with minimal staffing, “cheap” hardware, and moderate expertise.

  7. Physical Infrastructure

  8. Physical Infrastructure
  • Server Room
    • Dedicated floor space from the department
    • Our feed into the campus backbone; FLR upgrade to 10 Gbps by 2005
  • Hardware: currently 75 servers
    • Servers: mix of dual PIII & P4 Xeon
    • LAN: mix of FastE and GigE
  • Storage: 9.0 TB total, 5.4 TB on dCache
    • 4U dual-Xeon fileservers w/ dual 3ware RAID controllers
    • Sun Enterprise with FC and RAID enclosures
  • More storage and servers on order
    • New dedicated analysis farm
    • Additional 4 TB dual-Opteron system, 10 GigE ready (S2io cards)

  9. Video Conferencing
  • Two Polycom-equipped conference rooms
    • Polycom ViewStations: H.323 and H.263
    • Windows XP PCs
  • Access Grid
    • Broadcast and participate in lectures from our large conference room

  10. Visitor’s Offices and Services
  • Espresso machine
    • Cuban coffee
    • Lighthearted conversation
  • Visitor work spaces
    • 6 Windows & Linux desktops
    • Expanding visitor workspace: workspaces, printers, LCD projector …

  11. Facility Systems Administration: Design Overview

  12. Facility Design Considerations
  • Centralization of systems management
    • A single repository of server, cluster, and meta-cluster configuration and description
    • Significantly simplifies maintenance and upgrades
    • Allows for easy resource deployment and reassignment
    • Organization is a very, very good thing!
  • Support multiple versions of the RedHat distribution
    • Keep up with HEP experiments’ expectations
    • Currently we support RH7.3 and RHEL 3
    • The future is Scientific Linux (based on RHEL)?

  13. Cluster Management Technology
  We are pretty much a ROCKS shop!
  • ROCKS is an open-source cluster distribution
  • Its main function is to deploy an OS on a cluster
  • ROCKS is layered on top of the RedHat distribution
  • ROCKS is extensible
  ROCKS provides us with the framework and tools necessary to meet our design requirements simply and efficiently.
  http://www.rocksclusters.org
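  In day-to-day use this mostly comes down to a couple of ROCKS commands. A minimal sketch, assuming ROCKS 3.x behavior (both commands exist in ROCKS, though exact flags vary by version, and the node name below is illustrative):

    # Register a new node: insert-ethers listens for the node's DHCP
    # request, assigns it an appliance type and name, and records it
    # in the ROCKS MySQL database.
    insert-ethers

    # Force an already-registered node to reinstall itself from its
    # generated kickstart file.
    shoot-node compute-0-0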

  14. ROCKS: the 5-Minute Tour
  ROCKS builds on top of RedHat kickstart technology.
  • A simplistic kickstart digression
    • A kickstart is a single ASCII script
    • Lists individual RPMs and/or groupings of RPMs to be installed
    • Provides staged installation sections: %pre, %main, %post …
  • Anaconda, the installer
    • Parses and processes kickstart commands and installs the system
  This is in fact what you interact with when you install RedHat on your desktop.
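  For concreteness, a minimal kickstart file might look something like this. This is a sketch only; every value below is illustrative rather than taken from our installation:

    # Minimal kickstart sketch; all values here are illustrative.
    install
    lang en_US
    keyboard us
    timezone America/New_York
    rootpw changeme
    bootloader --location=mbr
    clearpart --all --initlabel
    autopart

    %packages
    @ Base
    openssh-server

    %pre
    echo "shell commands run before installation starts"

    %post
    echo "shell commands run after packages are installed"

  Anaconda reads the command section at the top, installs the listed package groups and RPMs, and runs the %pre and %post scripts at the corresponding stages.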

  15. ROCKS: the 5-Minute Tour (cont.)
  ROCKS enhances kickstart by:
  • Providing the machinery to push installations to servers
    • DHCP: node identity assigned to a specific MAC address
    • https: protocol used to exchange data (kickstart, images, RPMs)
    • CGI script: generates the kickstart file on the fly
  • Providing a kickstart management system
    • The kickstart generator parses user-defined XML spec files and combines them with node-specific information stored in a MySQL database
    • The complete system description is packaged in components grouped into logical object modules
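  A rough sketch of one of those XML spec files, a ROCKS “node” file. The element names follow the ROCKS convention; the description, package, and script contents are invented for illustration:

    <?xml version="1.0" standalone="no"?>
    <kickstart>
      <description>
        Example node: adds one package and a post-install step.
      </description>

      <!-- Each <package> becomes an entry in the generated %packages section -->
      <package>openssh-server</package>

      <!-- The <post> body is emitted into the generated %post section -->
      <post>
      echo "site customization goes here" >> /etc/motd
      </post>
    </kickstart>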

  16. ROCKS: the 5-Minute Tour (cont.)
  The standard ROCKS graph describes a single cluster with service nodes. [Graph diagram] Note the use of “rolls”.
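  The graph itself is also XML: edges connect appliances to the node files they pull in, so an appliance is defined by everything reachable from it. A hedged sketch, using the edge syntax from the ROCKS customization documentation with illustrative node names:

    <?xml version="1.0" standalone="no"?>
    <graph>
      <!-- Both appliances inherit everything reachable from "base" -->
      <edge from="compute" to="base"/>
      <edge from="frontend" to="base"/>

      <!-- Only the frontend picks up the DHCP server configuration -->
      <edge from="frontend" to="dhcp-server"/>
    </graph>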

  17. ROCKS’ MySQL/XML
  Physical servers are the objects; the XML-graph appliances are like the classes. ROCKS’ MySQL database holds the global variables, MAC addresses, node names, distribution, membership, and appliance assignments.
  [Diagram, appliance-to-server mapping: frontend → uflorida-frontend; gatekeeper-pg → ufloridaPG; gatekeeper-grid01 → ufgrid01; gatekeeper-dgt → ufloridaDGT; compute-pg → grinux01 … grinux40; compute-grid01 → grinuxN; compute-dgt → grinuxM]
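  In the object/class analogy, assigning a server to an appliance is just a row in the database. A purely illustrative query sketch; the table and column names here are assumptions based on the slide’s description, not the actual ROCKS schema:

    -- Hypothetical schema: list each physical server with its
    -- appliance "class".
    SELECT n.Name, n.MAC, a.Name AS appliance
    FROM nodes n
    JOIN memberships m ON n.Membership = m.ID
    JOIN appliances a ON m.Appliance = a.ID;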

  18. Facility Systems Administration: Implementation

  19. ROCKS at UFlorida
  • Current installation is based on ROCKS 3.2.0
    • Basic graph architecture modified to meet our requirements
      • A single frontend manages multiple clusters and service nodes
      • Supports dual distributions: RH7.3 & RHEL 3
    • Direct interaction with the ROCKS MySQL database
  • Extensive modification of the XML trees
    • The XML tree is based on an older ROCKS version, 2.3.2
    • Our own “uflorida” graphs, one for each distribution
    • Many changes & additions to the standard XML files

  20. The uflorida XML graphs
  [Graph diagrams: uflorida.xml for RHEL 3 servers and uflorida.xml for RH7.3 servers]

  21. The Servers
  [Network diagram: WAN and private LAN connecting an RHEL 3 side (GriPhyN: archer, micanopy, alachua; ufgrid03, ufgrid04; grinux41 … grinuxN+1) and an RH7.3 side (ufgrid01, ufgrid02, ufloridaPG, ufloridaDGT, ufdcache; grinux01 … grinux40, grinuxN, grinuxM), plus uflorida-frontend, nfs-homes, gatoraid1, and gatoraid2]
  The diagram tags each group of servers with one of three access policies:
  • Grid users only, with limited access from the WAN
  • All users, with limited access from the WAN
  • No user logins and very limited or no access from the WAN

  22. The Services
  [Same network diagram as the previous slide]
  • GriPhyN & Analysis servers
    • User login machines
    • User environment
  • User Interface machines
    • Grid user login
    • Access to the grid via Condor-G (see the sketch after this list)
  • NFS servers
    • Users’ home areas
    • Users’ data
    • Grid users
  • The big kahuna: our stripped-down version of the ROCKS frontend
    • ROCKS administration node
      • RPM server
      • ROCKS DB server
      • Kickstart generation
      • DHCP server
      • Admin NFS server
      • etc.
    • No users, no logins, no home …
    • Strict firewall rules
    • Primary DNS server
  • Other services
    • cvs pserver
    • webserver
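  As an illustration of the Condor-G path onto the grid, a minimal submit description of this era might look like the following. The universe = globus and globusscheduler commands are the standard Condor-G idioms of the period, but the gatekeeper contact string and file names are invented for the example:

    # Hedged Condor-G submit sketch; host and executable are illustrative.
    universe        = globus
    globusscheduler = ufloridapg.phys.ufl.edu/jobmanager-condor
    executable      = analysis_job.sh
    output          = job.out
    error           = job.err
    log             = job.log
    queue

  Submitted with condor_submit, a job like this is handed to the remote GRAM gatekeeper rather than to the local batch system.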

  23. Batch System Services
  [Same network diagram; the gatekeeper / master nodes are highlighted]
  • Gatekeeper / master nodes
    • Grid access only
    • GRAM, GSIFTP …
    • Other services kept to a minimum

  24. dCache Disk Storage Service
  dCache with SRM: a virtual file system (pnfs) with caching.
  [Same network diagram, highlighting the storage elements]
  • dCache pool nodes: 40 x 50 GB partitions
  • dCache administrator “admin door”
  • SRM & dCache webserver
  • RAID fileserver: entire 2 TB of disk on dCache
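  From a user’s point of view, files in the pnfs namespace can be read through the dcap door with dCache’s dccp copy client. A hedged usage sketch: dccp is the standard dCache copy tool, but the host, port, and path below are invented for illustration:

    # Copy a file out of dCache via the dcap protocol (illustrative names).
    dccp dcap://ufdcache.phys.ufl.edu:22125/pnfs/phys.ufl.edu/data/demo.root /tmp/demo.root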

  25. The uflorida XML graphs (annotated)
  [RHEL 3 graph: the analysis cluster, with NIS server and clients and the analysis environment. RH7.3 graph: the production cluster, with dCache pool compute nodes]

  26. UFlorida Website
  • Currently just the default ROCKS website, including the Ganglia monitoring pages
  • Ours is accessible only from the .ufl.edu domain

  27. Status and Plans

  28. Facility’s Status
  • Participated in all major CMS data challenges
    • Since Fall 2001: a major US contributor of events
    • Since the end of 2002: Grid-only MC production
  • Contributed to CDF analysis
  • UFlorida group: 4 faculty & staff and 2-3 students
  • Recently re-deployed infrastructure
    • Motivated mostly by security concerns
    • Added new clusters for additional user communities
  • Ramping up to support a full US-CMS Tier2 center

  29. Facility’s Future
  • Plan to redeploy with ROCKS 3.3.x
    • Recast our own 2.3.2-based XML files as real 3.3.x ones
    • Provides access to new features
    • Recast our tweaked and tuned services in terms of Rolls
    • Make use of the new ROCKS-on-the-WAN features
  • Improve collaboration with our peers
    • With US-CMS FNAL (Tier1)
    • Statewide with FIU, FSU (Tier3) and campus-wide with HPC
    • With other US-CMS & ATLAS Tier2 centers

  30. Facility’s Future (cont.)
  • New hardware is always coming
    • New dedicated analysis and file servers on order
      • 6 new dual-Xeon-based servers
      • 8.0 TB of disk
    • New Opteron systems on order
  • Participate in the SC ’04 bandwidth challenge (CIT to JAX)
    • Connect to the new FLR 10 GigE network equipment
  • Official Tier2 center status will bring new hardware
    • Approximately 120 servers
    • An additional 200 TB of storage

  31. Summary
  We have successfully built, from scratch, a computing facility at the University of Florida.
  • Supports HEP experiments (LHC, FNAL …)
  • Supports Computer Science activities
  • Expects a much more active analysis/user community
  The infrastructure is designed to support a large increase in hardware, serving a larger community of users, in anticipation of LHC turn-on and beyond.
