FermiGrid Keith Chadwick Fermilab Work supported by the U.S. Department of Energy under contract No. DE-AC02-07CH11359.
Outline • Introduction - Personnel, Strategy & Effort. • Current Architecture & Software Stack • Authentication and Authorization • High Availability Services • Storage, Auditing & Accounting • Computing Clusters within FermiGrid • Areas of Recent Work FermiGrid - BoilerGrid
Personnel • Eileen Berman, Fermilab, Batavia, IL 60510 berman@fnal.gov • Philippe Canal, Fermilab, Batavia, IL 60510 pcanal@fnal.gov • Keith Chadwick, Fermilab, Batavia, IL 60510 chadwick@fnal.gov * • David Dykstra, Fermilab, Batavia, IL 60510 dwd@fnal.gov • Ted Hesselroth, Fermilab, Batavia, IL 60510 tdh@fnal.gov • Gabriele Garzoglio, Fermilab, Batavia, IL 60510 garzogli@fnal.gov • Chris Green, Fermilab, Batavia, IL 60510 greenc@fnal.gov • Tanya Levshina, Fermilab, Batavia, IL 60510 tlevshin@fnal.gov • Don Petravick, Fermilab, Batavia, IL 60510 petravick@fnal.gov • Ruth Pordes, Fermilab, Batavia, IL 60510 ruth@fnal.gov • Valery Sergeev, Fermilab, Batavia, IL 60510 sergeev@fnal.gov * • Igor Sfiligoi, Fermilab, Batavia, IL 60510 sfiligoi@fnal.gov • Neha Sharma, Fermilab, Batavia, IL 60510 neha@fnal.gov * • Steven Timm, Fermilab, Batavia, IL 60510 timm@fnal.gov * • D.R. Yocum, Fermilab, Batavia, IL 60510 yocum@fnal.gov * FermiGrid - BoilerGrid
Strategy behind FermiGrid • Strategy: • In order to better serve the entire program of Fermilab, the Computing Division has undertaken the strategy of placing all of its production resources in a Grid "meta-facility" infrastructure called FermiGrid. • This strategy is designed to allow Fermilab: • to ensure that the large experiments that currently have dedicated resources retain first-priority use of the resources purchased on their behalf. • to allow opportunistic use of these dedicated resources, as well as other shared Farm and Analysis resources, by the various Virtual Organizations (VOs) that participate in the Fermilab experimental program and by certain VOs that use the Open Science Grid (OSG). • to optimize the use of resources at Fermilab. • to provide a coherent way of putting Fermilab on the Open Science Grid. • to save effort and resources by implementing certain shared services and approaches. • to work together more coherently to move all of our applications and services to run on the Grid. • to better handle the transition from Run II to the LHC in a time of shrinking budgets and possibly shrinking resources for Run II worldwide. • to fully support the Open Science Grid and the LHC Computing Grid, and to gain positive benefit from this emerging infrastructure in the US and Europe. FermiGrid - BoilerGrid
What is FermiGrid? • FermiGrid is: • The Fermilab campus Grid and Grid portal. • The site globus gateway. • Accepts jobs from external (to Fermilab) sources and forwards the jobs onto internal clusters. • A set of common services to support the campus Grid and interface to Open Science Grid (OSG) / LHC Computing Grid (LCG): • VOMS, VOMRS, GUMS, SAZ, MyProxy, Squid, Gratia Accounting, etc. • A forum for promoting stakeholder interoperability and resource sharing within Fermilab: • CMS, CDF, D0; • ktev, miniboone, minos, mipp, etc. • The Open Science Grid portal to Fermilab Compute and Storage Services. • FermiGrid Web Site & Additional Documentation: • http://fermigrid.fnal.gov/ FermiGrid - BoilerGrid
Timeline and Effort • The FermiGrid concept started in mid CY2004 and the FermiGrid strategy was formally announced in late CY2004. • The initial hardware was ordered and delivered in early CY2005. • The initial core services (Globus Gateway, VOMS and GUMS), based on OSG 0.2.1, were commissioned on April 1, 2005. • We missed the ides of March, so we chose April Fools Day… • Our first site gateway, which used Condor-G matchmaking and MyProxy, was commissioned in the fall of CY2005. • Job forwarding based on work by GridX1 in Canada (http://www.gridx1.ca). • Users were required to store a copy of their delegated grid proxy in our MyProxy repository prior to using the job forwarding gateway. • OSG 0.4 was deployed across FermiGrid in late January-February 2006. • Followed quickly by OSG 0.4.1 in March 2006. • The Site AuthoriZation (SAZ) service was commissioned on October 2, 2006. • Provides site-wide whitelist and blacklist capability. • Can make decisions based on any of DN, VO, Role, and CA. • Currently operates in default accept mode (provided that the presented proxy was generated using voms-proxy-init). • The glexec pilot job glide-in service was commissioned on November 1, 2006. • Provides an authorization and accounting trail for Condor glide-in jobs. • The latest version of the site job forwarding gateway (jobmanager-cemon) was commissioned in November 2006. • Eliminated the need to utilize MyProxy via the "accept limited" option on the gatekeeper. • Based on CEMon, OSG ReSS, and Condor matchmaking. • Periodic hold and release functions were added in March 2007. • OSG 0.6.0 was deployed across FermiGrid during March & April 2007. • Average effort over this period = ~3 FTE. • Effort profile on next slide. FermiGrid - BoilerGrid
Effort Profile FermiGrid - BoilerGrid
FermiGrid - Current Architecture • Step 1 - The user issues voms-proxy-init and receives VOMS-signed credentials. • Step 2 - The user submits their grid job via globus-job-run, globus-job-submit, or Condor-G. • Step 3 - The site-wide gateway checks the job against the Site AuthoriZation (SAZ) service. • Step 4 - The gateway requests a GUMS mapping based on VO & Role. • Step 5 - The grid job is forwarded to the target cluster. • Diagram components: VOMS server (periodic synchronization to the GUMS server), GUMS server, SAZ server, and BlueArc storage; the clusters (CMS WC1, CMS WC2, CDF OSG1, CDF OSG2, D0 CAB1, D0 CAB2, GP Farm) send ClassAds via CEMon to the site-wide gateway; the diagram also marks the Exterior/Interior network boundary. FermiGrid - BoilerGrid
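For example (illustrative only), Step 2 can be as simple as running a job interactively through the site gateway:
    globus-job-run fermigrid1.fnal.gov/jobmanager-cemon /bin/hostname   # prints the hostname of whichever worker node the forwarded job lands on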
Software Stack • Baseline: • Scientific Linux (SL) 3.0.x, 4.x, 5.0 (just released) • OSG 0.6.0 (VDT 1.6.1, GT 4, WS-Gram, Pre-WS Gram) • Additional Components: • VOMS (VO Management Service) • VOMRS (VO Membership Registration Service) * • GUMS (Grid User Mapping Service) • SAZ (Site AuthoriZation Service) • jobmanager-cemon (job forwarding job manager) * • MyProxy (credential storage) • Squid (web proxy cache) • syslog-ng (auditing) • Gratia (accounting) * • Xen (virtualization) • Linux-HA (high availability) FermiGrid - BoilerGrid
Jobmanager-cemon MatchMaking Service • What is it? • FermiGrid has a matchmaking service deployed on the central gatekeeper (fermigrid1.fnal.gov). This service matches incoming jobs against the various resources available at the time each job is submitted. • How can users make use of the MatchMaking Service? • Users begin by submitting jobs to the fermigrid1 central gatekeeper through jobmanager-condor or jobmanager-cemon. • By default, the value of the "requirements" attribute is set such that the user's job will be matched against clusters which support the user's VO (Virtual Organization) and have at least one free slot available at the time the job is submitted to fermigrid1. • However, users can add additional conditions to this "requirements" attribute, using the attribute named "gluerequirements" in the condor submit file (see the sketch after this slide). • These additional conditions should be specified in terms of Glue Schema attributes. • More information: • http://fermigrid.fnal.gov/matchmaking.html FermiGrid - BoilerGrid
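A minimal Condor-G submit file sketch for the forwarding gateway; the executable and file names are placeholders, and the "gluerequirements" value is a hypothetical Glue Schema constraint (see the URL above for the authoritative syntax):
    # Condor-G submit file - illustrative sketch only
    universe          = grid
    grid_resource     = gt2 fermigrid1.fnal.gov/jobmanager-cemon
    executable        = myjob.sh
    output            = myjob.out
    error             = myjob.err
    log               = myjob.log
    # extra matchmaking condition expressed with Glue Schema attributes (hypothetical example)
    +gluerequirements = "GlueHostMainMemoryRAMSize >= 2048"
    queue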
Authorization & Authentication • DOEgrids Certificate Authority: • Long-lived (1 year) certificates. • Heavyweight process (from the perspective of the typical user…). • Fermilab Kerberos Certificate Authority: • Service runs on the Fermilab Kerberos Domain Controllers. • Most Fermilab personnel already have Kerberos accounts for single sign-on. • Lighter-weight process than DOEgrids (from the user's perspective). • Short-lived (1-7 day) certificates: • kinit -n -r7d -l26h • kx509 • kxlist -p • voms-proxy-init -noregen -voms fermilab:/fermilab -valid 168:0 • submit grid job • Supports cron jobs through kcroninit: • /usr/krb5/bin/kcron <script> • /DC=gov/DC=fnal/O=Fermilab/OU=Robots/CN=cron/CN=Keith Chadwick/UID=chadwick • TAGPMA + IGTF FermiGrid - BoilerGrid
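A minimal sketch of a script that could be run under kcron to obtain credentials and submit a job non-interactively; the gateway and job shown here are assumptions for illustration:
    #!/bin/sh
    # kcron supplies the Robots/CN=cron/... Kerberos credential shown above
    kx509                                                            # convert the Kerberos ticket into a short-lived certificate
    voms-proxy-init -noregen -voms fermilab:/fermilab -valid 168:0   # add fermilab VOMS attributes to the proxy
    globus-job-submit fermigrid1.fnal.gov/jobmanager-cemon /bin/hostname   # submit a batch job through the forwarding gateway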
Virtual Organization (VO) Services • VOMRS • VO Membership Registration Service. • Used by individuals to apply to join a VO or to request specific roles within the VO. • A workflow manages the approval or rejection of the individual's application. • Pushes configurations into the VOMS server. • VOMS • VO Management Service. • Authority for the DNs in a VO and the authorized groups and/or roles (FQANs). • GUMS • Grid User Mapping Service. • Periodically synchronizes information from the VOMS servers to a local repository (every six hours). • The list of VOMS servers is configured by the system administrator. • Maps DN + optional FQAN to a site-specific UID/GID. • SAZ • Site AuthoriZation Service. • Site whitelist / blacklist. • Can be configured for default accept or default deny. • A site can operate multiple SAZ services. FermiGrid - BoilerGrid
VOs hosted at FermiGrid • VO - Focus Area - URL • fermilab - Fermilab Experimental Program - http://www.fnal.gov/faw/experimentsprojects/index.html (sub-groups: accelerator, astro, cdms, hypercp, ktev, miniboone, minos, mipp, nova, numi, patriot, test, theory) • des - Dark Energy Survey - https://www.darkenergysurvey.org/ • dzero - D0 experiment - http://www-d0.fnal.gov • i2u2 - Interactions in Understanding the Universe - http://ed.fnal.gov/uueo/i2u2.html • ilc - International Linear Collider - http://ilc.fnal.gov • lqcd - Lattice QCD computations - http://lqcd.fnal.gov • nanohub - Nanotechnology - http://www.nanohub.org • gadu - Bioinformatics - http://compbio.mcs.anl.gov/gaduvo/gaduvo.cgi • sdss - Sloan Digital Sky Survey - http://www.sdss.org • osg - OSG individual researchers or small groups VO - http://www.opensciencegrid.org/ FermiGrid - BoilerGrid
Site AuthoriZation Service (SAZ) • Registration with the Fermilab SAZ service is automatic and transparent: it occurs when a grid user first runs a job on FermiGrid, at which point the user's DN, VO, Role and CA "affiliation" are entered into its database. • voms-proxy-init vs. grid-proxy-init: • The current SAZ policy is to allow access by default to users who present a proxy certificate generated by voms-proxy-init [DN.enabled=Y]. • On a case-by-case basis, users who present a proxy certificate generated by grid-proxy-init can be allowed access [DN.trusted=Y]. • SAZ allows the Fermilab security authorities to impose a site-wide grid access policy based upon one or more of the grid user's DN, VO, Role or CA. This can be used to implement a rapid response to an actual or suspected computer security incident without the need to: • remove a user from a VOMS server and then wait for the change to propagate through to the GUMS server, or • wait for CRL propagation. • It also allows for the temporary suspension of a DN, VO, Role or CA during the investigation of a suspected computer security incident. FermiGrid - BoilerGrid
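The distinction between the two proxy types (both commands appear elsewhere in this talk):
    voms-proxy-init -voms fermilab:/fermilab   # proxy carries VOMS attributes - accepted by default (DN.enabled=Y)
    grid-proxy-init                            # plain proxy, no VOMS attributes - accepted only for DNs explicitly marked trusted (DN.trusted=Y)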
SAZ - Animation • Diagram: the gatekeeper extracts the DN, VO, Role and CA from each incoming job and queries the SAZ server, whose whitelist/blacklist entries are maintained by the site administrator, before accepting the job. FermiGrid - BoilerGrid
SAZ - IP Connections / Day FermiGrid - BoilerGrid
SAZ - Unique DNs, VOs, Roles & CAs / Day FermiGrid - BoilerGrid
GUMS • Within FermiGrid, we use the Grid User Mapping Service (GUMS) V1.1.0 • We are starting to plan our upgrade to GUMS V1.2 • GUMS was developed at BNL to manage the mappings between Grid certificates and Linux UID/GIDs. FermiGrid - BoilerGrid
GUMS Mapping Types • There are three types of mapping supported by GUMS (illustrated in the sketch after this slide): • many-to-one: • All members of the same VO having the same extended key attributes are mapped to a single local UID. • In OSG the many-to-one mapping is an acceptable usage case. • With many-to-one mappings it is possible to obtain the proxy of another user running on the same batch system. • Currently, the OSG "prevents" this through policies set in the Acceptable Usage Policy (AUP) document that all OSG users must electronically sign when they register with their VO. • one-to-one: • A set of pool accounts is created on the batch system and GUMS maps a user DN with extended key attributes to a single, specific pool account. The mapping is maintained in the GUMS database so that each subsequent mapping request from that user with that specific set of extended key attributes results in the same pool account UID at that site. • Multiple users in the same VO therefore cannot obtain another user's proxy certificate, nor can they accidentally delete or overwrite another user's data. • This also makes it easier to trace a rogue job in the queue back to the originating user. • The site GUMS administrator must monitor the usage of the pool accounts, and the system administrators must make sure that there are enough pool accounts available on each worker node (WN) in the batch system. • one-to-self: • As the name suggests, a user is mapped to their own local UID on the batch system. • This mapping mechanism is employed at Brookhaven National Laboratory (BNL). • The mapping scheme that is used at a site is a decision of the site administrators based on the request of the VO using the site. At Fermilab, the [us]CMS VO has requested that users be mapped one-to-one, while the D-Zero VO is using the many-to-one mapping. FermiGrid - BoilerGrid
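Purely to illustrate the three outcomes (the DNs and account names below are hypothetical), the resulting local mappings are equivalent to grid-mapfile entries of the form:
    # many-to-one: every member of the VO/role shares one account
    "/DC=org/DC=doegrids/OU=People/CN=Alice Example"  dzerogrid
    "/DC=org/DC=doegrids/OU=People/CN=Bob Example"    dzerogrid
    # one-to-one: each DN is pinned to its own pool account
    "/DC=org/DC=doegrids/OU=People/CN=Carol Example"  uscms0017
    # one-to-self: the DN maps to the user's own local account
    "/DC=org/DC=doegrids/OU=People/CN=Dave Example"   dave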
GUMS Mappings - CMS and dteam FermiGrid - BoilerGrid
GUMS Scaling • The scalability of the GUMS server has been remarkable. • The GUMS server hardware, like all of the current FermiGrid core systems, consists of dual 3.6 GHz Intel Xeon processors with 4 GB of PC2100 DDRAM, two mirrored 10K rpm SCSI system disks and a single gigabit ethernet connection to the LAN. • GUMS is implemented as a Java servlet and runs in Tomcat. • The server is able to sustain a load of 450,000 mapping requests per day (5.2 mappings per second) with an average (1 minute) system load of about 2-3. • It is anticipated that the system will easily be able to accommodate CMS production when the mapping requests increase to >20 Hz. FermiGrid - BoilerGrid
Certificates & Mappings per Day FermiGrid - BoilerGrid
glexec • Joint development by NIKHEF (David Groep / Gerben Venekamp / Oscar Koeroo) and Fermilab (Igor Sfiligoi / Dan Yocum). • glexec allows a site to implement the same level of authentication, authorization and accounting for glide-in jobs on the WN as for jobs which are presented to the globus gatekeeper of the CE. • Integrated (via “plugins”) with LCAS / LCMAPS infrastructure (for LCG) and GUMS / SAZ infrastructure (for OSG). • glexec is currently deployed on several clusters at Fermilab. • The SAZ IP Connections / Day metric is the easiest way to see it… • Will be included “natively” in Condor 6.9.x. FermiGrid - BoilerGrid
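A minimal sketch of how a glide-in pilot invokes glexec on a worker node; the environment variable name follows the standard glexec interface, but the proxy path and install location here are assumptions:
    # the pilot points glexec at the payload user's delegated proxy;
    # glexec consults GUMS/SAZ and re-executes the command under the mapped UID
    export GLEXEC_CLIENT_CERT=/tmp/payload_proxy.pem    # hypothetical path to the payload user's proxy
    /opt/glexec/sbin/glexec /bin/id                     # hypothetical install path; prints the mapped identity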
glexec block diagram FermiGrid - BoilerGrid
Gratia Accounting • The Gratia accounting system is designed to be a robust, scalable, and accurate grid accounting service for OSG. • Interoperates with the EGEE accounting system. • Utilizes a "store locally and forward" model, sending records to a central collector (repository) in order to ensure that accounting data are not lost. • Sites can operate a local site Gratia collector which summarizes site data and then forwards the summaries to the OSG-wide Gratia collector. • The Gratia "probes" interface between the Globus gatekeeper and the site-specific batch system to capture the accounting data. • Condor, PBS, SGE, glexec, etc. FermiGrid - BoilerGrid
FermiGrid-HA Upgrades • Two key technologies: • Linux-HA • Active-Active or Active-Standby depending on the service. • XEN • Virtualization of services. • These technologies will be used for: • FermiGrid-HA Site Globus Gatekeeper • Including Web Services GRAM. • FermiGrid-HA Services • VOMS, GUMS, SAZ, etc. • FermiGrid-HA vobox/edge services • FermiGrid-HA grid development platforms FermiGrid - BoilerGrid
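A minimal Linux-HA (heartbeat) configuration sketch for an active-standby pair; the node names, service IP and managed resource are hypothetical:
    # /etc/ha.d/ha.cf (excerpt)
    node fg-xen1 fg-xen2        # the two (hypothetical) Xen guests hosting the service
    keepalive 2                 # heartbeat interval in seconds
    deadtime 30                 # declare the peer dead after 30 seconds of silence
    auto_failback on
    # /etc/ha.d/haresources
    fg-xen1 IPaddr::131.225.0.100 tomcat5   # floating service IP plus the Tomcat instance running the grid service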
FermiGrid HA Services • Diagram: two hosts each run three Xen guests (XEN1, XEN2, XEN3) hosting VOMS, GUMS and SAZ instances; all instances are active-active, coordinated by Linux-HA heartbeat, with shared storage on BlueArc. FermiGrid - BoilerGrid
FermiGrid HA Gatekeeper • Diagram: two hosts each run Xen guests (XEN1, XEN2, XEN3) hosting the Globus gatekeeper, the Condor negotiator and the information gatherer; the gatekeepers and information gatherers run active-active while the Condor negotiator runs active-standby, coordinated by Linux-HA heartbeat with shared storage on BlueArc. FermiGrid - BoilerGrid
Auditing Grid Services • Recently completed initiative within FermiGrid: • Taking advantage of the existing Fermilab Computer Security Team infrastructure. • Syslog-ng is now pushing syslogs and grid service logs to the Fermilab site wide syslog repository. • The stored syslogs are automatically indexed using the commercial product “Splunk” and the results of the indexing can be queried via a web interface. FermiGrid - BoilerGrid
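A minimal syslog-ng sketch of the forwarding described above; the log file and central repository hostname are assumptions, not the production Fermilab configuration:
    source s_gatekeeper { file("/var/log/globus-gatekeeper.log"); };   # a grid service log to audit
    destination d_site { tcp("syslog-repo.fnal.gov" port(514)); };     # hypothetical site-wide syslog repository
    log { source(s_gatekeeper); destination(d_site); };                # ship the log to the central collector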
FermiGrid - Storage • BlueArc NFS Shared File Systems: • Purchased 24 TBytes (raw) = 14 TBytes (formatted, RAID 6). • Currently mounted on (most) FermiGrid clusters. • blue2:/fermigrid-home /grid/home 1 TByte • blue2:/fermigrid-login /grid/login 1 TByte • blue2:/fermigrid-app /grid/app 2 TBytes • blue2:/fermigrid-data /grid/data 7 TBytes • blue2:/fermigrid-state /grid/state 1 TByte • Public dCache (FNAL_FERMIGRID_SE): • 7 TBytes of storage. • Accessible via SRM/dCache. FermiGrid - BoilerGrid
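For example (the SRM endpoint and path below are illustrative assumptions, not the published SE contact string), a file can be copied into the public dCache SE with the srmcp client:
    # copy a local file into the FNAL_FERMIGRID_SE storage element via SRM
    srmcp file:////local/data/myfile.dat srm://fndca1.fnal.gov:8443/pnfs/fnal.gov/data/fermigrid/myfile.dat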
Unique VOs on FermiGrid by Month FermiGrid - BoilerGrid
CPU Time by Organization on FermiGrid per month FermiGrid - BoilerGrid
CPU Time by Organization on FermiGrid per month (stacked) FermiGrid - BoilerGrid
Computing Clusters • In addition to the core Grid services, Fermilab operates clusters for four major (internal) clients: • General Purpose Grid Cluster: • Managed by Experimental Facilities dept in conjunction with FermiGrid Services. • Used by smaller groups and experiments. • Currently - 628 VM's on the grid, another 480 will be coming soon. • CDF Clusters: • Managed by Experimental Facilities dept in conjunction with FermiGrid Services. • Currently - 2 production grid clusters (520 WN / 2600 VM’s), 2 analysis/production (non-grid) clusters (~600 WN / ~3000 VM’s) and 1 test/integration cluster (8 WN / 40 VM’s). • Future - 3 production grid clusters (~1200 WN / ~6000 VM’s) and 1 test/integration cluster (8 WN / 40 VM’s). • D0 Clusters: • Managed by Experimental Facilities dept in conjunction with FermiGrid Services. • Currently - 2 production grid clusters (860 WN / 3440 VM’s) and 1 farm/analysis cluster (360 WN / 1440 VM’s). • Future - 3 production grid clusters (~1200 WN / ~4800 VM’s). • CMS: • Operated by the CMS Computing Facilities Department. • 1st cluster has more than 1900 VM's on the grid. • A second cluster with 2400 VM’s has just been commissioned. FermiGrid - BoilerGrid
GP Grid Cluster • This is the “General Purpose” Grid Cluster • Used by smaller groups and experiments. • Currently 628 VM's on the grid, 480 more coming soon FermiGrid - BoilerGrid
GP Grid Cluster - Today • Diagram summary: fngp-osg is the OSG gatekeeper and runs a condor_schedd and CEMon; fnpcsrv1 provides monitoring, the dFarm server, user login / interactive submit, a condor_schedd, and I/O (AFS, Enstore); fnpcsrv2 runs the Condor negotiator/collector and the postgres server; jobs also arrive via the site gateway fermigrid1 (OSG). The cluster currently has 162 worker nodes in production plus 60 worker nodes that just completed burn-in. (The diagram also distinguishes FGS support, FGS-advised, other support and FEF OS support, and production vs. integration systems.) FermiGrid - BoilerGrid
GP Grid Cluster - Future • Diagram summary: fngp-osg remains the OSG gatekeeper with condor_schedd and CEMon; fnpcsrv2 runs the Condor negotiator/collector and the postgres server; fnpcsrv1 becomes the user login & grid submit node; jobs also arrive via the site gateway fermigrid1 (OSG). The cluster grows to 222 worker nodes plus 40 new worker nodes. (Support roles and production vs. integration systems are distinguished as in the previous slide.) FermiGrid - BoilerGrid
CDF Clusters • Present: • 2 production grid clusters (520 WN / 2600 VM’s). • 2 analysis/production (non-grid) clusters (~600 WN / ~3000 VM’s). • 1 test/integration cluster (8 WN / 40 VM’s). • Future: • 3 production grid clusters (~1200 WN / ~6000 VM’s). • 1 test/integration cluster (8 WN / 40 VM’s). FermiGrid - BoilerGrid
CDF Clusters - Today • Diagram summary: the OSG gatekeepers fcdfosg1 and fcdfosg2 front CDF Grid Cluster 1 (120 WN, Condor) and CDF Grid Cluster 2 (400 WN, just completed burn-in, Condor); the head nodes fcdfhead2 (glidecaf), fcdfhead5 (namcaf), head3 and fcdfhead1, together with fncdfsrv0, serve the CAF (Condor) and the Production Farm (Condor); fcdfosgt1 and fgtest1 front the Test Farm (8 WN, Condor); jobs also arrive via the site gateway fermigrid1 (OSG). (Support roles - FGS, CDF, FEF OS - and production vs. integration systems are also indicated.) FermiGrid - BoilerGrid
CDF Clusters - Future • Diagram summary: the OSG gatekeepers fcdfosg1, fcdfosg2 and fcdfosg3 front CDF Grid Cluster 1 (240 WN, Condor), CDF Grid Cluster 2 (400 WN, Condor) and CDF Grid Cluster 3 (~500 WN, Condor, built from the old CAF); the head nodes fcdfhead2 (glidecaf), fcdfhead5 (namcaf) and fcdfhead1 remain; fcdfosgt1 and fgtest1 front the Test Farm (8 WN); jobs also arrive via the site gateway fermigrid1 (OSG). FermiGrid - BoilerGrid
D0 Clusters • Present: • 2 production grid clusters (860 WN / 3440 VM’s). • 1 farm/analysis cluster (360 WN / 1440 VM’s). • Future: • 3 production grid clusters (~1200 WN / ~4800 VM’s). FermiGrid - BoilerGrid
D0 Clusters - Today • Diagram summary: the OSG gatekeepers d0cabosg1 and d0cabosg2, together with the head nodes d0cabsrv1 and d0cabsrv2, front 400 worker nodes (PBS batch system) and 460 worker nodes (PBS batch system, ~180 to be decommissioned in Fall 2007); fnd0srv1 and fnd0srv2 front the d0farm of ~360 worker nodes (FBSNG batch system); jobs also arrive via the site gateway fermigrid1 (OSG) and from ClueD0. (Support roles - D0 (?) advises, FGS, D0, FEF OS - and production vs. integration systems are also indicated.) FermiGrid - BoilerGrid
D0 Clusters - Near Term • Diagram summary: d0cabosg1 and d0cabosg2 front 400 worker nodes (PBS batch system) and 460 worker nodes (PBS batch system, ~180 to be decommissioned in Fall 2007); fnd0srv1 and fnd0srv2 front the d0farm of ~360 worker nodes (FBSNG batch system); jobs also arrive via the site gateway fermigrid1 (OSG) and from ClueD0. FermiGrid - BoilerGrid
D0 Clusters - Long Term • Diagram summary: d0cabosg1, d0cabosg2 and d0cabosg3 front 460 worker nodes (PBS batch system, ~180 to be decommissioned in Fall 2007), 400 worker nodes (PBS batch system) and additional worker nodes (PBS batch system); jobs arrive via the site gateway fermigrid1 (OSG) and from ClueD0. FermiGrid - BoilerGrid
User Experience on FermiGrid/OSG • VOs at Fermilab using other OSG resources: • Our experience is not very good. • SDSS VO - 82 OSG sites, 41 claim to "support" the VO, but SDSS can run their application on only 5 sites. • Fermilab VO - 82 OSG sites, 42 claim to "support" the VO, but only 30 sites pass our minimal acceptance tests. • Sites are not very responsive to problem tickets. • VOs on OSG using Fermilab resources: • Fermilab operates as a "universal donor" of opportunistic cycles to OSG VOs. • Gratia accounting data show that ~10% of the Grid systems at Fermilab are being used opportunistically. • But there are VOs which run into problems with our job forwarding site gateway and with handling the heterogeneous cluster configurations. • SAZ has also caused problems for nanohub (a version with the fix was deployed this past Monday). • There are also VOs which desire significant amounts of resources beyond what we are able to contribute opportunistically (hey buddy, can you spare 7 terabytes of disk? We only need it for the next year…). • VOs from other Grids using OSG/Fermilab resources: • We participate in the GIN efforts. • Most recently - individuals from PRAGMA have been able to make successful use of Fermilab Grid computing resources. FermiGrid - BoilerGrid
Fermilab VO Acceptance FermiGrid - BoilerGrid
fin • Any Questions? FermiGrid - BoilerGrid