Florida Tech Grid Cluster • P. Ford2 * X. Fave1 * M. Hohlmann1 • High Energy Physics Group • 1Department of Physics and Space Sciences • 2Department of Electrical & Computer Engineering
History • Original conception in 2004 with FIT ACITC grant. • 2007 - Received over 30 more low-end systems from UF. Basic cluster software operational. • 2008 - Purchased high-end servers and designed new cluster. Established Cluster on Open Science Grid. • 2009 - Upgraded and added systems. Registered as CMS Tier 3 site.
Current Status • OS: Rocks V (CentOS 5.0) • Job Manager: Condor 7.2.0 • Grid Middleware: OSG 1.2, Berkeley Storage Manager (BeStMan) 2.2.1.2.i7.p3, Physics Experiment Data Export (PhEDEx) 3.2.0 • Contributed over 400,000 wall hours to the CMS experiment; over 1.3M wall hours total. • Fully compliant on OSG Resource and Service Validation (RSV) and CMS Site Availability Monitoring (SAM) tests.
System Architecture • Diagram: Compute Element (CE) frontend, NAS node nas-0-0, Storage Element (SE), and compute nodes compute-1-X / compute-2-X.
Hardware • CE/Frontend: 8 Intel Xeon E5410, 16GB RAM, RAID5 • NAS0: 4 CPUs, 8GB RAM, 9.6TB RAID6 Array • SE: 8 CPUs, 64GB RAM, 1TB RAID5 • 20 Compute Nodes: 8 CPUs & 16GB RAM each. 160 total batch slots. • Gigabit networking, Cisco Express at core. • 2x 208V 5kVA UPS for nodes, 1x 120V 3kVA UPS for critical systems.
Hardware • Photos: cluster racks in the Olin Physical Science High Bay.
Rocks OS • Comprehensive software package for clusters (e.g. 411, dev tools, Apache, autofs, Ganglia). • Allows customization through “Rolls” and appliances; configuration stored in MySQL. • Customizable appliances auto-install nodes and run post-install scripts.
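As a quick illustration of how the appliance/node bookkeeping can be inspected, here is a minimal sketch that calls the standard rocks command-line tool from Python; the tool is assumed to be on the frontend's PATH, and output columns vary between Rocks releases.

  # Sketch: list the appliances and hosts known to the Rocks frontend.
  # Assumes the standard "rocks" CLI on the frontend; output format varies by release.
  import subprocess

  def rocks_list(noun):
      """Run 'rocks list <noun>' and return its output as text."""
      result = subprocess.run(["rocks", "list", noun],
                              capture_output=True, text=True, check=True)
      return result.stdout

  print(rocks_list("appliance"))  # e.g. compute, nas, frontend
  print(rocks_list("host"))       # every node managed by the frontend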
Storage • Set up XFS on the NAS partition, mounted on all machines. • NAS stores all user and grid data, streamed over NFS. • The Storage Element serves as the gateway for Grid storage on the NAS array.
Condor Batch Job Manager • Batch job system that distributes workflow jobs to compute nodes. • Distributed computing, NOT parallel. • Users submit jobs to a queue and the system finds places to process them. • Well suited to Grid computing; the most widely used batch system in OSG/CMS. • Supports “Universes” - Vanilla, Standard, Grid...
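As a concrete example of the submit-to-queue workflow, here is a minimal sketch that writes a vanilla-universe submit description and hands it to condor_submit; the executable and file names are hypothetical placeholders, not part of our actual workflow.

  # Sketch: submit a vanilla-universe job to the local Condor queue.
  # The executable and file names below are hypothetical placeholders.
  import subprocess

  SUBMIT_FILE = "analysis.sub"

  # Vanilla-universe submit description, one directive per line.
  submit_lines = [
      "universe   = vanilla",
      "executable = run_analysis.sh",
      "arguments  = dataset.root",
      "output     = analysis.out",
      "error      = analysis.err",
      "log        = analysis.log",
      "queue 1",
  ]

  with open(SUBMIT_FILE, "w") as handle:
      handle.write("\n".join(submit_lines) + "\n")

  # condor_submit prints the new cluster id on success.
  subprocess.run(["condor_submit", SUBMIT_FILE], check=True)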
Personal Condor / Central Manager • Master: manages all other daemons. • Collector: directory service for all daemons; daemons send ClassAd updates periodically. • Negotiator: “matchmaker” between idle jobs and pool nodes. • Schedd: runs on a “submit” host; creates a “shadow” process on the host and allows manipulation of the job queue. • Startd: runs on each “execute” node.
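To see the collector acting as a directory service, one can ask it for the ClassAd attributes of every startd slot; a minimal sketch using the classic condor_status -format interface (attribute names as in standard Condor ClassAds):

  # Sketch: query the collector for the state of every execute (startd) slot.
  # Uses the classic "condor_status -format" command-line interface.
  import subprocess

  cmd = ["condor_status",
         "-format", "%s ",   "Name",
         "-format", "%s ",   "State",
         "-format", "%s\\n", "Activity"]
  result = subprocess.run(cmd, capture_output=True, text=True, check=True)
  print(result.stdout)  # one line per batch slot: name, state, activity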
Typical Condor setup • Diagram: the Central Manager runs the master, collector, and negotiator; cluster nodes run a master and startd; workstations run a master, schedd, and startd.
Condor Priority • User priority managed by complex algorithm (half-life) with configurable parameters. • System does not kick off running jobs. • Resource claim is freed as soon as job is finished. • Enforces fair use AND allows vanilla jobs to finish. Optimized for Grid Computing.
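The half-life mechanism can be pictured with a toy calculation: the recorded usage that drives a user's priority decays exponentially toward their current usage, at a rate set by a configurable half-life. This is only an illustration of the idea, not Condor's actual negotiator code (which also folds in configurable priority factors).

  # Toy illustration of half-life priority smoothing (not Condor's actual code):
  # recorded usage decays toward instantaneous usage with a configurable
  # half-life, so heavy past users gradually regain good priority.
  def smoothed_usage(previous, current, dt_seconds, halflife_seconds=86400.0):
      decay = 0.5 ** (dt_seconds / halflife_seconds)
      return current + (previous - current) * decay

  usage = 160.0                      # user recently filled all 160 batch slots
  for _ in range(24):                # 24 one-hour updates while the user sits idle
      usage = smoothed_usage(usage, 0.0, 3600.0)
  print(round(usage, 1))             # -> 80.0: half the usage after one 24 h half-life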
Grid Middleware • Diagram of the OSG middleware stack (source: OSG Twiki documentation).
OSG Middleware • OSG middleware installed/updated via the Virtual Data Toolkit (VDT). • Site configuration was complex before the 1.0 release; simpler now. • Provides the Globus framework and security via a Certificate Authority. • Low maintenance: Resource and Service Validation (RSV) provides a snapshot of site health. • Grid User Management System (GUMS) handles mapping of grid certificates to local users.
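Before GUMS can map a grid certificate to a local account, the user needs a valid proxy; a minimal sketch for checking one from Python, assuming the standard VOMS client tools are installed:

  # Sketch: check whether a grid proxy is still valid before submitting work.
  # Assumes the standard VOMS client tools (voms-proxy-info) are installed.
  import subprocess

  def proxy_seconds_left():
      """Return the remaining proxy lifetime in seconds, or 0 if none exists."""
      try:
          result = subprocess.run(["voms-proxy-info", "-timeleft"],
                                  capture_output=True, text=True, check=True)
          return int(result.stdout.strip())
      except (subprocess.CalledProcessError, ValueError):
          return 0

  if proxy_seconds_left() < 3600:
      print("Proxy missing or about to expire; run voms-proxy-init -voms cms first.")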
BeStMan Storage • Berkeley Storage Manager: the SE runs the basic gateway configuration - a short config, but hard to get working. • Not nearly as difficult as dCache - BeStMan is a good replacement for small to medium sites. • Allows grid users to transfer data to and from designated storage via LFN, e.g. srm://uscms1-se.fltech-grid3.fit.edu:8443/srm/v2/server?SFN=/bestman/BeStMan/cms...
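A minimal sketch of such a transfer using the BeStMan SRM client (srm-copy); the target SFN path below is purely illustrative, and a valid grid proxy is assumed.

  # Sketch: copy a local file to the site's BeStMan Storage Element over SRM.
  # Assumes the BeStMan SRM client (srm-copy) and a valid grid proxy;
  # the target SFN path below is illustrative only.
  import subprocess

  source = "file:////tmp/hits.root"
  target = ("srm://uscms1-se.fltech-grid3.fit.edu:8443/srm/v2/server"
            "?SFN=/bestman/BeStMan/cms/user/example/hits.root")

  subprocess.run(["srm-copy", source, target], check=True)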
WLCG • Large Hadron Collider - expected to produce ~15 PB/year; the Compact Muon Solenoid detector will be a large part of this. • The Worldwide LHC Computing Grid (WLCG) handles the data and interfaces with sites in OSG, EGEE (European), etc. • Tier 0 - CERN; Tier 1 - Fermilab; closest Tier 2 - UFlorida. • Tier 3 - US! Not officially part of the CMS computing group (i.e. no funding), but very important for dataset storage and analysis.
T2/T3 sites in the US • Map of CMS Tier 2 and Tier 3 sites in the US (https://cmsweb.cern.ch/sitedb/sitelist/).
Local Usage Trends • Over 400,000 cumulative hours for CMS • Over 900,000 cumulative hours by local users • Total of 1.3 million CPU hours utilized
Tier-3 Sites • Not yet completely defined. Consensus: T3 sites give scientists a framework for collaboration (via transfer of datasets) and also provide compute resources. • Regular testing by RSV and Site Availability Monitoring (SAM) tests, with OSG site info published to CMS. • FIT is one of the largest Tier 3 sites.
PhEDEx • Physics Experiment Data Export: the final milestone for our site. • Physics datasets can be downloaded from other sites or exported to other sites. • All relevant datasets are catalogued in the CMS Data Bookkeeping System (DBS), which keeps track of dataset locations on the grid. • A central web interface allows dataset copy/deletion requests.
Demo • http://myosg.grid.iu.edu • http://uscms1.fltech-grid3.fit.edu • https://cmsweb.cern.ch/dbs_discovery/aSearch?caseSensitive=on&userMode=user&sortOrder=desc&sortName=&grid=0&method=dbsapi&dbsInst=cms_dbs_ph_analysis_02&userInput=find+dataset+where+site+like+*FLTECH*+and+dataset.status+like+VALID*
CMS Remote Analysis Builder (CRAB) • Universal method for experimental data processing • Automates the analysis workflow, i.e. status tracking and resubmissions • Datasets can be exported to the Data Discovery Page • Used extensively locally in our muon tomography simulations.
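A sketch of the usual CRAB2-era command sequence, driven from Python; it assumes a prepared crab.cfg that already describes the CMSSW job and input dataset.

  # Sketch: drive a CRAB (CRAB2-style) analysis task from Python.
  # Assumes a prepared crab.cfg describing the CMSSW job and input dataset.
  import subprocess

  def crab(*args):
      subprocess.run(["crab"] + list(args), check=True)

  crab("-create")   # build the task from crab.cfg
  crab("-submit")   # submit the jobs to the grid
  crab("-status")   # poll job states; rerun periodically
  # crab("-getoutput", "all")  # retrieve output once jobs finish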
Network Performance • Changed to a default 64 kB block size across NFS • RAID array change to fix write caching • Increased kernel memory allocation for TCP • Improvements in both network and grid transfer rates • dd copy tests across the network: read rate improved from 2.24 to 2.26 GB/s, write rate from 7.56 to 81.78 MB/s
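The dd figures above can be reproduced with a small wrapper like the one below; the NFS mount point and transfer size are placeholders, and dd's throughput summary is read from its stderr.

  # Sketch: measure sequential write throughput to the NFS-mounted NAS with dd.
  # The mount point and transfer size are placeholders for illustration.
  import subprocess

  def dd_write(path="/mnt/nas/ddtest.bin", block="64k", count=16384):
      """Write count blocks of size block; return dd's throughput summary line."""
      cmd = ["dd", "if=/dev/zero", "of=" + path,
             "bs=" + block, "count=" + str(count), "conv=fsync"]
      result = subprocess.run(cmd, capture_output=True, text=True, check=True)
      # GNU dd reports bytes copied, elapsed time and rate on its last stderr line.
      return result.stderr.strip().splitlines()[-1]

  print(dd_write())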
Benchmark plots: dd on the frontend (before/after) and iperf on the frontend (before/after).