Grid Computing and Grid Site Infrastructure David Groep NIKHEF/PDP and UvA
Outline • The challenges: LHC, Remote sensing, medicine, … • Distributed Computing and the Grid • Grid model and community formation, security • Grid software: computing, storage, resource brokers, information system • Grid Clusters & Storage: the Amsterdam e-Science Facility • Computing: installing and operating large clusters • Data management: getting throughput to disk and tape • Monitoring and site functional tests • Does it work?
The LHC Computing Challenge • Physics @ CERN • LHC particle accelerator • operational in 2007 • 4 experiments • ~10 Petabyte per year (= 10 000 000 GByte) • 150 countries • > 10 000 users • lifetime ~20 years. Trigger and data acquisition chain: 40 MHz (40 TB/sec) level 1, special hardware; 75 kHz (75 GB/sec) level 2, embedded; 5 kHz (5 GB/sec) level 3, PCs; 100 Hz (100 MB/sec) data recording & offline analysis. http://www.cern.ch/
LHC Physics Data Processing. Source: ATLAS introduction movie, NIKHEF and CERN, 2005
Tiered data distribution model. Data distribution issue: 10 PB per year • 15 major centres in the world • ~200 institutions • ~10 000 people. Processing issue, for one experiment: • 1 ‘event’ takes ~90 s of CPU time • there are 100 events/s • need: 9000 CPUs (today) • but also: reprocessing, simulation, &c: 2-5x needed in total
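A quick back-of-the-envelope check of those numbers (a sketch in Python; it assumes the ~90 s is CPU time per event on one contemporary CPU, with the 2-5x factor for reprocessing and simulation applied on top):

    # Rough sizing of the processing need quoted on the previous slide.
    event_rate_hz = 100          # events recorded per second
    cpu_seconds_per_event = 90   # ~90 s of CPU time to reconstruct one event

    cpus_for_prompt_reco = event_rate_hz * cpu_seconds_per_event
    print(cpus_for_prompt_reco)              # 9000 CPUs, matching the slide

    # Reprocessing, simulation, etc. add another factor 2-5x in total capacity.
    for factor in (2, 5):
        print(factor, factor * cpus_for_prompt_reco)   # 18000 .. 45000 CPUs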
T0-T1 Data Flow to Amsterdam (ATLAS data flows, draft; source: Kors Bos, NIKHEF)
Assumptions: Amsterdam is 13%; 24 hours/day; no inefficiencies; year 2008.
Total T0-T1 bandwidth (RAW + ESD1 + AODm1): 10,000 files/day, 88 MByte/sec, 700 Mbit/sec.
• RAW: 1.6 GB/file, 0.026 Hz, 2246 files/day, 41.6 MB/s, 3.4 TB/day
• ESD1: 500 MB/file, 0.052 Hz, 4500 files/day, 26 MB/s, 2.2 TB/day
• AODm: 500 MB/file, 0.04 Hz, 3456 files/day, 20 MB/s, 1.6 TB/day
Tier-0 sends 7.2 TByte/day to every Tier-1.
To tape (via disk buffer): RAW, 41.6 MB/s, 2~3 drives, 18 tapes/day.
To disk storage: ESD1 + AOD1, 46 MB/s, 4 TB/day, 8000 files/day.
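The per-stream rates in this table follow from file size and file rate alone; a small sketch to cross-check them (small differences against the table, e.g. 3.6 vs 3.4 TB/day for RAW, come down to rounding or slightly different inputs in the original):

    # Cross-check of the ATLAS T0->T1 flow table: MB/s, files/day and TB/day.
    streams = {                  # name: (file size in MB, file rate in Hz)
        "RAW":  (1600, 0.026),
        "ESD1": (500, 0.052),
        "AODm": (500, 0.04),
    }
    for name, (size_mb, rate_hz) in streams.items():
        mb_per_s = size_mb * rate_hz
        files_per_day = rate_hz * 86400
        tb_per_day = mb_per_s * 86400 / 1e6
        print(name, round(mb_per_s, 1), round(files_per_day), round(tb_per_day, 1))
    # RAW  41.6 MB/s, ~2246 files/day, ~3.6 TB/day
    # ESD1 26.0 MB/s, ~4493 files/day, ~2.2 TB/day
    # AODm 20.0 MB/s, ~3456 files/day, ~1.7 TB/day
    print("total", round(sum(s * r for s, r in streams.values()), 1), "MB/s")  # ~87.6, the slide's 88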
Other similar applications: Earth Observation. Source: Wim Som de Cerff, KNMI, De Bilt
WISDOM: anti-malaria drug discovery • in-silico docking of ligands onto malaria parasite targets • 60 000 jobs, taking over 100 CPU years in total • using 3 000 CPUs, completed in less than 2 months • 47 sites • 15 countries • 3000 CPUs • 12 TByte of disk
Infrastructure to cope with this. Source: EGEE SA1, Grid Operations Centre, RAL, UK, as of March 2006
The Grid and Community Building • The Grid concept • Protocols and standards • Authentication and Authorization
Three essential ingredients for a Grid. ‘Access computing like the electrical power grid’. A grid combines resources that • are not managed by a single organization • use a common, open protocol … that is general purpose • provide additional qualities of service, i.e., are usable as a collective and transparent resource. Based on: Ian Foster, GridToday, November 2003
Grid AuthN/AuthZ and VOs • Access to shared services • cross-domain authentication, authorization, accounting, billing • common protocols for collective services • Support multi-user collaborations in “Virtual Organisations” • a set of individuals or organisations, • not under single hierarchical control, • temporarily joining forces to solve a particular problem at hand, • bringing to the collaboration a subset of their resources, • sharing those under their own conditions • the user’s home organization may or may not know about their activities • Need to enable ‘easy’ single sign-on • a user is typically involved in many different VOs • Leave the resource owner always in control
Virtual vs. Organic structure (figure): persons and resources of Organization A and Organization B (researchers, a principal investigator, faculty, staff, students, compute servers C1-C3, and file server F1 with its disks) are regrouped into a Virtual Community that cuts across both organizations. Graphic: GGF OGSA Working Group
More characteristic VO structure
Trust in Grid Security. ‘Security’: Authentication and Authorization • There is no a priori trust relationship between members or member organisations! • VO lifetime can vary from hours to decades • VO not necessarily persistent (both long- and short-lived) • people and resources are members of many VOs • … but a relationship is required • as a basis for authorising access • for traceability and liability, incident handling, and accounting
Authentication vs. Authorization • Single authentication token (“passport”) • issued by a party trusted by all, • recognised by many resource providers, users, and VOs • satisfies the traceability and persistency requirements • in itself does not grant any access, but provides a unique binding between an identifier and the subject • Per-VO authorisations (“visa”) • granted to a person/service via a virtual organisation • based on the ‘passport’ name • embedded in the single-sign-on token (proxy) • acknowledged by the resource owners • providers can obtain lists of authorised users per VO, but can still ban individual users
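As a toy illustration of that last point (hypothetical names and a plain Python set, not any real site's authorization stack): a site can admit everyone on the VO-provided list and still ban individual certificate subjects locally.

    # Toy authorization decision: VO membership grants access, a local ban list overrides it.
    vo_members = {                                   # list obtained from the VO
        "/C=NL/O=Example/CN=Alice Physicist",
        "/C=IT/O=INFN/CN=Pinco Palla",
    }
    site_banned = {"/C=IT/O=INFN/CN=Pinco Palla"}    # the resource owner stays in control

    def authorized(subject_dn: str) -> bool:
        return subject_dn in vo_members and subject_dn not in site_banned

    print(authorized("/C=NL/O=Example/CN=Alice Physicist"))   # True
    print(authorized("/C=IT/O=INFN/CN=Pinco Palla"))          # False: locally banned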
Federated PKI for authentication • A federation of many independent CAs • common minimum requirements (charter, guidelines, acceptance process) • trust domain as required by users and relying parties • well-defined and peer-reviewed acceptance process • User has a single identity • from a local CA close by • works across VOs, with single sign-on via ‘proxies’ • certificate itself also usable outside the grid. (Figure: CAs 1…n and relying parties 1…n bound together in one federation.) International Grid Trust Federation and EUGridPMA, see http://www.eugridpma.org/
Authorization Attributes (VOMS) • VO-managed attributes embedded in the proxy • attributes signed by the VO • proxy signed by the user • user cert signed by the CA. (Figure: the user authenticates to the VOMS server, which queries its AuthDB and returns a VOMS pseudo-cert that is embedded in the proxy, e.g. C=IT/O=INFN/L=CNAF/CN=Pinco Palla/CN=proxy.)
Grid Services • Logical elements in a Grid • Information system
Services in a Grid • Computing Element “front-end service for (set of) computers” • Cluster computing: typically Linux with IP interconnect • Capability computing: typically shared-memory supercomputers • A ‘head node’ batches or forwards requests to the cluster • Storage Element “front-end service for disk or tape” • Disk and tape based • Varying retention time, QoS, uniformity of performance • ‘ought’ to be ACL aware: mapping of grid authorization to POSIX ACLs • File Catalogue … • Information System … • Directory-based for static information • Monitoring and bookkeeping for real-time information • Resource Broker … • Matching user job requirements to offers in the information system • WMS allows disconnected operation of the user interface
Typical Grid Topology
Job Description Language. This is JDL that the user might send to the Resource Broker:

    Executable = "catfiles.sh";
    StdOutput = "catted.out";
    StdError = "std.err";
    Arguments = "EssentialJobData.txt LogicalJobs.jdl /etc/motd";
    InputSandbox = {"/home/davidg/tmp/jobs/LogicalJobs.jdl", "/home/davidg/tmp/jobs/catfiles.sh"};
    OutputSandBox = {"catted.out", "std.err"};
    InputData = "LF:EssentialJobData.txt";
    ReplicaCatalog = "ldap://rls.edg.org/lc=WPSIX,dc=cnrs,dc=fr";
    DataAccessProtocol = "gsiftp";
    RetryCount = 2;
How do you see the Grid? The Broker matches the user’s request with a site • ‘information supermarket’ matchmaking (using Condor Matchmaking) • uses the information published by the site. Grid Information System: ‘the only information a user ever gets about a site’ • So: it should be reliable, consistent and complete • Standard schema (GLUE) to describe sites, queues, storage (complex schema semantics) • Currently presented as an LDAP directory. LDAP Browser by Jarek Gawor: www.mcs.anl.gov/~gawor/ldap
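A minimal sketch of such an LDAP query with the python-ldap module; the BDII host name is hypothetical, and the port 2170 and base mds-vo-name=local,o=grid are the values commonly used in LCG/EGEE deployments, so adjust them for the information system you actually query:

    import ldap  # python-ldap

    # Query the Grid Information System (an LDAP directory) for Computing Elements
    # and a few GLUE attributes that the broker uses for matchmaking.
    conn = ldap.initialize("ldap://bdii.example.org:2170")   # hypothetical BDII host
    conn.simple_bind_s()                                     # anonymous bind

    results = conn.search_s(
        "mds-vo-name=local,o=grid",                          # conventional LCG/EGEE base
        ldap.SCOPE_SUBTREE,
        "(objectClass=GlueCE)",
        ["GlueCEUniqueID", "GlueCEStateRunningJobs",
         "GlueCEStateTotalJobs", "GlueCEStateEstimatedResponseTime"],
    )
    for dn, attrs in results:
        print(dn, {key: values[0].decode() for key, values in attrs.items()})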
Glue Attributes Set by the Site
• Site information:
SiteSysAdminContact: mailto: grid-admin@example.org
SiteSecurityContact: mailto: security@example.org
• Cluster info:
GlueSubClusterUniqueID=gridgate.cs.tcd.ie
HostApplicationSoftwareRunTimeEnvironment: LCG-2_6_0
HostApplicationSoftwareRunTimeEnvironment: VO-atlas-release-10.0.4
HostBenchmarkSI00: 1300
GlueHostNetworkAdapterInboundIP: FALSE
GlueHostNetworkAdapterOutboundIP: TRUE
GlueHostOperatingSystemName: RHEL
GlueHostOperatingSystemRelease: 3.5
GlueHostOperatingSystemVersion: 3
GlueCEStateEstimatedResponseTime: 519
GlueCEStateRunningJobs: 175
GlueCEStateTotalJobs: 248
• Storage: similar info (paths, max number of files, quota, retention, …)
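What a broker does with these attributes can be mimicked with a toy matchmaker; this is a sketch only (the real Resource Broker/WMS evaluates Condor ClassAd requirements and rank expressions taken from the JDL), and the second CE in the example is made up:

    # Toy matchmaking: keep CEs that advertise the required runtime environment,
    # then rank by (estimated response time, running jobs), lowest first.
    ces = [
        {"id": "gridgate.cs.tcd.ie:2119/jobmanager-pbs-atlas",
         "rte": {"LCG-2_6_0", "VO-atlas-release-10.0.4"},
         "response": 519, "running": 175},
        {"id": "ce.example.org:2119/jobmanager-pbs-atlas",      # hypothetical second CE
         "rte": {"LCG-2_6_0"},
         "response": 30, "running": 2},
    ]

    def match(requirement, candidates):
        suitable = [ce for ce in candidates if requirement in ce["rte"]]
        return sorted(suitable, key=lambda ce: (ce["response"], ce["running"]))

    for ce in match("VO-atlas-release-10.0.4", ces):
        print(ce["id"])    # only the TCD CE advertises the ATLAS 10.0.4 release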
Information system and brokering issues • Size of the information system scales with #sites and #details • already 12 MByte of LDIF • matching a job takes ~15 sec • Scheduling policies are infinitely complex • no static schema can likely express this information • Much information (still) needs to be set up manually… the next slides show the situation as of Feb 3, 2006. The info system is the single most important grid service • The current broker tries to make an optimal decision… instead of a ‘reasonable’ one
Example: GlueServiceAccessControlRule. For your viewing pleasure: GlueServiceAccessControlRule
261 distinct values seen for GlueServiceAccessControlRule
(one of) least frequently occurring value(s): 1 instance(s) of GlueServiceAccessControlRule: /C=BE/O=BEGRID/OU=VUB/OU=IIHE/CN=Stijn De Weirdt
(one of) most frequently occurring value(s): 310 instance(s) of GlueServiceAccessControlRule: dteam
(one of) shortest value(s) seen: GlueServiceAccessControlRule: d0
(one of) longest value(s) seen: GlueServiceAccessControlRule: anaconda-ks.cfg configure-firewall install.log install.log.syslog j2sdk-1_4_2_08-linux-i586.rpm lcg-yaim-latest.rpm myproxy-addons myproxy-addons.051021 site-info.def site-info.def.050922 site-info.def.050928 site-info.def.051021 yumit-client-2.0.2-1.noarch.rpm
Example: GlueSEControlProtocolType. For your viewing pleasure: GlueSEControlProtocolType
freq  value
1  GlueSEControlProtocolType: srm
1  GlueSEControlProtocolType: srm_v1
1  GlueSEControlProtocolType: srmv1
3  GlueSEControlProtocolType: SRM
7  GlueSEControlProtocolType: classic
… which means that of ~410 Storage Elements, only 13 publish interaction info. Ouch!
Example: GlueHostOperatingSystemRelease. Today's attribute: GlueHostOperatingSystemRelease
1  GlueHostOperatingSystemRelease: 3.02
1  GlueHostOperatingSystemRelease: 3.03
1  GlueHostOperatingSystemRelease: 3.2
1  GlueHostOperatingSystemRelease: 3.5
1  GlueHostOperatingSystemRelease: 303
1  GlueHostOperatingSystemRelease: 304
1  GlueHostOperatingSystemRelease: 3_0_4
1  GlueHostOperatingSystemRelease: SL
1  GlueHostOperatingSystemRelease: Sarge
1  GlueHostOperatingSystemRelease: sl3
2  GlueHostOperatingSystemRelease: 3.0
2  GlueHostOperatingSystemRelease: 305
4  GlueHostOperatingSystemRelease: 3.05
4  GlueHostOperatingSystemRelease: SLC3
5  GlueHostOperatingSystemRelease: 3.04
5  GlueHostOperatingSystemRelease: SL3
18  GlueHostOperatingSystemRelease: 3.0.3
19  GlueHostOperatingSystemRelease: 7.3
24  GlueHostOperatingSystemRelease: 3
37  GlueHostOperatingSystemRelease: 3.0.5
47  GlueHostOperatingSystemRelease: 3.0.4
Example: GlueSAPolicyMaxNumFiles (136 separate Glue attributes seen). For your viewing pleasure: GlueSAPolicyMaxNumFiles
freq  value
6  GlueSAPolicyMaxNumFiles: 99999999999999
26  GlueSAPolicyMaxNumFiles: 999999
52  GlueSAPolicyMaxNumFiles: 0
78  GlueSAPolicyMaxNumFiles: 00
1381  GlueSAPolicyMaxNumFiles: 10
For your viewing pleasure: GlueServiceStatusInfo
freq  value
2  GlueServiceStatusInfo: No Known Problems.
55  GlueServiceStatusInfo: No problems
206  GlueServiceStatusInfo: No Problems ??
LCG’s Most Popular Resource Centre
Example: SiteLatitude. Today's attribute: GlueSiteLatitude
1  GlueSiteLatitude: 1.376059
1  GlueSiteLatitude: 33.063924198120645
1  GlueSiteLatitude: 37.0
1  GlueSiteLatitude: 38.739925290125484
1  GlueSiteLatitude: 39.21
…
1  GlueSiteLatitude: 45.4567
1  GlueSiteLatitude: 55.9214118
1  GlueSiteLatitude: 56.44
1  GlueSiteLatitude: 59.56
1  GlueSiteLatitude: 67
1  GlueSiteLatitude: GlueSiteWeb: http://rsgrid3.its.uiowa.edu
2  GlueSiteLatitude: 40.8527
2  GlueSiteLatitude: 48.7
2  GlueSiteLatitude: 49.16
2  GlueSiteLatitude: 50
3  GlueSiteLatitude: 41.7827
3  GlueSiteLatitude: 46.12
8  GlueSiteLatitude: 0.0
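Frequency tables like the ones in these examples can be reproduced from a raw LDIF dump of the information system with a few lines of Python; the dump file name below is hypothetical, and multi-line LDIF continuations are ignored for simplicity:

    from collections import Counter

    # Count how often each value of a given GLUE attribute occurs in an LDIF dump.
    attribute = "GlueHostOperatingSystemRelease"
    counts = Counter()
    with open("bdii-dump.ldif") as ldif:            # hypothetical dump of the info system
        for line in ldif:
            if line.startswith(attribute + ":"):
                counts[line.split(":", 1)[1].strip()] += 1

    for value, freq in sorted(counts.items(), key=lambda item: item[1]):
        print(f"{freq:4d}  {attribute}: {value}")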
The Amsterdam e-Science Facilities • Building and running a Grid Resource Centre • Compute clusters • Storage and disk pools
BIG GRID: approved January 2006! Investment of € 29M in the next 4 years. For: LCG, LOFAR, Life Sciences, Medical, DANS, Philips Research, … See http://www.biggrid.nl/
Grid Resources in Amsterdam (only resources with either GridFTP or Grid job management):
• 2x 1.2 PByte in 2 robots, 36+512 CPUs IA32, disk caches 10 + 50 TByte, multiple 10 Gbit/s links
• 240 CPUs IA32, 7 TByte disk cache, 10 + 1 Gbit link
• SURFnet: 2 Gbit/s to SARA
Computing • Cluster topology • Connectivity • System services setup
Grid Site Logical Layout
Batch Systems and Schedulers • Batch system keeps list of nodes and jobs • Scheduler matches jobs to nodes based on policies
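The division of labour can be made concrete with a toy example; this is illustrative only, not how Torque, Maui or any production scheduler is implemented:

    # Toy batch system + scheduler: the batch system only holds state,
    # the scheduler encodes policy (highest priority first, first free node that fits).
    nodes = [
        {"name": "node01", "free_cpus": 2},
        {"name": "node02", "free_cpus": 2},
    ]
    queue = [
        {"job": "atlas-reco-42", "cpus": 2, "priority": 10},
        {"job": "dteam-test-7",  "cpus": 1, "priority": 50},
    ]

    def schedule(queue, nodes):
        for job in sorted(queue, key=lambda j: -j["priority"]):
            for node in nodes:
                if node["free_cpus"] >= job["cpus"]:
                    node["free_cpus"] -= job["cpus"]
                    yield job["job"], node["name"]
                    break

    for job, node in schedule(queue, nodes):
        print(job, "->", node)    # dteam-test-7 -> node01, atlas-reco-42 -> node02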
NDPF Logical Composition
NDPF Network Topology
Quattor and ELFms • System installation
Installation. Installing and managing a large cluster requires a system that • scales to O(10 000) nodes, with • a wide variety in configuration (‘service nodes’) • and also many instances of identical systems (‘worker nodes’) • lets you validate new configurations before it’s too late • can rapidly recover from node failures by commissioning a new box (i.e. in minutes). Popular systems include • Quattor (from ELFms) • xCAT • OSCAR • SystemImager & cfEngine
Node Configuration Management: Quattor and ELFms. ELFms stands for ‘Extremely Large Fabric management system’. Subsystems: • Quattor: configuration, installation and management of nodes • Lemon: system / service monitoring • LEAF: hardware / state management. ELFms manages and controls most of the nodes in the CERN CC • ~2100 nodes out of ~2400 • Multiple functionality and cluster size (batch nodes, disk servers, tape servers, DB, web, …) • Heterogeneous hardware (CPU, memory, HD size, …) • Supported OS: Linux (RH7, RHES2.1, RHES3) and Solaris (9). Developed within the EU DataGrid Project (http://www.edg.org/), development and maintenance now coordinated by CERN/IT. Source: German Cancio, CERN IT, see http://www.quattor.org/
System Installation • Central configuration drives: • Automated Installation Infrastructure (AII): PXE and RedHat KickStart or Solaris JumpStart • Software Package Management (SPMA): transactional management based on RPMs or PKGs • Node Configuration (NCM): autonomous agents, service configuration components, (re-)configuration • CDB: the ‘desired’ state of the node • two-tiered configuration language (PAN and LLD XML) • self-validating, complete language (“swapspace = 2 * physmem”) • template inheritance and composition (“tbn20 = base system + CE_software + pkg_add(emacs)”)
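The template-composition and derived-value ideas can be sketched as follows; this is a conceptual illustration in Python, not PAN syntax or the real CDB:

    # Conceptual sketch of CDB-style configuration: a node profile is composed from
    # templates, derived values are computed, and the result is validated before use.
    base_system = {"os": "RHEL3", "physmem_mb": 2048, "packages": ["openssh", "ntp"]}
    ce_software = {"packages": ["globus-gatekeeper", "torque-server"]}

    def compose(*templates):
        profile = {"packages": []}
        for template in templates:
            for key, value in template.items():
                if key == "packages":
                    profile["packages"] += value      # composition: accumulate packages
                else:
                    profile[key] = value              # inheritance: later templates override
        return profile

    tbn20 = compose(base_system, ce_software, {"packages": ["emacs"]})   # "pkg_add(emacs)"
    tbn20["swapspace_mb"] = 2 * tbn20["physmem_mb"]   # like "swapspace = 2 * physmem"

    assert tbn20["swapspace_mb"] >= tbn20["physmem_mb"], "invalid profile"   # self-validation
    print(tbn20)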
Quattor architecture (figure): PAN templates in the Configuration Database (CDB), edited via GUI, CLI and scripts and backed by an RDBMS (SQL), are compiled into XML profiles; each node fetches its profile over HTTP/SOAP into a local cache managed by the Configuration Cache Manager (CCM), from which the node management agents and other consumers (LEAF, LEMON, others) operate. The worked frame of the figure shows cluster templates for lxplus and lxbatch in the CERN CC (e.g. pkg_add (lsf5.1), master: lxmaster01, name_srv1: 137.138.16.5, time_srv1: ip-time-1) and per-node values such as eth0/ip: 137.138.4.246 and pkg_add (lsf6_beta) for nodes like lxplus001, lxplus020, lxplus029 and disk_srv. Source: German Cancio, CERN IT
Managing (cluster) nodes (figure): software repositories (SWRep, holding RPM and PKG packages, served over http/ftp/nfs) feed the Software Package Manager (SPMA), which maintains the installed software (kernel, system, applications) via a local package cache; the Node Configuration Manager (NCM) and CCM configure system services (AFS, LSF, SSH, accounting) on top of the base OS; node (re)installation is handled by an Install Manager driving the vendor system installer (RH73, RHES, Fedora, …) over nfs/http with dhcp and PXE; everything is steered from CDB and covers both standard and managed nodes. Source: German Cancio, CERN IT
Node Management Agents • NCM (Node Configuration Manager): a framework system in which service-specific plug-ins called Components make the necessary system changes to bring the node to its CDB desired state • regenerate local config files (e.g. /etc/ssh/sshd_config), restart/reload services (SysV scripts) • large number of components available (system and Grid services) • SPMA (Software Package Management Agent) and SWRep: manage all or a subset of packages on the nodes • on production nodes: full control; on development nodes: non-intrusive, configurable management of system and security updates • package manager, not only upgrader (roll-back and transactions) • Portability: generic framework with plug-ins for NCM and SPMA • available for RHL (RH7, RHES3) and Solaris 9 • Scalability to O(10K) nodes • automated replication for redundant / load-balanced CDB/SWRep servers • use of scalable protocols, e.g. HTTP, and replication/proxy/caching technology. Source: German Cancio, CERN IT
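A toy version of what such a configuration component does (illustrative only, not an actual NCM component; the file path used in the demo is a scratch file, and a real component would target the real config file and reload the service via its init script):

    # Toy "configuration component": regenerate a config file from the desired state
    # and report whether it changed, so the caller reloads the service only when needed.
    def regenerate(path, desired):
        new_content = "".join(f"{key} {value}\n" for key, value in sorted(desired.items()))
        try:
            with open(path) as f:
                if f.read() == new_content:
                    return False              # already in the desired state: do nothing
        except FileNotFoundError:
            pass
        with open(path, "w") as f:
            f.write(new_content)
        return True

    # Demo on a scratch file; a real component would write e.g. /etc/ssh/sshd_config.
    if regenerate("/tmp/sshd_config.example", {"PermitRootLogin": "no", "X11Forwarding": "no"}):
        print("config changed: would reload sshd here")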