EGEE Project and Middleware Overview Marco Verlato Padova 9 / 5 / 2008
Outline • Introduction • The EGEE project • Infrastructure • Applications • Operations and Support • The EGEE Middleware: gLite • Grid access services • Security services • Information & Monitoring services • Data Management services • Job Management services • Conclusions
What is a Grid? • The name “Grid” was chosen by analogy with the electric power grid (Foster and Kesselman 1997) • Vision: plug in a computer to get processing power, just as you plug in a toaster to get electricity. • The concept has been around for decades (distributed computing, metacomputing) • The key difference with the Grid is realising the vision on a global scale.
What is a Grid? • “A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities” Ian Foster -- Carl Kesselman, 1998 • “A grid is a combination of networked resources and the corresponding middleware, which provides services for the user” Erwin Laure, EGEE T.D., ISSGC2007 • The users of a Grid are divided into Virtual Organisations (VOs), abstract entities grouping users, institutions and resources, e.g.: the 4 LHC experiments, the community of biomedical researchers, etc
What is a Grid? • It relies on advanced software, called middleware • Middleware automatically finds the data the scientist needs, and the computing power to analyse it • Middleware balances the load on different resources. It also handles security, accounting, monitoring and much more
Enabling Grids for E-sciencE (EGEE) project • Flagship Grid infrastructure project co-funded by the European Commission, started in April 2004 • Now entering its third phase • Disciplines: Archaeology • Astronomy • Astrophysics • Civil Protection • Comp. Chemistry • Earth Sciences • Finance • Fusion • Geophysics • High Energy Physics • Life Sciences • Multimedia • Material Sciences • … • Scale: >250 sites • 48 countries • >50,000 CPUs • >20 PetaBytes • >10,000 users • >150 VOs • >150,000 jobs/day
Disciplines and users ~8000 users listed in registered VOs Digital libraries, disaster recovery, computational sciences, etc. http://cic.gridops.org/index.php?section=home&page=volist
Types of applications • Simulation • LHC Monte Carlo simulations; Fusion; WISDOM • Jobs needing significant processing power; Large number of independent jobs; limited input data; significant output data • Bulk Processing • HEP; Processing of satellite data • Distributed input data; Large amount of input and output data; Job management (WMS); Metadata services; complex data structures • Parallel Jobs • Climate models, computational chemistry • Large number of independent but communicating jobs; Need for simultaneous access to large number of CPUs; MPI libraries • Short-response delays • Prototyping new applications; grid monitoring; Interactivity • Limited input & output data; limited processing needs but fast response and quality of service • Workflow • Medical imaging; flood analysis • Complex analysis algorithms; complex dependencies between jobs • Commercial Applications • Non-open source software; Geocluster (seismic platform); FlexX (molecular docking); Matlab, Mathematica; IDL, … • License server associated with an application deployment model
High Energy Physics: machines and detectors • pp collisions at √s = 14 TeV • Luminosity labels: L = 10^34 /cm²/s and L = 2·10^32 /cm²/s • [Figure: detector drawing with muon chambers, tracker and calorimeter labelled] • 2.5 million collisions per second; LVL1: 10 kHz, LVL3: 50-100 Hz; 25 MB/sec digitized recording • 40 million collisions per second; LVL1: 1 kHz, LVL3: 100 Hz; 0.1 to 1 GB/sec digitized recording
In silico drug discovery • Diseases such as HIV/AIDS, SARS, bird flu etc. are a threat to public health due to worldwide exchanges and circulation of people • Grids open new perspectives for in silico drug discovery • Reduced cost and a faster search for new drugs • International collaboration is required for: • Early detection • Epidemiological watch • Prevention • Search for new drugs • Search for vaccines • Avian influenza: • bird casualties
Wide In Silico Docking On Malaria (WISDOM) http://wisdom.healthgrid.org/
Example: Pharmacokinetics • A lesion is detected in an MRI study of a patient – start with virtual biopsy • The process requires obtaining a sequence of MRI volumetric images. • Different images are obtained in different breath-holds. • Before analyzing the variation of each voxel, images must be co-registered to minimize deformation due to different breath-holds. • The total computational cost of a clinical trial of 20 patients is around 100 CPU days.
EGEE workload in 2007 • Data: 25 PB stored, 11 PB transferred • CPU: 114 million hours • Estimated cost if performed with Amazon’s EC2 and S3: € 47,486,548 • Sources: http://gridview.cern.ch/GRIDVIEW/same_index.php and http://calculator.s3.amazonaws.com/calc5.html?
EGEE-II to EGEE-III • EGEE-III • To be co-funded under European Commission call INFRA-2007-1.2.3 • 32M€ EC funds compared to 36M€ for EGEE-II • Key objectives • Expand/optimise existing EGEE infrastructure, include more resources and user communities • Prepare migration from a project-based model to a sustainable federated infrastructure based on National Grid Initiatives • 2 year period – May 2008 to April 2010 • No gap between EGEE-II and EGEE-III (1 month extension to EGEE-II) • Similar consortium • Now structured on a national basis (National Grid Initiatives/Joint Research Units)
European Grid Initiative (EGI) • Need to prepare permanent, common Grid infrastructure • Ensure the long-term sustainability of the European e-Infrastructure independent of short project funding cycles • Coordinate the integration and interaction between National Grid Infrastructures (NGIs) • Operate the production Grid infrastructure on a European level for a wide range of scientific disciplines Must be no gap in the support of the production grid
EGEE operations Operations Coord. Centre (OCC) - management, oversight of all operational and support activities Regional Operations Centres (ROC) - providing the core of the support infrastructure, each supporting a number of resource centres within its region Resource Centres (RC) - providing resources (computing, storage, network…) -At FZK, coordination and management of user support, single point of contact for users
The Italian Production Grid ~5000 CPUs 950TB (Disk ) + 750 TB (Tape) 40 ‘resource centers’: INFN Grid + SPACI + ENEA + 5 RCs: • Istituto Tecnologie Biomediche – CNR/BARI (LIBI Project) • PERUGIA University • Istituto Linguistica Computazionale CNR-PISA • Scuola Normale Superiore – PISA • ESA-ESRIN Significant expansion foreseen thanks to: Recent PONs TriGrid, PI2S2 Cybersar, Scope, Cresco http://grid-it.cnaf.infn.it
SPACI: Southern Partnership for Advanced Computational Infrastructure • 1.5 Tflops • IA64 (Itanium 2) • ISUFI/CACT: Center for Advanced Computing Technologies, University of Salento. Director: Prof. Giovanni Aloisio • DMA/ICAR: Dept. of Mathematics and Applications, University of Naples “Federico II” & ICAR (Section of Naples). Director: Prof. Almerico Murli • MIUR/HPCC: Center of Excellence for High Performance Computing, University of Calabria. Director: Prof. Lucio Grandinetti
GEANT • CNR • Tor Vergata • Access to non-standard platforms: AIX, IRIS (AFS pool accounts, LCMAPS, customized YAIM)
The EGEE support infrastructure [Diagram: the GGUS Central System and the CIC Portal connect the TPM, the VO TPMs (A, B, C), the VO Support units (A, B, C), the ROCs (B, C, … N) with their Resource Centres (RC A, RC B, RC C), the COD, the deployment, middleware and network support units, and other Grids]
Italian ROC Support • The Italian ROC provides local front-line support to Virtual Organizations, users and Resource Centres • The Italian ROC team is organized in daily shifts: • 2 people per shift, 2 shifts per day, from Monday to Friday. • Activities planned during the shift • Log trouble tickets created, updated and closed, and problems on grid services and sites; monitor successful site certification • Check the status of production grid services and the GRIS status of production CEs and SEs • Check the status of the production sites using the monitoring tools • Periodic (every 15 days) phone conferences • ROC teams and site managers • Provide and write the ROC report for the weekly EGEE operations meeting
Registered Collaborating Projects • Infrastructures: geographical or thematic coverage • Support Actions: key complementary functions • Applications: improved services for academia, industry and the public • 25 projects have registered as of September 2007 (web page)
e-Infrastructure projects & other Grids • e-Infrastructures adopting gLite • e-Infrastructures interoperable, or in the process of being made interoperable, with gLite • ~80 countries “linked” together!
EGEE strategy towards interoperability: The best solution is to have common interfaces through the development and adoption of standards. • The gLite reference forum for standardization activities is the Open Grid Forum • Many contributions (e.g. OGSA-AUTH, BES, JSDL, new GLUE-WG, UR, RUS, SAGA, INFOD, NM, …) • Problems: • Infrastructures are already in production • Standards are still in evolution and often underspecified • OGF-GIN follows a pragmatic approach • balance between application needs vs. technology push [Figure labels: GIN, Standards]
The GILDA t-Infrastructure (https://gilda.ct.infn.it) • 20 sites in 3 continents • > 11000 certificates issued, >20% renewed at least once • > 250 courses, training events, official university curricula • > 2,000,000 hits on the web site from >100 different countries • > 4.5 TB of training material downloaded from the web site
The INFN Grid Schools (https://agenda.infn.it/conferenceDisplay.py?confId=89) (https://agenda.infn.it/conferenceDisplay.py?confId=85) • Two Grid Schools held in Martina Franca (Taranto, Italy) from 5th to 23rd of November 2007 • 1 week Grid Site Administrator Training Course (to prepare the “Grid-in-a-Room” infrastructure to be used in the following weeks) • 2 weeks Application Integration Training School • 7 applications belonging to different fields such as hadron therapy, data-mining, neural networks, environment and civil protection, hydrology and optimization were completely “gridified” during the school • By the end of the school, some applications were also running on the INFN production Grid using the resources of several virtual organizations (GRIDIT, THEOPHYS, PAMELA, BIO) • Full report available at: https://agenda.infn.it/materialDisplay.py?materialId=1&confId=85
EGEE Middleware Distribution [Timeline figure: LCG-2 and gLite from 2004 to 2006, moving from prototyping to product and converging into gLite 3.0] • Combines components from different providers • Condor and Globus (via VDT) • LCG (LHC Computing Grid) • EDG (European Data Grid) • Others • After prototyping phases in 2004 and 2005, convergence with the LCG-2 distribution was reached in May 2006 • gLite 3.0 was released in May 2006; the current release is 3.1 • Aim: develop a lightweight stack of generic middleware useful to EGEE applications • Pluggable components that cater for different implementations • Follow SOA approach, WS-I compliant where possible • Focus now is on re-engineering and hardening • Business-friendly open source license: Apache 2.0
gLite in the Grid “ecosystem” [Diagram: EU and USA projects and components on a timeline from 2001 to 2004 (Condor, Globus, MyProxy, EDG, VDT, LCG, OSG, DataTAG, CrossGrid, SRM, GridCC, NextGrid, EGEE, DEISA, interactive grids, …), with a “used in” relation pointing to future grids]
The middleware structure • Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware • Higher-Level Grid Services are meant to help users build their computing infrastructure but should not be mandatory • Foundation Grid Middleware will be deployed on the EGEE infrastructure • Must be complete and robust • Should allow interoperation with other major grid infrastructures • Should not assume the use of Higher-Level Grid Services
gLite services orchestration [Diagram: User Interface, Workload Management, Logging & Bookkeeping, Information System, File and Replica Catalogs, Authorization Service and, at Site X, a Computing Element and a Storage Element; arrows labelled submit, query, retrieve, discover services, update credential and publish state]
gLite services decomposition • Access: CLI, API • Security Services: Authentication, Authorization, Auditing • Information & Monitoring Services: Information & Monitoring, Job Monitoring • Data Services: Metadata Catalog, File & Replica Catalog, Storage Element, Data Movement • Job Mgmt. Services: Job Provenance, Package Manager, Accounting, Workload Management, Computing Element • Overview paper: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf
User Interface (UI) • The access point to the EGEE Grid is the User Interface (UI) • It provides the CLI tools to access the functionalities offered by the gLite Services • They allow the user to perform some basic Grid operations: • create the user proxy needed for authentication/authorization • retrieve the status of different resources from the Information System • copy, replicate and delete files from the Grid • list all the resources suitable to execute a given job • submit jobs for execution • cancel jobs • retrieve the output of finished jobs • show the status of submitted jobs • retrieve the logging and bookkeeping information of jobs • It provides the APIs to allow the development of Grid-enabled applications
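The job-related operations listed above can be pictured as a short command-line session on the UI. The slide does not name the tools, so the following is only a sketch: the command names are assumed from the gLite 3.1 client tools, and the VO name `cms`, the file `job.jdl` and the job identifier are hypothetical placeholders. The code only builds argument lists; nothing is executed, and the file-management operations are left out.

```python
# Sketch: map basic UI operations to command lines (nothing is executed).
# Command names are assumed from the gLite 3.1 client tools; "cms",
# "job.jdl" and the job identifier are hypothetical placeholders.

def ui_commands(vo, jdl, job_id):
    """Return an ordered list of (operation, argv) pairs for a typical session."""
    return [
        ("create proxy", ["voms-proxy-init", "--voms", vo]),
        ("query information system", ["lcg-infosites", "--vo", vo, "ce"]),
        ("list matching resources", ["glite-wms-job-list-match", "-a", jdl]),
        ("submit job", ["glite-wms-job-submit", "-a", jdl]),
        ("job status", ["glite-wms-job-status", job_id]),
        ("logging and bookkeeping", ["glite-wms-job-logging-info", job_id]),
        ("retrieve output", ["glite-wms-job-output", job_id]),
        ("cancel job", ["glite-wms-job-cancel", job_id]),
    ]


if __name__ == "__main__":
    for operation, argv in ui_commands("cms", "job.jdl", "<job-id>"):
        print(f"{operation:28s} {' '.join(argv)}")
```

Keeping the commands as argument lists rather than shell strings makes it straightforward to hand them to `subprocess.run` later on a real UI, without quoting problems.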
GENIUS Grid Portal • Developed by INFN & NICE s.r.l. • GUI mapped to gLite cmd line • Write JDL, Submit JDL, Check status, Download result • GUI for Storage access • TRIANA integration: execute DAG workflows
Security: Basic Concepts • GSI authentication is based on the PKI X.509 SSL infrastructure • Certificate Authorities (CAs) issue (long-lived) certificates identifying individuals (much like a passport) • Commonly used in web browsers to authenticate to sites • Trust between CAs and sites is established (offline) • In order to reduce vulnerability, on the Grid user identification is done by using (short-lived) proxies of their certificates • Proxies can • Be delegated to a service such that it can act on the user’s behalf • Include additional attributes (like VO information via the VO Membership Service VOMS, see next) • Be stored in an external proxy store (MyProxy) • Be renewed (in case they are about to expire)
Structure of an X.509 certificate (+ passphrase) • Public key • Subject: C=IT, O=INFN, OU=Personal Certificate, L=LNL, CN=Marco Verlato • Issuer: C=IT, O=INFN, CN=INFN CA • Validity: Not Before: Mar 15 13:28:54 2008 GMT, Not After: Mar 15 13:28:54 2009 GMT • Serial Number: 3235 (0xca3) • CA digital signature
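The difference between a long-lived certificate and a short-lived proxy comes down to validity windows. The sketch below, using only the Python standard library, parses the two validity dates shown on the slide and derives a proxy window that never outlives the certificate. The 12-hour proxy lifetime and the function names are assumptions made for illustration; no real certificate is read or verified here.

```python
from datetime import datetime, timedelta, timezone

# Date format as printed in the certificate dump on the slide.
OPENSSL_FMT = "%b %d %H:%M:%S %Y GMT"


def parse_openssl_time(text):
    """Parse a validity date such as 'Mar 15 13:28:54 2008 GMT' as UTC."""
    return datetime.strptime(text, OPENSSL_FMT).replace(tzinfo=timezone.utc)


def is_valid(not_before, not_after, now):
    """A credential is usable only inside its validity window."""
    return not_before <= now <= not_after


def proxy_window(cert_not_after, now, lifetime=timedelta(hours=12)):
    """Assumed rule: a proxy starts now and cannot outlive its certificate."""
    return now, min(now + lifetime, cert_not_after)


if __name__ == "__main__":
    not_before = parse_openssl_time("Mar 15 13:28:54 2008 GMT")
    not_after = parse_openssl_time("Mar 15 13:28:54 2009 GMT")
    talk_day = datetime(2008, 5, 9, 12, 0, tzinfo=timezone.utc)
    print(is_valid(not_before, not_after, talk_day))
    print(proxy_window(not_after, talk_day))
```

A stolen proxy is therefore useful to an attacker for hours at most, while the certificate itself stays protected on the user's machine.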
VO Membership Service: VOMS • Bare certificates are not enough for defining user capabilities on the Grid • Users belong to VOs, to groups inside a VO, and may have special roles • VOMS provides a way to add attributes to a certificate proxy: • mutual authentication of client and server • VOMS produces a signed Attribute Certificate (AC) • the client produces a new proxy that contains the attributes • The attributes are used to provide the user with additional capabilities according to the VO policies
[Diagram: the client sends an Authentication Request to VOMS, which queries its AuthDB and answers OK with the VOMS AC; the resulting proxy has subject C=IT/O=INFN/L=Padova/CN=Marco Verlato/CN=proxy and carries the VOMS AC]
=== VO cms extension information ===
VO: cms
subject: /C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Marco Verlato/Email=Marco.Verlato@pd.infn.it
issuer: /C=CH/O=CERN/OU=GRID/CN=host/voms.cern.ch
attribute: /cms/prod/ROLE=manager/Capability=NULL
attribute: /cms/Role=NULL/Capability=NULL
timeleft: 11:59:45
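The `attribute:` lines in the extension dump pack the VO, group, role and capability into a single string. As an illustration only, a small parser for that string format could look like the sketch below. The function name and the returned dictionary layout are assumptions, and the parser handles just the pattern visible in the dump (group path followed by `Role=` and `Capability=` parts, with `NULL` meaning "not set").

```python
def parse_fqan(fqan):
    """Split an attribute string like '/cms/prod/ROLE=manager/Capability=NULL'.

    Returns a dict with the VO (first group component), the full group path,
    and the role and capability (None when the value is 'NULL' or absent).
    """
    groups, role, capability = [], None, None
    for part in fqan.strip("/").split("/"):
        key, sep, value = part.partition("=")
        if not sep:
            groups.append(part)  # plain path component: part of the group
        elif key.lower() == "role":
            role = None if value == "NULL" else value
        elif key.lower() == "capability":
            capability = None if value == "NULL" else value
    if not groups or not groups[0]:
        raise ValueError(f"no group found in {fqan!r}")
    return {
        "vo": groups[0],
        "group": "/" + "/".join(groups),
        "role": role,
        "capability": capability,
    }


if __name__ == "__main__":
    print(parse_fqan("/cms/prod/ROLE=manager/Capability=NULL"))
    print(parse_fqan("/cms/Role=NULL/Capability=NULL"))
```

The key comparison is case-insensitive because the dump itself mixes `ROLE=` and `Role=`.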
LCAS / LCMAPS • Local Centre Authorization Service (LCAS) • Checks if the user is authorized (currently using the grid-mapfile) • Checks if the user is banned at the site • Checks if the site accepts jobs at that time • Local Credential Mapping Service (LCMAPS) • Maps grid credentials to local credentials (e.g. UNIX uid/gid, AFS tokens, etc.) • Also maps VOMS groups and roles (full support of FQAN), for example:
"/VO=cms/GROUP=/cms" .cms
"/VO=cms/GROUP=/cms/prod" .cmsprod
"/VO=cms/GROUP=/cms/prod/ROLE=manager" .cmsprodman
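The two-step logic above (first decide whether to admit the user, then decide which local identity to use) can be sketched in a few lines. This is a simplified illustration, not the LCAS/LCMAPS implementation: the function names are hypothetical, the authorization check is reduced to set membership, and the mapping uses exact matches tried from most to least specific, whereas the real services are plugin-based and more flexible. The mapping table is the one shown on the slide.

```python
# Mapping entries copied from the slide: FQAN pattern -> local account name.
MAPPINGS = {
    "/VO=cms/GROUP=/cms": ".cms",
    "/VO=cms/GROUP=/cms/prod": ".cmsprod",
    "/VO=cms/GROUP=/cms/prod/ROLE=manager": ".cmsprodman",
}


def lcas_authorize(dn, allowed_dns, banned_dns, site_open):
    """LCAS-style decision: site accepts jobs, user is known and not banned."""
    return site_open and dn in allowed_dns and dn not in banned_dns


def lcmaps_map(vo, group, role=None, mappings=MAPPINGS):
    """LCMAPS-style mapping: try the role-specific entry first, then the group."""
    candidates = []
    if role:
        candidates.append(f"/VO={vo}/GROUP={group}/ROLE={role}")
    candidates.append(f"/VO={vo}/GROUP={group}")
    for key in candidates:
        if key in mappings:
            return mappings[key]
    return None  # no local credential for this FQAN


if __name__ == "__main__":
    dn = "/C=IT/O=INFN/OU=Personal Certificate/L=Padova/CN=Marco Verlato"
    if lcas_authorize(dn, allowed_dns={dn}, banned_dns=set(), site_open=True):
        print(lcmaps_map("cms", "/cms/prod", "manager"))
```

Trying the most specific entry first means a production manager is not silently mapped to the generic VO account when a dedicated one exists.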
GRIS and BDIIs • BDII: Berkeley Database Information Index • Based on LDAP • Standardized information provider (GIP) • GLUE-1.3 schema • Top level used with 230+ sites • Roughly 60 instances in EGEE [Diagram: information providers feed a resource-level BDII or MDS GRIS; site-level BDIIs collect the resources of a site; the top-level BDII collects the sites (arrow labelled “2 minutes”) and answers queries from the WMS, WN, UI and FTS]
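Because the BDII is an LDAP server publishing the GLUE schema, it can be queried with any LDAP client. The sketch below only assembles an `ldapsearch` command line and does not contact any server. The host name is a hypothetical placeholder, and the port (2170), the search base (`o=grid`) and the GLUE 1.3 attribute names are assumptions based on common BDII deployments rather than details given on the slide.

```python
def bdii_query(host, vo, port=2170, base="o=grid"):
    """Build an ldapsearch command listing the computing elements open to a VO.

    Assumptions: anonymous LDAP bind, the usual BDII port and base, and the
    GLUE 1.3 object class and attributes for computing elements.
    """
    ldap_filter = f"(&(objectClass=GlueCE)(GlueCEAccessControlBaseRule=VO:{vo}))"
    return [
        "ldapsearch",
        "-x",                              # simple (anonymous) authentication
        "-LLL",                            # plain LDIF output, no comments
        "-H", f"ldap://{host}:{port}",
        "-b", base,
        ldap_filter,
        "GlueCEUniqueID",                  # attributes to return
        "GlueCEStateStatus",
    ]


if __name__ == "__main__":
    # "bdii.example.org" is a placeholder, not a real BDII.
    print(" ".join(bdii_query("bdii.example.org", "cms")))
```

The same filter works against a site-level or a top-level BDII; only the host and, depending on the deployment, the search base change.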
R-GMA/MON box • For users, R-GMA appears similar to a single relational database • Implementation of OGF’s Grid Monitoring Architecture (GMA) • Rich set of APIs (web browser, Java, C/C++, Python) • Typical deployment consists of Producer and Consumer Services on a one-per-site basis (MON box), and a centralized Registry and Schema [Diagram: a producer application publishes tuples through the Producer Service API (SQL “INSERT”), and the Producer Service registers with the Registry Service; a consumer application sends a query through the Consumer Service API (SQL “SELECT”), the Consumer Service locates producers via the Registry, queries them and receives tuples; the Schema Service holds table definitions (SQL “CREATE TABLE”)]
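The three SQL verbs on the slide map onto three roles: the schema defines a table, producers insert tuples, consumers select them. The sketch below imitates that flow with an in-memory SQLite database from the Python standard library. It is an analogy only: it does not use the R-GMA API, there is no registry or distribution across sites, and the table and column names are invented for the example.

```python
import sqlite3


def create_schema(db):
    """Schema role: define the table (SQL CREATE TABLE)."""
    db.execute("CREATE TABLE cpu_load (site TEXT, host TEXT, load_avg REAL)")


def publish(db, site, host, load_avg):
    """Producer role: publish one tuple (SQL INSERT)."""
    db.execute("INSERT INTO cpu_load VALUES (?, ?, ?)", (site, host, load_avg))


def query_busy_hosts(db, threshold):
    """Consumer role: ask a question over all published tuples (SQL SELECT)."""
    cursor = db.execute(
        "SELECT site, host FROM cpu_load WHERE load_avg > ? ORDER BY site",
        (threshold,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_schema(db)
    publish(db, "SiteA", "ce01", 0.7)   # hypothetical sites and hosts
    publish(db, "SiteB", "ce02", 1.9)
    print(query_busy_hosts(db, 1.0))
    db.close()
```

In R-GMA the tuples stay with the producers at each site and the registry tells consumers where to look; the single database here stands in for that "virtual" database as the user sees it.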
GridICE monitoring • The MON box also hosts the GridICE extended GRIS (on port 2136) • Usually deployed together with a SE
Data Services • Need a common interface to storage resources • Storage Resource Manager (SRM) • Need to keep track of where data is stored • File and Replica Catalogs • Need scheduled, reliable file transfer • File transfer services • Need a way to describe files’ content and query them • Metadata catalog • Heterogeneity • Data is stored on different storage systems using different access technologies • Distribution • Data is stored in different locations; in most cases there is no shared file system or common namespace • Data needs to be moved between different locations • Data description • Data are stored as files: need a way to describe files and locate them according to their contents
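The role of a file and replica catalog, keeping track of where copies of the same data live when there is no common namespace, can be shown with a toy in-memory structure. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, and the logical name and storage URLs in the usage example are invented. It does not reflect the interface of any gLite catalog service.

```python
class ReplicaCatalog:
    """Toy catalog: one logical file name maps to a list of physical replicas."""

    def __init__(self):
        self._replicas = {}

    def register(self, logical_name, replica_url):
        """Record that a copy of the logical file exists at replica_url."""
        replicas = self._replicas.setdefault(logical_name, [])
        if replica_url not in replicas:
            replicas.append(replica_url)

    def unregister(self, logical_name, replica_url):
        """Forget one copy; drop the logical name when no copies remain."""
        replicas = self._replicas.get(logical_name, [])
        if replica_url in replicas:
            replicas.remove(replica_url)
        if not replicas:
            self._replicas.pop(logical_name, None)

    def replicas(self, logical_name):
        """Return all known locations of a logical file (empty if unknown)."""
        return list(self._replicas.get(logical_name, []))


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    # Hypothetical names: one logical file stored at two sites.
    catalog.register("lfn:/grid/cms/run42.dat", "srm://se.site-a.example/cms/run42.dat")
    catalog.register("lfn:/grid/cms/run42.dat", "srm://se.site-b.example/cms/run42.dat")
    print(catalog.replicas("lfn:/grid/cms/run42.dat"))
```

A job scheduler or transfer service can then pick whichever replica is closest to the computing resource, which is why the catalog sits between job management and storage.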