Features and Future Frédéric Hemmer - CERN Deputy Head of IT Department BEGrid seminar Brussels, October 27, 2006
Outline • Overview of EGEE • EGEE gLite Middleware • Foundation services • High Level services examples • Software process • Short Term plans • Software Process & ETICS
The EGEE project • EGEE • Started in April 2004 • Now in 2nd phase with 91 partners in 32 countries • Objectives • Large-scale, production-quality grid infrastructure for e-Science • Attracting new resources and users from industry as well as science • Maintain and further improve gLite Grid middleware
Applications on EGEE • Many applications from a growing number of domains • Astrophysics • MAGIC, Planck • Computational Chemistry • Earth Sciences • Earth Observation, Solid Earth Physics, Hydrology, Climate • Financial Simulation • E-GRID • Fusion • Geophysics • EGEODE • High Energy Physics • 4 LHC experiments (ALICE, ATLAS, CMS, LHCb) • BaBar, CDF, DØ, ZEUS • Life Sciences • Bioinformatics (Drug Discovery, GPS@, Xmipp_MLrefine, etc.) • Medical imaging (GATE, CDSS, gPTM3D, SiMRI 3D, etc.) • Multimedia • Material Sciences • > 165 Virtual Organizations (VOs) • Applications have moved from testing to routine daily usage, with ~80-90% efficiency • User Forum Book of abstracts: http://doc.cern.ch/archive/electronic/egee/tr/egee-tr-2006-005.pdf • App deployment plan: https://edms.cern.ch/document/722131/2 • Presentations, posters and demos at EGEE06: http://www.eu-egee.org/egee06
EGEE Grid Sites: Q1 2006 • [Charts: CPU count and number of sites over the project lifetime] • Steady growth over the lifetime of the project • > 180 sites in 40 countries • > 24,000 processors, ~5 PB storage
EGEE – What do we deliver? • Infrastructure operation • Currently includes ~200 sites across 40 countries • Continuous monitoring of grid services & automated site configuration/management http://gridportal.hep.ph.ic.ac.uk/rtm/launch_frame.html • Middleware • Production quality middleware distributed under business friendly open source licence • User Support - Managed process from first contact through to production usage • Training • Expertise in grid-enabling applications • Online helpdesk • Networking events (User Forum, Conferences etc.) • Interoperability • Expanding geographical reach and interoperability with collaborating e-infrastructures
Middleware Layers • Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware • Higher-Level Grid Services help users build their computing infrastructure but should not be mandatory • Foundation Grid Middleware is deployed on the EGEE infrastructure • Must be complete and robust • Should allow interoperation with other major grid infrastructures • Should not assume the use of Higher-Level Grid Services • Higher-Level Grid Services: Workload Management, Replica Management, Visualization, Workflow, Grid Economies, ... • Foundation Grid Middleware: security model and infrastructure, Computing (CE) and Storage Elements (SE), Accounting, Information and Monitoring
The gLite Middleware Approach • Exploit experience and existing components from VDT (Condor, Globus), EDG/LCG, and others • gLite is a distribution that combines components from many different providers! • Develop, test, certify & distribute a generic middleware stack useful to EGEE (and other) applications • Pluggable components • Follow an SOA approach, WS-I compliant where possible • Focus is on re-engineering and hardening • Business-friendly open source license • Plan to switch to the Apache 2 license
gLite Grid Middleware Services • Access: CLI, API • Security: Authentication, Authorization, Auditing • Information & Monitoring: Information & Monitoring, Application Monitoring • Data Management: Metadata Catalog, File & Replica Catalog, Storage Element, Data Movement • Workload Management: Workload Management, Computing Element, Job Provenance, Package Manager, Accounting, Site Proxy • Overview paper: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf
Grid Foundation: Computing Element • The CE accepts batch jobs (and job control requests) through a gatekeeper, performs AAA, passes them to a LRMS, monitors their execution and returns the results to the submitter • Three flavours available now: • LCG-CE (GT2 GRAM): in production now but will be phased out by the end of the year • gLite-CE (GSI-enabled Condor-C): already deployed but still needs thorough testing and tuning • CREAM (WS-I based interface): contribution to the OGF-BES group for a standard WS-I based CE interface • BLAH is the interface to the local resource manager (via plug-ins) for CREAM and the gLite-CE • Information pass-through (CREAM and gLite-CE): pass parameters to the LRMS to help job scheduling • Site components: CE node with CEMon, glexec + LCAS/LCMAPS and BLAH in front of the LRMS and worker nodes (WN); the CE interacts with the WMS, clients and the information system (bdII, R-GMA) • A minimal sketch of this flow follows below.
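To make the accept → authorize → hand-to-LRMS → monitor cycle concrete, here is a small, self-contained Python sketch. The class names, the stub LRMS and the DN are purely illustrative; a real gLite CE goes through the gatekeeper, glexec/LCAS/LCMAPS and BLAH plug-ins rather than these stand-ins.

```python
# Minimal sketch of the CE job flow; the LRMS is a stub and all names are
# illustrative, not actual gLite components or APIs.
import itertools

class StubLRMS:
    """Stand-in for a local resource management system (batch scheduler)."""
    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}

    def submit(self, executable, pass_through=None):
        # `pass_through` mimics the information pass-through used to help scheduling
        job_id = next(self._ids)
        self._jobs[job_id] = {"exe": executable, "hints": pass_through or {}}
        return job_id

    def status(self, job_id):
        return "DONE" if job_id in self._jobs else "UNKNOWN"

class ComputingElement:
    def __init__(self, lrms, authorized_dns):
        self.lrms = lrms
        self.authorized_dns = set(authorized_dns)

    def accept(self, user_dn, executable, hints=None):
        # AAA step: only authorized (mapped) users may submit
        if user_dn not in self.authorized_dns:
            raise PermissionError(f"{user_dn} not authorized on this CE")
        return self.lrms.submit(executable, pass_through=hints)

    def monitor(self, job_id):
        return self.lrms.status(job_id)

ce = ComputingElement(StubLRMS(), authorized_dns={"/DC=org/CN=alice"})
jid = ce.accept("/DC=org/CN=alice", "/bin/hostname", hints={"queue": "short"})
print(ce.monitor(jid))  # DONE (the stub completes immediately)
```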
Grid Foundation: Storage Element • Site File Name (SFN): identifies a Storage Element and the logical name of the file inside it • Physical File Name (PFN): argument of file open • Storage Resource Manager (SRM): • hides the storage system implementation (disk or active tape) • checks the access rights to the storage system and the files • translates SFNs to PFNs • disk-based: DPM, dCache; tape-based: Castor, dCache • File I/O: POSIX-like access from local nodes or the grid via GFAL • A name-translation sketch follows below.
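A minimal sketch of the SRM role described above, assuming a toy ACL table and SFN-to-PFN map; the hostnames, paths and the rfio:// scheme of the returned PFN are illustrative, not the behaviour of any particular SRM implementation.

```python
# Sketch: check access rights, then translate an SFN into a PFN a client can open.
ACLS = {"/grid/vo1/data/run42.root": {"vo1"}}          # who may read which file
PFN_MAP = {                                            # SFN path -> physical path
    "/grid/vo1/data/run42.root": "/storage/pool3/0001/run42.root",
}

def srm_translate(sfn: str, user_vo: str) -> str:
    """Resolve srm://<host>/<path> to a PFN, enforcing access control."""
    host, _, path = sfn.removeprefix("srm://").partition("/")
    path = "/" + path
    if user_vo not in ACLS.get(path, set()):
        raise PermissionError(f"VO {user_vo} may not access {path}")
    return f"rfio://{host}{PFN_MAP[path]}"              # argument of file open

pfn = srm_translate("srm://se01.example.org/grid/vo1/data/run42.root", "vo1")
print(pfn)  # rfio://se01.example.org/storage/pool3/0001/run42.root
```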
Example: The Disk Pool Manager (DPM) • Light-weight disk-based Storage Element • Easy to install, configure and manage, and to join or remove resources • Integrated security (authentication/authorization) based on VOMS groups and roles • All control and I/O services have security built in: GSI or Kerberos 5 • SRMv1 and SRMv2.1 interfaces; SRMv2.2 being added now • Architecture: grid clients (SRM, GridFTP, RFIO) talk to the SRM server and daemon, the DPM daemon and database, the name server (NS daemon and database) and a request daemon, with GridFTP and RFIO servers in front of the disk system
Grid Foundation: Accounting • Resource usage by VO, group or single user • Resource metering: sensors running on resources to determine usage • Pricing policies: associate a cost to resource usage; if enabled, this allows market-based resource brokering • Privacy: access to accounting data granted only to authorized people (user, provider, VO manager) • Basic functionality in APEL, full functionality in DGAS • An aggregation sketch follows below.
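A minimal sketch of metering, pricing and the privacy rule described above; the record fields, the price constant and the VO names are assumptions for illustration, not the APEL or DGAS schemas.

```python
# Sketch: aggregate usage records per VO/user and restrict who may read them.
from collections import defaultdict

PRICE_PER_CPU_HOUR = 0.10          # pricing policy (arbitrary units)

records = [                        # what resource-metering sensors would emit
    {"user": "alice", "vo": "biomed", "cpu_hours": 12.0},
    {"user": "bob",   "vo": "atlas",  "cpu_hours": 30.5},
    {"user": "alice", "vo": "biomed", "cpu_hours": 3.5},
]

def aggregate(records, key):
    """Sum CPU hours and cost per user or per VO."""
    totals = defaultdict(float)
    for r in records:
        totals[r[key]] += r["cpu_hours"]
    return {k: {"cpu_hours": v, "cost": v * PRICE_PER_CPU_HOUR}
            for k, v in totals.items()}

def view(requester_vo, requested_vo, records):
    """Privacy rule: a VO manager only sees accounting data for their own VO."""
    if requester_vo != requested_vo:
        raise PermissionError("not authorized for this VO's accounting data")
    return aggregate([r for r in records if r["vo"] == requested_vo], "user")

print(aggregate(records, "vo"))            # per-VO usage
print(view("biomed", "biomed", records))   # per-user usage within one VO
```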
High Level Services: Job Information • Logging and Bookkeeping service: tracks jobs during their lifetime (in terms of events) • Job Provenance stores long-term job information • Supports job rerun • A job-tracking sketch follows below.
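A minimal sketch of event-based job tracking in this spirit: the current state is derived from the stream of logged events, and the full history is the kind of record a provenance service could archive. The event and state names are illustrative, not the actual L&B vocabulary.

```python
# Sketch: job state derived from logged events; history kept for provenance/rerun.
from dataclasses import dataclass, field
from typing import List

EVENT_TO_STATE = {                 # which state each event moves a job into (illustrative)
    "REGISTERED": "SUBMITTED",
    "MATCHED": "WAITING",
    "TRANSFERRED_TO_CE": "READY",
    "STARTED": "RUNNING",
    "FINISHED": "DONE",
    "ABORTED": "ABORTED",
}

@dataclass
class JobRecord:
    job_id: str
    events: List[str] = field(default_factory=list)

    def log(self, event: str) -> None:
        """Append an event; the full history is kept for provenance and rerun."""
        self.events.append(event)

    @property
    def state(self) -> str:
        """Current state is determined by the last recognised event."""
        for event in reversed(self.events):
            if event in EVENT_TO_STATE:
                return EVENT_TO_STATE[event]
        return "UNKNOWN"

job = JobRecord("https://lb.example.org:9000/abc123")  # hypothetical job id
for e in ("REGISTERED", "MATCHED", "STARTED", "FINISHED"):
    job.log(e)
print(job.state)   # DONE
print(job.events)  # full event history
```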
High Level Services: Workload Management • Resource brokering, workflow management, I/O data management • Web Service interface: WMProxy • Task Queue: keeps non-matched jobs • Information SuperMarket: optimized cache of the information system • Match Maker: assigns jobs to resources according to user requirements • Job submission & monitoring: Condor-G, Condor-C, ICE (to CREAM) • External interactions: Information System, Data Catalogs, Logging & Bookkeeping, Policy Management system (G-PBox) • A match-making sketch follows below.
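A minimal sketch of the match-making step: jobs from the Task Queue are filtered by their requirements against the Information SuperMarket cache, and the best resource is chosen by rank. The attribute names and the rank expression are assumptions for illustration, not actual JDL/Glue attributes.

```python
# Sketch: match a job's requirements against cached resource descriptions, then rank.
information_supermarket = [
    {"ce": "ce01.example.org", "free_slots": 120, "max_wallclock": 2880, "os": "SLC4"},
    {"ce": "ce02.example.org", "free_slots": 4,   "max_wallclock": 720,  "os": "SLC3"},
    {"ce": "ce03.example.org", "free_slots": 60,  "max_wallclock": 4320, "os": "SLC4"},
]

job = {
    "requirements": lambda ce: ce["os"] == "SLC4" and ce["max_wallclock"] >= 1440,
    "rank": lambda ce: ce["free_slots"],     # prefer the CE with most free slots
}

def match(job, resources):
    candidates = [ce for ce in resources if job["requirements"](ce)]
    if not candidates:
        return None                          # no match: the job stays in the Task Queue
    return max(candidates, key=job["rank"])

print(match(job, information_supermarket)["ce"])  # ce01.example.org
```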
High Level Services: FTS • Reliable and manageable File Transfer Service for VOs • Transfers are treated as jobs • May be split onto multiple "channels" • Channels are point-to-point or "catch-all" (only one end fixed); more flexible channel definitions are on the way • New features that will be available in production soon: • Cleaner error reporting and service monitoring interfaces • Proxy renewal and delegation • SRMv2.2 support • Longer term development: • Optimized SRM interaction: split preparation from transfer • Better service management controls • Notification of finished jobs • Pre-staging tape support • Catalog & VO plug-ins framework: allow catalog registration as part of the transfer workflow • A channel-assignment sketch follows below.
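A minimal sketch of how transfer jobs could be assigned to point-to-point or catch-all channels as described above; the site names, channel names and matching rules are illustrative assumptions, not the actual FTS channel logic.

```python
# Sketch: pick a channel for a transfer, preferring an exact point-to-point match.
channels = [
    {"name": "CERN-RAL",  "source": "CERN", "dest": "RAL"},   # point-to-point
    {"name": "STAR-CERN", "source": "*",    "dest": "CERN"},  # catch-all into CERN
]

def assign_channel(transfer, channels):
    """Exact point-to-point match first, then fall back to a catch-all channel."""
    for ch in channels:
        if ch["source"] == transfer["source"] and ch["dest"] == transfer["dest"]:
            return ch["name"]
    for ch in channels:
        if ch["source"] == "*" and ch["dest"] == transfer["dest"]:
            return ch["name"]
        if ch["dest"] == "*" and ch["source"] == transfer["source"]:
            return ch["name"]
    return None   # no channel serves this pair; the transfer cannot be scheduled

print(assign_channel({"source": "CERN",  "dest": "RAL"},  channels))  # CERN-RAL
print(assign_channel({"source": "IN2P3", "dest": "CERN"}, channels))  # STAR-CERN
```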
High Level Services: Encrypted Data Storage (EDS) • Encrypt and decrypt data on-the-fly • Key-store: Hydra • N instances: at least M (< N) need to be available for decryption • Provides fault tolerance and security • Demonstrated with the SRM-DICOM demo at the EGEE Pisa conference (Oct '05) • Storage: currently d-Cache, will be DPM; catalogue will be LFC; I/O will be GFAL • An M-of-N key-splitting sketch follows below.
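The M-of-N property above can be illustrated with Shamir secret sharing: split the file-encryption key into N shares so that any M of them reconstruct it and fewer reveal nothing. This is a generic sketch of the technique, not the actual Hydra implementation; the prime field and share layout are assumptions.

```python
# Sketch: split a key into N shares, any M of which recover it (Shamir secret sharing).
import random

PRIME = 2**127 - 1  # a Mersenne prime large enough for a 16-byte key

def split_key(secret: int, n: int, m: int):
    """Split `secret` into n shares; any m of them reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(m - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def recover_key(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

key = random.randrange(PRIME)          # the file-encryption key
shares = split_key(key, n=5, m=3)      # e.g. stored across 5 key-store instances
assert recover_key(shares[:3]) == key  # any 3 shares are enough
```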
Main focus for the developers • Give support on the production infrastructure (GGUS, 2nd line support) • Fix defects found in the production software • Support SL(C)4 and 64-bit architectures (x86-64 first) • Participate in Task Forces together with applications and site experts, and improve scalability • Improve robustness and usability (efficiency, error reporting, ...) • Address requests for functionality improvements from users, site administrators, etc. (through the Technical Coordination Group) • Improve adherence to international standards and interoperability with other infrastructures • Deploy new components on the preview test-bed and expose them to users • Interoperability with Shibboleth • Work plans available at: https://twiki.cern.ch/twiki/bin/view/EGEE/EGEEgLiteWorkPlans
WMS Performance Results • ~20,000 jobs submitted • 3 parallel UIs • 33 Computing Elements • 200 jobs/collection • Bulk submission • Performance • ~2.5 h to submit all jobs (0.5 seconds/job) • ~17 hours to transfer all jobs to a CE (3 seconds/job) • 26,000 jobs/day • Job failures • Negligible fraction of failures due to the gLite WMS • The rest are either application errors or site problems • By A. Sciabà • The arithmetic behind these rates is sketched below.
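A quick back-of-the-envelope check of how the quoted rates fit together (the small differences come from rounding on the slide):

```python
# Consistency check of the quoted WMS figures; numbers taken from the slide.
jobs = 20_000
submit_rate = 0.5      # seconds per job at submission
transfer_rate = 3.0    # seconds per job to reach a CE

print(jobs * submit_rate / 3600)    # ~2.8 h to submit all jobs (slide: ~2.5 h)
print(jobs * transfer_rate / 3600)  # ~16.7 h to transfer them all (slide: ~17 h)
print(int(86_400 / transfer_rate))  # ~28,800 jobs/day sustainable (slide: 26,000)
```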
gLite Software Process • Technical Coordination Group (TCG) • Gathers & prioritizes user requirements from HEP, Biomed, (industry), sites • gLite development is client-driven! • Software from EGEE-JRA1 and other projects • JRA1 preview test-bed (currently being set up): gives users early exposure to "uncertified" components • SA3 Integration Team • Ensures components are deployable and work • Deployment modules implement high-level gLite node types (WMS, CE, R-GMA Server, VOMS Server, FTS, etc.) • Build system now spun off into the ETICS project • SA3 Certification Team • Dedicated test-bed; tests release candidates and patches • Develops test suites • SA1 Pre-Production System • Scale tests by users
ETICS • Clients reach ETICS via a browser (web application) or via command-line tools (web service) • The ETICS infrastructure keeps a project DB, a report DB and the build/test artefacts • Build and test jobs are dispatched through the NMI scheduler and NMI client onto worker nodes (WNs)
Summary • EGEE is a global effort, and the largest multi-science Grid infrastructure worldwide • gLite 3.0 is an important milestone in the EGEE programme • New components from gLite 1.x, developed in the first phase of EGEE, are being deployed for the first time on the Production Infrastructure • Addressing application and operations requirements in terms of functionality and scalability • New build and integration environment from ETICS • Controlled software process and certification • Development is application driven (TCG) • Collaboration with other projects for interoperability and definition/adoption of international standards
www.glite.org www.eu-egee.org