EGEE: A Large-scale Production Grid Infrastructure • Erwin Laure, EGEE Technical Director • ISSGC06, July 16-28, 2006, Ischia, Italy
Lost in Definitions? Defining the “Grid”: • Access to (high performance) computing power • Distributed parallel computing • Improved resource utilization through resource sharing • Increased storage provision • Controlled access to distributed storage • Interconnection of arbitrary resources (sensors, instruments, …) • Collaboration between users/resources • Higher abstraction layer above network services • Corresponding security • … EGEE - A Large-scale Production Grid Infrastructure
Defining the Grid • A Grid is the combination of networked resources and the corresponding middleware, which provides services for the user. • This interconnection of users, resources, and services for jointly addressing dedicated tasks is called a virtual organization. • Comparison between Grids and Networks: • Networks realize message exchange between endpoints • Grids realize services for the users at a higher level of abstraction
The EGEE Project • Aim of EGEE: “to establish a seamless European Grid infrastructure for the support of the European Research Area (ERA)” • EGEE • 1 April 2004 – 31 March 2006 • 71 partners in 27 countries, federated in regional Grids • EGEE-II • 1 April 2006 – 31 March 2008 • Expanded consortium • 91 partners
EGEE Infrastructure • Scale (June 2006): • ~200 sites in 40 countries • ~25 000 CPUs • >10 PB storage • >35 000 jobs per day • >60 Virtual Organizations (Map legend: countries participating in EGEE)
EGEE Infrastructures • Production service • Scaling up the infrastructure with resource centres around the globe • Stable, well-supported infrastructure, running only well-tested and reliable middleware • Pre-production service • Run in parallel with the production service (restricted number of sites) • First deployment of new versions of the gLite middleware • Test-bed for applications and other external functionality • T-Infrastructure (Training & Education) • Complete suite of Grid elements and applications (Testbed, CA, VO, monitoring, support, …) • Everyone can register and use GILDA for training and testing • 20 sites on 3 continents
EGEE Operations Process • Geographically distributed responsibility for operations: • There is no “central” operation • Regional Operation Centers • Responsible for resource centers in their region • Tools are developed/hosted at different sites: • GOC DB (RAL), SFT (CERN), GStat (Taipei), CIC Portal (Lyon) • Grid operator on duty • 6 teams working in weekly rotation • CERN, IN2P3, INFN, UK/I, Ru, Taipei • Crucial in improving site stability and management • Expanding to all ROCs in EGEE-II • Operations coordination • Weekly operations meetings • Regular ROC managers meetings • Series of EGEE Operations Workshops • Nov 04, May 05, Sep 05, June 06 • Procedures described in Operations Manual • Introducing new sites • Site downtime scheduling • Suspending a site • Escalation procedures; etc. • Highlights: • Distributed operation • Evolving and maturing procedures • Procedures being introduced into and shared with the related infrastructure projects
Production Grid Middleware • Key factors in EGEE Grid Middleware Development: • Strict software process: use industry-standard software engineering methods • Software configuration management, version control, defect tracking, automatic build system, … • Conservative approach in what software to use • Avoid “cutting-edge” software • Deployment on over 100 sites cannot assume a homogeneous environment; middleware needs to work with many underlying software flavors • Avoid evolving standards • Evolving standards change quickly (and sometimes significantly, cf. OGSI vs. WSRF); impossible to keep pace on > 100 sites • Long (and tedious) path from prototypes to production
EGEE Middleware: gLite • Exploit experience & existing components • VDT (Condor, Globus) • EDG/LCG • AliEn • … • Develop a lightweight stack of EGEE generic middleware • Dynamic deployment • Pluggable components • Focus is on re-engineering and hardening • March 4, 2006: gLite 3.0 (Timeline figure, 2004 to 2006: LCG-2 and gLite, prototyping and product phases, gLite 3.0)
Developing • gLite 3.0 now available on production infrastructure • After gLite 3.0: • Continuous release of single components • As needed by users and as made available by developers • Major releases provide a “check-point” • Generally coinciding with major application challenges • Continuing development to • Bring components not yet included in release to maturity • Improve functionality • Increase robustness • Increase usability • Improve compliance with international standards
Grid Interoperability • Leading role in building world-wide grids • Incubator for new Grid projects world-wide • Interoperation efforts • Bilateral: EGEE/OSG, EGEE/NDGF, EGEE/NAREGI • Multilateral: Grid Interoperability Now (GIN) • Experiences and requirements fed back into standardization process (GGF – now OGF) • Strengthening contacts with industry
Building Software for the Grid (layer diagram) • Applications: Environmental Sciences, Life & Pharmaceutical Sciences, Geo Sciences • Middleware: APST, Globus GT4, Condor • Platform Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH • Courtesy IBM • Slide courtesy David Abramson
Building Software for the Grid (layer diagram) • Applications: Environmental Sciences, Life & Pharmaceutical Sciences, Geo Sciences • Middleware (APST, Globus GT4, Condor), split into Upper Middleware & Tools and Lower Middleware • Platform Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH • Courtesy IBM, Bonds • Slide courtesy David Abramson
Middleware structure • Higher-Level Grid Services may or may not be used by the applications • should help them but not be mandatory • Foundation Grid Middleware is deployed on the infrastructure • should not assume the use of Higher-Level Grid Services • must be complete and robust • should allow interoperation with other major grid infrastructures
gLite Grid Middleware Services • Overview paper: http://doc.cern.ch//archive/electronic/egee/tr/egee-tr-2006-001.pdf
Job submission (diagram) • Components: User Interface, Resource Broker, Information System, File and Replica Catalogs, Authorization Service, and at Site X a Computing Element and a Storage Element • Arrow labels: submit, query, retrieve, discover services, update credential, publish state
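The "publish state", "query" and "submit" arrows of the job submission diagram can be illustrated with a toy model. This is only a sketch under stated assumptions: the class names, the `free_slots` field, the "most free slots first" ranking and the `*.example` host names are all hypothetical and are not the gLite API or its actual matchmaking policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CEState:
    """State a Computing Element publishes (fields are illustrative)."""
    site: str
    free_slots: int


class InformationSystem:
    """Toy registry: Computing Elements publish state, brokers query it."""

    def __init__(self) -> None:
        self._states: Dict[str, CEState] = {}

    def publish(self, ce_name: str, state: CEState) -> None:
        self._states[ce_name] = state

    def query(self, min_free_slots: int = 1) -> List[str]:
        # Most free slots first; the name breaks ties deterministically.
        matches = [name for name, state in self._states.items()
                   if state.free_slots >= min_free_slots]
        return sorted(matches,
                      key=lambda name: (-self._states[name].free_slots, name))


@dataclass
class ResourceBroker:
    """Toy broker: picks a Computing Element for each submitted job."""
    info: InformationSystem
    assignments: Dict[str, str] = field(default_factory=dict)

    def submit(self, job_name: str) -> Optional[str]:
        candidates = self.info.query(min_free_slots=1)
        if not candidates:
            return None  # no Computing Element can take the job right now
        self.assignments[job_name] = candidates[0]
        return candidates[0]


# Hypothetical sites and host names, for illustration only.
info = InformationSystem()
info.publish("ce.site-x.example", CEState(site="Site X", free_slots=12))
info.publish("ce.site-y.example", CEState(site="Site Y", free_slots=3))

broker = ResourceBroker(info)
chosen = broker.submit("hello-grid")
print(chosen)
```

The point of the sketch is the separation of roles: sites only publish their state, and the broker decides where a job goes by querying that published state rather than contacting sites directly.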
gLite Software Process (flow diagram) • Activities: JRA1 Development, SA3 Integration, SA3 Testing & Certification, SA1 Pre-Production, SA1 Production Infrastructure • Steps: Integration Tests, Testbed Deployment, Functional Tests, Pre-Production Deployment, Scalability Tests, Release • Artifacts: Software, Deployment Packages, Installation Guide, Release Notes, etc. • Outcomes: Pass, Fail, Problem, Serious problem • Feedback to development: Directives, Error Fixing
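One way to read the software process diagram is as a gated pipeline: a release candidate moves forward only while each test step passes, and any failure sends it back for error fixing. The sketch below assumes that reading. The order of the gates and the function names are assumptions made for illustration, not a description of the project's actual tooling.

```python
from typing import Callable, Dict, List, Tuple

# Gate order is an assumption based on the step names in the diagram.
GATES: List[str] = [
    "integration tests",
    "functional tests",
    "scalability tests",
]


def run_release(checks: Dict[str, Callable[[], bool]]) -> Tuple[str, List[str]]:
    """Run the gates in order and stop at the first failure."""
    history: List[str] = []
    for gate in GATES:
        passed = checks[gate]()
        history.append(f"{gate}: {'pass' if passed else 'fail'}")
        if not passed:
            # A failed gate sends the candidate back to development.
            return "error fixing", history
    return "release", history


# Example: a candidate that fails certification.
outcome, history = run_release({
    "integration tests": lambda: True,
    "functional tests": lambda: False,
    "scalability tests": lambda: True,
})
print(outcome, history)
```

Stopping at the first failed gate mirrors the conservative approach of the production service: nothing reaches later stages without passing the earlier ones.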
EGEE Applications • >20 applications • Astronomy • Biomedicine • Computational Chemistry • Earth Sciences • Financial Simulation • Fusion • Geo-Physics • High Energy Physics • Further applications in evaluation • Applications now moving from testing to routine and daily usage
High Energy Physics • Large Hadron Collider (LHC): • One of the most powerful instruments ever built to investigate matter • 4 Experiments: ALICE, ATLAS, CMS, LHCb • 27 km circumference tunnel • Due to start up in 2007 (Aerial photo labels: Mont Blanc (4810 m), Downtown Geneva)
Accelerating and colliding particles
The LHC Accelerator • The accelerator generates 40 million particle collisions (events) every second at the centre of each of the four experiments’ detectors
LHC Data • The collisions are reduced by online computers that filter out a few hundred “good” events per sec. • These are recorded on disk and magnetic tape at 100-1,000 MegaBytes/sec • ~15 PetaBytes per year for all four experiments
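As a rough consistency check on the two figures above, the yearly volume can be converted into an average recording rate. This is a back-of-the-envelope sketch: it assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and recording spread evenly over a full calendar year, which a real accelerator schedule would not follow.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, ignoring leap years


def average_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average rate in MB/s if the yearly volume were recorded continuously."""
    bytes_per_year = petabytes_per_year * 1e15
    return bytes_per_year / SECONDS_PER_YEAR / 1e6


rate = average_rate_mb_per_s(15)
in_quoted_range = 100 <= rate <= 1000

print(f"{rate:.0f} MB/s averaged over a year")
print("within the quoted 100-1,000 MB/s range:", in_quoted_range)
```

The average comes out at a few hundred MB/s, which sits inside the quoted 100-1,000 MB/s range, so the two numbers on the slide are mutually consistent under these assumptions.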
Data Handling and Computation for Physics Analysis (data-flow diagram) • Processing steps: detector, event filter (selection & reconstruction), reconstruction, event reprocessing, event simulation, batch physics analysis, interactive physics analysis • Data products: raw data, event summary data, processed data, analysis objects (extracted by physics topic) • Phases: simulation, reconstruction, analysis • Credit: les.robertson@cern.ch
LCG depends on two major science grid infrastructures: • EGEE - Enabling Grids for E-Science • OSG - US Open Science Grid
Example: HEP • LHC data and service challenges • Preparing for LHC start-up in 2007 • Ensure key services & infrastructure are in place • Emphasis on providing a service • Computing needs of experiments • E.g. LHCb: ~700 CPU years in 2005 on the EGEE infrastructure • E.g. ATLAS: over 10,000 jobs per day • Massive data transfers > 1.5 GB/s (Plot labels: LHCb, ATLAS)
Example: Addressing emerging diseases • Emerging diseases know no frontiers. Time is a critical factor. • International collaboration is required for: • Early detection • Epidemiological watch • Prevention • Search for new drugs • Search for vaccines (Map caption: Avian influenza: human casualties)
WISDOM, the first step • WISDOM focuses on drug discovery for neglected and emerging diseases. • Summer 2005: World-wide In Silico Docking On Malaria • 46 million ligands docked in 6 weeks • ~1 million virtual ligands selected • 1 TB of data produced • 1000 computers in 15 countries • Equivalent to 80 CPU years • Spring 2006: drug design against H5N1 neuraminidase involved in virus propagation • impact of selected point mutations on the efficiency of existing drugs • identification of new potential drugs acting on mutated N1 (Figure labels: H5, N1)
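The malaria figures above can be cross-checked with simple arithmetic: how many CPUs would have to be busy around the clock to deliver 80 CPU years in 6 weeks, and how much CPU time one docking costs on average. The sketch assumes a 365.25-day year and that "80 CPU years" covers exactly the 46 million dockings; the function names are illustrative.

```python
DAYS_PER_YEAR = 365.25  # assumption: average year length


def equivalent_busy_cpus(cpu_years: float, wall_clock_weeks: float) -> float:
    """CPUs that must run continuously to deliver cpu_years in the given time."""
    return cpu_years * DAYS_PER_YEAR / (wall_clock_weeks * 7)


def cpu_seconds_per_item(cpu_years: float, items: int) -> float:
    """Average CPU time spent on one item of work."""
    return cpu_years * DAYS_PER_YEAR * 24 * 3600 / items


busy_cpus = equivalent_busy_cpus(cpu_years=80, wall_clock_weeks=6)
per_ligand = cpu_seconds_per_item(cpu_years=80, items=46_000_000)

print(f"about {busy_cpus:.0f} CPUs busy around the clock")
print(f"about {per_ligand:.0f} CPU seconds per docked ligand")
```

Roughly 700 continuously busy CPUs is in line with the 1000 computers quoted, once idle time, queuing and failed jobs are allowed for, and it works out to a little under a minute of CPU time per ligand.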
Challenges for high throughput virtual docking • Millions of chemical compounds available in laboratories • High Throughput Screening: 2$/compound, nearly impossible • 300,000 chemical compounds: ZINC & chemical combinatorial library • Target (PDB): Neuraminidase (8 structures) • Molecular docking (Autodock): ~100 CPU years, 600 GB data • Data challenge on EGEE, Auvergrid, TWGrid: ~6 weeks on ~2000 computers • Hits sorting and refining • In vitro screening of 100 hits
Example: Pharmacokinetics • A lesion is detected in an MRI study of a patient – start with virtual biopsy • The process requires obtaining a sequence of MRI volumetric images. • Different images are obtained in different breath-holds. • Before analyzing the variation of each voxel, images must be co-registered to minimize deformation due to different breath-holds. • The total computational cost of a clinical trial of 20 patients is around 100 CPU days.
Example: Determining earthquake mechanisms • Seismic software application determines epicentre, magnitude, mechanism • Analysis of Indonesian earthquake (28 March 2005) • Seismic data within 12 hours after the earthquake • Solution found within 30 hours after earthquake occurred • 10 times faster on the Grid than on local computers • Results • Not an aftershock of December 2004 earthquake • Different location (different part of fault line further south) • Different mechanism • Rapid analysis of earthquakes is important for relief efforts (Figure labels: Sumatra, March 28, 2005, Mw=8.5; Peru, June 23, 2001, Mw=8.4)
Flood forecasting problem • Many kinds of data • Meteorological, hydrological, hydraulic • Generated by simulations or obtained from sensors • Permanent or periodically updated • Publicly available or with restricted access
ITU-BR system for RRC 2006 • ITU-BR developed a system for RRC 2006 • Run compatibility and complementary analysis • 84 PCs executing 168 parallel tasks • Compatibility analysis < 4h • Great success! • ITU-BR wanted to be sure and do even better • Provide more CPU power • Reduce risks by providing a supplementary system • Gain experience on how to access large and reliable computing resources ‘on demand’ • EGEE used a subset of its Grid for RRC 2006 • Over 400 PCs • Compatibility analysis < 1h
The Future of Grids • Increasing the number of infrastructure users by increasing awareness • Dissemination and outreach • Training and education • Increasing the number of applications by improving application support and middleware functionality • Improved usability through high level grid middleware extensions • Increasing the grid infrastructure • Incubating related projects • Ensuring interoperability between projects • Protecting user investments • Towards a sustainable grid infrastructure
User Information & Support • More than 170 training events and summer schools across many countries • >3000 people trained (induction; application developer; advanced; retreats) • Material archive online with ~250 presentations • Public and technical websites • Dissemination material constantly evolving to expand information and keep it up to date • 4 conferences organized (~460 @ Pisa) • Next conference: September 2006 in Geneva, ~600 participants
Industry and EGEE-II • Industry Task Force • Group of industry partners in the project • Links related industry projects (NESSI, BEinGRID, …) • Works with EGEE’s Technical Coordination Group • Collaboration with CERN openlab project • IT industry partnerships for hardware and software development • EGEE Business Associates (EBA) • Companies sponsoring work on joint-interest subjects • Industry Forum • Led by Industry to improve Grid take-up in Industry • Organises industry events and disseminates grid information • e.g. this Wednesday here at the school
Building Software for the Grid (layer diagram) • Applications: Environmental Sciences, Life & Pharmaceutical Sciences, Geo Sciences • ??? • Middleware (APST, Globus GT4, Condor), split into Upper Middleware & Tools and Lower Middleware • Platform Infrastructure: Unix, Windows, JVM, TCP/IP, MPI, .Net Runtime, VPN, SSH • Courtesy IBM, Bonds • Slide courtesy David Abramson
Portals on EGEE • P-Grade • Genius
Example: Biomedicine • Parallel simulation of blood flow on the Grid • Online visualization of simulation results on the desktop • Interactive steering of simulation • Grid is “invisible” • Cooperation with the University of Amsterdam
Example: Flooding Crisis Support • Simulation of flooding on the Grid • Online visualization of simulation results in the CAVE • Interactive steering of simulation • Grid is “invisible” • Cooperation with the Slovak Academy of Sciences
Scientific Visualization • Use your favourite device to connect to the Grid: Sony PSP – PlayStation Portable
Not only portals • Portals are a good way to bring computing power to end-users • In most cases domain specific • Application programmers (and portal programmers) need more powerful interfaces • Workflow engines • Higher level programming abstractions (SAGA, DRMAA, …) • Programming environments (gEclipse) • Compilers? • …
EU GRID Projects related to EGEE
GIN Related Infrastructures