The EGEE Grid infrastructure project: first experience and future plans
By Fabrizio Gagliardi, EGEE Project Director, CERN, Geneva, Switzerland

Introduction to EGEE - Content
• EGEE - what is it and why is it needed?
• Networking activity – pilot applications
• Grid operations – providing a stable service
• Grid middleware – current and future
• Summary
The material of this talk has been contributed by several colleagues in the EGEE project.
Despite its name, EGEE is an international project, involving in particular Israel, Russia and the US.
International Workshop on HEP Data Grid – Daegu, August 2004

What is EGEE?
• 70 leading institutions in 27 countries, federated in regional Grids
• 32 M Euros EU funding (2004-5), O(100 M) total budget
• Aiming for a combined capacity of over 20,000 CPUs (one of the largest international Grid infrastructures ever assembled)
• ~300 dedicated staff

EGEE Activities
• Emphasis on operating a production grid and supporting the end-users
• 48 % service activities (Grid Operations, Support and Management, Network Resource Provision)
• 24 % middleware re-engineering (Quality Assurance, Security, Network Services Development)
• 28 % networking (Management, Dissemination and Outreach, User Training and Education, Application Identification and Support, Policy and International Cooperation)

EGEE Applications
• EGEE scope: all-inclusive for academic applications (open to the industrial and socio-economic world as well)
• The major success criterion of EGEE: how many satisfied users from how many different domains?
• 5000 users (3000 after year 2) from at least 5 disciplines
• Two pilot applications selected to guide the implementation and certify the performance and functionality of the evolving infrastructure: Physics & Bioinformatics
Application domains and timelines are for illustration only.

EGEE pilot application: HEP
• Running large distributed computing systems for many years
• Focus for the future is on computing for LHC (LCG)
• The 4 LHC experiments and other current HEP experiments use grid technology, e.g. BaBar, CDF, D0
• LHC experiments are currently executing large-scale data challenges (DCs) involving thousands of processors world-wide and generating many Terabytes of data
• Moving to so-called ‘chaotic’ use of the grid with individual user analysis (thousands of users interactively operating within experiment VOs)

LHC experiments
• Storage
  • Raw recording rate 0.1 – 1 GByte/s
  • Accumulating at 5-8 PetaByte/year
  • 10 PetaByte of disk
• Processing
  • 200,000 of today’s fastest PCs
• Experiments: ATLAS, CMS, LHCb, ALICE

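The storage figures above can be related with a short back-of-the-envelope calculation. The sketch below is only illustrative: the figure of roughly 1e7 seconds of data taking per year is an assumption that does not appear in the slides, and the function name `annual_volume_pb` is hypothetical.

```python
def annual_volume_pb(rate_gb_per_s, live_seconds_per_year):
    """Data volume in PetaByte (decimal, 1 PB = 1e6 GB) recorded in one year."""
    return rate_gb_per_s * live_seconds_per_year / 1e6


# Assumption (not from the slides): roughly 1e7 seconds of data taking per year.
LIVE_SECONDS = 1e7

low = annual_volume_pb(0.1, LIVE_SECONDS)   # lower end of the quoted recording rate
high = annual_volume_pb(1.0, LIVE_SECONDS)  # upper end of the quoted recording rate
print(f"{low:.0f} to {high:.0f} PB/year")   # prints "1 to 10 PB/year"
```

Under that assumption, the quoted accumulation of 5-8 PetaByte/year falls inside the range implied by the 0.1 – 1 GByte/s recording rate.
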
EGEE pilot application: Biomedics
• Bioinformatics (gene/proteome database distribution)
• Medical applications (screening, epidemiology, image database distribution, etc.)
• Interactive applications (human supervision or simulation)
• Security/privacy constraints
• Heterogeneous data formats
• Frequent data updates
• Complex data sets
• Long-term archiving
• BioMed applications deployed; first job expected to run on LCG-2 by September

BLAST – comparing DNA or protein sequences
• BLAST is the first step in analysing new sequences: it compares DNA or protein sequences to others stored in personal or public databases. Ideal as a grid application.
• Requires resources to store databases and run algorithms
• Can compare one or several sequences against a database in parallel
• Large user community

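Why this kind of comparison suits a grid can be sketched in a few lines: each query sequence is compared against the database independently, so the queries can be dispatched as separate tasks. The example below is a toy, not BLAST: it scores similarity by counting shared 3-letter substrings, and all names (`kmers`, `best_hit`, `search_parallel`) and the sample sequences are hypothetical. Local threads stand in for grid jobs.

```python
from concurrent.futures import ThreadPoolExecutor


def kmers(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_hit(query, database, k=3):
    """Return (name, score) of the database entry sharing the most k-mers with query."""
    query_kmers = kmers(query, k)
    scores = {name: len(query_kmers & kmers(seq, k)) for name, seq in database.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def search_parallel(queries, database, workers=4):
    """Compare every query against the database, one independent task per query."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = pool.map(lambda q: best_hit(q, database), queries.values())
        return dict(zip(queries.keys(), hits))


# Toy data, invented for illustration only.
database = {"seqA": "ACGTACGTGA", "seqB": "TTTTGGGGCC"}
queries = {"q1": "ACGTAC", "q2": "GGGGCC"}

results = search_parallel(queries, database)
print(results)
```

Because no task depends on another, the same split works whether the workers are threads on one machine or jobs submitted to many sites; only the database has to be available wherever a task runs.
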
EGEE and LCG
• EGEE builds on the work of LCG to establish a grid operations service
• LCG (LHC Computing Grid) - Building and operating the LHC Grid
• A collaboration between:
  • The physicists and computing specialists from the LHC experiments
  • The projects in Europe and the US that have been developing Grid middleware
  • The regional and national computing centres that provide resources for LHC
  • The research networks

LCG
• Mission:
  • Prepare and deploy the computing environment that will be used by the experiments to analyse the LHC data
  • Started September 2001
• Strategy:
  • Integrate thousands of computers at dozens of participating institutes worldwide into a global computing resource
  • Rely on software being developed in advanced grid technology projects, both in Europe and in the USA (EDG, VDT, others)

EGEE infrastructure
• Access to networking services provided by GEANT and the NRENs
• Production Service:
  • In place (based on HEP LCG-2)
  • For production applications
  • MUST run reliably; runs only proven stable, debugged middleware and services
  • Will continue adding new sites in EGEE federations
• Pre-production Service:
  • For middleware re-engineering
• Certification and Training/Demo testbeds

LCG-2/EGEE-0 (I)
• Based on HEP-LCG testbed: more than 70 sites worldwide

EGEE Operations (I): OMC and CIC
• Operation Management Centre
  • Located at CERN, coordinates operations and management
  • Coordinates with other grid projects
• Core Infrastructure Centres
  • Behave as single organisations
  • Operate core services (VO specific and general Grid services)
  • Develop new management tools
  • Provide support to the Regional Operations Centres

EGEE Middleware Activity
• Middleware selected based on requirements of Applications and Operations
• Harden and re-engineer existing middleware functionality, leveraging the experience of partners
• Provide robust, supportable components
• Support the evolution of components towards a service-oriented approach (Web Services)

EGEE Middleware: gLite
• Exploit experience and existing components from VDT (CondorG, Globus), EDG/LCG, AliEn, and others
• Develop a lightweight stack of generic middleware useful to EGEE applications (HEP and Biomedics are pilot applications)
• Should eventually deploy dynamically (e.g. as a Globus job)
• Pluggable components – cater for different implementations
• Focus is on re-engineering and hardening
• Early prototype and fast feedback turnaround envisaged

EGEE Implementation
[Diagram labels: LCG-1, LCG-2, EGEE-1, EGEE-2; Globus 2 based, Web services based]
• From day 1 (1st April 2004)
  • Production grid service based on the LCG infrastructure running LCG-2 grid middleware (SA)
  • LCG-2 will be maintained until the new generation has proven itself (fallback solution)
• In parallel develop a “next generation” grid facility
  • Produce a new set of grid services according to evolving standards (Web Services)
  • Run a development service providing early access for evaluation purposes
  • Will replace LCG-2 on production facility in 2005

Generic Application Support
• Getting new scientific and industrial communities interested in and committed to using the grid infrastructure built by EGEE is key to the success of the project
• Questionnaire to get information and first requirements from new communities interested in using the EGEE Infrastructure (http://alipc1.ct.infn.it/grid/egee/na4/questionnaire/na4-genapp-questionnaire.doc)
• Feedback received so far (http://alipc1.ct.infn.it/grid/egee/na4/questionnaire):
  • Astrophysics (EVO and Planck satellite)
  • Earth Observation (ozone maps, seismology, climate)
  • Digital Libraries (DILIGENT Project)
  • Grid Search Engines (GRACE Project)
  • Industrial applications (SIMDAT Project)
• Interest also from Computational Chemistry (Italy and Czech Republic), Civil Engineering (Spain), and Geophysics (Switzerland and France) communities

One example
• MoU between EGEE and the Chonnam National University-Kangnung National University-Sejong University Collaboration (CKSC)
• HEP applications:
  • Development of the analysis system for the ALICE experiment
• Biomedical applications:
  • DNA and protein data analysis and Gene Regulation Bioinformatics

User training and induction
• Training material and courses from introductory to advanced level developed at NeSC in the UK
• Train a wide variety of users, both internal to the EGEE consortium and from external groups around the world
• 12 courses/presentations already held, many more planned
• Experience with GENIUS portal and GILDA testbed (provided by INFN)
• Major participation in the second International Grid school in Italy

Dissemination
• 1st project conference
  • Over 300 delegates came to the 4-day event in April in Cork, Ireland
  • Kick-off meeting bringing together representatives from the 70 partner organisations
• Websites, brochures and press releases
  • For project and general public: www.eu-egee.org
  • Information packs for the general public, press and industry

Security & Intellectual Property
• The existing EGEE grid middleware is distributed under an Open Source License developed by EU DataGrid
  • No restriction on usage (scientific or commercial) beyond acknowledgement
  • Same approach for new middleware
• Application software maintains its own licensing scheme
  • Sites must obtain appropriate licenses before installation

EGEE and Industry
• Industry as a partner - opportunity to participate in specific activities, thereby increasing know-how on Grid technologies
• Industry as a user - specific industrial sectors will be targeted as potential users of the installed Grid infrastructure, for R&D applications
• Industry as a provider - long-term maintenance of established Grid services, such as call centres, support centres and computing resource provider centres
EGEE Industry Forum
• Raise awareness of the project in industry to encourage industrial participation in the project, foster direct contact of the project partners with industry, and ensure that the project can benefit from practical experience of industrial applications

Expected Developments in 2004
• General:
  • LCG-2/EGEE-0 will be the service run in 2004 – aim to evolve incrementally
  • Goal is to run a stable service for real production applications
• Some functional improvements:
  • Extend access to MSS – tape systems, and managed disk pools
  • Distributed vs replicated replica catalogs
• Operational improvements:
  • Monitoring systems – move towards proactive problem finding, ability to take sites on/offline; application monitoring
  • Continual effort to improve reliability and robustness
  • Develop accounting and reporting
• Address integration issues:
  • With large clusters, with storage systems
  • Ensure that large clusters can be accessed via Grid
  • Issue of integrating with other applications and non-LHC experiments

A look into the Future
• We have a window of opportunity to turn Grid from research to production, as networks did a few years ago
• If we succeed, we could benefit from the adoption of Grid technology as the main computing infrastructure for science
• The next 2 years of EGEE will be critical in establishing the first generation of production Grid
• If we succeed then the potential return to international scientific communities will be enormous and possibly followed by similarly important return for commercial and industrial applications

Next major EGEE events
• Second EGEE conference in Den Haag, November 22-26, 2004
• First EU Project review on Feb 9-11, 2005
• Close of extraordinary EU Grid call in March 2005 (tbc)
  • Focus on extension of existing Grid infrastructures (Baltic countries, Latin America, Mediterranean countries, Asia etc.)
• Third project conference in early May 2005 (Athens)
• Close of 3rd EU Grid call September 2005 (tbc)
• Second EU Project review October 2005 (tbc)
• Last Project Conference in UK November 2005 (tbc)

Further information
• EGEE project – www.eu-egee.org
• EU DataGrid – www.eu-edg.org
• The HEP LCG project – www.cern.ch/lcg
• Other Grid projects – www.gridstart.org
• The Grid – www.gridcafe.org
• Questions to f.gagliardi@cern.ch or project-eu-egee-po@cern.ch

Summary
• EGEE is expected to deliver a production Grid infrastructure for scientific applications
• The project started 5 months ago
• We have a running grid service based on LCG-2
• All EGEE activities are well advanced
• Next generation middleware being designed – first prototype made available to applications
• EGEE is interested in extending further, in particular in Asia, where specific EU funds and initiatives such as TEIN(2) are becoming available
• This event is a good opportunity to explore possible new collaborations with international partners
• Many thanks for your kind invitation!