Learn about the EU-funded EGEE Project, the largest Grid infrastructure initiative in Europe, offering extensive CPU resources, data storage, and user support. Explore federations, activities, services, security measures, and Virtual Organizations in e-science.
EGEE Enabling Grids for e-science Gabriel Amorós Grid and e-Science Course, 15-18 July 2008, Valencia
The EGEE Project Funded by the European Commission, the Enabling Grids for E-sciencE (EGEE) project is the biggest Grid infrastructure project of the EU. The third two-year phase of the project started on 1 May 2008 and includes: • A Grid infrastructure spanning about 250 sites across 50 countries • An infrastructure of more than 68,000 CPUs available to users 24 hours a day, 7 days a week • More than 20 Petabytes (20 million Gigabytes) of storage • Sustained and regular workloads of 30K jobs/day, reaching up to 150K jobs/day • Massive data transfers > 1.5 GB/s
Federations • Asia Pacific (Australia, Japan, Korea, Taiwan) • Benelux (Belgium, the Netherlands) • Central Europe (Austria, Croatia, Czech Republic, Hungary, Poland, Slovakia, Slovenia) • France • Germany/Switzerland • Italy • Nordic countries (Finland, Sweden, Norway) • South East Europe (Bulgaria, Cyprus, Greece, Israel, Romania, Serbia, Turkey) • South West Europe (Portugal, Spain) • Russia • United Kingdom/Ireland • USA
Activities-1 • The Networking Activities are divided into five different areas: • NA1: Management of the Consortium. • NA2: Information Dissemination and Outreach, including tasks such as running the external website, organising conferences and managing the distribution of publications. • NA3: User Training and Induction, including tasks such as organising on-site training and producing training course material. • NA4: User Community Support and Expansion. This includes tasks such as supporting applications and identifying new users. • NA5: Policy and International Cooperation, including tasks such as liaising with parties interested in the EGEE project on an international level.
Activities-2 • SA1: European Grid Support, Operation and Management, including tasks such as grid monitoring and control, resource and user support, and grid security. • SA2: Networking support, including tasks such as policies and service level agreements. • SA3: Integration, testing and certification. The goal of the SA3 activity is to manage the process of building deployable and documented middleware distributions, starting by integrating middleware packages and components from a variety of sources. • JRA1: Middleware Re-engineering, including tasks such as re-engineering existing middleware, integrating middleware, testing and validation.
Related projects • AssessGrid • BalticGrid • CYCLOPS • DEGREE • EDGeS • Edutain@Grid • EUChinaGRID • EUMEDGRID • EU-IndiaGrid • g-Eclipse • Health-e-Child • ICEAGE • Interactive European Grid • ISSeG • KnowARC • UK National Grid Service • OMII-Europe • OMII-UK • SEE-GRID-2
EGEE platforms • Production Service: This is the largest grid infrastructure provided by EGEE. It runs the latest stable version of the gLite middleware. This is the preferred service for large-scale, production use of the grid. • Preproduction Service: This service consists of a limited number of sites running a preview of the next release of the gLite software. It should be used to test existing applications against the new release, and to understand new gLite services. • GILDA t-infrastructure: This is a grid which runs the entire gLite software stack in parallel to the production service. It is used to demonstrate EGEE grid technology and to support beginner and expert training courses. The project also provides application porting support.
Support and Security • Security & Policy, including: • Authentication (use of GSI, X.509 certificates generally issued by national certification authorities) • Agreed network of trust (International Grid Trust Federation (IGTF), EUGridPMA, APGridPMA, TAGPMA) • All EGEE sites will usually trust all IGTF root CAs • User Support, including: • A single access point for support; a portal with well-structured information and updated documentation; knowledgeable experts; correct, complete and responsive support; tools to help resolve problems.
WLCG/EGEE infrastructure Site-Federation-Platform… Sites are organised into geographical regions. Production: main production Grid. Normal grid operations. …and VO: The users of a Grid infrastructure are divided into Virtual Organisations (VOs), abstract entities grouping users, institutions and resources in the same administrative domain. Your certificate includes information about it.
Communities-1 • Astrophysics: • The community currently includes 17 institutes, all contributing with applications ported to EGEE. The most relevant among them are Planck, MAGIC, SWIFT/MERCURY, and LOFAR. All of them share problems of computation involving large-scale data acquisition, simulation, data storage, and data retrieval that the grid helps to resolve. Planck and MAGIC have been in EGEE since 2004. The ESA Planck satellite, to be launched in 2008, will map the sky using microwaves, with an unprecedented combination of sky and frequency coverage, accuracy, stability and sensitivity. The MAGIC telescope, on the island of La Palma in the Canary Islands, is an imaging atmospheric Cherenkov telescope that has been in operation since late 2004.
Communities-2 • Biomedical • The life sciences are a major application area for the EGEE project and have been used to guide the implementation of the infrastructure from the start. With more than 30 applications deployed and being ported, the domain had more than 200,000 jobs executed per month in 2007. • The medical imaging domain works on a number of related systems, many of them in the compute-intensive area of image co-registration. This enables techniques such as "virtual biopsies" for cancer diagnosis that avoid invasive surgical procedures. • The bioinformatics domain studies genes, proteins, and all components of living organisms. These include enabling system biology on grid, oncology study at the molecular level, genome wide association studies of human complex diseases, binding of protein and DNA in the cell nucleus, complete genome comparison, as well as portals or web services that enable grid access for users in areas such as protein sequence or genome level analysis. • The drug discovery domain uses the EGEE grid infrastructure to accelerate the search for candidate drug molecules against neglected diseases. The WISDOM initiative has been successfully deployed against diseases such as malaria and avian flu, testing many thousands of potential drugs in short periods of time.
Communities-3 • Comp. Chemistry • The Computational Chemistry and Gaussian virtual organizations were established to allow access to chemical software packages on the EGEE infrastructure. At present both freely available packages (GAMESS, COLUMBUS, DL_POLY, RWAVEP or ABCtraj) and commercial software packages, including Gaussian, Turbomole and Wien2K, are used by chemists to better understand molecular properties, to model chemical reactions or to design new materials. The availability of chemical software is also beneficial for other communities as a source of molecular data parameters for their simulations.
Communities-4 • Earth Science • EGEE supports two related communities in the area of Earth Sciences: Research (ESR) and Geosciences (EGEODE). The applications of the ESR domain cover various disciplines. The most numerous applications are in seismology, with the re-analysis of the whole GEOSCOPE data set, the determination of earthquake characteristics a few hours after the data arrival, and numerical simulations of wave propagation in complex 3D geological models. Several applications are based on atmospheric modelling, such as long-range air pollution transport over Europe, the regional El Niño climate, and ozone in polar regions. In hydrology, applications include flood forecasting and the calculation of sea water intrusion into coastal aquifers, both related to risk management. Other applications are related to geomorphology, meteorology, or planetology. • Geocluster, an industrial seismic processing solution, is the first industrial application successfully running on the EGEE Grid production infrastructure. Operated by the French company CGGVeritas, Geocluster enables researchers to process seismic data and to explore the composition of the Earth's layers.
Communities-5 • Fusion • Commercial exploitation of fusion energy still needs to solve several outstanding problems, some of which require a strong computing capacity. The International Thermonuclear Experimental Reactor (ITER), a joint international research and development project, aims to demonstrate the scientific and technical feasibility of fusion power and could potentially produce 500 MW of power by 2016. The exploitation of ITER requires a modelling capability that is at the limit of the present state of the art. Therefore, computing grids and high performance computers are basic tools for fusion research. • Several applications are already running on the EGEE grid, namely Massive Ray Tracing, Global Kinetic Transport and Stellarator optimisation, and they have helped to open new avenues of research. A number of new applications devoted to ITER simulation will be ported to the grid in close collaboration with the EUFORIA project. Data management in large international experiments and the development of complex workflows are the activities that will complement grid computing.
Communities-6 • High Energy • The High-Energy Physics (HEP) community is one of the pilot application domains in EGEE, and is the largest user of its grid infrastructure. • At present, the major users are the four experiments (ALICE, ATLAS, CMS and LHCb) of the Large Hadron Collider (LHC), which will begin with the first proton-proton collisions in autumn 2008 and achieve the design luminosity in 2010. • These four experiments are using grid resources for large-scale production work involving more than 150,000 jobs/day on the EGEE infrastructure and in collaboration with its sister projects OSG in the USA and NDGF in the Nordic countries. • Other major HEP experiments, such as BaBar, CDF, D0, H1 and ZEUS, have also adopted grid technologies and use the EGEE infrastructure for routine physics data processing.
How to participate • Become a user • EGEE welcomes your participation as an end-user, manager of a Virtual Organization (VO), or as a resource provider. • To participate as an end-user, you must have a certificate from an accepted certificate authority (see EUGridPMA) and you must join an existing virtual organization. To obtain a certificate, contact the appropriate certificate authority. Search the list of existing virtual organizations on the CIC Portal to find one appropriate for you. Each entry contains contact information. The enrollment process usually takes a couple of days for verifications. For advice on porting an application to the grid, contact the Grid Application Support Centre. • To start a new Virtual Organization (VO), you as the VO manager must fill in the VO Registration Form. You must have a grid certificate to access and submit this form. The approval process takes a minimum of three business days. After that, the minimum VO services must be deployed; EGEE can help deploy the core services. For advice on creating a new VO, please contact the VO Support Team. • To provide resources, please see the EGEE Operations Group's information for resource centers. Note that each virtual organization is expected to integrate computational resources into the EGEE infrastructure generally equivalent to its average consumption, although this can be relaxed in exceptional circumstances.
If something fails? • Direct User Support • To use the grid effectively, users need quick responses to questions and high-quality documentation. EGEE provides a team that specializes in responding to application-related problems through the GGUS ticketing system. This team also catalogs and reviews existing documentation and writes new documentation as necessary. • There are local helpdesk support systems for local problems and a Global Grid User Support system (GGUS). • Before creating a ticket, consider whether your problem is local or global. If you are not sure, create the ticket locally.
gLite • EGEE Middleware: gLite • Where do the jobs run? • Where are the Grid data stored? • Who manages it? • How are the files transferred? • Where is the information? • How are the job requirements specified? • Is there any log?
gLite • WLCG/EGEE infrastructure: • EGEE: focuses on maintaining and developing the gLite middleware and on operating a large computing infrastructure. • Worldwide LHC Computing Grid Project (WLCG): computing infrastructure for the simulation, processing and analysis of the data of the Large Hadron Collider (LHC) experiments. • The WLCG/EGEE infrastructure uses gLite middleware: • gLite - Lightweight Middleware for Grid Computing http://cern.ch/glite/ • It is the result of several projects: DataGrid, DataTag, Globus, GriPhyN, iVDGL, EGEE and LCG. • Other Grid infrastructures: • Open Science Grid (OSG) uses VDT middleware • NorduGrid uses ARC middleware
WLCG/EGEE-I • Security: • The Grid Security Infrastructure (GSI) enables secure authentication and communication over an open network. GSI is based on public key encryption, X.509 certificates, and the Secure Sockets Layer (SSL) communication protocol, with extensions for single sign-on and delegation. The user certificate is used to generate and sign a temporary certificate, called a proxy certificate (or simply a proxy), which is used for the actual authentication to Grid services and does not need a password. • User Interface: • Any machine where users have a personal account and where their user certificate is installed. It provides CLI tools to perform some basic Grid operations: list resources, submit and cancel jobs, show job status and resource status, show logging and bookkeeping information, retrieve job output, manage files. Besides the CLI (Python API), there are a C++ API and a Java API.
WLCG/EGEE-II • Computing Element (CE): • Collection of Worker Nodes (WN). • Grid Gate (GG), the interface to the cluster (e.g. LCG CE, gLite CE). • Local Resource Management System (LRMS): OpenPBS/PBSPro, LSF, Maui/Torque, BQS and Condor. • Different queues are considered different CEs: • CEId = <gg_hostname>:<port>/<gg_type>-<LRMS_type>-<batch_queue_name> • Storage Element (SE): • provides uniform access to data storage resources. • supports different data access protocols and interfaces: • GSIFTP (a GSI-secure FTP): transfers files. • RFIO: remote file access. • managed by a Storage Resource Manager (SRM). The capabilities depend on the SRM version. • Disk Pool Manager (DPM): small SEs with disk-based storage only
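The CEId format on this slide can be taken apart mechanically. The following is a minimal sketch, not part of gLite: the `CEId` tuple, the `parse_ce_id` function and the example hostnames are hypothetical, and it assumes that the Grid Gate type and the LRMS type contain no hyphens (the queue name may).

```python
import re
from typing import NamedTuple


class CEId(NamedTuple):
    """Fields of <gg_hostname>:<port>/<gg_type>-<LRMS_type>-<batch_queue_name>."""
    hostname: str
    port: int
    gg_type: str
    lrms_type: str
    queue: str


# Assumption: gg_type and LRMS_type contain no '-'; the queue name may.
_CE_ID = re.compile(
    r"^(?P<hostname>[^:/\s]+):(?P<port>\d+)/"
    r"(?P<gg_type>[^-\s]+)-(?P<lrms_type>[^-\s]+)-(?P<queue>\S+)$"
)


def parse_ce_id(ce_id: str) -> CEId:
    """Split a CE identifier string into its five components."""
    match = _CE_ID.match(ce_id)
    if match is None:
        raise ValueError(f"not a CE identifier: {ce_id!r}")
    return CEId(
        hostname=match["hostname"],
        port=int(match["port"]),
        gg_type=match["gg_type"],
        lrms_type=match["lrms_type"],
        queue=match["queue"],
    )


# Two queues behind the same Grid Gate are two different CEs.
long_ce = parse_ce_id("ce01.example.org:2119/jobmanager-lcgpbs-long")
short_ce = parse_ce_id("ce01.example.org:2119/jobmanager-lcgpbs-short")
print(long_ce.hostname == short_ce.hostname, long_ce == short_ce)
```

The two example identifiers share a host and port but differ in the queue, so they compare as distinct CEs, which is the point made in the "Different queues" bullet.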
WLCG/EGEE-III • CASTOR: to manage large-scale MSS, with front-end disks and back-end tape storage. • dCache is targeted at both MSS and large-scale disk array storage systems. • Classic SEs: do not have an SRM interface, provide a simple disk-based storage model
WLCG/EGEE-IV • Information Service (IS): • provides information about the WLCG/EGEE Grid resources and their status. • resources are discovered via the IS. • used for monitoring and accounting purposes. • GLUE Schema: common conceptual data model. • Two IS: • Globus Monitoring and Discovery Service (MDS): for resource discovery and to publish the resource status. • Relational Grid Monitoring Architecture (R-GMA): for accounting, monitoring and publication of user-level information.
WLCG/EGEE-V • MDS • LDAP-based • Computing and storage resources at a site run a piece of software called an Information Provider, which generates the relevant information about the resource (both static, like the type of SE, and dynamic, like the used space in an SE) • This information is published via an LDAP server called a Grid Resource Information Server (GRIS), which normally runs on the resource itself. In WLCG/EGEE, the Berkeley Database Information Index (BDII) is used to store and publish data from the local GRISes. • A BDII is also used as the top level of the hierarchy: • read from a specific set of sites, which effectively defines a view of the overall Grid resources • obtain information about the sites in the Grid from the Grid Operations Centre (GOC) database, where site managers can insert the contact address of their BDII as well as other useful information about the site.
WLCG/EGEE-VI • R-GMA • presents information as though it were in a global distributed relational database • supports more advanced query operations than MDS • makes it much easier to modify the schema • three major components: • The Producers, which provide the information, register themselves with the Registry and describe the type and structure of the information they provide. • The Consumers, which request the information, can query the Registry to find out what type of information is available and locate Producers that provide such information. Once this information is known, the Consumer can contact the Producer directly to obtain the relevant data. • The Registry, which mediates the communication between the Producers and the Consumers. • The Producers and Consumers are processes (servlets) running on a server machine at each site (sometimes known as a MON box). • uses a subset of SQL as a query language. Information is presented as a single virtual database containing a set of virtual tables.
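The Producer/Registry/Consumer interaction described on this slide can be sketched in a few lines. This is an illustrative in-memory toy, not the R-GMA API: the class and method names, the "JobStatus" table and its rows are all hypothetical, and a Python predicate stands in for the SQL subset.

```python
class Producer:
    """Holds rows for one virtual table and answers queries directly."""

    def __init__(self, name, table, rows):
        self.name = name
        self.table = table
        self.rows = list(rows)

    def query(self, predicate):
        return [row for row in self.rows if predicate(row)]


class Registry:
    """Knows which Producers publish which table; it stores no data itself."""

    def __init__(self):
        self._producers_by_table = {}

    def register(self, producer):
        self._producers_by_table.setdefault(producer.table, []).append(producer)

    def tables(self):
        return sorted(self._producers_by_table)

    def producers_for(self, table):
        return list(self._producers_by_table.get(table, []))


class Consumer:
    """Looks Producers up in the Registry, then contacts them directly."""

    def __init__(self, registry):
        self.registry = registry

    def select(self, table, predicate=lambda row: True):
        results = []
        for producer in self.registry.producers_for(table):
            results.extend(producer.query(predicate))
        return results


# Two sites publish rows of the same (hypothetical) virtual table.
registry = Registry()
registry.register(Producer("site-a", "JobStatus", [{"job": 1, "state": "Done"}]))
registry.register(Producer("site-b", "JobStatus", [{"job": 2, "state": "Running"}]))

consumer = Consumer(registry)
done_jobs = consumer.select("JobStatus", lambda row: row["state"] == "Done")
print(done_jobs)
```

The Consumer sees one "JobStatus" table even though its rows live at two sites, which is the "single virtual database" idea; the Registry only mediates the lookup.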
WLCG/EGEE-VII • Data Management (DM): • Files and replicas at different sites • Grid files cannot be modified after creation, only read and deleted. • The Grid Unique Identifier (GUID) and the Logical File Name (LFN) are location independent: guid:93bd772a-b282-4332-a0c5-c79e99fc2e9c and lfn:<human_readable_string>. • The Storage URL (SURL) says where a replica is stored and the Transport URL (TURL) says how it can be accessed: srm://<SE_hostname>/<path> and <protocol>://<SE_hostname>:<port>/<path>. • The mappings between LFNs, GUIDs and SURLs are kept in a service called a File Catalogue; in WLCG/EGEE this is the LCG File Catalogue (LFC).
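The LFN, GUID and SURL mappings kept by a File Catalogue can be illustrated with two dictionaries. This is a simplified sketch and not the LFC interface: the `FileCatalogue` class, its methods, the VO path and the SE hostnames are hypothetical, and it allows only one LFN per GUID.

```python
import uuid


class FileCatalogue:
    """Toy catalogue: LFN -> GUID -> list of replica SURLs."""

    def __init__(self):
        self._guid_by_lfn = {}
        self._surls_by_guid = {}

    def register(self, lfn, surl):
        """Register a new Grid file with its first replica; return its GUID."""
        if lfn in self._guid_by_lfn:
            raise ValueError(f"LFN already registered: {lfn}")
        guid = f"guid:{uuid.uuid4()}"
        self._guid_by_lfn[lfn] = guid
        self._surls_by_guid[guid] = [surl]
        return guid

    def add_replica(self, guid, surl):
        """Record another physical copy of an existing Grid file."""
        self._surls_by_guid[guid].append(surl)

    def guid_of(self, lfn):
        return self._guid_by_lfn[lfn]

    def replicas(self, lfn):
        """Resolve a location-independent LFN to all replica locations."""
        return list(self._surls_by_guid[self.guid_of(lfn)])


# Hypothetical file with replicas on two Storage Elements.
catalogue = FileCatalogue()
lfn = "lfn:/grid/myvo/data/run1.dat"
guid = catalogue.register(lfn, "srm://se1.example.org/data/run1.dat")
catalogue.add_replica(guid, "srm://se2.example.org/store/run1.dat")
print(guid, catalogue.replicas(lfn))
```

The user only ever handles the LFN; the GUID and the SURLs of the replicas are looked up through the catalogue, so replicas can be added at other sites without the name changing.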
WLCG/EGEE-VIII • Workload Management System (WMS): • Accepts user jobs, assigns them to the most appropriate Computing Element, records their status and retrieves their output. • WMS services run in the Resource Broker. • The Job Description Language (JDL) specifies the features of the job. • The choice of CE to which the job is sent is made in a process called match-making. • The Logging and Bookkeeping service (LB) tracks jobs managed by the WMS. It collects events from many WMS components and records the status and history of the job.
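Match-making can be pictured as "filter the CEs that satisfy the job's requirements, then rank the survivors". The sketch below only illustrates that idea and is not how the WMS is implemented: the `CEInfo` and `JobRequest` attributes, the ranking by free slots and the CE names are assumptions made for the example, whereas a real job expresses its requirements in JDL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CEInfo:
    """Hypothetical subset of what the Information Service might publish."""
    ce_id: str
    vos: frozenset          # VOs allowed to run on this CE
    max_wall_minutes: int   # longest job the queue accepts
    free_slots: int         # currently free job slots


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical job requirements (a real job states these in JDL)."""
    vo: str
    wall_minutes: int


def match_make(job, ces):
    """Return the CEs that satisfy the job, best candidate first."""
    eligible = [
        ce for ce in ces
        if job.vo in ce.vos and ce.max_wall_minutes >= job.wall_minutes
    ]
    # Assumed ranking: prefer the CE with the most free slots.
    return sorted(eligible, key=lambda ce: ce.free_slots, reverse=True)


ces = [
    CEInfo("ce-a/short", frozenset({"atlas"}), 60, 10),
    CEInfo("ce-b/long", frozenset({"atlas", "biomed"}), 2880, 3),
    CEInfo("ce-c/long", frozenset({"biomed"}), 2880, 50),
    CEInfo("ce-d/medium", frozenset({"atlas"}), 1440, 7),
]
ranked = match_make(JobRequest(vo="atlas", wall_minutes=600), ces)
print([ce.ce_id for ce in ranked])
```

Here the short queue is rejected because the job is too long and one CE is rejected because it does not support the VO; the remaining two are ordered by free slots.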
Summary • Where do the jobs run? CE • Where are the Grid data stored? SE • Who manages it? WMS • How are the files transferred? DM • Where is the information? IS • How are the job requirements specified? JDL • Is there any log? LB
References: • www.eu-egee.org • GLITE 3.1 USER GUIDE (April 2008)
Testing the middleware • The SA3 Mission • The goal of the activity is to manage the process of building deployable and documented middleware distributions. • This starts by integrating middleware packages from a variety of sources, extends to the configuration management and certification and testing. The testing ensures that released software is as reliable, robust, scalable and as usable as possible. • To achieve its goal, SA3 operates a large distributed testbed. • SA3 works closely with the TCG on selecting new candidate packages for integration into the software stack. • The connection with the SA1 pre-production, operation and deployment groups is essential to ensure that the distribution and configuration tools meet their requirements. • SA3 coordinates the work towards interoperation with other grid infrastructures based on different middleware. • SA3 partners work actively on increasing the number of platforms on which EGEE's middleware is usable. • In close collaboration with SA1 and NA4, some members of SA3, who have more of a developer's profile, work on troubleshooting complex problems and providing "glue" components.