Grid Applications & Grid Services. C. Loomis (LAL-Orsay) EMBRACE-3DEM (Madrid) 23 February 2007. Contents. Introduction EGEE project history Usage and users Grid Application Families Grid Software & Services Summary. Evolution. larger grid. more apps.
Grid Applications & Grid Services • C. Loomis (LAL-Orsay) • EMBRACE-3DEM (Madrid) • 23 February 2007
Contents • Introduction • EGEE project history • Usage and users • Grid Application Families • Grid Software & Services • Summary
Evolution (larger grid, more apps.) • EGEE: Enabling Grids for E-sciencE • Two-year project funded by European Commission. • Provides computing infrastructure for e-science. • Evolution of Project (2001–now): • European DataGrid: R&D • EGEE: Re-engineering & Infrastructure • EGEE-II: Infrastructure & Re-engineering • EGEE-III: Same focus, in preparation • Evolution of Grid Users: • Focus: Grid technology → Scientific results • Goal: Grid technology → Grid as a tool • Experience: IT experts → IT “minimalists”
EGEE/LCG Production Service > 175 sites > 30 kCPU > 13 PB http://goc03.grid-support.ac.uk/googlemaps/lcg.html
Grid Virtual Organizations • Routine and large-scale use of EGEE infrastructure. • Virtual Organizations: • 200+ visible on the grid • 100+ registered with EGEE • App. Deploy. Plan (https://edms.cern.ch/document/722131/2) http://www3.egee.cesga.es/gridsite/accounting/CESGA/tree_vo.php
Usage History • [Charts: usage by Virtual Organization, Dec. ’05 and Nov. ’06] • Sharing and federation of resources make sense! http://www3.egee.cesga.es/gridsite/accounting/CESGA/tree_vo.php
Scientific Domains • Astrophysics • Planck, MAGIC • Computational Chemistry • Earth Science • Hydrology, Pollution, Climate, Geophysics, … • Fusion • High-Energy Physics • LHC, Tevatron, HERA, … • Life Sciences • Medical Images, Bioinformatics, Drug Discovery • Related Projects • Finance, Digital Libraries, … • And more…
Grid Benefits • Science is a balance between competition and cooperation. Grid appeals to both aspects. • Better use of resources: • Sharing: faster turnaround with lower investment. • Federation: reach previously unattainable scales • Better science: • Faster results: get published first! • Higher quality: better statistics, more varied data. • Collaboration • Platform to bring different people with different skills together. • Mechanism to publish, reuse, and combine previous data.
Job Submission • [Diagram] Components: User Interface, Resource Broker, Information System, Replica Catalog • Site 1 and Site 2: each with a Computing Element and a Storage Element • Sites publish status to the Information System • Labeled steps: 1. submit • 2. query • 3. query • 4. submit • 5. retrieve • 6. retrieve
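The matchmaking part of the job submission diagram can be sketched in a few lines. This is only an illustration under stated assumptions: it reads the diagram as the Resource Broker querying published site status and then picking a Computing Element, and it ranks candidates by free CPUs, which is an assumed policy and not the ranking the real broker uses. The class `ComputingElement`, the function `match`, and the site names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputingElement:
    """Status record a site might publish (fields are assumptions)."""
    name: str
    free_cpus: int
    supported_vos: frozenset


def match(ces, vo, cpus_needed=1):
    """Pick the CE with the most free CPUs that accepts the given VO.

    Returns None when no published CE satisfies the request.
    """
    candidates = [
        ce for ce in ces
        if vo in ce.supported_vos and ce.free_cpus >= cpus_needed
    ]
    return max(candidates, key=lambda ce: ce.free_cpus, default=None)


# Hypothetical status published by two sites.
published = [
    ComputingElement("site1-ce", 12, frozenset({"biomed", "atlas"})),
    ComputingElement("site2-ce", 40, frozenset({"atlas"})),
]

chosen = match(published, vo="biomed")
print(chosen.name if chosen else "no matching resource")
```

With these records a "biomed" job can only go to the first site, while an "atlas" job would go to the second because it has more free CPUs.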
Comp. Serv. (Gatekeepers) • LCG-CE (production) • Modified GT2 gatekeeper with VOMS support. • Not ported to SL4/VDT 1.3; supported until gLite CE is certified. • gLite CE (under test) • Only direct interface is Condor-G. • Possible to run pre-WS GRAM too, but this is neither certified nor supported. • May be possible to run WS GRAM. • CREAM CE (development) • Native, proprietary web-service interface. • Request to provide WS-GRAM interface in addition.
Comp. Serv. (Resource Brokers) • LCG-RB (production) • Being phased out in favor of the WMS. • gLite WMS (test) • Will talk to old and new CE interfaces. • Provides higher-level services: DAG, parameterized jobs, etc. • Version deployed on production service, but not stable. • Next version has been extensively tested and is much more robust. • GridWay (http://www.gridway.org/) • Lighter weight, lower latencies than EGEE brokers. • Standard DRMAA interface. • Federation of EGEE and non-EGEE resources.
Comp. Serv. (Others) • Workflow • TAVERNA, MOTEUR have been used. • Need better web-service support for these tools. • Others • GANGA/DIANE (ARDA): job management framework • JJS (CC-IN2P3): Java job submission
Storage Services • Strategy: Follow SRM (Storage Resource Manager). • Implementations provide SRMv1+ functionality. • SRMv2+ will provide better access control possibilities. • DPM (CERN) • Disk Pool Manager: only supports disk storage. • dCache (DESY) • Supports tape and other backends. • Very flexible, but complicated to install and configure. • Storage Resource Broker (SRB) • Used by many disciplines for data and metadata management. • Won’t be integrated, but can probably be used on EGEE infrastructure.
Data Management Services • LCG File Catalog (LFC) • Actually a general file catalog included as part of gLite. • Currently has limited access control features. • File Transfer Service (FTS) • Reliable file transfer service (i.e. batch system for data). • Used only by LHC VOs now; could be used by others. • Hydra • Key server for data encryption. • Client in gLite; server (?). • gLite IO, Fireman (deprecated) • Provide better ACL management and consistency. • Functionality to be incorporated into standard services.
Transparent Data Access • ELFI • Uses FUSE kernel module to expose “grid file system”. • Limited to systems where FUSE is available (easier with SL4). • Needs to allow users to mount the file system. • Parrot • Intercepts system calls to provide grid data access. • Resides completely in user space.
Metadata Services • AMGA • Lightweight metadata catalog developed in ARDA. • Allows distribution and federation of servers. • Clients in gLite; server (?). • OGSA-DAI • Generic, secured interface to databases. • Works but has scalability and performance problems. • Integration not likely in the near future. • GDSE (Grid Data Source Engine) • Developed by INFN. • Generic interface to data sources (DBs included).
Information Systems • Strategy • Keep BDII-based information system for the medium term. • Need something faster and more scalable for the longer term. • GLUE schema will evolve with needs of apps. and projects. • Version 2 should be completely (?) service-based. • BDII (production) • LDAP-based information system. • Contains all published information. • Used for service discovery and service status. • R-GMA • Producer-consumer deployment model. • Specialized uses: accounting and some application monitoring.
Security • Security infrastructure is mature; no significant changes in the short to medium term. • Certificate Authority services • VOMS • LCAS/LCMAPS • Proxy renewal • Significant work to integrate these with all services! • Potential new services: • Hydra: Data encryption key server • G-PBOX: distribution of VO-specific policies
Accounting • Two competing/cooperating systems for collecting and presenting accounting information. • APEL • Works only for computing-related usage. • Has (partial) usage information since early 2005. • Uses R-GMA for collecting the accounting information. • DGAS • General framework for collecting and metering usage. • Probably included in next release of gLite. • Developers have agreed to use same accounting sensors for collecting information.
Important Core Changes • Move from SL3 to SL4 • Change from 2.4 to 2.6-series kernel. • Provides better support for new hardware. • Better performance on multi-CPU systems. • Minor version change of GCC compiler. • VDT (Virtual Data Toolkit) • Change from VDT 1.2 to 1.3 • Compatibility with latest Globus Toolkit™. • Should have web service interfaces available. • Decision made to stop integration of new developments until August 2007 to refactor code and rationalize dependencies.
Service Integration Policies • EGEE-II users need third-party products: • “Core” only provides low-level services. • To better meet the high-level service needs of applications. • Allow applications choice of several high-level services. • RESPECT: Recommended External Software Packages for EGEE Communities • Registry for useful, external software for EGEE scientists. • Final stages of approval within EGEE. • List will appear on the NA4 web site. • Developers must provide support and binary packages.
Application Families • Simulation • Bulk Processing • Responsive Apps. • Workflow • Parallel Jobs • Legacy Applications
Simulation • Examples • LHC Monte Carlo simulation • Fusion • WISDOM • Characteristics • Jobs are CPU-intensive • Large number of independent jobs • Run by few (expert) users • Small input; large output • Needs • Batch-system services • Minimal data management for storage of results • [Images: ATLAS, ITER]
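The "large number of independent jobs" pattern can be illustrated by splitting one simulation request into self-contained job descriptions. This is a minimal sketch, not how any EGEE application actually does it: the function `make_jobs`, the dictionary fields, and the seed scheme are all hypothetical. The point is only that each job carries a small input (a seed and an event range) and shares nothing with the others.

```python
def make_jobs(total_events, events_per_job, base_seed=1000):
    """Split a simulation request into independent job descriptions.

    Each job gets its own random seed and a disjoint event range,
    so jobs can run in any order on any site.
    """
    if events_per_job <= 0:
        raise ValueError("events_per_job must be positive")
    jobs = []
    first = 0
    while first < total_events:
        n = min(events_per_job, total_events - first)
        index = len(jobs)
        jobs.append({
            "job_id": index,
            "seed": base_seed + index,  # distinct seed per job
            "first_event": first,
            "n_events": n,
        })
        first += n
    return jobs


jobs = make_jobs(total_events=2500, events_per_job=1000)
for job in jobs:
    print(job)
```

The last job simply takes whatever remains, so the event counts always add up to the requested total.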
Virtual Screening Process • Docking: • Predict how small molecules bind to receptor with known 3D structure. • Projects: • Proteins@Home • Rosetta@home • Docking@Home • AFRICA@home • malariacontrol.net • WISDOM
WISDOM • WISDOM (http://wisdom.healthgrid.org/) • Developing new drugs for neglected and emerging diseases with a particular focus on malaria. • Reduced R&D costs for neglected diseases • Accelerated R&D for emerging diseases • Three large calculations: • WISDOM-I (Summer 2005) • Avian Flu (Spring 2006) • WISDOM-II (Autumn 2006) • WISDOM calculations used FlexX from BioSolveIT (3-6k free, floating licenses) in addition to Autodock.
Benefits from Grid • Computing Resources • Provided a large number of CPUs that would not have been available had they needed to be purchased. • Storage Resources • Ability to hook storage for results to grid. • Ability to make permanent backups of the data. • Tools • Job management tools to handle millions of jobs. • Tools for collecting and storing results from calculations. • Data management tools for collating the data and making it available to others. • Collaboration • Platform engendered new human collaboration and provides an environment in which to share and analyze data efficiently.
Continued Analysis • WISDOM-I: Molecular dynamics • 5k best plasmepsin docking compounds are being reanalyzed using molecular dynamics codes • Need more “classic” parallel resources, either MPI on EGEE or use of supercomputers through DEISA • Avian Flu: • Top 5% of compounds will be refined through other methods • From top 5% of compounds: • structure clustering will be done for wet-lab assay • 50+ compounds will be assayed experimentally (GRC, Academia Sinica, Taiwan) • WISDOM-II: • Post-docking filtering and analysis.
Bulk Processing • Examples • HEP processing of raw data, analysis • Earth observation data processing • Characteristics • Widely-distributed input data • Significant amount of input and output data • Needs • Job management tools (workload management) • Meta-data services • More sophisticated data management
Responsive Apps. (I) • Examples • Prototyping new applications • Monitoring grid operations • Direct interactivity • Characteristics • Small amounts of input and output data • Not CPU-intensive • Short response time (few minutes) • Needs • Configuration which allows “immediate” execution (QoS) • Services must treat jobs with minimum latency
Responsive Apps. (II) • Grid as a backend infrastructure: • gPTM3D: interactive analysis of medical images • GPS@: bioinformatics via web portal • DILIGENT: digital libraries • Volcano sonification • Characteristics • Rapid response: a human waiting for the result! • Many small but CPU-intensive tasks • User is not aware of “grid”! • Needs • Interfacing (data & computing) with non-grid application or portal • User and rights management between front-end and grid
gPTM3D • PTM3D: • Interactive analysis of 3D data for surgery planning and volumetric analysis. • Requires “guiding” from physician to find initial contours, work around noisy data, … • Needs unplanned, interactive access to significant computational resources.
Results • Speed-up gives response times acceptable to doctors. • Grid overhead doesn’t dominate for short calculations. • Requires application modifications to use with grid.
Workflow • Examples • “Bronze Standard”: image registration • Flood prediction • Characteristics • Use of grid and non-grid services • Complex set of algorithms for the analysis • Complex dependencies between individual tasks • Needs • Tools for managing the workflow itself • Standard interfaces for services (i.e. web services)
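The "complex dependencies between individual tasks" that a workflow tool has to manage can be illustrated with a small dependency graph. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to group tasks into waves that could run concurrently. It is not how TAVERNA or MOTEUR work internally; the task names and the helper `execution_waves` are hypothetical, loosely echoing the image registration example.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Group tasks into waves; tasks in one wave have no mutual dependencies.

    `dependencies` maps each task to the set of tasks it must wait for.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical workflow: two registrations depend on one fetch step,
# and the comparison needs both registrations.
workflow = {
    "register_a": {"fetch_images"},
    "register_b": {"fetch_images"},
    "compare": {"register_a", "register_b"},
}

for number, wave in enumerate(execution_waves(workflow), start=1):
    print(number, wave)
```

Each wave corresponds to a batch of jobs that could be submitted together; the next wave waits until the previous one has finished.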
Parallel Jobs • Examples • Climate modeling • Earthquake analysis • Computational chemistry • Characteristics • Many interdependent, communicating tasks • Many CPUs needed simultaneously • Use of MPI libraries • Needs • Configuration of resources for flexible use of MPI • Pre-installation of optimized MPI libraries
Legacy Applications • Examples • Commercial or closed source binaries • Geocluster: geophysical analysis software • FlexX: molecular docking software • Matlab, Mathematica, … • Characteristics • Licenses: control access to software on the grid • No recompilation → no direct use of grid APIs! • Needs • License server and grid deployment model • Transparent access to data on the grid
Summary & Conclusions • Observe routine and large-scale use of the EGEE infrastructure by a large, diverse set of user communities. • Present: • Grid is a collaborative platform: 10+ domains, 200+ VOs. • Grid enables sharing of resources and data for better science. • Future: • Responsiveness: Applications requiring quality-of-service. • Workflow: Use of different infrastructures, instruments. • Bigger role for third-party software for applications on grid.