CEOS, March 2005, Argentina
GridAssist, making the Grid invisible
Ruud Grim, Mark ter Linden, Ivan Petiteville
Contents
• History
• Technical Details
• Operational Experiences
• Future Plans
A user-friendly service to support instrument calibration/validation & data (re-)processing.
GridAssist, March 2005 CEOS Argentina
History
• 1997-2000 EC FP4 OASE project
• Collaboration environments for the simulation and data processing of Earth Observation data
• Chains of applications in a distributed environment
• The CORBA technology used provided only limited functionality and was not properly secure (it required opening ports in the firewall)
[Figure: chain of applications (Atmosphere Model, OMI Simulator, Ground Data Processor, Total Ozone Column, UV Prediction) with site labels Dutch Space, Dutch Space, DLR-DFD, KNMI, FMI]
GREASE Project 2002-2003 (ESA)
• Same concept, with a new chassis (Grid) and powered by a new engine (Globus Toolkit 2.x)
• The environment should be easy to use and should hide the underlying Grid technology from the scientific user
• Workflow- and service-oriented approach: more than simple chains of applications
[Figure: workflow connecting Service A, Service B, Service C, Service D, Service E and Service F]
Concept
• User-friendly client tools run locally on the users' workstations for constructing workflows and monitoring jobs
• Centralized controller executes the workflows on the Grid
• Controller implemented as a Web Service for easy and standardized access (even through firewalls)
[Diagram labels: Workstations with client tools, Controller, Grid resources; links: SOAP, LAN, Grid]
Use cases within ESA
• Instrument validation
• Mission simulation
• Archive reprocessing
• Instrument test data generation (via simulation)
• Production-on-Demand
• Concurrent design
Satisfying different functional needs:
• Collaboration
• Computing power
• Controlled provision & access of services
Grid implementations @ESA
• Instrument validation (#2)
• Mission simulation (#3)
• Archive reprocessing
• Instrument test data generation (#1)
• Production-on-Demand
• Concurrent design
Examples (#)
• OMI test data generation
• ENVISAT validation
• GAIA mission analysis & Grid-on-Demand Concurrent Design Facility
UC#1: OMI (NASA AURA) (launched summer 2004)
• Main products: Ozone columns, profiles
• 6-7 GB / day (Level 0 data)
[Figure labels: Electronic Assembly, Optical Assembly]
UC#1: Scanning the Earth daily
• Continue global total ozone trends
• Nominal 13 x 24 km spatial resolution, or 13 x 13 km for detecting and tracking urban-scale pollution sources
UC#1: Test data generation
• Fall 2003: Generation of one month of simulated OMI data for Ground Segment Verification (starting at the beginning of 2004)
• 230,000 simulation runs of 2 minutes each (about 7,667 CPU hours in total)
• Between 50 and 80 CPUs were used in a 6-week period
• 32 Gb telemetry data produced and transferred to NASA
[Figure: processing chain on the Grid. Boxes: Existing GOME Data, Level 2 Algorithm, OMI Instr. Simulator, Raw Data Generator, Level 0 Processor, Level 1b Processor, NASA GS. Data labels: spectrum, CCD output, telemetry, Level 0, Level 1]
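
The totals on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are invented here, and the "ideal" wall-clock figures assume perfectly parallel runs with no queueing, restarts or transfer time, which the slide does not claim.

```python
def cpu_hours(runs: int, minutes_per_run: float) -> float:
    """Total CPU time in hours for a batch of independent runs."""
    return runs * minutes_per_run / 60.0


def ideal_wall_clock_days(total_cpu_hours: float, cpus: int) -> float:
    """Lower bound on elapsed days if every CPU were busy the whole time."""
    return total_cpu_hours / cpus / 24.0


if __name__ == "__main__":
    # Figures taken from the slide: 230,000 runs of 2 minutes each.
    total = cpu_hours(runs=230_000, minutes_per_run=2)
    print(f"total CPU time: {total:.0f} hours")
    # The slide reports between 50 and 80 CPUs in use.
    for cpus in (50, 80):
        days = ideal_wall_clock_days(total, cpus)
        print(f"{cpus} CPUs: at least {days:.1f} days")
```

Under these idealized assumptions the campaign needs only a few days of elapsed time, so the reported 6-week period leaves ample room for shared CPUs and data transfer.
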
UC#2: Instrument Validation
What is required?
• Additional validated data
  • In-situ measurements
    • Aircraft
    • Balloon
    • Ground (lidar)
  • Other space instrumentation
• Quality Assurance
  • Common data sets
  • Algorithms
  • Tools, converters, visualization tools
• Good communication & collaboration
UC#2: ECV Prototype (ESA THE VOICE project)
Demonstrate possibilities of e-Collaboration for cal / val
• Authorization & Authentication
• Communication (agenda, documentation)
• Access to
  • Meta data catalogue
  • Data store
  • Applications & tools
    • Under configuration control
    • In development
• Workflow Management (GridAssist)
• Publish & Subscribe
UC#2: Validation Workflow
• Access to data stores
  • GOME Level 2
  • LIDAR (at IPSL or NILU)
• On-demand processing
• Publish/Subscribe to notify users
UC#2: THE VOICE Workflow Environment
[Screenshot labels: Workflow submission; Connecting; Click-and-Drop; Data stores; Applications; Access to Data stores; Drag-and-Drop]
UC#2: VOICE collaboration crossing boundaries
[Map labels: NILU, ESTEC & Dutch Space, KNMI, RIVM, Univ Bremen, BIRA/IASB, NASA, IPSL, Genève, Tor Vergata, ESRIN, ESAC VillaFranca]
UC#3: Gaia mission analysis
Science objectives
• Map 10^9 stars in our Galaxy
  • Astrometry
  • Photometry
  • Spectra
• Studies
  • Structure & kinematics of Galaxy
  • Stellar populations
  • Origin, formation & evolution of Galaxy
  • Stellar astrophysics
  • Cosmology
  • Extra-solar planetary science
  • Fundamental physics
• Core Processing (Global Iterative Solution) using subset of 10^8 stars with
  • Raw data
  • Calibrated data
  • Attitude data
  • Science data
• 500 TB over 5 yr
• 10^20 flop CPU
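
To give a feel for the scale of the last two bullets, the sketch below turns them into average rates. It is a back-of-the-envelope calculation, not part of the presentation: the function names are invented here, decimal units are assumed (1 TB = 10^6 MB), and the one-year window used for the 10^20 flop is purely hypothetical, since the slide does not say over what period the computation is spread.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def average_data_rate_mb_s(total_tb: float, years: float) -> float:
    """Mean data rate in MB/s, using decimal units (1 TB = 1e6 MB)."""
    return total_tb * 1e6 / (years * SECONDS_PER_YEAR)


def sustained_gflops(total_flop: float, years: float) -> float:
    """Sustained Gflop/s needed to finish total_flop within the given time."""
    return total_flop / (years * SECONDS_PER_YEAR) / 1e9


if __name__ == "__main__":
    # From the slide: 500 TB over 5 years.
    print(f"average data rate: {average_data_rate_mb_s(500, 5):.2f} MB/s")
    # From the slide: 1e20 flop; the 1-year window is an assumption.
    print(f"sustained rate for 1 year: {sustained_gflops(1e20, 1):.0f} Gflop/s")
```
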
UC#3: Gaia Processing
Foreseen architecture (May 2004)
UC#3: GAIA collaboration
• Binary star simulation with the GASS (Gaia Simulator)
  • 5-year period, submitted as 5 jobs covering 1 year each
  • Executed on 23 CPUs in 8 institutes of 5 countries
  • Total of 3.8 million CPU seconds used
  • 16.5 Gb telemetry data produced and transferred to CESCA
• >1,100 jobs submitted in 6 months
• Data extraction from GDAAS database (Oracle)
  • Very flexible using Java as query language
[Map labels, in extraction order: Lund Astrometry ESTEC Dutch Space Copenhague Cambridge Photometry Leiden Photometry RVS Heidelberg Quick Looks Bruxelles ABS Meudon RVS Turino Minor Planets Geneve Variable Stars Trieste RVS CNES? Nice Fundamental Algos ESRIN ESAC Barcelona Core Tasks Database]
Benefits of GridAssist
• Easy and secure access to applications, data and resources
• Satisfies both collaboration & HPC needs
• Unattended execution of large and/or complex jobs using workflows
• Low failure rate (>95% of jobs are successfully completed)
• Supports logging at three levels
  • Application, GridAssist, Globus
• Little or no modification needed to existing applications; new applications can be added fast
• The Grid environment can easily be extended with more resources
• Easy installation
Lessons Learned
• The GridAssist Workflow Tool proved to be very user-friendly and intuitive; users can work with it almost immediately
• It meets both High Performance Computing and collaboration needs within ESA; users are very enthusiastic
• Interface problems between applications can be detected early in the development process
• The approach of using GridAssist to run applications on the Grid is usable in many fields that have similar scientific data processing needs (Earth Observation, Astronomy, …?)
Future plans
• Continue development
  • Improve robustness
  • Improved workflow features, user management
  • Improved access to data stores
  • Interoperability (e.g. gLite)
• Project operations support
  • Mission analysis
  • Instrument calibration / validation
  • Application development
  • Level 3 & 4 product processing
  • Archive re-processing
More info?
• Web site: http://www.gridassist.com/
• Contact persons:
  • Ivan Petiteville (ESA ESRIN), e-mail: Ivan.Petiteville@esa.int, telephone: +39-06.941.80.567
  • Ruud Grim (GridAssist Project Manager), e-mail: r.grim@dutchspace.nl, telephone: +31-71-5.245.416
  • Mark ter Linden (GridAssist Developer), e-mail: m.ter.linden@dutchspace.nl, telephone: +31-71-5.245.557
• Photos: courtesy ESA, NASA, KNMI and Internet
Develop locally, compute and collaborate globally on the Grid.
Questions?
The Grid
• Around 1998 the Grid concept was introduced: sharing resources in Virtual Organizations
• Demand-driven access to computing power
• Increased utilization of idle capacity
• Greater sharing of computational results
Grid Environment
• Grid environment based on Globus Toolkit 2.x using:
  • Globus Resource Allocation and Management (GRAM)
    • Remote job submission and control
    • Interface to local job management systems (PBS, LSF, Condor)
  • GridFTP
    • High performance, secure, reliable data transfer
  • Grid Security Infrastructure (GSI)
    • Single sign-on and secure communication
    • Based on Public Key encryption and X.509 certificates
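
As a rough illustration of how the three components fit together, the sketch below builds a GRAM job description (an RSL string) and lists the Globus Toolkit 2.x command-line steps a job would typically involve, without running any of them. It is not taken from GridAssist, which according to the architecture slide goes through the Java CoG-kit. The helper names, host names and paths are hypothetical, and the RSL quoting rule (an embedded double quote is doubled) is an assumption to verify against the Globus documentation.

```python
from typing import List, Sequence


def rsl_quote(value: str) -> str:
    """Quote a value for RSL; assumes an embedded double quote is doubled."""
    return '"' + value.replace('"', '""') + '"'


def build_rsl(executable: str, arguments: Sequence[str] = (), **attrs: str) -> str:
    """Build a single-job RSL string such as &(executable="/bin/date")."""
    parts = [f"(executable={rsl_quote(executable)})"]
    if arguments:
        quoted = " ".join(rsl_quote(arg) for arg in arguments)
        parts.append(f"(arguments={quoted})")
    for name, value in attrs.items():
        parts.append(f"({name}={rsl_quote(value)})")
    return "&" + "".join(parts)


def plan_commands(gatekeeper: str, rsl: str,
                  source_url: str, dest_url: str) -> List[List[str]]:
    """Return the command lines for one job; nothing is executed here."""
    return [
        ["grid-proxy-init"],                        # GSI: create a proxy (single sign-on)
        ["globusrun", "-r", gatekeeper, rsl],       # GRAM: submit the job
        ["globus-url-copy", source_url, dest_url],  # GridFTP: fetch the result
    ]


if __name__ == "__main__":
    # Hypothetical host names and paths, for illustration only.
    rsl = build_rsl("/opt/apps/simulator.sh", ["--orbit", "42"], stdout="run.log")
    for cmd in plan_commands("grid01.example.org", rsl,
                             "gsiftp://grid01.example.org/data/out.dat",
                             "file:///tmp/out.dat"):
        print(" ".join(cmd))
```
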
Features
• Workflow Tool
  • User interface implemented in Java (Windows, Linux, Unix, Mac)
  • To add / modify / remove applications, resources and properties
  • To create, start and monitor workflows
  • Embed additional (new) services, e.g. browsing in database, logging at 3 levels, converters, notification services, visualization
  • Embed batch programs, not (yet) interactive
  • No requirements on language (Java, Fortran, C, IDL, …)
  • User can configure runtime parameters
• Central registry
  • Storage of information about applications and resources
  • Configuration control
Architecture
• Implementation in Java, cross-platform (tested on Windows, Linux and Mac)
[Diagram labels. Tiers: User Workstation, Controller, Grid Resource. Components: GridAssist Workflow Tool, Apache AXIS (shown twice), Apache Jakarta Tomcat Web Server, GridAssist Workflow Engine, JDBC Connector, MySQL Database, Java CoG-kit, Globus Toolkit, Data Processing Application. Links: SOAP, Globus specific protocols, LAN, Grid]
Workflow Tool: Maintaining the registry
[Screenshot labels: Resources; Services; Resource or service details]
Workflow Tool: Creating the workflow
[Screenshot labels: Workflow submission; Data stores; Connecting; Click-and-Drop; Applications; Drag-and-Drop]
Workflow Tool: Status Monitoring
[Screenshot labels: Availability & Usage; Submitted workflows & status overview]
Hiding Grid technology
• Intuitive GUI preferred
• DAG structured
• Dynamic execution
• Fault tolerance built-in
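
The three technical properties named on this slide (DAG structure, dynamic execution, built-in fault tolerance) can be sketched in a few lines. This is a minimal illustration, not the GridAssist workflow engine: it runs tasks locally and sequentially, decides at run time which tasks are ready, and retries a failing task a fixed number of times. The function name, the retry policy and the example graph are all assumptions.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List


def run_workflow(tasks: Dict[str, Callable[[], None]],
                 depends_on: Dict[str, Iterable[str]],
                 max_attempts: int = 3) -> List[str]:
    """Run tasks in dependency order, retrying each failing task."""
    graph = {name: set(depends_on.get(name, ())) for name in tasks}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    completed: List[str] = []
    while sorter.is_active():
        # The set of ready tasks is determined at run time.
        for name in sorter.get_ready():
            for attempt in range(1, max_attempts + 1):
                try:
                    tasks[name]()
                    break
                except Exception:
                    if attempt == max_attempts:
                        raise  # give up after the last attempt
            completed.append(name)
            sorter.done(name)  # unlocks tasks that depend on this one
    return completed


if __name__ == "__main__":
    # Hypothetical three-step chain; each step just prints its name.
    steps = {name: (lambda n=name: print("running", n))
             for name in ("simulate", "process", "publish")}
    order = run_workflow(steps, {"process": ["simulate"], "publish": ["process"]})
    print(order)
```

A real engine would dispatch ready tasks to Grid resources in parallel rather than run them in a loop; the retry loop stands in for whatever recovery strategy the engine applies.
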
Data Processing Applications
• Batch programs, not interactive.
• No requirements on language (Java, Fortran, C, IDL, …).
• Applications do not have to be modified.
• Applications can be configured by the user using runtime parameters.
• A simple wrapper shell script can be written to handle the input, output and the runtime parameters.
• The application itself can be stored on the Grid resource but also on a storage node (in this case only the wrapper script needs to be present on the Grid resource).
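
The wrapper idea can be sketched as follows. The slide describes a shell script; the version below shows the same pattern in Python. It stages the input into a scratch directory, passes the runtime parameters to the unmodified batch program, and copies the output back. The function name, the `--name=value` option convention and the file names are hypothetical, not GridAssist conventions.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Dict, List


def run_wrapped(command: List[str], input_file: Path, output_file: Path,
                params: Dict[str, str]) -> int:
    """Stage input, run an unmodified batch program, collect its output."""
    with tempfile.TemporaryDirectory() as workdir:
        staged_in = Path(workdir) / "input.dat"
        staged_out = Path(workdir) / "output.dat"
        shutil.copy(input_file, staged_in)
        # Runtime parameters become --name=value options (assumed convention).
        options = [f"--{key}={value}" for key, value in params.items()]
        result = subprocess.run(command + options + [str(staged_in), str(staged_out)])
        # Copy the result back only if the program succeeded and wrote output.
        if result.returncode == 0 and staged_out.exists():
            shutil.copy(staged_out, output_file)
        return result.returncode


if __name__ == "__main__":
    # Stand-in "application": upper-cases its input file into its output file.
    app = [sys.executable, "-c",
           "import sys; open(sys.argv[-1], 'w').write(open(sys.argv[-2]).read().upper())"]
    scratch = Path(tempfile.mkdtemp())
    (scratch / "in.txt").write_text("ozone column")
    code = run_wrapped(app, scratch / "in.txt", scratch / "out.txt", {"scale": "2"})
    print(code, (scratch / "out.txt").read_text())
```

Because the wrapper only touches files and command-line options, the application itself stays unchanged, which is the point the slide makes.
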