
The WorldGRID transatlantic testbed A successful example of Grid interoperability

The WorldGRID transatlantic testbed: a successful example of Grid interoperability across EU and US domains. Flavia Donno (formerly of DataTAG WP4, LCG), Flavia.Donno@cern.ch, http://chep03.ucsd.edu/files/249.ppt. CHEP 2003, 24-28 March.


Presentation Transcript


  1. The WorldGRID transatlantic testbed: a successful example of Grid interoperability across EU and US domains
     Flavia Donno (formerly of DataTAG WP4, LCG), Flavia.Donno@cern.ch
     http://chep03.ucsd.edu/files/249.ppt
     DataTAG is a project funded by the European Union.
     CHEP 2003, 24-28 March

  2. Talk Outline
     • Motivation • Participants • Interoperability issues • Solutions • Architecture • Monitoring/Support • Spin off • Applications • CMS • ATLAS • Monitoring with Nagios • Monitoring with Ganglia • Conclusions • Next Steps
     Speakers: F. Donno (CERN/IT and INFN), R. Gardner (University of Chicago)

  3. Participants
     CrossGrid:
     • A. Garcia, M. Hardt (FZK, Germany)
     • J. Marco (UC, Spain)
     • M. David, J. Gomes (LIP, Portugal)
     • O. Maroney (U. Bristol, UK)
     DataTAG:
     • F. Donno (CERN, INFN)
     • S. Andreozzi, R. Barbera, V. Ciaschini, S. Fantinel, A. Ghiselli, M. Mazzucato, D. Rebatto, G. Tortone, L. Vaccarossa, M. Verlato, C. Vistoli (INFN)
     • M. Draoli (CNR-Rome)
     Trillium/iVDGL (GriPhyN, PPDG, iVDGL):
     • P. Avery, J. Rodriguez (U. Florida)
     • E. Deelman, N. Olomu (USC/ISI)
     • J. Gieraltowski, S. Gose, E. May, J. Schopf (Argonne)
     • Afaq, J. Annis, R. Glossum, R. Pordes, V. Sekrhi (Fermilab)
     • W. Deng, J. Smith, D. Yu (BNL)
     • A. DeSmit, A. Roy (Wisconsin)
     • C. Dumitrescu, I. Foster, R. Gardner (U. Chicago)
     • L. Grundhoefer, J. Hicks, F. Luehring, L. Meehan (U. Indiana)
     • S. Youssef (Boston University)
     • B. Moe (Milwaukee)
     • D. Olson (LBNL)
     • S. Singh (Caltech)

  4. Motivations
     Goal: build a “transatlantic grid” on top of the existing European and American Grids, offering transparent access to the distributed computing infrastructure that modern “data-intensive” applications need.
     • Basic collaboration between European and US Grid projects
     • Interoperability between Grid domains for applications submitted by users from different virtual organizations
     • Controlled use of shared resources subject to agreed policy
     • Integrated use of heterogeneous resources from the iVDGL and DataGrid/CrossGrid testbed domains

  5. Interoperability Issues
     • Many grids with several OS (RH 6.2, RH 7.x, Fermi Linux, CERN Linux, …), several compilers and software components.
     • Different Grid architectures (VDT server/client vs. Computing Elements, Storage Elements, User Interfaces, …)
     • Need to identify a minimum set of core services and define collective/optional services. Common protocols; same or compatible versions of the software.
     • Authentication and authorization mechanism: authority trusting, user authentication/authorization via LDAP VO servers.
     • Grid resource description/status: Globus schema vs. EDG schema vs. GLUE schema
     • Several Grid data management tools
     • Software distribution and configuration: rpm based vs. PACMAN

  6. Solutions
     Issue: many grids with several OS (RH 6.2, RH 7.x, Fermi Linux, CERN Linux, …), several compilers and software components.
     • Partition WorldGrid into subdomains, each with a uniform or compatible set of basic services. Resources in a subdomain advertise themselves to applications with specific targets (such as RH6.2).
     • Try to keep the subdomains as large as possible.

  7. Solutions
     Issue: different Grid architectures (VDT server/client vs. Computing Elements, Storage Elements, User Interfaces, …)
     [Diagram: UI, VDT Client, RC, SE, RB, IS, CE, VDT Server]

  8. Solutions
     Issue: need to identify a minimum set of core services and define collective/optional services. Common protocols; same or compatible versions of the software.
     • Core services: Globus and Condor (GRAM, GSI, MDS, GridFTP, …)
     • Collective, optional services, not installed universally: Resource Broker, User Interface and JDL, high-level Data Management tools (edg-replica-manager, MAGDA, Globus Replica Catalog, …)
     • User Grid portals (Genius, GRAPPA, …): a variety available, so as not to change the user's interface to the Grid

  9. Solutions
     Issue: authentication and authorization mechanism: authority trusting, user authentication/authorization via LDAP VO servers.
     • DOE and EDG certificates universally accepted
     • DataTAG and iVDGL VO LDAP servers trusted
     • mkgridmap tool universally installed
     • Sites with local security policies (Kerberos, …) agreed to allow access to grid demonstration users
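The authorization step behind the mkgridmap bullet can be sketched in Python. This is only an illustration, not the mkgridmap tool itself: it assumes the usual Globus grid-mapfile layout (a quoted certificate DN followed by a local account), takes the VO membership from a hard-coded dictionary instead of querying the VO LDAP servers, and uses made-up DNs and account names.

```python
def gridmap_lines(vo_members, vo_accounts):
    """Build grid-mapfile lines of the form: "<certificate DN>" <local account>.

    vo_members:  dict VO name -> list of certificate DNs (hypothetical data;
                 a real tool would fetch these from the VO LDAP server).
    vo_accounts: dict VO name -> local account that VO members are mapped to.
    """
    lines = []
    seen = set()
    for vo in sorted(vo_members):
        account = vo_accounts[vo]
        for dn in vo_members[vo]:
            if dn in seen:
                # A DN listed in several VOs is mapped once, for the first VO.
                continue
            seen.add(dn)
            lines.append(f'"{dn}" {account}')
    return lines


# Hypothetical VO membership and account names, for illustration only.
members = {
    "datatag": ["/C=IT/O=INFN/CN=Example User One"],
    "ivdgl": ["/DC=org/DC=example/CN=Example User Two"],
}
accounts = {"datatag": "dtag", "ivdgl": "ivdgl"}
print("\n".join(gridmap_lines(members, accounts)))
```

Each site would regenerate such a file periodically, so that users added to a trusted VO gain access without per-site intervention.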

  10. Solutions
     Issue: Grid resource description/status: Globus schema vs. EDG schema vs. GLUE schema
     • Three coexisting schemas (Globus, EDG, GLUE) installed on all resources
     • Some tools (monitoring) work with all of them
     • EDG middleware uses both EDG and GLUE
     • US tools use none or Globus

  11. Solutions
     Issue: software distribution and configuration: rpm based vs. PACMAN
     • Created a WorldGrid distribution (rpm/LCFGng and PACMAN)
     • Effort to ensure coherency and automatic configuration

  12. Final Architecture
     [Diagram: UI, VDT Client, RC, SE, IS, RB, CE, VDT Server]

  13. Monitoring and Support
     • Two VO-based monitoring tools in place: edt-monitor, based on Nagios, and the iVDGL tool, based on Ganglia (see talk from R. Gardner)
     • Support infrastructure: supports site administrators during installation and configuration, and helps fix problems during normal operation

  14. Spin-off
     • GLUE schema: WorldGrid made it possible to prove the validity of the GLUE schema and encouraged EDG to deploy it
     • VOMS: the authentication/authorization problems were identified and parallel research activities started, such as the one on the Virtual Organization Manager Service
     • GLUE Packaging: a working group is trying to standardize the packaging, distribution and configuration of a software release
     • GLUE Testing: the problem of verifying an installation and validating a site before it joins the Grid has been addressed, and a working group has started
     • Support: a first operation/monitoring center has started in the US, taking advantage of the monitoring tools. Other centers in the EU
     • LCG-0: after the demonstrations at IST2002 and SC2002, LCG based its first middleware distribution on the WorldGrid experience

  15. The WorldGRID transatlantic testbed, Part 2: a successful example of Grid interoperability across EU and US domains
     Rob Gardner, University of Chicago, on behalf of the WG group
     DataTAG is a project funded by the European Union.

  16. Part 2 Talk Outline
     • Motivation • Participants • Interoperability issues • Solutions • Architecture • Monitoring/Support • Spin off • Applications • CMS • ATLAS • Monitoring with Nagios • Monitoring with Ganglia • Conclusions • Next Steps
     Speakers: F. Donno (CERN/IT and INFN), R. Gardner (University of Chicago)

  17. Installing Apps on 2 Grids
     • We needed a way to get applications from three experiments (VOs) set up on the execution sites
     • On DataTAG resources, selected CEs were loaded with CMS or ATLAS rpms
     • On iVDGL resources, we Pacmanized binaries (rpms and tarballs) of bundled applications
     • % pacman -get iVDGL:ScienceGrid
     • Atlas-kit, Atlas-ATLFAST
     • CMS-MOP, EDG-CMS
     • SDSS Astrotools
     • binaries and run-time environments

  18. ATLAS and CMS with GENIUS
     [Diagram: WEB Browser on a local WS, connected via https+java/xml+rfb to GENIUS (EnginFrame, Apache) on an EDG UI, connected via EDG+GSI to the Grid. The ATLSIM job reads input data from a Grid Storage Element and writes ZEBRA output to a Grid Storage Element.]
     See R. Barbera's Genius talk at this conference.

  19. See the WorldGrid poster at this conference.
     JDL:
       Executable = "/usr/bin/env";
       Arguments = "zsh prod.dc1_wrc 00001";
       VirtualOrganization = "datatag";
       Requirements = Member(other.GlueHostApplicationSoftwareRunTimeEnvironment, "ATLAS-3.2.1");
       Rank = other.GlueCEStateFreeCPUs;
       InputSandbox = {"prod.dc1_wrc", "rc.conf", "plot.kumac"};
       OutputSandbox = {"dc1.002000.test.00001.hlt.pythia_jet_17.log", "dc1.002000.test.00001.hlt.pythia_jet_17.his", "dc1.002000.test.00001.hlt.pythia_jet_17.err", "plot.kumac"};
       ReplicaCatalog = "ldap://dell04.cnaf.infn.it:9211/lc=ATLAS,rc=GLUE,dc=dell04,dc=cnaf,dc=infn,dc=it";
       InputData = {"LF:dc1.002000.evgen.0001.hlt.pythia_jet_17.root"};
       StdOutput = "dc1.002000.test.00001.hlt.pythia_jet_17.log";
       StdError = "dc1.002000.test.00001.hlt.pythia_jet_17.err";
       DataAccessProtocol = "file";
     [Diagram labels: GENIUS, UI, SE, GLUE-aware files, JDL, input data location, WorldGrid Testbed, RB/JSS, II, Replica Catalog, TOP GIIS, GLUE-Schema based Information System, Job, data registration, CE, WN, ATLAS sw]
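The effect of the Requirements and Rank attributes in the JDL above can be sketched in Python. This is a toy model, not the Resource Broker's matchmaking code: it assumes each Computing Element publishes a list of run-time environment tags and a free-CPU count (standing in for GlueHostApplicationSoftwareRunTimeEnvironment and GlueCEStateFreeCPUs), and the host names and numbers are invented.

```python
def match_and_rank(resources, required_tag):
    """Return names of CEs that advertise required_tag, most free CPUs first.

    resources: list of dicts with keys "name", "runtime_env" (list of tags)
               and "free_cpus" (int). All values here are hypothetical.
    """
    # Requirements: keep only CEs whose advertised tags contain the required one.
    eligible = [r for r in resources if required_tag in r["runtime_env"]]
    # Rank: a higher free-CPU count is a better match.
    eligible.sort(key=lambda r: r["free_cpus"], reverse=True)
    return [r["name"] for r in eligible]


# Invented Computing Elements, for illustration only.
resources = [
    {"name": "ce1.example.org", "runtime_env": ["ATLAS-3.2.1", "RH6.2"], "free_cpus": 4},
    {"name": "ce2.example.org", "runtime_env": ["CMS-1.0"], "free_cpus": 20},
    {"name": "ce3.example.org", "runtime_env": ["ATLAS-3.2.1"], "free_cpus": 12},
]
print(match_and_rank(resources, "ATLAS-3.2.1"))
```

The same mechanism is what lets a subdomain advertise a target such as RH6.2: a job only lands on resources that publish the tag it requires.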

  20. CMS Applications
     • Monte Carlo production chain on the Grid
     • CMKIN: generation of physics events with PYTHIA
     • CMSIM: simulation of the detector with GEANT3
     • CMS production software installed on the WNs
     • Job workflow and data management
     • CMKIN jobs are sent by the RB to a WN with CMS software and store their output at a nearby SE
     • Register the LFN in the RC
     • CMSIM jobs are sent by the RB to a WN near that SE
     • Register the LFN in the RC
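The two-step workflow on this slide can be sketched in Python. This is a toy model under stated assumptions: the replica catalog is an in-memory dictionary rather than the LDAP-based RC, the Resource Broker's choice is reduced to "pick a worker node whose nearby SE holds the input", and all file, host and function names are hypothetical.

```python
class ReplicaCatalog:
    """Toy in-memory stand-in for the RC: maps an LFN to the SEs holding it."""

    def __init__(self):
        self._replicas = {}

    def register(self, lfn, se):
        self._replicas.setdefault(lfn, []).append(se)

    def locate(self, lfn):
        return list(self._replicas.get(lfn, []))


def run_cmkin(run_id, nearby_se, rc):
    """Generation step: store the output at the nearby SE and register its LFN."""
    lfn = f"cmkin.{run_id}.ntpl"  # hypothetical file naming
    rc.register(lfn, nearby_se)
    return lfn


def run_cmsim(input_lfn, wn_nearby_se, rc):
    """Simulation step: run on a WN close to an SE that holds the input.

    wn_nearby_se: dict worker node -> its nearby SE (hypothetical topology).
    Returns (worker node, output LFN) and registers the output in the RC.
    """
    replicas = rc.locate(input_lfn)
    for wn, se in wn_nearby_se.items():
        if se in replicas:
            output_lfn = "cmsim." + input_lfn.split(".", 1)[1]
            rc.register(output_lfn, se)
            return wn, output_lfn
    raise LookupError(f"no worker node near a replica of {input_lfn}")


rc = ReplicaCatalog()
kin_lfn = run_cmkin("0001", "se.site-a.example", rc)
topology = {"wn.site-b.example": "se.site-b.example", "wn.site-a.example": "se.site-a.example"}
print(run_cmsim(kin_lfn, topology, rc))
```

Registering each output by logical name is what decouples the second job from the site where the first one ran.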

  21. ATLAS Applications
     • Grappa and Genius submissions
     • ATLAS detector simulations
     • Simulation of the detector response using ATLSIM (GEANT3)
     • Based on the DC1 Grid script
     • ATLAS production software installed on the WNs

  22. Grappa and ATLAS
     [Diagram: script interface and web browser interface (https, input files) feed the Grappa Portal Engine (Cactus framework), which uses Java CoG for submission and monitoring on Compute Elements (Resource A … Resource Z); Storage Elements (Disk/HPSS); MAGDA for replica and metadata.]
     See D. Engh's talk at this conference.

  23. Job Submission Animation

  24. VO Monitoring
     • Initial requirements:
     • Grid-level resource activity, utilization, and performance monitoring;
     • VO-level resource activity and resource utilization monitoring;
     • Customized views:
     • Hardware resources (clusters, sites, grids);
     • VO usages, jobs, work-types;
     • Design goals:
     • Scalability over a large number of resources and networks;
     • Simplicity and distributed architecture;
     • Two approaches:
     • iVDGL: built on the popular Ganglia resource monitoring package from UC Berkeley
     • DataTAG: built on the popular Nagios package, http://www.nagios.org/

  25. [Diagram: at each site (Site a, Site b) a set of gmond daemons feeds an RRDB Tool (Round Robin DB Tool); grid aggregation feeds the VO Ganglia Web php client. Labels: iVDGL; DataTAG; Logging & Bookkeeping; UI, RB, JSS, CE.]

  26. Site Level VO Usage and Policy

  27. VO Nagios Monitoring
     • Based on Nagios, a host and service monitoring engine (detailed information at http://www.nagios.org)
     • Host local plug-ins collect info from the OS: CPU load, RAM, disk, jobs
     • MDS plug-ins collect aggregate info from the GRIS: number of running/waiting jobs, number of total/free CPUs
     • History graphs for all monitoring metrics
     • Aggregate info/graphs per site and Virtual Organization
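A host-local plug-in of the kind listed above can be sketched in Python. This is not one of the edt-monitor plug-ins; it only follows the general Nagios plug-in convention (exit status 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN, and one line of output with optional performance data after a "|"). The thresholds and function names are assumptions chosen for illustration.

```python
import os

# Nagios plug-in exit statuses.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_load(load1, warn, crit):
    """Classify a 1-minute load average and build the plug-in output line."""
    if load1 >= crit:
        status = CRITICAL
    elif load1 >= warn:
        status = WARNING
    else:
        status = OK
    # Text before "|" is shown to operators; text after it is performance
    # data (value;warn;crit) that can feed history graphs.
    message = f"LOAD {LABELS[status]} - load1={load1:.2f} | load1={load1:.2f};{warn};{crit}"
    return status, message


def main(warn=4.0, crit=8.0):
    """Read the local load average, print one line, return the exit status."""
    try:
        load1 = os.getloadavg()[0]
    except (AttributeError, OSError):
        # os.getloadavg is unavailable on some platforms.
        print("LOAD UNKNOWN - load average not available")
        return UNKNOWN
    status, message = check_load(load1, warn, crit)
    print(message)
    return status
```

Saved as a script, the plug-in would end with `raise SystemExit(main())` so that Nagios receives the status as the process exit code.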

  28. Status and Summary Map: 3-level status map, grid-aggregate monitors

  29. VO Usage Graphs: site and aggregated monitors, MDS collected. See G. Tortone et al., this conference.

  30. WorldGrid Next Steps
     • New developments in DataTAG:
     • Test/experiment with the SRM solution for Storage Element access (multiple implementations of the protocol)
     • Test/experiment with advanced data management tools such as Globus-EDG/RLS
     • Propose alternative Grid resource discovery mechanisms based on Web services
     • Improve the monitoring tools, taking advantage of OGSA
     • Develop a WorldGrid GOC, coordinated operations centers
     • Continue themes in iVDGL:
     • site-friendly installations, untouched by humans
     • multi-VO (controlled use of shared resources)
     • pursue the concept of ‘projects’

  31. Projects as unit of access
     A project consists of:
     • A (typically small) list of distinguished names or VO(s).
     • Email and phone contact.
     • A software environment expressed as a Pacman package.
     • Local disk space requirements.
     • A URL describing the project.
     Site manager commands (basic site management operations):
     • Join a project
     • Leave a project
     • Pause a project

  32. Example Site Manager Commands
     % worldgrid -info
                 -join <project>
                 -leave <project>
                 -pause <project>
                 -kill <project>
                 -update <project>
                 -getCA <CA>
                 -setForum <URL>
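The "project as unit of access" idea from the last two slides can be sketched as a small Python data model. This is a hypothetical illustration, not the worldgrid command: the field names mirror the five items a project consists of, only the join, pause and leave operations are modelled, and the rule that a site refuses a project exceeding its free workspace is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    members: list          # distinguished names or VO names
    contact: str           # email and phone contact
    pacman_package: str    # software environment, as a Pacman package name
    disk_gb: int           # local disk space requirement
    url: str               # URL describing the project


class Site:
    """Toy site manager state: which projects are joined or paused."""

    def __init__(self, workspace_gb):
        self.workspace_gb = workspace_gb
        self.joined = {}    # project name -> Project
        self.paused = set()

    def used_gb(self):
        return sum(p.disk_gb for p in self.joined.values())

    def join(self, project):
        if project.name in self.joined:
            raise ValueError(f"already joined: {project.name}")
        # Assumed policy: refuse projects that do not fit in the workspace.
        if self.used_gb() + project.disk_gb > self.workspace_gb:
            raise ValueError(f"not enough workspace for {project.name}")
        self.joined[project.name] = project

    def pause(self, name):
        if name not in self.joined:
            raise KeyError(name)
        self.paused.add(name)

    def leave(self, name):
        del self.joined[name]  # raises KeyError if the project was not joined
        self.paused.discard(name)


# Example values are invented; the project name is borrowed from the talk.
site = Site(workspace_gb=150)
site.join(Project("ATLASDC2-higgs", ["atlas"], "admin@example.org",
                  "iVDGL:ScienceGrid", 10, "http://example.org/higgs"))
print(f"{site.used_gb()}/{site.workspace_gb} G used in WorkSpace")
```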

  33. [Screenshot of the WorldGrid iVDGL site manager page. Menu: FAQ, Forum, Help, Batch jobs, History. Joined projects: Demo, ATLASDC2-higgs, ChimeraTest8. Projects: Demo, CMS-DC2-SUSY, ChimeraTest8, ChimeraTest9, ATLASDC2-higgs, SDSC-scan45. Other labels: Certified, Performance, Installed Software (WorldGrid, ScienceGrid), ProjectAccess, CAs. Status: 10/150 G used in WorkSpace.]

  34. Conclusions
     • Lessons from WorldGrid 2002
     • Grid building
     • Packaging and configuration are key
     • GLUE meta-packaging study launched, report available
     • Testing and site validation
     • Interoperability
     • Configuration of a common MDS schema allowed joint use of VDT and EDG middleware installations
     • Good experience for LCG
     • Integrate two very different grids
     • “Top down” EDG-style grids with high-level services
     • “Bottom up” VDT-style grids providing core services with
     • Transatlantic cooperation can be fun!
