The WorldGRID transatlantic testbed: a successful example of Grid interoperability across EU and US domains
Flavia Donno (formerly of DataTAG WP4, LCG), Flavia.Donno@cern.ch
http://chep03.ucsd.edu/files/249.ppt
CHEP 2003, 24-28 March
DataTAG is a project funded by the European Union
Talk Outline
• Motivation
• Participants
• Interoperability issues
• Solutions
• Architecture
• Monitoring/Support
• Spin off
• Applications: CMS, ATLAS
• Monitoring with Nagios
• Monitoring with Ganglia
• Conclusions
• Next Steps
Speakers: F. Donno (CERN/IT and INFN), R. Gardner (University of Chicago)
Participants
CrossGrid:
• A. Garcia, M. Hardt - FZK, Germany
• J. Marco - UC, Spain
• M. David, J. Gomes - LIP, Portugal
• O. Maroney - U. Bristol, UK
DataTAG:
• F. Donno - CERN, INFN
• S. Andreozzi, R. Barbera, V. Ciaschini, S. Fantinel, A. Ghiselli, M. Mazzucato, D. Rebatto, G. Tortone, L. Vaccarossa, M. Verlato, C. Vistoli - INFN
• M. Draoli - CNR-Rome
Trillium/iVDGL:
• P. Avery, J. Rodriguez - U. Florida
• E. Deelman, N. Olomu - USC/ISI
• J. Gieraltowski, S. Gose, E. May, J. Schopf - Argonne
• Afaq, J. Annis, R. Glossum, R. Pordes, V. Sekrhi - Fermilab
• W. Deng, J. Smith, D. Yu - BNL
• A. DeSmit, A. Roy - Wisconsin
• C. Dumitrescu, I. Foster, R. Gardner - U. Chicago
• L. Grundhoefer, J. Hicks, F. Luehring, L. Meehan - U. Indiana
• S. Youssef - Boston University
• B. Moe - Milwaukee
• D. Olson - LBNL
• S. Singh - Caltech
Other project labels on the slide: GriPhyN, PPDG, iVDGL
Motivations
Goal: build a “transatlantic grid” on top of the existing European and American Grids, offering transparent access to the distributed computing infrastructure that modern data-intensive applications need.
• Basic collaboration between European and US Grid projects
• Interoperability between Grid domains for applications submitted by users from different virtual organizations
• Controlled use of shared resources subject to agreed policy
• Integrated use of heterogeneous resources from the iVDGL and DataGrid/CrossGrid testbed domains
Interoperability Issues
• Many grids with several operating systems (RH 6.2, RH 7.x, Fermi Linux, CERN Linux, …), several compilers and software components
• Different Grid architectures (VDT server/client vs. Computing Elements, Storage Elements, User Interfaces, …)
• Need to identify a minimum set of core services and define collective/optional services
  • Common protocols; same or compatible versions of the software
• Authentication and authorization mechanism: authority trusting, user authentication/authorization via LDAP VO servers
• Grid resource description/status: Globus schema vs. EDG schema vs. GLUE schema
• Several Grid data management tools
• Software distribution and configuration: rpm-based vs. PACMAN
Solutions
Issue: many grids with several operating systems (RH 6.2, RH 7.x, Fermi Linux, CERN Linux, …), several compilers and software components.
• Partition WorldGrid into subdomains with a uniform or compatible set of basic services. Resources in a subdomain advertise themselves to applications with specific targets (such as RH6.2).
• Try to keep the subdomains as large as possible.
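
The partitioning idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the host names and the tag strings other than RH6.2 are hypothetical, and the slide does not say how targets were actually published or matched.

```python
from collections import defaultdict


def partition_by_target(resources):
    """Group resources into subdomains keyed by the set of targets they advertise."""
    subdomains = defaultdict(list)
    for name, targets in resources.items():
        subdomains[frozenset(targets)].append(name)
    return dict(subdomains)


def candidates(resources, required):
    """Return the resources whose advertised targets cover everything a job requires."""
    required = set(required)
    return sorted(
        name for name, targets in resources.items() if required <= set(targets)
    )


# Hypothetical resources and the targets they advertise.
resources = {
    "ce.example-eu.org": {"RH6.2", "EDG"},
    "ce.example-us.org": {"RH7.x", "VDT"},
    "ce2.example-eu.org": {"RH6.2", "EDG"},
}

subdomains = partition_by_target(resources)
rh62_sites = candidates(resources, ["RH6.2"])
```

Keeping subdomains large corresponds to advertising as few distinguishing targets as possible, so that more resources fall under the same key.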
Solutions
Issue: different Grid architectures (VDT server/client vs. Computing Elements, Storage Elements, User Interfaces, …).
[Diagram with components: UI, VDT Client, RC, SE, RB, IS, CE, VDT Server]
Solutions
Issue: need to identify a minimum set of core services and define collective/optional services (common protocols; same or compatible versions of the software).
• Core services: Globus and Condor (GRAM, GSI, MDS, GridFTP, …)
• Collective/optional services, not installed universally: Resource Broker, User Interface and JDL, high-level data management tools (edg-replica-manager, MAGDA, Globus Replica Catalog, …)
• User Grid portals (Genius, GRAPPA, …): a variety is available, so users do not have to change their interface to the Grid
Solutions
Issue: authentication and authorization mechanism (authority trusting, user authentication/authorization via LDAP VO servers).
• DOE and EDG certificates universally accepted
• DataTAG and iVDGL VO LDAP servers trusted
• mkgridmap tool universally installed
• Local security policy: sites agreed to allow access to grid demonstration users (Kerberos, …)
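
As a rough illustration of what the authorization step produces, the sketch below turns a list of VO member certificate subjects into grid-mapfile entries (a quoted subject followed by a local account). It is not the mkgridmap implementation: the distinguished names, the account name and the function `build_gridmap` are hypothetical, and the LDAP query against the VO server is left out.

```python
def build_gridmap(vo_members, local_account):
    """Return grid-mapfile lines mapping each member DN to one local account."""
    lines = []
    for dn in sorted(set(vo_members)):
        # A grid-mapfile entry is a quoted certificate subject
        # followed by a local user name.
        lines.append(f'"{dn}" {local_account}')
    return lines


# Hypothetical DNs, standing in for what a VO LDAP server would return.
datatag_members = [
    "/C=IT/O=INFN/OU=Personal Certificate/CN=Example User",
    "/DC=org/DC=doegrids/OU=People/CN=Another User 12345",
]

gridmap = build_gridmap(datatag_members, "datatag001")
```

Mapping every member of a VO to a shared local account is one possible policy; a site could equally map each subject to its own account.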
Solutions
Issue: Grid resource description/status (Globus schema vs. EDG schema vs. GLUE schema).
• Three coexisting schemas (Globus, EDG, GLUE) installed on all resources
• Some tools (monitoring) work with all of them
• EDG middleware uses both EDG and GLUE
• US tools use none or Globus
Solutions
Issue: software distribution and configuration (rpm-based vs. PACMAN).
• Created a WorldGrid distribution (rpm/LCFGng and PACMAN)
• Effort to ensure coherency and automatic configuration
Final Architecture
[Diagram with components: UI, VDT Client, RC, SE, IS, RB, CE, VDT Server]
Monitoring and Support
• Two VO-based monitoring tools are in place: edt-monitor, based on Nagios, and the iVDGL tool, based on Ganglia (see talk from R. Gardner)
• Support infrastructure: helps site administrators during installation and configuration, and with problem fixing during normal operation
Spin-off
• GLUE schema: WorldGrid made it possible to validate the GLUE schema and encouraged EDG to deploy it
• VOMS: the authentication/authorization problems were identified and parallel research activities started, such as the one on the Virtual Organization Membership Service
• GLUE Packaging: a working group is looking for a standard solution to the packaging, distribution and configuration of a software release
• GLUE Testing: the problem of verifying an installation and validating a site before it joins the Grid has been addressed, and a working group has started
• Support: a first operation/monitoring center has started in the US, taking advantage of the monitoring tools. Other centers in the EU
• LCG-0: after the demonstrations at IST2002 and SC2002, LCG based its first middleware distribution on the WorldGrid experience
The WorldGRID transatlantic testbed, Part 2: a successful example of Grid interoperability across EU and US domains
Rob Gardner, University of Chicago, on behalf of the WG group
DataTAG is a project funded by the European Union
Part 2 Talk Outline
• Motivation
• Participants
• Interoperability issues
• Solutions
• Architecture
• Monitoring/Support
• Spin off
• Applications: CMS, ATLAS
• Monitoring with Nagios
• Monitoring with Ganglia
• Conclusions
• Next Steps
Speakers: F. Donno (CERN/IT and INFN), R. Gardner (University of Chicago)
Installing Apps on 2 Grids
• We needed a way to get applications from three experiments (VOs) set up on the execution sites
• On DataTAG resources, selected CEs were loaded with CMS or ATLAS rpms
• On iVDGL resources, we Pacmanized binaries (rpms and tarballs) of bundled applications
  • % pacman -get iVDGL:ScienceGrid
  • Packages for the 3 experiments: Atlas-kit, Atlas-ATLFAST; CMS-MOP, EDG-CMS; SDSS Astrotools
  • Binaries and run-time environments
ATLAS and CMS with GENIUS (see R. Barbera’s GENIUS talk at this conference)
[Diagram labels: WEB Browser, https+java/xml+rfb, GENIUS, Local WS, EnginFrame, Apache, EDG UI, EDG+GSI, the Grid, Grid Storage. Job flow: input data read from Grid Storage Element, ATLSIM job, ZEBRA output written to Grid Storage Element.]
JDL for an ATLAS job submitted from the GENIUS UI (see the WorldGrid poster at this conference):
    Executable = "/usr/bin/env";
    Arguments = "zsh prod.dc1_wrc 00001";
    VirtualOrganization = "datatag";
    Requirements = Member(other.GlueHostApplicationSoftwareRunTimeEnvironment, "ATLAS-3.2.1");
    Rank = other.GlueCEStateFreeCPUs;
    InputSandbox = {"prod.dc1_wrc", "rc.conf", "plot.kumac"};
    OutputSandbox = {"dc1.002000.test.00001.hlt.pythia_jet_17.log", "dc1.002000.test.00001.hlt.pythia_jet_17.his", "dc1.002000.test.00001.hlt.pythia_jet_17.err", "plot.kumac"};
    ReplicaCatalog = "ldap://dell04.cnaf.infn.it:9211/lc=ATLAS,rc=GLUE,dc=dell04,dc=cnaf,dc=infn,dc=it";
    InputData = {"LF:dc1.002000.evgen.0001.hlt.pythia_jet_17.root"};
    StdOutput = "dc1.002000.test.00001.hlt.pythia_jet_17.log";
    StdError = "dc1.002000.test.00001.hlt.pythia_jet_17.err";
    DataAccessProtocol = "file";
[Diagram labels: GENIUS, UI, SE, JDL, GLUE-aware files, input data location, WorldGrid Testbed, RB/JSS, II, Replica Catalog, TOP GIIS, GLUE-Schema based Information System, job data registration, CE, WN, ATLAS sw]
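
The Requirements and Rank expressions in the JDL above amount to a filter followed by a sort. A minimal Python sketch of that matchmaking step is given below, assuming each computing element is represented as a dictionary of published GLUE attributes. The CE identifiers, the CPU counts and the "CMS-1.0" tag are hypothetical, and this is not the Resource Broker's actual code.

```python
def match_and_rank(ces, runtime_env):
    """Keep CEs publishing the required runtime environment, best rank first."""
    eligible = [
        ce for ce in ces
        # Requirements = Member(other.GlueHostApplicationSoftwareRunTimeEnvironment, ...)
        if runtime_env in ce["GlueHostApplicationSoftwareRunTimeEnvironment"]
    ]
    # Rank = other.GlueCEStateFreeCPUs: more free CPUs means a better match.
    return sorted(eligible, key=lambda ce: ce["GlueCEStateFreeCPUs"], reverse=True)


# Hypothetical computing elements as they might appear in the information system.
ces = [
    {"id": "ce-a",
     "GlueHostApplicationSoftwareRunTimeEnvironment": ["ATLAS-3.2.1"],
     "GlueCEStateFreeCPUs": 4},
    {"id": "ce-b",
     "GlueHostApplicationSoftwareRunTimeEnvironment": ["CMS-1.0"],
     "GlueCEStateFreeCPUs": 20},
    {"id": "ce-c",
     "GlueHostApplicationSoftwareRunTimeEnvironment": ["ATLAS-3.2.1", "CMS-1.0"],
     "GlueCEStateFreeCPUs": 12},
]

ranked = match_and_rank(ces, "ATLAS-3.2.1")
```

Here `ce-b` has the most free CPUs but is excluded because it does not publish the ATLAS tag, which is the point of evaluating Requirements before Rank.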
CMS Applications
• Monte Carlo production chain on the Grid
  • CMKIN: generation of physics events with PYTHIA
  • CMSIM: simulation of the detector with GEANT3
• CMS production software installed on the WNs
• Job workflow and data management
  • CMKIN jobs are sent by the RB to a WN with CMS software and store their output at a nearby SE
  • The LFN is registered in the RC
  • CMSIM jobs are sent by the RB to a WN near the SE
  • The LFN is registered in the RC
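
One reading of this workflow, as a toy Python sketch: each step writes to a storage element and registers a logical file name (LFN) in a replica catalog, and the second step is placed near the SE that holds its input. The `ReplicaCatalog` class, the file naming scheme and the SE host name are all hypothetical; the real chain used the EDG Resource Broker and replica catalog tools.

```python
class ReplicaCatalog:
    """Toy replica catalog mapping an LFN to the storage elements that hold it."""

    def __init__(self):
        self.replicas = {}

    def register(self, lfn, se):
        self.replicas.setdefault(lfn, []).append(se)

    def locate(self, lfn):
        return self.replicas.get(lfn, [])


def run_cmkin(run_id, nearby_se, rc):
    """Generation step: store output on the nearby SE and register its LFN."""
    lfn = f"cmkin_{run_id}.ntpl"  # hypothetical file naming
    rc.register(lfn, nearby_se)
    return lfn


def run_cmsim(input_lfn, rc):
    """Simulation step: run near an SE holding the input, then register the output."""
    se = rc.locate(input_lfn)[0]  # a broker would pick a WN close to this SE
    out_lfn = input_lfn.replace("cmkin", "cmsim").replace(".ntpl", ".fz")
    rc.register(out_lfn, se)
    return out_lfn, se


rc = ReplicaCatalog()
kin = run_cmkin("0001", "se.example.org", rc)
sim, site = run_cmsim(kin, rc)
```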
ATLAS Applications
• Grappa and Genius submissions
• ATLAS detector simulations
  • Simulation of the detector response using ATLSIM (GEANT3)
  • Based on the DC1 Grid script
• ATLAS production software installed on the WNs
Grappa and ATLAS (see D. Engh, this conference)
[Diagram labels: script interface, web browser interface, https, Cactus framework, input files, Java CoG submission and monitoring, Grappa Portal Engine, Compute Elements (Resource A … Resource Z), Storage Elements (Disk/HPSS), MAGDA (replica and metadata)]
VO Monitoring
• Initial requirements:
  • Grid-level resource activity, utilization, and performance monitoring
  • VO-level resource activity and resource utilization monitoring
  • Customized views:
    • Hardware resources (clusters, sites, grids)
    • VO usages, jobs, work-types
• Design goals:
  • Scalability over large numbers of resources and networks
  • Simplicity and distributed architecture
• Two approaches:
  • iVDGL: built on the popular Ganglia resource monitoring package from UC Berkeley
  • DataTAG: built on the popular Nagios package, http://www.nagios.org/
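
The "customized views" requirement boils down to rolling the same job records up along different keys. A small Python sketch of that idea follows; the record fields and the sample values are hypothetical and do not reflect either tool's data model.

```python
from collections import Counter


def usage_views(job_records):
    """Roll job records up into per-site and per-VO job counts."""
    by_site, by_vo = Counter(), Counter()
    for record in job_records:
        by_site[record["site"]] += 1
        by_vo[record["vo"]] += 1
    return by_site, by_vo


# Hypothetical records, one per running job.
jobs = [
    {"site": "site-a", "vo": "atlas"},
    {"site": "site-a", "vo": "cms"},
    {"site": "site-b", "vo": "atlas"},
]

by_site, by_vo = usage_views(jobs)
```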
VO Ganglia
[Diagram labels: iVDGL, Site a, Site b, gmond daemons, RRDB Tool (Round Robin DB Tool), Grid Aggregation, Web php client, DataTAG, Logging & Bookkeeping, UI, RB, JSS, CE]
VO Nagios Monitoring
• Based on Nagios, a host and service monitoring engine (detailed information at http://www.nagios.org)
• Host local plug-ins collect information from the OS:
  • CPU load
  • RAM
  • disk
  • jobs
• MDS plug-ins collect aggregate information from the GRIS:
  • number of running/waiting jobs
  • number of total/free CPUs
• History graphs for all monitoring metrics
• Aggregate information and graphs per site and per Virtual Organization
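
For readers unfamiliar with Nagios plug-ins, the sketch below shows the general shape of one: a check computes a status code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and a one-line message. The function name and the thresholds are hypothetical choices for illustration, and the CPU counts are passed in directly rather than queried from MDS; this is not the edt-monitor code.

```python
# Standard Nagios plug-in return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_free_cpus(total, free, warn_fraction=0.10):
    """Return (status, message) for a site's free-CPU count.

    The warning threshold is a hypothetical default chosen for illustration.
    """
    if total <= 0 or free < 0 or free > total:
        return UNKNOWN, f"UNKNOWN - inconsistent CPU counts total={total} free={free}"
    perfdata = f"free={free} total={total}"
    if free == 0:
        return CRITICAL, f"CRITICAL - no free CPUs | {perfdata}"
    if free / total <= warn_fraction:
        return WARNING, f"WARNING - {free}/{total} CPUs free | {perfdata}"
    return OK, f"OK - {free}/{total} CPUs free | {perfdata}"


status, message = check_free_cpus(total=40, free=12)
```

A real plug-in would print the message and exit with the status code so that Nagios can record both.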
Status and Summary Map
[Screenshot: 3-level status map; grid-aggregate monitors]
VO Usage Graphs (see G. Tortone et al., this conference)
[Screenshot: site and aggregated monitors; MDS collected]
WorldGrid Next Steps
• New developments in DataTAG:
  • Test/experiment with an SRM solution for Storage Element access (multiple implementations of the protocol)
  • Test/experiment with advanced data management tools such as Globus-EDG/RLS
  • Propose alternative Grid resource discovery mechanisms based on Web services
  • Improve the monitoring tools, taking advantage of OGSA
  • Develop a WorldGrid GOC, coordinated operations centers
• Continue themes in iVDGL:
  • site-friendly installations, untouched by humans
  • multi-VO (controlled use of shared resources)
  • pursue the concept of ‘projects’
Projects as unit of access
A project consists of:
• A (typically small) list of distinguished names or VO(s)
• Email and phone contact
• A software environment expressed as a Pacman package
• Local disk space requirements
• A URL describing the project
Site manager commands
• Basic site management operations:
  • Join a project
  • Leave a project
  • Pause a project
Example Site Manager Commands
% worldgrid -info
            -join <project>
            -leave <project>
            -pause <project>
            -kill <project>
            -update <project>
            -getCA <CA>
            -setForum <URL>
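
The project concept and the join, pause and leave operations can be modelled with two small Python data classes. This is a sketch of the idea only: the class names, field names and sample values are hypothetical, and it does not describe how the `worldgrid` command was implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """A project: who may run, a contact, a software environment, disk needs, a URL."""
    name: str
    distinguished_names: list
    contact: str
    pacman_package: str
    disk_gb: int
    url: str


@dataclass
class Site:
    """Tracks the state of each project at one site."""
    projects: dict = field(default_factory=dict)  # name -> "joined" or "paused"

    def join(self, project):
        self.projects[project.name] = "joined"

    def pause(self, project):
        if project.name in self.projects:
            self.projects[project.name] = "paused"

    def leave(self, project):
        self.projects.pop(project.name, None)

    def accepts(self, project_name):
        """A site runs jobs only for projects it has joined and not paused."""
        return self.projects.get(project_name) == "joined"


demo = Project(
    name="Demo",
    distinguished_names=["/O=Example/CN=Demo User"],  # hypothetical DN
    contact="admin@example.org",
    pacman_package="Example:DemoKit",                 # hypothetical package
    disk_gb=10,
    url="http://example.org/demo",
)

site = Site()
site.join(demo)
joined = site.accepts("Demo")
site.pause(demo)
paused = site.accepts("Demo")
site.leave(demo)
```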
[Screenshot of the WorldGrid iVDGL site manager page. Visible labels: FAQ, Forum, Help, Batch jobs, History, Joined projects, Projects, Certified, Performance, Installed Software, ProjectAccess, CAs. Project names shown: Demo, ATLASDC2-higgs, ChimeraTest8, ChimeraTest9, CMS-DC2-SUSY, SDSC-scan45, WorldGrid, ScienceGrid. Status line: 10/150 G used in WorkSpace.]
Conclusions
• Lessons from WorldGrid 2002
• Grid building
  • Packaging and configuration are key
  • GLUE meta-packaging study launched, report available
  • Testing and site validation
• Interoperability
  • Configuration of a common MDS schema allowed joint use of VDT and EDG middleware installations
  • Good experience for LCG
• Integrate two very different grids
  • “Top down” EDG-style grids with high-level services
  • “Bottom up” VDT-style grids providing core services with
• Transatlantic cooperation can be fun!