U.S. ATLAS Grid Testbed Status and Plans
Kaushik De, University of Texas at Arlington
DoE/NSF Mid-term Review, NSF Headquarters, June 2002
Outline
• Testbed Phase 2 launched at the UTA workshop: http://heppc1.uta.edu/atlas/workshop_april_2002/index.html
• New focus on rapid software deployment and grid-based data production, leading to demonstrations at Supercomputing 2002
• Kaushik De has been coordinating the U.S. Testbed and SC2002 planning since mid-April 2002
• This talk is based on new and evolving plans
• Topics covered: testbed status, software distribution, application toolkit, MC production plans, monitoring, grid tools, integration, SC2002 demos
Testbed Goals
• Demonstrate success of the grid computing model for High Energy Physics
  • in data production
  • in data access
  • in data analysis
• Develop, deploy and test grid middleware and applications
  • integrate middleware with applications
  • simplify deployment - robust, rapid & scalable
  • inter-operate with other testbeds & grid organizations (iVDGL, DataTag, ...)
  • provide a single point-of-service for grid users
• Evolve into a fully functioning, scalable, distributed tiered grid
Testbed Website
• http://heppc1.uta.edu/atlas/grid-testbed/index.htm
Grid Testbed Sites
• U Michigan
• Lawrence Berkeley National Laboratory
• Boston University
• Argonne National Laboratory
• Brookhaven National Laboratory
• Indiana University
• Oklahoma University
• University of Texas at Arlington
• U.S. ATLAS testbed launched February 2001
Testbed Fabric
• 8 production gatekeepers - ANL, BNL, LBNL, BU, IU, UM, OU, UTA
  • http://heppc1.uta.edu/atlas/grid-testbed/testbed-sites.htm
• Large clusters at BNL, LBNL, IU, UTA, BU
  • BNL: RCF; LBNL: PDSF; IU/BU: prototype Tier 2
  • UTA awarded NSF MRI for acquisition of D0 & ATLAS grid facility ($950k + $400k) - thanks!
• Multiple additional R&D gatekeepers
  • gremlin@bnl - iVDGL GIIS
  • heppc5@uta - ATLAS hierarchical GIIS
  • atlas10/14@anl - EDG testing
  • heppc6@uta + gremlin@bnl - glue schema
  • heppc17/19@uta - GRAT development
  • a few sites - Grappa portal
  • bnl - VO server
  • a few sites - iVDGL testbed
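For concreteness, here is a minimal sketch (not testbed code) of how the production gatekeepers can be exercised: loop over a list of gatekeeper hostnames and run a trivial job through each one with the standard Globus 2.x globus-job-run client. The hostnames are placeholders, the sketch assumes the Globus client tools are installed and a grid proxy already exists (grid-proxy-init has been run), and it is written in modern Python purely for illustration.

# Minimal gatekeeper sanity check. Hostnames are placeholders, not the
# real testbed gatekeepers; assumes Globus 2.x clients and a valid proxy.
import subprocess

GATEKEEPERS = [
    "gatekeeper.site1.example.edu",
    "gatekeeper.site2.example.edu",
]

def check_gatekeeper(host):
    """Run /bin/hostname through the gatekeeper's default jobmanager."""
    try:
        result = subprocess.run(["globus-job-run", host, "/bin/hostname"],
                                capture_output=True, text=True, timeout=120)
    except subprocess.TimeoutExpired:
        print(f"{host}: TIMEOUT")
        return False
    ok = (result.returncode == 0)
    print(f"{host}: {'OK' if ok else 'FAILED'} {result.stdout.strip()}")
    return ok

if __name__ == "__main__":
    for gk in GATEKEEPERS:
        check_gatekeeper(gk)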
Software Distribution
• Jason Smith, Kaushik De, Saul Youssef, Wensheng Deng, Shava Smallen
• Goals:
  • easy installation by system administrators
  • uniform software versions
  • Pacman is perfect for this task
• First stage deployment - done May 2002
  • Pacman, Globus 2.0b, cernlib
  • GRAT application/production package
• Second stage deployment - June 2002
  • Magda, Grappa
  • tools for distributed production
• Third stage - July/August 2002
  • VDT 1.1.1, Chimera, ...
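As an illustration of what a Pacman-driven installation might look like, the sketch below installs a short list of packages, assuming the usual "pacman -get <cache>:<package>" form; the cache name and package names are placeholders rather than the actual testbed cache, and Python is used only to keep all examples in one language.

# Illustrative Pacman-based install, assuming the "pacman -get <cache>:<package>"
# form. The cache name ATLAS_TESTBED and the package names are placeholders.
import subprocess
import sys

CACHE = "ATLAS_TESTBED"
PACKAGES = ["Globus", "cernlib", "GRAT"]

for pkg in PACKAGES:
    cmd = ["pacman", "-get", f"{CACHE}:{pkg}"]
    print("running:", " ".join(cmd))
    if subprocess.call(cmd) != 0:
        sys.exit(f"installation of {pkg} failed")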
Available Packages
Applications Team
• Horst Severini, Kaushik De, Dan Engh, Wensheng Deng, Ed May
• Goal: enable physicists to use the testbed without worrying about the underlying middleware or ATLAS software
• Athena-Atlfast for the grid testbed
  • Tool 1: runs on any Globus-enabled node (requires transfer of a ~17 MB executable package)
  • Tool 2: runs on grid sites where the executable package has been preinstalled
  • Tool 3: runs on AFS-enabled sites (the latest version of the software is built and used)
• GRid Applications Toolkit: GRAT
  • the above plus grid tools - version 0.1 released 4/12/02
  • tested successfully on 17 U.S. ATLAS gatekeepers, a CMS gatekeeper, a D0 gatekeeper, and an EDG CE node (RH 6.x and RH 7.x), ...
  • version 0.3 of GRAT released May 8, 2002
  • next: add Magda+ and merge with Grappa
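To make the Tool 1 workflow concrete, here is a hedged sketch of staging a self-contained executable package to a Globus-enabled node and running it there. The site name, package name, scratch directory and run script are all placeholders (not GRAT's actual layout); it assumes Globus 2.x clients locally, a GridFTP server at the remote site, and an existing grid proxy.

# Sketch of a "Tool 1"-style submission: stage a self-contained package to
# a Globus-enabled node, unpack it, and run it. All names are placeholders.
import os
import subprocess

SITE = "gatekeeper.site1.example.edu"     # placeholder gatekeeper
PACKAGE = "atlfast-kit.tar.gz"            # placeholder ~17 MB package
REMOTE_DIR = "/tmp/atlfast-demo"          # placeholder scratch area

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

# 1. create a scratch directory at the remote site
run(["globus-job-run", SITE, "/bin/mkdir", "-p", REMOTE_DIR])

# 2. stage the executable package with GridFTP
run(["globus-url-copy",
     "file://" + os.path.abspath(PACKAGE),
     f"gsiftp://{SITE}{REMOTE_DIR}/{PACKAGE}"])

# 3. unpack and execute the (hypothetical) run script inside the package
run(["globus-job-run", SITE, "/bin/tar", "xzf",
     f"{REMOTE_DIR}/{PACKAGE}", "-C", REMOTE_DIR])
run(["globus-job-run", SITE, f"{REMOTE_DIR}/run-atlfast.sh"])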
GRAT v0.3
• Script-based toolkit, now being merged with the Grappa visual GUI tool (see Gardner talk)
Testbed Production
• Goals:
  • demonstrate distributed ATLAS data production, access and analysis using grid middleware and tools developed by the testbed group
• Plans:
  • Atlfast production to test middleware and tools, and to produce physics data for summer students, based on Athena-Atlfast, using VDT + Magda + Chimera and both GRAT and Grappa
    • 2 weeks to regenerate data, once a month
    • deploy new tools and middleware each cycle
    • move away from the farm paradigm to the grid model
    • very aggressive schedule - people limited!
  • DC1 production to test fabric capabilities and to produce and access data, using the old Fortran codes atlsim, atrig and atrecon (see previous talks)
    • not repeatable - hard to actively test grid software
    • increase U.S. participation - involve the grid testbed
Atlfast Production
• Application: Athena-Atlfast
  • current version 3.0.1; the next release will be 3.2.0 (official DC1 release)
• Middleware: VDT + Magda + Chimera
• Interface: GRAT, Grappa
• Sites: 8 ATLAS testbed sites, 2 CMS testbed sites, 2 D0 MC farms, EDG sites? TeraGrid sites?
• June 2002: Phase Alpha
  • demonstrate software deployment and a simple production system - done
Summer Schedule
• July 1-15: Phase 0, 10^7 events
  • Globus 2.0 beta, Athena 3.0.1, Grappa, common disk model, Magda, 5 physics processes, BNL VO manager, minimal job scheduler, GridView monitoring
• August 5-19: Phase 1, 10^8 events
  • VDT 1.1.1, hierarchical GIIS server, Athena-Atlfast 3.2.0, Grappa, Magda (data & replica management with metadata catalogue), 10 physics processes, static MDS-based job scheduler, new visualization
• September 2-16: Phase 2, 10^9 events, 1 TB storage, 40k files
  • Athena-Atlfast 3.2.0 instrumented, 20 physics processes, upgraded BNL VO manager, dynamic job scheduler, fancy monitoring
• Some planning of analysis tools is still needed
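To illustrate what a "static MDS-based job scheduler" could amount to in its simplest form, the sketch below queries each site's MDS (an LDAP server in Globus 2.x) for advertised free CPUs and picks the site with the most. This is an assumption-laden sketch, not the Phase 1 scheduler: the hostnames are placeholders, the base DN and the GLUE attribute GlueCEStateFreeCPUs would have to match whatever schema the sites actually publish, and the python-ldap module is assumed to be available.

# Minimal static MDS-based site ranking. Hostnames are placeholders; the
# base DN and GLUE attribute name are assumptions about the published schema.
import ldap

SITES = {
    "site1": "ldap://giis.site1.example.edu:2135",   # placeholder URIs
    "site2": "ldap://giis.site2.example.edu:2135",
}
BASE_DN = "mds-vo-name=local, o=grid"                # assumed MDS base DN
ATTR = "GlueCEStateFreeCPUs"                         # assumed GLUE attribute

def free_cpus(uri):
    """Return the total advertised free CPUs at one site (0 on failure)."""
    try:
        conn = ldap.initialize(uri)
        conn.simple_bind_s()                         # anonymous bind
        entries = conn.search_s(BASE_DN, ldap.SCOPE_SUBTREE,
                                f"({ATTR}=*)", [ATTR])
        return sum(int(vals[ATTR][0]) for _dn, vals in entries)
    except ldap.LDAPError:
        return 0

if __name__ == "__main__":
    ranked = sorted(SITES, key=lambda s: free_cpus(SITES[s]), reverse=True)
    print("submit next Atlfast job to:", ranked[0])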
Atlfast Production Architecture (diagram)
• Components shown: user, Grappa portal or GRAT script, Globus resource broker, MDS, compute sites running boxed Athena-Atlfast, storage elements, Magda, VDC
• JobOptions: Higgs, SUSY, QCD, Top, W/Z
Monitoring Team
• Dantong Yu, Patrick McGuigan, Craig Tull, Kaushik De, Shawn McKee, Dan Engh, Jason Smith
• Monitoring is critically important in distributed grid computing
  • check system health, debug problems
  • discover resources using static data
  • make job scheduling and resource allocation decisions using dynamic data from MDS and other monitors
• Testbed monitoring priorities
  • discover site configuration
  • discover software installation
  • application monitoring
  • grid status/operations monitoring
• Also needed
  • well-defined data for job scheduling
  • visualization
Monitoring - Back End
• Publishing MDS information
  • Glue schema - BNL & UTA
  • Pippy - Pacman information service provider
  • BNL ACAS schema
  • hierarchical GIIS server
• Non-MDS back ends
  • iPerf, NetLogger, Prophesy, Ganglia
• Archiving
  • MySQL - GridView, BNL ACAS
  • RRD - network
• Work needed
  • what to store?
  • replication of archived information
• Good progress on the back end!
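As a rough illustration of how something like Pippy can publish information into MDS, the sketch below is a minimal information provider: a program the GRIS runs that writes LDIF entries to stdout describing locally installed packages. The DN layout, object class and attribute names here are invented for the example and are not the actual Pippy or GLUE schema; a real provider must emit whatever schema the GRIS is configured to expect.

# Hypothetical Pippy-like MDS information provider: emits LDIF on stdout.
# The DN layout, object class and attribute names are invented for this
# example and are NOT the real Pippy or GLUE schema.
import socket

def installed_packages():
    """Placeholder: a real provider would ask Pacman what is installed."""
    return {"Globus": "2.0b", "cernlib": "2001", "GRAT": "0.3"}

def main():
    host = socket.getfqdn()
    for name, version in installed_packages().items():
        print(f"dn: SoftwarePackage={name}, Mds-Host-hn={host}, "
              f"Mds-Vo-name=local, o=grid")
        print("objectclass: SoftwarePackage")
        print(f"SoftwarePackageName: {name}")
        print(f"SoftwarePackageVersion: {version}")
        print()   # blank line separates LDIF entries

if __name__ == "__main__":
    main()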
Monitoring - Front End
• MDS based
  • GridView, Gridsearcher
  • converting TeraGrid and other toolkits
• Non-MDS
  • Cricket, Ganglia
• Work needed
  • urgent for SC2002! Graphs, maps, drill-down, ...
• New visualization team: Dantong Yu (evaluation of existing tools), Patrick McGuigan (Java CoG, Python), Jason Smith (PHP)
GridView 2.2
• Simple visualization tool using the Globus Toolkit
• First native Globus application for the ATLAS grid (March 2001)
• Collects information using Globus tools; archival information is stored in a MySQL server on a different machine; data are published through a web server on a third machine
• http://heppc1.uta.edu/atlas/grid-status/index.html
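The following sketch illustrates the general collect-and-archive pattern described above, not the actual GridView code: it tests each gatekeeper with the Globus 2.x authentication-only check (globusrun -a -r <host>) and records the result in MySQL. Hostnames, the table layout and the database credentials are placeholders, and the MySQLdb module plus an existing grid proxy are assumed.

# GridView-style collector sketch: probe gatekeepers, archive to MySQL.
# Hostnames, credentials and the site_status table are placeholders.
import subprocess
import time
import MySQLdb

GATEKEEPERS = ["gk.site1.example.edu", "gk.site2.example.edu"]  # placeholders

def gatekeeper_up(host):
    """Return True if an authentication-only test against the gatekeeper succeeds."""
    return subprocess.call(["globusrun", "-a", "-r", host]) == 0

def archive(rows):
    db = MySQLdb.connect(host="dbhost.example.edu", user="gridview",
                         passwd="secret", db="gridview")        # placeholders
    cur = db.cursor()
    cur.executemany(
        "INSERT INTO site_status (site, up, checked_at) VALUES (%s, %s, %s)",
        rows)
    db.commit()
    db.close()

if __name__ == "__main__":
    now = int(time.time())
    archive([(gk, gatekeeper_up(gk), now) for gk in GATEKEEPERS])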
Testbed Tools
• Many tools developed by the U.S. ATLAS testbed group during the past year
  • GridView - simple tool to monitor status of the testbed (Kaushik De, Patrick McGuigan)
  • Gripe - unified user accounts (Rob Gardner)
  • Magda - MAnager for Grid DAta (Torre Wenaus, Wensheng Deng; see Gardner & Wenaus talks)
  • Pacman - package management and distribution tool (Saul Youssef); being widely used or adopted by iVDGL VDT, Ganga, and others (see Gardner talk)
  • Grappa - web portal using active notebook technology (Shava Smallen; see Gardner talk)
  • GRAT - GRid Application Toolkit
  • Gridsearcher - MDS browser (Jennifer Schopf)
  • GridExpert - knowledge database (Mark Sosebee)
  • VO Toolkit - site AA (Rich Baker; see Baker talk)
  • ...
Integration!!
• Coordination with other grid efforts and software developers - a very difficult task!
• Project centric:
  • GriPhyN/iVDGL - Rob Gardner
  • PPDG - Torre Wenaus
  • EDG - Ed May, Jerry Gieraltowski
  • ATLAS/LHCb - Rich Baker
  • ATLAS/CMS - Kaushik De
  • ATLAS/D0 - Jae Yu
• Fabric/middleware centric:
  • AFS software installations - Alex Undrus, Shane Canon, Iwona Sakrejda
  • networking - Shawn McKee, Rob Gardner
  • virtual and real data management - Wensheng Deng, Sasha Vaniachin, Pavel Nevski, David Malon, Rob Gardner, Dan Engh, Mike Wilde, Yong Zhao, Shava Smallen
  • security/site AA/VO - Rich Baker, Dantong Yu
SC2002 Plans
• SC2002 in Maryland, mid-November
• Testbed production demo (BNL) - Kaushik De
  • monitor/interact with grid production
• ATLAS/CMS demo (FNAL/SLAC) - Kaushik De
  • preliminary discussions with CMS
  • may become an iVDGL demo (see Gardner talk)
  • ATLAS GRAT already running at CMS sites
  • GridView is monitoring two CMS sites
• Application monitoring (LBNL) - Craig Tull
  • Athena + NetLogger + Prophesy
• Virtual data demo (ANL/UC/IU) - Rob Gardner
• Common areas
  • brochure - Rob Gardner
  • posters - Craig Tull
  • common script - Jennifer Schopf
Testbed Production Demo (in BNL booth)
• ATLAS physics story
• ATLAS computing story
• Visualize production
  • monitor site status
    • static - glue, pippy
    • dynamic - jobs, CPU usage
  • monitor data status
    • Magda - visual?
    • VDC (same as IU booth)
  • monitor applications
    • Athena instrumented (same as LBNL booth)
  • event display?
• First version at the LBNL U.S. computing meeting, July 29-31
ATLAS-CMS Demo Architecture (diagram)
• SC2002 demo visualization (status, physics): MDS, Ganglia, Paw/Root
• ATLAS-CMS user job submission: Globus, Condor-G?
• Scheduling policy: ?? (Condor, Python?)
• Production jobs on the ATLAS-CMS testbed: MOP, GRAT, Grappa
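Since Condor-G appears on the slide as a candidate path for user job submission, here is a minimal sketch of handing a job to a remote gatekeeper through Condor-G; it is not the demo's actual submission machinery. The gatekeeper contact, jobmanager name, script and arguments are placeholders, and Condor-G plus a valid grid proxy on the submit host are assumed.

# Sketch of a Condor-G submission. Gatekeeper contact, jobmanager,
# executable and arguments are placeholders for illustration only.
import subprocess
import textwrap

SUBMIT_FILE = "atlfast.submit"
submit_description = textwrap.dedent("""\
    universe        = globus
    globusscheduler = gk.site1.example.edu/jobmanager-pbs
    executable      = run-atlfast.sh
    arguments       = --events 1000
    output          = atlfast.out
    error           = atlfast.err
    log             = atlfast.log
    queue
""")

with open(SUBMIT_FILE, "w") as f:
    f.write(submit_description)

subprocess.check_call(["condor_submit", SUBMIT_FILE])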
Summary
• Testbed -> SC2002
• Recently refocused testbed activities and plans
• Important grid-based production milestone this summer to test middleware, using a light-weight, layered approach to software deployment
• Testbed production should lead naturally to Supercomputing 2002 demos
• Exploring various integration and cooperation issues - no need to reinvent the wheel
• The testbed can provide a lot of resources, hardware and people, when fully grid-enabled
• In summary: hardware is not the limiting problem yet, and middleware is coming along; serious work is needed on integration, deployment and testing. The shortage of people is critical here - lab and university base funding shortages are the limiting factors!!