BaBar and DØ Experiment Reports DOE Review of PPDG January 28-29, 2003 Lee Lueking Fermilab Computing Division D0 liaison to PPDG Lee Lueking, PPDG Review
Introduction
DØ
• DØ's PPDG effort is concentrating on:
  • Data distribution on the Grid (SAM).
  • Job submission on the Grid (JIM with Condor-G and Globus).
• People involved:
  • Igor Terekhov (FNAL; JIM Team Lead)
  • Gabriele Garzoglio (FNAL)
  • Andrew Baranovski (FNAL)
  • Parag Mhashilkar & Vijay Murthi (via contract with UTA CSE)
  • Lee Lueking (FNAL; D0 Liaison to PPDG)
• Interactions with other Grid efforts that are part of DØ:
  • GridPP (UK), GridKA (DE), NIKHEF (NL), CCIN2P3 (FR)
• Working very closely with the Condor team to achieve:
  • Grid job and resource matchmaking service
  • Other robustness and usability features
BaBar
• BaBar's PPDG effort is concentrating on:
  • Data distribution on the Grid (SRB, BdbServer++).
  • Job submission on the Grid (EDG, LCG).
• People involved:
  • Tim Adye (RAL)
  • Andy Hanushevsky (SLAC)
  • Adil Hasan (SLAC)
  • Wilko Kroeger (SLAC)
• Interactions with other Grid efforts that are part of BaBar:
  • GridPP (UK), EDG (Europe, through Dominique Boutigny), GridKA, Italian Grid groups, etc.
• BaBar Grid applications are being designed to be data-format neutral.
  • BaBar's new computing model should have little impact on the applications.
Overview of BaBar and DØ Data Handling
• Both experiments have extensive distributed computing and data handling systems.
• Significant amounts of data are processed at remote sites in the US and Europe.
[Figures: BaBar deployment map; BaBar analysis jobs (SLAC), Apr '02 to Mar '03; BaBar database growth (TB), Jan '02 to Dec '02; DØ SAM deployment map; DØ integrated data consumed, Mar '02 to Mar '03; DØ integrated files consumed, Mar '02 to Mar '03. Map legend: Tier A centers, Monte Carlo regional center, analysis site. Values shown on the plots: 140k jobs, 1.2 PB, 730 TB, 4.0 M files.]
BaBar Bulk Data Distribution – SRB
• The Storage Resource Broker (SRB) from SDSC is being used to test data distribution from Tier A to Tier A, with a view to production this summer.
• So far there have been 2 successful demos, at Super Computing 2001 (SLAC->SLAC) and 2002 (SLAC->ccin2p3).
• SRB V2 (released Feb 2003) is being tested; its new features include bulk registration in the RDBMS and parallel-stream file replication.
• Currently incorporating the newly designed BaBar metadata tables into SRB's RDBMS tables, and looking to improve file replication performance (experimenting with the number of streams, etc.).
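The parallel-stream idea mentioned above can be illustrated with a minimal sketch. This is not SRB's implementation or API: it only shows the general technique of splitting a file into contiguous byte ranges and copying each range with its own worker. The names `split_ranges` and `parallel_copy`, and the use of a local file-to-file copy in place of a network transfer, are assumptions made for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, streams):
    """Split [0, size) into at most `streams` contiguous (offset, length) ranges."""
    if size == 0:
        return []
    streams = max(1, min(streams, size))
    base, extra = divmod(size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        # The first `extra` ranges take one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


def _copy_range(src, dst, offset, length, chunk=1 << 20):
    # Each worker opens its own handles so that seeks do not interfere.
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        remaining = length
        while remaining > 0:
            data = fin.read(min(chunk, remaining))
            if not data:
                break
            fout.write(data)
            remaining -= len(data)


def parallel_copy(src, dst, streams=4):
    """Copy src to dst using several concurrent byte-range 'streams'."""
    size = os.path.getsize(src)
    # Pre-size the destination so every worker can seek to its offset.
    with open(dst, "wb") as fout:
        fout.truncate(size)
    with ThreadPoolExecutor(max_workers=streams) as pool:
        futures = [pool.submit(_copy_range, src, dst, off, length)
                   for off, length in split_ranges(size, streams)]
        for future in futures:
            future.result()  # re-raise any worker error
```

In a real wide-area transfer the benefit comes from several network connections sharing the link; the local copy here only demonstrates the range bookkeeping.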
BaBar User-driven Data Distribution: BdbServer++
• Addresses the use case in which a user wants to copy a collection of sparse events with little space overhead (mainly Tier A to Tier C).
• BdbServer++ is essentially a set of scripts that:
  • Submit a job to the Grid to make a deep copy of the sparse collection (i.e. copy the objects for the events of interest only).
  • Then copy the files back to the user's institution through the Grid (can use globus-url-copy).
• Poster at CHEP2003.
• Deep copy through the Grid has been tested using EDG and pure Globus. A test of extracting data using globus-url-copy (pure Globus request) has just been completed.
• To do: integrate with BaBar bookkeeping; robustness and reliability tests; production-level scripts for submission and copying.
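The deep-copy step can be pictured with a toy in-memory model. This is only a sketch under stated assumptions, not BdbServer++ or the BaBar event store: the event store is modelled as a dict from event id to that event's objects, the sparse collection as a list of event ids that merely reference the store, and `deep_copy` is a hypothetical helper name.

```python
import copy


def deep_copy(event_store, sparse_collection):
    """Return a standalone store holding only the events of interest.

    event_store: dict mapping event id -> dict of that event's objects
    sparse_collection: iterable of event ids (references into event_store)
    """
    selected = {}
    for event_id in sparse_collection:
        # Copy the objects themselves, not just the reference, so the
        # result no longer depends on the original store.
        selected[event_id] = copy.deepcopy(event_store[event_id])
    return selected


# A large store and a small selection: only three events are copied.
store = {n: {"tracks": [n, n + 1], "clusters": [n * 2]} for n in range(1000)}
skim = [3, 500, 999]
local = deep_copy(store, skim)
```

The resulting `local` store is what would then be shipped back to the user's site, which is why the space overhead stays proportional to the selection rather than to the full dataset.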
BaBar Job Submission on the Grid
• Many production-like activities could take advantage of compute resources at more than one site.
  • Analysis Production: ccin2p3 (France), UK, SLAC, using EDG installations.
  • Simulation Production: Ferrara (Italy) Grid Group, Ohio, using EDG and VDT installations.
• Also very useful for data distribution (BdbServer++): ccin2p3 (France), SLAC.
[Figure: Proposed BaBar Grid Architecture]
BaBar Job Submission on the Grid (continued)
• There was a CHEP 2003 talk and poster, a Grid demo was set up in the UK (running BaBar jobs on the UK Grid), and Simulation Production and data distribution tests have been run on the Grid.
• Plan: test the new EDG2/LCG installations and increase the number of users as releases stabilize.
• BbgUtils.pl: a Perl script that allows easier client-side installation of Globus and CAs (currently works for Sun and Linux).
  • The script copies all tar files, signing policies, etc. necessary for client installation for that experiment.
  • It can be readily extended to include SRB client-side installation, EDG/LCG client-side installation, etc.
DØ Objectives of SAMGrid
• Bring standard Grid technologies (including Globus and Condor) to the Run II experiments.
• Enable globally distributed computing for DØ and CDF.
• JIM (Job and Information Management) complements SAM by adding job management and monitoring to data handling.
• Together, JIM + SAM = SAMGrid.
JIM Job Management
[Diagram: a job passes from the User Interface and Submission Client to the Broker, which uses the Match Making Service, Queuing System, and Information Collector, together with the Data Handling System, to dispatch it to Execution Sites #1 to #n. Each execution site contains Computing Elements, Storage Elements, and Grid Sensors.]
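The matchmaking role in the diagram can be sketched in a few lines. This is a simplified illustration in the spirit of requirement-and-rank matchmaking, not the Condor ClassAd language or the JIM broker: the `Site` and `Job` classes, their attributes, and the choice to rank sites by how many of the job's input files they already cache are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    free_cpus: int
    cached_files: set = field(default_factory=set)


@dataclass
class Job:
    name: str
    cpus: int
    input_files: set = field(default_factory=set)


def requirements_met(job, site):
    # Hard constraint: the site must have enough free CPUs.
    return site.free_cpus >= job.cpus


def rank(job, site):
    # Soft preference: number of the job's input files already cached.
    return len(job.input_files & site.cached_files)


def match(job, sites):
    """Return the best eligible site for the job, or None if none qualifies."""
    candidates = [s for s in sites if requirements_met(job, s)]
    if not candidates:
        return None
    # Highest rank wins; ties are broken by the number of free CPUs.
    return max(candidates, key=lambda s: (rank(job, s), s.free_cpus))
```

Separating the hard requirement from the rank mirrors the usual two-step pattern: filter out sites that cannot run the job at all, then prefer the site where the least data would have to be moved.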
DØ JIM Deployment
• A site can join SAMGrid with any combination of services:
  • Monitoring, and/or
  • Execution, and/or
  • Submission
• May 2003: expect 5 initial execution sites for SAMGrid deployment, and 20 submission sites.
  • GridKa (Karlsruhe): analysis site
  • Imperial College and Lancaster: MC sites
  • U. Michigan (NPACI): reconstruction center
  • FNAL: CLueD0 as a submission site
• Summer 2003: continue to add execution and submission sites. The second round of execution site deployments includes Lyon (ccin2p3), Manchester, MSU, Princeton, UTA, and the FNAL CAB system.
• Hope to grow to dozens of execution sites and hundreds of submission sites over the next year(s).
• Use Grid middleware for job submission within a site too!
  • Administrators will have general ways of managing resources.
  • Users will use common tools for submitting and monitoring jobs everywhere.
What's Next for SAMGrid? After JIM version 1
• Improve job scheduling and decision making.
• Improve monitoring: more comprehensive, easier to navigate.
• Execution of structured jobs.
• Simplify packaging and deployment. Extend the configuration and advertising features of the uniform, XML-based framework built for JIM.
• CDF is adopting SAM and SAMGrid for its data handling and job submission. CDF has also asked to join PPDG.
• Interoperability, interoperability, interoperability:
  • Working with EDG and LCG to move in common directions.
  • Moving to Web services, Globus V3, and all the good things OGSA will provide. In particular, interoperability by expressing SAM and JIM as a collection of services, and mixing and matching with other Grids.
Challenges
• Meeting the challenges of real data handling and job submission: BaBar and DØ have confronted real-life issues, including:
  • Username clashing issues, moving to GSI and Grid certificates
  • Interoperability with many MSS
  • Security issues, firewalls, site policies
  • Robust job submission on the Grid
  • File replication integrity
  • Preemptive distributed caching
  • Private networks
  • Routing data in a worldwide system
  • Reliable network file transfers, timeouts, and retries
  • Simplifying complex installation procedures
• Troubleshooting is an important and time-consuming activity in distributed computing environments, and many tools are needed to do it effectively.
• Operating these distributed systems on a 24/7 basis involves coordination, training, and worldwide effort.
• Standard middleware is still hard to use, and requires significant integration, testing, and debugging.
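Two of the issues listed above, replication integrity and transfers that need timeouts and retries, are commonly handled together. The following is a minimal sketch of that pattern, not code from SAM, SRB, or any BaBar or DØ tool. The `replicate` function, its parameters, and the `transfer` callable are hypothetical; the sketch also assumes both paths are locally readable, whereas a real system would more likely take the expected checksum from a file catalogue.

```python
import hashlib
import time


def sha256_of(path, chunk=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def replicate(transfer, src, dst, retries=3, backoff=1.0, sleep=time.sleep):
    """Call transfer(src, dst) until the copy verifies or attempts run out.

    `transfer` is assumed to enforce its own timeout and to raise
    TimeoutError or OSError on failure. Returns the attempt number
    that succeeded.
    """
    expected = sha256_of(src)
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            transfer(src, dst)
            if sha256_of(dst) == expected:
                return attempt
            last_error = IOError("checksum mismatch after transfer")
        except OSError as exc:  # TimeoutError is a subclass of OSError
            last_error = exc
        if attempt < retries:
            # Exponential backoff between attempts.
            sleep(backoff * 2 ** (attempt - 1))
    raise RuntimeError(
        f"replication failed after {retries} attempts") from last_error
```

Verifying the checksum after every attempt means a truncated or corrupted copy is treated the same way as a failed connection, so a silently damaged replica is never reported as a success.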
PPDG Benefits to BaBar and DØ
• PPDG has provided very useful collaboration with, and feedback to, other Grid and Computer Science groups.
• Development of tools and middleware that should be of general interest to the Grid community, e.g.:
  • BbgUtils.pl
  • Condor-G enhancements
• Deploying and testing Grid middleware under the battlefield conditions of operational experiments hardens the software and helps CS learn what is needed.
• The CS groups enable the experiments to examine problems in new, innovative ways, and provide important new technologies for solving them.
The End