
BaBar Status Report

Chris Brew, GridPP16, QMUL, 28/06/2006


Presentation Transcript


  1. BaBar Status Report Chris Brew GridPP16 QMUL 28/06/2006

  2. Outline • 3 BaBar Grid Projects: • Monte Carlo (Simulation) Production • Skimming • User Analysis • easyGrid • bbrbsub • Overall experience with the Grid • Conclusion

  3. Usual Guff • BaBar is a running experiment, situated at SLAC near San Francisco • Detector at the PEP-II e+e− collider, tuned to investigate CP violation in B physics • Started taking data in 1999/2000; currently has 350 fb⁻¹ of data • Projected to have 1000 fb⁻¹ by the end of 2008

  4. Data Flow [Diagram not reproduced; labels: Tier 0 (SLAC), Tier 1 (RAL), Large Tier 2s, Tier 2s, Simulation Production, Merging, Skimming, Analysis]

  5. Simulation Production • Running at M/Cr, RAL, RALPP and B'ham • Tests at Lancs, Oxford + others • Still working to add other BaBar sites • Limited by the need to install the Objectivity (Objy) database at each site • Stable running: 500,000,000 events produced, 12% of the worldwide total • New R-GMA-based job monitor: status query time down from 45 minutes to 5 minutes • Recent hiatus: bugs found in the BaBar simulation code caused a global halt; production has now restarted C. Brew, G. Castelli

  6. Skimming • New Grid project: process real and simulated data to select ~200 subsamples, defined by the BaBar physics analysis working groups • Much quicker to run over a skim than over the full data sample • Skimming includes physics analysis code and saves the results, so the CPU time spent in skimming is regained many times over • Plan is to run at one or more large T2s. If we can get this into production, we should be able to recover some of the UK’s Common Fund rebate we’ve lost due to lack of T1 resources • GridPP has funded three months of effort from Will Roethel to further this work G. Castelli, W. Roethel, C. Brew

  7. Status of Skimming

  8. User Analysis (easyGrid) • Prototype running on the Manchester testbed (80 CPUs) since Nov 2005 without problems: real analysis with real data by real users who know nothing about the grid • No errors in EasyGrid job submission • No errors on the grid testbed, thanks to improvements in installation and configuration J. Werner

  9. Many problems encountered moving from testbed to production Grid resources • Errors in RB, CE, etc.: 10% of the time, at submission rates below 4 jobs/second • Errors in BDII, SE, dCache: the SE fails 40% of jobs (with fewer than 100 jobs in parallel) • When the SE works, performance is terrible (approx. 8 times longer to run the same software) • Lack of response to problems from site admins • Serious issue for a typical user analysis, which is about 2000 jobs of 8 CPU hours each • Product development will resume when resources are available and reliable. Meanwhile, the EasyGrid prototype and the M/Cr testbed will continue to serve users For more information: http://www.hep.man.ac.uk/u/jamwer/
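To put the slide's numbers in proportion, the sketch below works out the ideal cost of a typical analysis (2000 jobs of 8 CPU hours) and the expected number of submissions needed if each job fails with the 40% SE failure rate quoted above. It assumes, for illustration only, that failures are independent and that every failed job is simply resubmitted until it succeeds; the report does not state how failed jobs are handled.

```python
# Back-of-envelope cost of a typical user analysis on the production grid.
# Figures from the slide: 2000 jobs of 8 CPU hours each, 40% SE failure rate.
# ASSUMPTION (not from the report): failures are independent, and each failed
# job is resubmitted until it succeeds, so attempts per job are geometric.

N_JOBS = 2000
CPU_HOURS_PER_JOB = 8
SE_FAILURE_RATE = 0.40


def total_cpu_hours(n_jobs: int, hours_per_job: float) -> float:
    """CPU hours needed if every job succeeds on its first attempt."""
    return n_jobs * hours_per_job


def expected_submissions(n_jobs: int, failure_rate: float) -> float:
    """Expected submissions with independent retries: n / (1 - p)."""
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return n_jobs / (1.0 - failure_rate)


if __name__ == "__main__":
    ideal = total_cpu_hours(N_JOBS, CPU_HOURS_PER_JOB)
    subs = expected_submissions(N_JOBS, SE_FAILURE_RATE)
    print(f"Ideal cost: {ideal:.0f} CPU hours")
    print(f"Expected submissions: {subs:.0f} ({subs - N_JOBS:.0f} resubmissions)")
```

Under these assumptions a 16,000 CPU-hour analysis needs roughly 3,333 submissions (about 1,333 resubmissions) to get all 2,000 jobs through, before counting the roughly eightfold slowdown reported when the SE does work.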

  10. User Analysis (bbrbsub) • Integration of Simple Job Manager + bbrbsub with Grid Submission • Take the tools already used by analysis users to submit jobs at RAL • Transparently add RAL -> RAL grid submission • Add RAL -> M/Cr and M/Cr -> RAL submission capabilities • Add RAL -> RALPP and M/Cr -> RALPP • Gradually build up full grid functionality • Application transport and configuration • Automatic output recovery • Job to data matching G. Castelli
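The "job to data matching" step in the plan above could look roughly like the following minimal sketch: given the submission routes listed on the slide and a map of which sites hold which dataset, pick a reachable site that has the data, preferring the submitting site. The function name, the tie-break rule and the dataset names are hypothetical; this is not the actual bbrbsub or Simple Job Manager logic.

```python
# Sketch of "job to data matching" over the submission routes on the slide.
# The route table reflects the routes listed on the slide (RAL->RAL, RAL->M/Cr,
# RAL->RALPP, M/Cr->RAL, M/Cr->RALPP). Dataset names and locations below are
# HYPOTHETICAL examples, not real BaBar catalogue entries.
from typing import Dict, Optional, Set

ROUTES: Dict[str, Set[str]] = {
    "RAL": {"RAL", "M/Cr", "RALPP"},
    "M/Cr": {"RAL", "RALPP"},
}


def choose_target(origin: str, dataset: str,
                  locations: Dict[str, Set[str]]) -> Optional[str]:
    """Pick a site reachable from `origin` that holds `dataset`.

    Prefers running at the origin site; otherwise picks the alphabetically
    first candidate (an arbitrary but deterministic tie-break).
    Returns None if no reachable site holds the data.
    """
    candidates = ROUTES.get(origin, set()) & locations.get(dataset, set())
    if not candidates:
        return None
    if origin in candidates:
        return origin
    return sorted(candidates)[0]


if __name__ == "__main__":
    # Hypothetical dataset locations for illustration only.
    locations = {"skim-A": {"M/Cr", "RALPP"}, "skim-B": {"RAL"}}
    print(choose_target("RAL", "skim-A", locations))   # M/Cr
    print(choose_target("M/Cr", "skim-B", locations))  # RAL
    print(choose_target("M/Cr", "skim-C", locations))  # None
```

Because M/Cr -> M/Cr grid submission is not among the routes listed, a dataset held only at M/Cr is unreachable from M/Cr in this sketch; adding routes is just a matter of extending the `ROUTES` table.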

  11. Overall Grid Experience • Grid is still not reliable (worst test run): • SP (simulation production) running suggests that the Grid isn't getting more reliable and may be getting less so; long-term efficiency is stuck around 80%: • RB problems (we can use multiple RBs, but efficiency drops because there is no failover) • Central LFC problems • BDII problems: sites drop in and out of the BDII • SE problems: files randomly fail to upload/download • Production could previously run for 1-2 weeks at a time with minimal intervention; it now seems to need daily (or more frequent) intervention
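The RB point above is that several brokers can be used but nothing fails over between them. A minimal client-side failover loop might look like the sketch below, which tries each broker in order and moves on when one raises an error. The broker host names and the `submit` callable are hypothetical stand-ins; the sketch does not call any real LCG/gLite submission tool.

```python
# Minimal client-side failover across several resource brokers (RBs).
# HYPOTHETICAL: the broker names and the `submit` callable are stand-ins;
# nothing here calls a real LCG/gLite submission command.
from typing import Callable, List, Sequence, Tuple


class AllBrokersFailed(Exception):
    """Raised when every broker in the list failed to accept the job."""


def submit_with_failover(job: str, brokers: Sequence[str],
                         submit: Callable[[str, str], str]) -> Tuple[str, str]:
    """Try each broker in order; return (broker, job_id) from the first success."""
    errors: List[str] = []
    for broker in brokers:
        try:
            return broker, submit(broker, job)
        except Exception as exc:  # any broker error moves on to the next one
            errors.append(f"{broker}: {exc}")
    raise AllBrokersFailed("; ".join(errors) or "no brokers given")


if __name__ == "__main__":
    def fake_submit(broker: str, job: str) -> str:
        # Simulate the first broker being down and the second accepting jobs.
        if broker == "rb-1.example.org":
            raise ConnectionError("broker unreachable")
        return f"{broker}/{job}-0001"

    print(submit_with_failover("sp-job", ["rb-1.example.org", "rb-2.example.org"], fake_submit))
```

Trying brokers in a fixed order keeps the sketch simple; a real client might also shuffle the list or skip brokers that failed recently.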

  12. Conclusions • BaBar has made good progress on moving its three main compute-intensive offline processes to the Grid • Monte Carlo generation is in production; significant progress has been made in skimming and user analysis • There are many things we like about the grid • We are adapting the BaBar software framework to integrate better with the grid: the dependence on Objectivity will be removed, and we are adding the ability to read data directly from Storage Elements • However, reliability and ease of use are still big issues
