Post-DC2/Rome Production
Kaushik De, Mark Sosebee
University of Texas at Arlington
U.S. Grid Phone Meeting, July 13, 2005
Overview
• Why restart managed production?
• ~450 people attended the Rome meeting, with ~100 talks, many based on DC2/Rome data (though some still used DC1 or ATLFAST data).
• A growing number of physicists are looking at data every day.
• Since Rome, many physics groups need data urgently and have started doing private production because "the grid is not available".
  • SUSY needs a background sample, top needs a large dataset, Higgs…
• Private production of common data samples is wasteful – many samples are repeated, never used, or contain mistakes…
• Private production is not the correct model for 2008 – we do not have infinite computing resources, and we will need quality control…
• The grid needs to be available for ATLAS – just like the components of the detector, or the core software – as a service to the collaboration.
Production Proposal
• The plan is being developed jointly with physics coordination, software releases, and prodsys development.
  • Ian Hinchliffe & Davide Costanzo presented the physics plan on June 21st.
  • KD is organizing distributed production for all of ATLAS – presented this talk during the phone meeting on July 11th.
• General assumptions about the grid:
  • Production needs to help with testing of the new prodsys.
  • Production must allow for the shutdowns required to upgrade middleware (like OSG, ARC, LCG) and services (like RLS, DQ, DB).
• First proposal:
  • Restart low-level production in mid-July.
  • July/August – validate release 10.0.4 (first on OSG, later NG/LCG).
  • September – test the new prodsys.
  • October – 1M-event production to provide data for the Physics Workshop.
July 2005 Proposal
• Finish the Rome pile-up sample on LCG.
• Archive/clean up files on all grids, after checking with the physics groups and making a general announcement with two weeks' notice.
  • Archive and delete all DC1 and DC2 files?
  • Archive and delete Rome simul files?
• Upgrade middleware/fabric.
  • OSG – started end of June, ready by mid-July (Yuri Smirnov is already testing U.S. sites with the Rome top sample).
  • ARC? LCG?
• Prodsys/grid testing – started.
• Do 10.0.4 validation.
  • Xin Zhao has started installations on OSG.
  • Pavel Nevski is defining jobs.
August 2005 Proposal
• Start production of some urgent physics samples.
  • Use DC2 software.
  • Get the list from the physics groups.
• Validation of 10.0.x.
• Stage in the input files needed for September.
• Prodsys integration and scaling tests.
• Grid infrastructure tests of the new fabric – top sample.
• RTT set-up and testing (for nightly validation).
• DDM testing.
• Complete deployment of 10.0.4/10.0.x on all grids.
Production Samples
• The physics groups will define three major samples:
• Sample A
  • For quick validation or debugging of software.
  • Scale: 10^5 events, 15 datasets.
• Sample B
  • Validation of major releases.
  • Scale: 10^6 events, 25 datasets.
• Sample C
  • Production sample.
  • Scale: 10^7 events (same as DC2 or Rome), 50 datasets.
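For orientation, a minimal Python sketch of the average dataset size implied by these numbers. The event and dataset counts come from the slide above; the per-dataset averages are derived here, not quoted, and real datasets need not be uniform (the appendix, for instance, lists a flat 10k events per sample for Sample A).

```python
# Average dataset sizes implied by the sample definitions above.
# Event and dataset counts are from the slide; the per-dataset
# averages are derived, and actual datasets may vary in size.
samples = {
    "A": (10**5, 15),   # quick validation / debugging
    "B": (10**6, 25),   # major-release validation
    "C": (10**7, 50),   # production sample
}

for name, (events, datasets) in samples.items():
    print(f"Sample {name}: {events:.0e} events / {datasets} datasets "
          f"= ~{events / datasets:,.0f} events per dataset")
```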
September – Validation
• Sep. 12-26, 2005
  • Goal: test production readiness (validate prodsys).
  • Use Sample A (a throw-away sample) with Rome release 10.0.1.
  • Start RTT nightly tests with 10^4 events.
  • Start grid & prodsys tests with 10^5 events.
  • Steps: evgen, simul, digi, and reco.
  • Scale: 2000 jobs (50 events each) in 2 weeks (<1% of the Rome rate).
• Sep. 26 - Oct. 3, 2005
  • Goal: test production readiness (validate the new release).
  • Use Sample A with release 10.0.4, same steps.
  • Grid, prodsys & DDM (pre-production) tests with 10^5 events.
  • Scale: 2000 jobs in 1 week (~2% of the Rome rate).
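The rate comparisons above reduce to a few lines of arithmetic. A minimal sketch: the job counts, events per job, and durations are from the slide, but the Rome peak rate is not stated anywhere in this deck, so the reference value below is a purely hypothetical placeholder (with 15k jobs/day, the quoted "<1%" and "~2%" happen to come out).

```python
# Throughput implied by the September validation schedule above.
# ROME_PEAK_JOBS_PER_DAY is NOT given on the slides -- it is an
# illustrative placeholder; substitute the real figure if known.
ROME_PEAK_JOBS_PER_DAY = 15_000  # hypothetical reference value

phases = [
    # (label, jobs, events_per_job, days)
    ("Sep 12-26, prodsys validation (10.0.1)", 2_000, 50, 14),
    ("Sep 26 - Oct 3, release validation (10.0.4)", 2_000, 50, 7),
]

for label, jobs, epj, days in phases:
    jobs_per_day = jobs / days
    events = jobs * epj
    frac = jobs_per_day / ROME_PEAK_JOBS_PER_DAY
    print(f"{label}: {events:,} events, {jobs_per_day:.0f} jobs/day "
          f"({frac:.1%} of the assumed Rome peak)")
```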
Computer Commissioning Proposal
• Oct. 3-17, 2005
  • Goal: production for the Physics Workshop scheduled for the end of October.
  • Prodsys – ready? Rod?
  • DDM – ready? Miguel? Contingency plan?
  • Use Sample B with release 10.0.x.
  • Steps: evgen, simul, digi, reco, tag (concatenation).
  • Scale: 10,000 jobs with 100 events each in 2 weeks (~10-20% of the peak Rome rate; the sample size is ~15% of the total Rome sample – see the scale sketch after the next slide).
  • Note: this is the end of the line for release 10.
Release 11 CSC Proposal
• October/November
  • RTT runs every night to discover problems – Davide?
  • 1 week after any bug-fix release, run Sample A on the grid: scale 1000 jobs (100 events each), all steps if possible, typically in 2-5 days.
  • 1 week after any stable release, run Sample B on the grid: scale 10k jobs, all steps, typically in 1-2 weeks (still <10% of the peak Rome production rate).
• November/December
  • Goal: generate the background sample for the blind challenge; test prodsys.
  • Use Sample C with a stable/tested release 11.0.x.
  • Steps: evgen, simul, digi, reco, tag (concatenation).
  • Scale: 100k jobs with 100 events each (should exceed the Rome rate; sample size approx. the same as Rome).
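Pulling the October-December scales together, a minimal sketch of the event totals. Job counts and events per job are from the two slides above; the 10^7-event Rome total is the nominal figure from the Production Samples slide, so the printed fractions may differ slightly from the percentages quoted on the slides (e.g. the "~15%" for October), which were presumably computed against the actual Rome totals.

```python
# Event totals implied by the October-December running above,
# compared to the nominal 1e7-event Rome sample quoted earlier.
ROME_TOTAL_EVENTS = 10**7  # nominal figure from the slides

runs = [
    # (label, jobs, events_per_job)
    ("Oct. commissioning, Sample B", 10_000, 100),
    ("Per bug-fix release, Sample A", 1_000, 100),
    ("Per stable release, Sample B", 10_000, 100),
    ("Nov/Dec background, Sample C", 100_000, 100),
]

for label, jobs, epj in runs:
    events = jobs * epj
    print(f"{label}: {jobs:,} jobs x {epj} events = {events:.0e} events "
          f"({events / ROME_TOTAL_EVENTS:.0%} of the nominal Rome total)")
```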
2006 CSC Proposal
• Early 2006
  • Goal: blind challenge (physics algorithm tune-up, test of the analysis model).
  • Mix background data with blind samples (not done on the grid, to protect the content – possibly run at a few Tier 2 sites).
  • Continue to run Samples A and B on the grid for every release.
  • Calibration/alignment test with Release 12 – requires Sample C-scale production (equivalent to Rome).
Appendix: Definition of Sample A
• ~100k events, 10k per sample:
  • Min bias
  • Z->ee
  • Z->mumu
  • Z->tautau, forced to large pT so that the missing ET is large
  • H->gamgam (130 GeV)
  • b-tagging sample 1
  • b-tagging sample 2
  • top
  • J1
  • J4
  • J8
Appendix: Definition of Samples B, C
• Sample B:
  • 1M events, at least 25k per sample.
  • Includes all the sets from Sample A, plus additional physics samples.
• Sample C:
  • 10M events, at least 100k per sample.
  • Includes all the sets from Sample A, 500k events each.
  • Additional physics samples.