ATLAS Data Challenges
NorduGrid Workshop, Uppsala, November 11-13, 2002
Gilbert Poulard, ATLAS DC coordinator, CERN EP-ATC
Outline
• Introduction
• DC0
• DC1
• Grid activities in ATLAS
• DCn’s
• Summary
DC web page: http://atlasinfo.cern.ch/Atlas/GROUPS/SOFTWARE/DC/index.html
G. Poulard - NorduGrid Workshop
Data challenges
• Why?
  • The CERN Computing Review recognized that computing for the LHC is very complex and requires a huge amount of resources.
  • Several recommendations were made, among them:
    • Create the LHC Computing Grid (LCG) project
    • Ask the experiments to launch a set of Data Challenges to understand and validate
      • Their computing model, data model and software suite
      • Their technology choices
      • The scalability of the chosen solutions
ATLAS Data challenges
• In ATLAS it was decided
  • To foresee a series of DCs of increasing complexity
    • Start with data which looks like real data
    • Run the filtering and reconstruction chain
    • Store the output data in the ad hoc persistent repository
    • Run the analysis
    • Produce physics results
  • To study
    • Performance issues, persistency technologies, analysis scenarios, ...
  • To identify
    • Weaknesses, bottlenecks, etc. (but also good points)
  • Using the (prototype) hardware, the software and the middleware developed and/or deployed by the LCG project
ATLAS Data challenges
• But it was also acknowledged that:
  • Today we don’t have ‘real data’
  • We need to produce ‘simulated data’ first
• So the following will be part of the first Data Challenges:
  • Physics event generation
  • Detector simulation
  • Pile-up
  • Reconstruction and analysis
• We also need to “satisfy” the requirements of the ATLAS communities
  • HLT, Physics groups, ...
ATLAS Data challenges
• In addition it is understood that the results of the DCs should be used to
  • Prepare a computing MoU in due time
  • Produce a new Physics TDR about one year before real data taking
• The schedule retained was to:
  • Start with DC0 in late 2001
    • Considered at that time as a preparatory exercise
  • Continue with one DC per year
DC0
• Was defined as
  • A readiness and continuity test
    • Have the full chain running from the same release
  • A preparation for DC1; in particular
    • One of the main emphases was to put in place the full infrastructure with Objectivity (the baseline persistency technology at that time)
• There was also a strong request from the physicists to be able to reconstruct and analyze the “old” Physics TDR data within the new Athena framework
DC0: Readiness & continuity tests (December 2001 – June 2002)
• “3 lines” for “full” simulation
  • 1) Full chain with new geometry (as of January 2002)
    Generator -> (Objy) -> Geant3 -> (Zebra -> Objy) -> Athena rec. -> (Objy) -> Analysis
  • 2) Reconstruction of ‘Physics TDR’ data within Athena
    (Zebra -> Objy) -> Athena rec. -> (Objy) -> Simple analysis
  • 3) Geant4
    • Robustness test
    Generator -> (Objy) -> Geant4 -> (Objy)
• “1 line” for “fast” simulation
  Generator -> (Objy) -> Atlfast -> (Objy)
• Continuity test: everything from the same release (3.0.2) for the full chain
Schematic View of Task Flow for DC0
[Figure: task flow for the H -> 4 mu sample, drawn as three parallel chains. Event generation: Pythia 6 writes HepMC. Detector simulation: Atlsim/G3 writes Hits/Digits and MCTruth to Zebra. Data conversion: Athena copies Hits/Digits and MCTruth into Objectivity/DB. Reconstruction: Athena writes AOD to Objectivity/DB.]
DC0: Readiness & continuity tests (December 2001 – June 2002)
• Took longer than foreseen, for several reasons
  • Introduction of new tools
  • Change of the baseline for persistency
    • A major consequence was the diversion of some of the manpower
  • Underestimation of what “have everything from the same release” implies
• Nevertheless we learnt a lot
• Was completed in June 2002
ATLAS Data Challenges: DC1
• Original goals (November 2001):
  • Reconstruction & analysis on a large scale
    • Learn about data model and I/O performance; identify bottlenecks ...
  • Data management
    • Use/evaluate persistency technology (AthenaRoot I/O)
    • Learn about distributed analysis
• Need to produce data for HLT & Physics groups
  • HLT TDR has been delayed to mid-2003
  • Study performance of Athena and algorithms for use in HLT
  • High statistics needed
    • Scale: a few samples of up to 10^7 events in 10-20 days, O(1000) PCs
  • Simulation & pile-up will play an important role
• Introduce new Event Data Model
• Checking of Geant4 versus Geant3
• Involvement of CERN & outside-CERN sites: worldwide exercise
• Use of GRID middleware as and when possible and appropriate
• To cope with different sets of requirements and for technical reasons (including software development, access to resources) it was decided to split DC1 into two phases
ATLAS DC1
• Phase I (April – August 2002)
  • Primary concern is delivery of events to the HLT community
  • Put in place the MC event generation & detector simulation chain
  • Put in place the distributed Monte Carlo production
• Phase II (October 2002 – January 2003)
  • Provide data with (and without) ‘pile-up’ for HLT studies
  • Introduction & testing of new Event Data Model (EDM)
  • Evaluation of new persistency technology
  • Use of Geant4
  • Production of data for Physics and Computing Model studies
  • Testing of computing model & of distributed analysis using AOD
  • Wider use of GRID middleware
DC1 preparation
• First major issue was to get the software ready
  • New geometry (compared to December-DC0 geometry)
  • New persistency mechanism
  • ... Validated
  • ... Distributed
    • “ATLAS kit” (rpm) to distribute the software
• And to put in place the production scripts and tools (monitoring, bookkeeping)
  • Standard scripts to run the production
  • AMI bookkeeping database (Grenoble)
  • Magda replica catalog (BNL)
DC1 preparation: software (1)
• New geometry (compared to December-DC0 geometry)
  • Inner Detector
    • Beam pipe
    • Pixels: services; material updated; more information in hits; better digitization
    • SCT tilt angle reversed (to minimize clusters)
    • TRT barrel: modular design
    • Realistic field
  • Calorimeter
    • ACBB: material and readout updates
    • ENDE: dead material and readout updated (last-minute update, to be avoided if possible)
    • HEND: dead material updated
    • FWDC: detailed design
    • End-cap calorimeters shifted by 4 cm
    • Cryostats split into barrel and end-cap
  • Muon
    • AMDB p.03 (more detailed chamber cutouts)
    • Muon shielding update
ATLAS Geometry
• Inner Detector
• Calorimeters
• Muon System
ATLAS/G3: A Few Numbers at a Glance
• 25.5 million distinct volume copies
• 23 thousand different volume objects
• 4,673 different volume types
• A few hundred pile-up events possible
• About 1 million hits per event on average
DC1 preparation: software (2)
• New persistency mechanism
  • AthenaROOT/IO
    • Used for generated events
    • Readable by Atlfast and Atlsim
• Simulation still using Zebra
DC1/Phase I preparation: kit; scripts & tools
• Kit
  • “ATLAS kit” (rpm) to distribute the software
  • It installs release 3.2.1 (all binaries) without any need of AFS
  • It requires:
    • Linux OS (Redhat 6.2 or Redhat 7.2)
    • CERNLIB 2001 (from DataGrid repository): cern-0.0-2.i386.rpm (~289 MB)
  • It can be downloaded:
    • from a multi-release page (22 rpms; global size ~250 MB)
    • “tar” file also available
• Scripts and tools (monitoring, bookkeeping)
  • Standard scripts to run the production
  • AMI bookkeeping database
DC1/Phase I Task Flow
• As an example, for 1 sample of di-jet events:
  • Event generation: 1.5 x 10^7 events in 150 partitions
  • Detector simulation: 3000 jobs
[Figure: di-jet task flow. Event generation: Pythia 6 writes HepMC partitions of 10^5 events in Athena-Root I/O. Detector simulation: each Atlsim/Geant3 + Filter job reads 5000 events and writes about 450 events as Hits/Digits and MCTruth in Zebra.]
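The partition and job counts quoted on this slide fit together arithmetically. The short Python sketch below only restates the slide's numbers; the variable names are illustrative, and reading "(5000 evts)" as the input of one simulation job and "(~450 evts)" as its output after the filter is an assumption based on the figure labels.

```python
# Numbers quoted on the slide for one di-jet sample.
GENERATED_EVENTS = 15_000_000     # 1.5 x 10^7 generated events
GENERATION_PARTITIONS = 150
SIMULATION_JOBS = 3000
EVENTS_KEPT_PER_JOB = 450         # approximate; assumed to be the per-job output after the filter

# Events written into each generation partition.
events_per_partition = GENERATED_EVENTS // GENERATION_PARTITIONS

# Generated events read by each simulation job (assumed equal shares).
events_read_per_job = GENERATED_EVENTS // SIMULATION_JOBS

# Simulation jobs needed to consume one generation partition.
jobs_per_partition = SIMULATION_JOBS // GENERATION_PARTITIONS

# Fraction of generated events surviving the particle-level filter.
filter_acceptance = EVENTS_KEPT_PER_JOB / events_read_per_job

# Approximate number of fully simulated events in the sample.
simulated_events = SIMULATION_JOBS * EVENTS_KEPT_PER_JOB

print(events_per_partition, events_read_per_job, jobs_per_partition)
print(f"filter acceptance ~ {filter_acceptance:.0%}, simulated events ~ {simulated_events}")
```

Under these assumptions each partition feeds 20 simulation jobs, and only about one generated event in eleven is passed on to the full simulation.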
DC1 preparation: validation & quality control
• We defined two types of validation
  • Validation of the sites:
    • We processed the same data in the various centres and compared the results
      • To ensure that the same software was running in all production centres
    • We also checked the random number sequences
  • Validation of the simulation:
    • We used both “old” generated data & “new” data
      • Validation datasets: di-jets, single μ, e, γ, H→4e/2γ/2e2μ/4μ
      • About 107 evts reconstructed in June, July and August
    • We compared “old” and “new” simulated data
DC1 preparation: validation & quality control
• This was a very “intensive” activity
  • Many findings: problems in the simulation or in the software installation at sites (all eventually solved)
• We should increase the number of people involved
• It is a “key issue” for the success!
Example: jet distribution (di-jet sample)
[Figure: χ² comparison of the new and old simulation samples, showing the reappearance of an old dice version in the software installed at one site.]
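A bin-by-bin comparison of an "old" and a "new" sample, as used in this validation, can be sketched as follows. The slides do not say which statistic was used; the function below is an assumption, implementing the usual chi-square for two binned samples with possibly different totals. The function name and the plain-list histogram format are hypothetical.

```python
import math


def chi2_between_histograms(old_counts, new_counts):
    """Chi-square between two binned samples (lists of bin counts).

    Each sample is rescaled by the square root of the ratio of totals, so
    two histograms with the same shape but different statistics give a
    value close to zero. Bins empty in both samples are skipped.
    Returns (chi2, number_of_bins_used).
    """
    if len(old_counts) != len(new_counts):
        raise ValueError("histograms must have the same binning")
    total_old = sum(old_counts)
    total_new = sum(new_counts)
    if total_old == 0 or total_new == 0:
        raise ValueError("both histograms must contain entries")

    scale_old = math.sqrt(total_new / total_old)
    scale_new = math.sqrt(total_old / total_new)

    chi2 = 0.0
    bins_used = 0
    for old, new in zip(old_counts, new_counts):
        if old + new == 0:
            continue  # no information in this bin
        chi2 += (scale_old * old - scale_new * new) ** 2 / (old + new)
        bins_used += 1
    return chi2, bins_used


if __name__ == "__main__":
    reference = [120, 340, 510, 330, 110]   # made-up bin contents
    candidate = [118, 352, 498, 341, 105]   # made-up bin contents
    print(chi2_between_histograms(reference, candidate))
```

A value much larger than the number of bins used would flag a site whose output differs from the reference, which is the kind of discrepancy the slide illustrates.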
Data Samples I
• Validation samples (740k events)
  • single particles (e, γ, μ, π), jet scans, Higgs events
• Single-particle production (30 million events)
  • single π (low pT; pT = 1000 GeV with 2.8 < η < 3.2)
  • single μ (pT = 3, ..., 100 GeV)
  • single e and γ
    • different energies (E = 5, 10, ..., 200, 1000 GeV)
    • fixed η points; η scans (|η| < 2.5); η crack scans (1.3 < η < 1.8)
    • standard beam spread (σz = 5.6 cm); fixed vertex z-components (z = 0, 4, 10 cm)
• Minimum-bias production (1.5 million events)
  • different η regions (|η| < 3, 5, 5.5, 7)
Data Samples II
• QCD di-jet production (5.2 million events)
  • different cuts on ET (hard scattering) during generation
    • large production of ET > 11, 17, 25, 55 GeV samples, applying particle-level filters
    • large production of ET > 17, 35 GeV samples, without filtering, full simulation within |η| < 5
    • smaller production of ET > 70, 140, 280, 560 GeV samples
• Physics events requested by various HLT groups (e/γ, Level-1, jet/ETmiss, B-physics, b-jet, μ; 4.4 million events)
  • large samples for the b-jet trigger simulated with default (3 pixel layers) and staged (2 pixel layers) layouts
  • B-physics (PL) events taken from old TDR tapes
ATLAS DC1/Phase I: July-August 2002
• Goals:
  • Produce the data needed for the HLT TDR
  • Get as many ATLAS institutes involved as possible
• Worldwide collaborative activity
• Participation: 39 institutes in 18 countries
  • Australia, Austria, Canada, CERN, Czech Republic, Denmark, France, Germany, Israel, Italy, Japan, Norway, Russia, Spain, Sweden, Taiwan, UK, USA
ATLAS DC1 Phase I: July-August 2002
• CPU resources used:
  • Up to 3200 processors (5000 PIII/500 equivalent)
  • 110 kSI95 (~50% of one Regional Centre at LHC startup)
  • 71000 CPU days
  • To simulate one di-jet event: 13 000 SI95sec
• Data volume:
  • 30 Tbytes
  • 35 000 files
  • Output size for one di-jet event: 2.4 Mbytes
• Data kept at production site for further processing
  • Pile-up
  • Reconstruction
  • Analysis
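To give a feeling for these figures, the sketch below derives a few per-processor and per-file quantities from the numbers on the slide. It is plain arithmetic under two stated assumptions: that 110 kSI95 is the power of the 5000 PIII/500-equivalent processors, and that 1 Tbyte = 10^6 Mbytes. The variable names are illustrative.

```python
# Numbers quoted on the slide.
PIII500_EQUIVALENT_PROCESSORS = 5000
TOTAL_POWER_SI95 = 110_000            # 110 kSI95
COST_PER_DIJET_EVENT_SI95_SEC = 13_000
DATA_VOLUME_TBYTES = 30
NUMBER_OF_FILES = 35_000

# Assumption: the 110 kSI95 corresponds to the 5000 PIII/500 equivalents.
si95_per_piii500 = TOTAL_POWER_SI95 / PIII500_EQUIVALENT_PROCESSORS

# Time to simulate one di-jet event on one PIII/500-equivalent processor.
seconds_per_dijet_event = COST_PER_DIJET_EVENT_SI95_SEC / si95_per_piii500

# Assumption: 1 Tbyte = 10^6 Mbytes.
average_file_size_mbytes = DATA_VOLUME_TBYTES * 1_000_000 / NUMBER_OF_FILES

print(f"{si95_per_piii500:.0f} SI95 per PIII/500")
print(f"{seconds_per_dijet_event / 60:.1f} minutes per di-jet event")
print(f"{average_file_size_mbytes:.0f} Mbytes per file on average")
```

Under these assumptions one di-jet event takes roughly ten minutes on a PIII/500, which is why the full simulation dominates the CPU budget.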
ATLAS DC1 Phase I: July-August 2002
• 3200 CPUs; 110 kSI95; 71000 CPU days
• 39 institutions in 18 countries
• 5 x 10^7 events generated
• 1 x 10^7 events simulated
• 3 x 10^7 single particles
• 30 Tbytes; 35 000 files
ATLAS DC1 Phase I: July-August 2002
ATLAS DC1 Phase II
• Provide data with and without ‘pile-up’ for HLT studies
  • New data samples (huge number of requests)
  • Pile-up in Atlsim
  • “Byte stream” format to be produced
• Introduction & testing of new Event Data Model (EDM)
  • This should include the new Detector Description
• Evaluation of new persistency technology
• Use of Geant4
• Production of data for Physics and Computing Model studies
  • Both ESD and AOD will be produced from Athena reconstruction
  • We would like to get the ‘large scale reconstruction’ and the ‘data-flow’ studies ready, but they will not be part of Phase II
• Testing of computing model & of distributed analysis using AOD
• Wider use of GRID middleware (a test is planned in November)
Pile-up
• First issue is to produce the pile-up data for HLT
  • We intend to do this now
  • Code is ready
  • Validation is in progress
  • No “obvious” problems
Luminosity Effect Simulation
• Aim: study interesting processes at different luminosities L
• Separate simulation of physics events & minimum-bias events
• Merging of
  • Primary stream (physics)
  • Background stream (pile-up)
[Figure: one event from the primary stream (KINE, HITS) is merged with N(L) events from the background stream (KINE, HITS) at digitization to form one bunch crossing (DIGI).]
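The merging of the two streams can be illustrated with a toy sketch. This is not the Atlsim implementation: the event format (a dictionary with a "hits" list), the function names, the Poisson fluctuation of N(L) and its proportionality to luminosity are all assumptions made for illustration.

```python
import math
import random


def scaled_mean_pileup(mean_at_reference, luminosity, reference_luminosity):
    """Mean number of overlaid events, assumed proportional to luminosity."""
    return mean_at_reference * luminosity / reference_luminosity


def sample_poisson(mean, rng):
    """Draw a Poisson-distributed integer (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count = 0
    product = 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1


def merge_bunch_crossing(physics_event, background_pool, mean_pileup, rng):
    """Overlay N(L) background events on one physics event before digitization."""
    n_pileup = sample_poisson(mean_pileup, rng)
    merged_hits = list(physics_event["hits"])        # primary stream first
    for _ in range(n_pileup):
        background_event = rng.choice(background_pool)
        merged_hits.extend(background_event["hits"])  # background stream
    return {"hits": merged_hits, "n_pileup": n_pileup}


if __name__ == "__main__":
    rng = random.Random(2002)
    physics = {"hits": ["physics-hit-1", "physics-hit-2"]}
    pool = [{"hits": [f"minbias-hit-{i}"]} for i in range(100)]
    # The reference mean of 23 events per crossing at 10^34 is an assumed value.
    mean_low = scaled_mean_pileup(23.0, 2e33, 1e34)
    print(merge_bunch_crossing(physics, pool, mean_low, rng)["n_pileup"])
```

Keeping the physics and minimum-bias samples separate means the same simulated physics events can be reused at several luminosities by changing only the mean of N(L).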
Pile-up features
• Different detectors have different memory times, requiring very different numbers of minimum-bias events to be read in
  • Silicon detectors, Tile calorimeter: t < 25 ns
  • Straw tracker: t < ~40-50 ns
  • LAr calorimeters: 100-400 ns
  • Muon drift tubes: 600 ns
• Still, we want the pile-up events to be the same in the different detectors!
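The effect of the memory time on the amount of minimum-bias data to be read can be estimated with a small sketch. It assumes the 25 ns LHC bunch spacing and takes the upper end of each range quoted on the slide; the function names and the mean number of minimum-bias events per crossing passed in the example are illustrative assumptions.

```python
import math

BUNCH_SPACING_NS = 25.0  # LHC design bunch spacing

# Upper ends of the memory times quoted on the slide, in ns.
MEMORY_TIME_NS = {
    "Silicon detectors / Tile calorimeter": 25.0,
    "Straw tracker": 50.0,
    "LAr calorimeters": 400.0,
    "Muon drift tubes": 600.0,
}


def crossings_in_memory(memory_time_ns, bunch_spacing_ns=BUNCH_SPACING_NS):
    """Number of bunch crossings that fall within a detector's memory time."""
    return max(1, math.ceil(memory_time_ns / bunch_spacing_ns))


def minimum_bias_events_needed(memory_time_ns, mean_per_crossing):
    """Average number of minimum-bias events to read for one physics event."""
    return crossings_in_memory(memory_time_ns) * mean_per_crossing


if __name__ == "__main__":
    assumed_mean_per_crossing = 23  # assumed value for high luminosity
    for detector, memory_time in MEMORY_TIME_NS.items():
        print(detector,
              crossings_in_memory(memory_time),
              minimum_bias_events_needed(memory_time, assumed_mean_per_crossing))
```

With these assumptions the muon drift tubes need about 24 crossings' worth of minimum-bias events while the silicon detectors need only one, which is why the detectors must draw from a common, consistent set of pile-up events.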
Higgs into two photons, no pile-up
Higgs into two photons, L = 10^34, pile-up
Pile-up production
• Scheduled for October-November 2002
• Both low (2 x 10^33) and high (10^34) luminosity data will be prepared
• Resources estimate:
  • 10000 CPU days (NCU)
  • 70 Tbytes of data
  • 100000 files
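The resource estimate translates into the following per-file averages. This is a rough sketch that assumes 1 Tbyte = 10^6 Mbytes and an even spread of CPU time over the files; the variable names are illustrative.

```python
# Numbers quoted on the slide.
CPU_DAYS = 10_000
DATA_VOLUME_TBYTES = 70
NUMBER_OF_FILES = 100_000

# Assumption: 1 Tbyte = 10^6 Mbytes.
average_file_size_mbytes = DATA_VOLUME_TBYTES * 1_000_000 / NUMBER_OF_FILES

# Assumption: CPU time is spread evenly over the output files.
cpu_hours_per_file = CPU_DAYS * 24 / NUMBER_OF_FILES

print(f"{average_file_size_mbytes:.0f} Mbytes and {cpu_hours_per_file:.1f} CPU hours per file")
```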
ATLAS DC1 Phase II (2)
• Next steps will be to
  • run the reconstruction within the Athena framework
    • Most functionality should be there with release 5.0.0
    • Probably not ready for ‘massive’ production
    • Reconstruction ready by the end of November
  • produce the “byte-stream” data
  • perform the analysis of the AOD
• In parallel the dedicated code for HLT studies is being prepared (PESA release 3.0.0)
• Geant4 tests with a quite complete geometry should be available by mid-December
• Large-scale Grid test is scheduled for December
• Expected “end” date: 31 January 2003
• “Massive” reconstruction is not part of DC1 Phase II
ATLAS DC1 Phase II (3)
• Compared to Phase I
  • More automated production
  • “Pro-active” use of the AMI bookkeeping database to prepare the jobs and possibly to monitor the production
  • “Pro-active” use of the “magda” replica catalog
• We intend to run the “pile-up” production as much as possible where the data is
  • But we already have newcomers (countries and institutes)
• We do not intend to send all the pile-up data to CERN
• Scenarios to access the data for reconstruction and analysis are being studied
  • Use of Grid tools is ‘seriously’ considered
ATLAS DC1/Phase II: October 2002-January 2003
• Goals:
  • Produce the data needed for the HLT TDR
  • Get as many ATLAS institutes involved as possible
• Worldwide collaborative activity
• Participation: 43 institutes
  • Australia, Austria, Canada, CERN, China, Czech Republic, Denmark, France, Germany, Greece, Israel, Italy, Japan, Norway, Russia, Spain, Sweden, Taiwan, UK, USA
ATLAS Planning for Grid Activities
• Advantages of using the Grid:
  • Possibility to do worldwide production in a perfectly coordinated way, using identical software, scripts and databases.
  • Possibility to distribute the workload adequately and automatically, without logging in explicitly to each remote system.
  • Possibility to execute tasks and move files over a distributed computing infrastructure using a single personal certificate (no need to memorize dozens of passwords).
• Where we are now:
  • Several Grid toolkits are on the market.
  • EDG is probably the most elaborate, but still in development.
  • This development goes much faster with the help of users running real applications.
Present Grid Activities
• ATLAS already used Grid test-beds in DC1/1
• 11 out of 39 sites (~5% of the total production) used Grid middleware:
  • NorduGrid (Bergen, Grendel, Ingvar, ISV, NBI, Oslo, Lund, LSCF)
    • all production done on the Grid
  • US Grid test-bed (Arlington, LBNL, Oklahoma; more sites will join in the next phase)
    • used for ~10% of US DC1 production (10% = 900 CPU days)
.... in addition
• ATLAS-EDG task force
  • with 40 members from ATLAS and EDG (led by Oxana Smirnova)
  • used the EU-DataGrid middleware to rerun 350 DC1 jobs at some Tier1 prototype sites: CERN, CNAF, Lyon, RAL, NIKHEF and Karlsruhe (CrossGrid site)
    • done in the first half of September
• Good results have been achieved:
  • A team of hard-working people across Europe
  • ATLAS software is packed into relocatable RPMs, distributed and validated
  • DC1 production script is “gridified”, submission script is produced
  • Jobs are run at a site chosen by the resource broker
• Work is still needed (in progress) to reach sufficient stability and ease of use
• ATLAS-EDG continues until the end of 2002; an interim report with recommendations is being drafted
Grid in ATLAS DC1/1
[Figure: US-ATLAS, EDG Testbed Prod, NorduGrid]
Plans for the near future
• In preparation for the reconstruction phase (spring 2003) we foresee further Grid tests in November
  • Perform more extensive Grid tests.
  • Extend the EDG to more ATLAS sites, not only in Europe.
  • Test a basic implementation of a worldwide Grid.
  • Test the inter-operability between the different Grid flavors.
    • Inter-operation = submit a job in region A; the job is run in region B if the input data are in B; the produced data are stored; the job log is made available to the submitter.
• The EU project DataTag has a Work Package devoted specifically to interoperation, in collaboration with the US iVDGL project: the results of the work of these projects are expected to be taken up by LCG (GLUE framework).
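The definition of inter-operation given on this slide can be written down as a toy routing rule. The sketch below is not based on any Grid middleware API: the `Job` and `JobRecord` classes, the `run_interoperable` function and the dataset catalogue are hypothetical, and storing the output in the region where the job ran is an assumption, since the slide only says that the produced data are stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    submitted_from: str   # region A in the slide's wording
    input_dataset: str


@dataclass(frozen=True)
class JobRecord:
    job_name: str
    executed_in: str
    output_stored_in: str
    log_returned_to: str


def run_interoperable(job, dataset_locations):
    """Route a job to the region holding its input data.

    dataset_locations maps a dataset name to the region where it resides.
    If the dataset is not catalogued, the job stays in its home region.
    """
    executed_in = dataset_locations.get(job.input_dataset, job.submitted_from)
    return JobRecord(
        job_name=job.name,
        executed_in=executed_in,
        output_stored_in=executed_in,        # assumption: output stays where it is produced
        log_returned_to=job.submitted_from,  # the log goes back to the submitter
    )


if __name__ == "__main__":
    catalogue = {"dijet.simul": "B"}         # hypothetical dataset name
    record = run_interoperable(Job("reco-001", "A", "dijet.simul"), catalogue)
    print(record)
```

The design choice illustrated here is to move the job to the data rather than the data to the job, while the submitter still receives the log.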
Plans for the near future (continued)
• ATLAS is collaborating with DataTag-iVDGL for interoperability demonstrations in November
  • How far we can go will become clear in the coming week(s), when we discuss with the technical experts.
• The DC1 data will be reconstructed (using ATHENA) in early 2003: the scope and way of using Grids for distributed reconstruction will depend on the results of the November/December tests.
• ATLAS is fully committed to LCG and to its Grid middleware selection process
  • Our “early tester” role has been recognized as very useful for EDG.
  • We are confident that it will be the same for LCG.
Long Term Planning
• Worldwide Grid tests are essential to define in detail the ATLAS distributed Computing Model.
• ATLAS members are already involved in various Grid activities and also take part in inter-operability tests. In the forthcoming DCs this will become an important issue.
• All these tests will be done in close collaboration with the LCG and the different Grid projects.
DC2-3-4-…
• DC2: Q3/2003 – Q2/2004
  • Goals
    • Full deployment of EDM & Detector Description
    • Geant4 replacing Geant3 (fully?)
    • Pile-up in Athena
    • Test the calibration and alignment procedures
    • Use LCG common software (POOL; …)
    • Wide use of GRID middleware
    • Perform large-scale physics analysis
    • Further tests of the computing model
  • Scale
    • As for DC1: ~10^7 fully simulated events
• DC3: Q3/2004 – Q2/2005
  • Goals to be defined; scale: 5 x DC2
• DC4: Q3/2005 – Q2/2006
  • Goals to be defined; scale: 2 x DC3
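The scale factors quoted for the later Data Challenges imply the following approximate event counts. The sketch simply chains the multipliers from the slide, assuming that "scale" refers to the number of fully simulated events; the variable names are illustrative.

```python
# Scale of DC2 as quoted on the slide (same as DC1).
DC2_EVENTS = 10**7

# Multipliers quoted on the slide; "scale" is assumed to mean simulated events.
DC3_EVENTS = 5 * DC2_EVENTS
DC4_EVENTS = 2 * DC3_EVENTS

for name, events in (("DC2", DC2_EVENTS), ("DC3", DC3_EVENTS), ("DC4", DC4_EVENTS)):
    print(f"{name}: ~{events:.0e} fully simulated events")
```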
Summary (1)
• We learnt a lot from DC0 and the preparation of DC1
• The involvement of all people concerned is a success
• The full production chain has been put in place
• The validation phase was “intensive” and “stressing”, but it is a “key issue” in the process
• We have in hand the simulated events required for the HLT TDR
• Use of Grid tools looks very promising
Summary (2)
• For DC1/Phase II:
  • Pile-up preparation is in good shape
  • The introduction of the new EDM is a challenge by itself
  • Release 5 (November 12) should provide the requested functionality
  • Grid tests are scheduled for November/December
  • Geant4 tests should be ready by mid-December
Summary (3)
• After DC1
  • New Grid tests are foreseen in 2003
  • ATLAS is fully committed to LCG
    • As soon as LCG-1 is ready (June 2003) we intend to participate actively in the validation effort
  • Dates for the next DCs should be aligned with the deployment of the LCG and Grid software and middleware
Summary (4): thanks to all DC-team members
• A-WP1: Event generation
• A-WP2: Geant3 simulation
• A-WP3: Geant4 simulation
• A-WP4: Pile-up
• A-WP5: Detector response
• A-WP6: Data conversion
• A-WP7: Event filtering
• A-WP8: Reconstruction
• A-WP9: Analysis
• A-WP10: Data management
• A-WP11: Tools
• A-WP12: Teams (Production, Validation, ….)
• A-WP13: Tier Centres
• A-WP14: Fast simulation