ATLAS Data Challenges
LCG - PEB meeting, CERN, December 12th 2001
Gilbert Poulard, CERN EP-ATC
Outlook
• ATLAS Data Challenges
• Some considerations
ATLAS Data Challenges
• Goal: understand and validate
  • our computing model, our data model and our software
  • our technology choices
• How? By iterating on a set of DCs of increasing complexity
  • Ideally:
    • Start with data which look like real data
    • Run the filtering and reconstruction chain
    • Store the output data in our database
    • Run the analysis
    • Produce physics results
  • To study performance issues, database technologies, analysis scenarios, ...
  • To identify weaknesses, bottlenecks, etc. (but also good points)
• But we need to produce the 'data' and satisfy 'some' communities
  • Simulation will be part of DC0 & DC1
  • Data needed by the HLT community
ATLAS Data Challenges: DC0
• Three 'original' paths involving databases:
  • Generator → Geant3 (Zebra → Objy) → Athena reconstruction → simple analysis
    • This is the "primary" chain (100,000 events)
    • Purpose: this is the principal continuity test
  • Atlfast chain: Generator → Atlfast → simple analysis
    • Demonstrated for Lund, but (transient) software is changing
    • Purpose: continuity test
  • Physics TDR data (Zebra → Objy) → Athena reconstruction → simple analysis
    • Purpose: Objy test?
• Additional path:
  • Generator → Geant4 (→ Objy)
    • Purpose: robustness test (100,000 events)
ATLAS Data Challenges: DC0
• Originally: November-December 2001
  • 'Continuity' test through the software chain
  • Aim is primarily to check the state of readiness for DC1
  • We plan ~100k Z+jet events, or similar
  • Software works; issues to be checked include:
    • G3 simulation running with the 'latest' version of the geometry
    • Reconstruction running
    • Data must be written/read to/from the database
• Now:
  • Before Xmas: ~30k events (full simulation) + ~30k events (conversion); G4 robustness test (~100k events)
  • Early January: repeat the exercise with a new release (full chain)
  • DC0: end January; statistics to be defined (~100k events)
ATLAS Data Challenges: DC1
• DC1: February-July 2002
• Reconstruction & analysis on a large scale
  • Learn about the data model; I/O performance; identify bottlenecks ...
  • Use of GRID as and when possible and appropriate
• Data management
  • Use (evaluate) more than one database technology (Objectivity and ROOT I/O)
  • Relative importance under discussion
• Learn about distributed analysis
  • Should involve CERN & outside-CERN sites
  • Site planning is going on; an incomplete list already includes sites from Canada, France, Italy, Japan, UK, US, Russia
• Scale: 10^7 events in 10-20 days, O(1000) PCs (see the back-of-envelope check below)
• Data needed by HLT & physics groups (others?)
  • Simulation & pile-up will play an important role
  • Shortcuts may be needed (especially for HLT)!
• Checking of Geant4 versus Geant3
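A quick plausibility check of that scale (an editorial sketch, not from the slides): taking the quoted 10^7 events and O(1000) PCs, and assuming the midpoint of the 10-20 day window, the implied per-event budget is consistent with the full-simulation times quoted later for DC0.

    # Back-of-envelope check of the DC1 scale (inputs from the slide,
    # except the 15-day midpoint, which is an assumption).
    n_events = 10_000_000          # 10^7 events
    n_cpus = 1000                  # O(1000) PCs
    wall_seconds = 15 * 24 * 3600  # ~15 days
    seconds_per_event = wall_seconds / (n_events / n_cpus)
    print(f"~{seconds_per_event:.0f} s/event per CPU")  # ~130 s/event
    # ~130 s/event on a 2001-era PC is roughly consistent with the ~60 s
    # per-event G4 simulation time quoted on the DC0 slides, once pile-up,
    # digitisation and reconstruction are added on top.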
ATLAS Data Challenges: DC1
• DC1 will have two distinct phases:
  • First, production of events for the HLT TDR, where the primary concern is delivery of events to the HLT community
  • Second, testing of software (G4, databases, detector description, etc.) with delivery of events for physics studies
• Software will change between these two phases
• Simulation & pile-up will be of great importance
  • Strategy to be defined (I/O rate, number of "event" servers?); a sketch of the mixing idea follows
• As we want to do it 'world-wide' we will 'port' our software to the GRID environment and use the GRID middleware as much as possible (ATLAS kit to be prepared)
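To make the pile-up step concrete: pile-up means overlaying each 'signal' event with minimum-bias events drawn at the nominal interaction rate. A minimal sketch, assuming a Poisson-distributed number of overlaid events; the function names, the mean mu, and the flat event representation are illustrative, not the ATLAS implementation.

    import math, random

    def poisson(mu):
        # Knuth's algorithm, to avoid external dependencies.
        L, k, p = math.exp(-mu), 0, 1.0
        while p > L:
            k += 1
            p *= random.random()
        return k - 1

    def pileup_mix(signal_events, minbias_pool, mu=23):
        """Overlay each signal event with N min-bias events, N ~ Poisson(mu).

        mu ~ 23 would correspond to design-luminosity LHC running; the
        value actually used in DC1 is not stated on these slides.
        """
        for signal in signal_events:
            overlay = [random.choice(minbias_pool) for _ in range(poisson(mu))]
            # A real implementation merges hits/digits detector by detector;
            # here an "event" is just a flat list of hits.
            yield signal + [hit for ev in overlay for hit in ev]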
ATLAS Data Challenges: DC2
• DC2: Spring-Autumn 2003
• Scope will depend on what has and has not been achieved in DC0 & DC1
• At this stage the goals include:
  • Use of the 'TestBed' which will be built in the context of Phase 1 of the "LHC Computing Grid Project"
  • Scale: a sample of 10^8 events
  • System at a complexity X% of the 2006-2007 system
  • Extensive use of the GRID middleware
  • Geant4 should play a major role
  • Physics samples could (should) have 'hidden' new physics
  • Calibration and alignment procedures should be tested
• May have to be synchronized with "Grid" developments
DC scenario
• Production chain:
  • Event generation
  • Detector simulation
  • Pile-up
  • Detector response
  • Reconstruction
  • Analysis
• These steps should be as independent as possible (see the sketch below)
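One reading of 'as independent as possible' is that each stage is a standalone job that talks to its neighbours only through persistent files, so any stage can be rerun or swapped without touching the others. A hedged sketch with invented stage and file names:

    # Sketch: every stage reads from and writes to persistent files,
    # so the stages stay decoupled. Stage and file names are invented.
    STAGES = [
        ("generate",    "gen.events"),    # event generation
        ("simulate",    "sim.hits"),      # detector simulation (G3/G4)
        ("pileup",      "pileup.hits"),   # overlay minimum-bias events
        ("digitize",    "digits.raw"),    # detector response
        ("reconstruct", "reco.out"),      # reconstruction
        ("analyze",     "analysis.ntup"), # analysis
    ]

    def run_chain(run_number, start_stage="generate"):
        prev_output, started = "job options", False
        for stage, output in STAGES:
            started = started or stage == start_stage
            if started:
                # In real production this would submit a batch/Grid job.
                print(f"run {run_number}: {stage}: {prev_output} -> {output}")
            prev_output = output

    run_chain(1)                          # full chain
    run_chain(1, start_stage="pileup")    # redo from pile-up onwards only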
Production stream for DC0-1
• "OO-db" is used for "OO database"; it could be Objectivity, ROOT I/O, ...
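Keeping the database technology open (Objectivity vs. ROOT I/O) suggests coding against an abstract persistence interface. A minimal sketch of that idea; all class and method names are invented, and the in-memory dictionaries merely stand in for real I/O:

    # Sketch of a persistence abstraction; names are invented, and the
    # dict "storage" only stands in for real Objectivity / ROOT I/O.
    class OODatabase:
        def write(self, key, event): raise NotImplementedError
        def read(self, key): raise NotImplementedError

    class RootIODatabase(OODatabase):
        """Would wrap ROOT TFile/TTree persistency."""
        def __init__(self):
            self._store = {}
        def write(self, key, event): self._store[key] = event
        def read(self, key): return self._store[key]

    class ObjectivityDatabase(OODatabase):
        """Would wrap an Objectivity/DB federation."""
        def __init__(self):
            self._store = {}
        def write(self, key, event): self._store[key] = event
        def read(self, key): return self._store[key]

    def store_events(db, events):
        # Production code sees only the OODatabase interface, so the
        # technology can be chosen (or changed) per Data Challenge.
        for i, ev in enumerate(events):
            db.write(i, ev)

    store_events(RootIODatabase(), ["ev0", "ev1"])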
DC0
[Flow diagram: Pythia/Isajet/Herwig → HepMC (Obj., Root) → ATLFAST (OO) → Ntuple; GENZ (ZEBRA, Phys. TDR data) → G3/DICE → ATHENA reconstruction → Comb. Ntuple (Obj., Root); "RD event? OO-DB?" left open; DC0 output is Ntuple-like.]
Missing:
• filter, trigger
• HepMC in Root
• ATLFAST output in Root (TObjects)
• Link MC truth - ATLFAST
• Reconstruction output in Obj., Root
• EDM (e.g. G3/DICE input to ATHENA)
DC1
[Flow diagram: Pythia/Isajet/Herwig/MyGeneratorModule → HepMC (Obj., Root) → ATLFAST (OO) → Ntuple; GENZ (ZEBRA) → G3/DICE, plus G4 (Obj.) → ATHENA reconstruction → Comb. Ntuple (Obj., Root); "RD event? OO-DB?" left open; Ntuple-like output.]
Missing:
• filter, trigger
• Detector description
• HepMC in Root
• Digitisation
• ATLFAST output in Root (TObjects)
• Pile-up
• Link MC truth - ATLFAST
• Reconstruction output in Obj., Root
• EDM (e.g. G3/DICE, G4 input to ATHENA)
DC0 G4 Robustness Test
• Test plan: two kinds of tests
  • A 'large-N' generation with the ATLAS detector geometry
    • Detailed geometry for the muon system (input from AMDB)
    • A crude geometry for the Inner Detector and Calorimeter
  • A 'large-N' generation with a test-beam geometry
    • TileCal test beam for electromagnetic interactions
• Physics processes
  • Higgs -> 4 muons (by Pythia) <-- main target
  • Minimum-bias events <-- if possible
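A robustness test of this kind amounts to 'run very many events and keep going when one fails'. A hedged sketch of such a driver loop; simulate_event is a stand-in, not the actual G4 test harness:

    import traceback

    def robustness_run(simulate_event, n_events=100_000):
        """Run a large-N generation, recording failures instead of dying."""
        failures = []
        for i in range(n_events):
            try:
                simulate_event(i)   # one full G4 event (stand-in function)
            except Exception:
                failures.append((i, traceback.format_exc()))
        print(f"{len(failures)} failures in {n_events} events "
              f"({100.0 * len(failures) / n_events:.3f}%)")
        return failures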
DC0 G4 Robustness Test
• Expected data size and CPU required (only for the ATLAS detector geometry):

                                   per event   1,000 events
    4-vectors database             ~ 50 KB     ~ 50 MB
    Hits/hit-collections database  ~ 1.5 MB    ~ 1.5 GB
    CPU (Pentium III, 800 MHz)     ~ 60 sec    ~ 17 hours

  [Note] Not the final numbers; they include a safety factor to reserve extra disk space.
• Required resources (only for the ATLAS detector geometry):
  • PC farm: ~10 CPUs (5 machines with dual processors)
  • Disk space: ~155 GB
  • Process period: ~1 week
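The per-event figures scale consistently to the quoted resource totals, taking the ~100k-event sample size from the earlier DC0 slide:

    n_events = 100_000                      # G4 robustness sample (earlier slide)
    disk_gb = n_events * (50e-6 + 1.5e-3)   # 50 KB 4-vectors + 1.5 MB hits, in GB
    cpu_days = n_events * 60 / 86400        # ~60 s/event on an 800 MHz PIII
    print(f"disk ~ {disk_gb:.0f} GB")                       # ~155 GB, as quoted
    print(f"wall ~ {cpu_days / 10:.1f} days on 10 CPUs")    # ~7 days ~ 1 week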
Data management
• It is a key issue
• Evaluation of more than one technology is part of DC1
• Infrastructure has to be put in place:
  • For Objectivity & ROOT I/O
  • Software, hardware, tools to manage the data
    • creation, replication, distribution, ...
• Tools are needed to run the production
  • "bookkeeping", "cataloguing", "job submission", ...
• We intend to use GRID tools as much as possible
  • Magda for DC0
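As an illustration of the bookkeeping/cataloguing layer (not Magda itself, whose interface is not described here): a minimal replica catalogue mapping each logical dataset to production metadata and physical replicas. The schema, dataset names, and URLs are invented:

    # Minimal replica-catalogue sketch; schema and names are invented.
    import sqlite3

    db = sqlite3.connect("dc_catalogue.db")
    db.executescript("""
        CREATE TABLE IF NOT EXISTS datasets (
            logical_name TEXT PRIMARY KEY,
            n_events     INTEGER,
            sw_release   TEXT        -- release used to produce it
        );
        CREATE TABLE IF NOT EXISTS replicas (
            logical_name TEXT REFERENCES datasets(logical_name),
            site         TEXT,       -- e.g. 'CERN'
            physical_url TEXT
        );
    """)

    db.execute("INSERT OR REPLACE INTO datasets VALUES (?,?,?)",
               ("dc0.zjet.simul", 100_000, "release-X"))   # hypothetical entry
    db.execute("INSERT INTO replicas VALUES (?,?,?)",
               ("dc0.zjet.simul", "CERN", "castor:/atlas/dc0/zjet"))
    db.commit()

    # "Where can I read this dataset?"
    for site, url in db.execute(
            "SELECT site, physical_url FROM replicas WHERE logical_name=?",
            ("dc0.zjet.simul",)):
        print(site, url)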
DC1-HLT - CPU
• Based on experience from the Physics TDR
DC1-HLT - data
DC1-HLT data with pile-up
• In addition to 'simulated' data, assuming 'filtering' after simulation (~14% of the events kept):
  • (1) keeping only 'digits'
  • (2) keeping 'digits' and 'hits'
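An editorial note on what the ~14% filter efficiency implies for the simulation load (the one-million target below is illustrative, not a number from the slides):

    filter_eff = 0.14                 # fraction of events kept (from the slide)
    n_wanted = 1_000_000              # filtered events desired (illustrative)
    n_to_sim = n_wanted / filter_eff
    print(f"simulate ~{n_to_sim:,.0f} events to keep {n_wanted:,}")
    # ~7.1 million simulated events per million kept: the filter drives the
    # CPU budget even though most of its output is thrown away.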
Ramp-up scenario @ CERN
[Chart: planned resource ramp-up at CERN versus week in 2002; numbers not reproduced here.]
Some considerations (1):
• We consider that LCG is crucial for our success
• We agree to have as many common projects as possible under the control of the project
• We think that high priority should be given to the development of the shared Tier0 & shared Tier1 centers
• We are interested in "cross-grid" projects
  • Obviously to avoid duplication of work
• We consider the interoperability between the US and EU Grids as very important (Magda as a first use case)
Some considerations (2):
• We would like to set up a truly distributed production system (simulation, reconstruction, analysis) making use, already for DC1, of the GRID tools (especially those of EU-DataGrid Release 1)
• The organization of the operation of the infrastructure should be defined and put in place
• We need a 'stable' environment during the data challenges, and a clear picture of the available resources, as soon as possible
Some considerations (3):
• We consider that the discussion on a common persistence technology should start as soon as possible under the umbrella of the project
• We think that other common items (e.g. dictionary languages, release tools, etc.) are worthwhile (not with the same priority), but we must ask what is desirable and what is necessary
• We think that the plan for the simulation should be understood