
ATLAS Data Challenges


Presentation Transcript


  1. ATLAS Data Challenges
     LCG-PEB meeting, CERN, December 12th 2001
     Gilbert Poulard, CERN EP-ATC

  2. Outlook
     • ATLAS Data Challenges
     • Some considerations

  3. ATLAS Data Challenges
     • Goal: understand and validate
       • our computing model, our data model and our software
       • our technology choices
     • How? By iterating on a set of DCs of increasing complexity
       • Ideally: start with data which look like real data
       • Run the filtering and reconstruction chain
       • Store the output data in our database
       • Run the analysis
       • Produce physics results
       • To study performance issues, database technologies, analysis scenarios, ...
       • To identify weaknesses, bottlenecks, etc. (but also strong points)
     • But we need to produce the ‘data’ and satisfy ‘some’ communities
       • Simulation will be part of DC0 & DC1
       • Data needed by the HLT community

  4. ATLAS Data Challenges: DC0
     • Three ‘original’ paths involving databases:
       • Generator → Geant3 → (Zebra → Objy) → Athena reconstruction → simple analysis
         • This is the “primary” chain (100,000 events)
         • Purpose: this is the principal continuity test
       • Atlfast chain: Generator → Atlfast → simple analysis
         • Demonstrated for Lund, but the (transient) software is changing
         • Purpose: continuity test
       • Physics TDR → (Zebra → Objy) → Athena reconstruction → simple analysis
         • Purpose: Objy test?
     • Additional path:
       • Generator → Geant4 → (Objy)
         • Purpose: robustness test (100,000 events)

  5. ATLAS Data Challenges: DC0
     • Originally: November-December 2001
       • ‘Continuity’ test through the software chain
       • Aim is primarily to check the state of readiness for DC1
       • We plan ~100k Z+jet events, or similar
       • Software works; issues to be checked include:
         • G3 simulation running with the ‘latest’ version of the geometry
         • Reconstruction running
         • Data must be written to / read from the database
     • Now:
       • Before Xmas
         • ~30k events (full simulation) + ~30k events (conversion)
         • G4 robustness test (~100k events)
       • Early January
         • Repeat the exercise with a new release (full chain)
       • DC0: end of January
         • Statistics to be defined (~100k events)

  6. ATLAS Data Challenges: DC1
     • DC1: February-July 2002
     • Reconstruction & analysis on a large scale
       • Learn about the data model; I/O performance; identify bottlenecks ...
       • Use of GRID as and when possible and appropriate
     • Data management
       • Use (evaluate) more than one database technology (Objectivity and ROOT I/O)
       • Relative importance under discussion
     • Learn about distributed analysis
       • Should involve CERN & outside-CERN sites
       • Site planning is going on; an incomplete list already includes sites from Canada, France, Italy, Japan, UK, US, Russia
     • Scale: 10^7 events in 10-20 days, O(1000) PCs (a rough throughput check is sketched below)
     • Data needed by HLT & Physics groups (others?)
       • Simulation & pile-up will play an important role
       • Shortcuts may be needed (especially for HLT)!
     • Checking of Geant4 versus Geant3
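As a rough sanity check of that scale (a back-of-envelope sketch only; nothing is taken from the slides beyond the 10^7 events, 10-20 days and O(1000) PCs figures), the implied per-event time budget works out as follows:

```python
# Back-of-envelope check of the DC1 scale quoted above: 10**7 events,
# 10-20 days of running, O(1000) PCs (treated here as ~1000 single CPUs).

N_EVENTS = 10**7
N_CPUS = 1000

for days in (10, 20):
    wall_seconds = days * 86400
    events_per_cpu = N_EVENTS / N_CPUS
    sec_per_event = wall_seconds / events_per_cpu
    print(f"{days} days -> {events_per_cpu:.0f} events/CPU, "
          f"~{sec_per_event:.0f} s/event budget")

# 10 days -> 10000 events/CPU, ~86 s/event budget
# 20 days -> 10000 events/CPU, ~173 s/event budget
# i.e. simulation + pile-up + reconstruction must average O(100) s/event per CPU.
```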

  7. ATLAS Data Challenges: DC1
     • DC1 will have two distinct phases
       • First, production of events for the HLT TDR, where the primary concern is delivery of events to the HLT community
       • Second, testing of software (G4, dBases, detector description, etc.) with delivery of events for physics studies
       • Software will change between these two phases
     • Simulation & pile-up will be of great importance
       • Strategy to be defined (I/O rate, number of “event” servers?)
     • As we want to do it ‘world-wide’, we will ‘port’ our software to the GRID environment and use the GRID middleware as much as possible (ATLAS kit to be prepared)

  8. ATLAS Data Challenges: DC2
     • DC2: Spring-Autumn 2003
     • Scope will depend on what has and has not been achieved in DC0 & DC1
     • At this stage the goal includes:
       • Use of the ‘TestBed’ which will be built in the context of Phase 1 of the “LHC Computing Grid Project”
       • Scale: a sample of 10^8 events
       • System at a complexity X% of the 2006-2007 system
       • Extensive use of the GRID middleware
       • Geant4 should play a major role
       • Physics samples could (should) have ‘hidden’ new physics
       • Calibration and alignment procedures should be tested
     • May need to be synchronized with “Grid” developments

  9. DC scenario
     • Production chain:
       • Event generation
       • Detector simulation
       • Pile-up
       • Detector response
       • Reconstruction
       • Analysis
     • These steps should be as independent as possible (see the sketch after this slide)
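To make the decoupling concrete, here is a minimal sketch (not ATLAS software; all function and dataset names are hypothetical) of the chain as independent stages that exchange only persistent datasets, so any one step can be re-run, replaced or distributed on its own:

```python
# Illustrative sketch only: the DC production chain as independent stages that
# communicate solely through persistent datasets. All names are hypothetical.

from typing import Callable, List

# Each stage maps an input dataset label to an output dataset label.
Stage = Callable[[str], str]

def make_stage(name: str) -> Stage:
    def run(input_dataset: str) -> str:
        output_dataset = f"{name}({input_dataset})"
        print(f"{name:>19}: {input_dataset} -> {output_dataset}")
        return output_dataset
    return run

# The six steps listed on the slide, in order.
CHAIN: List[Stage] = [
    make_stage("event_generation"),
    make_stage("detector_simulation"),
    make_stage("pile_up"),
    make_stage("detector_response"),
    make_stage("reconstruction"),
    make_stage("analysis"),
]

def run_chain(initial_dataset: str, stages: List[Stage]) -> str:
    dataset = initial_dataset
    for stage in stages:
        dataset = stage(dataset)   # only the dataset label crosses stage boundaries
    return dataset

if __name__ == "__main__":
    run_chain("generator_cards", CHAIN)
```

The point of keeping the steps independent is that each one reads and writes only persistent data, so a stage such as simulation can be produced once and reused, while later stages can be repeated or swapped without redoing the ones before them.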

  10. Production stream for DC0-1
      [Diagram: DC0-1 production stream]
      “OO-db” stands for “OO database”; it could be Objectivity, ROOT I/O, ...

  11. DC0
      [Diagram: DC0 production stream, with generators (Pythia, Isajet, Herwig), HepMC (Obj., Root), ATLFAST OO, Ntuple, Comb. Ntuple (Obj., Root), GENZ / ZEBRA (Phys. TDR data), G3/DICE, ATHENA reconstruction, RD event? OO-DB?, DC0 Ntuple-like output]
      Missing:
      -- filter, trigger
      -- HepMC in Root
      -- ATLFAST output in Root (TObjects)
      -- Link MC truth - ATLFAST
      -- Reconstruction output in Obj., Root
      -- EDM (e.g. G3/DICE input to ATHENA)

  12. DC1
      [Diagram: DC1 production stream, with generators (Pythia, Isajet, Herwig, MyGeneratorModule), HepMC (Obj., Root), ATLFAST OO, Ntuple / Ntuple-like, Comb. Ntuple (Obj., Root), GENZ / ZEBRA, G3/DICE (Obj.), G4, ATHENA reconstruction, RD event? OO-DB?]
      Missing:
      -- filter, trigger
      -- Detector description
      -- HepMC in Root
      -- Digitisation
      -- ATLFAST output in Root (TObjects)
      -- Pile-up
      -- Link MC truth - ATLFAST
      -- Reconstruction output in Obj., Root
      -- EDM (e.g. G3/DICE, G4 input to ATHENA)

  13. DC0 G4 Robustness Test
      • Test plan: two kinds of tests
        • A ‘large-N’ generation with the ATLAS detector geometry
          • Detailed geometry for the muon system (input from AMDB)
          • A crude geometry for the Inner Detector and Calorimeter
        • A ‘large-N’ generation with a test-beam geometry
          • TileCal - test beam for electromagnetic interactions
      • Physics processes
        • Higgs -> 4 muons (by Pythia)  <---- main target
        • Minimum bias events           <---- if possible

  14. DC0 G4 Robustness Test
      • Expected data size and CPU required (ATLAS detector geometry only):
                                          per event     1,000 events
        4-vectors database                ~ 50 KB       ~ 50 MB
        Hits / hit-collections database   ~ 1.5 MB      ~ 1.5 GB
        CPU (Pentium III, 800 MHz)        ~ 60 sec      ~ 17 hours
        [Note] Not the final numbers; they include a safety factor to reserve extra disk space.
      • Required resources (ATLAS detector geometry only):
        • PC farm: ~10 CPUs (5 machines with dual processors)
        • Disk space: ~155 GB
        • Process period: ~1 week (see the consistency check sketched below)
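These figures are internally consistent; a minimal arithmetic check, assuming the ~100k-event robustness sample quoted on slide 5 and taking everything else from the table above:

```python
# Cross-check of the DC0 G4 robustness-test estimates, using only the per-event
# numbers quoted on the slide; the 100k total is the figure quoted on slide 5.

N_EVENTS   = 100_000          # ~100k events (slide 5)
KB_4VEC    = 50               # ~50 KB/event, 4-vectors database
MB_HITS    = 1.5              # ~1.5 MB/event, hits / hit-collections database
SEC_PER_EV = 60               # ~60 s/event on a Pentium III 800 MHz
N_CPUS     = 10               # PC farm of ~10 CPUs

disk_gb  = N_EVENTS * (KB_4VEC / 1e6 + MB_HITS / 1e3)      # ~155 GB
cpu_days = N_EVENTS * SEC_PER_EV / N_CPUS / 86400          # ~6.9 days

print(f"disk ~ {disk_gb:.0f} GB  (slide: ~155 GB)")
print(f"time ~ {cpu_days:.1f} days on {N_CPUS} CPUs  (slide: ~1 week)")
```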

  15. Data management
      • It is a key issue
      • Evaluation of more than one technology is part of DC1
      • Infrastructure has to be put in place:
        • For Objectivity & ROOT I/O
        • Software, hardware and tools to manage the data
          • creation, replication, distribution, ...
      • Tools are needed to run the production
        • “bookkeeping”, “cataloguing”, “job submission”, ... (an illustrative catalogue record is sketched below)
      • We intend to use GRID tools as much as possible
        • Magda for DC0
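For illustration only, a minimal sketch of the kind of record such bookkeeping and cataloguing tools have to keep. This is not Magda's schema or API; every class, field, site name and path below is hypothetical.

```python
# Illustrative bookkeeping / replica catalogue for data-challenge files.
# NOT Magda or any real ATLAS tool: all names, sites and paths are made up.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FileRecord:
    logical_name: str                  # site-independent name used by jobs
    dataset: str                       # production step / physics sample
    events: int
    size_bytes: int
    replicas: Dict[str, str] = field(default_factory=dict)  # site -> physical path

class Catalogue:
    def __init__(self) -> None:
        self._records: Dict[str, FileRecord] = {}

    def register(self, record: FileRecord) -> None:
        self._records[record.logical_name] = record

    def add_replica(self, logical_name: str, site: str, physical_path: str) -> None:
        self._records[logical_name].replicas[site] = physical_path

    def locate(self, logical_name: str) -> List[str]:
        return list(self._records[logical_name].replicas.values())

# Example: register a simulated file produced at one site, then record a second replica.
cat = Catalogue()
cat.register(FileRecord("dc0.zjet.simul.0001", "dc0_zjet_full_sim", 500, 750_000_000))
cat.add_replica("dc0.zjet.simul.0001", "SITE_A", "/data/atlas/dc0/zjet/simul.0001")
cat.add_replica("dc0.zjet.simul.0001", "CERN", "/data/atlas/dc0/zjet/simul.0001")
print(cat.locate("dc0.zjet.simul.0001"))
```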

  16. DC1-HLT - CPU
      [Table: CPU requirements, based on experience from the Physics TDR]

  17. DC1-HLT - data
      [Table: DC1-HLT data estimates]

  18. DC1-HLT data with pile-up
      • In addition to ‘simulated’ data, assuming ‘filtering’ after simulation (~14% of the events kept):
        • (1) keeping only ‘digits’
        • (2) keeping ‘digits’ and ‘hits’
      (A small illustration of the effect of the filter fraction is sketched below.)
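As a small illustration, and assuming (this is not stated on the slide) that the ~14% filter is applied to the full 10^7-event DC1 sample from slide 6, the event count entering the pile-up and storage step would be:

```python
# Simple illustration of the filter fraction quoted above, applied to the DC1
# scale of 10**7 simulated events (slide 6). Per-event sizes for the two storage
# options are in the table that is not reproduced here, so only counts are computed.

N_SIMULATED = 10**7
FILTER_EFF  = 0.14              # ~14% of simulated events kept after filtering

n_kept = int(N_SIMULATED * FILTER_EFF)
print(f"events entering pile-up / storage: ~{n_kept:,}")   # ~1,400,000
# The total volume then depends on whether only 'digits' (option 1) or
# 'digits' + 'hits' (option 2) are kept for each of these events.
```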

  19. Ramp-up scenario @ CERN
      [Chart: planned ramp-up by week in 2002]

  20. Some considerations (1)
      • We consider that LCG is crucial for our success
      • We agree to have as many common projects as possible under the control of the project
      • We think that a high priority should be given to the development of the shared Tier0 & shared Tier1 centers
      • We are interested in “cross-grid” projects
        • Obviously, to avoid duplication of work
        • We consider the interoperability between the US and EU Grids as very important (Magda as a first use case)

  21. Some considerations (2)
      • We would like to set up a truly distributed production system (simulation, reconstruction, analysis) making use, already for DC1, of the GRID tools (especially those of EU-DataGrid Release 1)
      • The organization of the operation of the infrastructure should be defined and put in place
      • We need a ‘stable’ environment during the data challenges, and a clear picture of the available resources as soon as possible

  22. Some considerations (3)
      • We consider that the discussion on the common persistence technology should start as soon as possible under the umbrella of the project
      • We think that other common items (e.g. dictionary languages, release tools, etc.) are worthwhile (not with the same priority), but we must ask what is desirable and what is necessary
      • We think that the plan for the simulation should be understood
