ATLAS Data Challenges
ATLAS Software Workshop, CERN, September 20th 2001
Gilbert Poulard, CERN EP-ATC
From CERN Computing Review
CERN Computing Review (December 1999 - February 2001). Recommendations:
• organize the computing for the LHC era
  • LHC Grid project
    • Phase 1: development & prototyping (2001-2004)
    • Phase 2: installation of the 1st production system (2005-2007)
  • Software & Computing Committee (SC2)
  • proposal being submitted to the CERN Council
• ask the experiments to validate their computing model by iterating on a set of Data Challenges of increasing complexity
LHC Computing GRID project
• Phase 1:
  • prototype construction
    • develop Grid middleware
    • acquire experience with high-speed wide-area networks
    • develop a model for distributed analysis
    • adapt LHC applications
    • deploy a prototype (CERN + Tier1 + Tier2)
  • software
    • complete the development of the 1st version of the physics applications and enable them for the distributed Grid model
    • develop & support common libraries, tools & frameworks
      • including simulation, analysis, data management, ...
    • in parallel, the LHC collaborations must develop and deploy the first version of their core software
ATLAS Data Challenges
• Goal
  • understand and validate our computing model and our software
• How?
  • iterate on a set of DCs of increasing complexity
    • start with data which look like real data
  • run the filtering and reconstruction chain
  • store the output data in our database
  • run the analysis
  • produce physics results
  • study
    • performance issues, database technologies, analysis scenarios, ...
  • identify
    • weaknesses, bottlenecks, etc.
ATLAS Data Challenges
• But: today we don't have 'real data'
  • we need to produce 'simulated data' first
• So:
  • physics event generation
  • simulation
  • pile-up
  • detector response
• Plus: reconstruction and analysis will be part of the first Data Challenges
• We also need to 'satisfy' the ATLAS communities
  • HLT, physics groups, ...
ATLAS Data Challenges
• DC0: November-December 2001
  • 'continuity' test through the software chain
  • aims primarily to check the state of readiness for DC1
• DC1: February-July 2002
  • reconstruction & analysis on a large scale
    • learn about the data model; I/O performance; identify bottlenecks ...
  • data management
  • should involve CERN & outside-CERN sites
  • scale: 10⁷ events in 10-20 days, O(1000) PCs (a back-of-envelope check follows this slide)
  • data needed by HLT (others?)
    • simulation & pile-up will play an important role
  • checking of Geant4 versus Geant3
• DC2: January-September 2003
  • use the 'prototype' and Grid middleware
  • increased complexity
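As a back-of-envelope check of the DC1 scale quoted above, a short calculation; all inputs are the slide's own numbers, and the per-PC figure is derived, not an ATLAS measurement:

    # Scale check for DC1: 10^7 events in 10-20 days on O(1000) PCs.
    events = 1e7
    days = 10                              # optimistic end of the 10-20 day window
    pcs = 1000
    throughput = events / (days * 86400)   # events/s for the whole farm
    per_pc_budget = pcs / throughput       # seconds available per event per PC
    print(f"farm throughput: {throughput:.1f} events/s")
    print(f"per-PC budget  : {per_pc_budget:.0f} s/event")
    # -> about 12 events/s overall, i.e. roughly 90 s/event on each PC

So each PC has on the order of a minute and a half per event, which is the budget the full simulation + pile-up + reconstruction chain would have to fit.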
DC scenario
• Production chain (sketched below):
  • event generation
  • simulation
  • pile-up
  • detector response
  • reconstruction
  • analysis
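A minimal sketch of this chain as composable stages; the stage names are the slide's, but the function bodies are placeholders and assume nothing about the real ATLAS interfaces:

    # Toy production chain: each stage takes and returns a list of events.
    def generate(n_events):                # event generation
        return [{"id": i} for i in range(n_events)]

    def simulate(events):                  # simulation (tracking through detector)
        return [dict(e, hits=[]) for e in events]

    def pile_up(events):                   # pile-up (overlay minimum-bias events)
        return events

    def digitize(events):                  # detector response
        return [dict(e, digits=[]) for e in events]

    def reconstruct(events):               # reconstruction -> ESD/AOD/TAG
        return [dict(e, esd={}, aod={}, tag={}) for e in events]

    def run_chain(n_events):
        data = generate(n_events)
        for stage in (simulate, pile_up, digitize, reconstruct):
            data = stage(data)
        return data                        # ready for analysis

Keeping the stages independent is what would let a later DC swap a single stage (e.g. Geant3 for Geant4) without touching the rest of the chain.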
Production stream
[diagram not reproduced]
Event generation
• The type of events has to be defined
• Several event generators will probably be used
  • for each of them we have to define the version
    • in particular Pythia
      • should it be a special ATLAS one? (size of the common block)
  • we also have to ensure that it runs for large statistics
• Both the event types & the event generators have to be defined by
  • the HLT group (for HLT events)
  • the physics community
• Depending on the output we can use the following frameworks (sketched below)
  • ATGEN/GENZ for ZEBRA output format
  • Athena for output in the OO-db (HepMC)
  • we could also use only one framework and 'convert' the output from one format to the other (OO-db to ZEBRA or ZEBRA to OO-db), depending on the choice; I don't think this is realistic.
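A hypothetical sketch of the two output routes above: the same generated events written either as ZEBRA (the ATGEN/GENZ route) or as HepMC records in the OO-db (the Athena route). Every class and method name here is invented for illustration; this is not ATLAS code:

    class StubGenerator:
        """Stands in for Pythia; a fixed version and seed for reproducibility."""
        def __init__(self, seed):
            self.seed, self.count = seed, 0
        def next_event(self):
            self.count += 1
            return {"seed": self.seed, "number": self.count}

    class ZebraWriter:                     # ATGEN/GENZ route
        def write(self, event):
            print("ZEBRA banks for event", event["number"])

    class HepMCWriter:                     # Athena route, HepMC into OO-db
        def write(self, event):
            print("HepMC record for event", event["number"])

    def produce(generator, writer, n_events):
        for _ in range(n_events):
            writer.write(generator.next_event())

    produce(StubGenerator(seed=12345), ZebraWriter(), n_events=3)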
Simulation
• The goal here is to track the particles produced by the event generator through the detector
• We can use either Geant3 or Geant4 (both options are sketched below)
  • for HLT & physics studies we still rely on Geant3
  • I think that Geant4 should also be used
    • to get experience with 'large production' as part of its validation
  • it would be good to use the same geometry
    • 'same geometry' has to be defined; this is a question for the 'simulation' group
    • in the early stages we could decide to use only part of the detector
  • it would also be good to use the same sample of generated events
    • this also has to be defined by the 'simulation' group
• For Geant3 simulation we will use either the "Slug/Dice" framework or the "Atlsim" framework
  • in both cases the output will be ZEBRA ("hits" and "deposited energy" for the calorimeters)
• For Geant4 simulation I think that we will use the FADS/Goofy framework
  • output will be 'hit collections' in the OO-db
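To illustrate the 'same geometry, same sample' point, a sketch that feeds one generated sample and one geometry description to either engine; the framework and output names are taken from the slide, the code itself is invented:

    SIM_OPTIONS = {
        "geant3": {"framework": "Slug/Dice or Atlsim",
                   "output": "ZEBRA (hits, deposited energy)"},
        "geant4": {"framework": "FADS/Goofy",
                   "output": "hit collections in OO-db"},
    }

    def run_simulation(engine, sample, geometry="common DC geometry"):
        opt = SIM_OPTIONS[engine]
        # One sample and one geometry keep the Geant3/Geant4 results
        # directly comparable, which is the point of running both.
        print(f"{engine}: {len(sample)} events, {geometry}, "
              f"via {opt['framework']} -> {opt['output']}")

    events = [{"id": i} for i in range(100)]
    run_simulation("geant3", events)
    run_simulation("geant4", events)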
Pile-up & digitization • We have few possible scenarios • Work in “Slug/Dice” or “Atlsim” framework • input is ZEBRA • output is ZEBRA • advantage: we have the full machinery in place • Work in “Athena” framework • 2 possibilities • 1) ‘mixt’ • input is hits from ZEBRA • ‘’digits’ and digits collections’ are produced • output is ‘digits collections’ in OO-db • 2) ‘pure’ Athena • input is ‘Hits collections’ from OO-db • ’digits’ and digits collections’ are produced • output is ‘Digits collections’ in OO-db • We have to evaluate the consequences of the choice ATLAS Software Workshop - CERN - 20 September 2001
Reconstruction
• We want to use the 'new reconstruction' code run in the Athena framework
  • input should be from the OO-db
  • output in the OO-db:
    • ESD (event summary data)
    • AOD (analysis object data)
    • TAG (event tag)
• Atrecon could be a back-up possibility
  • to be decided
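A toy illustration of the three output tiers above, ordered by decreasing size and increasing selectivity; only the tier names come from the talk, the field choices are invented:

    def make_outputs(reco_event):
        esd = reco_event                                   # ESD: full detail
        aod = {k: v for k, v in reco_event.items()
               if k in ("electrons", "muons", "jets")}     # AOD: physics objects
        tag = {"run": reco_event["run"],
               "event": reco_event["event"],
               "n_jets": len(aod.get("jets", []))}         # TAG: selection variables
        return esd, aod, tag

    esd, aod, tag = make_outputs(
        {"run": 1000, "event": 1, "jets": [{"pt": 42.0}], "tracks": ["..."]})
    print(tag)   # -> {'run': 1000, 'event': 1, 'n_jets': 1}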
Analysis
• We are just starting to work on this, but
  • analysis tools evaluation should be part of the DC
  • it will be a good test of the Event Data Model
  • performance issues should be evaluated
• The analysis scenario
  • number of analysis groups, number of physicists per group, number of people who want to access the data at the same time
  • is of prime importance in 'designing' the analysis environment
    • to measure the response time
    • to identify the bottlenecks
  • for that we need input from you
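One way to put numbers on these questions is a harness that times N concurrent readers against the same dataset. A toy sketch, with read_events standing in for whatever the real access layer turns out to be:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def read_events(dataset, n=1000):
        time.sleep(0.01)                   # placeholder for real I/O
        return n

    def measure(n_users, dataset="aod"):
        start = time.time()
        with ThreadPoolExecutor(max_workers=n_users) as pool:
            list(pool.map(lambda _: read_events(dataset), range(n_users)))
        return time.time() - start

    for users in (1, 10, 50):
        print(f"{users:3d} concurrent users: {measure(users):.2f} s")

Where the response time stops scaling flat is where the bottleneck is, which is the measurement the slide asks for.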
Data management
• Several 'pieces' of what I call 'infrastructure' will have to be decided, prepared and put in place: not only the software, but also the hardware and the tools to manage the data. Among them:
  • everything related to the OO-db (Objectivity and/or ORACLE)
    • tools for creation, replication, distribution, ...
  • what do we do with ROOT I/O?
    • which fraction of the events will be done with ROOT I/O?
    • we said that the evaluation of more than one technology is part of the DC
• A few thousand files will be produced, so we will need a "bookkeeping" to keep track of what happened during the processing of the data, and a "catalog" to be able to locate all pieces of information (a toy sketch follows):
  • where is the "HepMC" data?
  • where is the corresponding "simulated" or AOD data?
  • which selection criteria have been applied, with which selection parameters, etc.?
  • correlation between different pieces of information?
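A toy bookkeeping/catalog record that would answer the questions above: where each piece of data lives, what it was derived from, and which cuts produced it. The schema and field names are invented; choosing the real one is exactly the open question on this slide:

    catalog = {}   # logical name -> provenance + physical replicas

    def register(logical_name, site, path, parent=None, cuts=None):
        entry = catalog.setdefault(logical_name, {
            "replicas": [], "parent": parent, "selection_cuts": cuts or {}})
        entry["replicas"].append({"site": site, "path": path})

    register("hepmc.run001", "CERN", "/castor/atlas/dc1/hepmc.run001.db")
    register("aod.run001", "CERN", "/castor/atlas/dc1/aod.run001.db",
             parent="hepmc.run001", cuts={"filter": "HLT", "eff": 0.14})

    # "Where is the HepMC data?" / "Which cuts produced this AOD?"
    print(catalog["hepmc.run001"]["replicas"])
    print(catalog["aod.run001"]["selection_cuts"])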
DC scenario
• For DC0 (end of September?) we will have to see what is in place and decide on the strategy to be adopted in terms of:
  • software to be used
    • Dice geometry (which version?)
    • reconstruction adapted to this geometry
    • database
  • infrastructure
• I hope that we will have 'tools' in place for:
  • automatic job submission
  • catalog and bookkeeping
  • allocation of "run numbers" and of "random numbers" (bookkeeping; a sketch follows)
  • we have to check with people involved in 'Grid' projects or other projects (the projects are not in phase)
• I believe that the 'validation' of the various components should start now
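For the run-number and random-seed bookkeeping, a minimal sketch: each job gets a unique run number and a seed derived from it, so any file in the production can be regenerated later. The derivation scheme is an assumption, not a tool ATLAS had adopted:

    class RunBook:
        def __init__(self, first_run=1):
            self.next_run = first_run
            self.log = {}                       # run number -> provenance

        def allocate(self, task):
            run = self.next_run
            self.next_run += 1
            seed = (run * 2654435761) % 2**31   # fixed, reproducible mapping
            self.log[run] = {"task": task, "seed": seed}
            return run, seed

    book = RunBook(first_run=1000)
    run, seed = book.allocate("DC0 continuity test")
    print(run, seed)

Deriving the seed from the run number, rather than drawing it at random, means the bookkeeping record alone is enough to reproduce a job.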
DC scenario
• For DC1:
  • on the basis of what we learn from DC0 we will have to adapt our strategy
  • simulation & pile-up will be of great importance
    • strategy to be defined (I/O rate, number of "event" servers?)
  • since we say that we would like to do it 'world-wide', we will have to see what can be used from the GRID developments
    • we will have to 'port' our software to the GRID environment (we already have a kit based on the 1.3.0 release)
  • don't forget that we have to provide data to our HLT colleagues; the schedule should take their needs into account
DC1-HLT - CPU
[table not reproduced]
DC1-HLT - data
[table not reproduced]
DC1-HLT data with pile-up
• In addition to the 'simulated' data, assuming 'filtering' after simulation (~14% of the events kept):
  • (1) keeping only 'digits'
  • (2) keeping 'digits' and 'hits'
[table not reproduced]
DC scenario
• For DC1, on the hardware side, we will have to ensure that we have enough resources in terms of CPU, disk space, tapes, data servers, ...
  • we have started to evaluate our needs, but this should be checked
• What will we do with the data generated during the DC?
  • keep it in CASTOR (the CERN mass-storage system)? On tapes?
  • outside institutes will use other systems (HPSS, ...)
  • how will we exchange the data?
  • do we want to have all the information at CERN? Everywhere?
  • what are the networking requirements?
Ramp-up scenario
[diagram not reproduced]
What next
• Prepare a first list of goals & requirements with:
  • HLT and the physics community
  • the simulation, reconstruction and database communities
  • people working on 'infrastructure' activities
    • bookkeeping, cataloguing, ...
• In order to:
  • prepare a list of tasks
    • some physics-oriented
    • but also testing code, running production, ...
  • set up a list of work packages
  • define the priorities
What next
• In parallel:
  • start to build a task force
    • volunteers? they should come from the various activities
  • start discussions with:
    • people involved in GRID projects and
    • those responsible for Tier centers
  • evaluate the necessary resources
    • at CERN (COCOTIME exercise)
    • outside CERN
Then
• Start the validation of the various components in the chain (setting deadlines for readiness)
  • software
    • simulation, pile-up, ...
  • infrastructure
    • database, bookkeeping, ...
• Estimate what it will be realistic (!) to do
  • for DC0, DC1
  • where (sharing of the work)
• Ensure that we have the resources
  • including manpower
• "And turn the key"
Expressions of interest
• So far, after the NCB meeting of July 10th:
  • Canada, France, Germany, Italy, Japan, the Netherlands, Nordic Grid, Poland, Russia, UK, US, ...
  • propositions to help with DC0
  • propositions to participate in DC1
• Contact with the HLT community
  • needs input from the other (physics) communities
• Contact with Grid projects
  • EU-DataGrid
    • kit of ATLAS software
  • other projects
• Contact with Tier centers
• The question of the entry level for DC1 has been raised (O(100)?)
Work packages
• A first (non-exhaustive) list of work packages:
  • event generation
  • simulation
    • Geant3
    • Geant4
  • pile-up
  • detector response (digitization)
  • "ZEBRA"-"OO-db" conversion
  • event filtering
  • reconstruction
  • analysis
  • data management
  • tools
    • job submission & monitoring
    • bookkeeping & cataloguing
    • web interface