Introduction and news
• Welcome to our ninth meeting on CAT physics
• We are now about one year before data-taking
• How far are we from having infrastructure and tools ready for first data analysis in CAT?
• Analysis model and forums (beyond PAT workshop in Bergen)
  • AOD size and evolution, analysis model evolution
  • Note analysis forums (monthly): first one on June 1st
• Computing model for CAT: need Tier-3 at CERN!
  • Explicit request to PH/IT in preparation: please give feedback!
D. Froidevaux
Release 12 size
• AOD/ESD size crisis…
• Top event size:
  • ESD: 2.8 MB/event, AOD: 380 kB/event
• Average size:
  • Rescale by 0.7 (top events are bigger than average)
  • Rescale MC truth by 0.2 (20% simulation/real data)
  • ESD: 1.8 MB/event, AOD: 190 kB/event
D. Costanzo, Bergen
Something went wrong during release 12 bug fixing. What? Did we lack an a-priori way to control changes? Should we have said “NO” to CSC clients?
Release 12 EDM content
[Charts: ESD size per event; AOD size per event]
• Did our best with MC truth (removed interactions in the forward beam pipe, first T/P prototype)
• Truth includes atlfast, tracking truth, trigger truth, …
• Trigger: unexpected size (~50% of the AOD!)
  • See next few slides from S. George
• Jets: the Event Management Board approved 8 collections in ESD/AOD (was it a mistake?)
• Inner Detector: full PrepRawData in ESD, needs work in release 13
D. Costanzo, Bergen
Release 13 forecast (AOD)
D. Costanzo, Bergen
• Version _p2, drop collections
• Add cells in EM clusters
• Add Topo 420
• Top: 60 kB/event (Andrew), rescale by 70%
• T/P split
• Add hit on track, segments
• No planned change
• Total (forecast): 157 kB/event without truth
Release 13 forecast (ESD)
D. Costanzo, Bergen
• Version _p2
• Top: 60 kB/event (Andrew), rescale by 70%
• T/P split
• Add PrepRawData
• No planned change
• Total (forecast): 1462 kB/event without truth
• Uncertainty on InDet (wait for T/P code to be finished)
Release 13 top priorities
• AOD:
  • Understand and monitor trigger size
  • Jet content (physics and technical)
• ESD:
  • PrepRawData and Tracks
  • Calorimeters are really at the limit
  • Uncertainty about size at a luminosity of 2×10³³ cm⁻²s⁻¹
• Time scale:
  • Validation after release 13.0.10. No big changes after May (??)
D. Costanzo, Bergen
Evolution of analysis model [two slides by A. Farbin, Bergen; content not recoverable]
Analysis with release 12: the 2-stage model
Stage 1:
• Read in AOD
• Run selection and overlap removal algorithms (e.g. in EventView)
• Choose a unique (or, more rarely, multiple) view(s) of the event
• Define and write out DPD (e.g. HighpTView, TopView, SUSYView)
Stage 1 is quite slow and in general needs to be run in batch mode (this stage will be faster in release 13, but it is not yet clear by how much).
Stage 2:
• Analyse DPD in Root/Athena
• Produce physics plots
• Do fits to backgrounds/signals, statistical and systematic studies, …
Stage 2 can be fast and interactive: most users want to reach this stage as efficiently and quickly as possible.
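Overlap removal in stage 1 is, in essence, a geometric matching step. The Python sketch below shows one common form of it, assuming objects are matched in ΔR = √(Δη² + Δφ²) and that an electron wins over a nearby jet. The class and function names and the thresholds (ΔR < 0.2, 25 GeV, 20 GeV) are illustrative choices, not the EventView configuration.

```python
import math
from dataclasses import dataclass


@dataclass
class Obj:
    pt: float   # transverse momentum [GeV]
    eta: float
    phi: float  # [rad]


def delta_r(a, b):
    """Delta R = sqrt(d_eta^2 + d_phi^2), with d_phi wrapped into [-pi, pi]."""
    dphi = math.remainder(a.phi - b.phi, 2 * math.pi)
    return math.hypot(a.eta - b.eta, dphi)


def select(objs, pt_min):
    """Simple pT selection (illustrative threshold passed by the caller)."""
    return [o for o in objs if o.pt > pt_min]


def remove_overlaps(electrons, jets, dr_max=0.2):
    """Keep only jets farther than dr_max from every selected electron."""
    return [j for j in jets if all(delta_r(j, e) >= dr_max for e in electrons)]


def make_view(electrons, jets, ele_pt_min=25.0, jet_pt_min=20.0):
    """Stage-1 style 'view': select objects, then resolve overlaps (electron wins)."""
    eles = select(electrons, ele_pt_min)
    jts = remove_overlaps(eles, select(jets, jet_pt_min))
    return {"electrons": eles, "jets": jts}
```

With one electron at (η, φ) = (0.5, 1.0), a jet at (0.52, 1.05) lies at ΔR ≈ 0.05 and is dropped, while a jet at (-1.2, -2.0) survives. Wrapping Δφ with `math.remainder` keeps the distance correct across φ = ±π.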
Analysis with release 13: a more complex 3-stage model
Stage 0:
• Read in AOD
• Recalibrate certain objects (e.g. electrons and jets)
• Rerun high-level reco algorithms (e.g. b-tagging, refit of e/μ tracks, jets)
• Possibly write out revised AOD collections
Stage 0 should be configured and run with validated algorithms from reconstruction, so the EDM and interfaces should be prepared with care. Book-keeping and versioning will be important.
Stages 1/2:
• Reselect physics objects of interest (e.g. loose e, tight μ, medium τ, …)
• Recompute ETmiss (the ETmiss code thus appears as the only one tightly coupled to analysis)
• Proceed with the rest of stage 1 and with stage 2 as before
There are indeed additional requirements (see next slides).
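Because stage 0 changes object energies, ETmiss has to follow them. A minimal sketch, assuming ETmiss is the negative vector sum of the visible objects, so that a recalibration can be propagated by subtracting each object's change in (px, py) from the ETmiss vector. The flat scale factor and all names are hypothetical, and this is not necessarily how the ATLAS ETmiss code does it.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Obj:
    pt: float   # [GeV]
    phi: float  # [rad]


def recalibrate(objs, scale):
    """Hypothetical stage-0 step: apply a flat energy-scale factor to each object."""
    return [replace(o, pt=o.pt * scale) for o in objs]


def recompute_met(met_x, met_y, old_objs, new_objs):
    """Propagate a recalibration into the missing-ET vector.

    MET = -(sum of visible pT vectors), so a change dp in one object
    shifts MET by -dp. Returns (met_x, met_y, |MET|).
    """
    if len(old_objs) != len(new_objs):
        raise ValueError("old and new collections must be matched one-to-one")
    for old, new in zip(old_objs, new_objs):
        met_x -= new.pt * math.cos(new.phi) - old.pt * math.cos(old.phi)
        met_y -= new.pt * math.sin(new.phi) - old.pt * math.sin(old.phi)
    return met_x, met_y, math.hypot(met_x, met_y)
```

For an event containing a single 50 GeV jet at φ = 0 (so MET = (-50, 0)), scaling the jet by 1.1 moves MET to (-55, 0).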
Analysis with release 13: a more complex 3-stage model
• This 3-stage model requires more core software functionality to implement it in a robust and maintainable way:
  • Need to move away from cloning, e.g. ConeTowerParticleJets as ConeTowerParticleJetsAOD, when rerunning the b-tagging algorithm
  • Need versioning of collections so that the AOD analysis job configuration does not depend on whether one reads the AOD before or after stage 0. This will most likely not appear anytime soon in release 13.
  • The above is related to the unique ID per object discussed a year ago and not yet implemented.
  • Configurables for the recalibration step will have to be intelligent and robust so that non-expert users do not go astray.
  • Tools from reco software should be reused (no hacks!).
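Collection versioning could let a single analysis configuration work on AOD read before or after stage 0. The sketch below is purely hypothetical. It assumes stage 0 writes new versions under a `<name>_vN` suffix and that the analysis asks only for the base name, receiving the newest version present. The naming convention and the `resolve_collection` helper are invented for illustration and do not describe any Athena interface.

```python
import re


def resolve_collection(store_keys, base_name):
    """Return the key of the newest version of base_name among store_keys.

    Hypothetical convention: 'Name' is version 0; 'Name_v1', 'Name_v2', ...
    are later versions written by stage 0. Other names (e.g. 'NameAOD')
    are deliberately not matched.
    """
    pattern = re.compile(rf"^{re.escape(base_name)}(?:_v(\d+))?$")
    best_key, best_ver = None, -1
    for key in store_keys:
        m = pattern.match(key)
        if m:
            ver = int(m.group(1)) if m.group(1) else 0
            if ver > best_ver:
                best_key, best_ver = key, ver
    if best_key is None:
        raise KeyError(base_name)
    return best_key
```

Under this convention the job configuration always names `ConeTowerParticleJets`; whether it reads the original collection or a stage-0 rerun would depend only on what is in the file.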
Analysis with release 13: a more complex 3-stage model
• Two concrete examples of stage 0 analysis:
  • Redefine electrons as 3x5 (rather than 3x7) sliding-window objects and recalibrate them using a more complex scheme (L. Carminati) rather than the default scheme:
    • Feasible with release 13 using standard tools and interfaces
    • Original electron was saved with a link to the default 3x7 SW cluster and to some number of cells (e.g. 7x7)
    • New electron will be saved with a link to the 3x5 SW cluster and to the same cells
Cloning electrons with the same definition and two different calibration schemes will allow fast comparison of these schemes at AOD level, rather than multiplying reco jobs run on the Grid. Such tools will be very useful for development and understanding of high-level algorithms related to E/p studies, EM calorimeter intercalibration using Z→ee decays, etc.
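Saving a fixed block of cells (e.g. 7x7) with each electron is what makes the 3x5 redefinition possible at AOD level, because any smaller window can be re-summed from the stored cells. A minimal sketch, assuming the stored cells form an η × φ grid of middle-layer energies centred on the cluster seed. Real sliding-window clustering also positions the window and works layer by layer, and no calibration is applied here. The function name is hypothetical.

```python
def window_energy(cells, n_eta, n_phi):
    """Sum an n_eta x n_phi window centred in a stored grid of cell energies.

    cells: list of rows indexed [eta][phi], e.g. 7x7 around the seed.
    Assumes odd window sizes that fit inside the stored grid.
    """
    rows, cols = len(cells), len(cells[0])
    if n_eta % 2 == 0 or n_phi % 2 == 0 or n_eta > rows or n_phi > cols:
        raise ValueError("window must be odd-sized and fit inside the stored cells")
    e0 = (rows - n_eta) // 2  # first eta row of the centred window
    p0 = (cols - n_phi) // 2  # first phi column of the centred window
    return sum(cells[i][j]
               for i in range(e0, e0 + n_eta)
               for j in range(p0, p0 + n_phi))
```

Calling `window_energy(cells, 3, 7)` and `window_energy(cells, 3, 5)` on the same stored cells yields the default and redefined cluster energies side by side, which is the kind of AOD-level comparison the slide describes.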
Analysis with release 13: a more complex 3-stage model
• Two concrete examples of stage 0 analysis:
  • Refit electron or muon track based on the RiOonTrack list saved to AOD (see talks by E. Moyse and A. Wildauer for more details)
    • Should be feasible with release 13 with standard tools (fitters, extrapolators), provided RiOonTracks are saved to AOD
    • Requires an ESD/AOD merger of interfaces for TrackParticle (new TrackParticleBase) and for important tools such as TrackInCalo (done?) and Vector(TrackParameters) for representations beyond the perigee
    • Need to sort out how to deal with the different hypotheses used for track fits (an electron can be fit with the pion hypothesis, the electron hypothesis, or a calo brem fit). Not really settled even in reco yet.
Alignment has been suggested as a use-case for stage 0 in AOD. In my opinion, this cannot seriously be envisaged, and alignment should remain a dedicated reco task with its specific streams:
• We cannot rerun alignment only on certain tracks and not on others, and there is no way we will have space to save hits for all tracks.
• More importantly, any significant change in alignment constants requires a rerun of the pattern recognition. Doing this in ATLAS (it was done at BaBar at some level) would require saving more than RiOonTrack. The most likely client would be b-tagging: much too complicated.
[Two further slides by A. Farbin, Bergen; content not recoverable]
CAT Tier-3 and Analysis/Computing Models
• The ATLAS analysis model is moving towards a three-stage approach:
  • ATHENA-based AOD analysis for the more complex parts
  • Final analysis on Root trees (DPDs) produced from ATHENA
• Input assumptions: which activities on the CAT Tier-3?
  • Bulk DPD production done “centrally” per physics analysis (using the Grid)
  • 100% of disk for DPDs and ~80% of CPU for DPD analysis (less initially)
  • Remaining 20% of CPU (more initially, for the first bullet below!) devoted to:
    • analysis of special samples (AODs, ESDs, express stream, etc.)
    • simulation (fast, full) beyond central production
    • toy MC production, event mixing and other sophisticated tools
• Advantages and disadvantages of working on a Tier-3 at CERN:
  • Pro: direct access to ESDs and express stream
  • Pro: software releases available
  • Con: strong interference with collaboration needs for access to resources: negotiation required (and already started to some extent)
M. Elsing, A. Farbin, D.F.
CAT Tier-3 and Analysis/Computing Models
M. Elsing, A. Farbin, D.F.
• Input assumptions: how many users and analyses on the CAT Tier-3?
  • The number of users defines the CPU needs (especially interactive) and the number of analysis streams defines the disk needs (speed of access to data is also a critical parameter, see below). Assume 2×10⁹ events collected per full year.
  • The CAT team will be ~60 physicists (at 50% of their time on average)
  • Assume 4 broad physics topics as now, with 3 real analysis activities each
• Disk space requirements:
  • For each of the 12 specific analysis topics, need ~4 DPD datasets (two versions, current and previous, and two sizes, 10 kB/event standard and 1 kB/event reduced). Assume each analysis accesses ~10% of the total data.
  • In addition, need wider, more generic DPDs (à la HighpTView) for selection of background control samples and other analysis certification work: this might correspond to 50% of the total AOD datasets with 20 kB/event. This latter estimate might easily go up to 100% and 30 kB/event.
  • The total disk need is therefore 3 × 2×10⁹ × (24 × 0.1 × 11 kB + 0.5 × 20 kB), where 24 = 12 analyses × 2 versions and 11 kB = 10 kB + 1 kB, or ~240 TB per year including a contingency factor of 3. A disk server is currently 8 TB, can run at 200 MB/s and costs 20 kCHF, so the recurrent cost is 0.6 MCHF per year.
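The disk estimate can be reproduced with a few lines of Python. The parameter names are ours; the values are those on the slide. Evaluated exactly, the formula gives about 218 TB; the quoted ~240 TB corresponds to 30 servers of 8 TB and to the 0.6 MCHF figure, so it appears to be a rounded-up value.

```python
import math


def disk_need_tb(events=2e9, n_analyses=12, versions=2, dpd_kb=(10.0, 1.0),
                 analysis_fraction=0.1, generic_fraction=0.5, generic_kb=20.0,
                 contingency=3.0):
    """Yearly DPD disk need in TB, following the slide's formula."""
    # Specific DPDs: 12 analyses x 2 versions x 10% of events x (10 kB + 1 kB)
    specific_kb = n_analyses * versions * analysis_fraction * sum(dpd_kb)
    # Generic DPDs (a la HighpTView): a fraction of all events at generic_kb each
    generic = generic_fraction * generic_kb
    total_kb = contingency * events * (specific_kb + generic)
    return total_kb * 1e3 / 1e12  # kB -> bytes -> TB


def disk_servers_and_cost(need_tb, server_tb=8.0, server_kchf=20.0):
    """Number of disk servers and their cost in MCHF."""
    n = math.ceil(need_tb / server_tb)
    return n, n * server_kchf / 1e3
```

Raising the generic DPDs to 100% of events at 30 kB/event, as the slide warns may happen, pushes the estimate to about 338 TB (`disk_need_tb(generic_fraction=1.0, generic_kb=30.0)`).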
CAT Tier-3 and Analysis/Computing Models
M. Elsing, A. Farbin, D.F.
• CPU requirements (interactive and batch). "CPU" here means a processor core.
  • A typical CPU today can read 10 MB/s when I/O-limited (not yet achieved for ATLAS DPDs). Assume 1 MB/s for a complex analysis job running on DPDs.
  • Need 20 CPUs per disk server (200 MB/s ÷ 10 MB/s), i.e. 600 CPUs in total (1 CPU ~ 1 kSI2K)
  • Also need about 30 to 50 interactive CPUs
• With such a farm, what would be the time needed for a user to:
  • go through 20% of the generic DPDs (30 kB/event) at 1 MB/s: < 1 day with 200 CPUs
  • go through 100% of the specific DPDs (10 kB/event) at 1 MB/s: < 1 night with 50 CPUs
  • go through 100% of the short DPDs (1 kB/event) at 10 MB/s: ~30' with 10 CPUs
• Total recurrent CPU needs are therefore 650 × 2 kCHF ~ 1.3 MCHF
• Next steps:
  • Collect feedback from CAT users (our initial assumptions should be quite conservative so that any future financial negotiations are well understood by us in terms of limitations to our analysis capabilities!)
  • Discuss with ATLAS computing management and ATLAS management
  • Discuss with CMS before going to PH/IT management around the end of June
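The CPU numbers and scan times can be checked the same way. A sketch assuming perfectly parallel, I/O-bound reading, the slide's 30 disk servers, 50 interactive CPUs and 2 kCHF per CPU. It also assumes that "100% of specific DPDs" means the ~10% of events a single analysis keeps, as on the disk slide. All names are illustrative.

```python
# Inputs taken from the slides; helper names are illustrative only.
EVENTS_PER_YEAR = 2e9
SERVER_MB_S = 200.0     # disk-server bandwidth
CPU_READ_MB_S = 10.0    # I/O-limited read rate per CPU (core)
N_SERVERS = 30          # ~240 TB / 8 TB per server
INTERACTIVE_CPUS = 50   # upper end of the 30-50 range
KCHF_PER_CPU = 2.0


def farm_size():
    """Return (batch CPUs, total CPUs, recurrent cost in MCHF)."""
    cpus_per_server = SERVER_MB_S / CPU_READ_MB_S   # 20 CPUs saturate one server
    batch = int(N_SERVERS * cpus_per_server)
    total = batch + INTERACTIVE_CPUS
    return batch, total, total * KCHF_PER_CPU / 1e3


def hours_to_scan(n_events, kb_per_event, mb_s_per_cpu, n_cpus):
    """Wall-clock hours to read n_events at a fixed per-CPU rate, perfectly parallelised."""
    total_mb = n_events * kb_per_event / 1e3
    return total_mb / (mb_s_per_cpu * n_cpus) / 3600.0


if __name__ == "__main__":
    print("farm (batch, total, MCHF):", farm_size())
    # 20% of generic DPDs at 30 kB/event, 1 MB/s per CPU, 200 CPUs
    print("generic:", hours_to_scan(0.2 * EVENTS_PER_YEAR, 30, 1.0, 200), "h")
    # Assumption: specific DPDs cover ~10% of all events
    print("specific:", hours_to_scan(0.1 * EVENTS_PER_YEAR, 10, 1.0, 50), "h")
    print("short:", hours_to_scan(0.1 * EVENTS_PER_YEAR, 1, 10.0, 10) * 60, "min")
```

The three scans come out at roughly 16.7 h, 11.1 h and 33 minutes, in line with the quoted < 1 day, < 1 night and ~30'.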