Luminosity, detector status and trigger - conditions database and meta-data issues
• How we might apply the conditions DB to some of the requirements in:
  • Luminosity task force report
  • Run structure report
  • Meta-data task force report (draft)
  • Data preparation/data quality discussions
• This talk:
  • Reminder of conditions DB concepts relevant here
  • Proposal for storage of luminosity, status and trigger information in CondDB
  • Relation to TAG database, data flow through the system
  • Other meta-data related comments
• For more in-depth discussion, see the document attached to the agenda page

Richard Hawkings, ATLAS luminosity TF workshop, 3/11/06
Conditions DB - basic concepts
• COOL-based database architecture - data with an interval of validity (IOV)
• COOL IOV (63 bit) can be interpreted as:
  • Absolute timestamp (e.g. for DCS)
  • Run/event number (e.g. for calibration)
  • Run/LB number (possible to implement)
• COOL payload defined per ‘folder’
  • Tuple of simple types ↔ 1 DB table row
  • Can also be a reference to external data
  • Use channels (int, soon string) for multiple instances of data in 1 folder
• COOL tags allow multiple data versions
• COOL folders organised in a hierarchy
• Athena interfaces, replication, …
[Architecture diagram: application code uses the COOL C++/Python APIs and data model; COOL sits on the CORAL relational abstraction layer with its SQL-like C++ API; backends are Oracle (online, Tier 0/1), MySQL (small DB replicas), SQLite files (file-based subsets) and Frontier (http-based proxy/cache).]
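As an illustration of how a run/LB pair could be mapped onto the 63-bit COOL IOV key, here is a minimal Python sketch. The packing convention (run in the upper 32 bits, luminosity block in the lower 32 bits) is shown as an assumption, not the final definition.

```python
# Minimal sketch: packing a (run, LB) pair into a single 63-bit COOL-style IOV key.
# Assumption: run number in the upper 32 bits, luminosity block in the lower 32 bits.

COOL_MAX = (1 << 63) - 1  # 63-bit validity key range

def run_lb_to_key(run: int, lb: int) -> int:
    """Encode run and luminosity block numbers into one IOV key."""
    key = (run << 32) | lb
    assert 0 <= key <= COOL_MAX
    return key

def key_to_run_lb(key: int) -> tuple[int, int]:
    """Decode an IOV key back into (run, LB)."""
    return key >> 32, key & 0xFFFFFFFF

# Example: an IOV covering LBs 5-20 of run 12345
since = run_lb_to_key(12345, 5)
until = run_lb_to_key(12345, 21)   # treating IOVs as [since, until)
print(since, until, key_to_run_lb(since))
```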
Storage of luminosity block information in COOL
• Luminosity block information from the online system
  • Start/end event number and timestamps per LB, {livetimes, prescales} per trigger chain
• How might this look in COOL - an example structure (RE = run/event, RLB = run/LB):
  • /TDAQ/LUMI/LBRUN - LB indexed by run/event
  • /TDAQ/LUMI/LBTIME - LB indexed by timestamp
  • /TDAQ/LUMI/LBLB - LB information (start/stop event, time span) indexed by RLB
  • /TDAQ/LUMI/TRIGGERCHAIN - trigger chain info identified by channel, indexed by RLB
  • /TDAQ/LUMI/ESTIMATES - luminosity estimates, versioned and indexed by RLB
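To make the proposed structure concrete, the sketch below models the /TDAQ/LUMI/LBLB payload as a plain Python record and shows an event-to-LB lookup of the kind /TDAQ/LUMI/LBRUN would support; the field names and toy numbers are illustrative assumptions, not the actual folder schema.

```python
# Illustrative sketch of the per-LB payload proposed for COOL (field names are assumptions).
from dataclasses import dataclass

@dataclass
class LBInfo:                      # payload of /TDAQ/LUMI/LBLB, keyed by (run, LB)
    start_event: int
    end_event: int
    start_time: int                # timestamp (e.g. ns), as for DCS-style indexing
    end_time: int

lb_table = {
    (12345, 1): LBInfo(1, 5210, 1_162_500_000, 1_162_500_060),
    (12345, 2): LBInfo(5211, 10433, 1_162_500_060, 1_162_500_120),
}

def lb_for_event(run: int, event: int):
    """Find the LB containing a given event, mimicking a /TDAQ/LUMI/LBRUN lookup."""
    for (r, lb), info in lb_table.items():
        if r == run and info.start_event <= event <= info.end_event:
            return lb
    return None

print(lb_for_event(12345, 6000))   # -> 2
```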
Storage of detector status information in COOL
• Detector status from DCS - many channels, many folders; to be merged:
  • /GLOBAL/STATUS/TISUMM - summary info (one channel per detector/physics), indexed by timestamp
  • Merge process combines folders and channels, derives the set of IOVs for the summary
  • Involves ‘ANDing’ status over all channels, splitting/merging IOVs → a tool? (see the sketch below)
• Similar activity for data indexed by run-event … have to correlate this somehow
• Final summary derived first as a function of run-event (combining all information):
  • /GLOBAL/STATUS/RESUMM - summary info (one channel per detector/physics), indexed by run/event
• Then map status changes to luminosity block boundaries (using the LB tables):
  • /GLOBAL/STATUS/LBSUMM - summary info (one channel per detector/physics), indexed by RLB
  • Status in an LB is defined as the status of the ‘worst’ event in the LB
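A minimal sketch of what such a merging tool might do, assuming an integer status code where higher means worse: combine per-channel IOVs into summary IOVs by taking the worst status wherever intervals overlap, then give each LB the worst status overlapping it (which approximates "the worst event in the LB").

```python
# Sketch of the proposed merge: 'AND' the status over channels (take the worst status
# where intervals overlap), then assign each LB the worst status seen inside it.
# The status encoding (0 = good, higher = worse) is an assumption.
from typing import List, Tuple

IOV = Tuple[int, int, int]   # (since, until, status), until exclusive

def merge_channels(channels: List[List[IOV]]) -> List[IOV]:
    """Derive summary IOVs by splitting at every boundary and taking the worst status."""
    edges = sorted({t for ch in channels for (s, u, _) in ch for t in (s, u)})
    summary = []
    for lo, hi in zip(edges, edges[1:]):
        worst = 0
        for ch in channels:
            for s, u, status in ch:
                if s < hi and u > lo:          # IOV overlaps [lo, hi)
                    worst = max(worst, status)
        if summary and summary[-1][2] == worst and summary[-1][1] == lo:
            summary[-1] = (summary[-1][0], hi, worst)   # merge equal neighbours
        else:
            summary.append((lo, hi, worst))
    return summary

def status_per_lb(summary: List[IOV], lb_bounds: List[Tuple[int, int, int]]) -> dict:
    """Map summary IOVs onto LBs: an LB gets the worst status overlapping it."""
    return {lb: max((st for s, u, st in summary if s < hi and u > lo), default=0)
            for lb, lo, hi in lb_bounds}

channels = [[(0, 100, 0), (100, 200, 2)], [(0, 150, 0), (150, 200, 1)]]
lbs = [(1, 0, 120), (2, 120, 200)]
print(status_per_lb(merge_channels(channels), lbs))   # {1: 2, 2: 2}
```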
Storage of trigger information in COOL
• Source for trigger setup information is the trigger configuration database
  • Complex relational database - complete trigger configuration accessed by key
• Store the trigger configuration used for each run:
  • /TDAQ/TRIGGER/CONFIG - trigger configuration key (run/event indexed, really spanning complete runs)
  • LVL1 prescales may change per LB - stored in /TDAQ/LUMI/TRIGGERCHAIN
• In principle this is enough, but hard to access the trigger config DB ‘everywhere’
  • Copy basic information needed for analysis/selection to the condDB: ‘configured triggers’
• Other information needed offline: efficiencies
  • Filled in offline, probably valid for IOVs much longer than a run:
  • /TDAQ/TRIGGEREFI - efficiency info (one channel per chain, versioned), indexed by run (/event)
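A small sketch of how analysis code might consult the proposed ‘configured triggers’ copy together with the per-LB prescales; the data layout and the rule that a chain counts as ‘active’ when a positive prescale is recorded are assumptions for illustration.

```python
# Sketch: was trigger chain X active (and with what prescale) in a given (run, LB)?
# The dictionaries stand in for the 'configured triggers' and /TDAQ/LUMI/TRIGGERCHAIN
# payloads; names and conventions are assumptions.

configured_triggers = {            # per-run chain list copied from the trigger config DB
    12345: {"e25i", "2e15i", "mu20"},
}
prescales = {                      # per-(run, LB, chain) LVL1 prescales
    (12345, 1, "e25i"): 1.0,
    (12345, 1, "2e15i"): 10.0,
    (12345, 2, "e25i"): 1.0,
}

def chain_active(run: int, lb: int, chain: str) -> bool:
    """A chain counts as active if it was configured for the run and has a positive prescale."""
    if chain not in configured_triggers.get(run, set()):
        return False
    return prescales.get((run, lb, chain), 0.0) > 0.0

print(chain_active(12345, 1, "2e15i"))   # True  (prescale 10)
print(chain_active(12345, 2, "2e15i"))   # False (no prescale recorded -> treated as disabled)
```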
Relations to the TAG database
• TAG database contains event-level ‘summary’ quantities
  • For quickly evaluating selections and producing event collections (lists) for detailed analysis of a subsample of AOD, ESD, etc.
• Need luminosity block and detector status information to make useful queries:
  • ‘Give me a list of events with 2 electrons, 3 jets, from lumiblocks with good calo and tracking status and where the e25i and 2e15i triggers were active’
• Various ways to make this information available in TAGs:
  1. Put all LB, status and trigger information in every event: make it a TAG attribute
    • Wasteful of space, makes it difficult to update e.g. status information afterwards
    • Hard to answer non-event-oriented questions (‘give me a list of LBs satisfying a condition’)
  2. Store just the (run, LB) number of each event in TAGs, with auxiliary table(s) containing LB- and run-level information
    • TAG database does internal joins to answer a query
    • Need to regularly ‘publish’ new (versioned) status information from COOL to TAGs
  3. Have TAG queries get LB/status/trigger info from COOL on each query
    • Technically tricky, would have to go ‘underneath’ the COOL API (or not use COOL at all)
• Solution 2 seems to be the best … try it? (a sketch of this scheme follows below)
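A minimal sketch of ‘solution 2’ using SQLite from the Python standard library: event-level TAGs carry only (run, LB) plus physics attributes, while LB-level quality and trigger information live in an auxiliary table joined at query time. Table and column names are invented for illustration, not the actual TAG schema.

```python
# Toy 'solution 2': (run, lb) in the event TAGs, quality/trigger info in an auxiliary table.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tag_events (run INT, lb INT, event INT, n_electrons INT, n_jets INT);
CREATE TABLE lb_info (run INT, lb INT, calo_status INT, trk_status INT,
                      e25i_active INT, quality_tag TEXT);
""")
con.executemany("INSERT INTO tag_events VALUES (?,?,?,?,?)", [
    (12345, 1, 10, 2, 3),
    (12345, 2, 20, 2, 4),
])
con.executemany("INSERT INTO lb_info VALUES (?,?,?,?,?,?)", [
    (12345, 1, 0, 0, 1, "pass1"),   # good LB, e25i active
    (12345, 2, 2, 0, 1, "pass1"),   # bad calo status
])

# 'Events with >=2 electrons and >=3 jets from good-calo, good-tracking LBs where
#  e25i was active' - the join against lb_info replaces per-event status storage.
rows = con.execute("""
SELECT t.run, t.lb, t.event
FROM tag_events t JOIN lb_info q ON t.run = q.run AND t.lb = q.lb
WHERE t.n_electrons >= 2 AND t.n_jets >= 3
  AND q.calo_status = 0 AND q.trk_status = 0 AND q.e25i_active = 1
  AND q.quality_tag = 'pass1'
""").fetchall()
print(rows)   # [(12345, 1, 10)]
```

Because the quality information sits in its own table, publishing a ‘pass1a’ update means adding or replacing rows in lb_info only; the bulky event-level TAGs stay untouched.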
Data flow aspects
• Walk through the information flow from online to analysis:
• Online data-taking: luminosity, trigger and ‘primary’ data quality written to COOL
• Calibration processing: detector status information is processed to produce the first summary status
  • Put this in the COOL summary folders (tagged ‘pass1’); map to LB boundaries
• Bulk reconstruction: process data, produce TAGs
  • Detector quality information (‘pass1’) could be written to AODs and TAGs (per event)
  • Upload LB/run-level information from COOL to the TAG DB at the same time as the TAG event data upload … users can now make ‘quality/LB-aware’ queries on TAGs
• Refining data quality: subdetector experts look at the pass1 reconstructed data and refine the data quality information, entering it into COOL (‘pass1a’ tag)
  • At some point, the intermediate quality information can be ‘published’ to the TAG DB
  • Users can then do new ‘pass1a’ TAG queries (LBs/events may come or go from the selection)
  • This can be done before a new processing of the ESD or AOD
• Estimating luminosity: luminosity experts estimate luminosities and fill them into COOL
  • Export this info to TAGs, allow luminosity calculations directly from TAG queries? (see the sketch below)
• Re-reconstruction: new data quality info ‘pass2’ in COOL, new AOD, new TAGs
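If per-LB luminosity estimates, livetimes and prescales are available alongside the good-LB selection, an integrated luminosity for a trigger chain could be computed roughly as below. The exact convention for applying livetime and prescale corrections is an assumption here and would follow the luminosity task force prescriptions.

```python
# Sketch: integrated luminosity for one trigger chain over a set of good LBs.
# Assumed convention: L_int = sum over LBs of  L_inst * LB duration * live fraction / prescale.

lb_lumi = {   # per-LB estimates, as from /TDAQ/LUMI/ESTIMATES (toy numbers)
    # (run, lb): (inst_lumi [cm^-2 s^-1], duration [s], live_fraction, prescale)
    (12345, 1): (1.0e31, 60.0, 0.95, 1.0),
    (12345, 2): (0.9e31, 60.0, 0.90, 1.0),
    (12345, 3): (0.8e31, 60.0, 0.97, 10.0),
}
good_lbs = {(12345, 1), (12345, 3)}   # e.g. from a 'pass1a' quality selection

def integrated_lumi(good, table):
    total = 0.0
    for key in good:
        inst, duration, live_frac, prescale = table[key]
        total += inst * duration * live_frac / prescale
    return total

print(f"L_int = {integrated_lumi(good_lbs, lb_lumi):.3e} cm^-2")
```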
A few comments
• Not all analyses will start from the TAG DB and a resulting event collection
  • Maybe just a list of files/datasets - need access to status/LB/trigger chain information in Athena
  • Make the Athena IOVSvc match conditions info on RLB as well as run/event and timestamp
• AOD (and even TAG) can have detector status stored event-by-event
  • Allows vetoing of bad-quality/bad-lumi-block events even without Cond DB access
  • With Cond DB access, can make use of updated (e.g. pass1a) status, overriding the detector status stored in AOD files
  • But Cond DB access may be slow for sparse events - no caching (needs testing)
• Hybrid data selection scheme could also be supported (see the sketch below):
  • Use the TAG database to make a ‘data quality/trigger chain selection’ and output a list of good luminosity blocks
  • Feed this into Athena jobs running over a list of files - veto any event from an LB not in the list
• Maintaining the ability to do detector quality selection without LBs implies:
  • Correlating event numbers with timestamps for each event (event index files?)
  • Storing detector status info per event in the TAG DB (difficult to do a ‘pass1a’ update)
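The hybrid scheme could look like the sketch below: a good-LB list produced by a TAG query is used to veto events while looping over files in an ordinary, non-TAG-driven job. The event-loop and file-reading layer here is a stand-in, not the Athena API.

```python
# Sketch of the hybrid scheme: veto any event whose (run, LB) is not in the good-LB list.

good_lbs = {(12345, 1), (12345, 3), (12346, 7)}   # output of a TAG 'quality/trigger' query

def read_events(files):
    """Toy event source yielding (run, lb, event) triples; stands in for file reading."""
    for f in files:
        yield from f

def select_events(files, good):
    """Keep only events whose (run, LB) is in the good-LB list."""
    for run, lb, event in read_events(files):
        if (run, lb) in good:
            yield run, lb, event
        # else: event vetoed - its LB failed the quality/trigger selection

toy_file = [(12345, 1, 10), (12345, 2, 20), (12345, 3, 30)]
print(list(select_events([toy_file], good_lbs)))   # [(12345, 1, 10), (12345, 3, 30)]
```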
Comments on other meta-data issues
• The luminosity TF requires the ability to know which LBs are in a file, without the file itself
  • In case we lose, or are unable to access, a file in our analysis
  • Implies a need for file-level metadata - on a scale of millions of files…
  • Who does this - DDM? AMI? A new database? It should not be the conditions DB? (a toy record is sketched below)
• Definition of datasets
  • The process by which files make it from online at the SFOs to offline in catalogued datasets needs more definition
  • What datasets are made for the RAW data?
    • By run, by stream, by SFO? What metadata will be stored?
    • Datasets defined in AMI and DDM? Files catalogued in DDM?
  • What role would AMI play in the selection of real data for an analysis? C.f. the TAG DB?
  • What about ESD and AOD datasets - per run? per stream?
  • What about datasets defined for the RAW/ESD sent to each Tier-1?
    • The RAW/ESD dataset for each run will never exist on a single site?
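As a toy illustration of the kind of file-level metadata record implied here (which LBs a file contains, queryable without opening the file), the sketch below is purely hypothetical: the filenames, fields and hosting system (DDM, AMI, a new database) are all open questions in the talk.

```python
# Hypothetical file-level metadata: which (run, LB) range does each file contain?
# None of these names correspond to an existing DDM/AMI schema.

file_metadata = {
    "run12345.physics_Egamma.0001.raw": {"run": 12345, "lbs": range(1, 51)},
    "run12345.physics_Egamma.0002.raw": {"run": 12345, "lbs": range(51, 101)},
}

def files_containing(run: int, lb: int):
    """Answer 'which file holds this LB?' without opening any file."""
    return [name for name, meta in file_metadata.items()
            if meta["run"] == run and lb in meta["lbs"]]

print(files_containing(12345, 73))   # ['run12345.physics_Egamma.0002.raw']
```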
Possible next steps
• If this looks like a good direction to go in … some possible steps:
• Set up the suggested structures in COOL
  • Look at filling them, e.g. with data from the streaming tests
  • Explore size and scalability issues
• In Athena …
  • Set up an access service and data structures to use the data
    • E.g. for status information stored in the condDB and/or AOD, accessible from either with the same interface (see the sketch below)
  • Make the Athena IOVSvc ‘LB aware’
  • Look at speed issues - e.g. penalties for accessing status information from the CondDB for every event in sparse data
• Work closely with the efforts on luminosity / detector status in the TAG database
  • First discussions on that (in the context of the streaming tests) have taken place this week
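The ‘same interface over condDB and AOD’ idea could look something like the sketch below: a hypothetical status-access interface with two interchangeable backends, not an existing Athena service.

```python
# Hypothetical common interface for detector status, backed either by per-event data
# carried in the AOD or by (run, LB)-indexed conditions data; all names are invented.
from typing import Protocol

class StatusProvider(Protocol):
    def status(self, run: int, lb: int, detector: str) -> int: ...

class AODStatusProvider:
    """Status copied into the event data itself (no conditions DB access needed)."""
    def __init__(self, per_event_status: dict):
        self._status = per_event_status          # {(run, lb, detector): status}
    def status(self, run, lb, detector):
        return self._status[(run, lb, detector)]

class CondDBStatusProvider:
    """Status looked up from (a cache of) the LB-indexed summary folders, e.g. a 'pass1a' tag."""
    def __init__(self, lbsumm: dict):
        self._lbsumm = lbsumm                    # {(run, lb, detector): status}
    def status(self, run, lb, detector):
        return self._lbsumm[(run, lb, detector)]

def event_is_good(provider: StatusProvider, run: int, lb: int) -> bool:
    """Analysis code sees one interface, regardless of where the status comes from."""
    return all(provider.status(run, lb, det) == 0 for det in ("calo", "tracking"))

prov = AODStatusProvider({(12345, 1, "calo"): 0, (12345, 1, "tracking"): 0})
print(event_is_good(prov, 12345, 1))   # True
```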