ATLAS Distributed Computing Tutorial • Tags: What, Why, When, Where and How? • Mike Kenyon, University of Glasgow
Tags • What are tags? • Why have them? • When are they produced? • Where are they? • How can they be used?
What are Event Tags? • Event-level metadata: summary information about each event, with a “pointer” to the corresponding event in AOD/ESD/RDO format • Useful for selecting events for physics analysis • Should be no bigger than 1 kB per event (~1% of AOD size)
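As a rough illustration of the idea above, a tag can be pictured as a small record of summary quantities plus a reference back to the full event. The sketch below is not the POOL data model: the class names, field names and the GUID string are all hypothetical, and only the 1 kB and ~1% figures come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRef:
    """Hypothetical pointer to the full event: which file, which entry."""
    file_guid: str  # placeholder identifier, resolved via a file catalogue
    entry: int      # position of the event inside that file


@dataclass(frozen=True)
class EventTag:
    """Hypothetical tag: a few summary quantities plus the pointer."""
    run_number: int
    event_number: int
    n_electrons: int
    missing_et_gev: float
    aod_ref: EventRef


# Figures quoted on the slide: at most 1 kB per tag, roughly 1% of the AOD size.
TAG_BYTES_PER_EVENT = 1_000
AOD_TO_TAG_SIZE_RATIO = 100  # "~1%" expressed as a ratio


def implied_aod_bytes_per_event() -> int:
    """AOD size per event implied by the two figures above."""
    return TAG_BYTES_PER_EVENT * AOD_TO_TAG_SIZE_RATIO


tag = EventTag(
    run_number=4312,
    event_number=17,
    n_electrons=2,
    missing_et_gev=23.5,
    aod_ref=EventRef(file_guid="HYPOTHETICAL-GUID-0001", entry=42),
)
```

Taken together, the two figures imply an AOD event of order 100 kB, which is why scanning tags is so much cheaper than scanning the AOD itself.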
Why have Event Tags? • To make physicists’ lives easier and analysis faster • Allow you to exclude uninteresting events from the data sample used for analysis without searching through AOD/ESD files • Samples of specific interest to an analysis can be extracted into a smaller set of files for repeated running • Provide a global view of the data, useful for data mining • Not intended to be analysed directly
Tag Use Cases • Some physicist use cases: • Using official Tags with a query in job options • Using a local Tag “database” for preliminary analysis • Using the global Tag database to look for events • Using the global Tag database to build an input list for Athena jobs
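The last use case, building an input list from a tag selection, can be sketched in a few lines. This is a toy model only: the tags are plain dictionaries, and the attribute names (`n_muons`, `aod_file`) and file names are hypothetical rather than real Tag attributes.

```python
def build_input_list(tags, selection):
    """Return the AOD files that hold at least one selected event.

    Files are listed once each, in the order they are first seen.
    """
    seen = {}  # dict preserves insertion order, so it doubles as an ordered set
    for tag in tags:
        if selection(tag):
            seen.setdefault(tag["aod_file"], None)
    return list(seen)


# Hypothetical tag records: one per event, each pointing at its AOD file.
tags = [
    {"event": 1, "n_muons": 2, "aod_file": "AOD.0001.pool.root"},
    {"event": 2, "n_muons": 0, "aod_file": "AOD.0001.pool.root"},
    {"event": 3, "n_muons": 1, "aod_file": "AOD.0002.pool.root"},
    {"event": 4, "n_muons": 3, "aod_file": "AOD.0003.pool.root"},
    {"event": 5, "n_muons": 2, "aod_file": "AOD.0001.pool.root"},
]

# Keep only files containing events with at least two muons.
input_files = build_input_list(tags, lambda t: t["n_muons"] >= 2)
```

Here the second file drops out of the input list entirely because none of its events pass the selection, so a job never needs to open it.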
What do they look like?
• The LCG POOL infrastructure is used to store Tags
  • Hence the use of “collection” terminology
• They exist in 2 forms:
  • ROOT files
  • Relational database (MySQL and Oracle)
• Why keep 2 forms?
  • ROOT files are useful for local work
  • The database is useful for queries and for a global view of the data
• Tag content: collection information + event information
Tag Content
• Collection Information
  • Collection ID, AOD/ESD/RDO references
• Global Event Quantities
  • Event no., run no., no. of tracks, missing ET, etc.
• Trigger Decisions
• Electrons, Photons, Muons
  • Number, pT, η, φ, etc.
• Jets, Taus
  • Number, pT, η, φ, etc.
When are Tags produced? • Written to ROOT files at Tier 0 during AOD production – “Explicit collections” • Data then imported into central relational database (Oracle at CERN) • Database replicated to Tier 1 and lower • Oracle where available; MySQL otherwise • Users can create their own tag files
Sample Queries
• General Collection Information
  • How many events are in collection A?
  • What are the names and types of the Tag attributes?
  • What production task(s) produced these Tags?
• Content Queries
  • Give me all events with at least 2 electrons and missing ET > 10 GeV which are ‘good for physics’
• Summary Queries
  • Give me the number of events for some content query
  • Give me the sum of the luminosity for some content query
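To make the three kinds of query concrete, here is a minimal sketch using an in-memory SQLite table in place of the real Oracle/MySQL Tag database. The table layout, column names and values are all hypothetical, and the per-row `lumi` column is only a placeholder to show the shape of a summing query; real luminosity bookkeeping is more involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE tags (
           run_number INTEGER, event_number INTEGER,
           n_electrons INTEGER, missing_et_gev REAL,
           good_for_physics INTEGER, lumi REAL)"""
)
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 2, 25.0, 1, 0.5),
        (1, 2, 1, 40.0, 1, 0.5),
        (1, 3, 3, 5.0, 1, 0.5),
        (2, 1, 2, 12.0, 0, 0.25),
        (2, 2, 4, 60.0, 1, 0.25),
    ],
)

# General collection information: event count, attribute names and types.
n_events = conn.execute("SELECT COUNT(*) FROM tags").fetchone()[0]
attributes = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(tags)")]

# Content query: >= 2 electrons, missing ET > 10 GeV, flagged good for physics.
SELECTION = "n_electrons >= 2 AND missing_et_gev > 10 AND good_for_physics = 1"
selected = conn.execute(
    f"SELECT run_number, event_number FROM tags WHERE {SELECTION} "
    "ORDER BY run_number, event_number"
).fetchall()

# Summary queries: count and summed placeholder luminosity for the same selection.
n_selected, lumi_sum = conn.execute(
    f"SELECT COUNT(*), SUM(lumi) FROM tags WHERE {SELECTION}"
).fetchone()
```

The same selection string drives both the content query and the summary queries, which mirrors how a summary is defined "for some content query" on the slide.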
How can Tags be used? • Collection tools • Athena • Tag Navigator Tool (TNT)
Collection Tools • To use Tags in Athena, you need to know what the attributes are • POOL Collection tools can be used for this • Can copy collections, append collections, print list of files used, etc • Allows queries on the input collections • See Tutorial Exercises, part 1
Tags in Athena • Both ROOT and Relational Tags can be read directly from Athena • Need a file catalogue to locate the AOD files, and an Athena version matching the one used to produce the Tags • One can also produce private ROOT Tags from AOD • Focus here is on reading, rather than building, Tags
Local Tag Files with Athena • jobOptions for event selection look like:
Remote Tag Database with Athena • Not many Tags are available in the central database yet • This constrains the exercises somewhat, but we can at least illustrate the principles • jobOptions must include lines like:
EventSelector.InputCollections = ['rome_4312_merge_H12_140_gamgam_AOD_tags']
EventSelector.Connection = 'oracle://atlas_tags/atlas_tags_rome'
EventSelector.CollectionType = 'ExplicitRAL'
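The "query in job options" use case would add a selection on top of these settings. The fragment below is a hedged sketch, not a tested Athena configuration: it assumes the selection is passed through a `Query` property on the event selector, the attribute name `NElectrons` is hypothetical, and a `SimpleNamespace` stands in for the object that Athena itself would supply, so that the lines can be run outside the framework.

```python
from types import SimpleNamespace

# Stand-in so the fragment runs outside Athena; in a real job the framework
# provides the EventSelector object and this line would not appear.
EventSelector = SimpleNamespace()

# Settings quoted on the slide.
EventSelector.InputCollections = ['rome_4312_merge_H12_140_gamgam_AOD_tags']
EventSelector.Connection = 'oracle://atlas_tags/atlas_tags_rome'
EventSelector.CollectionType = 'ExplicitRAL'

# Assumption: the selection is given as a string on a `Query` property.
# `NElectrons` is a hypothetical attribute name; the collection tools can
# list the attributes that a given collection actually provides.
EventSelector.Query = 'NElectrons >= 2'
```

Only events whose Tags pass the query would then be read from the referenced AOD files.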
Tag Navigator Tool (TNT) • A utility which aims to allow ATLAS physicists to use the Tag database for analysis • Runs a query on the database and outputs a local ROOT collection • Divides this into a number of sub-collections • Submits user jobs to LCG, one per sub-collection • Output files can be registered as a new DQ2 dataset
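The "divide into sub-collections, one job per sub-collection" step can be sketched as follows. This is not TNT's code: the function name, the near-equal splitting policy and the job dictionaries are assumptions made for illustration, and nothing is actually submitted to a grid.

```python
def split_collection(events, n_sub):
    """Split a list of selected events into up to n_sub near-equal parts."""
    if n_sub < 1:
        raise ValueError("n_sub must be at least 1")
    if not events:
        return []
    n_sub = min(n_sub, len(events))  # never create empty sub-collections
    base, extra = divmod(len(events), n_sub)
    subs, start = [], 0
    for i in range(n_sub):
        size = base + (1 if i < extra else 0)  # first `extra` parts get one more
        subs.append(events[start:start + size])
        start += size
    return subs


# Ten selected events (stand-ins for entries of the local ROOT collection).
selected_events = list(range(10))
sub_collections = split_collection(selected_events, 3)

# One hypothetical job description per sub-collection.
jobs = [{"job_id": i, "events": sub} for i, sub in enumerate(sub_collections)]
```

Splitting evenly keeps the jobs of similar length; a real tool might instead split by file so that each job opens as few AOD files as possible.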
What’s there now? • There is still a lot of work to be done to get an efficient Tag system running • Currently running performance / scalability tests on central database • Need Tags to be produced and loaded into database as a matter of course • Tag database from Rome workshop is still there, now awaiting Tags from Streaming Tests
And finally… • Tags will become ever more useful as real data appears • Infrastructure is still being developed • Wednesday’s exercises aimed at familiarisation with ideas and methods