An Introduction to PAT (1.1 What is PAT and how to use it)
Hamed Bakhshian
PAT Tutorial, 20.09.10
Outline
• What is PAT?
• What did people do before PAT?
• What PAT is …
• PAT data formats
• PAT workflow: all, selected and clean PAT objects
• Event content
What is PAT
To understand what PAT is, let's see what people were doing without PAT.
What were people doing without PAT
Let's start with an example: a student wants to use electrons in his studies and, in particular, wants to focus on electron isolation. He quickly finds out that the isolation information is already stored in the electron during the reconstruction process. It is accessible via two methods of the reco::GsfElectron class, but only for cones with radii 0.3 and 0.4. What if he wants to study isolation in other cones?
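To make the question concrete, here is a minimal sketch of a cone-based track isolation with a free cone size. It is a plain-Python toy, not CMSSW code: the function names, the dictionary layout, the inner veto cone of 0.015 and the example numbers are all assumptions chosen for illustration, and the approved Egamma algorithms apply further track-quality requirements that are not modelled here.

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi

def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the eta-phi plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))

def track_isolation(lepton, tracks, cone_size, veto_size=0.015):
    """Sum the pT of all tracks with veto_size <= dR < cone_size.

    The veto cone (an assumed value) excludes the lepton's own track.
    """
    total = 0.0
    for trk in tracks:
        dr = delta_r(lepton["eta"], lepton["phi"], trk["eta"], trk["phi"])
        if veto_size <= dr < cone_size:
            total += trk["pt"]
    return total

# Hypothetical electron and nearby tracks (pT in GeV).
electron = {"pt": 35.0, "eta": 0.50, "phi": 1.00}
tracks = [
    {"pt": 35.0, "eta": 0.50, "phi": 1.00},  # the electron's own track (vetoed)
    {"pt": 2.0, "eta": 0.60, "phi": 1.10},   # dR about 0.14
    {"pt": 1.5, "eta": 0.85, "phi": 1.00},   # dR about 0.35
    {"pt": 4.0, "eta": 0.50, "phi": 1.55},   # dR about 0.55
]

for cone in (0.3, 0.4, 0.6):
    print(cone, track_isolation(electron, tracks, cone))
```

Only the first two cone sizes correspond to what the reconstruction already stores; any other value, such as 0.6 here, has to be computed separately, which is exactly the student's problem.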
What were people doing without PAT
A primitive solution
• Calculating isolation is a standard task in CMS, and its algorithms are implemented by the Egamma Physics Object Group (POG).
• A configurable module produces the isolation value for the desired cone size and associates it with the electron.
• Then, in an analyzer, one can read the new isolation values …
Schematic view of the CMSSW Event Data Model (EDM): an EDProducer writes its products (here, IsolationResults) to the event content, which also holds reco::Electron, reco::Muon, …; EDFilters and EDAnalyzers read from the event content.
• In our example, the EDProducer is the standard Egamma isolation producer.
What is PAT
Doing this for every object and every needed property takes a lot of time, and …
PAT is a solution that makes analysis easier.
What is PAT
• PAT does NOT re-invent the wheel
  • prevents re-inventions
  • helps to standardize
  • spreads the best knowledge across a collaboration of 3000 physicists
• PAT is the consequence and completion of the Event Data Model
PAT advantages from an analyzer's point of view:
• Using approved algorithms and sensible defaults
• Profiting from recent developments
• Simplifying access via DataFormats
• Quick start into analysis for beginners
And some advantages for CMS as a whole:
• Crossing point between POGs and PAGs ('vertical integration')
• Channels expertise (POG and PAG contacts)
Code location (in CMSSW)
• DataFormats/PatCandidates: the pat::Candidate data formats
• PhysicsTools/PatAlgos: classes for pat::Candidate creation, algorithms, configs, …
• PhysicsTools/PatUtils: common utilities, isolation, object disambiguation, ...
• PhysicsTools/PatExamples: example analyzers built for tutorials
PAT DataFormats
• Access to event information within the EDM, as in the first example, is not so easy. Around a reco::Candidate one may need:
  • correction factors, object resolutions
  • JetFlavor, object ID
  • cluster shapes
  • generator match, trigger match
  • isolation (different from defaults)
  • associated tracks, BTag algorithms, JetCharge
  • TagInfos
  • more, ...
• With PAT Candidates you get this just by calling member functions!
• Note: each PAT Candidate IS a corresponding reco::RecoCandidate (and more)
PAT DataFormat
• PAT extends reco::Candidates so that they can store more information.
• The code of these new data types can be found in DataFormats/PatCandidates.
• All pat::[Candidates] inherit from their corresponding reco::[Candidates].
• A pat::[Candidate] is a reco::[Candidate] (plus more).
• Here [Candidate] is a general word for any physics object such as an electron, a muon, etc.
PAT DataFormat: pat::Electron as an example
• pat::Electron inherits all properties of reco::GsfElectron.
• It also inherits some properties from pat::PATObject:
  • Information about efficiency is accessible via the efficiency() method.
  • The best-matched MC electron is accessible via the genParticle() method.
  • Resolutions on energy, Pt and position are accessible via resolE(), resolPt() and so on…
  • Information about the trigger objects matched to this electron is accessible via the triggerObjectMatches() method.
• It also inherits isolation attributes from pat::Lepton (e.g. the userIsolation() and ecalIsoDeposit() methods).
• Some important attributes for identification are implemented in pat::Electron itself and are accessible via the electronID() method.
PAT DataFormats: embedding
• By default, reco::[Candidates] keep only references to their main constituents.
  • For example, reco::GsfElectron stores only references to its track and SuperCluster, which live in the track collection and the SuperCluster collection.
  • They are therefore not independent of other parts of the event.
  • Their size is optimized.
• In pat::Objects, information can be made persistent (embedded) or kept as a reference. Embedding
  • makes the object independent of other parts of the event (a pat::Electron can carry its own track and SuperCluster),
  • makes the EventContent more flexible,
  • makes working with FWLite easier.
More details in tomorrow's session.
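The difference between keeping a reference and embedding can be sketched with a small toy model. This is plain Python, not the CMSSW classes: ToyRecoElectron, ToyPatElectron, embed_track() and the dictionary used as an "event" are hypothetical names invented for the illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Track:
    pt: float

@dataclass
class ToyRecoElectron:
    # Reference only: an index into the event's track collection.
    track_index: int

    def track(self, event):
        return event["tracks"][self.track_index]

@dataclass
class ToyPatElectron:
    track_index: int
    embedded_track: Optional[Track] = None

    def embed_track(self, event):
        """Copy the referenced track into the object itself."""
        source = event["tracks"][self.track_index]
        self.embedded_track = Track(source.pt)

    def track(self, event):
        # Prefer the embedded copy; fall back to the reference.
        if self.embedded_track is not None:
            return self.embedded_track
        return event["tracks"][self.track_index]

# Full event: the track collection is present.
event = {"tracks": [Track(12.0), Track(35.0)]}
reco_ele = ToyRecoElectron(track_index=1)
pat_ele = ToyPatElectron(track_index=1)
pat_ele.embed_track(event)

# Slimmed event: the track collection has been dropped from the output.
slimmed_event = {}

print("embedded:", pat_ele.track(slimmed_event).pt)
try:
    reco_ele.track(slimmed_event)
except KeyError:
    print("reference: track collection is missing, reference cannot be resolved")
```

The price of embedding is a larger object, which is why the choice is left configurable.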
PAT DataFormats • Find further documentation on WorkBookPATDataFormats
PAT[Candidates]
• The first step is making a pat::[Candidate] from each reco::[Candidate].
• In this step, PAT imports all of the standard modules from the different POGs with their default values, runs all of them and combines the associated results with each object to make pat::Objects.
• It then puts all of the new objects into the event.
• MC matching is also done in this layer.
To see what the configs look like, have a look at these directories:
• MC matching
  • PhysicsTools/PatAlgos/python/mcMatchLayer0/
• Sequences from the POGs
  • PhysicsTools/PatAlgos/python/recoLayer0/
• Final commands to create PATCandidates
  • PhysicsTools/PatAlgos/python/producersLayer1/
New rule for isolation
• All of the configured isolations are calculated and stored in the object, but …
• the values returned by the standard isolation methods, such as
  • trackIso()
  • ecalIso()
  • hcalIso()
  are the reco::Object isolation values calculated during reconstruction.
• All of the pat::Object isolations are accessible via the userIsolation() method.
• Another nice feature in PAT:
  • IsoDeposits, which are the Et or Pt of the calorimeter hits or tracks around a lepton, are always available in pat::Objects to recalculate isolation.
• More details in tomorrow's session and exercises.
Configurability
• pat::Objects are designed to be as unrestricted as possible.
• They can store an unlimited number of isolations, efficiencies, identifications, resolutions and so on.
• The producers have to be very flexible and configurable to make use of this elegant feature.
• And they are!
• For example, you can add as many identification values as you want to electrons by configuring the PATElectrons module.
• The default configuration contains all of the default identifications.
• More details in tomorrow's session and exercises.
selectedPat[Candidates]
• Among all created pat::Objects, a simple and reasonable cut on Pt and Eta can reduce the number of stored objects and save space.
• Results are stored in selectedPat[Candidates].
• The default cuts are listed in WorkBookPATConfiguration.
• These default cuts can easily be changed in the main config file.
• All of the config files are in
  • PhysicsTools/PatAlgos/python/selectionLayer1
• Using PhysicsCutParser makes them very readable and easy to change.
The documentation of PhysicsCutParser can be found at SWGuidePhysicsCutParser.
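The logic of this selection step can be sketched as follows. This is a plain-Python toy, not a PAT config: select_candidates() and the thresholds (Pt above 10 GeV, |Eta| below 2.5) are assumptions for illustration and are not the documented defaults. In an actual config the same requirement would be written as a cut string, presumably something of the form 'pt > 10 & abs(eta) < 2.5'; see SWGuidePhysicsCutParser for the real syntax.

```python
def select_candidates(candidates, min_pt, max_abs_eta):
    """Keep candidates with pt > min_pt and |eta| < max_abs_eta."""
    return [c for c in candidates
            if c["pt"] > min_pt and abs(c["eta"]) < max_abs_eta]

# Hypothetical pat electrons (pT in GeV).
pat_electrons = [
    {"pt": 45.0, "eta": 0.3},
    {"pt": 6.0, "eta": -1.2},   # fails the pT cut
    {"pt": 25.0, "eta": 2.9},   # fails the eta cut
    {"pt": 12.0, "eta": -2.1},
]

selected_pat_electrons = select_candidates(pat_electrons,
                                           min_pt=10.0, max_abs_eta=2.5)
print(len(pat_electrons), "->", len(selected_pat_electrons))
```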
cleanPat[Candidates]
• After selecting the Candidates, each collection should be cleaned using the other collections.
• For example, the Jet collection should be cleaned of electrons, because the same reco::SuperCluster can be reconstructed both as an electron and as a Jet.
• A set of PATCandidateCleaners can help to resolve this double counting.
• The cross cleaning is the last step of PATifying.
• Results are stored in cleanPat[Candidates].
• The config files for cleaning are in PhysicsTools/PatAlgos/python/cleaningLayer1
For more detail, attend the Wednesday session.
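The idea of cross cleaning can be sketched with a toy that drops every jet lying too close to a selected electron. This is plain Python, not a PATCandidateCleaner: clean_jets(), the minimum distance of 0.3 and the example objects are assumptions, and the actual cleaners are configurable and need not behave like this simple removal.

```python
import math

def delta_r(a, b):
    """Distance in the eta-phi plane, with phi wrapped into [-pi, pi)."""
    dphi = (a["phi"] - b["phi"] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a["eta"] - b["eta"], dphi)

def clean_jets(jets, electrons, min_dr=0.3):
    """Keep only jets at least min_dr away from every electron."""
    return [j for j in jets
            if all(delta_r(j, e) >= min_dr for e in electrons)]

# Hypothetical selected collections (pT in GeV).
selected_electrons = [{"pt": 40.0, "eta": 0.20, "phi": 0.50}]
selected_jets = [
    {"pt": 42.0, "eta": 0.21, "phi": 0.52},   # same deposit as the electron
    {"pt": 60.0, "eta": -1.10, "phi": 2.80},  # genuine, well-separated jet
]

clean_jets_collection = clean_jets(selected_jets, selected_electrons)
print(len(selected_jets), "->", len(clean_jets_collection))
```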
EventContent
• Which part of the produced data should be saved in the output file?
• It is completely configurable, but there are some defaults:
  • By default, only cleanPat[Candidates] are saved.
• Add the EventContent to the output module.
Workflow
• Have a look at: WorkBookPATWorkflow
• Pre-production steps
• Basic collection
• Main collection (w/o cleaning)
• Main collection (with cleaning): by default, only this collection is stored in the output file.
Maximal Configurability
• Flexibility and user friendliness are sustained through maximal configurability.