Introduction to PAT
PAT Tutorial, CERN, December 2012
Lecturer: Andreas Hinzmann (CERN)
Tutors: Guillermo Breto (UCD), Sudhir Malik (Nebraska) and the PAT tutorial team
Content
• Part I: Brief review of CMSSW
  - Framework Essentials
  - Event Data Model (EDM)
• Part II: Introduction to PAT
  - The PAT Data Format
  - The PAT Workflow
Basic Concept
• The software of CMS is CMSSW
• Flexible structure based on the Event (Event Data Model, EDM):
  - Single data container in memory: edm::Event
  - Data are uniquely identified within an Event
  - Modular Event content
• Modular framework:
  - Modules are configurable and communicate through the Event
  - Data processing is steered via Python job configurations
Event Data Model (EDM)
• The EDM is centered around the concept of an Event
• An edm::Event is a C++ container for all data of a particular collision: RAW, reconstructed or your own data
• Data are passed from one module to the next via the Event, and are accessed only through the Event
• All objects in the Event may be individually or collectively stored in ROOT files that are directly browsable in ROOT
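The idea of a single container in which every product is uniquely identified can be illustrated with a toy model in plain Python. This is only a sketch: `ToyEvent`, `put` and `get_by_label` are hypothetical names chosen for illustration, not the CMSSW API (the real container is the C++ class edm::Event).

```python
# Toy model of an event-centred data store (NOT the CMSSW API).
# Each product is stored under a unique key, loosely mirroring how
# products are uniquely identified within an Event.

class ToyEvent:
    def __init__(self):
        self._products = {}

    def put(self, module_label, product, instance=""):
        """Store a product; refuse to overwrite an existing one."""
        key = (module_label, instance)
        if key in self._products:
            raise KeyError("product %r already exists" % (key,))
        self._products[key] = product

    def get_by_label(self, module_label, instance=""):
        """Retrieve a product by the label of the module that made it."""
        return self._products[(module_label, instance)]


event = ToyEvent()
event.put("muons", [{"pt": 42.0}, {"pt": 7.5}])
muons = event.get_by_label("muons")
print(len(muons))
```

Modules never hand data to each other directly in this picture; they only write to and read from the event, which is the point the slide makes.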
DataFlow in the Framework
• One executable, cmsRun, configured with Python files
• Those files contain configurations and parameters for modules written in C++
• You can compose your analysis with these modules:
  - Each module can be used more than once with different parameter settings
• The Python config file defines:
  - Which data is used
  - Which modules are executed, their parameters and execution order (path)
  - How these paths are connected to output files
Scheduled/Unscheduled mode
• Will be used in future versions of PAT
• An alternative to defining the full schedule of modules in a Python configuration is the so-called unscheduled mode
• In unscheduled mode:
  - Only the modules for the final objects are defined in the Python configuration
  - Necessary input is searched for automatically and the corresponding modules are scheduled automatically
• Example: Muon selection module in the configuration -> Muon building module scheduled automatically
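The on-demand idea behind unscheduled mode can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not how the framework is implemented: the `MODULES` table, the `request` function and the 30 GeV cut are all hypothetical.

```python
# Toy illustration of on-demand ("unscheduled") execution, NOT CMSSW code.
# Each hypothetical module declares which products it needs; requesting the
# final product triggers the modules that produce its inputs first.

MODULES = {
    # product name: (inputs, function building the product from its inputs)
    "muons": ([], lambda: [{"pt": 42.0}, {"pt": 7.5}, {"pt": 31.0}]),
    "selectedMuons": (
        ["muons"],
        lambda muons: [m for m in muons if m["pt"] > 30.0],
    ),
}

def request(product, event, executed):
    """Return `product`, running its producer (and its inputs) only if needed."""
    if product not in event:
        inputs, build = MODULES[product]
        args = [request(name, event, executed) for name in inputs]
        event[product] = build(*args)
        executed.append(product)
    return event[product]

event, executed = {}, []
selected = request("selectedMuons", event, executed)
print(executed)       # order in which the modules actually ran
print(len(selected))
```

Only the selection was requested, yet the muon-building step ran first because its output was needed, which mirrors the muon example on the slide.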
Framework modules
• Source
  - Reads events from a ROOT file
• OutputModule
  - Writes events to a file; can use filter decisions
• EDAnalyzer (read)
  - Reads a collection
  - Creates histograms
• EDProducer (read/write)
  - Reads a collection
  - Creates new products and writes a new collection into the Event
• EDFilter (read/write)
  - Decides whether or not to keep an event
  - Example of usage: controlling the analysis flow or skimming
Storing and managing Products in the Event
Products can be read and analyzed by an EDAnalyzer (read only) and stored in the Event by an EDProducer (read/write). Filtering is done with an EDFilter module (read/write). These are the basic steps of data processing.
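The division of labour between producer, filter and analyzer can be mimicked with three plain Python functions. This is a toy sketch, not the CMSSW base classes: the event is a plain dict, and the function names, the "highPtMuons" label and the 20 GeV threshold are assumptions made for illustration.

```python
# Toy versions of the three module kinds (NOT the CMSSW base classes).
# The "event" is a plain dict; all names below are hypothetical.

def producer(event):
    """Read a collection and add a new one to the event."""
    event["highPtMuons"] = [m for m in event["muons"] if m["pt"] > 20.0]

def event_filter(event):
    """Return True to keep processing this event, False to stop."""
    return len(event["highPtMuons"]) >= 1

def analyzer(event, histogram):
    """Read only: fill a (toy) histogram, never modify the event."""
    for muon in event["highPtMuons"]:
        histogram.append(muon["pt"])

def run_path(events):
    """Run producer -> filter -> analyzer on each event, in that order."""
    histogram = []
    for event in events:
        producer(event)
        if not event_filter(event):
            continue  # the filter decision ends the path for this event
        analyzer(event, histogram)
    return histogram

events = [
    {"muons": [{"pt": 35.0}, {"pt": 5.0}]},
    {"muons": [{"pt": 3.0}]},
    {"muons": [{"pt": 25.0}, {"pt": 50.0}]},
]
print(run_path(events))
```

The second event has no muon above threshold, so the filter stops it before the analyzer is reached.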
Framework modules
• There are routines to create skeletons for these modules:
  - mkedanlzr, mkedfltr, mkedprod
• These create the necessary substructure for the modules:
  - BuildFile.xml: needed for compilation
  - Myanalyzer_cfi.py: demo Python configuration
  - Myanalyzer.h: demo header file
  - Myanalyzer.cc: demo definition file
• Compilation with: $ scram b
• bin/, scripts/ and plugins/ directories are also possible
Framework Tools
• The Framework provides different tools to inspect the Event content and configuration files:
  - ROOT TBrowser
  - edmDumpEventContent
  - Python interactive mode
  - edmConfigEditor
What are the stored products?
$ root -l
[] TFile f("AOD.root")
[] new TBrowser()
Data inside the Event are called "Products". Each product is identified by four fields:
type : moduleLabel : productInstanceLabel : processName
Example: recoTracks_generalTracks__RECO (the productInstanceLabel is empty here)
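To make the four-field naming concrete, the example name above can be split into its parts with a few lines of plain Python. This is a sketch that assumes the underscore-separated form of the example and that no field itself contains an underscore; `parse_product_name` and `ProductName` are hypothetical helpers, not part of CMSSW.

```python
# Split a product name into its four fields. Assumes the
# underscore-separated form shown in the example above.

from collections import namedtuple

ProductName = namedtuple(
    "ProductName", ["type", "module_label", "instance_label", "process_name"]
)

def parse_product_name(branch):
    """Parse 'type_moduleLabel_productInstanceLabel_processName'."""
    # A trailing dot, if present, is dropped before splitting.
    fields = branch.rstrip(".").split("_")
    if len(fields) != 4:
        raise ValueError("expected 4 underscore-separated fields: %r" % branch)
    return ProductName(*fields)

name = parse_product_name("recoTracks_generalTracks__RECO")
print(name.type, name.module_label, repr(name.instance_label), name.process_name)
```

The double underscore in the example is what yields the empty instance label.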
What are the stored products?
$ edmDumpEventContent <filename>

C++ class type               moduleLabel              productInstanceLabel   processName
vector<reco::MET>            "tcMet"                  ""                     "RECO."
vector<reco::Muon>           "muons"                  ""                     "RECO."
vector<reco::Muon>           "muonsFromCosmics"       ""                     "RECO."
vector<reco::Muon>           "muonsFromCosmics1Leg"   ""                     "RECO."
vector<reco::PFCandidate>    "particleFlow"           ""                     "RECO."
vector<reco::PFCandidate>    "particleFlow"           "electrons"            "RECO."
vector<reco::PFJet>          "ak5PFJets"              ""                     "RECO."

Access a single product in a framework module (iEvent is the edm::Event passed to the module):
  edm::Handle<reco::MuonCollection> muons;
  iEvent.getByLabel("muons", muons);
reco::MuonCollection is a typedef for vector<reco::Muon>.
How to browse your cfg file
If you are a Python addict (and if not yet, maybe you will be soon ;-) ), you may want to inspect your config file in Python interactive mode:
$ python -i config_file_cfg.py
# to inspect the process path called "path"
>>> process.path
cms.Path(electronMatch+patElectrons+muonMatch+patMuons+pfPileUpIso+pfNoPileUpIso+pfAllNeutralHadrons+pfAllChargedHadrons+pfAllPhotons + …)
Ctrl + D to exit
How to browse your cfg file
You can inspect your config file using a graphical tool, the edmConfigEditor.
[Figure: edmConfigEditor screenshot with the labels "Graphical representation", "TreeView", "PropertyView" and "Box"]
Tool tip: double-click on modules to navigate through the configuration chain.
FWLite: A light version of EDM
This is ROOT with known data formats.
• PAT is fully compatible with (and even especially supports) FWLite
• No writing to the event content!
• Full framework ↔ FWLite: this is not an exclusive or!
• Python configuration, edm::Handle, TFileService, data access equivalent to EDM
• Very useful for plotting and interactive analysis
• Have a look at: WorkBookFWLite
FWLite gives access to classes
Automatic library loading:
[] gSystem->Load("libFWCoreFWLite")
[] AutoLibraryLoader::enable()
[] new TBrowser()
Part II
• Part I: Brief review of CMSSW
  - Framework Essentials
  - Event Data Model (EDM)
• Part II: Introduction to PAT
  - The PAT Data Format
  - The PAT Workflow
What is PAT?
PAT (Physics Analysis Toolkit) is a toolkit, part of the CMSSW framework, aimed at performing analysis. It provides:
• Interface:
  - Between RECO expertise and analysis contacts
  - Simplifies access via data formats
  - Channels the expertise of POG and PAG
• Common tool:
  - Approved algorithms and sensible defaults
  - Synergy (everybody can profit from recent developments)
  - Quick start into analysis for beginners
• Common format:
  - Facilitates transfer and comparisons
  - PAG common configurations
  - Sustained provenance
Top 5 Analyst's Problems
PAT can help you with these problems.
PAT - DATA Formats
• Representation of reconstructed physics particles:
  - pat::Candidate (pat::Jet, pat::Photon, pat::Muon, etc.)
• There is a base class common to all kinds of "particles": the reco::Candidate
• It provides access to:
  - kinematics (pt, mass, eta, phi, etc.)
  - underlying components (link to track, supercluster, etc.)
  - navigation among the daughters (to access the daughter particles and their attributes)
• The pat::Object inherits from the reco::Candidate
• You can add extra information to PAT candidates with respect to reco candidates, such as:
  - Isolation
  - MC matching
  - Trigger matching
pat::Candidate = reco::Candidate + more
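The relation "candidate plus more" is ordinary inheritance, which a toy Python model can show. This is a sketch only: `ToyCandidate` and `ToyPatMuon` are hypothetical stand-ins for the C++ classes, and the `isolation` and `gen_match` attributes are simplified placeholders for the extra information listed above. The kinematic formulas are the standard ones for transverse momentum, azimuthal angle and pseudorapidity.

```python
import math

class ToyCandidate:
    """Stand-in for a generic particle candidate with basic kinematics."""
    def __init__(self, px, py, pz):
        self.px, self.py, self.pz = px, py, pz

    def pt(self):
        return math.hypot(self.px, self.py)

    def phi(self):
        return math.atan2(self.py, self.px)

    def eta(self):
        # pseudorapidity: eta = asinh(pz / pt); undefined for pt == 0
        return math.asinh(self.pz / self.pt())


class ToyPatMuon(ToyCandidate):
    """Candidate plus extra analysis-level information."""
    def __init__(self, px, py, pz, isolation=None, gen_match=None):
        super().__init__(px, py, pz)
        self.isolation = isolation    # placeholder for an isolation value
        self.gen_match = gen_match    # placeholder for a matched generator particle

muon = ToyPatMuon(3.0, 4.0, 0.0, isolation=0.8)
print(muon.pt(), muon.eta(), muon.isolation)
```

Everything that works on the base class keeps working on the derived one; the extra information is reached through the same object.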
Facilitated Access to Event Information
• PAT summarizes information for you:
  - The reco::Candidate is a base class common to all kinds of "particles"
  - It has a lot of information from different subdetectors and reconstruction algorithms
• PAT objects summarize this information, which is distributed over different collections
• When you are using PAT, getting this information is just a matter of calling a member function!
PAT - DATA Formats 2 This is the hierarchy of pat::Candidates
PAT - DATA Formats 3
Have a look in the online documentation: https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookPATDataFormats
The PAT Workflow
Steps of the PAT workflow:
• Candidate Creation: aodReco
  - Collection of information which is not in AOD/RECO, e.g. isolation variables, overlaps, …
• Candidate Production: patCandidates
  - Translation of the collected information into pat::Objects, e.g. pat::Muon, pat::Electron, pat::Jet
• Candidate Selection: selectedPatCandidates
  - Selection of interesting objects with specific properties, e.g. pT > 30 GeV
• Candidate Disambiguation: cleanPatCandidates
  - Due to the way objects are reconstructed in CMS there are ambiguities, e.g. two objects sharing an energy deposit or track
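The selection and disambiguation steps can be sketched with plain Python dictionaries. This is a toy under stated assumptions, not the PAT implementation: the helper names, the ΔR overlap criterion with a 0.5 threshold, and the choice to drop overlapping jets outright are all assumptions made for illustration. Only the pT > 30 GeV cut is taken from the slide.

```python
import math

def delta_r(a, b):
    """Angular distance in the (eta, phi) plane, with phi wrapped to [-pi, pi]."""
    dphi = math.remainder(a["phi"] - b["phi"], 2.0 * math.pi)
    return math.hypot(a["eta"] - b["eta"], dphi)

def select(candidates, min_pt=30.0):
    """Selection step: keep candidates above a pT threshold (GeV)."""
    return [c for c in candidates if c["pt"] > min_pt]

def clean(jets, leptons, min_dr=0.5):
    """Disambiguation step: drop jets that overlap with a selected lepton."""
    return [j for j in jets
            if all(delta_r(j, lep) >= min_dr for lep in leptons)]

electrons = [{"pt": 45.0, "eta": 0.10, "phi": 1.00}]
jets = [
    {"pt": 50.0, "eta": 0.12, "phi": 1.05},   # overlaps with the electron
    {"pt": 80.0, "eta": -1.30, "phi": -2.00},
    {"pt": 12.0, "eta": 2.00, "phi": 0.30},   # fails the pT selection
]

selected_jets = select(jets)
clean_jets = clean(selected_jets, select(electrons))
print([j["pt"] for j in clean_jets])
```

The first jet is most likely the electron's own energy deposit reconstructed a second time, which is the kind of ambiguity the cleaning step is meant to resolve.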
The Code Location
• DataFormats/PatCandidates
  - Definition of all PAT candidates
  - pat::Photon, pat::Electron, pat::Muon, pat::Tau, pat::Jet, pat::MET, …
• PhysicsTools/PatAlgos
  - Implementation and filling of all data formats
  - Definition of the common workflow and PAT tools
• PhysicsTools/PatUtils
  - Definition of common tools and helper functions used in PatAlgos
• PhysicsTools/PatExamples
  - Location of many examples, e.g. all non-trivial examples used during this tutorial
Event content
$ edmEventSize -v file.root
Documentation
• SWGuidePAT and WorkBookPAT: main documentation pages
• WorkBookPATDataFormats: description of all PAT candidates
• WorkBookPATWorkflow: description of the PAT workflow
• WorkBookPATConfiguration: description of the configuration of PAT
• SWGuidePATTools: description of all PAT tools
• WorkBookPATTutorial: tutorials and examples to get started
• SWGuidePATRecipes: installation recipes
• SWGuidePATEventSize: tools for event size estimates
• And last but not least: this tutorial and/or former tutorials...
Installation recipe
• PAT is part of every CMSSW release: take it from the release unless you face a problem only recently addressed in PAT development
• Latest recommended PAT releases: https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuidePATRecipes#CMSSW_5_2_X_CMSSW_5_3_X_pro_2012
• Latest development PAT recipes: https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuidePATReleaseNotes52X
• This tutorial:
  cmsrel CMSSW_5_3_6
  cd CMSSW_5_3_6/src/
  cmsenv
  addpkg DataFormats/PatCandidates V06-05-06-03
  addpkg PhysicsTools/PatAlgos V08-09-46
  addpkg FWCore/GuiBrowsers V00-00-70
  scram b -j 9
Exercises
By now you should be prepared to do the following exercises on WorkBookPATTutorial:
• Exercise 1 (WorkBookPATDocNavigationExercise): The PAT Documentation is one of the most looked after parts of the WorkBook. Knowing the documentation and how to use it can speed up your learning enormously. Learn more about the PAT Documentation and how to make effective use of it.
• Exercise 2 (WorkBookTupleCreationExercise): Learn how the default PAT tuple is produced.
• Exercise 3 (SWGuidePATConfigExercise): Learn how to configure PAT and its tools.
Have fun!