Data Discovery Tools, DQ2 Enduser Tools and Physics Analysis Tools
Nurcan Ozturk, University of Texas at Arlington
SCHOOL ON HEP@TR-GRID, April 30 – May 2, 2008
Turkish Atomic Energy Authority (TAEA), Ankara, Turkey
Outline
• User’s work-flow for Data Analysis
• Data Discovery Tools
  • AMI - ATLAS Metadata Interface
  • TAG Browser - ELSSI
• DQ2 Enduser Tools
• ATLAS Analysis Model
  • Analysis Model Forum Recommendations
  • Derived Physics Data (DPD)
• Analyzing the Data (inside or outside Athena)
  • AthenaRootAccess (ARA)
  • EventView
User’s Work-flow for Data Analysis
• Set up the analysis code
• Locate the data
• Set up the analysis job
• Submit to the Grid
• Retrieve the results
• Analyze the results
ATLAS Metadata Interface (AMI)
http://ami3.in2p3.fr:8080/opencms/opencms/AMI/www/index.html
• AMI is a bookkeeping project.
• AMI is a generic cataloguing system (a database application). The majority of datasets currently catalogued in AMI are Monte Carlo datasets. AMI reads information from the task request system and correlates it with information read from the production database.
• AMI contains the physics metadata for:
  • 2008 real data
  • 2008 FDR exercise
  • 2007 Cosmics runs (M5 data)
  • 2006/2007 service challenge datasets
  • StreamTest
  • Data Challenges DC1 and DC2 / Rome Production System
  • Combined Test Beam
• AMI also powers the TagCollector release management tool.
AMI Tutorial
http://ami3.in2p3.fr:8080/opencms/opencms/AMI/www/Tutorial/
or
http://ami3.in2p3.fr:8080/opencms/opencms/AMI/www/Tutorial/FastTrackTutorial.html
• What is AMI?
• Where does AMI get its information?
• How do I search for a dataset?
• Which information can I get from the result of an AMI dataset search?
• What is the schema of the AMI dataset catalogue?
• Why can I sometimes not find a dataset when I can see its existence in other catalogues?
• Can I refine the search?
• Can I simply browse all of the information in AMI?
• Can I bookmark an AMI page?
• Why doesn't the back button of my browser work?
• Can I use AMI without going through the web interface?
• How can I extract information from AMI?
• How do I write to AMI?
How Do I Search For A Dataset? – Simple Search
Follow the link to the “simple search interface” from the tutorial page.
(Screenshot of the search form, with the text box annotated “type here”.)
Results From Simple Search (1)
(Screenshot of the results page, with annotations marking a pull-down menu and several links.)
Results From Simple Search (2)
Clicking on the Provenance link shows which version of the Athena software was used in making the evgen/digit/reco steps.
Results From Simple Search (3)
Clicking on the DQ2 link shows the DQ2 dataset metadata, the existing replicas of the dataset, and a link to the PanDA monitor.
Results From Simple Search (4)
Clicking on the PANDA link takes you to the dataset browser.
How Do I Search For A Dataset? – Advanced Search
Follow the link to the “Advanced search interface” from the tutorial page.
Results From Advanced Search
TAG
• ATLAS will produce petabytes of data, so a system of event-level metadata is needed to quickly identify and select the events that are of interest for a given analysis. This is provided by TAG files and the TAG database.
• TAG files are built from AOD by offline analysis-style code. TAG files are then loaded into the TAG database.
• TAG files store information about the status of each sub-detector, trigger and physics object ID.
• For instance, for FDR-1 data TAGs contain:
  • Event information:
    • Run number, event number, luminosity block, number of vertices and tracks, primary vertex position. (Luminosity has an entry but it is not filled.)
    • Variables such as the summed cell Et, missing Et magnitude, and phi
  • Trigger information: BitMasks encode pass, pass after prescale for each trigger item/chain
  • Physics objects:
    • Multiplicity of physics objects and the Pt, eta, phi for the highest Pt objects
    • A tightness criterion for e/mu/gamma is included, as are b-tag likelihoods and the tau candidate likelihood.
  • PhysWords: 32-bit TAG Word. For b-physics for instance:
    • Bit 0: HighPtMuonPair, Bit 1: J/Psi candidate, Bit 2: Upsilon candidate.
• See more details on FDR & TAGs in a talk by James Frost, April Exotics Working Group meeting
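The PhysWords idea can be sketched in a few lines of Python. This is only an illustration of how flags packed into a 32-bit word are tested. The bit assignments are the three b-physics bits quoted above, while the function names and the dictionary are hypothetical and are not part of any ATLAS package.

```python
# Bit layout taken from the b-physics example on the slide.
# Function and variable names are hypothetical, for illustration only.
BPHYS_BITS = {
    0: "HighPtMuonPair",
    1: "J/Psi candidate",
    2: "Upsilon candidate",
}


def has_flag(word, bit):
    """Return True if the given bit is set in the TAG word."""
    return bool(word & (1 << bit))


def decode_phys_word(word, bit_names=BPHYS_BITS):
    """Return the names of all known flags set in a 32-bit TAG word."""
    word &= 0xFFFFFFFF  # keep only the low 32 bits
    return [name for bit, name in sorted(bit_names.items()) if has_flag(word, bit)]


# An event flagged as a high-Pt muon pair and an Upsilon candidate.
example_word = 0b101
print(decode_phys_word(example_word))
```

A selection such as "J/Psi candidates only" then reduces to a single bitwise test, which is what makes such words cheap to query at the event level.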
How Does TAG Selection Work?
• Use the TAG file as an input to EventSelector or PoolTAGInput.
• Make sure the matching Pool file (e.g. AOD) is in the PoolFileCatalog.
• Define your query of the TAG content.
• Run the job.
• Very flexible:
  • Can use the TAG to preselect the events from an AOD in which you are interested, passing only those to an analysis algorithm.
  • Can use the TAG to write out an AOD (or ESD, RDO) of only the selected events.
• How to learn more? Good tutorials are available already:
  • https://twiki.cern.ch/twiki/bin/view/Atlas/FeedBackForTags
  • https://twiki.cern.ch/twiki/bin/view/Atlas/TagForEventSelection
  • https://twiki.cern.ch/twiki/bin/view/Atlas/TagForEventSelection#Building_Tags_Under_12_0_31 (create tag files)
  • https://twiki.cern.ch/twiki/bin/view/Atlas/PhysicsAnalysisWorkBookTAG
  • https://twiki.cern.ch/twiki/bin/view/Atlas/PhysicsAnalysisWorkBookTAGAnalysis
  • https://twiki.cern.ch/twiki/bin/view/Atlas/TopFdrTag
  • http://twiki.mwt2.org/bin/view/Main/TutorialTag080318 (All the above links are available from this one.)
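The preselection step can be pictured with a small stand-alone sketch. It does not use Athena or POOL. The TAG records are plain dictionaries, and the attribute names (NLooseMuon, MissingET) and their values are assumptions made up for the illustration. The point is only that the query runs on the small TAG records, and just the matching (run, event) references are handed on to the expensive AOD reading.

```python
def preselect(tag_records, query):
    """Yield (run, event) for each TAG record that satisfies the query."""
    for record in tag_records:
        if query(record):
            yield record["RunNumber"], record["EventNumber"]


# Hypothetical TAG content: one small record per event.
tags = [
    {"RunNumber": 3051, "EventNumber": 1, "NLooseMuon": 2, "MissingET": 12.0},
    {"RunNumber": 3051, "EventNumber": 2, "NLooseMuon": 0, "MissingET": 55.0},
    {"RunNumber": 3051, "EventNumber": 3, "NLooseMuon": 1, "MissingET": 41.5},
]


def muon_plus_met(record):
    """Example query: at least one muon and missing Et above 20."""
    return record["NLooseMuon"] >= 1 and record["MissingET"] > 20.0


# Only these events would be read from the AOD by the analysis algorithm.
selected = list(preselect(tags, muon_plus_met))
print(selected)
```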
TAG Browser – ELSSI (1)
• TAGs are accessed by users via a web interface called ELSSI, the ATLAS Event Level Selection Service Interface.
• For FDR-1 data (tutorial): https://atldbdev01.cern.ch/tagservices/tutorial/index.htm
• For FDR-1 data: https://atldbdev01.cern.ch/tagservices/fdr/index.htm
Note: you need Firefox to see this page, as Jack Cranshaw informed me.
TAG Browser – ELSSI (2)
How to use ELSSI:
• Define a query to select runs, streams, data quality, trigger chains, ...
• Review the query
• Execute the query and retrieve the TAG file (a ROOT file)
The Client Tools to Retrieve Data
• DQ2 enduser tools
  • Includes dq2_xxx (dq2_ls, dq2_get, etc.) commands
  • Available to download from: https://twiki.cern.ch/twiki/bin/view/Atlas/UsingDQ2#Download
  • The setup files are edited to accommodate local needs (dq2.sh, setup.sh)
  • Available on AFS at CERN:
      source /afs/cern.ch/project/gd/LCG-share/current/etc/profile.d/grid_env.sh
      source /afs/cern.ch/atlas/offline/external/GRID/ddm/endusers/setup.sh.CERN
• gLite UI (User Interface)
  • Includes lcg-cp, egee-gridftp-xxx
  • Available on AFS at CERN:
      source /afs/usatlas.bnl.gov/lcg/current/etc/profile.d/grid_env.sh
      source /afs/cern.ch/project/gd/LCG-share/current/external/etc/profile.d/grid-env.sh
• Why the gLite UI may be needed in OSG: dq2_put/get may use some gLite commands depending on the site they interact with (TiersOfATLASCache.py description): lcg-lg, lcg-rf, glite-gridftp-ls, lcg-gt
• More info: https://twiki.cern.ch/twiki/bin/view/Atlas/DDMEndUserTutorial
DQ2 Enduser Tools
• dq2_ls: returns a list of datasets matching a given pattern
    dq2_ls fdr08_run1.0003051.StreamEgamma.merge.AOD.o1_r6_t1
• dq2_get: copies the files from DQ2 to a local area
    dq2_get -rv fdr08_run1.0003051.StreamEgamma.merge.AOD.o1_r6_t1
• dq2_put: registers datasets to DQ2
• dq2_poolFCjobO: creates a PoolFileCatalog and Athena job options for DQ2 datasets
• dq2_register: uploads and registers external generator input files to DQ2
• dq2_cleanup: deletes a dataset from a site's catalog and storage
• dq2_sample: copies a portion of an existing dataset and registers it to DQ2
More info: https://twiki.cern.ch/twiki/bin/view/Atlas/UsingDQ2#DQ2_end_user_tools
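The dataset name passed to dq2_ls and dq2_get is a dot-separated string, and pattern matching on such names can be sketched offline. The snippet below does not talk to DQ2 at all: it splits the example name into fields and matches wildcard patterns against a small in-memory list. The field labels (project, run, stream, ...) are an assumed reading of the name, and every dataset name in the list except the one quoted above is invented for the illustration.

```python
from collections import namedtuple
from fnmatch import fnmatchcase

# Assumed meaning of the six dot-separated fields (labels are hypothetical).
DatasetName = namedtuple(
    "DatasetName", "project run stream step data_type version"
)


def parse_dataset_name(name):
    """Split a dataset name into its dot-separated fields."""
    parts = name.split(".")
    if len(parts) != 6:
        raise ValueError("expected 6 dot-separated fields, got %d" % len(parts))
    return DatasetName(*parts)


def list_matching(catalogue, pattern):
    """Return the sorted names in an in-memory list that match a wildcard pattern."""
    return sorted(name for name in catalogue if fnmatchcase(name, pattern))


# Stand-in for a catalogue; only the first name appears on the slide.
catalogue = [
    "fdr08_run1.0003051.StreamEgamma.merge.AOD.o1_r6_t1",
    "fdr08_run1.0003051.StreamMuon.merge.AOD.o1_r6_t1",
    "fdr08_run1.0003051.StreamEgamma.merge.TAG.o1_r6_t1",
]

print(parse_dataset_name(catalogue[0]))
print(list_matching(catalogue, "fdr08_run1.*.StreamEgamma.*.AOD.*"))
```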
Analysis Model Forum Recommendations on the Analysis Model
(Diagram not reproduced; one surviving label reads: “includes metadata + simple UserData”.)
Derived Physics Data - DnPD
• Primary D1PD: POOL-based DPD produced by the GRID production system. There are expected to be O(10) primary DPDs, so the contents will not be very specific to an analysis. Compared to the AOD, it is expected to be skimmed (keeping only interesting events), slimmed (keeping only interesting objects, for example electrons and muons), and thinned (keeping only the subset of information inside objects that is relevant in future steps).
  • An example job options file: AODtoDPD.py (see CVS)
  • Packages in CVS: TopDPDMaker, TauDPDMaker, BPhysicsDPDMaker, SUSYDPDMaker
• Secondary D2PD: POOL-based DPD with more analysis-specific information. Typically, this is produced from the primary DPD and may be created using an Athena tool like EventView.
  • SimpleThinningExample
  • HighPtViewDPDThinningTutorial
• Tertiary D3PD: Does not need to be POOL-based; it includes flat ntuples.
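The three reduction steps can be made concrete with a toy example. This is a sketch in plain Python that follows the definitions given on this slide; it is not how the DPDMaker packages are implemented. Events are dictionaries of object collections, and all collection names, field names and values are invented.

```python
def skim(events, keep_event):
    """Skimming: keep only the interesting events."""
    return [event for event in events if keep_event(event)]


def slim(event, collections):
    """Slimming (as defined above): keep only the interesting object collections."""
    return {name: objs for name, objs in event.items() if name in collections}


def thin(event, fields):
    """Thinning (as defined above): keep only selected information inside each object."""
    return {
        name: [{key: obj[key] for key in fields if key in obj} for obj in objs]
        for name, objs in event.items()
    }


def has_lepton(event):
    """Example skim criterion: at least one electron or muon."""
    return bool(event["electrons"] or event["muons"])


# Toy AOD-like input with two events.
events = [
    {
        "electrons": [{"pt": 35.0, "eta": 0.4, "cells": [1, 2, 3]}],
        "muons": [],
        "jets": [{"pt": 60.0, "eta": 1.1, "cells": [4, 5]}],
    },
    {
        "electrons": [],
        "muons": [],
        "jets": [{"pt": 25.0, "eta": -2.0, "cells": [6]}],
    },
]

dpd = [
    thin(slim(event, {"electrons", "muons"}), ("pt", "eta"))
    for event in skim(events, has_lepton)
]
print(dpd)
```

Each step shrinks the data along a different axis (events, collections, per-object content), which is why the three are usually applied together.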
Analyzing the Data
• Inside Athena
  • Interactive or batch, using C++ or Python code.
  • Needs a part of Athena (which part depends on user needs).
  • Provides full access to all tools and services.
• Outside Athena – AthenaRootAccess (ARA)
  • CINT, Python, or compiled C++ code.
  • Does not need a full Athena installation (expected 1 GB)
  • Not all classes are available (for example, calo cells)
• Important: both methods use the same files as input.
ARA - AthenaRootAccess
• Allows one to read an AOD in ROOT as one would read a normal ntuple (without using Athena).
• The goal is to seamlessly use Athena tools.
• One can use identical code/tools to run on ESDs, AODs, DPDs.
• The names of the variables in the AOD ROOT tree are the same as in the AOD.
• Limitations:
  • It uses the transient classes and converters of the ATLAS software, so a portion of the offline software is needed: a ~1 GB distribution including Athena libraries.
  • Tools and data that need detector description, conditions, B-field, etc. cannot be called in ARA. However, this type of information can be put in UserData in the DPD.
  • Gaudi-based classes (like AlgTools, Services) don’t work in ARA. Wrapping machinery is needed to reuse the code in Athena/ARA.
ARA Examples (1)
• CINT macros
  • Easy development (change code and run)
  • Run time is slow: ~10x that of compiled C++ code
• Compiled C++ code
  • Slower development (change code, recompile, cannot reload libs)
  • Fastest run time
  • Integrates easily back into Athena
• Python scripts
  • Easy development (change code, reload and run)
  • A simple example shows a run time ~3x that of compiled C++ code
  • May be able to compile Python
  • Integration of developed code into Athena?
• Examples on Twiki and in the release:
  • https://twiki.cern.ch/twiki/bin/view/Atlas/AthenaROOTAccess
  • PhysicsAnalysis/AthenaROOTAccessExamples
ARA Examples (2)
• Available in CVS under PhysicsAnalysis/AthenaROOTAccessExamples
• A Python script is needed to open the file and set up the transient tree:
    lxplus:~> get_files AthenaROOTAccess/test.py
• Compiled C++ example:
    lxplus:~> root
    root [0] TPython::Exec("execfile('test.py')");
    root [1] CollectionTree_trans = (TTree *)gROOT->Get("CollectionTree_trans");
    root [2] ClusterExample ce; // Example class in AthenaROOTAccessExamples
    root [3] ce.plot(CollectionTree_trans);
    root [4] TruthInfo ti;
    root [5] ti.truth_info(CollectionTree_trans);
• test.py takes about 20 secs to load the necessary dictionaries
• One can recompile and then restart from the beginning
ARA Examples (3)
• CINT example:
    lxplus:~> root
    root [0] TPython::Exec("execfile('test.py')");
    root [1] CollectionTree_trans = (TTree *)gROOT->Get("CollectionTree_trans");
    root [2] gROOT->LoadMacro("AthenaROOTAccessExamples/macros/cluster_example.C");
    root [3] plot(CollectionTree_trans);
• One can now edit cluster_example.C and re-run LoadMacro
• Python example:
    lxplus:~> python -i test.py
    >>> import AthenaROOTAccessExamples.cluster_example
    >>> AthenaROOTAccessExamples.cluster_example.plot(tt)
• One can now edit cluster_example.py and re-run:
    >>> reload(AthenaROOTAccessExamples.cluster_example)
    >>> AthenaROOTAccessExamples.cluster_example.plot(tt)
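The edit-and-reload cycle of the Python example can be tried without Athena. The sketch below writes a tiny stand-in module to a temporary directory, imports it, rewrites it, and reloads it. The module name cluster_example_demo and its plot function are hypothetical stand-ins. The session above uses the Python 2 built-in reload; in Python 3 the same function is importlib.reload, which is what this sketch uses.

```python
import importlib
import os
import sys
import tempfile

# Do not cache bytecode, so the reload always re-reads the edited source.
sys.dont_write_bytecode = True


def write_module(directory, name, body):
    """Write a module source file and return its path."""
    path = os.path.join(directory, name + ".py")
    with open(path, "w") as handle:
        handle.write(body)
    return path


workdir = tempfile.mkdtemp()
sys.path.insert(0, workdir)

# First version of the stand-in analysis module.
write_module(
    workdir,
    "cluster_example_demo",
    "def plot(tree):\n    return 'v1 plot of ' + tree\n",
)
importlib.invalidate_caches()
import cluster_example_demo

first = cluster_example_demo.plot("tt")

# "Edit" the module, then reload it in the same session.
write_module(
    workdir,
    "cluster_example_demo",
    "def plot(tree):\n    return 'v2 plot of ' + tree\n",
)
importlib.reload(cluster_example_demo)
second = cluster_example_demo.plot("tt")

print(first)
print(second)
```

The slow part of a real session (loading dictionaries in test.py) happens once; only the user's own module is re-read on each reload.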
Analysis Frameworks: EventView (1)
• This framework provides general tools for common analysis tasks like
  • particle selection
  • overlap removal
  • observable calculation
  • combinatorics
  • recalibration
  • systematics evaluation
  • generating ntuples
• Users can perform a great deal of their analyses in Athena by chaining and configuring a set of these tools and producing an ntuple for further analysis in ROOT.
• Twiki page: https://twiki.cern.ch/twiki/bin/view/Atlas/EventView
Analysis Frameworks: EventView (2)
• Though this style of "modular" analysis usually does not require writing C++, the EventView framework is completely extensible, so if necessary users can easily develop and mix their own C++ tools with the common EventView tools and share their configurations and tools with other collaborators.
• Most users are introduced to EventView through one of the "View" packages (e.g. TopView, SusyView, HighPtView), which for the most part collect configurations of EventView tools for a specific set of analyses and produce a standard ntuple output.
• These users typically start by analyzing the View ntuples produced by the various physics working groups, and then move on to re-configuring and re-running the respective View package if they require additional tuning for their specific analyses.
• There are also efforts to evolve (the persistent piece of) EventView in the context of AthenaROOTAccess.