CMS RooStats Higgs Combination Package. Giovanni Petrucciani (UCSD)
The combination package • A package that builds an executable that computes limits and significances: combine input -M method [options] • Input can be a text “datacard” or an arbitrary RooStats workspace saved in a ROOT file. • Internally, it has two main components: • text2workspace: a Python program that reads datacards and creates a RooStats model • the part that implements the statistical methods
Datacards • Started from a simple format for counting experiments: a text table with the observed events and expected yields in each channel. • Progressively enhanced with: • Support for more pdfs for systematic uncertainties (e.g. Gamma distributions, asymmetric errors, ...) • Use of shapes instead of just counting: • Binned shapes: plain ROOT histograms • Arbitrary shapes (RooAbsPdf) • Data as RooDataHist and RooDataSet (or just TTrees)
Simple datacard example
    # one channel, we observe 0 events
    bin          1
    observation  0
    ------------
    # expected events for signal and backgrounds
    bin      1       1
    process  ggh4G   Bckg
    process  0       1
    rate     4.76    1.47
    ------------
    deltaS   lnN   1.20   -
    deltaB   lnN   -      1.50
Here deltaS is a 20% uncertainty on the signal and deltaB a 50% uncertainty on the background.
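To show how a reader such as text2workspace can turn this table into numbers, here is a hypothetical minimal parser. The function name parse_card and the output layout are assumptions for illustration; it only understands the lines used in the example above, not the full datacard grammar. In the second process row, 0 marks the signal and positive ids mark backgrounds.

```python
# Hypothetical minimal parser for the simple counting datacard above.
# It only understands bin/observation/process/rate lines and lnN rows;
# the real text2workspace supports much more (shapes, other pdfs, ...).

CARD = """\
# one channel, we observe 0 events
bin          1
observation  0
------------
# expected events for signal and backgrounds
bin      1       1
process  ggh4G   Bckg
process  0       1
rate     4.76    1.47
------------
deltaS   lnN   1.20   -
deltaB   lnN   -      1.50
"""


def parse_card(text):
    """Return observations, per-column processes and lnN systematics."""
    rows = [line.split() for line in text.splitlines()
            if line.strip() and not line.startswith(("#", "-"))]
    card = {"observation": {}, "columns": [], "systematics": {}}
    bin_rows, process_rows = [], []
    for key, *values in rows:
        if key == "bin":
            bin_rows.append(values)            # 1st: observed bins, 2nd: columns
        elif key == "observation":
            card["observation"] = dict(zip(bin_rows[0], map(float, values)))
        elif key == "process":
            process_rows.append(values)        # 1st: names, 2nd: numeric ids
        elif key == "rate":
            names, ids = process_rows
            for b, name, pid, rate in zip(bin_rows[1], names, ids, values):
                card["columns"].append(
                    {"bin": b, "process": name, "id": int(pid), "rate": float(rate)})
        elif values and values[0] == "lnN":
            # "-" means the systematic does not act on that column
            card["systematics"][key] = [None if v == "-" else float(v)
                                        for v in values[1:]]
    return card
```

Calling parse_card(CARD) yields one column per (bin, process) pair plus one list of log-normal factors per named systematic.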
Complex counting experiment: multiple channels
    bin          e_tau  mu_tau  e_mu
    observation  517    540     101
    ------------------------------------------------------
    bin      e_tau  e_tau  e_tau  mu_tau  mu_tau  mu_tau  e_mu   e_mu  e_mu
    process  higgs  ZTT    QCD    higgs   ZTT     QCD     higgs  ZTT   other
    process  0      1      2      0       1       2       0      1     2
    rate     0.34   190    327    0.57    329     259     0.15   88    14
    ------------------------------------------------------
    lumi   lnN  1.11  -     -     1.11  -     -     1.11  -     1.11
    tauid  lnN  1.23  1.23  -     1.23  1.23  -     -     -     -
    ZtoLL  lnN  -     1.04  -     -     1.04  -     -     1.04  -
    effic  lnN  1.04  1.04  -     1.04  1.04  -     1.04  1.04  1.04
    QCDel  lnN  -     -     1.20  -     -     -     -     -     -
    QCDmu  lnN  -     -     -     -     -     1.10  -     -     -
    other  lnN  -     -     -     -     -     -     -     -     1.1
• Individual backgrounds treated separately • Correlated effects of systematics • Systematics with names, to allow datacards to be combined
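The correlation mechanism can be sketched as follows, assuming the usual log-normal convention in which an lnN entry κ multiplies a column's yield by κ^θ, with θ a unit-Gaussian nuisance shared by every column that carries the same systematic name. The function expected_yields and the NA placeholder are hypothetical names for this illustration; the numbers are the ones in the table above.

```python
# Sketch assuming the usual log-normal convention: an lnN entry kappa scales
# a column's yield by kappa**theta, with theta a unit-Gaussian nuisance.
# Columns sharing a systematic name share the same theta, hence are correlated.

RATES = [0.34, 190, 327, 0.57, 329, 259, 0.15, 88, 14]
NA = None  # "-" in the datacard: the systematic does not affect this column
SYSTS = {
    "lumi":  [1.11, NA, NA, 1.11, NA, NA, 1.11, NA, 1.11],
    "tauid": [1.23, 1.23, NA, 1.23, 1.23, NA, NA, NA, NA],
    "ZtoLL": [NA, 1.04, NA, NA, 1.04, NA, NA, 1.04, NA],
    "effic": [1.04, 1.04, NA, 1.04, 1.04, NA, 1.04, 1.04, 1.04],
    "QCDel": [NA, NA, 1.20, NA, NA, NA, NA, NA, NA],
    "QCDmu": [NA, NA, NA, NA, NA, 1.10, NA, NA, NA],
    "other": [NA, NA, NA, NA, NA, NA, NA, NA, 1.1],
}


def expected_yields(rates, systs, thetas):
    """Expected yield per column for given nuisance values (missing ones = 0)."""
    out = [float(r) for r in rates]
    for name, kappas in systs.items():
        theta = thetas.get(name, 0.0)
        for i, kappa in enumerate(kappas):
            if kappa is not None:
                out[i] *= kappa ** theta
    return out
```

Setting {"lumi": 1.0} raises every column that lists lumi by 11% at once, while columns marked "-" are untouched: that shared θ is what "correlated effects of systematics" means here.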
From Counting to Shapes • For counting experiments, the likelihood in each channel is built from the sum of the expected yields of all contributing physics processes: N(exp) = sum( N(exp, proc), … ); pdf = Poisson( N(obs) | N(exp) ); full pdf = PROD( pdf(channel1), pdf(channel2), … ) • The extension to shapes is straightforward: pdf = RooAddPdf( N(exp, proc) * shape(proc), … ); full pdf = RooSimultaneous( pdf1, pdf2, … )
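The counting-experiment likelihood above can be written out directly. This is a plain-Python sketch, not the RooFit implementation; the signal-strength multiplier r and the tuple layout of channels are assumptions for illustration. The product over channels becomes a sum of log-likelihoods.

```python
import math


def poisson_logpmf(n, mu):
    """log Poisson(n | mu) for integer n >= 0 and mu > 0."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def counting_nll(channels, r=1.0):
    """Negative log-likelihood of a multi-channel counting experiment.

    channels: list of (n_obs, signal_yields, background_yields);
    r: hypothetical multiplier applied to the signal yields only.
    """
    nll = 0.0
    for n_obs, sig, bkg in channels:
        n_exp = r * sum(sig) + sum(bkg)        # N(exp) = sum over processes
        nll -= poisson_logpmf(n_obs, n_exp)    # PROD of pdfs -> sum of logs
    return nll
```

For the simple datacard, counting_nll([(0, [4.76], [1.47])]) is just the total expected yield 6.23, since log Poisson(0 | μ) = -μ. A binned shape analysis can be read the same way, with each histogram bin playing the role of a channel.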
Shape Uncertainties (1) Vertical morphing: • Linear or quadratic interpolation between 3 templates (nominal, up and down). • Gaussian constraint on the interpolation parameter. • Works on any pdf: histogram, keys, parametric. • Gives wrong results if the deformation is large (e.g. for a Gaussian, if the uncertainty on the mean is comparable with the peak width).
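Bin by bin, vertical morphing can be sketched as below. The function names and the exact interpolation formulas are assumptions (a piecewise-linear form and a quadratic through the three templates), not the VerticalInterpPdf code. Both reproduce the down, nominal and up templates at θ = -1, 0, +1.

```python
def morph_linear(nominal, up, down, theta):
    """Piecewise-linear vertical interpolation, bin by bin."""
    out = []
    for n, u, d in zip(nominal, up, down):
        shift = (u - n) if theta >= 0 else (n - d)
        out.append(n + theta * shift)
    return out


def morph_quadratic(nominal, up, down, theta):
    """Quadratic through (-1, down), (0, nominal), (+1, up), bin by bin."""
    return [n + 0.5 * theta * (u - d) + 0.5 * theta ** 2 * (u + d - 2 * n)
            for n, u, d in zip(nominal, up, down)]


def gaussian_constraint_nll(theta):
    """Unit-Gaussian constraint on the interpolation parameter (up to a constant)."""
    return 0.5 * theta ** 2
```

The failure mode on the slide follows directly: if the up template is a Gaussian whose peak is shifted by more than its width, morph_linear(..., 0.5) returns the average of the two histograms, a double-peaked shape rather than a Gaussian with a half-shifted mean.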
Shape Uncertainties (2) Simple parametric uncertainties: • Shape parameter used as a nuisance, with a constraint (Gaussian or bifurcated Gaussian). • Relies on the user to choose a physics-oriented parametrization in which the nuisances are uncorrelated.
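A constrained shape nuisance adds one penalty term per parameter to the data negative log-likelihood. The sketch below assumes an unnormalised bifurcated Gaussian (its normalisation does not depend on the parameter, so it drops out of a fit); the names and the "width" parameter in the usage are hypothetical.

```python
def bifurcated_gauss_nll(x, mean, sigma_lo, sigma_hi):
    """-log of an unnormalised bifurcated Gaussian: different widths below/above mean."""
    sigma = sigma_lo if x < mean else sigma_hi
    return 0.5 * ((x - mean) / sigma) ** 2


def constrained_nll(data_nll, params, constraints):
    """Add one constraint term per nuisance parameter to a data NLL.

    constraints maps a parameter name to (mean, sigma_lo, sigma_hi);
    a plain Gaussian is the special case sigma_lo == sigma_hi.
    """
    total = data_nll
    for name, (mean, lo, hi) in constraints.items():
        total += bifurcated_gauss_nll(params[name], mean, lo, hi)
    return total
```

With constraints {"width": (0.0, 2.0, 1.0)}, pulling the hypothetical width parameter to +1 costs as much as pulling it to -2. Because each nuisance gets an independent term, the parametrization must be chosen so that the nuisances are uncorrelated.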
Shape Uncertainties (3) Parametric interpolation: • Templates are created, e.g. from different MC samples. • The same parametrization is used to fit all templates, and two additional RooArgSets of parameters are saved for each systematic. • Shapes are obtained by linearly interpolating the parameters between the RooArgSets (still under development, not yet available).
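Since this feature was still under development, the following is only a sketch of the idea, assuming one nominal parameter set plus "up" and "down" sets per systematic and a Gaussian as the fitted shape. The parameter names and values are hypothetical.

```python
import math


def interpolate_params(nominal, up, down, theta):
    """Linear interpolation of every fitted parameter: towards the 'up' set
    for theta >= 0 and towards the 'down' set for theta < 0."""
    target = up if theta >= 0 else down
    return {k: v + abs(theta) * (target[k] - v) for k, v in nominal.items()}


def gauss_pdf(x, mean, sigma):
    """Normalised Gaussian density, used here as the example shape."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


# Hypothetical fit results for one systematic
NOMINAL = {"mean": 125.0, "sigma": 2.0}
UP = {"mean": 126.0, "sigma": 2.2}
DOWN = {"mean": 124.0, "sigma": 1.9}
```

At θ = 0.5 the interpolated shape is a single Gaussian peaked at 125.5, so the peak actually moves. This is the case where vertical morphing of the templates would produce two bumps instead.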
Combined Datasets • If all channels are counting experiments, make a RooDataSet with one column per channel. • If all channels have all inputs as TH1s (data and templates), convert them into a RooDataHist with all channels on the same dummy variable; histograms for channels with fewer bins are padded with empty bins. • If at least one channel has a RooDataSet, try to make a combined RooDataSet. • Otherwise, try to make a combined RooDataHist.
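The decision rules above, and the padding of shorter histograms, can be sketched in a few lines. The input labels ("counting", "TH1", "RooDataSet", "RooDataHist") and both function names are assumptions chosen to mirror the slide; the real tool inspects ROOT objects instead of strings.

```python
def combined_dataset_kind(channels):
    """Choose the combined dataset type; channels maps a name to the kind of its
    inputs: "counting", "TH1", "RooDataSet" or "RooDataHist" (hypothetical labels)."""
    kinds = set(channels.values())
    if kinds == {"counting"}:
        return "RooDataSet (one column per channel)"
    if kinds == {"TH1"}:
        return "RooDataHist (common dummy variable)"
    if "RooDataSet" in kinds:
        return "RooDataSet"
    return "RooDataHist"


def pad_histograms(channel_bins):
    """Pad each channel's bin contents with empty bins up to the longest channel."""
    n_max = max(len(b) for b in channel_bins.values())
    return {name: list(b) + [0.0] * (n_max - len(b))
            for name, b in channel_bins.items()}
```

The order of the checks matters: an unbinned RooDataSet channel cannot be binned without losing information, so its presence forces an unbinned combined dataset.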
Statistical tools proper What we do on top of RooStats: • Configure RooStats components through command-line arguments and sensible defaults • Extend the functionality of some tools, or provide workarounds for bugs whose fixes are not yet in CMSSW’s build of RooStats • Provide other common infrastructure, e.g. • Running toys to get expected limits • Saving results to ROOT files
Zoo of statistical methods • Asymptotic limits and significances from profiled likelihood • Bayesian limits • Various flavours of Hybrid Frequentist-Bayesian limits (“CLs”, ...) and significances • Feldman-Cousins bands • Computing both observed limits and expected limits with 68%/95% bands by running toys.
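Turning toy results into expected limits reduces to taking quantiles of the limits obtained on background-only pseudo-experiments. The sketch assumes those toy limits have already been computed by one of the methods above. It uses 2.5/16/50/84/97.5% as approximations of the Gaussian ±2σ/±1σ points, and all names are hypothetical.

```python
def quantile(sorted_vals, q):
    """Quantile of an already sorted list, with linear interpolation (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


# Median plus approximate +-1 sigma (68%) and +-2 sigma (95%) band edges
BAND_PROBS = {"-2sigma": 0.025, "-1sigma": 0.16, "median": 0.5,
              "+1sigma": 0.84, "+2sigma": 0.975}


def expected_limit_bands(toy_limits):
    """Median expected limit and bands from limits computed on background-only toys."""
    vals = sorted(toy_limits)
    return {label: quantile(vals, p) for label, p in BAND_PROBS.items()}
```

The observed limit is computed once on the real data and then compared with these bands.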
Other tools Other pieces: • Tools to combine datacards • Tools to run the combination on the Grid using CRAB (the CMS grid wrapper) • Plotting tools: basic functionality exists, but something better is in the works
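Combining datacards relies on the named systematics. The following is a hypothetical sketch, not the actual tool: each card's bins get a channel-label prefix, and systematics with the same name are merged into one row, so they stay correlated across cards. The simplified card layout (a list of (bin, process, rate) columns plus per-column κ lists) is an assumption.

```python
def combine_cards(cards):
    """Merge simplified datacards.

    cards maps a channel label to {"columns": [(bin, process, rate), ...],
    "systematics": {name: [kappa or None, one per column]}}.
    """
    columns, offsets = [], {}
    for label, card in cards.items():
        offsets[label] = len(columns)
        # prefix bin names so bins from different cards stay distinct
        columns += [(f"{label}_{b}", p, r) for b, p, r in card["columns"]]
    systs = {}
    for label, card in cards.items():
        start = offsets[label]
        for name, kappas in card["systematics"].items():
            # same name -> same row -> same nuisance (correlated across cards)
            row = systs.setdefault(name, [None] * len(columns))
            row[start:start + len(kappas)] = kappas
    return {"columns": columns, "systematics": systs}
```

A systematic that appears in only one card simply gets "-" (None) entries for the columns of the other cards.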
What’s Higgs-specific? • Datacard format: the generic “excess of signal on top of some background” problem. • Statistical tools: even more generic than this. • Main limitations: • Most things work only when setting a limit on a single parameter of interest • The parameter should behave like a cross section (positive definite; zero = no signal) • These limitations come from the lack of need so far for anything more general; they can be overcome if needed.
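To make the cross-section-like parameter of interest concrete, the sketch below assumes it scales the signal yield, N(exp) = r·s + b with r ≥ 0 and r = 0 meaning no signal. It computes a Bayesian upper limit with a flat prior on r for a single counting channel, ignoring systematic uncertainties. The function and its numerical settings are assumptions, not the tool's Bayesian method.

```python
import math


def bayesian_upper_limit(n_obs, s, b, cl=0.95, r_max=50.0, steps=100_000):
    """Upper limit on r for N(exp) = r*s + b, flat prior on r >= 0 (needs b > 0).

    The posterior is proportional to Poisson(n_obs | r*s + b); its CDF is
    built with the trapezoid rule on [0, r_max] and inverted at `cl`.
    """
    dr = r_max / steps

    def likelihood(r):
        mu = r * s + b
        return math.exp(n_obs * math.log(mu) - mu - math.lgamma(n_obs + 1))

    vals = [likelihood(i * dr) for i in range(steps + 1)]
    cdf = [0.0]
    for i in range(steps):
        cdf.append(cdf[-1] + 0.5 * (vals[i] + vals[i + 1]) * dr)
    target = cl * cdf[-1]
    for i in range(1, steps + 1):
        if cdf[i] >= target:
            # linear interpolation inside the step that crosses the target
            frac = (target - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
            return (i - 1 + frac) * dr
    return r_max
```

With the simple datacard numbers (n_obs = 0, s = 4.76, b = 1.47) this gives r ≈ 0.63, i.e. about 3.0 signal events. For zero observed events the result does not depend on b, because the background only contributes a constant factor to the posterior.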
CMS dependencies • text2workspace: • AsymPow for asymmetric log-normals • VerticalInterpPdf for vertical morphing • combination tool: • hypothesis-test inversion with the new HybridCalculator • optimized test-statistic classes • some helper functions, e.g. to factorize pdfs
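An asymmetric log-normal uses a different κ on each side of the nominal value. The function below is a sketch that assumes a piecewise definition (κ_hi^θ for θ ≥ 0, κ_lo^(-θ) for θ < 0). It mirrors the purpose of AsymPow, but the actual class may differ in detail.

```python
def asym_pow(kappa_lo, kappa_hi, theta):
    """Yield multiplier for an asymmetric log-normal (assumed piecewise form).

    theta = +1 gives kappa_hi, theta = -1 gives kappa_lo, theta = 0 gives 1.
    """
    if theta >= 0:
        return kappa_hi ** theta
    return kappa_lo ** (-theta)
```

For example, asym_pow(0.9, 1.2, θ) describes a process whose yield moves down by 10% at θ = -1 and up by 20% at θ = +1. The two branches meet continuously at θ = 0.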
External dependencies • text2workspace: • Python libraries (re, optparse) + PyROOT • combination tool: • C++ standard library (iostreams, STL, auto_ptr, exceptions) • boost::program_options to parse command-line arguments and configure methods • boost::filesystem to do an “rm -r” of the temporary directory • some system-level calls (e.g. fork, tempnam)