RooFit & RooStats tools for data modeling and statistical analysis in ROOT. Wouter Verkerke (NIKHEF). Overview of this talk. Talk overview Recently added RooFit features The RooStats project Current release cycle
RooFit & RooStats: tools for data modeling and statistical analysis in ROOT
Wouter Verkerke (NIKHEF)
Overview of this talk
• Talk overview
  • Recently added RooFit features
  • The RooStats project
• Current release cycle
  • A major new RooFit development cycle has started in ROOT development release 5.17 (RooFit 2.23)
  • The stable version will be delivered in ROOT production release 5.18/00 (RooFit v2.30), release date Dec 12. The deadline for last code is tomorrow.
New features – Core engineering
• Core engineering: complete rewrite of the algorithms that optimize likelihood calculations
  • Recent versions of classes like RooAddPdf and RooProdPdf make extensive use of caching of composite function objects that represent partial results for given integration/normalization configurations. Creation of cache objects is usually deferred until first use, and multiple configurations are handled simultaneously
  • The old optimization code was not well equipped to handle optimization and client/server link reconnection of cached objects
  • The new support class RooObjCacheManager transparently takes care of all caching and optimization logic for cached function objects
  • Many specialized hooks and support functions that worked around limitations of the old code have now disappeared. The code is much cleaner and more maintainable for the future
  • Can in principle do more optimizations than before, but improved robustness in handling certain conditions adds some overhead. Speed is expected to be within ~5% of the original RooFit, with fluctuations depending on the application
  • The next version of RooFit will have significant speedups of (complex) plot projections, as the new optimization engine can also be applied to plot projections (works in principle, but not enabled yet)
New features – RooMsgService
• All RooFit messaging is now routed through the new RooMsgService interface
• The new service has an interface that allows detailed control over which messages are printed. Can filter on
  • Message severity (DEBUG, INFO, WARNING, ERROR, FATAL)
  • Message topic (Plotting, Integration, Generation, …)
  • Originating object class (RooGaussian etc…)
  • Originating object name ("MySignalPdf" etc…)
  • Tags applied to object (arg->setLabel("DebugMeLabel"))
• Control through RooMsgService::instance()
• Default configuration

    root [0] RooMsgService::instance().Print("v")
    All Message streams
    [0] MinLevel = WARNING Topic = Any
    [1] MinLevel = INFO Topic = Generation Minization Plotting Fitting Caching Optimization
New features – RooMsgService
• Add new streams as you like, e.g.

    RooMsgService::instance().addStream(kINFO, Topic(kIntegration), ObjectName("MyPdf"))

• Many new INFO-level messages have been added under the topics Integration and Generation
  • They explain how RooFit arrives at its decisions on how to perform integration, generation etc…
• Note
  • Adding streams with object-specific messages may affect performance. Mostly intended for your debugging convenience
  • Adding any stream with DEBUG-level messages, even one that is not object-specific, affects performance significantly. Again, these exist for your debugging convenience.
  • Disabling all message streams will make RooFit completely silent (in case you care…)
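The filtering model on these two slides can be pictured as a list of streams, each with a minimum severity and optional topic and object-name restrictions; a message is printed if at least one stream accepts it. The following is a minimal standalone C++ sketch of that idea and not RooFit's implementation: `ToyStream`, `ToyMsgService`, `isActive` and `makeDefaultService` are hypothetical names, and only the severity levels and the two default streams are taken from the slides (the topic is spelled "Minimization" here).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Severity levels in increasing order, as listed on the slide.
enum class Level { DEBUG, INFO, WARNING, ERROR, FATAL };

// Hypothetical stream: accepts a message if its level is high enough and
// the optional topic / object-name restrictions match.
struct ToyStream {
  Level minLevel;
  std::set<std::string> topics;  // empty set means "any topic"
  std::string objectName;        // empty string means "any object"

  bool accepts(Level level, const std::string& topic,
               const std::string& object) const {
    if (level < minLevel) return false;
    if (!topics.empty() && topics.count(topic) == 0) return false;
    if (!objectName.empty() && objectName != object) return false;
    return true;
  }
};

// Hypothetical service: a message is active if any stream accepts it.
struct ToyMsgService {
  std::vector<ToyStream> streams;

  bool isActive(Level level, const std::string& topic,
                const std::string& object) const {
    for (const ToyStream& s : streams) {
      if (s.accepts(level, topic, object)) return true;
    }
    return false;
  }
};

// Mirrors the default configuration printed on the slide:
// stream 0 takes WARNING and above on any topic,
// stream 1 takes INFO and above on a fixed list of topics.
ToyMsgService makeDefaultService() {
  ToyMsgService svc;
  svc.streams.push_back({Level::WARNING, {}, ""});
  svc.streams.push_back({Level::INFO,
                         {"Generation", "Minimization", "Plotting",
                          "Fitting", "Caching", "Optimization"},
                         ""});
  return svc;
}
```

With the default streams, an INFO message on topic "Integration" is suppressed; pushing back a stream `{Level::INFO, {"Integration"}, "MyPdf"}` makes it active for the object named "MyPdf" only, which is the effect the `addStream` call on the slide is meant to have.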
New features – GraphViz support
• You can draw graphs of RooFit object trees of arbitrary complexity using the open-source GraphViz tools for graph visualization

    ROOT> pdf->graphVizTree("pdf.dot")
    UNIX> dot -Tps -o pdf.ps pdf.dot   (directed graph algorithm)
    UNIX> fdp -Tps -o pdf.ps pdf.dot   (spring model algorithm)

[Figure: example graphs rendered with 'dot' and 'fdp']
New features – RooClassFactory
• Code factory for RooFit classes, writes skeleton classes for RooAbsPdf, RooAbsReal
• Example that writes a function ready to be compiled

    RooClassFactory::makeFunction("RooDilution",              // class name
                                  "w,w_p0,w_p1",              // names of variables
                                  "1-2*(w_p0+(1-w_p1)*w)") ;  // function expression
    .L RooDilution.cxx+                                       // load class

• Can also immediately instantiate code

    RooAbsReal* f = RooClassFactory::defineFunction("f", "D*(1-2*w)", RooArgSet(D,w)) ;

  • Returns a pointer to a dedicated compiled function object
  • Fast replacement of RooFormulaVar
• Many more options
  • Can also specify optional analytical integrals in an extra argument
  • Can also create functions with RooCategory arguments
New features – Modular extension of RooMCStudy
• The new version of RooMCStudy has hooks to insert a chain of modules into the study. Modules can intervene before and after each generation and fit step to customize behavior
• Two standard modules provided:
  • RooDLLSignificanceMCSModule calculates the significance with the delta(-log(L)) method in a given parameter. The result is added to the RooDataSet with the output
  • RooRandomizeParamMCSModule randomizes the generation value of a given parameter before each generation (uniform or Gaussian)
• Abstract base class for modules allows you to write your own
• Example use

    RooDLLSignificanceMCSModule sigModule(*nsig,0) ;
    RooRandomizeParamMCSModule randModule ;
    randModule.sampleSumUniform(param,loVal,HighVal) ;
    RooMCStudy mcs(*model,*mjjj) ;
    mcs.addModule(sigModule) ;
    mcs.addModule(randModule) ;
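For reference, the delta(-log(L)) method is usually written as below. This is a sketch under the standard asymptotic approximation, not a statement taken from the slide: the symbols $\hat{n}_{sig}$ (best-fit value) and $S$ (significance) are introduced here for illustration, and the null value 0 corresponds to the second constructor argument in `sigModule(*nsig,0)`.

```latex
\Delta(-\ln L) \;=\; \bigl[-\ln L(n_{sig}=0)\bigr] \;-\; \bigl[-\ln L(\hat{n}_{sig})\bigr],
\qquad
S \;=\; \sqrt{2\,\Delta(-\ln L)}
```

Each toy experiment therefore needs two fits, one with the parameter floating and one with it fixed to the null value, which is the kind of extra step the module hooks allow.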
New operator PDFs – Numeric convolution through FFT
• New generic convolution operator PDF RooFFTConvPdf that can numerically convolve any two p.d.f.s using FFT techniques
  • Uses the (free) FFTW3 Fourier transform engine (www.fftw.org)
  • Must build ROOT with --enable-fftw
• Example code

    RooRealVar x("x","x",-10,20) ;
    x.setBins(1000) ; // Binning controls FFT sampling density. Use at least 1000 for good precision
    RooGaussian gx("gx","gx",x,mx,sx) ;
    RooLandau lx("lx","lx",x,ml,sl) ;
    RooFFTConvPdf gxlx("gxlx","gx (X) lx",x,gx,lx) ;

• Amazing speed and precision: ~100x faster than RooNumConvPdf, few numerical stability issues
  • An unbinned ML fit of Bmix (x) Gauss to 20000 events with dm, tau, D floating takes 30 seconds (about the same as the analytical calculation)
• Performance will drop if per-event errors are used, as the FFT precalculates the p.d.f in one operation for all observable values. Efficient when the p.d.f is evaluated at many points for one set of parameters, not efficient when the p.d.f is only evaluated once.
• Future versions will support >1 convolution as well
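The idea behind a Fourier-based convolution is that both shapes are sampled on a uniform grid, transformed, multiplied point by point, and transformed back. The standalone C++ sketch below illustrates this on two Gaussians. It is not the RooFFTConvPdf code: it uses a naive O(N^2) discrete Fourier transform instead of FFTW3 so that it needs only the standard library, and all names (`dft`, `convolveDft`, `sampleGauss`, `sampleKernel`) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

const double kPi = std::acos(-1.0);

// Naive O(N^2) discrete Fourier transform (stand-in for a real FFT library).
// sign = -1 gives the forward transform, sign = +1 the unnormalized inverse.
std::vector<Complex> dft(const std::vector<Complex>& in, int sign) {
  const std::size_t n = in.size();
  std::vector<Complex> out(n);
  for (std::size_t k = 0; k < n; ++k) {
    Complex sum(0.0, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
      // Reduce k*j modulo n so the phase magnitude stays below 2*pi.
      const double phase = sign * 2.0 * kPi *
                           static_cast<double>((k * j) % n) /
                           static_cast<double>(n);
      sum += in[j] * std::polar(1.0, phase);
    }
    out[k] = sum;
  }
  return out;
}

// Normalized Gaussian density.
double gauss(double x, double mean, double sigma) {
  const double z = (x - mean) / sigma;
  return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * kPi));
}

// Sample a Gaussian on the grid x_i = lo + i*dx.
std::vector<double> sampleGauss(std::size_t n, double lo, double dx,
                                double mean, double sigma) {
  std::vector<double> out(n);
  for (std::size_t i = 0; i < n; ++i) out[i] = gauss(lo + i * dx, mean, sigma);
  return out;
}

// Sample a zero-centred Gaussian kernel in wrap-around order:
// entry j holds offset j*dx for j < n/2 and (j-n)*dx otherwise.
std::vector<double> sampleKernel(std::size_t n, double dx, double sigma) {
  std::vector<double> out(n);
  for (std::size_t j = 0; j < n; ++j) {
    const double steps = (j < n / 2) ? static_cast<double>(j)
                                     : static_cast<double>(j) - static_cast<double>(n);
    out[j] = gauss(steps * dx, 0.0, sigma);
  }
  return out;
}

// Circular convolution via the convolution theorem: transform both inputs,
// multiply, transform back. The result is sampled on the same grid as f.
std::vector<double> convolveDft(const std::vector<double>& f,
                                const std::vector<double>& kernel, double dx) {
  assert(f.size() == kernel.size());
  const std::size_t n = f.size();
  const std::vector<Complex> F = dft(std::vector<Complex>(f.begin(), f.end()), -1);
  const std::vector<Complex> K = dft(std::vector<Complex>(kernel.begin(), kernel.end()), -1);
  std::vector<Complex> product(n);
  for (std::size_t i = 0; i < n; ++i) product[i] = F[i] * K[i];
  const std::vector<Complex> h = dft(product, +1);
  std::vector<double> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = h[i].real() * dx / static_cast<double>(n);  // 1/n normalizes the inverse
  }
  return out;
}
```

Convolving a Gaussian of width 1.5 with a kernel of width 2.0 on a 200-point grid over [-10,10) should reproduce a Gaussian of width 2.5 near the peak, since the widths add in quadrature. Because the transform is periodic, the tails wrap around at the range edges, so the sampled range has to be wide enough for the shapes involved.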
New pdfs – Generic n-Dim KEYS p.d.f
• Designed as replacement of Roo2DKeysPdf
  • Written by Max Baak for ATLAS Higgs analysis
  • NB: Several bugs were discovered in Roo2DKeysPdf
• Works in any number of dimensions
• Takes correlations of input data into account in the shape of the kernel
• Implementation has optimizations for speed (works best at higher dimensions)
• Analytical integration and analytical partial integrals
[Figure: projection with partial analytical integral]
Other miscellaneous new features
• New version of class RooProduct (product of any number of RooAbsReal objects)
  • Support for factorizing (analytical) integration of product expressions, analogous to RooProdPdf
  • Provided by Gerhard Raven
• New class RooProfileLL that represents the profile likelihood for a given likelihood
  • Example: given a p.d.f F with parameters p1, p2, p3
  • Construction of likelihood (= function of p1, p2, p3)

      RooNLLVar nll("nll","nll",px,*d) ;

  • Construction of profile likelihood in p1 (= likelihood minimized w.r.t. all parameters except p1)

      RooProfileLL pnll1("pnll","profile ll",nll,p1) ;

  • Expensive function (MINUIT is called for every evaluation)
  • Plotting / scanning of the profile likelihood will give the correct error estimate on p1
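In formula form, the profile likelihood in p1 for the three-parameter example can be written as follows. This is the textbook definition, given as a sketch; the subtraction of the global minimum and the symbol $\lambda$ are conventions assumed here rather than taken from the slide.

```latex
\lambda(p_1) \;=\; \min_{p_2,\,p_3}\bigl[-\ln L(p_1,p_2,p_3)\bigr]
\;-\; \min_{p_1,\,p_2,\,p_3}\bigl[-\ln L(p_1,p_2,p_3)\bigr]
```

In the asymptotic regime, the interval of p1 values with $\lambda(p_1) \le 0.5$ corresponds to the one-sigma error on p1 including the correlations with p2 and p3, which is why scanning this function gives the correct error estimate. Each point of the scan requires a minimization over p2 and p3, hence the cost.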
New concept – RooWorkspace
• One of the main missing features in RooFit is a tool to organize complex projects
  • A container for composite p.d.f objects, multiple datasets
• New class RooWorkspace provides basic infrastructure for complex project management
  • Container class for p.d.f.s, datasets, functions etc…
  • Controlled interface: cannot insert duplicates with the same name
  • Automatic reconnection: if a pdf f(x,p) is inserted and an internal RooRealVar x already exists, the copy that is inserted is automatically connected to the copy in the workspace
  • Tools for conflict resolution on insertion: can rename nodes on the fly upon insertion

      RooWorkspace::import(pdf,RenameConflictNodes("_v2")) ;

  • Tools for variable renaming on insertion

      RooWorkspace::import(pdf,RenameVariable("x","y")) ;
New concept – RooWorkspace
• New RooWorkspace can be persisted entirely
  • Allows you to save p.d.f.s in addition to data
• Important new concept
  • Sharing data between individual physicists, working groups, or experiments is relatively easy: ROOT TTrees and THx histograms are an almost universal standard
  • Sharing functions (likelihood / probability density) is generally much more difficult due to the lack of a common language
  • RooFit makes sharing (probability density) functions very easy: functions can be persisted in ROOT files (NEW)
• Many potential benefits
  • Easy sharing of results, ideas
  • Simplifies cross checks, debugging and result combinations
  • Combined fits for CP parameters easily executed by combining likelihoods from multiple workspaces
Persistence of models
• Elementary use case

    RooAbsPdf& g ;  // any p.d.f you made
    RooAbsData& d ; // any data you made

    // Create the workspace container object
    RooWorkspace w("w","my workspace") ;
    w.import(g) ; // import p.d.f
    w.import(d) ; // import data

    // Use standard ROOT I/O to store the workspace
    TFile f("myresult.root","RECREATE") ;
    w.Write() ;
    f.Close() ;

• Both data and p.d.f. are now stored in the file!
• Works for p.d.f.s of arbitrary complexity, e.g. complicated fit with multiple side bands, full Higgs combination
A look at the workspace
• What is in the workspace?

    w.Print() ;

    RooWorkspace(w) my workspace contents

    variables
    ---------
    (x,m,s)

    p.d.f.s
    -------
    RooGaussian::g[ x=x mean=m sigma=s ] = 0

    datasets
    --------
    RooDataSet::d(x)

• Typed accessors to conveniently retrieve contents

    RooRealVar* x = w.var("x") ;
    RooAbsPdf* g = w.pdf("g") ;
    RooAbsData* d = w.data("d") ;
Using & adapting persisted p.d.f.s.
• Using both data & p.d.f from file

    TFile f("myresult.root") ;
    RooWorkspace* w = (RooWorkspace*) f.Get("w") ;

    // Make plot of data and p.d.f
    RooPlot* xframe = w->var("x")->frame() ;
    w->data("d")->plotOn(xframe) ;
    w->pdf("g")->plotOn(xframe) ;

    // Fit p.d.f to other data outside the workspace
    // (p.d.f.s in workspace work with any data)
    w->pdf("g")->fitTo(*myData) ;

    // Alternatively import the data into the workspace.
    // Naming conflicts or mismatches are easily resolved
    // by importing all objects into the workspace
    w->import(*myData,RenameVariable("y","x")) ;
A more complex example
• Combining toy 'ATLAS' and 'CMS' results from persisted workspaces

    // Read ATLAS workspace
    TFile* fAtlas = new TFile("atlas.root") ;
    RooWorkspace *atlas = (RooWorkspace*) fAtlas->Get("atlas") ;

    // Read CMS workspace
    TFile* fCms = new TFile("cms.root") ;
    RooWorkspace *cms = (RooWorkspace*) fCms->Get("cms") ;

    // Construct combined LH
    RooAddition nllCombi("nllCombi","nll CMS&ATLAS",
                         RooArgSet(*cms->function("nll"),*atlas->function("nll"))) ;

    // Construct profile LH in mHiggs
    RooProfileLL pllCombi("pllCombi","pll",nllCombi,*atlas->var("mHiggs")) ;

    // Plot ATLAS, CMS, combined profile LH
    RooPlot* mframe = atlas->var("mHiggs")->frame(-3.5,-2.5) ;
    atlas->function("nll")->plotOn(mframe) ;
    cms->function("nll")->plotOn(mframe,LineStyle(kDashed)) ;
    pllCombi.plotOn(mframe,LineColor(kRed)) ;
    mframe->Draw() ; // result on next slide

• NB: You can publish your actual likelihood in digital form in this way
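The combination step relies on the fact that negative log-likelihoods of independent measurements simply add. The toy sketch below shows this with two parabolic (Gaussian) NLL curves in one parameter. It is standalone C++ with no RooFit dependency; `gaussianNll`, `addNll` and `scanMinimum` are hypothetical helpers, the numbers in the usage note are made up for illustration, and a grid scan stands in for a real minimizer.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// A negative log-likelihood as a function of one parameter.
using Nll = std::function<double(double)>;

// NLL of a single Gaussian measurement (up to an additive constant).
Nll gaussianNll(double mean, double sigma) {
  assert(sigma > 0.0);
  return [mean, sigma](double m) {
    const double z = (m - mean) / sigma;
    return 0.5 * z * z;
  };
}

// Combined NLL of independent measurements: the sum of the individual terms.
Nll addNll(const std::vector<Nll>& terms) {
  return [terms](double m) {
    double sum = 0.0;
    for (const Nll& term : terms) sum += term(m);
    return sum;
  };
}

// Locate the minimum with a simple uniform grid scan over [lo, hi].
double scanMinimum(const Nll& nll, double lo, double hi, int nSteps) {
  assert(nSteps > 0 && hi > lo);
  double bestX = lo;
  double bestVal = nll(lo);
  for (int i = 1; i <= nSteps; ++i) {
    const double x = lo + (hi - lo) * i / nSteps;
    const double val = nll(x);
    if (val < bestVal) {
      bestVal = val;
      bestX = x;
    }
  }
  return bestX;
}
```

For example, combining `gaussianNll(-3.1, 0.2)` and `gaussianNll(-2.9, 0.1)` with `addNll` and scanning from -3.5 to -2.5 puts the minimum at the inverse-variance weighted average of the two inputs, closer to the more precise measurement.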
ROOT, RooFit & RooStats
• RooFit is an extension to ROOT – (almost) no overlap with existing functionality
[Diagram: software layers]
• RooStats – statistical analysis: Neyman construction, Bayesian posterior, profile likelihood
• RooFit – data modeling: toy MC data generation, model visualization, data/model fitting
• ROOT – MINUIT, C++ command line interface & macros, data management & histogramming, I/O support, graphics interface
The RooStats project – common statistics tools for LHC
• Initiative by Rene & Kyle to organize a suite of common tools in ROOT
  • Propose to build tools on top of RooFit, following a survey of existing software and the user community
  • Idea is to have a few core developers maintaining the framework, and a mechanism for users/collaborations to contribute concrete tools
  • Necessary groundwork in RooFit for support of RooStats is mostly done
• What should be in there?
  • There are a few major classes of statistical techniques:
    • Likelihood: all inference from likelihood curves
    • Bayesian: use prior on parameter to compute P(theory|data)
    • Frequentist: restricted to statements of P(data|theory)
  • Even within one of these classes, there are several ways to approach the same problem
  • Aim to collect them all in one set of consistent tools
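The relation between the Bayesian and frequentist quantities named above is Bayes' theorem, written here in the slide's notation as a reminder (the prior $P(\text{theory})$ is the extra input the Bayesian approach requires):

```latex
P(\text{theory}\mid\text{data}) \;=\;
\frac{P(\text{data}\mid\text{theory})\,P(\text{theory})}{P(\text{data})}
```

The likelihood $P(\text{data}\mid\text{theory})$, viewed as a function of the theory parameters, is the common ingredient of all three classes of techniques.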
Designing the framework
• Kyle & I met early 2007 to discuss how to implement a few statistical concepts on top of RooFit
  • We want a class structure that maps onto statistical concepts
  • Successfully worked out a few of the methods
• The first examples were
  • Bayesian Posterior
  • Profile likelihood ratio
  • Acceptance Region
  • Ordering Rule
  • Neyman Construction
  • Confidence Interval
• Many concepts already have an appropriate class in RooFit
• New RooWorkspace class is a key component of the interface
RooStats progress
• Kyle has done several successful pilot studies to test the feasibility of the concept
  • E.g. multi-channel Higgs sensitivity study
• Now starting with construction of concrete tools
  • First candidate is a real-world Tevatron example with input from Tom Jun
• Aiming for first functional release in the course of 5.18 (spring 2008)
RooFit Developments & Future plans – Overview
• Quite a bit of new code developed in 2007, with more to come in 2008. Will cover this later
• Manpower
  • Interest in and use of RooFit in ATLAS, CMS, LHCb is increasing
  • I continue to develop and support RooFit at the ~10-20% level (which has been the support level for the past 5 years)
  • I intend to continue at this level for the foreseeable future
• Access to code, bundling with ROOT
  • Development copy of RooFit moved from SourceForge to the ROOT Subversion repository. Simplifies updates to ROOT
  • ROOT Subversion allows me to easily make development branches
• Intend to make use of more ROOT/CERN facilities for support
  • File your bug reports in the ROOT Savannah tracker
  • Ask your questions on the ROOT forums