Statistical Tools
A Few Comments
Harrison B. Prosper, Florida State University
PHYSTAT Workshop 2004, 1-2 March 2004
Statistical Tools PhyStat Workshop 2004 Harrison B. Prosper

Outline
• Issues
• Wish List
• Example
• Summary

Statistical Tools: Issues
• Some difficulties with the tools used in HEP:
  • It is difficult to express ideas cleanly and clearly
  • Tools are scattered over different (typically monolithic) programs
  • Interfacing heterogeneous data formats to disparate tools is a headache
  • Histograms are tightly coupled to their viewers
  • The algebra of histograms is relatively crude
  • Support for systematic studies of ensembles is inadequate

Issues – II
• In a systematic statistical study one may wish to:
  • Generate different ensembles of observations, possibly with conditioning, and study various statistical properties (bias, variance, coverage, etc.)
  • Assess robustness with respect to prior densities and likelihoods
  • Study different confidence-limit procedures
  • Study different optimization criteria
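
One way to picture the kind of ensemble study listed above: a minimal Python sketch, assuming a single Poisson count n with true mean μ and, as the procedure under study, the naive interval n ± √n. The procedure and the function names (`poisson`, `root_n_interval`, `ensemble_study`) are hypothetical illustrations, not tools from the talk. The sketch generates an ensemble of observations and queries it for the bias and variance of the estimator μ̂ = n and for the coverage of the interval.

```python
import math
import random

def poisson(mu, rng):
    """Draw one Poisson(mu) variate by Knuth's multiplication method
    (simple and exact, but slow for large mu)."""
    limit = math.exp(-mu)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def root_n_interval(n):
    """Procedure under study (hypothetical): the naive interval n +/- sqrt(n)."""
    half_width = math.sqrt(n)
    return n - half_width, n + half_width

def ensemble_study(mu_true, procedure, size=20000, seed=1):
    """Generate an ensemble of counts n ~ Poisson(mu_true) and report the bias
    and variance of the estimator mu_hat = n and the coverage of `procedure`."""
    rng = random.Random(seed)
    counts = [poisson(mu_true, rng) for _ in range(size)]
    mean = sum(counts) / size
    variance = sum((n - mean) ** 2 for n in counts) / (size - 1)
    covered = 0
    for n in counts:
        low, high = procedure(n)
        if low <= mu_true <= high:
            covered += 1
    return {"bias": mean - mu_true, "variance": variance, "coverage": covered / size}

for mu in (1.0, 3.0, 10.0):
    result = ensemble_study(mu, root_n_interval)
    print(f"mu = {mu:4.1f}: bias = {result['bias']:+.3f}, "
          f"variance = {result['variance']:.3f}, coverage = {result['coverage']:.3f}")
```

For μ = 1 the interval contains the true value only when n = 1 or n = 2, so its exact coverage is e^(-1)(1 + 1/2) ≈ 0.55, well below the nominal 68.3%; the ensemble estimate should land close to that. Studying another procedure only means passing a different function.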

Issues – III
• One may wish to study:
  • Type I and type II error rates
  • Consistency: both convergence to, and the rate of convergence to, the true answer as the sample size increases
  • Probability densities p(z), given the underlying distributions p(x)
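
For a simple counting experiment the type I and type II error rates can be computed exactly rather than by simulation. A minimal sketch, assuming a test that rejects the background-only hypothesis H0 (mean b) when the observed count n reaches a critical value n_c, against an alternative H1 with mean s + b; the values b = 3 and s = 5 and all function names are hypothetical.

```python
import math

def poisson_cdf(k, mu):
    """P(N <= k) for N ~ Poisson(mu), by direct summation of the terms."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total

def error_rates(n_crit, b, s):
    """Test that rejects H0 (mean b) when n >= n_crit, against H1 (mean s + b).
    Returns (alpha, beta): the type I and type II error rates."""
    alpha = 1.0 - poisson_cdf(n_crit - 1, b)   # P(reject H0 | H0 true)
    beta = poisson_cdf(n_crit - 1, s + b)      # P(accept H0 | H1 true)
    return alpha, beta

def smallest_critical_count(b, alpha_max):
    """Smallest n_crit whose type I error rate does not exceed alpha_max."""
    n_crit = 0
    while 1.0 - poisson_cdf(n_crit - 1, b) > alpha_max:
        n_crit += 1
    return n_crit

b, s = 3.0, 5.0  # hypothetical background and signal means
n_c = smallest_critical_count(b, alpha_max=0.05)
alpha, beta = error_rates(n_c, b, s)
print(f"n_c = {n_c}, type I rate = {alpha:.4f}, type II rate = {beta:.4f}")
```

With these numbers the smallest n_c with α ≤ 0.05 is 7, giving α ≈ 0.034 and β ≈ 0.31. Because the count is discrete, α cannot be tuned to exactly 0.05, which is one reason such properties are worth studying procedure by procedure.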

Wish List
• Decoupling
  • Each statistical tool should be separate from, and independent of, the environment in which it might be used.
  • However, provide bindings for different environments/languages (R, ROOT, Python, Java, etc.)
• Modularity
  • Each statistical tool encapsulates a single coherent statistical idea. Avoid monoliths.
• Histograms
  • Histograms and histogram viewers should be independent of each other. (A sensible idea from Marc Paterno!)
  • An elegant algebra of histograms: h = a*h1 + b*h2/h3, etc.
  • Powerful, intuitive tools for multi-dimensional data exploration
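
A minimal sketch of the histogram wishes, assuming 1-D histograms with fixed bin edges and ignoring bin uncertainties and weights (which a real tool would have to propagate). The hypothetical `Histogram` class holds only data and overloads `+`, scalar `*` and `/`, so that h = a*h1 + b*h2/h3 can be written as is; `text_view` is a separate, hypothetical viewer that could be swapped out without touching the histogram.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Histogram:
    """1-D histogram holding only bin edges and bin contents; it knows
    nothing about how it will be drawn."""
    edges: tuple
    contents: tuple

    def _same_binning(self, other):
        if self.edges != other.edges:
            raise ValueError("histograms have different binning")

    def __add__(self, other):
        self._same_binning(other)
        return Histogram(self.edges, tuple(x + y for x, y in zip(self.contents, other.contents)))

    def __mul__(self, scale):
        if not isinstance(scale, (int, float)):
            return NotImplemented  # only scalar multiplication is defined here
        return Histogram(self.edges, tuple(scale * x for x in self.contents))

    __rmul__ = __mul__  # allows both a*h and h*a

    def __truediv__(self, other):
        self._same_binning(other)
        # Bin-by-bin ratio; an empty denominator bin gives 0 (a convention chosen here).
        return Histogram(self.edges, tuple(x / y if y != 0 else 0.0
                                           for x, y in zip(self.contents, other.contents)))

def text_view(h, width=40):
    """A separate, interchangeable viewer: one row of '#' characters per bin."""
    peak = max(h.contents) or 1.0
    for lo, hi, c in zip(h.edges[:-1], h.edges[1:], h.contents):
        print(f"[{lo:4.1f}, {hi:4.1f})  {'#' * round(width * c / peak)}  {c:g}")

edges = (0.0, 1.0, 2.0, 3.0)
h1 = Histogram(edges, (4.0, 6.0, 2.0))
h2 = Histogram(edges, (1.0, 2.0, 3.0))
h3 = Histogram(edges, (2.0, 4.0, 1.0))
a, b = 0.5, 2.0
h = a*h1 + b*h2/h3   # evaluated bin by bin; h.contents is (3.0, 4.0, 7.0)
text_view(h)
```

Python's usual operator precedence makes b*h2/h3 bind before the addition, exactly as in the formula; adding mismatched binnings raises an error instead of silently misaligning bins.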

Wish List – II
• Likelihoods
  • A flexible method for reporting them; perhaps as swarms of points generated via MCMC?
• Frequency Methods
  • A flexible ensemble generator that allows sub-ensembles to be extracted easily
  • Flexible queries of ensembles (to get coverage, error rates, variances, bias, etc.)
• Bayesian Methods
  • Flexible robustness studies (prior family, likelihood family, etc.)
  • Multi-dimensional integration (adaptive and Markov chain Monte Carlo)
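
To make the "swarm of points" idea concrete, a minimal sketch using a random-walk Metropolis sampler. The model is a hypothetical on/off counting experiment (n_on ~ Poisson(s + b) in a signal region, n_off ~ Poisson(τb) in a background control region) with flat priors on s ≥ 0 and b > 0, so the swarm is distributed as the normalized likelihood. The counts, τ, step size and function names are all assumptions for illustration. The same swarm also does Markov chain Monte Carlo integration: any posterior expectation becomes an average over the points.

```python
import math
import random

def log_posterior(s, b, n_on=12, n_off=20, tau=4.0):
    """Hypothetical on/off counting experiment with flat priors on s >= 0, b > 0:
    n_on ~ Poisson(s + b), n_off ~ Poisson(tau * b). Constant terms are dropped."""
    if s < 0.0 or b <= 0.0:
        return -math.inf  # outside the prior support
    mu_on, mu_off = s + b, tau * b
    return n_on * math.log(mu_on) - mu_on + n_off * math.log(mu_off) - mu_off

def metropolis(log_density, start, step, n_points, burn_in=1000, seed=7):
    """Random-walk Metropolis sampler: returns n_points correlated draws
    (the "swarm") from the density whose log is log_density."""
    rng = random.Random(seed)
    current = tuple(start)
    current_logp = log_density(*current)
    swarm = []
    for i in range(burn_in + n_points):
        proposal = tuple(x + rng.gauss(0.0, step) for x in current)
        proposal_logp = log_density(*proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        # 1.0 - random() lies in (0, 1], so its log is always defined.
        if math.log(1.0 - rng.random()) < proposal_logp - current_logp:
            current, current_logp = proposal, proposal_logp
        if i >= burn_in:
            swarm.append(current)
    return swarm

swarm = metropolis(log_posterior, start=(5.0, 5.0), step=1.0, n_points=50000)
s_values = sorted(s for s, _ in swarm)
mean_s = sum(s_values) / len(s_values)
upper_95 = s_values[int(0.95 * len(s_values))]
print(f"posterior mean of s: {mean_s:.2f}, 95% credible upper bound: {upper_95:.2f}")
```

Reporting the swarm itself, rather than one number, would let a reader compute other quantities (marginals, different credibility levels) without access to the original likelihood code.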

Example: A Current Statistical Problem from the DØ Single Top Group
• Set a limit on σ(pp̄ → t + X), given a histogram for each of
  • 4 signal channels: tq(EC), tqb(EC), tq(CC), tqb(CC)
  • 4 background sources per signal channel: QCD, ttbar(l+jets), ttbar(ll), W+jets
• Some histograms are weighted, some unweighted
• We would like to study different limit procedures, including Bayesian ones, and study their frequency properties.
Currently we use ad hoc and rather inflexible pieces of homegrown C++!
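
A minimal sketch of one of the limit procedures mentioned, a Bayesian upper limit, under strong simplifying assumptions: every bin of every channel is an independent Poisson count with mean σ·a + b, where a is the signal yield per unit cross section and b the sum of the four background sources; yields and backgrounds are known exactly; and the prior on σ is flat on [0, σ_max]. Weighted histograms and systematic uncertainties are ignored, and all numbers and function names below are hypothetical, not the DØ analysis.

```python
import math

def log_likelihood(sigma, counts, sig_per_xsec, bkg):
    """Sum over bins of log Poisson(n | sigma * a + b); requires every b > 0."""
    total = 0.0
    for n, a, b in zip(counts, sig_per_xsec, bkg):
        mu = sigma * a + b
        total += n * math.log(mu) - mu - math.lgamma(n + 1)
    return total

def bayes_upper_limit(counts, sig_per_xsec, bkg, cl=0.95, sigma_max=50.0, steps=20000):
    """Upper limit on sigma for a flat prior on [0, sigma_max]: integrate the
    posterior on a grid (trapezoidal rule) and find where it reaches cl."""
    h = sigma_max / steps
    grid = [i * h for i in range(steps + 1)]
    logs = [log_likelihood(x, counts, sig_per_xsec, bkg) for x in grid]
    peak = max(logs)
    density = [math.exp(v - peak) for v in logs]  # unnormalized, rescaled to avoid underflow
    cumulative = [0.0]
    for i in range(steps):
        cumulative.append(cumulative[-1] + 0.5 * h * (density[i] + density[i + 1]))
    target = cl * cumulative[-1]
    for i in range(1, steps + 1):
        if cumulative[i] >= target:
            # linear interpolation inside the grid cell that crosses the target
            frac = (target - cumulative[i - 1]) / (cumulative[i] - cumulative[i - 1])
            return grid[i - 1] + frac * h
    return sigma_max

# Hypothetical inputs, two bins per channel:
# name: (observed counts, signal yield per unit sigma,
#        per-bin backgrounds from QCD, ttbar(l+jets), ttbar(ll), W+jets)
channels = {
    "tq(EC)":  ([2, 4], [0.3, 0.5], [(0.5, 0.3, 0.1, 1.2), (0.6, 0.8, 0.2, 2.0)]),
    "tqb(EC)": ([1, 3], [0.2, 0.4], [(0.4, 0.2, 0.1, 0.9), (0.5, 0.6, 0.2, 1.5)]),
    "tq(CC)":  ([5, 6], [0.6, 0.9], [(1.0, 0.7, 0.3, 2.5), (0.9, 1.2, 0.4, 3.1)]),
    "tqb(CC)": ([3, 4], [0.4, 0.7], [(0.7, 0.5, 0.2, 1.6), (0.8, 1.0, 0.3, 2.2)]),
}
counts, sig_per_xsec, bkg = [], [], []
for n, a, b_sources in channels.values():
    counts += n
    sig_per_xsec += a
    bkg += [sum(src) for src in b_sources]  # total background per bin

print(f"95% CL upper limit on sigma: {bayes_upper_limit(counts, sig_per_xsec, bkg):.2f} (hypothetical units)")
```

A quick check: for a single bin with n = 0 the posterior is proportional to e^(-σa), so the 95% limit is ln(20)/a ≈ 3.00/a regardless of b. Studying the frequency properties of this procedure would mean running it over an ensemble of pseudo-experiments generated from the same model.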

Summary
• The Good
  • Lots of statistical tools already exist
  • A lot more are needed: an opportunity for creativity!
• The Bad
  • Using current tools, however, often requires familiarity with several frameworks/languages
• The Ugly
  • Lack of a simple, but powerful, language for expressing statistical ideas. Rapid “what if” analyses are done in C++. This is crazy! I don’t want to think about pointers and dereferencing when I’m trying to think about mathematics.