Profile likelihood – the complete story

ATLAS Statistics Forum, CERN, 23 March 2010
Glen Cowan (RHUL), Eilam Gross and Ofer Vitells (Weizmann Institute), Kyle Cranmer (NYU)

History
For the 2008/9 CSC Higgs Combination exercise, the Statistics Forum together with the Higgs Group developed methods for using the profile likelihood ratio to incorporate systematics. Details are in the CSC Higgs chapter (last section), also known as ATL-PHYS-PUB-2009-063.
Some improvements to these methods for purposes of upper limits were suggested (N. Andari, L. Fayard et al., 8.7.09 StatForum) and studied (G. Cowan, E. Gross et al., 2.9.09 StatForum). The improvement is achieved by allowing an unphysical (negative) estimator for the strength parameter $\mu$.
At the 12.2.09 StatForum, O. Vitells and E. Gross showed how many of these results can be “derived” using an approximation due to Wald.

Outline for today
The purpose of today’s talk is to bring these ingredients together and show closed-form expressions for the distributions of the test statistics and significances.
Brief review of the profile likelihood procedure
Definition of test statistics for discovery, upper limits
Distribution of the likelihood ratio (Wald)
Distribution of the test statistics for discovery, limits
Median significance and error bands
Conclusions / recommendations

Reminder of the method
Carry out a significance test of various hypotheses (background-only, signal plus background, ...). The result is a p-value. Exclude the hypothesis if the p-value is below a threshold:
Discovery: test of the background-only hypothesis. Exclude if $p < 2.9 \times 10^{-7}$ (i.e. Gaussian significance $Z = \Phi^{-1}(1 - p) > 5$).
Limits: test of the signal (+background) hypothesis. Exclude if $p < 0.05$ (i.e. 95% CL limit).
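The conversion between a p-value and the equivalent Gaussian significance quoted on this slide can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function names `z_from_p` and `p_from_z` are chosen here and do not come from the talk.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian: mean 0, standard deviation 1

def z_from_p(p):
    """One-sided significance Z = Phi^{-1}(1 - p).

    Uses the symmetry Phi^{-1}(1 - p) = -Phi^{-1}(p), which avoids
    rounding 1 - p when p is very small.
    """
    return -STD_NORMAL.inv_cdf(p)

def p_from_z(z):
    """One-sided p-value p = 1 - Phi(Z), computed as Phi(-Z)."""
    return STD_NORMAL.cdf(-z)

print("Z for p = 0.05:", z_from_p(0.05))
print("p for Z = 5:   ", p_from_z(5.0))
```

The two printed values correspond to the exclusion threshold (Z of about 1.64) and the discovery threshold (p of about 2.9e-7) given above.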

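Prototype analysis
Search for signal in a region of phase space; the result is a histogram of some variable $x$ giving numbers
$$\mathbf{n} = (n_1, \ldots, n_N).$$
Assume the $n_i$ are Poisson distributed with expectation values
$$E[n_i] = \mu s_i + b_i,$$
where $\mu$ is the strength parameter and the signal and background contributions are
$$s_i = s_{\rm tot} \int_{{\rm bin}\ i} f_s(x; \theta_s)\, dx, \qquad b_i = b_{\rm tot} \int_{{\rm bin}\ i} f_b(x; \theta_b)\, dx.$$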

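Prototype analysis (II)
Often one also has a subsidiary measurement that constrains some of the background and/or shape parameters:
$$\mathbf{m} = (m_1, \ldots, m_M).$$
Assume the $m_i$ are Poisson distributed with expectation values
$$E[m_i] = u_i(\theta),$$
with nuisance parameters $\theta = (\theta_s, \theta_b, b_{\rm tot})$. (N.B. here $m$ = number of counts, not mass!)
The likelihood function is
$$L(\mu, \theta) = \prod_{j=1}^{N} \frac{(\mu s_j + b_j)^{n_j}}{n_j!}\, e^{-(\mu s_j + b_j)} \prod_{k=1}^{M} \frac{u_k^{m_k}}{m_k!}\, e^{-u_k}.$$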
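As an illustration of how such a likelihood is evaluated in practice, the following sketch computes its logarithm as a sum of Poisson terms over the signal-region bins and the subsidiary-measurement bins. It assumes the per-bin expectations `s`, `b` and `u` have already been evaluated for a given set of nuisance parameters; that mapping is model specific and not shown. All names are hypothetical.

```python
import math

def poisson_log_term(k, nu):
    """ln of the Poisson probability for k counts with mean nu."""
    if nu <= 0.0:
        # A non-positive mean is only compatible with k = 0
        return 0.0 if k == 0 else -math.inf
    return k * math.log(nu) - nu - math.lgamma(k + 1)

def log_likelihood(mu, n, s, b, m, u):
    """ln L(mu, theta) for signal-region counts n and control counts m.

    s, b: expected signal and background per signal-region bin
    u:    expected counts per control bin
    (s, b and u are assumed to be already evaluated at theta.)
    """
    total = 0.0
    for n_i, s_i, b_i in zip(n, s, b):
        total += poisson_log_term(n_i, mu * s_i + b_i)
    for m_k, u_k in zip(m, u):
        total += poisson_log_term(m_k, u_k)
    return total

# One signal bin and one control bin, purely illustrative numbers
print(log_likelihood(1.0, n=[3], s=[1.0], b=[2.0], m=[2], u=[2.0]))
```

Working with the logarithm keeps the product over bins numerically stable, and the factorial terms, which cancel in any likelihood ratio, are kept here only so that the value is a true log-probability.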

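The profile likelihood ratio
Base the significance test on the profile likelihood ratio:
$$\lambda(\mu) = \frac{L(\mu, \hat{\hat{\theta}})}{L(\hat{\mu}, \hat{\theta})},$$
where $\hat{\hat{\theta}}$ in the numerator maximizes $L$ for the specified $\mu$, and $\hat{\mu}$, $\hat{\theta}$ in the denominator maximize $L$.
The likelihood ratio gives the optimum test between two point hypotheses (Neyman-Pearson lemma). It should be near-optimal in the present analysis with variable $\mu$ and nuisance parameters $\theta$.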

Test statistic for discovery
Try to reject the background-only ($\mu = 0$) hypothesis using
$$q_0 = \begin{cases} -2 \ln \lambda(0) & \hat{\mu} \ge 0, \\ 0 & \hat{\mu} < 0, \end{cases}$$
i.e. only regard an upward fluctuation of the data as evidence against the background-only hypothesis.
Large $q_0$ means increasing incompatibility between the data and the hypothesis; therefore the p-value for an observed $q_{0,\rm obs}$ is
$$p_0 = \int_{q_{0,\rm obs}}^{\infty} f(q_0 | 0)\, dq_0$$
(a formula for this follows later).
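To make the discovery statistic concrete, here is a sketch for the simplest model with one nuisance parameter: a single signal-region count n ~ Poisson(mu*s + b) and a single control count m ~ Poisson(tau*b). This "on/off" model and the names `s`, `tau`, `b_hat_hat` are assumptions made for illustration; the talk's prototype analysis is more general. In this model both the global fit and the fit at fixed mu have closed forms.

```python
import math

def xlogy(x, y):
    """x * ln(y) with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def log_l(mu, b, n, m, s, tau):
    """ln L up to constants for n ~ Poisson(mu*s + b), m ~ Poisson(tau*b)."""
    return xlogy(n, mu * s + b) - (mu * s + b) + xlogy(m, tau * b) - tau * b

def b_hat_hat(mu, n, m, s, tau):
    """Conditional MLE of b for fixed mu >= 0.

    Setting d(ln L)/db = 0 gives the quadratic
    (1 + tau) b^2 + ((1 + tau) mu s - n - m) b - m mu s = 0;
    the non-negative root is returned.
    """
    a = 1.0 + tau
    lin = a * mu * s - n - m
    return (-lin + math.sqrt(lin * lin + 4.0 * a * m * mu * s)) / (2.0 * a)

def q0(n, m, s, tau):
    """Discovery statistic: -2 ln lambda(0) if mu_hat >= 0, else 0."""
    b_hat = m / tau              # global MLE of b
    mu_hat = (n - b_hat) / s     # global MLE of mu, may be negative
    if mu_hat < 0:
        return 0.0
    ln_l_max = log_l(mu_hat, b_hat, n, m, s, tau)
    ln_l_bkg = log_l(0.0, b_hat_hat(0.0, n, m, s, tau), n, m, s, tau)
    return 2.0 * (ln_l_max - ln_l_bkg)

# Illustrative numbers: an excess in the signal region
print(q0(n=25, m=10, s=10.0, tau=1.0))
# A deficit gives mu_hat < 0 and therefore q0 = 0
print(q0(n=5, m=10, s=10.0, tau=1.0))
```

For models without closed-form estimators, the two maximizations would be done numerically, but the structure of the calculation is the same.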

Test statistic for upper limits
For purposes of setting an upper limit on $\mu$ use
$$q_\mu = \begin{cases} -2 \ln \lambda(\mu) & \hat{\mu} \le \mu, \\ 0 & \hat{\mu} > \mu. \end{cases}$$
Note that for purposes of setting an upper limit, one does not regard an upward fluctuation of the data as representing incompatibility with the hypothesized $\mu$.
In contrast to the CSC Higgs combination, here we let the estimator for $\mu$ go negative (à la Andari et al.).

Alternative test statistic for upper limits
Assume the physical signal model has $\mu > 0$; therefore if the estimator for $\mu$ comes out negative, the closest physical model has $\mu = 0$. One could therefore also measure the level of discrepancy between the data and the hypothesized $\mu$ with
$$\tilde{q}_\mu = \begin{cases} -2 \ln \dfrac{L(\mu, \hat{\hat{\theta}}(\mu))}{L(0, \hat{\hat{\theta}}(0))} & \hat{\mu} < 0, \\ -2 \ln \lambda(\mu) & 0 \le \hat{\mu} \le \mu, \\ 0 & \hat{\mu} > \mu. \end{cases}$$
This is in fact the test statistic used in the Higgs CSC combination. Its performance is not identical to, but very close to, that of $q_\mu$ (previous slide). $q_\mu$ is in certain ways simpler (hence preferred).
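The difference between the two upper-limit statistics can be seen in a toy model with no nuisance parameters: a single count n ~ Poisson(mu*s + b) with b known. This simplified model and all function names are assumptions for illustration only. The two statistics agree whenever the estimator of mu is non-negative and differ only when it is negative, where the alternative statistic compares to mu = 0 instead of to the unphysical best fit.

```python
import math

def log_l(mu, n, s, b):
    """ln L up to a constant for n ~ Poisson(mu*s + b), b known."""
    nu = mu * s + b
    if n == 0:
        return -nu
    return n * math.log(nu) - nu

def mu_hat(n, s, b):
    """Unconstrained MLE of mu; negative when n < b."""
    return (n - b) / s

def q_mu(mu, n, s, b):
    """-2 ln lambda(mu) if mu_hat <= mu, else 0 (mu_hat may be negative)."""
    mh = mu_hat(n, s, b)
    if mh > mu:
        return 0.0
    return 2.0 * (log_l(mh, n, s, b) - log_l(mu, n, s, b))

def q_mu_tilde(mu, n, s, b):
    """Same as q_mu, but a negative mu_hat is replaced by mu = 0."""
    mh = mu_hat(n, s, b)
    if mh > mu:
        return 0.0
    best = max(mh, 0.0)  # closest physical value of mu
    return 2.0 * (log_l(best, n, s, b) - log_l(mu, n, s, b))

# Illustrative numbers: test mu = 1 with s = 10, b = 20
for n_obs in (15, 25, 35):
    print(n_obs, q_mu(1.0, n_obs, 10.0, 20.0), q_mu_tilde(1.0, n_obs, 10.0, 20.0))
```

In this example the first case (n = 15) has a negative estimator, so the two statistics differ; the second (n = 25) has an estimator between 0 and mu, so they coincide; the third (n = 35) is an upward fluctuation beyond mu, so both are zero.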

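Wald approximation for profile likelihood ratio
To find p-values, we need $f(q_0|0)$ and $f(q_\mu|\mu)$. For the median significance under the alternative, we need $f(q_\mu|\mu')$.
Use the approximation due to Wald (1943):
$$-2 \ln \lambda(\mu) = \frac{(\mu - \hat{\mu})^2}{\sigma^2} + O(1/\sqrt{N}),$$
where $\hat{\mu}$ is Gaussian distributed with mean $\mu'$ and standard deviation $\sigma$, and $N$ is the sample size.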

Noncentral chi-square for $-2 \ln \lambda(\mu)$
If we can neglect the $O(1/\sqrt{N})$ term, $-2 \ln \lambda(\mu)$ follows a noncentral chi-square distribution for one degree of freedom with noncentrality parameter
$$\Lambda = \frac{(\mu - \mu')^2}{\sigma^2}.$$
As a special case, if $\mu' = \mu$ then $\Lambda = 0$ and $-2 \ln \lambda(\mu)$ follows a chi-square distribution for one degree of freedom (Wilks).
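A quick toy Monte Carlo can illustrate this statement. The sketch below draws the estimator of mu from a Gaussian with mean mu' and standard deviation sigma, forms (mu - mu_hat)^2 / sigma^2, and compares the sample mean with 1 + Lambda, the known mean of a noncentral chi-square distribution with one degree of freedom. The numerical values and names are arbitrary choices for illustration.

```python
import random

def simulate_t_mu(mu, mu_prime, sigma, n_toys, seed=1):
    """Toy values of (mu - mu_hat)^2 / sigma^2 with mu_hat ~ Gauss(mu', sigma)."""
    rng = random.Random(seed)
    return [((mu - rng.gauss(mu_prime, sigma)) / sigma) ** 2
            for _ in range(n_toys)]

mu, mu_prime, sigma = 1.0, 0.0, 0.5
noncentrality = (mu - mu_prime) ** 2 / sigma ** 2  # Lambda = 4 for these values

toys = simulate_t_mu(mu, mu_prime, sigma, n_toys=50_000)
mean_t = sum(toys) / len(toys)
# A noncentral chi-square with 1 degree of freedom has mean 1 + Lambda
print("toy mean:", mean_t, "expected:", 1.0 + noncentrality)

# Wilks special case mu' = mu: ordinary chi-square, mean 1
toys_wilks = simulate_t_mu(mu, mu, sigma, n_toys=50_000)
mean_wilks = sum(toys_wilks) / len(toys_wilks)
print("toy mean (mu' = mu):", mean_wilks, "expected: 1.0")
```

This only checks the Gaussian limit itself; whether a real analysis is close enough to that limit has to be tested with toys generated from the full model.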

The Asimov data set
To estimate the median value of $-2 \ln \lambda(\mu)$, consider a special data set where all statistical fluctuations are suppressed and $n_i$, $m_i$ are replaced by their expectation values (the “Asimov” data set):
$$n_i = \mu' s_i + b_i, \qquad m_i = u_i,$$
$$-2 \ln \lambda_A(\mu) \approx \frac{(\mu - \mu')^2}{\sigma^2} = \Lambda.$$
The Asimov value of $-2 \ln \lambda(\mu)$ gives the noncentrality parameter $\Lambda$, or equivalently, $\sigma$.
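For a single-bin counting experiment with known background (an assumption made here to keep the algebra closed-form), the Asimov data set for mu' = 1 is simply n = s + b, and the Asimov value of the discovery statistic can be written down directly. The sketch below also takes the median discovery significance as the square root of that value, which holds in the Wald approximation, and extracts sigma from Lambda = mu'^2 / sigma^2. Function names are hypothetical.

```python
import math

def q0_asimov(s, b):
    """-2 ln lambda(0) evaluated on the Asimov data n_A = s + b (mu' = 1).

    With ln L(mu) = n ln(mu*s + b) - (mu*s + b) and mu_hat = 1 on the
    Asimov data, ln L(1) - ln L(0) = (s + b) ln(1 + s/b) - s.
    """
    return 2.0 * ((s + b) * math.log(1.0 + s / b) - s)

def median_z0(s, b):
    """Median discovery significance, sqrt of the Asimov q0 (Wald approx.)."""
    return math.sqrt(q0_asimov(s, b))

def sigma_from_asimov(s, b):
    """Standard deviation of mu_hat from Lambda = (0 - 1)^2 / sigma^2."""
    return 1.0 / math.sqrt(q0_asimov(s, b))

s, b = 10.0, 100.0  # illustrative numbers
print("median Z0:", median_z0(s, b), " naive s/sqrt(b):", s / math.sqrt(b))
print("sigma of mu_hat:", sigma_from_asimov(s, b))
```

For s much smaller than b the result approaches the familiar s/sqrt(b), while for larger s/b the two differ noticeably.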

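Relation between test statistics and $\hat{\mu}$
In the Wald approximation, $q_0$ is a monotonic function of $\hat{\mu}$:
$$q_0 = \begin{cases} \hat{\mu}^2 / \sigma^2 & \hat{\mu} \ge 0, \\ 0 & \hat{\mu} < 0. \end{cases}$$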

Relation between test statistics and $\hat{\mu}$ (II)
Similarly, $q_\mu$ and $\tilde{q}_\mu$ also have a monotonic relation with $\hat{\mu}$, and therefore quantiles of $q_\mu$, $\tilde{q}_\mu$ can be obtained directly from those of $\hat{\mu}$ (which is Gaussian).

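Distribution of $q_0$
Assuming the Wald approximation, we can write down the full distribution of $q_0$ as
$$f(q_0 | \mu') = \left(1 - \Phi\left(\frac{\mu'}{\sigma}\right)\right) \delta(q_0) + \frac{1}{2} \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{q_0}} \exp\left[-\frac{1}{2}\left(\sqrt{q_0} - \frac{\mu'}{\sigma}\right)^2\right].$$
The special case $\mu' = 0$ is a “half chi-square” distribution:
$$f(q_0 | 0) = \frac{1}{2} \delta(q_0) + \frac{1}{2} \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{q_0}}\, e^{-q_0/2}.$$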

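Cumulative distribution of $q_0$, significance
From the pdf, the cumulative distribution of $q_0$ is found to be
$$F(q_0 | \mu') = \Phi\left(\sqrt{q_0} - \frac{\mu'}{\sigma}\right).$$
The special case $\mu' = 0$ is
$$F(q_0 | 0) = \Phi\left(\sqrt{q_0}\right).$$
The p-value of the $\mu = 0$ hypothesis is
$$p_0 = 1 - F(q_0 | 0).$$
Therefore the discovery significance $Z$ is simply
$$Z = \Phi^{-1}(1 - p_0) = \sqrt{q_0}.$$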
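These asymptotic relations are easy to evaluate numerically. The sketch below assumes the standard Wald-limit forms F(q0 | mu') = Phi(sqrt(q0) - mu'/sigma), p0 = 1 - Phi(sqrt(q0)) and Z = sqrt(q0); the function names are chosen here for illustration.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard Gaussian

def cdf_q0(q0, mu_prime, sigma):
    """F(q0 | mu') = Phi(sqrt(q0) - mu'/sigma) in the Wald approximation."""
    return PHI.cdf(math.sqrt(q0) - mu_prime / sigma)

def p0(q0_obs):
    """p-value of mu = 0: 1 - Phi(sqrt(q0_obs)), computed as Phi(-sqrt(q0_obs))."""
    return PHI.cdf(-math.sqrt(q0_obs))

def z0(q0_obs):
    """Discovery significance Z = sqrt(q0_obs)."""
    return math.sqrt(q0_obs)

print("q0 = 25 -> Z =", z0(25.0), ", p0 =", p0(25.0))
# Under mu' the median of q0 sits where F = 0.5, i.e. sqrt(q0) = mu'/sigma
print("F at q0 = (mu'/sigma)^2:", cdf_q0(4.0, mu_prime=1.0, sigma=0.5))
```

The second print illustrates why the median of q0 under an assumed signal strength mu' is (mu'/sigma)^2, which is the value obtained from the Asimov data set.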

Distribution of $q_\mu$
Similar results hold for $q_\mu$.

Distribution of $\tilde{q}_\mu$
Similar results hold for $\tilde{q}_\mu$.

An example

Error bands

Error bands for median exclusion significance
The median and lower error band are the same for $q_\mu$ and $\tilde{q}_\mu$; the upper error band differs slightly.
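Because the test statistic is a monotonic function of the Gaussian-distributed estimator of mu, the median and the error bands can be read off from quantiles of that estimator. The following sketch does this for $q_\mu$ only, assuming the Wald form q_mu = (mu - mu_hat)^2 / sigma^2 for mu_hat <= mu and taking the exclusion significance as sqrt(q_mu). Both are assumptions of the asymptotic limit, and the function names are hypothetical.

```python
import math

def q_mu_from_mu_hat(mu_hat, mu, sigma):
    """Wald form: (mu - mu_hat)^2 / sigma^2 for mu_hat <= mu, else 0."""
    if mu_hat > mu:
        return 0.0
    return ((mu - mu_hat) / sigma) ** 2

def exclusion_z_band(mu, sigma, n_sigma, mu_prime=0.0):
    """Exclusion significance sqrt(q_mu) when mu_hat lies n_sigma standard
    deviations from mu' (n_sigma = 0 gives the median)."""
    mu_hat = mu_prime + n_sigma * sigma
    return math.sqrt(q_mu_from_mu_hat(mu_hat, mu, sigma))

mu, sigma = 1.0, 0.5  # illustrative values
for n_sigma in (-1, 0, 1):
    print(n_sigma, exclusion_z_band(mu, sigma, n_sigma))
```

A downward fluctuation of the estimator (negative `n_sigma`) makes the tested mu look less compatible with the data and so raises the exclusion significance, while an upward fluctuation lowers it, down to zero once the estimator exceeds mu.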

Comparison of power (exclusion limits)

Conclusions
The procedure for discovery and upper limits using the profile likelihood ratio is fully worked out. Systematics are included via nuisance parameters.
In the large sample limit, the Wald approximation gives the discovery significance and exclusion limits, and also the median significance and error bands. In the Wald approximation, median = Asimov value.
In practice, the “large sample limit” appears to hold well even for fairly small samples, but this needs to be checked with MC.
For very low statistics analyses, one still needs to get the sampling distribution of the test variables via MC (not too hard for 95% CL exclusion).
