Results from small numbers. Roger Barlow. Manchester Bohr Lunch, 16th October 2009.
The basic equation: N = σ A η L + B, i.e. observed events = cross section × acceptance × efficiency × integrated luminosity, plus background.
Equation in use: solve for the cross section, σ = (N − B) / (A η L).
With errors. Example: see 100 events, expected background 5.5, A = 0.5, η = 0.1, L = 10 fb⁻¹. Then σ = (100 − 5.5)/(0.5 × 0.1 × 10 fb⁻¹) = 189 fb, with statistical error √N/(AηL) = √100/(0.5 fb⁻¹) = 20 fb, so σ = 189 ± 20 fb. Systematic errors come from the uncertainties in A, η, L and B.
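A minimal sketch of that arithmetic in Python (the variable names are mine, not the talk's):

```python
# The arithmetic on this slide, spelled out.
import math

N, B = 100, 5.5               # observed events, expected background
A, eta, L = 0.5, 0.1, 10.0    # acceptance, efficiency, luminosity [fb^-1]

sigma = (N - B) / (A * eta * L)        # cross section [fb]
stat = math.sqrt(N) / (A * eta * L)    # Poisson error on N, propagated
print(f"sigma = {sigma:.0f} +/- {stat:.0f} fb")   # sigma = 189 +/- 20 fb
```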
Why this talk? This treatment assumes N >> 0 and N >> B. In real life one or the other (or both) may not be true, particularly if you are looking for a signal that is not there. N=1, B=0.0: quote 1 ± 1 ? N=100, B=101.0: quote −1 ± 10 ? N=0, B=1.1: quote −1.1 ± 0 ????????
Step back: definition of probability. Frequentist: P(A) is the limit of a frequency over an ensemble of everything, P(A) = lim_{N→∞} N(A)/N. P(A) depends not just on A but on the ensemble, which must be specified. Sometimes this is not possible.
Events with no ensemble. Some events are unique. Consider “It will probably rain tomorrow.” There is only one tomorrow (Saturday 17th October). There is NO ensemble. P(rain) is either 0/1 = 0 or 1/1 = 1. Strict frequentists cannot say 'It will probably rain tomorrow'. This presents severe social problems.
Circumventing the limitation. Assemble an ensemble of statements of which 70% (say) are true (e.g. weather forecasts with a verified track record). A frequentist can say: “The statement ‘It will rain tomorrow’ has a 70% probability of being true.” It will rain tomorrow - with 70% confidence. For unique events, confidence level statements replace probability statements.
Measurement and frequentist probability. MT = 171 ± 2 GeV. Is there a 68% probability that MT lies between 169 and 173 GeV? No. MT is unique: it is either in the range or outside. (Soon we’ll know.) But the interval m ± 2 GeV does bracket the true value 68% of the time: the statement ‘MT lies between 169 and 173 GeV’ has a 68% probability of being true. MT lies between 169 and 173 GeV with 68% confidence.
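A quick Monte Carlo makes the frequentist reading concrete; a sketch using the slide's example numbers (true mass 171 GeV, resolution 2 GeV) as assumed inputs:

```python
# Sketch: for a Gaussian measurement x of a fixed true value, the
# interval [x - sigma, x + sigma] brackets the truth 68% of the time.
import numpy as np

rng = np.random.default_rng(1)
mt_true, sigma = 171.0, 2.0                  # the slide's example numbers
x = rng.normal(mt_true, sigma, 1_000_000)    # ensemble of measurements
covered = (x - sigma <= mt_true) & (mt_true <= x + sigma)
print(covered.mean())                        # ~0.683
```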
Frequentist confidence intervals beyond the simple Gaussian. Select a confidence level CL and a strategy. From P(x; μ) choose a construction (functions x1(μ), x2(μ)) for which P(x ∈ [x1(μ), x2(μ)]) ≥ CL for all μ. Given a measurement X, make the statement μ ∈ [μLO, μHI] @ CL, where X = x2(μLO) and X = x1(μHI). (Neyman technique)
Confidence belt. Constructed horizontally, such that the probability of a result lying inside the belt is 68% (or whatever CL you choose); read vertically, using the measurement. Example: proportional Gaussian, σ = 0.1 μ (measures with 10% accuracy). Result (say) 100.0 gives μLO = 90.91, μHI = 111.1.
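Inverting the belt edges reproduces the numbers on the slide; a minimal sketch, assuming the 68% central belt x1(μ) = 0.9μ, x2(μ) = 1.1μ (my parametrisation of "measures with 10% accuracy"):

```python
# Reading the belt vertically at the measurement X inverts the belt
# edges x1(mu) = 0.9*mu and x2(mu) = 1.1*mu (the 68% central belt for
# a Gaussian with sigma = 0.1*mu).
X = 100.0
mu_lo = X / 1.1    # X = x2(mu_lo)  ->  90.91
mu_hi = X / 0.9    # X = x1(mu_hi)  ->  111.11
print(f"[{mu_lo:.2f}, {mu_hi:.1f}] @ 68% CL")
```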
Strategies: • Two-sided central: (1−CL)/2 above and (1−CL)/2 below • Upper limit: 1−CL below • Lower limit: 1−CL above
Small number counting experiments. Poisson distribution: P(r; μ) = e^{−μ} μ^r / r!. For large μ you can use the Gaussian approximation, but not for small μ. Choose a CL; just use one curve to give an upper limit. The discrete observable makes smooth curves into ugly staircases. Observe n; quote the upper limit μHI from solving Σ_{r=0}^{n} P(r; μHI) = Σ_{r=0}^{n} e^{−μHI} μHI^r / r! = 1 − CL. Translation: n is small, so μ can’t be very large. If the true value is μHI (or higher), then the chance of a result this small (or smaller) is only 1 − CL (or less).
Poisson table: upper limits
n     90%     95%     99%
0     2.30    3.00    4.61
1     3.89    4.74    6.64
2     5.32    6.30    8.41
3     6.68    7.75    10.05
4     7.99    9.15    11.60
5     9.27    10.51   13.11
.....
Notice! If you see nothing, you have set a limit of 3.0 @ 95% CL.
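The table can be reproduced by solving the defining equation numerically; a sketch using SciPy (the bracketing interval and rounding are my choices):

```python
# Reproduce the table above: solve
# sum_{r=0}^{n} exp(-mu) mu^r / r! = 1 - CL for mu.
from scipy.optimize import brentq
from scipy.stats import poisson

def upper_limit(n, cl):
    # poisson.cdf(n, mu) is the left-hand sum; find mu where it equals 1-CL
    return brentq(lambda mu: poisson.cdf(n, mu) - (1 - cl), 1e-9, 100.0)

for n in range(6):
    print(n, [round(upper_limit(n, cl), 2) for cl in (0.90, 0.95, 0.99)])
# n=0 -> [2.30, 3.00, 4.61], n=1 -> [3.89, 4.74, 6.64], ...
```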
Pause to collect thoughts • N >> 0 requirement cracked • N >> B requirement not solved • Suppose you see 0 events and have an expected background of: • B = 0.1 – report limit 2.9 ? • B = 2.9 – report limit 0.1 ?? • B = 3.1 – report limit −0.1 ????
Alternative definition: Bayesian (subjective) probability. P(A) is a number describing my degree of belief in A: 1 = certain belief, 0 = total disbelief. Can be calibrated against simple classical probabilities. P(A) = 0.5 means: I would be indifferent given the choice of betting on A or betting on a coin toss. A can be anything: death, rain, horse races, existence of SUSY… Very adaptable. But there is no guarantee my P(A) is the same as your P(A). Subjective = unscientific?
Bayes’ theorem, general (uncontroversial) form: P(A|B) P(B) = P(A & B) = P(B|A) P(A), so P(A|B) = P(B|A) P(A) / P(B). P(B) can be written P(B|A) P(A) + P(B|not A) (1 − P(A)). Examples: People: P(Artist|Beard) = P(Beard|Artist) P(Artist) / P(Beard). π/K Cherenkov counter: P(π|signal) = P(signal|π) P(π) / P(signal). Medical diagnosis: P(disease|symptom) = P(symptom|disease) P(disease) / P(symptom) = 0.9 × 0.5 / (0.9 × 0.5 + 0.01 × 0.5) = 0.989.
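The diagnosis arithmetic, spelled out; a trivial sketch (the 0.9, 0.01 and 0.5 inputs are the slide's):

```python
# P(disease|symptom) = P(symptom|disease) P(disease) / P(symptom),
# with P(symptom) expanded over disease / no disease.
p_sym_given_dis = 0.9
p_sym_given_healthy = 0.01
p_dis = 0.5

p_sym = p_sym_given_dis * p_dis + p_sym_given_healthy * (1 - p_dis)
print(p_sym_given_dis * p_dis / p_sym)   # 0.989
```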
Bayes’ theorem, “Bayesian” form: P(Theory|Data) = P(Data|Theory) P(Theory) / P(Data). “Theory” may be an event (e.g. rain tomorrow) or a parameter value (e.g. the Higgs mass); then P(Theory) is a function: P(MH|Data) = P(Data|MH) P(MH) / P(Data). P(MH) is the prior; P(MH|Data) is the posterior.
Measurements: Bayes at work. Result value x, theoretical ‘true’ value μ: P(μ|x) ∝ P(x|μ) P(μ). The prior is generally taken as uniform; ignore normalisation problems. Construct a theory of measurements: the prior of the second measurement is the posterior of the first. P(x|μ) is often Gaussian, but can be anything (Poisson, etc.). For a Gaussian measurement and a uniform prior, you get a Gaussian posterior.
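A grid-based sketch of that last statement (the numbers are mine): multiply a Gaussian likelihood by a flat prior and the posterior in μ is the same Gaussian, centred on the measurement:

```python
# Gaussian likelihood P(x|mu) times a uniform prior gives a Gaussian
# posterior P(mu|x), checked numerically on a grid.
import numpy as np
from scipy.stats import norm

x_meas, sigma = 100.0, 10.0
mu = np.linspace(50.0, 150.0, 2001)            # grid of 'true' values
post = norm.pdf(x_meas, loc=mu, scale=sigma)   # flat prior: posterior = likelihood shape
post /= post.sum() * (mu[1] - mu[0])           # normalise

mean = (mu * post).sum() * (mu[1] - mu[0])
sd = np.sqrt(((mu - mean) ** 2 * post).sum() * (mu[1] - mu[0]))
print(mean, sd)   # ~100 and ~10: Gauss(x_meas, sigma) in mu
```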
Aside: “objective” Bayesian statistics • An attempt to lay down rules for the choice of prior • ‘Uniform’ is not enough: uniform in what? • Suggestion (Jeffreys): uniform in a variable for which the expected Fisher information ⟨−d²ln L/dθ²⟩ is constant (statisticians call this a ‘flat prior’). • Has not met with general agreement – different measurements of the same quantity have different objective priors
Pause for breath. For Gaussian measurements of quantities with no constraints/objective prior knowledge, the same results are given by: • Frequentist confidence intervals • Bayesian posteriors from uniform priors A frequentist and a simple Bayesian will report the same outcome from the same raw data, except one will say ‘confidence’ and the other ‘probability’.
Bayesian limits from small number counts. P(r; μ) = e^{−μ} μ^r / r!. With a uniform prior this gives the posterior for μ, shown for various small r results. Read off intervals... [Figure: posterior P(μ) versus μ for r = 0, 1, 2, 6]
Upper limits. Upper limit from n events: ∫₀^{μHI} e^{−μ} μ^n / n! dμ = CL. Repeated integration by parts: Σ_{r=0}^{n} e^{−μHI} μHI^r / r! = 1 − CL. Same as the frequentist limit. This is a coincidence! The lower limit formula is not the same.
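The coincidence is easy to check numerically: with a flat prior the posterior from n events is a Gamma(n+1) density, so its CL quantile is the Bayesian upper limit. A sketch (my check, not the talk's):

```python
# Compare the Bayesian flat-prior upper limit (Gamma(n+1) quantile)
# with the frequentist Poisson limit at 90% CL.
from scipy.optimize import brentq
from scipy.stats import gamma, poisson

for n in range(4):
    bayes = gamma.ppf(0.90, n + 1)
    freq = brentq(lambda mu: poisson.cdf(n, mu) - 0.10, 1e-9, 100.0)
    print(n, round(bayes, 2), round(freq, 2))   # identical: 2.30, 3.89, ...
```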
Result depends on the prior. Example: 90% CL limit from 0 events. Prior flat in μ: 2.30. Prior flat in √μ: 1.65. [Figure: prior × likelihood = posterior for each choice of prior]
Which is right? • The Bayesian method is generally easier, conceptually and in practice • The frequentist method is truly objective. Bayesian probability is a personal ‘degree of belief’. This does not worry biologists but should worry physicists • Ambiguity appears in Bayesian results, as different priors give different answers, though with enough data these differences vanish • Checking for ‘robustness under change of prior’ is a standard statistical technique, generally ignored by physicists • ‘Uniform priors’ is not a sufficient answer. Uniform in what?
Problems for frequentists: add a background. m = S + b. Frequentist method (b known, m measured, S wanted): • Find the range for m • Subtract b to get the range for S Examples: See 5 events, background 1.2: 95% upper limit 10.5, giving S < 9.3. See 5 events, background 5.1: 95% upper limit 10.5, giving S < 5.4 ? See 5 events, background 10.6: 95% upper limit 10.5, giving S < −0.1
S < −0.1? What’s going on? If N < b we know that there is a downward fluctuation in the background. (Which happens…) But there is no way of incorporating this information without messing up the ensemble. The really strict frequentist procedure is to go ahead and publish. • We know that 5% of 95% CL statements are wrong – this is one of them • Suppressing this publication will bias the global results
Similar problems • The expected number of events must be non-negative • The mass of an object must be non-negative • The mass-squared of an object must be non-negative • The Higgs mass from EW fits must be bigger than the LEP2 limit of 114 GeV Three solutions: • Publish a ‘clearly crazy’ result • Use the Feldman-Cousins technique • Switch to Bayesian analysis
m = S + b for Bayesians • No problem! • The prior for m is uniform for S ≥ 0, i.e. for m ≥ b • Multiply and normalise as before: Posterior ∝ Likelihood × Prior • Read off confidence levels by integrating the posterior
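A numerical sketch of this recipe (the grid, step size and the reuse of the earlier n = 5 examples are my choices):

```python
# Flat prior in S >= 0, Poisson likelihood P(n | S + b); integrate the
# normalised posterior up to the CL to get the upper limit on S.
import numpy as np
from scipy.stats import poisson

def bayes_upper_limit(n, b, cl=0.95):
    s = np.linspace(0.0, 50.0, 200001)   # grid over the signal strength
    post = poisson.pmf(n, s + b)         # likelihood x flat prior
    cdf = np.cumsum(post)
    cdf /= cdf[-1]                       # normalise the posterior
    return s[np.searchsorted(cdf, cl)]

# The n < b cases that embarrassed the frequentist now behave sensibly:
for b in (1.2, 5.1, 10.6):
    print(b, round(bayes_upper_limit(5, b), 2))
```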
A frequentist solution: the Feldman-Cousins method*. Works by attacking what looks like a different problem... (* by Feldman and Cousins, mostly)
Feldman-Cousins: m = s + b. b is known, N is measured, s is what we're after. Choosing after seeing the data whether to quote an upper limit (result looks small) or a two-sided interval (result looks like a signal) is called 'flip-flopping', and it is BAD because it wrecks the whole design of the confidence belt. Suggested solution: 1) Construct belts at the chosen CL as before. 2) Find a new ranking strategy to determine what's inside and what's outside. [Figure: 1-sided 90% and 2-sided 90% belts]
Feldman-Cousins: ranking. First idea (almost right): sum/integrate over the range of N values with the highest probabilities for this s and b (with the advantage that this gives the shortest interval). Glitch: suppose N is small (a low fluctuation). P(N; s+b) will be small for any s, and that N never gets counted. [Figure: belt in the (N, s) plane]
How it works. Instead: compare to the 'best' probability for this N, at s = N − b (or s = 0 if N < b), and rank on that ratio. This has to be computed for the appropriate value of the background b. (Sounds complicated, but there is lots of software around.) As n increases, the interval flips from a 1-sided to a 2-sided limit – and the probability of being in the belt is preserved. This means that sensible 1-sided limits are quoted instead of nonsensical 2-sided limits!
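A compact sketch of the construction (grid ranges, step sizes and the lack of tie handling are simplifications of mine):

```python
# Feldman-Cousins for Poisson + known background: for each s, rank N by
# R = P(N; s+b) / P(N; s_best+b) with s_best = max(0, N-b), accept the
# best-ranked N until coverage >= CL, then read the interval off
# vertically at the observed N.
import numpy as np
from scipy.stats import poisson

def fc_interval(n_obs, b, cl=0.90, s_max=15.0, ds=0.005, n_max=50):
    n = np.arange(n_max)
    accepted = []
    for s in np.arange(0.0, s_max, ds):
        p = poisson.pmf(n, s + b)
        s_best = np.maximum(n - b, 0.0)
        r = p / poisson.pmf(n, s_best + b)     # likelihood-ratio ranking
        order = np.argsort(r)[::-1]            # best-ranked first
        k = np.searchsorted(np.cumsum(p[order]), cl) + 1
        if n_obs in n[order[:k]]:
            accepted.append(s)
    return (min(accepted), max(accepted)) if accepted else (0.0, 0.0)

print(fc_interval(0, b=3.0))   # roughly (0, 1.08), cf. Feldman & Cousins
```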
Arguments against using Feldman-Cousins. Argument 1: it takes control out of the hands of the physicist. You might want to quote a 2-sided limit for an expected process, an upper limit for something weird. Counter-argument: this is the virtue of the method. That control invalidates the conventional technique. In rare cases it is permissible to say “We set a 2-sided limit, but we're not claiming a signal.”
Feldman-Cousins: Argument 2. If zero events are observed by two experiments, the one with the higher background b will quote the lower limit. This is unfair to hardworking physicists. Counterargument: an experiment with higher background has to be ‘lucky’ to get zero events. Luckier experiments will always quote better limits. Averaging over luck, lower values of b get lower limits to report. Example: you reward a good student with a lottery ticket which has a 10% chance of winning £1000. A moderate student gets a ticket with a 1% chance of winning £2000. They both win. Were you unfair?
Including systematic errors: m = aS + b. m is the predicted number of events. S is the (unknown) signal source strength, probably a cross section or branching ratio or decay rate. a is an acceptance/luminosity factor, known with some (systematic) error. b is the background rate, known with some (systematic) error.
Full Bayesian. Assume priors • for S (uniform?) • for a (Gaussian?) • for b (Poisson or Gaussian?) Write down the posterior P(S, a, b). Integrate over all a, b to get the marginalised P(S). Read off the desired limits by integration.
Hybrid Bayesian. Assume priors • for a (Gaussian?) • for b (Poisson or Gaussian?) Integrate over all a, b to get the marginalised P(r; S). Read off the desired limits from Σ_{r=0}^{n} P(r; S) = 1 − CL etc. Done approximately for small errors (Cousins and Highland); shows that the limits are pretty insensitive to σa, σb. Done numerically for general errors (RB: java applet on SLAC web page); includes 3 priors (for a) that give slightly different results.
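A Monte Carlo sketch in the Cousins-Highland spirit (not the applet's algorithm; the n, b, acceptance mean and σa values here are made up):

```python
# Smear the Poisson probability over a Gaussian prior for the
# acceptance a, then set the upper limit from the smeared
# sum_{r<=n} P(r; S) = 1 - CL, as in the pure-Poisson case.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import poisson

rng = np.random.default_rng(2)
a = np.clip(rng.normal(1.0, 0.1, 20000), 0.0, None)  # acceptance draws, truncated at 0

def smeared_cdf(n, S, b):
    return poisson.cdf(n, a * S + b).mean()           # marginalise over a

n, b = 3, 0.5
limit = brentq(lambda S: smeared_cdf(n, S, b) - 0.10, 1e-6, 50.0)
print(round(limit, 2))   # slightly above the no-smearing 90% limit
```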
Etcetera… • Extend Feldman-Cousins • Profile likelihood: use P(S) = P(n; S, amax, bmax), where amax, bmax maximise the likelihood for this S and n (sketched below) • Empirical Bayes • And more…
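A rough sketch of the profile-likelihood idea, under assumed Gaussian constraint terms for the nuisance parameters (all numbers and names here are illustrative, not from the talk):

```python
# For each S, maximise the likelihood over the nuisances a and b
# (with assumed Gaussian constraints); scan the profiled curve for a limit.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, poisson

n = 5
a0, sig_a = 0.5, 0.05   # measured acceptance and its error (made up)
b0, sig_b = 1.2, 0.3    # estimated background and its error (made up)

def nll(params, S):
    a, b = params
    if a <= 0.0 or b <= 0.0:
        return 1e9                      # keep the optimiser in bounds
    return (-poisson.logpmf(n, a * S + b)
            - norm.logpdf(a, a0, sig_a)
            - norm.logpdf(b, b0, sig_b))

def profile_nll(S):
    return minimize(nll, x0=[a0, b0], args=(S,), method="Nelder-Mead").fun

scan = np.arange(0.0, 40.0, 0.5)
dll = 2.0 * (np.array([profile_nll(S) for S in scan]) - min(profile_nll(S) for S in scan))
imin = dll.argmin()
up = np.argmax(dll[imin:] > 2.71) + imin   # first crossing above the minimum
print(scan[up])                            # crude 90% upper limit on S
```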
Summary • The straight frequentist approach is objective and clean but sometimes gives ‘crazy’ results • The Bayesian approach is useful but has problems. Check for robustness under choice of prior • Feldman-Cousins deserves more widespread adoption • Lots of work is still going on • This will all be needed at the LHC But always know what you are doing and say what you are doing.