Probability and Statistics • What is probability? • What is statistics?
Probability and Statistics • Probability • Formally defined using a set of axioms • Seeks to determine the likelihood that a given event, observation, or measurement will happen or has happened • What is the probability of throwing a 7 using two dice? • Statistics • Used to analyze the frequency of past events • Uses a given sample of data to assess a probabilistic model’s validity or determine values of its parameters • After observing several throws of two dice, can I determine whether or not they are loaded? • The answer also depends on what we mean by probability
Probability and Statistics • We perform an experiment to collect a number of top quarks • How do we extract the best value for its mass? • What is the uncertainty of our best value? • Is our experiment internally consistent? • Is this value consistent with a given theory, which itself may contain uncertainties? • Is this value consistent with other measurements of the top quark mass?
Probability and Statistics • CDF “discovery” announced 4/11/2011
Probability and Statistics • Pentaquark search – how can this occur? • 2003: a 6.8σ effect • 2005: no effect
Probability • Let the sample space S be the space of all possible outcomes of an experiment • Let x be a possible outcome • Then P(x found in [x, x+dx]) = f(x)dx • f(x) is called the probability density function (pdf) • It may be written f(x;θ), since the pdf can depend on one or more parameters θ • Often we will want to determine θ from a set of measurements • Of course x must be somewhere, so ∫S f(x;θ) dx = 1
Probability • Definitions of mean and variance are given in terms of expectation values • Mean: μ ≡ E[x] = ∫ x f(x) dx • Variance: σ² ≡ V[x] = E[(x−μ)²] = E[x²] − μ²
Probability • Definitions of covariance and correlation coefficient • Covariance: cov[x,y] = E[(x−μx)(y−μy)] = E[xy] − μxμy • Correlation coefficient: ρxy = cov[x,y]/(σxσy), with −1 ≤ ρxy ≤ 1
Probability • Error propagation • For y = f(x1,…,xn) with uncertainties σi, expand f to first order about the means: σy² ≈ Σi,j (∂f/∂xi)(∂f/∂xj) cov[xi,xj] • For uncorrelated xi this reduces to σy² ≈ Σi (∂f/∂xi)² σi²
Probability • This gives the familiar error propagation formulas for sums (or differences) and products (or ratios), in the uncorrelated case • y = x1 ± x2: σy² = σ1² + σ2² • y = x1x2 or y = x1/x2: (σy/y)² = (σ1/x1)² + (σ2/x2)²
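The quadrature rule for a sum can be checked numerically. A minimal Monte Carlo sketch, assuming hypothetical measurements 10.0 ± 0.3 and 5.0 ± 0.4 with uncorrelated Gaussian errors:

```python
import math
import random

random.seed(42)
N = 200_000
sx, sy = 0.3, 0.4  # assumed (hypothetical) uncorrelated Gaussian errors

# Sample the sum of the two measurements and measure its spread
sums = [random.gauss(10.0, sx) + random.gauss(5.0, sy) for _ in range(N)]
mean = sum(sums) / N
var = sum((s - mean) ** 2 for s in sums) / (N - 1)

print(math.sqrt(var))            # observed spread of the sum
print(math.sqrt(sx**2 + sy**2))  # quadrature prediction, 0.5
```

The sampled spread agrees with √(0.3² + 0.4²) = 0.5 to within statistical precision.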
Uniform Distribution • Let f(x) = 1/(β−α) for α ≤ x ≤ β and 0 otherwise • Then μ = (α+β)/2 and σ² = (β−α)²/12 • What is the position resolution of a silicon or multiwire proportional chamber with detection elements of pitch d? σ = d/√12
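A sketch of the resolution argument, assuming a hypothetical strip pitch d = 1.0: hits land uniformly across a strip, so the RMS of the recorded position is d/√12.

```python
import random
from math import sqrt

random.seed(1)
d = 1.0  # hypothetical detection-element pitch

# True hit positions distributed uniformly across one strip
xs = [random.uniform(0.0, d) for _ in range(200_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

print(sqrt(var))       # sampled RMS of the hit position
print(d / sqrt(12.0))  # predicted resolution, ≈ 0.2887
```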
Binomial Distribution • Consider N independent experiments (Bernoulli trials) • Let the outcome of each be pass or fail • Let the probability of pass = p
Permutations • Quick review • Number of ordered arrangements of n objects drawn from N: N!/(N−n)! • Number of combinations (order irrelevant): C(N,n) = N!/(n!(N−n)!)
Binomial Distribution • For the mean and variance we obtain (using small tricks) E[n] = Np and V[n] = Np(1−p) • And note with the binomial theorem that Σn=0..N C(N,n) pⁿ(1−p)^(N−n) = (p + 1 − p)^N = 1, so the pdf is properly normalized
Binomial Distribution • Binomial pdf • f(n; N, p) = [N!/(n!(N−n)!)] pⁿ (1−p)^(N−n), the probability of n passes in N trials
Binomial Distribution • Examples • Coin flip (p=1/2) • Dice throw (p=1/6) • Branching ratio of nuclear and particle decays (p=Br) • Detector or trigger efficiencies (pass or not pass) • Blood group B or not blood group B
Binomial Distribution • It’s baseball season! What is the probability of a 0.300 hitter getting 4 hits in one game?
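A sketch of the calculation, assuming the hitter gets exactly four at-bats, each an independent Bernoulli trial with p = 0.3:

```python
from math import comb

def binom_pmf(n, N, p):
    """Probability of exactly n successes in N Bernoulli trials."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)

# 4 hits in 4 at-bats for a .300 hitter
print(binom_pmf(4, 4, 0.3))  # 0.3**4 = 0.0081
```

With more assumed at-bats the answer changes; e.g. P(exactly 4 hits in 5 at-bats) = C(5,4)(0.3)⁴(0.7).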
Poisson Distribution • Consider the limit N → ∞ and p → 0 with ν = Np held fixed • In this limit the binomial distribution goes over into the Poisson distribution
Poisson Distribution • Poisson pdf • f(n; ν) = νⁿ e^(−ν)/n! • E[n] = ν and V[n] = ν
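The limiting relation to the binomial can be checked numerically; a sketch with assumed values N = 10000 and p = 0.0003, so ν = Np = 3:

```python
from math import comb, exp, factorial

def binom_pmf(n, N, p):
    """Exact binomial probability of n successes in N trials."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)

def poisson_pmf(n, nu):
    """Poisson probability of n events with mean nu."""
    return nu**n * exp(-nu) / factorial(n)

N, p = 10_000, 0.0003  # assumed: large N, small p, nu = Np = 3
for n in range(6):
    print(n, binom_pmf(n, N, p), poisson_pmf(n, N * p))
```

The two columns agree to several decimal places, as the limit suggests.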
Poisson Distribution • Examples • Particles detected from radioactive decays • Sum of two Poisson processes is a Poisson process • Particles detected from scattering of a beam on a target with cross section σ • Cosmic rays observed in a time interval t • Number of entries in a histogram bin when data are accumulated over a fixed time interval • Number of Prussian soldiers kicked to death by horses • Infant mortality • QC/failure rate predictions
Poisson Distribution • Let
Gaussian Distribution • Gaussian distribution • Important because of the central limit theorem • For N independent variables x1, x2, …, xN distributed according to any pdfs with finite means and variances, the sum y = ∑xi has a pdf that approaches a Gaussian for large N • Examples are almost any measurement error (energy resolution, position resolution, …)
Gaussian Distribution • The familiar Gaussian pdf is • f(x; μ, σ²) = [1/(σ√(2π))] exp(−(x−μ)²/2σ²) • E[x] = μ and V[x] = σ²
Gaussian Distribution • Some useful properties of the Gaussian distribution are • P(x in range μ±σ) = 0.683 • P(x in range μ±2σ) = 0.9545 • P(x in range μ±3σ) = 0.9973 • P(x outside range μ±3σ) = 0.0027 • P(x outside range μ±5σ) = 5.7×10⁻⁷ • P(x in range μ±0.6745σ) = 0.5
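These coverage numbers follow from the error function, P(x in μ±kσ) = erf(k/√2); a quick check:

```python
from math import erf, sqrt

def coverage(k):
    """P(x within mu ± k·sigma) for a Gaussian."""
    return erf(k / sqrt(2.0))

# inside-interval and outside-interval probabilities
for k in (1, 2, 3, 5):
    print(k, coverage(k), 1.0 - coverage(k))
```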
χ² Distribution • Chi-square distribution • If x1,…,xn are independent Gaussian variables with means μi and variances σi², then z = Σi (xi−μi)²/σi² follows the χ² distribution with n degrees of freedom • f(z; n) = z^(n/2−1) e^(−z/2)/(2^(n/2) Γ(n/2)) • E[z] = n and V[z] = 2n
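A crude numerical sanity check of the χ² pdf normalization for n = 3, using a midpoint-rule integral (step size and cutoff chosen arbitrarily):

```python
from math import exp, gamma

def chi2_pdf(z, n):
    """Chi-square pdf with n degrees of freedom."""
    return z ** (n / 2 - 1) * exp(-z / 2) / (2 ** (n / 2) * gamma(n / 2))

# Midpoint-rule integral over 0 < z < 100 should be close to 1
dz = 0.001
total = sum(chi2_pdf((i + 0.5) * dz, 3) * dz for i in range(100_000))
print(total)  # ≈ 1
```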
Probability • Probability can be defined in terms of the Kolmogorov axioms • Probability is a real-valued function defined on subsets A, B, … of the sample space S such that • P(A) ≥ 0 for every subset A • P(S) = 1 • P(A∪B) = P(A) + P(B) for disjoint A and B • This means probability is a measure in which the measure of the entire sample space is 1
Probability • We further define the conditional probability P(A|B), read “the probability of A given B” • P(A|B) = P(A∩B)/P(B) • Bayes’ theorem follows from P(A∩B) = P(B∩A) • P(A|B) = P(B|A)P(A)/P(B)
Probability • For disjoint Ai with ∪Ai = S, the law of total probability gives • P(B) = Σi P(B|Ai)P(Ai)
Probability • Usually one treats the Ai as outcomes of a repeatable experiment • Then P(A) is usually assigned a value equal to the limiting frequency of occurrence of A • Called frequentist statistics • But Ai could also be interpreted as hypotheses, each of which is true or false • Then P(A) represents the degree of belief that hypothesis A is true • Called Bayesian statistics
Bayes’ Theorem • Suppose in the general population • P(disease) = 0.001 • P(no disease) = 0.999 • Suppose there is a test to check for the disease • P(+|disease) = 0.98 • P(−|disease) = 0.02 • But also • P(+|no disease) = 0.03 • P(−|no disease) = 0.97 • You are tested for the disease and it comes back +. Should you be worried?
Bayes’ Theorem • Apply Bayes’ theorem • P(disease|+) = P(+|disease)P(disease) / [P(+|disease)P(disease) + P(+|no disease)P(no disease)] = (0.98)(0.001) / [(0.98)(0.001) + (0.03)(0.999)] ≈ 0.032 • Only 3.2% of people testing positive have the disease • Your degree of belief about having the disease is 3.2%
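The arithmetic is a one-line application of Bayes’ theorem; a sketch (the function name is ours, not from the slides):

```python
def posterior(prior, p_pos_given_h, p_pos_given_not_h):
    """P(hypothesis | positive test) via Bayes' theorem."""
    num = p_pos_given_h * prior
    return num / (num + p_pos_given_not_h * (1.0 - prior))

# Disease example: P(disease | +)
print(posterior(0.001, 0.98, 0.03))  # ≈ 0.032
```

The same function applies directly to the drug-doping example that follows.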
Bayes’ Theorem • Is athlete A guilty of drug doping? • Assume a population of athletes in this sport • P(drug) = 0.005 • P(no drug) = 0.995 • Suppose there is a test to check for the drug • P(+|drug) = 0.99 • P(−|drug) = 0.01 • But also • P(+|no drug) = 0.004 • P(−|no drug) = 0.996 • The athlete tests positive. Is he/she involved in drug doping?
Bayes’ Theorem • Apply Bayes’ theorem • P(drug|+) = (0.99)(0.005) / [(0.99)(0.005) + (0.004)(0.995)] ≈ 0.55 • Even with this accurate test, the degree of belief that the athlete is doping is only about 55%
Binomial Distribution • Calculating efficiencies • Usually use ε instead of p • The naive estimate is ε = n/N, with uncertainty δε = √(ε(1−ε)/N) from the binomial variance
Binomial Distribution • But there is a problem • If n = 0, δε = 0 • If n = N, δε = 0 • Actually we went wrong in assuming the best estimate for ε is n/N • We should really use the most probable value of ε given n and N • A proper treatment uses Bayes’ theorem, but luckily for us (in HEP) the solution is implemented in ROOT • h_num->Sumw2() • h_den->Sumw2() • h_eff->Divide(h_num, h_den, 1.0, 1.0, "B")
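The pathology is easy to see with the naive formula; a sketch (`eff_and_err` is our name, not a ROOT function):

```python
from math import sqrt

def eff_and_err(n, N):
    """Naive efficiency estimate n/N and its binomial uncertainty."""
    eps = n / N
    return eps, sqrt(eps * (1.0 - eps) / N)

print(eff_and_err(50, 100))   # reasonable: (0.5, 0.05)
print(eff_and_err(0, 100))    # uncertainty collapses to 0
print(eff_and_err(100, 100))  # uncertainty collapses to 0
```

A perfectly measured efficiency of exactly 0 or 1 is clearly unreasonable for finite N, which is why the Bayesian treatment (ROOT’s "B" option, or the TEfficiency class) is preferred.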