Delve into the fundamental concepts of probability and statistics, from axioms to practical applications. Understand how to analyze data, determine likelihoods, and extract valuable insights. Explore various distributions, including Gaussian and Poisson, along with key statistical techniques like error propagation and conditional probability.
Probability and Statistics • What is probability? • What is statistics?
Probability and Statistics • Probability • Formally defined using a set of axioms • Seeks to determine the likelihood that a given event, observation, or measurement will happen or has happened • What is the probability of throwing a 7 using two dice? • Statistics • Used to analyze the frequency of past events • Uses a given sample of data to assess a probabilistic model’s validity or determine values of its parameters • After observing several throws of two dice, can I determine whether or not they are loaded? • The answer also depends on what we mean by probability
Probability and Statistics • We perform an experiment to collect a number of top quarks • How do we extract the best value for its mass? • What is the uncertainty of our best value? • Is our experiment internally consistent? • Is this value consistent with a given theory, which itself may contain uncertainties? • Is this value consistent with other measurements of the top quark mass?
Probability and Statistics • CDF “discovery” announced 4/11/2011
Probability and Statistics • Pentaquark search – how can this occur? • 2003 – 6.8σ effect • 2005 – no effect
Probability • Let the sample space S be the space of all possible outcomes of an experiment • Let x be a possible outcome • Then P(x found in [x,x+dx]) = f(x)dx • f(x) is called the probability density function (pdf) • It may be written f(x;θ) since the pdf could depend on one or more parameters θ • Often we will want to determine θ from a set of measurements • Of course x must be somewhere, so ∫S f(x)dx = 1
Probability • Definitions of mean and variance are given in terms of expectation values
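The slide's equations did not survive extraction; in standard notation the definitions are:

```latex
E[x] = \int_S x\, f(x)\, dx \equiv \mu
\qquad
V[x] = E[(x-\mu)^2] = E[x^2] - \mu^2 \equiv \sigma^2
```

σ, the square root of the variance, is the standard deviation.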
Probability • Definitions of covariance and correlation coefficient
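In the same notation, for two variables x and y with means μx and μy:

```latex
\operatorname{cov}[x,y] = E[xy] - \mu_x \mu_y
\qquad
\rho_{xy} = \frac{\operatorname{cov}[x,y]}{\sigma_x \sigma_y},
\quad -1 \le \rho_{xy} \le 1
```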
Probability • Error propagation
Probability • This gives the familiar error propagation formulas for sums (or differences) and products (or ratio)
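The propagation formulas follow from a first-order Taylor expansion of y = f(x1,…,xn) about the means. With covariance matrix Vij:

```latex
\sigma_y^2 \approx \sum_{i,j} \frac{\partial y}{\partial x_i}\,
\frac{\partial y}{\partial x_j}\, V_{ij}
\;\xrightarrow{\text{uncorrelated}}\;
\sum_i \left(\frac{\partial y}{\partial x_i}\right)^{\!2} \sigma_i^2
```

```latex
y = x_1 \pm x_2:\quad \sigma_y^2 = \sigma_1^2 + \sigma_2^2
\qquad
y = x_1 x_2 \ \text{or}\ x_1/x_2:\quad
\left(\frac{\sigma_y}{y}\right)^{\!2}
= \left(\frac{\sigma_1}{x_1}\right)^{\!2}
+ \left(\frac{\sigma_2}{x_2}\right)^{\!2}
```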
Uniform Distribution • Let f(x; a, b) = 1/(b−a) for a ≤ x ≤ b and 0 otherwise • What is the position resolution of a silicon or multiwire proportional chamber with detection elements of spacing x?
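A hit is equally likely anywhere across a detection element, so reconstructing the element center leaves a uniform error of width equal to the spacing, giving the well-known resolution spacing/√12. A minimal Monte Carlo sketch (the 50 µm pitch value is an assumption for illustration):

```python
import math
import random

d = 0.005  # assumed strip pitch in cm (50 um) -- illustrative value only
sigma_exact = d / math.sqrt(12)  # variance of uniform on width d is d^2/12

# Throw hits uniformly across one strip and measure the spread of the
# residual (true position minus strip center).
random.seed(1)
errors = [random.uniform(-d / 2, d / 2) for _ in range(200_000)]
mean = sum(errors) / len(errors)
sigma_mc = math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors))

print(f"exact {sigma_exact:.6f}  MC {sigma_mc:.6f}")
```

The sampled spread should agree with d/√12 to well under a percent at this statistics.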
Binomial Distribution • Consider N independent experiments (Bernoulli trials) • Let the outcome of each be pass or fail • Let the probability of pass = p
Permutations • Quick review
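The counting formulas being reviewed, for ordered (permutations) and unordered (combinations) selections of k items from N:

```latex
{}_{N}P_{k} = \frac{N!}{(N-k)!}
\qquad
\binom{N}{k} = \frac{N!}{k!\,(N-k)!}
```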
Binomial Distribution • For the mean and variance we obtain (using small tricks) • And note with the binomial theorem that
Binomial Distribution • Binomial pdf
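The binomial pdf for n passes in N trials, with the mean and variance quoted above:

```latex
f(n; N, p) = \binom{N}{n}\, p^{n} (1-p)^{N-n},
\quad n = 0, 1, \dots, N
\qquad
E[n] = Np, \quad V[n] = Np(1-p)
```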
Binomial Distribution • Examples • Coin flip (p=1/2) • Dice throw (p=1/6) • Branching ratio of nuclear and particle decays (p=Br) • Detector or trigger efficiencies (pass or not pass) • Blood group B or not blood group B
Binomial Distribution • It’s baseball season! What is the probability of a 0.300 hitter getting 4 hits in one game?
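A sketch of the calculation, assuming the hitter gets exactly 4 at-bats in the game, each an independent Bernoulli trial with p = 0.300 (the number of at-bats is an assumption, not stated on the slide):

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# 4 hits in 4 at-bats: only the k = n term survives, so this is just 0.3^4
p_4_for_4 = binom_pmf(4, 4, 0.300)
print(round(p_4_for_4, 4))  # 0.0081
```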
Poisson Distribution • Consider the binomial limit N → ∞, p → 0 with ν = Np held fixed
Poisson Distribution • Poisson pdf
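The Poisson pdf for observing n events when ν are expected:

```latex
f(n; \nu) = \frac{\nu^{n} e^{-\nu}}{n!},
\quad n = 0, 1, 2, \dots
\qquad
E[n] = \nu, \quad V[n] = \nu
```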
Poisson Distribution • Examples • Particles detected from radioactive decays • Sum of two Poisson processes is a Poisson process • Particles detected from scattering of a beam on target with cross section σ • Cosmic rays observed in a time interval t • Number of entries in a histogram bin when data is accumulated over a fixed time interval • Number of Prussian soldiers kicked to death by horses • Infant mortality • QC/failure rate predictions
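The second bullet can be checked numerically: convolving Poisson(ν1) with Poisson(ν2) reproduces Poisson(ν1+ν2), a consequence of the binomial theorem. A quick sketch:

```python
from math import exp, factorial

def poisson_pmf(n, nu):
    """Poisson probability of n counts with mean nu."""
    return nu ** n * exp(-nu) / factorial(n)

nu1, nu2 = 2.0, 3.0
for n in range(10):
    # P(total = n) summed over all ways to split n between the two processes
    conv = sum(poisson_pmf(k, nu1) * poisson_pmf(n - k, nu2)
               for k in range(n + 1))
    direct = poisson_pmf(n, nu1 + nu2)
    assert abs(conv - direct) < 1e-12

print("sum of two Poisson processes is Poisson")
```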
Poisson Distribution • Let
Gaussian Distribution • Gaussian distribution • Important because of the central limit theorem • For N independent variables x1, x2, …, xN distributed according to any pdfs with finite variances, the sum y = ∑xi has a pdf that approaches a Gaussian for large N • Examples are almost any measurement error (energy resolution, position resolution, …)
Gaussian Distribution • The familiar Gaussian pdf is f(x; μ, σ²) = (1/(σ√(2π))) exp(−(x−μ)²/(2σ²))
Gaussian Distribution • Some useful properties of the Gaussian distribution are • P(x in range μ±σ) = 0.683 • P(x in range μ±2σ) = 0.9545 • P(x in range μ±3σ) = 0.9973 • P(x outside range μ±3σ) = 0.0027 • P(x outside range μ±5σ) = 5.7×10⁻⁷ • P(x in range μ±0.6745σ) = 0.5
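These coverage probabilities follow from the error function, since P(|x−μ| < kσ) = erf(k/√2) for a Gaussian. A quick check:

```python
from math import erf, sqrt

def prob_within(k):
    """P(|x - mu| < k*sigma) for a Gaussian-distributed x."""
    return erf(k / sqrt(2))

print(round(prob_within(1), 4),
      round(prob_within(2), 4),
      round(prob_within(3), 4))  # 0.6827 0.9545 0.9973
```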
χ² Distribution • Chi-square distribution
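The χ² pdf for n degrees of freedom, with its mean and variance:

```latex
f(z; n) = \frac{z^{n/2 - 1}\, e^{-z/2}}{2^{n/2}\,\Gamma(n/2)},
\quad z \ge 0
\qquad
E[z] = n, \quad V[z] = 2n
```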
Probability • Probability can be defined in terms of Kolmogorov axioms • The probability is a real-valued function defined on subsets A,B,… in sample space S • This means the probability is a measure in which the measure of the entire sample space is 1
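The three Kolmogorov axioms referred to are:

```latex
\text{1. } P(A) \ge 0 \ \text{for every } A \subseteq S \qquad
\text{2. } P(S) = 1 \qquad
\text{3. } P(A \cup B) = P(A) + P(B) \ \text{if } A \cap B = \varnothing
```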
Probability • We further define the conditional probability P(A|B), read “the probability of A given B” • Bayes’ theorem relates P(A|B) to P(B|A)
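In standard form:

```latex
P(A|B) = \frac{P(A \cap B)}{P(B)}
\qquad\Longrightarrow\qquad
P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}
```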
Probability • For disjoint Ai that cover S, the law of total probability gives P(B) = ∑i P(B|Ai) P(Ai)
Probability • Usually one treats the Ai as outcomes of a repeatable experiment • Then P(A) is usually assigned a value equal to the limiting frequency of occurrence of A • Called frequentist statistics • But Ai could also be interpreted as hypotheses, each of which is true or false • Then P(A) represents the degree of belief that hypothesis A is true • Called Bayesian statistics
Bayes’ Theorem • Suppose in the general population • P(disease) = 0.001 • P(no disease) = 0.999 • Suppose there is a test to check for the disease • P(+|disease) = 0.98 • P(−|disease) = 0.02 • But also • P(+|no disease) = 0.03 • P(−|no disease) = 0.97 • You are tested for the disease and it comes back +. Should you be worried?
Bayes’ Theorem • Apply Bayes’ theorem • 3.2% of people testing positive have the disease • Your degree of belief about having the disease is 3.2%
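The 3.2% figure can be checked directly from the numbers given for the disease example:

```python
# Bayes' theorem for the disease-screening example (values from the slide)
p_d = 0.001          # P(disease)
p_pos_d = 0.98       # P(+ | disease)
p_pos_nd = 0.03      # P(+ | no disease)

# Law of total probability for P(+), then Bayes' theorem
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)
p_d_pos = p_pos_d * p_d / p_pos  # P(disease | +)

print(round(p_d_pos, 3))  # 0.032 -> about 3.2%
```

The false positives from the large healthy population swamp the true positives, which is why the posterior is so small despite the 98% test sensitivity.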
Bayes’ Theorem • Is athlete A guilty of drug doping? • Assume a population of athletes in this sport • P(drug) = 0.005 • P(no drug) = 0.995 • Suppose there is a test to check for the drug • P(+|drug) = 0.99 • P(−|drug) = 0.01 • But also • P(+|no drug) = 0.004 • P(−|no drug) = 0.996 • The athlete is tested positive. Is he/she involved in drug doping?
Bayes’ Theorem • Apply Bayes’ theorem • ???
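The slide leaves the answer as “???”; working the given numbers through Bayes’ theorem, exactly as in the disease example, gives roughly a 55% posterior:

```python
# Bayes' theorem for the doping example (values from the slide)
p_drug = 0.005        # P(drug)
p_pos_drug = 0.99     # P(+ | drug)
p_pos_clean = 0.004   # P(+ | no drug)

p_pos = p_pos_drug * p_drug + p_pos_clean * (1 - p_drug)
p_drug_pos = p_pos_drug * p_drug / p_pos  # P(drug | +)

print(round(p_drug_pos, 3))  # 0.554
```

Even with a 99% sensitive test, the rarity of doping in the population means a positive result supports guilt only at about the coin-flip level.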
Binomial Distribution • Calculating efficiencies • Usually use ε instead of p
Binomial Distribution • But there is a problem • If n=0, d(ε′) = 0 • If n=N, d(ε′) = 0 • Actually we went wrong in assuming the best estimate for ε is n/N • We should really have used the most probable value of ε given n and N • A proper treatment uses Bayes’ theorem, but lucky for us (in HEP) the solution is implemented in ROOT • h_num->Sumw2(); • h_den->Sumw2(); • h_eff->Divide(h_num, h_den, 1.0, 1.0, "B");
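The pathology the slide describes is easy to exhibit. A sketch of the naive binomial error d(ε) = √(ε(1−ε)/N) with ε estimated as n/N, showing that it collapses to zero exactly at the endpoints where the estimate is least trustworthy:

```python
import math

def naive_eff_error(n, N):
    """Naive binomial uncertainty on an efficiency estimated as eps = n/N."""
    eps = n / N
    return math.sqrt(eps * (1 - eps) / N)

print(naive_eff_error(50, 100))   # 0.05
print(naive_eff_error(0, 100))    # 0.0 -- clearly wrong as an uncertainty
print(naive_eff_error(100, 100))  # 0.0 -- likewise
```

These zero-width error bars at n=0 and n=N are what motivate the Bayesian treatment the slide points to.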