Probability and Statistics • What is probability? • What is statistics?
Probability and Statistics • Probability • Formally defined using a set of axioms • Seeks to determine the likelihood that a given event, observation, or measurement will happen or has happened • What is the probability of throwing a 7 using two dice? • Statistics • Used to analyze the frequency of past events • Uses a given sample of data to assess a probabilistic model’s validity or determine values of its parameters • After observing several throws of two dice, can I determine whether or not they are loaded? • The answer also depends on what we mean by probability
Probability and Statistics • We perform an experiment to collect a number of top quarks • How do we extract the best value for its mass? • What is the uncertainty of our best value? • Is our experiment internally consistent? • Is this value consistent with a given theory, which itself may contain uncertainties? • Is this value consistent with other measurements of the top quark mass?
Probability and Statistics • CDF “discovery” announced 4/11/2011
Probability and Statistics • Pentaquark search – how can this occur? • 2003: a 6.8σ effect • 2005: no effect
Probability • Let the sample space S be the space of all possible outcomes of an experiment • Let x be a possible outcome • Then P(x found in [x, x+dx]) = f(x)dx • f(x) is called the probability density function (pdf) • It may be written f(x;θ), since the pdf can depend on one or more parameters θ • Often we will want to determine θ from a set of measurements • Of course x must be somewhere, so ∫S f(x;θ) dx = 1
Probability • Definitions of mean and variance are given in terms of expectation values • Mean: μ ≡ E[x] = ∫ x f(x) dx • Variance: σ² ≡ V[x] = E[(x−μ)²] = E[x²] − μ²
Probability • Definitions of covariance and correlation coefficient • Covariance: cov[x,y] = E[(x−μx)(y−μy)] = E[xy] − μxμy • Correlation coefficient: ρxy = cov[x,y]/(σxσy), with −1 ≤ ρxy ≤ 1
Probability • Error propagation • For y = f(x1,…,xn) with uncertainties σi, expand f to first order about the means: σy² ≈ Σi,j (∂f/∂xi)(∂f/∂xj) cov[xi,xj] • For uncorrelated xi this reduces to σy² ≈ Σi (∂f/∂xi)² σi²
Probability • This gives the familiar error propagation formulas for sums (or differences) and products (or ratios), in the uncorrelated case • y = x1 ± x2: σy² = σ1² + σ2² • y = x1x2 or y = x1/x2: (σy/y)² = (σ1/x1)² + (σ2/x2)²
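The quadrature rule for a sum can be checked numerically. A minimal Monte Carlo sketch, assuming hypothetical measurements 10.0 ± 0.3 and 5.0 ± 0.4 with uncorrelated Gaussian errors:

```python
import math
import random

random.seed(42)
N = 200_000
sx, sy = 0.3, 0.4  # assumed (hypothetical) uncorrelated Gaussian errors

# Sample the sum of the two measurements and measure its spread
sums = [random.gauss(10.0, sx) + random.gauss(5.0, sy) for _ in range(N)]
mean = sum(sums) / N
var = sum((s - mean) ** 2 for s in sums) / (N - 1)

print(math.sqrt(var))            # observed spread of the sum
print(math.sqrt(sx**2 + sy**2))  # quadrature prediction, 0.5
```

The sampled spread agrees with √(0.3² + 0.4²) = 0.5 to within statistical precision.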
Uniform Distribution • Let f(x) = 1/(β−α) for α ≤ x ≤ β and 0 otherwise • Then μ = (α+β)/2 and σ² = (β−α)²/12 • What is the position resolution of a silicon or multiwire proportional chamber with detection elements of pitch d? σ = d/√12
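A sketch of the resolution argument, assuming a hypothetical strip pitch d = 1.0: hits land uniformly across a strip, so the RMS of the recorded position is d/√12.

```python
import random
from math import sqrt

random.seed(1)
d = 1.0  # hypothetical detection-element pitch

# True hit positions distributed uniformly across one strip
xs = [random.uniform(0.0, d) for _ in range(200_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

print(sqrt(var))       # sampled RMS of the hit position
print(d / sqrt(12.0))  # predicted resolution, ≈ 0.2887
```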
Binomial Distribution • Consider N independent experiments (Bernoulli trials) • Let the outcome of each be pass or fail • Let the probability of pass = p
Permutations • Quick review • Number of ordered arrangements of n objects drawn from N: N!/(N−n)! • Number of combinations (order irrelevant): C(N,n) = N!/(n!(N−n)!)
Binomial Distribution • For the mean and variance we obtain (using small tricks) E[n] = Np and V[n] = Np(1−p) • And note with the binomial theorem that Σn=0..N C(N,n) pⁿ(1−p)^(N−n) = (p + 1 − p)^N = 1, so the pdf is properly normalized
Binomial Distribution • Binomial pdf • f(n; N, p) = [N!/(n!(N−n)!)] pⁿ (1−p)^(N−n), the probability of n passes in N trials
Binomial Distribution • Examples • Coin flip (p=1/2) • Dice throw (p=1/6) • Branching ratio of nuclear and particle decays (p=Br) • Detector or trigger efficiencies (pass or not pass) • Blood group B or not blood group B
Binomial Distribution • It’s baseball season! What is the probability of a 0.300 hitter getting 4 hits in one game?
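A sketch of the calculation, assuming the hitter gets exactly four at-bats, each an independent Bernoulli trial with p = 0.3:

```python
from math import comb

def binom_pmf(n, N, p):
    """Probability of exactly n successes in N Bernoulli trials."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)

# 4 hits in 4 at-bats for a .300 hitter
print(binom_pmf(4, 4, 0.3))  # 0.3**4 = 0.0081
```

With more assumed at-bats the answer changes; e.g. P(exactly 4 hits in 5 at-bats) = C(5,4)(0.3)⁴(0.7).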
Poisson Distribution • Consider the limit N → ∞ and p → 0 with ν = Np held fixed • In this limit the binomial distribution goes over into the Poisson distribution
Poisson Distribution • Poisson pdf • f(n; ν) = νⁿ e^(−ν)/n! • E[n] = ν and V[n] = ν
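The limiting relation to the binomial can be checked numerically; a sketch with assumed values N = 10000 and p = 0.0003, so ν = Np = 3:

```python
from math import comb, exp, factorial

def binom_pmf(n, N, p):
    """Exact binomial probability of n successes in N trials."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)

def poisson_pmf(n, nu):
    """Poisson probability of n events with mean nu."""
    return nu**n * exp(-nu) / factorial(n)

N, p = 10_000, 0.0003  # assumed: large N, small p, nu = Np = 3
for n in range(6):
    print(n, binom_pmf(n, N, p), poisson_pmf(n, N * p))
```

The two columns agree to several decimal places, as the limit suggests.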
Poisson Distribution • Examples • Particles detected from radioactive decays • Sum of two Poisson processes is a Poisson process • Particles detected from scattering of a beam on a target with cross section σ • Cosmic rays observed in a time interval t • Number of entries in a histogram bin when data are accumulated over a fixed time interval • Number of Prussian soldiers kicked to death by horses • Infant mortality • QC/failure rate predictions
Poisson Distribution • Let
Gaussian Distribution • Gaussian distribution • Important because of the central limit theorem • For N independent variables x1, x2, …, xN distributed according to any pdfs with finite means and variances, the sum y = ∑xi has a pdf that approaches a Gaussian for large N • Examples are almost any measurement error (energy resolution, position resolution, …)
Gaussian Distribution • The familiar Gaussian pdf is • f(x; μ, σ²) = [1/(σ√(2π))] exp(−(x−μ)²/2σ²) • E[x] = μ and V[x] = σ²
Gaussian Distribution • Some useful properties of the Gaussian distribution are • P(x in range μ±σ) = 0.683 • P(x in range μ±2σ) = 0.9545 • P(x in range μ±3σ) = 0.9973 • P(x outside range μ±3σ) = 0.0027 • P(x outside range μ±5σ) = 5.7×10⁻⁷ • P(x in range μ±0.6745σ) = 0.5
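These coverage numbers follow from the error function, P(x in μ±kσ) = erf(k/√2); a quick check:

```python
from math import erf, sqrt

def coverage(k):
    """P(x within mu ± k·sigma) for a Gaussian."""
    return erf(k / sqrt(2.0))

# inside-interval and outside-interval probabilities
for k in (1, 2, 3, 5):
    print(k, coverage(k), 1.0 - coverage(k))
```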
χ² Distribution • Chi-square distribution • If x1,…,xn are independent Gaussian variables with means μi and variances σi², then z = Σi (xi−μi)²/σi² follows the χ² distribution with n degrees of freedom • f(z; n) = z^(n/2−1) e^(−z/2)/(2^(n/2) Γ(n/2)) • E[z] = n and V[z] = 2n
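A crude numerical sanity check of the χ² pdf normalization for n = 3, using a midpoint-rule integral (step size and cutoff chosen arbitrarily):

```python
from math import exp, gamma

def chi2_pdf(z, n):
    """Chi-square pdf with n degrees of freedom."""
    return z ** (n / 2 - 1) * exp(-z / 2) / (2 ** (n / 2) * gamma(n / 2))

# Midpoint-rule integral over 0 < z < 100 should be close to 1
dz = 0.001
total = sum(chi2_pdf((i + 0.5) * dz, 3) * dz for i in range(100_000))
print(total)  # ≈ 1
```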
Probability • Probability can be defined in terms of the Kolmogorov axioms • Probability is a real-valued function defined on subsets A, B, … of the sample space S such that • P(A) ≥ 0 for every subset A • P(S) = 1 • P(A∪B) = P(A) + P(B) for disjoint A and B • This means probability is a measure in which the measure of the entire sample space is 1
Probability • We further define the conditional probability P(A|B), read “the probability of A given B” • P(A|B) = P(A∩B)/P(B) • Bayes’ theorem follows from P(A∩B) = P(B∩A) • P(A|B) = P(B|A)P(A)/P(B)
Probability • For disjoint Ai with ∪Ai = S, the law of total probability gives • P(B) = Σi P(B|Ai)P(Ai)
Probability • Usually one treats the Ai as outcomes of a repeatable experiment • Then P(A) is usually assigned a value equal to the limiting frequency of occurrence of A • Called frequentist statistics • But Ai could also be interpreted as hypotheses, each of which is true or false • Then P(A) represents the degree of belief that hypothesis A is true • Called Bayesian statistics
Bayes’ Theorem • Suppose in the general population • P(disease) = 0.001 • P(no disease) = 0.999 • Suppose there is a test to check for the disease • P(+|disease) = 0.98 • P(−|disease) = 0.02 • But also • P(+|no disease) = 0.03 • P(−|no disease) = 0.97 • You are tested for the disease and it comes back +. Should you be worried?
Bayes’ Theorem • Apply Bayes’ theorem • P(disease|+) = P(+|disease)P(disease) / [P(+|disease)P(disease) + P(+|no disease)P(no disease)] = (0.98)(0.001) / [(0.98)(0.001) + (0.03)(0.999)] ≈ 0.032 • Only 3.2% of people testing positive have the disease • Your degree of belief about having the disease is 3.2%
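The arithmetic is a one-line application of Bayes’ theorem; a sketch (the function name is ours, not from the slides):

```python
def posterior(prior, p_pos_given_h, p_pos_given_not_h):
    """P(hypothesis | positive test) via Bayes' theorem."""
    num = p_pos_given_h * prior
    return num / (num + p_pos_given_not_h * (1.0 - prior))

# Disease example: P(disease | +)
print(posterior(0.001, 0.98, 0.03))  # ≈ 0.032
```

The same function applies directly to the drug-doping example that follows.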
Bayes’ Theorem • Is athlete A guilty of drug doping? • Assume a population of athletes in this sport • P(drug) = 0.005 • P(no drug) = 0.995 • Suppose there is a test to check for the drug • P(+|drug) = 0.99 • P(−|drug) = 0.01 • But also • P(+|no drug) = 0.004 • P(−|no drug) = 0.996 • The athlete tests positive. Is he/she involved in drug doping?
Bayes’ Theorem • Apply Bayes’ theorem • P(drug|+) = (0.99)(0.005) / [(0.99)(0.005) + (0.004)(0.995)] ≈ 0.55 • Even with this accurate test, the degree of belief that the athlete is doping is only about 55%
Binomial Distribution • Calculating efficiencies • Usually use ε instead of p • The naive estimate is ε = n/N, with uncertainty δε = √(ε(1−ε)/N) from the binomial variance
Binomial Distribution • But there is a problem • If n = 0, δε = 0 • If n = N, δε = 0 • Actually we went wrong in assuming the best estimate for ε is n/N • We should really use the most probable value of ε given n and N • A proper treatment uses Bayes’ theorem, but luckily for us (in HEP) the solution is implemented in ROOT • h_num->Sumw2() • h_den->Sumw2() • h_eff->Divide(h_num, h_den, 1.0, 1.0, "B")
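The pathology is easy to see with the naive formula; a sketch (`eff_and_err` is our name, not a ROOT function):

```python
from math import sqrt

def eff_and_err(n, N):
    """Naive efficiency estimate n/N and its binomial uncertainty."""
    eps = n / N
    return eps, sqrt(eps * (1.0 - eps) / N)

print(eff_and_err(50, 100))   # reasonable: (0.5, 0.05)
print(eff_and_err(0, 100))    # uncertainty collapses to 0
print(eff_and_err(100, 100))  # uncertainty collapses to 0
```

A perfectly measured efficiency of exactly 0 or 1 is clearly unreasonable for finite N, which is why the Bayesian treatment (ROOT’s "B" option, or the TEfficiency class) is preferred.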