Selected Topics in Particle Physics. Abner Soffer, Spring 2007. Lecture 6
Administrative stuff • Projects status • Other homework problems: • Open questions in HW #1 (questions about the Quantum Universe) and HW #3 (difference between the D mixing and Bs mixing analyses) – we will go over them when we return from break • The plan for the next few weeks: • Statistics (with as many real examples as possible) • ROOT and RooFit • Practicing statistics and analysis techniques • Lecture on Tuesday, April 10 (Mimouna) instead of Monday (Passover break)?
Why do we use statistics in EPP? • Scientific claims need to be based on solid mathematics • How confident are we of the result? What is the probability that we are wrong? • Especially important when working at the frontier of knowledge: extraordinary claims require extraordinary proof • Proving something with high certainty is usually expensive • Many first measurements are made with marginal certainty • Statistical standards: • "Evidence" (conventionally a 3σ effect) • "Observation" (conventionally 5σ)
Probability • Set S (sample space) • Subset A ⊆ S • The probability P(A) is a real number that satisfies the axioms: • 1. P(A) ≥ 0 • 2. If A and B are disjoint subsets, i.e., A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B) • 3. P(S) = 1
Derived properties • P(!A) = 1 − P(A), where !A = S − A • P(A ∪ !A) = 1 • 0 ≤ P(A) ≤ 1 • P(∅) = 0 • If A ⊆ B, then P(A) ≤ P(B) • P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
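As a quick numerical sanity check (not from the lecture; the fair-die sample space and the subsets are arbitrary illustrations), the axioms and derived properties can be verified on a finite sample space with equally likely outcomes, where P(A) = |A| / |S|. A minimal Python sketch:

# Finite sample space: one roll of a fair die, equally likely outcomes
S = {1, 2, 3, 4, 5, 6}
P = lambda A: len(A) / len(S)

A = {1, 2, 3}   # "roll <= 3"
B = {3, 4}      # "roll is 3 or 4"
notA = S - A    # complement !A = S - A

assert P(S) == 1                                          # axiom 3
assert P(set()) == 0                                      # P(empty set) = 0
assert P(A | notA) == 1                                   # P(A u !A) = 1
assert abs(P(notA) - (1 - P(A))) < 1e-12                  # P(!A) = 1 - P(A)
assert abs(P(A | B) - (P(A) + P(B) - P(A & B))) < 1e-12   # addition rule
print("all derived properties hold")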
More definitions • Subsets A and B are independent if P(A ∩ B) = P(A) P(B) • A random variable x is a variable that has a specific value for each element of the set • An element may have more than one random variable: x = {x1, …, xn}
Interpretation of probability in data analysis • Limiting relative frequency: • Elements of the sample space S = possible outcomes of a repeatable measurement • The probability of a particular outcome e (= element of S) is P(e) = lim_{n→∞} (number of occurrences of e in n measurements) / n (note that the single element e belongs to a subset with one element = an elementary subset) • A non-elementary subset A corresponds to an occurrence of any of the outcomes in the subset, with probability P(A) = Σ_{e ∈ A} P(e)
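The limiting frequency can be seen directly by simulation. A minimal sketch (the outcome probability 0.3 is an arbitrary illustration), showing the relative frequency converging to P(e) as the number of repetitions grows:

import numpy as np

rng = np.random.default_rng(1)
p_true = 0.3                      # arbitrary "true" probability of outcome e
for n in (100, 10_000, 1_000_000):
    occurrences = rng.random(n) < p_true
    print(n, occurrences.mean())  # relative frequency -> p_true as n grows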
Example 1 • Element e = D mixing parameter y’ measured to be 0.01 • Subset A = y’ measured to be in range [0.005, 0.015] • P(A) = fraction of experiments in which y’ is measured in [0.005, 0.015], given that its true value is 0.002
Example 2 • e = (x′², y′) measured to be (−0.0002, 0.01) • A = (x′², y′) measured to be anywhere outside the brown ("4σ") contour • P(A) = fraction of experiments in which (x′², y′) are measured outside the contour, given that their true values are the measured ones
Example 3 • e = error on the CP-violating parameter θ− measured to be 42 • A = θ− error measured to be 42 or greater • P(A) = fraction of experiments in which the θ− error is measured to be 42 or greater
About the relative frequency interpretation • Straightforward when measurements are repeatable: • Particle collisions in an experiment • Radioactive decays of identical nuclei • Also works when measurements are repeatable only in principle: • Measurement of the D mixing parameters using all the data we will ever have • Measurement of the average height of all humans • (relying on the fact that physical laws don't change)
Probability density functions • The outcome of an experiment is a continuous random variable x • Applies to most measurements in particle physics • Define the probability density function (PDF) f(x), such that f(x) dx = probability to observe x in [x, x + dx] = fraction of experiments in which x will be measured in [x, x + dx] • To satisfy axiom 3, P(S) = 1, normalize the PDF: ∫_S f(x) dx = 1
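For instance, the normalization can be checked numerically for a unit Gaussian (an arbitrary illustrative PDF):

import numpy as np

x, dx = np.linspace(-10, 10, 200_001, retstep=True)
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # unit Gaussian PDF
print((f * dx).sum())                       # ~1.0: the PDF is normalized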
The PDF and a finite number of observations • A set of n_meas measurements x_m (m = 1…n_meas) can be presented as a histogram: n_b (b = 1…n_bins) = number of measurements for which x falls in bin b • n_b / n_meas = probability for a measurement to be in bin b, with Σ_b n_b / n_meas = 1 • n_b / (n_meas Δx_b) = (discrete) probability density function • Continuum limit (infinite number of observations, infinitely fine binning): f(x) = lim_{n_meas→∞, Δx_b→0} n_b / (n_meas Δx_b)
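A sketch of this correspondence (the exponential sample is an arbitrary illustration): the quantity n_b / (n_meas Δx_b) approaches the true PDF as the sample grows.

import numpy as np

rng = np.random.default_rng(2)
tau = 1.5                                  # arbitrary lifetime
x = rng.exponential(tau, size=1_000_000)   # n_meas measurements

counts, edges = np.histogram(x, bins=100, range=(0, 10))
dens = counts / (x.size * np.diff(edges))  # n_b / (n_meas * dx_b)
centers = 0.5 * (edges[:-1] + edges[1:])
true_pdf = np.exp(-centers / tau) / tau
print(np.max(np.abs(dens - true_pdf)))     # small; shrinks as n_meas grows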
Cumulative distribution • The cumulative distribution of f(x) is F(x) = ∫_{−∞}^{x} f(x′) dx′ • Alternatively: F(x) = probability to obtain a measurement whose value is < x, and f(x) = dF(x)/dx (for differentiable F(x)) • The α-point x_α is the value of x such that F(x_α) = α, where 0 ≤ α ≤ 1. Or: x_α = F⁻¹(α) • Median = x_½ = value of x such that F(x_½) = ½ • Mode = x_mode such that f(x_mode) > f(x) for all other values of x • may not be useful or unique if f(x) has multiple local maxima
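A sketch of extracting α-points by inverting an empirical cumulative distribution (exponential sample again, purely illustrative); the empirical median should approach the true value τ ln 2:

import numpy as np

rng = np.random.default_rng(3)
tau = 1.5
x = np.sort(rng.exponential(tau, size=100_000))
F = np.arange(1, x.size + 1) / x.size     # empirical CDF at the sorted points

def alpha_point(alpha):
    # x_alpha = F^(-1)(alpha), by interpolation on the empirical CDF
    return np.interp(alpha, F, x)

print(alpha_point(0.5), tau * np.log(2))  # empirical vs true median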
Extension to multi-variable PDFs • For f(x), x = {x1, …, xn}, the α-point turns into an α-contour of dimension n − 1 • Marginal PDFs: • f_x(x) = ∫ f(x, y) dy • f_y(y) = ∫ f(x, y) dx • x and y are independent variables if f(x, y) = f_x(x) f_y(y) • Independent variables are also uncorrelated (though, as we will see, uncorrelated variables are not necessarily independent)
Functions of random variables • a(x) is a continuous function of a random variable x, which has PDF f(x) • E.g., a = x², a = log(x), etc. • What is the PDF g(a)? • Require equal probabilities in corresponding infinitesimal regions: g(a) da = f(x) dx, so g(a) = f(x(a)) |dx/da| • The absolute value keeps the PDF positive • Assumes a(x) can be inverted
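A sketch of this change-of-variables formula, taking x uniform on [0, 1] and a = x² (invertible on this range), so g(a) = f(x(a)) |dx/da| = 1 / (2√a):

import numpy as np

rng = np.random.default_rng(4)
x = rng.random(1_000_000)               # f(x) = 1 on [0, 1]
a = x**2

counts, edges = np.histogram(a, bins=50, range=(0.01, 1))
dens = counts / (a.size * np.diff(edges))
centers = 0.5 * (edges[:-1] + edges[1:])
g = 1 / (2 * np.sqrt(centers))          # analytic g(a) = f(x(a)) |dx/da|
print(np.max(np.abs(dens - g) / g))     # small relative deviation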
Example • The CP-violation phases α, β, γ are not measured directly. We measure cos φ, sin φ, or sin 2φ, then transform to the phases.
Multiple-valued x(a) • If a(x) is not uniquely invertible, we need to add up the different contributions: g(a) = Σ_i f(x_i(a)) |dx_i/da|, summed over all branches x_i(a) • In the figure, dS(a) = sum of the 2 regions of x that map into [a, a + da]
Functions of multiple random variables • What is g(a) for a(x), x = {x1, …, xn}? g(a) da = ∫_dS f(x) dⁿx, where dS is the hypersurface in x-space that encloses [a, a + da] • Example: z = xy; what is f(z), given g(x) and h(y)? f(z) = ∫ g(x) h(z/x) dx / |x| • f(z) is the Mellin convolution of g(x) and h(y)
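A Monte Carlo sketch of the z = xy case, checking the Mellin convolution numerically (both input PDFs chosen uniform on [1, 2] purely for illustration):

import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(1, 2, 1_000_000)
y = rng.uniform(1, 2, 1_000_000)
z = x * y

def f_z(z0, n=10_000):
    # f(z0) = integral of g(x) h(z0/x) dx / |x|, with g = h = 1 on [1, 2]
    xs, dx = np.linspace(1, 2, n, retstep=True)
    h = ((z0 / xs >= 1) & (z0 / xs <= 2)).astype(float)
    return (h / xs).sum() * dx

dens, edges = np.histogram(z, bins=40, range=(1, 4), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(dens - [f_z(c) for c in centers])))  # small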
Another example: z = x + y • f(z) = ∫ g(x) h(z − x) dx is the familiar Fourier convolution of g(x) and h(y) • Recall from the D mixing analysis: the measured decay time t is the true decay time t′ (distribution P(t′)) plus a random detector error Δt (distribution r(Δt)): P_meas(t) = ∫ P(t′) r(t − t′) dt′
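A sketch of the decay-time smearing as a convolution: an exponential true-time distribution convolved with a Gaussian resolution function (the lifetime and resolution values are arbitrary):

import numpy as np

rng = np.random.default_rng(6)
tau, sigma = 1.5, 0.3                                # arbitrary values
t_true = rng.exponential(tau, 1_000_000)             # P(t')
t_meas = t_true + rng.normal(0, sigma, t_true.size)  # t = t' + Dt

# P_meas(t_i) = sum_j P(t'_j) r(t_i - t'_j) dt', as a matrix product
t, dt = np.linspace(-2.0, 10.0, 1_201, retstep=True)
P = np.where(t >= 0, np.exp(-t / tau) / tau, 0.0)
r = lambda u: np.exp(-u**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
P_meas = r(t[:, None] - t[None, :]) @ P * dt

edges = np.linspace(-2, 10, 61)
dens, _ = np.histogram(t_meas, bins=edges, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(dens - np.interp(centers, t, P_meas))))  # small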
Multiple functions of multiple random variables • g(a1, …, an) = f(x1, …, xn) |J|, where J is the Jacobian determinant with elements J_ij = ∂x_i/∂a_j • To determine the marginal distribution g_i(a_i), integrate g(a1, …, an) over the a_j (j ≠ i) variables
Expectation values • The expectation value of a random variable x distributed according to the PDF f(x): E[x] = ∫ x f(x) dx • Also called the population mean μ • E[x] is the most commonly used location parameter (others are the α-point x_α and the mode) • The expectation value of a function a(x) is E[a] = ∫ a(x) f(x) dx
Moments • The nth algebraic moment of f(x): μ′_n = E[xⁿ] • Note that the population mean μ is the special case μ′₁ • The nth central moment: μ_n = E[(x − μ)ⁿ] • In particular, μ₂ = E[(x − μ)²] = σ² is the population variance of f(x) • The standard deviation σ gives an idea of the spread of f(x)
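A sketch estimating the lowest moments from a sample and comparing with the exponential's known values, E[x] = τ and σ² = τ² (the sample is illustrative):

import numpy as np

rng = np.random.default_rng(7)
tau = 1.5
x = rng.exponential(tau, 1_000_000)

mean = x.mean()                 # estimates mu'_1 = E[x] = tau
var = ((x - mean)**2).mean()    # estimates mu_2 = E[(x - mu)^2] = tau^2
print(mean, var, np.sqrt(var))  # ~1.5, ~2.25, ~1.5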
Mean and variance of functions • Take a function a(x) of many random variables x. Then E[a] = ∫ a(x) f(x) dx1…dxn, and the variance is V[a] = σ_a² = E[a²] − (E[a])²
Covariance • For 2 random variables x, y, the covariance cov[x, y] or V_xy is V_xy = E[(x − μ_x)(y − μ_y)] = E[xy] − μ_x μ_y • For 2 functions a(x), b(x), the covariance is V_ab = E[ab] − E[a] E[b] • Note that V_ab = V_ba and V_aa = σ_a² • The dimensionless correlation coefficient is ρ_ab = V_ab / (σ_a σ_b) • Note that −1 ≤ ρ_ab ≤ 1
Understanding covariance and correlation • V_xy = E[(x − μ_x)(y − μ_y)] is the expectation value of the product of the deviations from the means • If having x > μ_x increases the probability of having y > μ_y, then V_xy > 0: x and y are positively correlated • If having x > μ_x increases the probability of having y < μ_y, then V_xy < 0: x and y are negatively correlated or anti-correlated • For independent variables (defined as f(x, y) = f_x(x) f_y(y)), we find E[xy] = E[x] E[y] = μ_x μ_y, so V_xy = 0 • Does V_xy = 0 necessarily mean that the variables are independent?…
Covariance and correlation • …No. E.g., take x distributed symmetrically about 0 and y = x²: then E[xy] = E[x³] = 0 = E[x] E[y], so V_xy = 0, even though y is fully determined by x
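A numerical sketch of this counterexample (x standard normal and y = x², the classic uncorrelated-but-dependent pair):

import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(0, 1, 1_000_000)
y = x**2                        # fully determined by x

V_xy = ((x - x.mean()) * (y - y.mean())).mean()
print(V_xy)                     # ~0: uncorrelated, yet not independent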
Propagation of errors • Take n random variables x with unknown PDF f(x), but with E[x] = μ and the covariances V_ij known (or estimated) • Take the function y(x). What are E[y] and V[y]? • Remember: we don't know f(x) • Expand y to first order around the mean: y(x) ≈ y(μ) + Σ_i (∂y/∂x_i)|_μ (x_i − μ_i) • Then E[y] ≈ y(μ) and σ_y² ≈ Σ_{i,j} (∂y/∂x_i)(∂y/∂x_j)|_μ V_ij
Why is this "error propagation"? • Because we often estimate errors from covariances: the covariances V_ij of the measured x_i propagate into the error σ_y on the derived quantity y
Special cases • y = x1 + x2: σ_y² = σ1² + σ2² + 2V12 • y = x1 x2: (σ_y / y)² = (σ1 / x1)² + (σ2 / x2)² + 2V12 / (x1 x2) • Note: these formulae don't work if y is significantly non-linear within a distance σ_i around the mean μ
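A sketch comparing the y = x1 x2 special case against a direct Monte Carlo, with uncorrelated Gaussian inputs (V12 = 0) and arbitrary central values and errors:

import numpy as np

rng = np.random.default_rng(9)
mu1, s1, mu2, s2 = 10.0, 0.5, 4.0, 0.2
x1 = rng.normal(mu1, s1, 1_000_000)
x2 = rng.normal(mu2, s2, 1_000_000)

y = x1 * x2
rel_prop = np.sqrt((s1 / mu1)**2 + (s2 / mu2)**2)  # propagated (sigma_y / y)
print(y.std() / y.mean(), rel_prop)  # agree while y is nearly linear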
Orthogonal transformation of variables • It is often useful to work in variables in which the covariance matrix is diagonal: cov[y_i, y_j] = σ_i² δ_ij • This can always be achieved with a linear transformation y_i = Σ_j A_ij x_j, where the rows of the transformation matrix A_ij are the eigenvectors of cov[x_i, x_j] • Then σ_i² are the eigenvalues of cov[x_i, x_j]
Visualize for 2 dimensions • Recall the definition of the correlation coefficient ρ_xy = V_xy / (σ_x σ_y), so we can write the covariance matrix as V = ( σ_x², ρ σ_x σ_y ; ρ σ_x σ_y, σ_y² ) • [Figure: error ellipse with its principal axes along eigenvector 1 and eigenvector 2]
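A sketch of diagonalizing a 2-dimensional covariance matrix with an orthogonal transformation (the σ and ρ values are arbitrary; ρ = −0.94 echoes the D mixing example below):

import numpy as np

sx, sy, rho = 1.0, 2.0, -0.94
V = np.array([[sx**2,         rho * sx * sy],
              [rho * sx * sy, sy**2        ]])

eigvals, eigvecs = np.linalg.eigh(V)  # V is symmetric
A = eigvecs.T                         # rows of A = eigenvectors; y = A x
print(A @ V @ A.T)                    # diagonal, with sigma_i^2 = eigvals
print(eigvals)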
More on linear variable transformations • The uncorrelated variables y_i have a simpler covariance matrix, but may not always correspond to physically interesting quantities • E.g., in D mixing, x′² and y′ have a very high correlation coefficient of ρ = −0.94 • But they are the physically interesting variables…