Lecture 4 • Random variables continued. • Monte Carlo • Uncertainty propagation • Correct presentation of data. • How to obtain the ‘best-fit’ model: basic considerations. • Techniques for finding minima (or maxima).
Advertisement! • A simple FITS dump-to-ASCII script is available off my web page. • I’ll try to get FTOOLS installed on the NASSP machines.
A python note: • Sarah Blyth has a good intro manual for plotting with pylab: http://www.ast.uct.ac.za/~sarblyth/pythonGuide/PythonPlottingBeginnersGuide.pdf
Samples from the probability density function p(x) • N samples x_i – random numbers having the distribution p(x). [Figure: samples scattered along the x axis beneath the curve p(x).] • Estimate of μ: μ̂ = (1/N) Σ x_i • Estimate of σ²: σ̂² = (1/(N−1)) Σ (x_i − μ̂)²
Estimating the PDF: a frequency histogram • Definition of a histogram: set up bins, usually (but not always) of equal width; count the samples which fall in each bin. • Note that the bin heights have some scatter – in fact these numbers are Poisson random variables. [Figure: histogram of the samples, approximating p(x).]
Estimating the PDF: mean, variance etc • Estimating the three important properties of the PDF from N samples of X: • A frequency histogram serves as an estimate of p(x). • Estimate of the mean: μ̂ = (1/N) Σ x_i • Estimate of the variance: σ̂² = (1/(N−1)) Σ (x_i − μ̂)² • Note: the result of every (non-trivial) transformation of a set of random numbers is itself a random number. (For example, the estimators for the mean and variance are themselves random numbers.) • Note: the ‘hats’ here mean ‘estimate’.
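A minimal sketch of these two estimators in numpy (the example PDF and sample size are illustrative, not from the lecture):

import numpy as np

samples = np.random.normal(loc=5.0, scale=2.0, size=1000)  # N samples from an example PDF

mean_est = samples.mean()        # estimate of mu: (1/N) * sum(x_i)
var_est = samples.var(ddof=1)    # estimate of sigma^2, using the 1/(N-1) normalization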
Estimating the correlation between two different random variables: • Suppose we have N paired samples of variables A and B. • The covariance between these samples is estimated by σ̂_AB = (1/(N−1)) Σ (a_i − Â)(b_i − B̂), where Â and B̂ are the estimated means. • If we have M different variables, we can define an M x M covariance matrix.
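A short sketch, assuming numpy; the two partly correlated variables here are invented for illustration:

import numpy as np

a = np.random.normal(size=500)
b = 0.5 * a + np.random.normal(size=500)   # B partly correlated with A

cov_matrix = np.cov(a, b)       # 2 x 2 covariance matrix (1/(N-1) normalization by default)
cov_ab = cov_matrix[0, 1]       # estimated covariance between A and B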
Uncertainty propagation • If some random variable z is a function f(y) of other random variables y = [y_1, y_2, ..., y_M], then σ_z² = Σ_i Σ_j (∂f/∂y_i)(∂f/∂y_j) σ²_i,j. • σ²_i,i ≡ σ²_i is just the variance of the variable y_i. • σ²_i,j is the covariance between y_i and y_j. • The formula looks horrible, but in practice it is often simple to evaluate...
Uncertainty propagation • Often (not always!) the different y_i are uncorrelated – ie, the value of one does not depend on another. In this case σ²_i,j = 0 for i ≠ j and so σ_z² = Σ_i (∂f/∂y_i)² σ_i². • Examples (all uncorrelated): for z = a + b, σ_z² = σ_a² + σ_b²; for z = ab, (σ_z/z)² = (σ_a/a)² + (σ_b/b)². A Monte Carlo check of the product case is sketched below.
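A sketch of that Monte Carlo check, assuming numpy; the input values and uncertainties are invented for illustration:

import numpy as np

a, sigma_a = 10.0, 0.3
b, sigma_b = 4.0, 0.2

# Analytic propagation for z = a*b: sigma_z^2 = b^2 sigma_a^2 + a^2 sigma_b^2
sigma_z_analytic = np.sqrt(b**2 * sigma_a**2 + a**2 * sigma_b**2)

# Monte Carlo check: simulate the inputs and look at the scatter of z
a_mc = np.random.normal(a, sigma_a, size=100000)
b_mc = np.random.normal(b, sigma_b, size=100000)
sigma_z_mc = np.std(a_mc * b_mc, ddof=1)

print(sigma_z_analytic, sigma_z_mc)   # the two should agree closely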
The Monte Carlo • It is often useful to be able to construct simulated data. • Perhaps to test some code designed to process real data; • Or to estimate a PDF for which you don’t have a formula. • But of course, realistic data must contain random noise. • A procedure which constructs a set of N random variables is called a Monte Carlo (after the famous casino).
Simulating random variables: • There are routines in numpy and scipy to give you random numbers from a large library of different PDFs. • Bear in mind that these modules have essentially been written by amateurs – so be a bit wary – check them where practical! • There are simple algorithms for simulating Gaussian and Poisson random numbers. • Joint randoms are a bit trickier: • The ‘rejection method’ • Markov-chain Monte Carlo • (A sketch of the first two points follows.)
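A minimal sketch, assuming numpy. The numpy calls are standard; the rejection_sample helper and its arguments are hypothetical names introduced here for illustration:

import numpy as np

gaussians = np.random.normal(loc=0.0, scale=1.0, size=1000)   # Gaussian random numbers
poissons = np.random.poisson(lam=3.5, size=1000)              # Poisson random numbers

# Rejection method for an arbitrary 1-D PDF p(x) on [x_lo, x_hi], bounded above by p_max:
def rejection_sample(p, x_lo, x_hi, p_max, n):
    out = []
    while len(out) < n:
        x = np.random.uniform(x_lo, x_hi)
        if np.random.uniform(0.0, p_max) < p(x):   # accept x with probability p(x)/p_max
            out.append(x)
    return np.array(out)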
Making a frequency histogram • The input information is a list of MC-generated samples from a PDF. • Start by making a normal histogram of these samples: ie, set up bin boundaries, then count how many samples fall within each bin. • Calculate an uncertainty for each bin value from the square root of the counts in the bin. • Because you want to compare to a PDF, the ‘integral’ of your FH must equal 1. To get this, divide each bin value by (i) the bin width and (ii) the total number of samples (summed over all bins). • NOTE! Everything you do to the bin value, also do to the uncertainty. • Histogram values are integers, but FH values become reals (floating-point numbers). A sketch follows.
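A minimal sketch of this normalization, assuming numpy and equal-width bins (the sample PDF and bin count are illustrative):

import numpy as np

samples = np.random.normal(size=10000)
counts, edges = np.histogram(samples, bins=50)         # raw histogram: integer counts per bin

bin_width = edges[1] - edges[0]
fh = counts / (bin_width * samples.size)               # normalize so the 'integral' equals 1
fh_err = np.sqrt(counts) / (bin_width * samples.size)  # Poisson errors, scaled the same way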
Graphs – correct presentation of data. • Distinguish between data and interpretation. • Any kind of fitted model counts as interpretation. • Usually draw data as points (or crosses, circles, etc – ie, as a discrete symbol). Joining the points up is EVIL!! • …unless this is the best way to clarify a crowded plot. • If you do join symbols with a line, make sure the symbols are not obscured by the line. • Include error bars where appropriate. • Most often on just the Y axis but occasionally also on X. • Sometimes you won’t know the errors, • or the scatter in the points will indicate it anyway. • Interpretation = theory can be drawn with curves.
Error bars • [The same plot with a spike, shown three times with different error bars.] Is this spike significant? With one set of error bars: probably. With the other: probably not.
Correct presentation of data continued. • Try to find some way to transform the axes such that the points are expected to fall on a line – sometimes one with zero slope. • Why? Because there are uncounted ways not to be linear but only 1 way to be linear. • You’ll also need to transform the error bars. • A log scale is sometimes useful too if the data has information over a wide range of scales.
Axes transforms - examples • For a ‘power law’, ie y(x) = Axα: take logs (to base 10) of both x and y axes. • Gives a straight line of slope α and intercept log(A). • For an exponential decay y(x) = Ae-kx: take logs of the y axis only. • y(x) = Ax-2: plot y against 1/x2.
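A short sketch of the power-law case using pylab/matplotlib (the data here are simulated purely for illustration):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(1.0, 100.0, 50)
y = 3.0 * x**-1.5 * (1.0 + 0.05 * np.random.normal(size=x.size))   # noisy power law

plt.loglog(x, y, 'o')    # on log-log axes a power law becomes a straight line of slope alpha
plt.xlabel('x')
plt.ylabel('y')
plt.show()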
Example of MC, error propagation, correct presentation: 1000 samples
With more samples… 10^5 samples
An example of when error bars are superfluous (also an example of axes transformation)
Another… this is a survival function histogram. Errors are not independent! From M Tajer et al (2004).
The astronomer has: • Lots of (noisy) data points y = [y1, y2, .. yn]; • A model with a relatively small number of adjustable parameters Θ = [θ1, θ2, .. θm].
What do we want from this? • We want to find a model which is in some sense the ‘best fit’ to the data. This means obtaining: • The best fit values of the parameters Θ; • Uncertainties for these. • We may also want to compare competing models. • A very common example: comparing a model without any signal to a model (or rather, the whole class of models) with some signal. • The model without is known as the null hypothesis. Model-comparison will come in later lectures...
The ‘best fit’ model. • First construct your model. • Informed by the physics of the situation. • Two basic approaches: Bayesian and Frequentist. Will cover Bayesian later. Frequentist approach is to define some objective function U which, when minimized (or maximized), returns a ‘best fit’ model. • U must obviously be a function of both the data and the model parameters. • Examples: least-squares, likelihood, ‘entropy’.
A backward arrangement..? • I’m going to cover techniques of minimization first, then discuss ‘best-fit’ objective functions later. • A good reference for all kinds of algorithms: Press et al, “Numerical Recipes <for fortran, C, C++ etc>”. • The code they provide is not always very good, but the explanations and recipes are excellent. • Older versions (still excellent) are available on the web - eg: http://www.nrbook.com/a/bookfpdf.php
Minimization • Nearly always in model-fitting we are trying to find the minimum in an analytic function – which is math-speak for a function which can be expanded in a Taylor series about the point of interest: U(Θ) ≈ U(Θ₀) + ∇Uᵀ ΔΘ + ½ ΔΘᵀ H ΔΘ + …, where ΔΘ = Θ − Θ₀. • ∇U, the gradient, is a vector of 1st derivatives of U w.r.t. each parameter. • H is called the Hessian and is a matrix of 2nd derivatives of U.
Minimization • The definition of a minimum in U is a place where the gradient equals 0 – ie where the partial derivatives ∂U/∂θ_i = 0 for all θ_i. • IF the function were a perfect paraboloid*, ie if there were no terms in the Taylor series of order > 2, then no matter where we are in the Θ space, we could go to the minimum in one mighty jump, because the step ΔΘ = −H⁻¹ ∇U lands exactly on it. • But because this is nearly always NOT true, in practice, minimization is an affair of many steps. • It’s of course desirable to keep the number of steps as small as possible. • *Or if we can directly invert the equations ∂U/∂θ_i = 0. • (A sketch of this single jump is given below.)
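A minimal sketch of that single Newton jump, assuming numpy; the quadratic objective U(θ) = ½ θᵀHθ − gᵀθ and its numbers are invented for illustration:

import numpy as np

H = np.array([[4.0, 1.0], [1.0, 3.0]])   # Hessian: matrix of 2nd derivatives of U
g = np.array([1.0, 2.0])

theta0 = np.array([5.0, -3.0])            # arbitrary starting point in parameter space
grad = H @ theta0 - g                     # gradient of U at theta0
step = -np.linalg.solve(H, grad)          # the jump: delta_theta = -H^{-1} grad U
theta_min = theta0 + step                 # for a true paraboloid this lands on the minimum

For real, non-quadratic objective functions, general-purpose routines (eg scipy.optimize.minimize) repeat refinements of roughly this kind over many steps.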