
EART20170 Computing, Data Analysis & Communication skills


Presentation Transcript


  1. EART20170 Computing, Data Analysis & Communication skills • Lecturer: Dr Paul Connolly (F18 – Sackville Building), p.connolly@manchester.ac.uk • 1. Data analysis (statistics): 3 lectures & practicals; statistics open-book test (2 hours) • 2. Computing (Excel statistics/modelling): 2 lectures; assessed practical work • Course notes etc.: http://cloudbase.phy.umist.ac.uk/people/connolly • Recommended reading: Cheeney (1983), Statistical Methods in Geology. George Allen & Unwin.

  2. Recap – last lecture • The four measurement scales: nominal, ordinal, interval and ratio. • There are two types of errors: random errors (precision) and systematic errors (accuracy). • Basic graphs: histograms, frequency polygons, bar charts, pie charts. • Gaussian statistics describe random errors. • The central limit theorem • Central values, dispersion, symmetry • Weighted mean.

  3. Some common problems

  4. Use tables

  5. Lecture 2 • Correlation between two variables • Classical linear regression • Reduced major axis regression • Propagation of errors in compound quantities.

  6. Correlation • Many real-life quantities depend on something else, e.g. the dependence of rock permeability on porosity. • How can we quantify the strength and direction of a linear relationship between X and Y variables?

  7. Correlation • Σy = sum of all y-values • Σx = sum of all x-values • Σx² = sum of all x² values • Σy² = sum of all y² values • Σxy = sum of the x times y values • Like other numerical measures, the population correlation coefficient is denoted by ρ (the Greek letter "rho") and the sample correlation coefficient is denoted by r. • Linear correlation (Pearson's coefficient)
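The formula on this slide is not preserved in the transcript. As a minimal sketch of how the sums listed above combine, the Python function below uses the standard computational form of Pearson's coefficient, r = (nΣxy − ΣxΣy) / √([nΣx² − (Σx)²][nΣy² − (Σy)²]). That the slide used exactly this form is an assumption, and the function name and the porosity/permeability numbers are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient r computed from the five sums."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when x or y is constant")
    return numerator / denominator


# Hypothetical data: porosity (%) against log10 permeability.
porosity = [5.0, 10.0, 15.0, 20.0, 25.0]
log_permeability = [0.1, 0.9, 2.1, 2.9, 4.2]

r = pearson_r(porosity, log_permeability)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

Squaring the returned value gives the r² "goodness of fit" discussed on slide 9.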

  8. Correlation • Values of r: r = +1 (perfect positive correlation), r = −1 (perfect negative correlation), r = 0 (no correlation) • [Figure: three scatter plots of y against x, one for each case]

  9. Correlation • [Figure: r², the fraction of explained variation (0.0 to 1.0), plotted against the correlation coefficient r (+1.0 to −1.0)] • r² is the fraction of the variation in x and y that is explained by the linear relationship. It is often called the "goodness of fit". • E.g. if r = 0.97 is obtained then r² ≈ 0.94, so 100 × 0.94 = 94% of the total variation in x and y is explained by the linear relationship, and the remaining 6% of the variation is due to "other" causes.

  10. Regression analysis • How can we fit an equation to a set of numerical data x, y such that it yields the best fit for all the data?

  11. Classical linear regression • An approximate fit yields a straight line that passes through the set of points in the best possible manner without being required to pass exactly through any of the points.

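  12. Classical linear regression • [Figure: "Linear Regression", data points plotted as y against x about the fitted line, marking the gradient m, the intercept c and the deviation ei] • Y = mx + c • Where ei is the deviation of the data point from the fit line, c is the intercept and m is the gradient. • Assumes that the error is present only in y.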

  13. How do we define a good fit? • If the sum of all deviations is a minimum? Σei • If the sum of all the absolute deviations is a minimum? Σ|ei| • If the maximum deviation is a minimum? emax • If the sum of all the squares of the deviations is a minimum? Σei²
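A small sketch of why the first criterion is a poor choice: signed deviations can cancel, so a badly fitting line can score as well as a perfect one. The three data points and the two candidate lines are invented for illustration.

```python
def deviations(x, y, m, c):
    """Vertical deviations e_i = y_i - (m * x_i + c) from a candidate line."""
    return [yi - (m * xi + c) for xi, yi in zip(x, y)]


# Hypothetical data lying exactly on y = x.
x = [0.0, 1.0, 2.0]
y = [0.0, 1.0, 2.0]

good = deviations(x, y, m=1.0, c=0.0)  # the true line, y = x
poor = deviations(x, y, m=0.0, c=1.0)  # a flat line, y = 1

for label, e in (("y = x", good), ("y = 1", poor)):
    print(
        label,
        "| sum e:", sum(e),
        "| sum |e|:", sum(abs(v) for v in e),
        "| max |e|:", max(abs(v) for v in e),
        "| sum e^2:", sum(v * v for v in e),
    )
```

Both lines give a zero sum of deviations, but only the true line gives a zero sum of squares.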

  14. Classical linear regression • The best way is to minimise the sum of the squares of the deviations. Formally this involves some mathematics: • At each value of xi: • Therefore the deviations from the curve are: • The sum of the squares:

  15. Classical linear regression • How do you find the minimum of a function? • Use calculus • Differentiate and set to zero • Two simultaneous equations
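The equations on slides 14 and 15 are not preserved in this transcript. The following is a sketch of the standard least-squares derivation, assuming the notation of slide 12 (Y = mx + c, with e_i the vertical deviation of point i) and n data points; the symbol S for the sum of squares is introduced here for convenience.

```latex
\begin{align}
  Y_i &= m x_i + c \\
  e_i &= y_i - Y_i = y_i - m x_i - c \\
  S   &= \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - m x_i - c \right)^2 \\
  \frac{\partial S}{\partial m} &= -2 \sum_{i=1}^{n} x_i \left( y_i - m x_i - c \right) = 0 \\
  \frac{\partial S}{\partial c} &= -2 \sum_{i=1}^{n} \left( y_i - m x_i - c \right) = 0
\end{align}
% Rearranged, the last two lines are the two simultaneous equations:
\begin{align}
  m \sum x_i^2 + c \sum x_i &= \sum x_i y_i \\
  m \sum x_i + n c &= \sum y_i
\end{align}
```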

  16. Classical linear regression • Solving the two equations yields:
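The solution shown on this slide is missing from the transcript. Solving the two simultaneous least-squares equations gives the standard result m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and c = (Σy − mΣx) / n, which the sketch below implements. The function name and the data are illustrative assumptions, not taken from the lecture.

```python
def linear_regression(x, y):
    """Least-squares gradient m and intercept c for y = m*x + c.

    Assumes, as on slide 12, that the error is present only in y.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_xy = sum(a * b for a, b in zip(x, y))

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; gradient is undefined")

    m = (n * sum_xy - sum_x * sum_y) / denominator
    c = (sum_y - m * sum_x) / n
    return m, c


# Hypothetical noisy measurements scattered about a straight line.
x_data = [1.0, 2.0, 3.0, 4.0, 5.0]
y_data = [2.9, 5.1, 7.0, 9.2, 10.8]

m, c = linear_regression(x_data, y_data)
print(f"gradient m = {m:.3f}, intercept c = {c:.3f}")
```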

  17. Classical linear regression

  18. Classical linear regression • Classical linear regression only considered errors in the Y values of the data. • How can we consider errors in both x and y values? • Use Reduced major axis regression

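  19. Reduced major axis regression • [Figure: fitted line plotted as y against x with intercept c, marking the offsets dx and dy of a data point from the line] • Method to quantify a linear relationship where both variables are dependent and have errors. • Instead of minimising e² = (Y − y)² we minimise e² = dy² + dx².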
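The reduced major axis formulae on slides 20 and 21 are not preserved in this transcript. The sketch below uses the commonly quoted RMA result, in which the gradient is the ratio of the sample standard deviations, m = ±s_y/s_x, taking the sign of the correlation, and the line passes through the point of means. Whether this matches the lecture's own expressions is an assumption, and the function name is hypothetical.

```python
import statistics


def rma_regression(x, y):
    """Reduced major axis gradient m and intercept c (commonly quoted form)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = statistics.mean(x)
    mean_y = statistics.mean(y)
    s_x = statistics.stdev(x)
    s_y = statistics.stdev(y)
    if s_x == 0 or s_y == 0:
        raise ValueError("x or y is constant; gradient is undefined")

    # The sum of cross-products has the same sign as the correlation r.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if cross == 0:
        raise ValueError("r = 0, so the sign of the gradient is undefined")

    m = (s_y / s_x) if cross > 0 else -(s_y / s_x)
    c = mean_y - m * mean_x  # the line passes through (mean_x, mean_y)
    return m, c


# Hypothetical data with scatter in both variables.
x_data = [1.1, 1.9, 3.2, 3.9, 5.1]
y_data = [3.0, 4.8, 7.3, 8.9, 11.2]

m, c = rma_regression(x_data, y_data)
print(f"RMA gradient m = {m:.3f}, intercept c = {c:.3f}")
```

Unlike the classical fit, this gradient does not change its meaning when x and y are swapped: fitting x on y gives the reciprocal gradient.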

  20. Reduced major axis regression

  21. Reduced major axis regression

  22. Error propagation • Every measurement of a variable has an error. • Often the error quoted is one standard deviation about the mean (mean ± standard deviation). • The sample standard deviation is usually our best estimate of the population standard deviation.

  23. Error propagation • Error propagation is a way of combining two or more random errors together to get a third. The equations assume that the errors are Gaussian in nature. • It can be used when you need to measure more than one quantity to get at your final result, for example if you wanted to predict permeability from a measured porosity and grain size. The equations introduced here let you propagate the uncertainties on your data through the calculation and come up with an uncertainty on your results. • How then do we combine variables which have errors?

  24. Error propagation - quoted • [Table: columns "Relationship" and "Error propagation"; k = constant]
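The table on this slide is not preserved in the transcript. For reference, these are the standard first-order results for independent Gaussian errors; they may differ in layout or coverage from the lecture's own table. Here σ denotes one standard deviation and k is an exact constant.

```latex
\begin{align}
  z = x + y \ \text{or}\ z = x - y &: \quad
    \sigma_z = \sqrt{\sigma_x^2 + \sigma_y^2} \\
  z = k x &: \quad
    \sigma_z = |k| \, \sigma_x \\
  z = x y \ \text{or}\ z = x / y &: \quad
    \frac{\sigma_z}{|z|} = \sqrt{\left(\frac{\sigma_x}{x}\right)^2
                               + \left(\frac{\sigma_y}{y}\right)^2} \\
  z = x^k &: \quad
    \frac{\sigma_z}{|z|} = |k| \, \frac{\sigma_x}{|x|}
\end{align}
```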

  25. Example of propagation of error • Suppose we measure the thickness of a rock bed using a tape measure. • The tape measure is shorter than the bed thickness, so we have to do it in two steps, x and y. • We repeat the measurements 100 times and obtain the following mean and standard deviation values for x and y: x = 12.1 ± 0.3 cm, y = 4.2 ± 0.2 cm • The thickness of the bed should be simply: x + y = 16.3 cm • But what about the error on the total thickness?

  26. Example of propagation of error • It is given by propagating the individual errors as follows: √(0.3² + 0.2²) ≈ 0.4 cm • So the final answer for the total thickness of the bed is: 16.3 ± 0.4 cm • Error propagation formulae are non-intuitive and understanding how they are derived requires some mathematical knowledge.
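The bed-thickness example can be checked with a few lines of Python. This sketch assumes the two measurements are independent, so their standard deviations add in quadrature; the helper name is hypothetical.

```python
import math


def add_in_quadrature(*sigmas):
    """Combine independent standard deviations: sqrt(s1^2 + s2^2 + ...)."""
    return math.sqrt(sum(s * s for s in sigmas))


# Values from the slide, in centimetres.
x, sigma_x = 12.1, 0.3
y, sigma_y = 4.2, 0.2

total = x + y
sigma_total = add_in_quadrature(sigma_x, sigma_y)

print(f"{total:.1f} ± {sigma_total:.1f} cm")  # prints: 16.3 ± 0.4 cm
```

The unrounded error is about 0.36 cm, which is smaller than the 0.5 cm obtained by simply adding the two errors.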

  27. More complex examples • What if we have several functions of several variables? • E.g. calculating density using Archimedes' principle: • This equation contains two functions and two variables. • Error propagation is best done in parts, so first work out the value and error in the denominator: • Then the value and error of: • In a few weeks we will use a Monte Carlo method for solving more complex functions.
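The equations on this slide are missing from the transcript. The sketch below assumes the usual Archimedes form, relative density = W_air / (W_air − W_water), where W_air and W_water are the sample's weights in air and in water, and propagates the errors in two parts as the slide suggests. The formula, the helper names and the weighings are all assumptions made for illustration.

```python
import math


def subtract(a, sigma_a, b, sigma_b):
    """a - b, with absolute errors added in quadrature."""
    return a - b, math.sqrt(sigma_a ** 2 + sigma_b ** 2)


def divide(a, sigma_a, b, sigma_b):
    """a / b, with relative errors added in quadrature."""
    value = a / b
    relative = math.sqrt((sigma_a / a) ** 2 + (sigma_b / b) ** 2)
    return value, abs(value) * relative


# Hypothetical weighings in grams (not from the lecture).
w_air, sigma_air = 25.0, 0.1
w_water, sigma_water = 15.0, 0.1

# Part 1: value and error of the denominator, W_air - W_water.
denominator, sigma_denominator = subtract(w_air, sigma_air, w_water, sigma_water)

# Part 2: value and error of the ratio W_air / denominator.
density, sigma_density = divide(w_air, sigma_air, denominator, sigma_denominator)

print(f"relative density = {density:.2f} ± {sigma_density:.2f}")
```

This staged calculation treats the numerator and denominator as independent even though both contain the weight in air, so it is only an approximation; a Monte Carlo approach does not need that simplification.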

  28. Reminder Statistics practical #2 • Those not taking BIOL20451: Roscoe 3.5 1100 – 1300 Tuesday • Those taking BIOL20451: Williamson 1.12 1400 – 1600 Tuesday

  29. Some common problems • Weighted mean f x

  30. What does adding two variables really mean?
