MChem Computing and Chemistry [B14SC3]. “Data Analysis for Beginners…” or How to avoid disasters when writing up your research work… or On the Meaning of Life, the Universe and Everything! Lecture #42. Dr Roderick Ferguson – Summer 2008.
The “data analysis” assignment… This is in 3 parts. Some useful information, and a copy of the MS Word document, can be found at: http://www.eps.hw.ac.uk/~cherrf/B14SC3 (this link is also available from my chemistry staff home page)
The “data analysis” assignment… Now, if you are really stuck and can’t see how to get started with the 2nd part of this, then I will be available over the next 2 days, i.e. Wed 4th and Thurs 5th June, to offer some help. My office is now DB 2.49. But you should have enough knowledge to be able to attempt this yourselves!
Data Analysis for Beginners… • Introduction to Data Analysis. • Linear Least Squares. • Nonlinear Least Squares. • Theoretical Models (and Maths). • Errors (and what to do with them!).
Introduction to Data Analysis Why do you need to do it? • Data Analysis is an essential skill for a professional scientist today. • Many modern instruments can generate large quantities of numerical data which require some sort of analysis and/or theoretical interpretation.
Introduction to Data Analysis How can you do it? • Most people now have access to very powerful Desktop Computers… • There are many software tools that can be used to analyse numerical data… • Today we will focus on what can be done with something that is readily available – ie Microsoft Excel… !
Introduction to Data Analysis An historical aside • This was not always true… • Modern research workers have no idea of what performing data analysis was like before the (micro) computer revolution that started in the 1980s…
Introduction to Data Analysis An historical aside (cont) • To do data analysis, you had to have access to a large “mainframe” computer… • You also had to learn at least one computer programming language… • And you also had to type in both your numerical data and analysis program onto punched paper cards!
Introduction to Data Analysis • There are also many pitfalls and traps that the new research worker can very easily fall into! • Thus some background knowledge on both the how and the why aspects is required… • Also, it is never a good idea to use something without first understanding how it works!
Introduction to Data Analysis At first, your reactions will probably be fear and confusion when you try to do Data Analysis…
Introduction to Data Analysis • However, Don’t Panic! • because, it’s easier than you think…
Data Analysis for Beginners… • Introduction to Data Analysis. • Linear Least Squares. • Nonlinear Least Squares. • Theoretical Models (and Maths). • Errors (and what to do with them!).
Linear Least Squares Whilst performing data analysis, you will encounter the following terms… • “Best Fit” • “Goodness of Fit” • “Residuals” • “Sum of Squares” What do they mean?
Linear Least Squares – an example We have two columns of data and want to see if there is a LINEAR relationship between them… Step 0 – Draw a Graph!
point #   X     Y
1         1.1   1.4
2         2.0   1.8
3         2.9   2.2
4         4.2   2.8
5         5.0   2.9
Linear Least Squares – an example What would your idea of a good straight line fit to this data be?
Linear Least Squares – an example Can we give a more precise mathematical description for the idea of “Best Fit” ? Yes – we can! Need some definitions. We’ll look at Residuals and the Sum of Squares (SS) or more precisely, the “Sum of Squares of the Residuals”.
Linear Least Squares – an example Back to our graph again … … but now with the residuals added!
Linear Least Squares – theory (1) Consider the i’th data point with values (Xi,Yi) and suppose that the data can be described by the familiar straight line relationship F = m X + C where m is the slope and C is the intercept. • Now, for each experimental Yi we calculate a theoretical Yi (which we’ll call Fi) by using the above equation, ie Fi = m Xi + C, for every data point i.
Linear Least Squares – theory (2) • The difference between each calculated and experimental value of Y is called the Residual ie we have Ri = Yi – Fi for all i data points. • Note that sometimes Ri will be positive and also sometimes it will be negative… • How can we get an overall measure of how close the theoretical line is to our data? • Clearly, it must have something to do with ALL of the residuals …
Linear Least Squares – theory (3) We define the Sum of Squares of the Residuals (or just SS) as:
$SS = \sum_i R_i^2 = \sum_i (Y_i - F_i)^2$
• This gives us a single quantity that measures how good a fit the straight line is to the data. • Note also that SS will depend on both m and C
Linear Least Squares – theory (3) ie SS = SS(m,C), which means that SS is a function of two independent variables {m and C} so that:
$SS(m, C) = \sum_i (Y_i - m X_i - C)^2$
Our original problem has now become a new one: having created a Sum of Squares function, we need to find the values of m and C which minimise this function!
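To make the idea of SS(m,C) concrete, here is a minimal Python sketch that evaluates the residuals and the Sum of Squares for the five example points given earlier. The function names and the three trial lines are illustrative choices, not part of the lecture.

```python
# Example data points (X, Y) from the linear least squares example
X = [1.1, 2.0, 2.9, 4.2, 5.0]
Y = [1.4, 1.8, 2.2, 2.8, 2.9]


def residuals(m, c):
    """Residuals R_i = Y_i - F_i for the trial line F = m*X + c."""
    return [y - (m * x + c) for x, y in zip(X, Y)]


def sum_of_squares(m, c):
    """SS(m, C): the sum of the squared residuals for the trial line."""
    return sum(r * r for r in residuals(m, c))


# Three trial lines (arbitrary guesses) to show that SS changes with m and C
for m, c in [(0.3, 1.2), (0.4, 1.0), (0.5, 0.8)]:
    print(f"m = {m:.1f}, C = {c:.1f}, SS = {sum_of_squares(m, c):.4f}")
```

Of these three guesses, the middle line (m = 0.4, C = 1.0) gives the smallest SS, so it is the "best" of the three, although not necessarily the best possible line.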
Linear Least Squares – theory (4) In other words, • How do we find minimum values (or minima) of functions ? • We need the help of Calculus – ie the part of Maths that deals with the rate of change of a function … Best Fit => find Minimum of the SS function!
Linear Least Squares – Simple Calculus (1) Recall a function of one variable: [Figure: F(x) plotted against x, with tangent lines of negative slope, zero slope (=> a minimum!) and positive slope.]
Linear Least Squares – Simple Calculus (2) Function of one variable: • The slope of the tangent line is given by the rate of change of F with x, dF/dx or the derivative of F. • Furthermore, at a minimum (or maximum) value of the function F(x), the slope of the tangent line is zero ie dF/dx = 0 We can also define functions of more than one variable…
Linear Least Squares – Simple Calculus (3) Function of more than one variable: • If F = F(x,y,z), a function of three variables x, y and z – then we can define 3 partial derivatives namely, ∂F/ ∂x, ∂F/∂y and ∂F/∂z. You may be familiar with this notation from your Thermodynamics studies… Note that partial derivatives are very useful in ALL branches of the Physical Sciences!
Linear Least Squares – theory (5) Recall our original problem of finding the minimum of the function, SS(m,C): • we need to find the values of m and C that make the two partial derivatives of SS vanish • ie we need to solve the pair of equations:
$\partial SS / \partial m = 0$ and $\partial SS / \partial C = 0$
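Linear Least Squares – theory (6) • This is very easy to do for the straight line case • Our SS function, SS(m,C), is given by the following (after some expansion!), where N is the number of data points:
$SS(m, C) = \sum_i Y_i^2 - 2m \sum_i X_i Y_i - 2C \sum_i Y_i + m^2 \sum_i X_i^2 + 2mC \sum_i X_i + N C^2$
ie SS is of the general form:
$SS = \alpha m^2 + \beta m C + \gamma C^2 + \delta m + \epsilon C + \phi$
where the coefficients $\alpha$ to $\phi$ are sums fixed by the data.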
Linear Least Squares – theory (7) For the straight line, the Sum of Squares function is a quadratic surface in m and C, so a contour map of this surface will be a series of concentric ellipses (conic sections). [Figure: surface plot of SS against m and C.]
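Linear Least Squares – theory (8) When we perform the two differentiations, we get:
$\partial SS / \partial m = -2 \sum_i X_i (Y_i - m X_i - C) = 0$ and $\partial SS / \partial C = -2 \sum_i (Y_i - m X_i - C) = 0$
or, with N the number of data points,
$m \sum_i X_i^2 + C \sum_i X_i = \sum_i X_i Y_i$ and $m \sum_i X_i + N C = \sum_i Y_i$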
Linear Least Squares – theory (9) These two linear equations are very easy to solve for both m and C… (you can try doing this as an exercise… !) => any good pocket calculator can do linear least squares fits to data! Now, we’ll look at a few applications of Linear Least Squares theory…
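As a check on the exercise, the pair of linear equations m ΣXi² + C ΣXi = ΣXiYi and m ΣXi + N C = ΣYi (N being the number of points) can be solved in closed form. The sketch below does this for the five example points; the function name is an illustrative choice and the code is only one of several equivalent ways of writing the solution.

```python
# Example data points (X, Y) from the linear least squares example
X = [1.1, 2.0, 2.9, 4.2, 5.0]
Y = [1.4, 1.8, 2.2, 2.8, 2.9]


def linear_least_squares(xs, ys):
    """Return (m, C) minimising SS for the straight line F = m*X + C."""
    n = len(xs)
    sx = sum(xs)                                # sum of X_i
    sy = sum(ys)                                # sum of Y_i
    sxx = sum(x * x for x in xs)                # sum of X_i^2
    sxy = sum(x * y for x, y in zip(xs, ys))    # sum of X_i * Y_i
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    m = (n * sxy - sx * sy) / denom             # eliminate C between the two equations
    c = (sy - m * sx) / n                       # back-substitute into the second equation
    return m, c


m, c = linear_least_squares(X, Y)
print(f"m = {m:.4f}, C = {c:.4f}")
```

For this data set the best-fit slope comes out at roughly 0.40 and the intercept at roughly 1.00.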
Linear Least Squares – applications (1) Linear least squares analysis can be extended to help with other problems. Often, you will encounter polynomials, eg
$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \dots + a_n x^n$
and we can set up an SS function for this n’th degree polynomial, ie
$SS = \sum_i (y_i - f_i)^2 = SS(a_0, a_1, a_2, a_3, \dots, a_n)$, where $f_i = f(x_i)$
Linear Least Squares – applications (2) For a polynomial of degree ‘n’, we have to solve the following system of ‘n+1’ linear equations:
$\partial SS / \partial a_k = 0$, ie $\sum_{j=0}^{n} a_j \sum_i x_i^{j+k} = \sum_i x_i^k y_i$, for k = 0, 1, …, n
These equations can be solved by standard matrix methods using ‘Linear Algebra’. The Microsoft EXCEL spreadsheet computer program can do linear least squares fits of this more general kind via the LINEST function.
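To show what "standard matrix methods" can look like in practice, here is a small Python sketch that builds the n+1 equations obtained by setting each ∂SS/∂a_k to zero and solves them by Gaussian elimination with partial pivoting. The function name and the test data are hypothetical, and this is not a description of how LINEST works internally. For high polynomial degrees these equations become badly conditioned, so treat this as an illustration only.

```python
def polyfit_normal_equations(xs, ys, degree):
    """Least squares polynomial coefficients [a0, a1, ..., an]."""
    n = degree + 1
    # Row k of the system: sum_j a_j * sum_i x_i^(j+k) = sum_i x_i^k * y_i
    rows = []
    for k in range(n):
        row = [sum(x ** (j + k) for x in xs) for j in range(n)]
        rhs = sum((x ** k) * y for x, y in zip(xs, ys))
        rows.append(row + [rhs])            # augmented matrix row

    # Forward elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        if rows[col][col] == 0:
            raise ValueError("singular system: not enough distinct x values")
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]

    # Back substitution
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(rows[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (rows[r][n] - known) / rows[r][r]
    return coeffs


# Made-up test data lying exactly on y = 1 + 2x + 3x^2
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]
print(polyfit_normal_equations(xs, ys, 2))
```

With degree 1 the same function reduces to the straight line case, returning [C, m].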
Linear Least Squares – applications (3) Two uses of polynomials… 1) Calibration curves In Chemistry, polynomial functions are often used to construct calibration curves for some analytical technique such as Mass Spectrometry or Atomic Absorption. Here the instrument response is known to be describable by a polynomial function (usually a 3rd or 4th degree polynomial).
Linear Least Squares – applications (4) Two uses of polynomial functions… 2) Data Smoothing Another application of polynomials and linear least squares fitting is data smoothing and interpolation. Example – X Ray scattering data from amorphous polymer samples (Dr Arrighi). These experiments can generate HUGE data files!
Linear Least Squares – applications (5) The Intensity vs angle and temperature data are conveniently stored as 2 dimensional Excel spreadsheets. Also, the I(Q,T) vs Q plots for a fixed temperature, T, are often found to be very noisy. Quadratic polynomials can be used to ‘smooth’ the data so that important features stand out. How does this work?
Linear Least Squares – applications (6) First, here’s the original noisy data: [Figure: I(Q,T) plotted against Q.]
Linear Least Squares – applications (7) And now, here’s the smoothed data: [Figure: I(Q,T) plotted against Q.]
Linear Least Squares – applications (8) And here’s how smoothing works… [Figure labels: original data point, smoothing polynomial, interpolated data point.]
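One common way to carry out this kind of smoothing is to slide a small window along the data, fit a quadratic to the points in the window by linear least squares, and replace the central point by the value of the fitted polynomial. The sketch below assumes evenly spaced Q values and a 5-point window, for which the least squares quadratic at the centre reduces to the fixed weights (-3, 12, 17, 12, -3)/35 (the Savitzky-Golay weights). The window width and the function name are assumptions; the lecture does not say which were used.

```python
# Weights for a 5-point least squares quadratic evaluated at the window centre
# (valid only for evenly spaced data); the weights sum to 35.
WEIGHTS_5PT = (-3.0, 12.0, 17.0, 12.0, -3.0)
NORMALISATION = 35.0


def smooth_quadratic_5pt(values):
    """Return a smoothed copy of `values`; the two points at each end are left unchanged."""
    smoothed = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        smoothed[i] = sum(w * v for w, v in zip(WEIGHTS_5PT, window)) / NORMALISATION
    return smoothed


# Made-up intensities: a flat baseline with a single noisy spike
noisy = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(smooth_quadratic_5pt(noisy))
```

Data that already lie on a quadratic pass through unchanged, while an isolated spike is reduced in height and spread over its neighbours. A wider window smooths more strongly, which is where the risk of discarding real features comes from.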
Linear Least Squares – applications (9) Data Smoothing is a potentially risky operation… You must be very careful when you do this… Why? Because you could be throwing away some vital information – especially if you use too much data smoothing!
Data Analysis for Beginners… • Introduction to Data Analysis. • Linear Least Squares. • Nonlinear Least Squares. • Theoretical Models (and Maths). • Errors (and what to do with them!).
Nonlinear Least Squares • Often we need to fit data to a more complicated nonlinear function… • The sum of squares equations are then also nonlinear… • and so we have to use other methods to solve the minimisation problem…
Nonlinear Least Squares • Another feature of this type of problem is that you have to supply a reasonable starting guess for the parameters used in your theoretical model… • You must have a feeling for the behaviour of your model function… • Good idea to plot out both your data and model function on the SAME graph!
Nonlinear Least Squares • By trying out several different sets of parameter values, you can get a rough idea of where a good starting guess is. • Once a suitable starting set of parameters has been found, then there are several methods (algorithms) that can be used to locate the minimum in the SS function.
Nonlinear Least Squares Example • For a good example of a Chemistry based nonlinear curve fitting exercise, one can look at First Order Chemical Kinetics. • The concentration of a new molecule that is being produced in a chemical reaction that follows First Order Kinetics can be described as:
$c(t) = c_\infty \left(1 - e^{-kt}\right)$
Nonlinear Least Squares Example Here c(t) is the concentration at any time t, and c∞ is the final steady state concentration, ie c∞ = c(∞). To see where this comes from, let’s look at the rate of change of c with time, ie dc/dt
Nonlinear Least Squares Example
$dc/dt = k\,(c_\infty - c)$
or equivalently
$dc/dt + k\,c = k\,c_\infty$
This is an example of a first order linear differential equation.
Nonlinear Least Squares Example Here is a plot of the c(t) function:-
Nonlinear Least Squares Example • For some models, one can transform a nonlinear function into a linear function by using some maths… • eg if the model was c(t) = c0 exp(-kt) then taking logs would give a linear equation • However, this trick does not work for our particular first order kinetics problem!
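The two cases can be compared in a short worked line of algebra. The decay model is the one quoted in the slide; the growth form c(t) = c∞(1 − exp(−kt)) is assumed here for the product concentration, consistent with the definitions of c∞ and k used in this example.

```latex
% Decay model: taking logs gives a straight line in t (slope -k, intercept ln c_0)
c(t) = c_0\, e^{-kt}
\;\Longrightarrow\;
\ln c(t) = \ln c_0 - k\,t

% Growth model (assumed form): the logarithm must be taken of c_\infty - c(t)
c(t) = c_\infty \left(1 - e^{-kt}\right)
\;\Longrightarrow\;
\ln\!\left[c_\infty - c(t)\right] = \ln c_\infty - k\,t
```

In the second case the quantity to be plotted, ln[c∞ − c(t)], already contains the unknown parameter c∞, so the data cannot be turned into a straight line plot unless c∞ is known in advance.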
Nonlinear Least Squares Example • We need to set this up as a nonlinear least squares curve fitting problem. • Get a rough idea of possible starting values for the parameters from the c(t) graph. • Rate Constant, k, obtained from the initial slope at t = 0. • Steady State Concentration, c∞, from long time data.
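The recipe above can be sketched in Python as follows. This assumes the model c(t) = c∞(1 − exp(−kt)), for which the initial slope is k·c∞, and it uses Gauss-Newton iteration with step halving as one example of the several minimisation algorithms available; the lecture does not say which algorithm it has in mind. The data are made up and noise free, and all function names are illustrative.

```python
import math


def model(t, c_inf, k):
    """Assumed first order model: c(t) = c_inf * (1 - exp(-k t))."""
    return c_inf * (1.0 - math.exp(-k * t))


def sum_of_squares(ts, cs, c_inf, k):
    return sum((c - model(t, c_inf, k)) ** 2 for t, c in zip(ts, cs))


def starting_guess(ts, cs):
    """c_inf from the last (long time) point, k from initial slope / c_inf."""
    c_inf0 = cs[-1]
    slope0 = (cs[1] - cs[0]) / (ts[1] - ts[0])
    return c_inf0, slope0 / c_inf0


def fit_first_order(ts, cs, c_inf, k, max_iter=50, tol=1e-12):
    """Gauss-Newton minimisation of SS(c_inf, k) with simple step halving."""
    ss = sum_of_squares(ts, cs, c_inf, k)
    for _ in range(max_iter):
        # Accumulate the 2x2 system (J^T J) delta = J^T r
        a11 = a12 = a22 = b1 = b2 = 0.0
        for t, c in zip(ts, cs):
            e = math.exp(-k * t)
            j1 = 1.0 - e              # d(model)/d(c_inf)
            j2 = c_inf * t * e        # d(model)/d(k)
            r = c - model(t, c_inf, k)
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            b1 += j1 * r
            b2 += j2 * r
        det = a11 * a22 - a12 * a12
        if det == 0.0:
            break
        d_c = (b1 * a22 - a12 * b2) / det
        d_k = (a11 * b2 - a12 * b1) / det
        # Halve the step until SS no longer increases
        step = 1.0
        while step > 1e-6:
            trial_c, trial_k = c_inf + step * d_c, k + step * d_k
            trial_ss = sum_of_squares(ts, cs, trial_c, trial_k)
            if trial_ss <= ss:
                break
            step *= 0.5
        else:
            break                     # no improving step found
        c_inf, k, ss = trial_c, trial_k, trial_ss
        if abs(step * d_c) + abs(step * d_k) < tol:
            break
    return c_inf, k, ss


# Made-up, noise free data generated with c_inf = 2.0 and k = 0.5
ts = [float(i) for i in range(11)]
cs = [model(t, 2.0, 0.5) for t in ts]
c0, k0 = starting_guess(ts, cs)
c_fit, k_fit, ss_fit = fit_first_order(ts, cs, c0, k0)
print(f"start: c_inf = {c0:.3f}, k = {k0:.3f}")
print(f"fit:   c_inf = {c_fit:.3f}, k = {k_fit:.3f}")
```

With real, noisy data the starting guess from the first two points can be poor, so it is worth plotting the model with the starting parameters on the same graph as the data before running the fit.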