1. Data fitting: correlation & line fitting
2. What are they? Correlation: tells how strongly two variables are related.
X and Y are measured independently.
Line fitting: derives a best-fitting model between two variables.
Least squares (linear regression - straight line)
Curved lines (polynomial or spline fit)
Typically used for a known X and a measured Y (e.g., Y as a function of time).
3. Correlation
4. Correlation coefficient
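The formula on this slide did not survive in the transcript. As a minimal sketch, assuming the slide refers to the standard Pearson product-moment coefficient, the calculation could look like this (the function name `pearson_r` is illustrative and not from the slides):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)
```

Values near +1 or -1 indicate a strong linear relationship; values near 0 indicate little or no linear relationship.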
5. Correlation
6. Confidence interval for correlation Possible to define a variable w
7. Use this mean and variance to define the normal distribution. Confidence intervals can now be checked.
It is often useful to check the confidence interval under the null hypothesis (rxy = 0).
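The definition of w is cut off in the transcript, so the following is only a sketch under an explicit assumption: that w is Fisher's z-transformation, w = atanh(r), which is approximately normal with variance 1/(n - 3). The function names and the default critical value 1.96 (for a 95% interval) are illustrative choices, not taken from the slides.

```python
import math

def correlation_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation coefficient.

    Assumption (not confirmed by the slides): w = atanh(r) is treated as
    normally distributed with variance 1 / (n - 3).
    """
    if n <= 3:
        raise ValueError("need more than 3 samples")
    w = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    # Transform the interval on w back to the correlation scale.
    return math.tanh(w - half_width), math.tanh(w + half_width)

def rejects_zero_correlation(r, n, z_crit=1.96):
    """True if r is inconsistent with the null hypothesis rxy = 0."""
    if n <= 3:
        raise ValueError("need more than 3 samples")
    # Under the null hypothesis, w has mean 0 and variance 1 / (n - 3).
    return abs(math.atanh(r)) > z_crit / math.sqrt(n - 3)
```

For example, `rejects_zero_correlation(0.5, 30)` asks whether a sample correlation of 0.5 from 30 points is distinguishable from no correlation at the 95% level.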
8. Least squares line fitting (linear regression) For perfect linear correlation, it is straightforward to define an equation so that
Need to determine the coefficient A and the constant B so that they define a straight line that fits the data as “well” as possible.
We are “estimating” the best values of A and B.
We are assuming that the “x” value is known exactly and that the y value is uncertain.
9. Least squares fit Common to use a least-squares fit.
The error between the best-fitting line and each data point is (y - y’), measured as a vertical distance, where y is the data value and y’ is the fitted value.
We seek to minimize the sum of the squared errors.
Why squared? Well, it has some nice properties.
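One of those nice properties is that the minimum has a closed-form solution. The slide with the derivation is not preserved in the transcript, so the sketch below uses the textbook result for a line y’ = Ax + B, with x treated as exact; the function name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Least-squares straight line y' = A*x + B.

    Minimizes the sum of squared vertical errors (y - y')**2.
    Returns the pair (A, B).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    a = sxy / sxx            # slope
    b = mean_y - a * mean_x  # intercept: the line passes through the means
    return a, b
```

Calling `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` recovers the line through those four collinear points.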
10. Some details
11. More details We can think of the best fit line as a sort of mean value.
The scatter is measured by the estimated standard error.
This is analogous to the standard deviation.
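The slide's formula is not in the transcript. A minimal sketch, assuming the usual definition in which the sum of squared residuals is divided by n - 2 (two degrees of freedom are used up by A and B), might be:

```python
import math

def standard_error(xs, ys, a, b):
    """Estimated standard error of the fit y' = a*x + b.

    Assumption: the usual n - 2 denominator, since two parameters
    (slope and intercept) were estimated from the data.
    """
    n = len(xs)
    if n != len(ys) or n <= 2:
        raise ValueError("need more than 2 paired data points")
    # Sum of squared vertical distances between data and fitted line.
    sse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))
```

Here `a` and `b` are the slope and intercept of a line that has already been fitted; a value of zero means every point lies exactly on the line.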
12. Confidence intervals The 95% confidence interval for y (i.e., we are 95% sure that y lies between the values a and b) is defined by:
(a,b) = (y’-k,y’+k) where k is
13. Some problems Outliers tend to skew the line away from other data.
Results in a poor fit.
Each point's influence on the line grows with the square of its vertical distance from the trend.
One large offset counts more than several small ones.
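The effect is easy to reproduce with made-up numbers. In this sketch the data are invented for illustration: five points on the line y = 2x + 1, and the same points with the last one replaced by an outlier.

```python
def fit_line(xs, ys):
    """Least-squares straight line y' = A*x + B; returns (A, B)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

# Hypothetical data: points on y = 2x + 1 ...
xs = [0, 1, 2, 3, 4]
ys_clean = [1, 3, 5, 7, 9]
# ... and the same data with the last point replaced by an outlier.
ys_outlier = [1, 3, 5, 7, 30]

slope_clean, intercept_clean = fit_line(xs, ys_clean)
slope_outlier, intercept_outlier = fit_line(xs, ys_outlier)

print("clean fit:   slope", slope_clean, "intercept", intercept_clean)
print("outlier fit: slope", slope_outlier, "intercept", intercept_outlier)
```

A single bad point roughly triples the slope here (from 2 to about 6.2), even though the other four points still lie exactly on the original line.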
14. Why square? Could use the 3rd power (of the absolute error)
Or just the absolute value
These also yield a straight line
But the mathematics is more complicated and less elegant.
May be useful for some data
Absolute value handles outliers better.
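To illustrate the last point, here is a small sketch of a straight-line fit that minimizes the sum of absolute errors instead of squared errors. It is not a method from the slides: it is a brute-force approach that relies on the known property that at least one optimal line of this kind passes through two of the data points, so it simply tries every pair. The function name `fit_line_abs` and the data are hypothetical.

```python
def fit_line_abs(xs, ys):
    """Straight line minimizing the sum of |y - y'| (brute force).

    Tries the line through every pair of points with distinct x and
    keeps the one with the smallest total absolute error.
    Cost grows as n**3, so this is only suitable for small data sets.
    """
    best = None
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            if xs[i] == xs[j]:
                continue  # vertical line, not representable as y = a*x + b
            a = (ys[j] - ys[i]) / (xs[j] - xs[i])
            b = ys[i] - a * xs[i]
            cost = sum(abs(y - (a * x + b)) for x, y in zip(xs, ys))
            if best is None or cost < best[0]:
                best = (cost, a, b)
    if best is None:
        raise ValueError("need at least two points with distinct x")
    return best[1], best[2]

# Hypothetical data: y = 2x + 1 with one outlier in the last point.
slope, intercept = fit_line_abs([0, 1, 2, 3, 4], [1, 3, 5, 7, 30])
print(slope, intercept)
```

On this data the absolute-value fit stays on the line defined by the four consistent points, whereas a least-squares fit would be pulled toward the outlier.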
15. Least-squares fit and Excel Three ways (at least) to make a least squares fit to data in Excel.
Use LINEST(y, x, b, stats) and then plot the result.
Allows calculation of statistics
Powerful but complicated.
Use Regression in the Analysis ToolPak add-in.
Make a data plot (without a line), then left-click on a data point and add a trend line. This is much easier, but it is not clear how Excel computes the fit.
16. Excel output for regression
17. Fitting a curved line Suppose the data are exponential or follow some other relationship you expect to be curved.
Use a polynomial fit: select the option under Add Trendline
Spline fit
Nonlinear least squares
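Outside Excel, a polynomial fit is still a least-squares problem, because the model is linear in its coefficients. The sketch below is one possible implementation, not necessarily what Excel does: it builds the normal equations and solves them by Gaussian elimination. The function name `polyfit` is illustrative, and the approach is only numerically reasonable for low degrees and modest x ranges.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit.

    Returns coefficients [c0, c1, ..., c_degree] (lowest power first)
    of y' = c0 + c1*x + ... + c_degree * x**degree.
    """
    m = degree + 1
    if len(xs) != len(ys) or len(xs) < m:
        raise ValueError("need at least degree + 1 paired data points")
    # Normal equations: sum_j (sum_k x_k**(i+j)) * c_j = sum_k y_k * x_k**i
    aug = [
        [float(sum(x ** (i + j) for x in xs)) for j in range(m)]
        + [float(sum(y * x ** i for x, y in zip(xs, ys)))]
        for i in range(m)
    ]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if aug[col][col] == 0.0:
            raise ValueError("singular system; too few distinct x values")
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * m
    for i in range(m - 1, -1, -1):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (aug[i][m] - tail) / aug[i][i]
    return coeffs
```

For instance, `polyfit([-2, -1, 0, 1, 2], [9, 2, 1, 6, 17], 2)` fits a quadratic to five points generated from 1 + 2x + 3x².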