Regression: Mathematical method for determining the best equation that reproduces a data set.
Linear Regression: Regression method applied with a linear model (straight line).
Uses:
• Prediction of new X,Y values
• Understanding data behavior
• Verification of hypotheses/physical laws
Regression
• The Linear Model: Y = mX + b
Y = dependent variable
X = independent variable
m = slope = ΔY/ΔX
b = y-intercept (the point where the line crosses the y-axis, at X = 0)
Points marked on the figure: (X1, Y1) = (1, 2.4) and (X2, Y2) = (20, 10)
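As a quick check of the linear model, the slope and intercept can be worked out from the two points listed on the slide. This is a minimal sketch that assumes both points lie exactly on the line; the variable names are illustrative only.

```python
# Two points read from the slide's figure (assumed to lie on the line).
x1, y1 = 1.0, 2.4
x2, y2 = 20.0, 10.0

# Slope: change in Y divided by change in X.
m = (y2 - y1) / (x2 - x1)

# Intercept: solve Y = mX + b for b using one of the points.
b = y1 - m * x1

print(f"m = {m:.2f}, b = {b:.2f}")
```

With these two points the slope comes out to about 0.4 and the intercept to about 2.0, so the line is roughly Y = 0.4X + 2.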
Regression
• Fitting the data: finding the equation for the straight line that does the best job of reproducing the data.
Regression
• Residual: difference between the measured and calculated Y-values.
Regression Analysis
• Use the least squares method to “best fit” a straight line through the data points.
• A straight line is described by its slope and “y”-intercept in an x-y plot.
• We need to determine the numerical values of the slope and the “y”-intercept from the data.
• This is equivalent to adding a trendline to your scatter plot in Excel.
Regression Analysis
• The least squares method starts by defining a difference, called the residual, between a data point and the regression line at that point's measured “x” value.
• Then add up the squared residuals for all data points.
• Finally, adjust the slope and the “y”-intercept of the regression line so that the sum of squared residuals, called the regression error, has the smallest possible value.
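The regression error described above can be sketched in a few lines. The data values and the two candidate lines below are made up for illustration, and the function name is hypothetical.

```python
def sum_squared_residuals(xs, ys, m, b):
    """Regression error: sum of (measured y - calculated y)^2 over all points."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical measurements, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Compare two candidate lines; the smaller error is the better fit.
error_a = sum_squared_residuals(xs, ys, m=2.0, b=0.0)
error_b = sum_squared_residuals(xs, ys, m=1.5, b=1.0)
print(error_a, error_b)
```

Least squares amounts to searching for the (m, b) pair that makes this quantity as small as possible.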
Regression Analysis
• The covariance appears in the calculation of the correlation coefficient between the measurements of two variables.
• Let us denote the two variables as “x” and “y”.
• Their measurements are the “x” data set and the “y” data set.
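To make the link between covariance and the correlation coefficient concrete, here is a minimal sketch. It assumes the sample (n - 1) definition of covariance; the slide does not say which convention is intended, and the function names are illustrative.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample covariance of two equal-length data sets (divides by n - 1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Correlation coefficient: covariance scaled by both standard deviations."""
    # covariance(v, v) is the variance of v.
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data that lies exactly on a straight line.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(covariance(xs, ys), correlation(xs, ys))
```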
Regression Analysis
• The slope of the regression line is the ratio of the covariance between the “x” and “y” data sets to the variance of the “x” data set.
• You then use the equation of the line to determine the y-intercept. You MUST use the mean of x and the mean of y in this equation: the regression line always passes through the point (mean of x, mean of y), whereas your individual data points are likely not on the regression line.
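The two steps above (slope from covariance over variance, then intercept from the means) can be sketched as follows. The function name and the data are hypothetical, chosen only to illustrate the calculation.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Covariance and variance share the same 1/(n - 1) factor,
    # so it cancels in the ratio and is omitted here.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = s_xy / s_xx
    # The line passes through (x_bar, y_bar), which fixes the intercept.
    b = y_bar - m * x_bar
    return m, b


# Hypothetical measurements, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
m, b = fit_line(xs, ys)
print(f"y = {m:.2f}x + {b:.2f}")
```

For this made-up data set the result is approximately y = 1.94x + 0.15.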
Regression Analysis
• Once we have determined the slope and the “y”-intercept of the regression line, we have a mathematical relation that ties the “x” variable to the “y” variable.
• We can use this relation to predict the value of “y” for an “x” value that is not in the data set.
Regression Analysis
• Interpolation: the process by which we use the regression line to predict a value of the “y” variable for a value of the “x” variable that is not one of the data points but is within the range of the data set.
• The predicted (“x”, “y”) point lies on the regression line.
Regression Analysis
• Extrapolation: the process by which we use the regression line to predict a value of the “y” variable for a value of the “x” variable that is outside the range of the data set.
• The predicted (“x”, “y”) point also lies on the regression line, but outside the range of the data set.
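The difference between interpolation and extrapolation comes down to whether the new “x” falls inside the range of the measured “x” values. A minimal sketch, with a hypothetical function name and made-up line parameters:

```python
def predict(x, m, b, x_min, x_max):
    """Predict y on the regression line and label the kind of prediction."""
    y = m * x + b
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind


# Hypothetical fitted line y = 2x + 1 from data measured over 1 <= x <= 10.
print(predict(4.5, m=2.0, b=1.0, x_min=1.0, x_max=10.0))
print(predict(15.0, m=2.0, b=1.0, x_min=1.0, x_max=10.0))
```

Both predictions use the same equation; the label only records whether the data set supports the prediction directly.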
Tricks of the Trade
• A curve can be partitioned into sections, with a different “best fit” curve fitted to each section.
• Use scaling as a means to increase the accuracy of the “fitted” curve.
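The first trick can be sketched by fitting a separate straight line to each section of the data. The breakpoint at x = 0, the V-shaped data, and the helper name are all assumptions made for illustration; the slide does not say how sections should be chosen.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for y = m*x + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = s_xy / s_xx
    return m, y_bar - m * x_bar


# Hypothetical V-shaped data that a single straight line fits poorly.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [3.0, 2.0, 1.0, 0.0, 1.0, 2.0, 3.0]

breakpoint_x = 0.0  # assumed section boundary, shared by both sections
left = [(x, y) for x, y in zip(xs, ys) if x <= breakpoint_x]
right = [(x, y) for x, y in zip(xs, ys) if x >= breakpoint_x]

whole_fit = fit_line(xs, ys)
left_fit = fit_line([x for x, _ in left], [y for _, y in left])
right_fit = fit_line([x for x, _ in right], [y for _, y in right])
print(whole_fit, left_fit, right_fit)
```

Here the single line over the whole range is flat, while the two section fits recover the downward and upward branches separately.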
Prediction: Once the best-fit line has been determined, the equation can be used to predict new values of Y for any given X, and vice versa (interpolation/extrapolation).
y = 772.03x + 10810
If 20% of a state's population has a college degree, the expected average income level is y = 772.03(20) + 10810 ≈ $26,250.
If a state's average income level is $30,000, what % of its population has a college degree? x = (30,000 - 10,810)/772.03 ≈ 24.9%
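The two calculations in this example can be reproduced directly from the slide's equation. The function names are illustrative; only the slope 772.03 and intercept 10810 come from the slide.

```python
SLOPE = 772.03       # income change per percentage point, from the slide
INTERCEPT = 10810.0  # income at 0% college degrees, from the slide


def predict_income(degree_pct):
    """Forward use of the line: Y from X."""
    return SLOPE * degree_pct + INTERCEPT


def predict_degree_pct(income):
    """Inverse use of the line: solve y = m*x + b for x."""
    return (income - INTERCEPT) / SLOPE


print(f"${predict_income(20):,.0f}")
print(f"{predict_degree_pct(30_000):.1f}%")
```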
Excel Functions and Tools
SLOPE() - Returns the slope when passed X, Y data.
INTERCEPT() - Returns the intercept when passed X, Y data.
LINEST() - Returns the slope and intercept when passed X, Y data.
TREND() - Returns predicted values along a linear trend when passed X, Y data.
Trendline (from the Chart menu) - Displays the trendline, its equation, and the R-squared value (the square of the correlation coefficient) for a set of X, Y data.