
Introduction to Regression


Presentation Transcript


  1. Introduction to Regression

  2. Introduction to linear regression
     • Simple linear regression
     • Multiple regression
     • Finding relationships between dependent and independent variables
     • Introduction to SPSS

  3. Simple linear regression
     y = β0 + β1x + ε
     y = dependent variable
     x = independent variable
     β0 = y-intercept of the line (constant); cuts through the y-axis
     β1 = unknown parameter – slope of the line
     ε = random error component

  4. Terminology for simple regression

  5. Examples
     • With a sample from the population
     • (x1, y1), (x2, y2), …, (xn, yn) denote a random sample of size n from the population
     • File: dataspss-s4.1

  6. DETERMINING THE EQUATION OF THE REGRESSION LINE
     • Deterministic regression model – a mathematical model that produces an 'exact' output for a given input
     • Probabilistic regression model – a model that includes an error term, allowing various output values to occur for a given input value
     yi = β0 + β1xi + εi
     ŷi = predicted value of y
     xi = value of the independent variable for the ith observation
     yi = real (observed) value of the dependent variable for the ith observation
     β1 = population slope
     β0 = population intercept
     εi = error of prediction for the ith observation

  7. Simple Linear Regression Model
     [Diagram: for a given Xi, the observed value of Y differs from the predicted value on the line by the random error εi; the line has intercept β0 and slope β1.]

  8. Sample Regression Function (SRF)
     ŷi = b0 + b1xi
     ŷi = estimated value of Y for observation i
     xi = value of X for observation i
     b0 = Y-intercept; the value of Y when X is zero
     b1 = slope of the regression line; the change in Y for a one-unit change in X
     b1 > 0: the line goes up; positive relationship between X and Y
     b1 < 0: the line goes down; negative relationship between X and Y

  9. SIMPLE LINEAR REGRESSION MODEL (sample)
     Simple linear regression: ŷi = b0 + b1xi, where
     b1 = SSxy / SSxx = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
     b0 = ȳ − b1x̄

  10. Sample Regression Function (SRF) (continued)
     • b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared residuals (minimize the error). This process is called least squares analysis.
     • b0 provides an estimate of β0
     • b1 provides an estimate of β1
     yi = actual value of Y for observation i
     ŷi = predicted value of Y for observation i
     ei = yi − ŷi = residual (error)
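As a sketch, the least squares formulas can be computed in plain Python. The five (x, y) pairs below are hypothetical, not the course file dataspss-s4.1; they are chosen so the fit reproduces the coefficients b0 = −0.1 and b1 = 0.7 quoted on a later slide.

```python
# Hypothetical sample: x = years of experience, y = income (10 million VND).
# Not the course file dataspss-s4.1; chosen so OLS gives b0 = -0.1, b1 = 0.7.
x = [1, 2, 3, 4, 5]
y = [0.7, 1.1, 2.2, 2.5, 3.5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates: b1 = SSxy / SSxx, b0 = y_bar - b1 * x_bar
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")  # b0 = -0.1000, b1 = 0.7000
```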

  11. RESIDUAL ANALYSIS
     • Residual: the difference between the actual y value and the y value predicted by the regression model
     • The sum of the residuals is zero
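The zero-sum property can be checked numerically on a hypothetical five-point sample (not the course data):

```python
# Hypothetical sample (not the course file dataspss-s4.1).
x = [1, 2, 3, 4, 5]
y = [0.7, 1.1, 2.2, 2.5, 3.5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

# Residual = actual y minus predicted y-hat
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(sum(residuals))  # ~0, up to floating-point rounding
```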

  12. Simple example
     X = years of experience (years)
     Y = income (10 million VND)

  13. Fit to the data
     [Scatter plot of the sample (x, y) points with the fitted regression line; x-axis 1–5, y-axis 1–4.]

  14. Result of estimation by SPSS

  15. Meaning of b0 and b1
     • Y-intercept (b0): the average individual income (Y) is −0.1 (×10 million VND) when years of experience (X) is 0
     • Slope (b1): income (Y) is expected to increase by 0.7 (×10 million VND) for each one-year increase in experience
     b1 > 0: the line goes up; positive relationship between X and Y (increase)
     b1 < 0: the line goes down; negative relationship between X and Y (decrease)

  16. Measures of Variation
     Total variation is made up of two parts: SST = SSR + SSE
     • Total Sum of Squares (SST, also written SSyy – they are the same): measures the variation of the Yi values around their mean Ȳ
     • Regression Sum of Squares (SSR): explained variation attributable to the relationship between X and Y
     • Error Sum of Squares (SSE): variation attributable to factors other than the relationship between X and Y

  17. Measures of Variation: The Sum of Squares (continued)
     SSyy = Σ(Yi − Ȳ)²
     SSR = Σ(Ŷi − Ȳ)²
     SSE = Σ(Yi − Ŷi)²
     [Diagram: for each Xi, the vertical distance from Yi to Ȳ splits into an explained part (Ŷi − Ȳ) and a residual part (Yi − Ŷi).]
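The decomposition SST = SSR + SSE can be verified directly; the data below are hypothetical, not the course file:

```python
# Hypothetical sample (not the course file dataspss-s4.1).
x = [1, 2, 3, 4, 5]
y = [0.7, 1.1, 2.2, 2.5, 3.5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")  # SST = 5.04, SSR = 4.90, SSE = 0.14
```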

  18. Result of estimation by SPSS

  19. Coefficient of Determination, r²
     The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. It is also called r-squared and is denoted r².
     r² = SSR / SST
     Note: 0 ≤ r² ≤ 1

  20. The Coefficient of Determination r² and the Coefficient of Correlation r
     • r² = coefficient of determination: measures the % of variation in Y that is explained by the independent variable X in the regression model; 0 ≤ r² ≤ 1
     • r = coefficient of correlation: measures how strong the relationship is between X and Y; −1 ≤ r ≤ 1
     • r > 0 if b1 > 0; r < 0 if b1 < 0
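On a hypothetical five-point sample (not the course file), r² and r follow directly from the sums of squares; r takes the sign of b1:

```python
import math

# Hypothetical sample (not the course file dataspss-s4.1).
x = [1, 2, 3, 4, 5]
y = [0.7, 1.1, 2.2, 2.5, 3.5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)

r_squared = ssr / sst                        # fraction of variation explained
r = math.copysign(math.sqrt(r_squared), b1)  # correlation carries the sign of b1

print(f"r^2 = {r_squared:.4f}, r = {r:.4f}")  # r^2 = 0.9722, r = 0.9860
```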

  21. Examples of Approximate r² Values
     • r² = 1: perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X
     • r² = 0: no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X)

  22. Examples of Approximate r² Values (continued)
     • 0 < r² < 1: weaker linear relationships between X and Y; some but not all of the variation in Y is explained by variation in X

  23. Standard Error of the Estimate
     The standard deviation of the variation of observations around the regression line is estimated by:
     se = √(SSE / (n − 2))
     where SSE = error sum of squares and n = sample size

  24. Result of estimation by SPSS

  25. Inferences About the Slope
     The standard error of the regression slope coefficient (b1) is estimated by:
     Sb1 = se / √SSxx = se / √(Σ(xi − x̄)²)
     where Sb1 = estimate of the standard error of the least squares slope and se = standard error of the estimate
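Both standard errors can be sketched on hypothetical data (not the course file, so the Sb1 value differs from the SPSS screenshot's 0.1914854):

```python
import math

# Hypothetical sample (not the course file dataspss-s4.1).
x = [1, 2, 3, 4, 5]
y = [0.7, 1.1, 2.2, 2.5, 3.5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

se = math.sqrt(sse / (n - 2))  # standard error of the estimate
sb1 = se / math.sqrt(ss_xx)    # standard error of the slope

print(f"se = {se:.4f}, Sb1 = {sb1:.4f}")  # se = 0.2160, Sb1 = 0.0683
```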

  26. Result of estimation by SPSS Sb1 = 0.1914854

  27. Inference about the Slope: t Test
     t test for a population slope: is there a linear relationship between X and Y?
     Null hypothesis (H0) and alternative hypothesis (H1):
     H0: β1 = 0 (no linear relationship)
     H1: β1 ≠ 0 (linear relationship does exist)
     Test statistic: t = (b1 − β1) / Sb1, with d.f. = n − 2
     where b1 = regression slope coefficient, β1 = hypothesized slope, Sb1 = standard error of the slope
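A sketch of the t test on hypothetical data (not the course file); the critical value ±3.182 for d.f. = 3 at α = .05 two-tailed is the one the deck quotes from the t tables:

```python
import math

# Hypothetical sample (not the course file dataspss-s4.1).
x = [1, 2, 3, 4, 5]
y = [0.7, 1.1, 2.2, 2.5, 3.5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sb1 = math.sqrt(sse / (n - 2)) / math.sqrt(ss_xx)

beta1_h0 = 0               # hypothesized slope under H0
t = (b1 - beta1_h0) / sb1  # d.f. = n - 2 = 3
t_crit = 3.182             # two-tailed critical value, alpha = .05, d.f. = 3

print(f"t = {t:.2f}, reject H0: {abs(t) > t_crit}")  # t = 10.25, reject H0: True
```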

  28. Result of estimation by SPSS

  29. Inferences about the Slope: t Test Example
     H0: β1 = 0;  H1: β1 ≠ 0
     d.f. = 5 − 2 = 3, α/2 = .025
     Test statistic: t = 3.66
     Critical values: t = ±3.182 (from t tables)
     Decision: 3.66 > 3.182, so reject H0
     Conclusion: there is sufficient evidence that the number of customers affects weekly sales.
     [Diagram: two-tailed t distribution with rejection regions beyond ±3.182; the observed t = 3.66 falls in the upper rejection region.]

  30. F Test for Significance
     F test statistic: F = MSR / MSE, where MSR = SSR / k and MSE = SSE / (n − k − 1)
     F follows an F distribution with k numerator and (n − k − 1) denominator degrees of freedom
     k = the number of independent (explanatory) variables in the regression model
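A sketch of the F computation on hypothetical data (not the course file); in simple regression with k = 1, the F statistic equals the square of the slope t statistic:

```python
# Hypothetical sample (not the course file dataspss-s4.1).
x = [1, 2, 3, 4, 5]
y = [0.7, 1.1, 2.2, 2.5, 3.5]
n, k = len(x), 1  # k = number of independent variables
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

msr = ssr / k            # mean square regression
mse = sse / (n - k - 1)  # mean square error
f_stat = msr / mse       # F with (k, n - k - 1) = (1, 3) degrees of freedom

print(f"F = {f_stat:.1f}")  # F = 105.0
```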

  31. Result of estimation by SPSS

  32. F Test for Significance Example
     H0: β1 = 0;  H1: β1 ≠ 0
     α = .05, df1 = k = 1, df2 = n − k − 1 = 5 − 1 − 1 = 3
     Critical value: F.05 = 10.128
     Test statistic: F = t² = 3.66² ≈ 13.40 (in simple regression, F equals the square of the slope t statistic)
     Decision: F > 10.128, so reject H0 at α = 0.05
     Conclusion: there is sufficient evidence that the number of customers affects weekly sales.

  33. Introduction to SPSS (file: dataspss-s4.1)

  34. Result of estimation by SPSS

  35. Interpreting the results
     • R-squared ranges in value between 0 and 1
     • R² = 0: the model explains none of the variance in y
     • R² = 1: all the sample points lie on the estimated regression line
     • Example: R² = 0.93 implies that the regression equation explains 93% of the variation in the dependent variable
     • Sig. (significance): the coefficient is a good fit only if the Sig. value is small:
     • Sig. of coefficient < 0.01 → significant at 1%, H0 is rejected
     • 0.01 ≤ Sig. < 0.05 → significant at 5%, H0 is rejected
     • 0.05 ≤ Sig. < 0.1 → significant at 10%, H0 is rejected
     • Sig. ≥ 0.1 → not significant, H0 is not rejected

  36. Introduction to multiple regression
     • Multiple regression
     • Finding relationships between the dependent and independent variables
     • Dummy variables included
     • Solution? and SPSS

  37. Linear regression
     y = β0 + β1X1 + β2X2 + … + βnXn + ε
     y = dependent (or response) variable
     X1, X2, …, Xn = independent or predictor variables
     β0 = y-intercept of the line (constant); cuts through the y-axis
     β1, …, βn = unknown parameters – slopes of the line
     ε = random error component
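A multiple regression sketch using NumPy's least squares solver. The variable names and values are invented for illustration, not the course file dataspss-s4.2:

```python
import numpy as np

# Invented data -- not the course file dataspss-s4.2.
experience = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])        # years
education  = np.array([12.0, 12.0, 16.0, 12.0, 16.0, 16.0])  # years of schooling
income     = np.array([0.8, 1.4, 2.6, 2.7, 3.8, 4.4])        # 10 million VND

# Design matrix with an intercept column:
# income = b0 + b1*experience + b2*education + e
X = np.column_stack([np.ones_like(experience), experience, education])
coef, *_ = np.linalg.lstsq(X, income, rcond=None)
b0, b1, b2 = coef

residuals = income - X @ coef  # OLS residuals sum to ~0 with an intercept
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```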

  38. Terminology for simple regression

  39. Example
     • File: dataspss-s4.2
     • Dependent variable?
     • Independent variables?
     • SPSS program
     • Estimate and discuss

  40. Think?
     A survey was conducted with the variables:
     • Income
     • Age
     • Years of working experience
     • Education
     • Gender
     • ………
     Which are the dependent and which are the independent variables?

  41. Regression with dummy independent variables
     • Independent variable: Gender (1 = female, 0 = male)
     • If the estimated coefficient of gender is positive, the dependent variable tends to be higher for females
     • If the estimated coefficient of gender is negative, the dependent variable tends to be higher for males
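A sketch of the dummy-variable point with invented numbers (not the course data): with a 0/1 gender dummy and an intercept, the fitted intercept is the male group mean and the dummy coefficient is the female-minus-male difference.

```python
import numpy as np

# Invented data -- gender coded 1 = female, 0 = male, as on the slide.
gender = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
income = np.array([3.1, 2.4, 3.3, 2.6, 2.9, 2.2])  # 10 million VND

X = np.column_stack([np.ones_like(gender), gender])
(b0, b_gender), *_ = np.linalg.lstsq(X, income, rcond=None)

# b0 = mean income of the 0 group (male); b_gender = female mean minus male mean.
# A positive b_gender means the dependent variable is higher, on average, for females.
print(f"b0 = {b0:.3f}, gender coefficient = {b_gender:.3f}")  # b0 = 2.400, gender coefficient = 0.700
```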

  42. Practice
     • dataspss-s4.2
     • Develop hypotheses
     • Develop the regression model
     • Identify the dependent variable
     • Identify the independent variables
     • Discuss the output

  43. Samples of hypotheses
     • An increase in education does not cause a rise in earnings
     • People's earnings are not positively influenced by their age
     • There is no significant relationship between earnings and gender
