
Prof. Vera Adamchik

QMS 6351 Statistics and Research Methods Analyzing the Relationship Between Two and More Variables Chapter 2.4 Chapter 3.5 Chapter 14 (14.1-14.3, 14.6) Chapter 15 (15.1-15.3, 15.7). Prof. Vera Adamchik. Chapter 2 Section 2.4 Crosstabulations and Scatter Diagrams. Crosstabulations.


Presentation Transcript


  1. QMS 6351 Statistics and Research Methods. Analyzing the Relationship Between Two and More Variables. Chapter 2.4, Chapter 3.5, Chapter 14 (14.1-14.3, 14.6), Chapter 15 (15.1-15.3, 15.7). Prof. Vera Adamchik

  2. Chapter 2 Section 2.4 Crosstabulations and Scatter Diagrams

  3. Crosstabulations • Crosstabulation is a method that can be used to summarize the data for two variables simultaneously. • Typically, the table’s left and top margin labels define the classes for the two variables. • Crosstabulation can provide insight about the relationship between the variables.

  4. Crosstabulations • Crosstabulation of Enrollment by Gender and Degree Level at a University (percentages are shares of each column total)

                                  Degree Level
     Gender   Undergraduate    Graduate        Doctorate      Total
     Male     7341 (47.0%)     1937 (53.4%)    172 (59.1%)    9450 (48.3%)
     Female   8294 (53.0%)     1688 (46.6%)    119 (40.9%)    10101 (51.7%)
     Total    15635 (100.0%)   3625 (100.0%)   291 (100.0%)   19551 (100.0%)
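The percentages in the enrollment crosstabulation can be reproduced from the raw counts. The following is a minimal sketch in plain Python; the dictionary layout and the helper name `column_percent` are illustrative assumptions, not part of the original slides.

```python
# Counts copied from the enrollment crosstabulation (gender by degree level).
counts = {
    "Male":   {"Undergraduate": 7341, "Graduate": 1937, "Doctorate": 172},
    "Female": {"Undergraduate": 8294, "Graduate": 1688, "Doctorate": 119},
}
levels = ["Undergraduate", "Graduate", "Doctorate"]

# Margins of the table: one total per column, one per row, and the grand total.
column_totals = {lvl: sum(row[lvl] for row in counts.values()) for lvl in levels}
row_totals = {gender: sum(row.values()) for gender, row in counts.items()}
grand_total = sum(row_totals.values())


def column_percent(gender, level):
    """Share of a column total that falls in the given row, in percent."""
    return round(100 * counts[gender][level] / column_totals[level], 1)


for gender in counts:
    cells = [f"{counts[gender][lvl]} ({column_percent(gender, lvl)}%)" for lvl in levels]
    total_pct = round(100 * row_totals[gender] / grand_total, 1)
    print(gender, *cells, f"{row_totals[gender]} ({total_pct}%)")
```

Reading down a column shows how the gender mix shifts with degree level: the male share rises from 47.0% of undergraduates to 59.1% of doctoral students.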

  5. Scatter Diagram • A scatter diagram is a graphical presentation of the relationship between two quantitative variables.

  6. Scatter Diagram • Scatter Diagram for Engine Size and Gas Mileage of Eight Automobiles [Figure: In-City Gas Mileage (mpg) on the vertical axis against Engine Size (number of cylinders) on the horizontal axis]

  7. Example: Reed Auto Sales Reed Auto periodically has a special week-long sale. As part of the advertising campaign Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales showing the number of TV ads run and the number of cars sold in each sale are shown below. Develop a scatter diagram.

  8. Example (cont.)

     Number of TV Ads   Number of Cars Sold
     1                  14
     3                  24
     2                  18
     1                  17
     3                  27

  9. Chapter 3 Section 3.5 Measures of Association Between Two Variables • Covariance • Correlation Coefficient

  10. Covariance is a descriptive measure of the linear association between two variables. • The value of covariance depends upon units of measurement. • A measure of the relationship between two variables that avoids this difficulty is the correlation coefficient.

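  11. Covariance • If the data sets are samples, the covariance is denoted by $s_{xy}$: $s_{xy} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$. • If the data sets are populations, the covariance is denoted by $\sigma_{xy}$: $\sigma_{xy} = \dfrac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$.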

  12. Example: Reed Auto Sales • Sample covariance: $s_{xy} = 20/4 = 5$ (autos × TV ads)
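The arithmetic behind the 20/4 can be checked with a short sketch. It assumes the usual sample covariance with an n − 1 divisor, which matches the divisor of 4 on the slide for n = 5; the function name `sample_covariance` is an illustrative choice.

```python
# Reed Auto Sales sample: TV ads run and cars sold in five previous sales.
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]


def sample_covariance(x, y):
    """Sum of cross-products of deviations from the means, divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross_products / (n - 1)


s_xy = sample_covariance(ads, cars)
print(s_xy)  # the cross-products sum to 20, so this is 20/4
```

Because the result carries the units "autos × TV ads", rescaling either variable would change the number, which is the motivation for the correlation coefficient.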

  13. Correlation Coefficient • If the data sets are samples, the correlation coefficient is denoted by $r_{xy}$. • If the data sets are populations, the correlation coefficient is denoted by $\rho_{xy}$.

  14. Correlation Coefficient • The coefficient can take on values between -1 and +1. • If $r$ or $\rho$ is near -1, it indicates a strong negative linear relationship. • If $r$ or $\rho$ is near +1, it indicates a strong positive linear relationship.

  15. Example: Reed Auto Sales • $s_x^2 = 4/4 = 1$, so $s_x = 1$. • $s_y^2 = 114/4 = 28.5$, so $s_y = 5.3385$. • Correlation coefficient: $r_{xy} = 5/(1 \times 5.3385) = 0.936586$. • A strong positive linear relationship.
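The same result can be obtained step by step in code. This is a sketch that assumes the sample correlation is the sample covariance divided by the product of the two sample standard deviations, as in the calculation on the slide; the variable names are illustrative.

```python
import math

# Reed Auto Sales sample: TV ads run and cars sold in five previous sales.
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]

n = len(ads)
x_bar = sum(ads) / n
y_bar = sum(cars) / n

# Sample variances and covariance, all with the n - 1 divisor.
var_x = sum((x - x_bar) ** 2 for x in ads) / (n - 1)
var_y = sum((y - y_bar) ** 2 for y in cars) / (n - 1)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ads, cars)) / (n - 1)

s_x = math.sqrt(var_x)
s_y = math.sqrt(var_y)
r_xy = s_xy / (s_x * s_y)

print(f"s_x = {s_x:.4f}, s_y = {s_y:.4f}, r_xy = {r_xy:.6f}")
```

Unlike the covariance, `r_xy` is unit-free: measuring cars sold in dozens instead of units would leave it unchanged.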

  16. If $r$ or $\rho$ = 1, it is a case of perfect positive linear correlation (all points are on a positively sloped straight line). • If $r$ or $\rho$ = -1, it is a case of perfect negative linear correlation (all points are on a negatively sloped straight line). • If $r$ or $\rho$ = 0, there is no linear correlation between the two variables (the points are scattered all over the diagram).

  17. We would like to find an analytical/mathematical expression (a formula) for the relationship between TV ads and auto sales. • Both a scatter diagram and correlation coefficient suggest that there is a linear relationship between TV ads and auto sales.

  18. Chapter 14 Outline • The simple linear regression model • The Least Squares Method • The coefficient of determination

  19. Regression analysis • Regression analysis is the study and description of the nature of the relationship between variables (for example, linear or non-linear regression, simple or multiple regression).

  20. Functional vs. stochastic relationship • Functional (deterministic) relationship: the variables are perfectly related; the relationship holds for every observation. For example, the area of a square in mathematics, total revenue in economics. • Statistical (stochastic) relationship: the variables are not perfectly related; the relationship holds on average, not for each observation. For example, the marginal propensity to consume (MPC) in economics.

  21. The simple linear regression • The simple linear regression model is a mathematical way of stating the linear statistical relationship between two variables. • The variable being predicted is called the dependent variable. • The variable being used to predict the value of the dependent variable is called the independent variable.

  22. Regression equation • Regression equation: the equation that describes how the mean value (that is, the average) of the dependent variable (y) is related to the independent variable(s) (x). • Simple Linear Regression Equation: $E(y) = \beta_0 + \beta_1 x$, where $\beta_0$ and $\beta_1$ are referred to as the parameters of the model.

  23. Regression model • Regression model: the equation that describes how the dependent variable is related to the independent variable(s) and an error term. • Simple Linear Regression Model: $y = \beta_0 + \beta_1 x + \varepsilon$, where $\varepsilon$ (the Greek letter epsilon) is a random variable referred to as the error term. It absorbs the impact of all other variables on y.

  24. Estimated regression equation • We will use a sample to estimate the population parameters $\beta_0$ and $\beta_1$. Sample statistics (denoted $b_0$ and $b_1$) serve as estimates of $\beta_0$ and $\beta_1$. Substituting the values of $b_0$ and $b_1$ in the regression equation, we obtain the estimated regression equation. • Estimated Simple Linear Regression Equation: $\hat{y} = b_0 + b_1 x$, where $\hat{y}$ is the estimated mean value of y for a given value of x.

  25. The Least Squares Method • Least Squares Criterion: $\min \sum (y_i - \hat{y}_i)^2$, where $y_i$ = observed value of the dependent variable for the i-th observation and $\hat{y}_i$ = estimated value of the dependent variable for the i-th observation.

  26. The Least Squares Method • Slope for the Estimated Regression Equation: $b_1 = \dfrac{\sum x_i y_i - (\sum x_i \sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n}$ (this formula appears in the footnote on p. 568). • y-Intercept for the Estimated Regression Equation: $b_0 = \bar{y} - b_1 \bar{x}$

  27. Example: Reed Auto Sales • Slope for the Estimated Regression Equation: $b_1 = \dfrac{220 - (10)(100)/5}{24 - (10)^2/5} = \dfrac{20}{4} = 5$ • y-Intercept for the Estimated Regression Equation: $b_0 = 20 - 5(2) = 10$ • Estimated Regression Equation: $\hat{y} = 10 + 5x$
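The slope and intercept above can be recomputed from the raw data. The sketch below assumes the computational form of the least squares slope, (Σxy − ΣxΣy/n) / (Σx² − (Σx)²/n), which is the form the numbers 220, 10, 100 and 24 on the slide plug into; the function name `least_squares_line` is an illustrative choice.

```python
# Reed Auto Sales sample: TV ads run and cars sold in five previous sales.
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]


def least_squares_line(x, y):
    """Return (b0, b1) for the estimated regression equation y_hat = b0 + b1*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)                   # 10 and 100 for this sample
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # 220
    sum_x2 = sum(xi * xi for xi in x)               # 24
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n                 # y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares_line(ads, cars)
print(f"y_hat = {b0:g} + {b1:g}x")
```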

  28. Interpretation • $b_0$ is the expected value of y when x = 0 (it may be meaningless). In our example, when the number of TV ads is zero, the expected number of cars sold is 10. • $b_1$ is the change in the expected value of y when x changes by 1 unit of its measurement, ceteris paribus. In our example, when the number of TV ads increases by 1, the number of cars sold is expected to increase by 5 cars.

  29. SST, SSR, SSE • Relationship Among SST, SSR, SSE: SST = SSR + SSE, where SST is the total variation in Y, SSR is the variation in Y due to X, and SSE is the variation in Y due to all other factors.

  30. Coefficient of Determination • Coefficient of determination represents the proportion of SST that is explained by the use of the regression model. • Coefficient of Determination: $r^2 = SSR/SST$, with $0 \le r^2 \le 1$

  31. Example: Reed Auto Sales • Coefficient of Determination: $r^2 = SSR/SST = 100/114 = .877193$ • The regression relationship is very strong since 87.7% of the variation in number of cars sold can be explained by the linear relationship between the number of TV ads and the number of cars sold.
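The three sums of squares can be computed directly from the data and the fitted line. This sketch takes the estimated equation from the Reed Auto example (intercept 10, slope 5) as given; the variable names are illustrative.

```python
# Reed Auto Sales sample and the estimated regression equation y_hat = 10 + 5x.
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
b0, b1 = 10, 5

y_bar = sum(cars) / len(cars)
fitted = [b0 + b1 * x for x in ads]

sst = sum((y - y_bar) ** 2 for y in cars)              # total variation in y
ssr = sum((f - y_bar) ** 2 for f in fitted)            # variation explained by x
sse = sum((y - f) ** 2 for y, f in zip(cars, fitted))  # unexplained variation

r_squared = ssr / sst
print(f"SST = {sst:g}, SSR = {ssr:g}, SSE = {sse:g}, r^2 = {r_squared:.6f}")
```

The identity SST = SSR + SSE holds here because the line was fitted by least squares with an intercept; it would not generally hold for an arbitrary line.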

  32. The Correlation Coefficient • The correlation coefficient measures the strength of the linear association between two variables. • The sample correlation coefficient is plus or minus the square root of the coefficient of determination, taking the sign of $b_1$. • Sample correlation coefficient: $r_{xy} = (\text{sign of } b_1)\sqrt{r^2} = +\sqrt{0.877193} = 0.936586$

  33. Chapter 15 Outline • The multiple linear regression model • The Least Squares Method • The multiple coefficient of determination • Categorical independent variables

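  34. Multiple Regression Equation: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$ • Multiple Regression Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$ • Estimated Multiple Regression Equation: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$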

  35. Multiple coefficient of determination: $R^2 = SSR/SST$ • Adjusted multiple coefficient of determination: $R_a^2 = 1 - (1 - R^2)\dfrac{n - 1}{n - p - 1}$, where $n$ is the number of observations and $p$ is the number of independent variables.

  36. Example: Programmer Salary Survey • A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine if salary was related to the years of experience and the score on the firm’s programmer aptitude test. • The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for a sample of 20 programmers are shown on the next slide.

  37. Programmer Salary Survey data

     Exper. (Yrs.)   Test Score   Salary ($000s)
     4               78           24.0
     7               100          43.0
     1               86           23.7
     5               82           34.3
     8               86           35.8
     10              84           38.0
     0               75           22.2
     1               80           23.1
     6               83           30.0
     6               91           33.0
     9               88           38.0
     2               73           26.6
     10              75           36.2
     5               81           31.6
     6               74           29.0
     8               87           34.0
     4               79           30.1
     6               94           33.9
     3               70           28.2
     3               89           30.0

  38. Suppose we believe that salary (y) is related to the years of experience ($x_1$) and the score on the programmer aptitude test ($x_2$) by the following regression model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$, where y = annual salary ($000), $x_1$ = years of experience, $x_2$ = score on programmer aptitude test.

  39. Solving for the Estimates of β0, β1, β2 • Excel’s Regression Equation Output Note: Columns F-I are not shown.

  40. Estimated Regression Equation SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE) Note: Predicted salary will be in thousands of dollars.
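The Excel output is not reproduced in this transcript, so the following is a sketch of how the same estimates could be obtained without Excel. It solves the least squares normal equations for two independent variables in plain Python. The data rows are read from the programmer salary slide, and the row-by-row pairing of experience, test score and salary is an assumption about how that slide was laid out; the helper names are illustrative.

```python
# (years of experience, aptitude test score, salary in $1000s) for 20 programmers.
# The pairing of the three values in each row is assumed from the slide layout.
data = [
    (4, 78, 24.0), (7, 100, 43.0), (1, 86, 23.7), (5, 82, 34.3), (8, 86, 35.8),
    (10, 84, 38.0), (0, 75, 22.2), (1, 80, 23.1), (6, 83, 30.0), (6, 91, 33.0),
    (9, 88, 38.0), (2, 73, 26.6), (10, 75, 36.2), (5, 81, 31.6), (6, 74, 29.0),
    (8, 87, 34.0), (4, 79, 30.1), (6, 94, 33.9), (3, 70, 28.2), (3, 89, 30.0),
]
exper = [row[0] for row in data]
score = [row[1] for row in data]
salary = [row[2] for row in data]


def mean(values):
    return sum(values) / len(values)


def centered_sum(u, v):
    """Sum of cross-products of deviations from the means of u and v."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))


# Normal equations in deviation form:
#   s11*b1 + s12*b2 = s1y
#   s12*b1 + s22*b2 = s2y
s11, s22, s12 = centered_sum(exper, exper), centered_sum(score, score), centered_sum(exper, score)
s1y, s2y = centered_sum(exper, salary), centered_sum(score, salary)

det = s11 * s22 - s12 ** 2            # solve the 2x2 system by Cramer's rule
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(salary) - b1 * mean(exper) - b2 * mean(score)

# Sums of squares for the multiple coefficient of determination.
fitted = [b0 + b1 * x1 + b2 * x2 for x1, x2 in zip(exper, score)]
sst = centered_sum(salary, salary)
ssr = sum((f - mean(salary)) ** 2 for f in fitted)
r_squared = ssr / sst

print(f"SALARY = {b0:.3f} + {b1:.3f}(EXPER) + {b2:.3f}(SCORE), R^2 = {r_squared:.5f}")
```

With more than two independent variables the same normal equations are usually solved with a linear algebra routine rather than by hand-coded determinants.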

  41. Interpreting the Coefficients • In multiple regression analysis, we interpret each regression coefficient as follows: $b_i$ represents an estimate of the change in y corresponding to a 1-unit increase in $x_i$ when all other independent variables are held constant.

  42. Interpreting the Coefficients • $b_1$ = 1.404. Salary is expected to increase by $1,404 for each additional year of experience (when the variable score on programmer aptitude test is held constant).

  43. Interpreting the Coefficients b2 = 0.251 Salary is expected to increase by $251 for each additional point scored on the programmer aptitude test (when the variable years of experience is held constant).
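The "held constant" interpretation can be illustrated by evaluating the estimated equation at two points that differ in only one variable. This is a sketch using the rounded coefficients from the estimated regression equation; the function name `predict_salary` and the example inputs (4 years of experience, a score of 78) are illustrative assumptions.

```python
def predict_salary(exper, score):
    """Predicted annual salary in $1000s from the estimated regression equation."""
    return 3.174 + 1.404 * exper + 0.251 * score


base = predict_salary(exper=4, score=78)

# One more year of experience, same test score.
extra_year = predict_salary(exper=5, score=78) - base

# One more test point, same experience.
extra_point = predict_salary(exper=4, score=79) - base

print(f"predicted salary: ${base * 1000:,.0f}")
print(f"effect of one more year:  ${extra_year * 1000:,.0f}")
print(f"effect of one more point: ${extra_point * 1000:,.0f}")
```

Because the equation is linear, these differences equal the coefficients regardless of the starting point chosen.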

  44. Multiple Coefficient of Determination • Excel’s ANOVA Output SSR SST

  45. Multiple Coefficient of Determination • $R^2 = SSR/SST = 500.3285/599.7855 = .83418$
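The adjusted multiple coefficient of determination for this example can be computed from the same two sums of squares. The sketch assumes the standard adjustment 1 − (1 − R²)(n − 1)/(n − p − 1), with n = 20 programmers and p = 2 independent variables; the function name `adjusted_r_squared` is an illustrative choice.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjust R^2 for the number of independent variables p and sample size n."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


ssr, sst = 500.3285, 599.7855   # from the ANOVA output quoted on the slide
n, p = 20, 2                    # 20 programmers; experience and test score

r_squared = ssr / sst
r_squared_adj = adjusted_r_squared(r_squared, n, p)

print(f"R^2 = {r_squared:.5f}, adjusted R^2 = {r_squared_adj:.6f}")
```

The adjusted value is slightly lower than R² because it penalizes each additional independent variable, which keeps it from rising automatically when a variable is added.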
