
Statistics and Quantitative Analysis U4320



Presentation Transcript


  1. Statistics and Quantitative Analysis U4320 Segment 8 Prof. Sharyn O’Halloran

  2. I. Introduction • A. Overview • 1. Ways to describe, summarize and display data. • 2. Summary statements: • Mean • Standard deviation • Variance • 3. Distributions • Central Limit Theorem

  3. I. Introduction (cont.) • A. Overview • 4. Test hypotheses • 5. Differences of Means • B. What's to come? • 1. Analyze the relationship between two or more variables with a specific technique called regression analysis.

  4. I. Introduction (cont.) • A. Overview • B. What's to come? • 2. This tool allows us to predict the impact of one variable on another. • For example, what is the expected impact of a SIPA degree on income?

  5. II. Causal Models • Causal models explain how changes in one variable affect changes in another variable. Incinerator -------------------------> Bad Public Health Regression analysis gives us a way to analyze precisely the cause-and-effect relationships between variables. • Directional • Magnitude

  6. II. Causal Models (cont.) • A. Variables • Let us start off with a few basic definitions. • 1. Dependent Variable • The dependent variable is the factor that we want to explain. • 2. Independent Variables • The independent variable is the factor that we believe causes or influences the dependent variable. Independent variable-------> Dependent Variable Cause ------------------> Effect

  7. II. Causal Models (cont.) • A. Variables • B. Voting Example • Suppose the House of Representatives holds a vote on health care, and we want to know whether party affiliation influenced individual members' voting decisions. • 1. The raw data looks like this:

  8. II. Causal Models (cont.) • A. Variables • B. Voting Example • 2. Percentages look like this: • 3. Does party affect voting behavior? • Given that the legislator is a Democrat, what is the chance of voting for the health care proposal?

  9. II. Causal Models (cont.) • A. Variables • B. Voting Example • 3. Does party affect voting behavior? (cont.) • What is the probability of being a Democrat? • What is the probability of being a Democrat and voting yes?
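The raw vote table is not reproduced in this transcript, so the sketch below uses made-up counts (an assumption, not the lecture's data) to show how the two probabilities on this slide combine into the conditional probability asked about on the previous slide: P(Yes | Democrat) = P(Democrat and Yes) / P(Democrat).

```python
# HYPOTHETICAL vote counts (the lecture's raw-data table is not reproduced here).
counts = {
    ("Democrat", "Yes"): 200,
    ("Democrat", "No"): 50,
    ("Republican", "Yes"): 30,
    ("Republican", "No"): 150,
}
total = sum(counts.values())  # all members voting


def party_total(party):
    """Number of members of the given party (a row total of the table)."""
    return counts[(party, "Yes")] + counts[(party, "No")]


# Marginal probability of being a Democrat: P(Dem)
p_dem = party_total("Democrat") / total
# Joint probability of being a Democrat AND voting yes: P(Dem and Yes)
p_dem_and_yes = counts[("Democrat", "Yes")] / total
# Conditional probability: P(Yes | Dem) = P(Dem and Yes) / P(Dem)
p_yes_given_dem = p_dem_and_yes / p_dem
# Same conditional probability for Republicans, computed directly from the row
p_yes_given_rep = counts[("Republican", "Yes")] / party_total("Republican")

print(f"P(Dem) = {p_dem:.3f}")
print(f"P(Dem and Yes) = {p_dem_and_yes:.3f}")
print(f"P(Yes | Dem) = {p_yes_given_dem:.3f}")
print(f"P(Yes | Rep) = {p_yes_given_rep:.3f}")
```

Comparing P(Yes | Dem) with P(Yes | Rep) is the informal version of the question "does party affect the vote?": if the two differ substantially, moving from Republican to Democrat is associated with a change in the vote.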

  10. II. Causal Models (cont.) • A. Variables • B. Voting Example • 4. Causal Model • This is the simplest way to state a causal model: A-------------> B Party ---------> Vote • 5. Interpretation • If party influences the vote, then as we move from Republicans to Democrats we should see a shift from NO votes to YES votes.

  11. II. Causal Models (cont.) • A. Variables • B. Voting Example • C. Summary • 1. Regression analysis helps us to explain the impact of one variable on another. • We will be able to answer such questions as what is the relative importance of race in explaining one's income? • Or perhaps the influence of economic conditions on the levels of trade barriers?

  12. II. Causal Models (cont.) • A. Variables • B. Voting Example • C. Summary • 2. Univariate Model • For now, we will focus on the univariate case: a single independent variable, that is, the causal relation between two variables. • In a couple of weeks we will relax this restriction and look at the relationships among multiple variables.

  13. III. Fitted Line • Although regression analysis can be very complicated, the heart of it is actually very simple. • It centers on the notion of fitting a line through the data. • 1. Example • Suppose we study how wheat yield depends on fertilizer and observe this relation:

  14. III. Fitted Line (cont.) • 1. Example (cont.) • The observed relation between Fertilizer and Yield then can be plotted as follows:

  15. III. Fitted Line (cont.) • 1. Example • 2. What line best approximates the relation between these observations? • a) Highest and Lowest Value

  16. III. Fitted Line (cont.) • 1. Example • 2. What line best approximates the relation between these observations? (cont.) • b) Median Value

  17. III. Fitted Line (cont.) • 1. Example • 2. What line best approximates the relation between these observations? • 3. Predicted Values • a) Example 1: • The line that is fitted to the data gives the predicted value of Y for any given level of X.

  18. III. Fitted Line (cont.) • 1. Example • 2. What line best approximates the relation between these observations? • 3. Predicted Values (cont.) • a) Example 1: • If X is 400 and all we know is the fitted line, we would expect a yield of around 65.

  19. III. Fitted Line (cont.) • 1. Example • 2. What line best approximates the relation between these observations? • 3. Predicted Values (cont.) • b) Example 2: • Often we have a lot of data, and fitting the line by eye becomes rather difficult.

  20. III. Fitted Line (cont.) • 1. Example • 2. What line best approximates the relation between these observations? • 3. Predicted Values (cont.) • b) Example 2: • For example, if our plotted data looked like this:

  21. IV. OLS Ordinary Least Squares • We want a method for drawing the line that best fits the data. • A. The Least Square Criteria • We want to fit a line whose equation is of the form: $\hat{Y} = a + bX$ • This is just the algebraic representation of a line.

  22. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria (cont.) • 1. Intercept: • a represents the intercept of the line. That is, the point at which the line crosses the Y axis. • 2. Slope of the line: • b represents the slope of the line.

  23. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria (cont.) • 1. Intercept: • 2. Slope of the line: • Remember: the slope is just the change in Y divided by the change in X. Rise/Run • 3. Minimizing the Sum of Squares • a) Problem: • How do we select a and b so as to minimize the vertical deviations of the observed Y values from the line (the prediction errors)? • We want to minimize the deviation: $d = Y - \hat{Y}$

  24. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria (cont.) • 1. Intercept: • 2. Slope of the line: • 3. Minimizing the Sum of Squares • b) There are several ways in which we can do this. • 1. First, we could minimize the sum of the deviations, $\sum d$. • We could find the line that gives us the lowest sum of all the d's. • The problem, of course, is that some d's are positive and others negative, so when we add them all up they cancel each other out. • In effect, we would be picking a line so that the d's add up to zero.

  25. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria (cont.) • 1. Intercept: • 2. Slope of the line: • 3. Minimizing the Sum of Squares • b) There are several ways in which we can do this. • 2. Absolute Values: minimize $\sum |d|$ • 3. Sum of Squared Deviations: minimize $\sum d^2 = \sum (Y - \hat{Y})^2$
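To see why the plain sum of d fails as a criterion, the sketch below scores two candidate lines $\hat{Y} = a + bX$ under all three criteria, using five hypothetical data points (not from the lecture). Both the flat line at $\bar{Y}$ and the line with a = 1.8, b = 0.8 pass through $(\bar{X}, \bar{Y})$, so their deviations sum to zero; only the absolute-value and squared criteria tell them apart.

```python
# HYPOTHETICAL data points (X, Y), used only to compare the three criteria.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 6.0]


def deviations(a, b):
    """Vertical deviations d = Y - (a + bX) for the candidate line Y-hat = a + bX."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def criteria(a, b):
    """Score one candidate line under the three criteria from the slides."""
    d = deviations(a, b)
    return {
        "sum_d": sum(d),                      # 1. plain sum: positives and negatives cancel
        "sum_abs_d": sum(abs(v) for v in d),  # 2. sum of absolute values
        "sum_sq_d": sum(v * v for v in d),    # 3. sum of squared deviations (least squares)
    }


mean_y = sum(ys) / len(ys)
flat = criteria(mean_y, 0.0)   # horizontal line at the mean of Y
sloped = criteria(1.8, 0.8)    # a line that follows the upward trend

for name, scores in (("flat line at Y-bar", flat), ("a = 1.8, b = 0.8", sloped)):
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

For these five points the second line happens to be the OLS line, so its $\sum d^2$ (2.4) is the smallest of any straight line, not just smaller than the flat line's (8.8).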

  26. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • 1. Fitted Line • The line that we want to fit to the data is: $\hat{Y} = a + bX$ • This is simply what we call the OLS line. • Remember: we are concerned with how to calculate the slope of the line, b, and the intercept of the line, a.

  27. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • 1. Fitted Line • 2. OLS Slope • The OLS slope can be calculated from the formula: $b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$

  28. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • 1. Fitted Line • 2. OLS Slope • In the book they use the abbreviations $x = X - \bar{X}$ and $y = Y - \bar{Y}$, so that $b = \frac{\sum xy}{\sum x^2}$

  29. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • 1. Fitted Line • 2. OLS Slope • 3. Intercept • Now that we have the slope b, it is easy to calculate a: $a = \bar{Y} - b\bar{X}$ • Note: when b = 0, the intercept is just the mean of the dependent variable, $\bar{Y}$.
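The slope and intercept formulas can be written as one small function. This is a minimal sketch in plain Python (the name `ols_fit` is ours, not the course software's), using the deviation form $b = \sum xy / \sum x^2$ and $a = \bar{Y} - b\bar{X}$.

```python
def ols_fit(xs, ys):
    """Return (a, b) for the OLS line Y-hat = a + bX.

    b = sum(x * y) / sum(x ** 2), where x = X - mean(X) and y = Y - mean(Y);
    a = mean(Y) - b * mean(X).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # sum of xy
    sxx = sum((x - x_bar) ** 2 for x in xs)                       # sum of x squared
    if sxx == 0:
        raise ValueError("X has no variation, so the slope is undefined")
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


print(ols_fit([1, 2, 3, 4, 5], [2, 4, 5, 4, 6]))  # approximately (1.8, 0.8)
print(ols_fit([1, 2, 3], [5, 5, 5]))              # (5.0, 0.0): b = 0, so a is the mean of Y
```

The second call illustrates the note on this slide: when Y does not vary with X, $\sum xy = 0$, so b = 0 and the intercept reduces to $\bar{Y}$.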

  30. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • C. Example 1: Fertilizer and Yield

  31. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • C. Example 1: Fertilizer and Yield • So to calculate the slope we solve: • We can then use the slope b to calculate the intercept

  32. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • C. Example 1: Fertilizer and Yield • Remember: $\hat{Y} = a + bX$ • Plugging these estimated values into our fitted line equation, we get: $\hat{Y} = 36.4 + .059X$

  33. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • C. Example 1: Fertilizer and Yield • How many bushels do we predict with 400 lbs of fertilizer? • If we add 700 lbs of fertilizer, what is the expected yield?
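Answering both questions is just a matter of plugging X into the fitted line. A minimal sketch using the rounded estimates from the slides (a = 36.4, b = .059); the function name is illustrative:

```python
def predict_yield(fertilizer_lbs, a=36.4, b=0.059):
    """Predicted yield in bushels from the fitted line Y-hat = a + bX,
    using the rounded estimates a = 36.4 and b = .059 from the slides."""
    return a + b * fertilizer_lbs


for lbs in (400, 700):
    print(f"{lbs} lbs -> {predict_yield(lbs):.1f} bushels")
```

With these rounded estimates the predictions are 36.4 + .059 × 400 = 60.0 bushels and 36.4 + .059 × 700 = 77.7 bushels.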

  34. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • C. Example 1: Fertilizer and Yield • D. Interpretation of b and a • 1. Slope b • The change in Y that accompanies a one-unit change in X. • That is, the slope tells us the predicted effect on the dependent variable of a one-unit change in the independent variable.

  35. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • C. Example 1: Fertilizer and Yield • D. Interpretation of b and a • 1. Slope b • The slope then tells us two things: • i) The directional effect of the independent variable on the dependent variable. • There was a positive relation between fertilizer and yield.

  36. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • C. Example 1: Fertilizer and Yield • D. Interpretation of b and a • 1. Slope b • The slope then tells us two things: • ii) It also tells you the magnitude of the effect on the dependent variable. • For each additional pound of fertilizer we expect an increased yield of .059 bushels.

  37. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • C. Example 1: Fertilizer and Yield • D. Interpretation of b and a • 2. The Intercept • The intercept tells us what to expect if no fertilizer is added: a yield of 36.4 bushels. • In other words, 36.4 bushels is the baseline yield the line predicts before any fertilizer effect is added. • Note that this is not the same as saying fertilizer has no effect: if b were 0, the best prediction would be the mean yield, not the intercept of this fitted line.

  38. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • C. Example 1: Fertilizer and Yield • D. Interpretation of b and a • E. Example II: Radioactive Exposure • 1. Causal Model • We want to know whether exposure to radioactive waste is linked to cancer. Radioactive Waste --------------> Cancer

  39. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • C. Example 1: Fertilizer and Yield • D. Interpretation of b and a • E. Example II: Radioactive Exposure • 2. Data

  40. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • C. Example 1: Fertilizer and Yield • D. Interpretation of b and a • E. Example II: Radioactive Exposure • 3. Graph

  41. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • C. Example 1: Fertilizer and Yield • D. Interpretation of b and a • E. Example II: Radioactive Exposure • 4. Calculate the regression line for predicting Y from X • i) Slope • How do we interpret the slope coefficient? • For each additional unit of radioactive exposure, the predicted cancer mortality rate rises by 9.03 deaths per 10,000 individuals.

  42. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • C. Example 1: Fertilizer and Yield • D. Interpretation of b and a • E. Example II: Radioactive Exposure • ii) Calculate the intercept • Plugging these estimated values into our fitted line equation, we get: $\hat{Y} = 118.5 + 9.03X$

  43. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • C. Example 1: Fertilizer and Yield • D. Interpretation of b and a • E. Example II: Radioactive Exposure • 5. Predictions: • Let's calculate the predicted mortality rate if X were 5.0. • How about if X were 0?

  44. IV. OLS Ordinary Least Squares (cont.) • A. The Least Square Criteria • B. OLS Formulas • C. Example 1: Fertilizer and Yield • D. Interpretation of b and a • E. Example II: Radioactive Exposure • How can we interpret this result? • Even with no radioactive exposure, we would predict a mortality rate of 118.5.
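The same plug-in step for the radioactive-exposure line, using the estimates given on the slides (intercept 118.5, slope 9.03); the constant and function names are illustrative:

```python
A_HAT = 118.5  # intercept from the slides
B_HAT = 9.03   # slope from the slides: change in mortality per unit of exposure


def predicted_mortality(exposure):
    """Predicted cancer mortality rate from the fitted line Y-hat = 118.5 + 9.03X."""
    return A_HAT + B_HAT * exposure


for x in (5.0, 0.0):
    print(f"X = {x}: predicted mortality = {predicted_mortality(x):.2f}")
```

At X = 5.0 the prediction is 118.5 + 9.03 × 5.0 = 163.65; at X = 0 it is simply the intercept, 118.5.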

  45. III. Advantages of OLS • A. Easy • 1. The least squares method gives relatively easy, or at least computable, formulas for calculating a and b.

  46. III. Advantages of OLS (cont.) • A. Easy • B. OLS is similar to many concepts we have already used. • 1. We are minimizing the sum of the squared deviations. In effect, this is very similar to how we find the variance. • 2. Also, we saw above that when b = 0, $a = \bar{Y}$. • The interpretation of this is that the best prediction we can make of Y is just the sample mean $\bar{Y}$. • This is the case when the two variables are unrelated (uncorrelated), as when they are independent.
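The link to the variance can be checked numerically: among constant predictions c (lines with b = 0 and a = c), the sum of squared deviations $\sum (Y - c)^2$ is smallest at $c = \bar{Y}$, and that minimum divided by n − 1 is the sample variance. A sketch with hypothetical Y values:

```python
ys = [2.0, 4.0, 5.0, 4.0, 6.0]   # HYPOTHETICAL Y values
y_bar = sum(ys) / len(ys)        # sample mean


def sse(c):
    """Sum of squared deviations of Y from a constant prediction c (the line b = 0, a = c)."""
    return sum((y - c) ** 2 for y in ys)


candidates = [3.0, 4.0, y_bar, 4.5, 5.0]
best = min(candidates, key=sse)

print(f"mean = {y_bar}, best constant among candidates = {best}")
print(f"SSE at mean = {sse(y_bar):.2f}, sample variance = {sse(y_bar) / (len(ys) - 1):.2f}")
```

Algebraically, $\sum (Y - c)^2 = \sum (Y - \bar{Y})^2 + n(c - \bar{Y})^2$, so any c other than $\bar{Y}$ adds a positive penalty.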

  47. III. Advantages of OLS (cont.) • A. Easy • B. OLS is similar to many concepts we have already used. • C. Extension of the Sample Mean • Since OLS is an extension of the sample mean, it shares many of the sample mean's properties, such as being unbiased and efficient (under the standard regression assumptions). • D. Weighted Least Squares • We might want to weight some observations more heavily than others.
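One way to weight observations is to minimize $\sum w\,(Y - a - bX)^2$ instead of $\sum (Y - a - bX)^2$. The sketch below (hypothetical data and weights; the function name is ours) replaces $\bar{X}$ and $\bar{Y}$ with weighted means; with equal weights it gives back the ordinary OLS line.

```python
def wls_fit(xs, ys, ws):
    """Weighted least squares for Y-hat = a + bX: minimizes sum(w * (Y - a - bX) ** 2).

    Larger weights make an observation count more heavily. With equal weights
    this reduces to the ordinary least squares formulas.
    """
    if not (len(xs) == len(ys) == len(ws)):
        raise ValueError("xs, ys and ws must have the same length")
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum  # weighted mean of X
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum  # weighted mean of Y
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # HYPOTHETICAL data
ys = [2.0, 4.0, 5.0, 4.0, 6.0]
print(wls_fit(xs, ys, [1, 1, 1, 1, 1]))  # equal weights: same as OLS
print(wls_fit(xs, ys, [1, 1, 1, 1, 5]))  # last point counts five times as much
```

Giving the last point (5, 6) five times the weight pulls the line toward it, so the slope rises above the equal-weight value of 0.8.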

  48. V. Homework Example • In the homework assignment, you are asked to select two interval/ratio level variables and calculate the fitted line that minimizes the sum of the squared deviations (the regression line). • A. Choose 2 Variables • What effect does the number of years of education have on the frequency that one reads the newspaper? • The independent variable is Education • And the dependent variable is Newspaper reading.

  49. V. Homework Example (cont.) • A. Choose 2 Variables • B. Coding the Variables • First, I made a new variable called PAPER. • Recode all the missing data values to a single value. • Remove missing values from the data set. • Then do the same for education.

  50. V. Homework Example (cont.) • A. Choose 2 Variables • B. Coding the Variables • C. Getting the number of valid observations • Next, see how many valid observations are left by using the “Summarize” command under the “Data” menu.
