
Intermediate Data Collection & Analysis

Intermediate Data Collection & Analysis. Steven A. Allshouse Coordinator of Research and Analysis November 5, 2008. Organization of the Class. Part I – Discussion of Correlation and Causation. Part II – Quantitative Examples of Correlation and Causation.


Presentation Transcript


  1. Intermediate Data Collection & Analysis • Steven A. Allshouse, Coordinator of Research and Analysis • November 5, 2008

  2. Organization of the Class • Part I – Discussion of Correlation and Causation. • Part II – Quantitative Examples of Correlation and Causation. • Part III – How to Measure Correlation (OLS Method). • Part IV – Common Pitfalls of the OLS Method. • Part V – MS Excel Exercise.

  3. Part I – Qualitative Examples of Correlation and Causation

  4. Correlation • A situation in which one variable or set of variables tends to be associated with a second variable or set of variables, but is not thought to bring about that second variable or set of variables. • Examples: The size of a person’s left foot and the size of his or her right foot; women’s hemlines and the performance of the stock market; and the number of cavities in elementary school children and the size of their vocabulary. • Note: Correlation can be positive or negative; positive means as X increases, so does Y; negative means as X increases, Y decreases.
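As a rough illustration of positive versus negative correlation, the sketch below computes the Pearson correlation coefficient, a standard measure of linear association that the slide itself does not name. The function name `pearson_r` and every data value are invented for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements (cm): left and right foot length of five people.
left_foot = [24.0, 25.5, 26.0, 27.5, 28.0]
right_foot = [24.1, 25.4, 26.2, 27.4, 28.1]

# Hypothetical pair of series in which Y falls as X rises.
x = [1, 2, 3, 4, 5]
y = [10, 8, 7, 4, 2]

r_positive = pearson_r(left_foot, right_foot)  # close to +1
r_negative = pearson_r(x, y)                   # close to -1
```

A coefficient near +1 or -1 says only that the two series move together; it says nothing about whether one brings about the other.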

  5. Causation • A situation in which one variable or set of variables is thought to bring about, or help bring about, a second variable or set of variables. • Examples: Alcohol consumption/traffic accidents; average daily temperatures/heating oil consumption. • Notes: Causation usually implies correlation; if X causes Y, then where we see X we would expect to see Y. Causation can be positive or negative; an increase in X can cause an increase or a decrease in Y. The direction of causation can run one or both ways; X causes Y, but Y might or might not cause X.

  6. A Case of Causation? • There is a strong positive correlation between the number of fire engines that respond to a fire and the number of fatalities in that fire, i.e., the greater the number of fire engines, the greater the number of deaths. • Question: Does this fact mean that Albemarle County could save lives by decreasing the number of fire engines sent to a given fire?
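One possible reading of the fire-engine question is that a third variable, the severity of the fire, drives both the number of engines sent and the number of deaths. The slide does not state this, so the sketch below is only an assumption-laden toy: the `severity` variable, the rule that generates the records, and all numbers are invented.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented records: (severity, engines, fatalities).
# Assumed rule: bigger fires get more engines AND cause more deaths,
# while one extra engine at the same severity saves one life.
records = []
for severity in range(1, 6):
    for extra_engines in (0, 1):
        engines = severity + extra_engines
        fatalities = 2 * severity - extra_engines
        records.append((severity, engines, fatalities))

engine_counts = [r[1] for r in records]
fatality_counts = [r[2] for r in records]

# Across all fires, engines and fatalities are strongly positively correlated.
overall_r = pearson_r(engine_counts, fatality_counts)

# Within each severity level, the fire that got the extra engine had fewer deaths.
extra_engine_helps = all(
    records[i + 1][2] < records[i][2] for i in range(0, len(records), 2)
)
```

In this toy data the overall correlation is strongly positive even though, at any fixed severity, sending an extra engine lowers the death count.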

  7. Additional Notes about Correlation & Causation • Direction of causation usually determines what we identify as “independent” and “dependent” variables; the independent variable X causes the dependent variable Y. X and Y are correlated, but Y does not cause X. • Identification problem: Smoke actually does not cause the fire alarm to be pulled; fire is the underlying cause. Similarly, an increase in, say, education can be seen as causing an increase in income, but educational attainment might just be a “signal” of some underlying ability.

  8. Part II – Quantitative Examples of Correlation and Causation

  9. Part III – How to Estimate Correlation

  10. Ordinary Least Squares (OLS) Method • OLS is a mathematical technique that estimates the correlation between two or more variables. Usually, however, if we are measuring correlation, we already are assuming causation. • The OLS technique renders two items: • (1) A formula whose graphical representation (a “regression” or “trend” line) best “fits” the observed data; and • (2) A number (R²) whose value describes how “tightly” the data fits around the regression line.

  11. The “Regression” or “Trend” Line • Data is plotted in a “scatter” diagram. The horizontal axis contains the “x” values (independent variable) and the vertical axis contains the “y” values (dependent variable). • The regression or trend line is expressed in the form y = mx + b, where m is the slope and b is the y-intercept. • The terms “regression” line and “trend” line frequently are used interchangeably but, usually, a “trend” line pertains to data where the value of the dependent variable changes with time.
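The line y = mx + b can be computed directly from the data with the standard closed-form least-squares formulas. The sketch below is a minimal version for one independent variable; the function name `fit_line` and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b

# Hypothetical scatter data.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

m, b = fit_line(xs, ys)  # roughly y = 1.97x + 1.09
```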

  12. The R² Number • Has a value anywhere from zero to 1. • An R² value of zero means that there is absolutely no linear correlation between the independent and dependent variables. • An R² value of 1 means that there is a perfectly deterministic correlation between the independent and dependent variables. • The R² number tells us how much of the variation in the dependent variable is “explained” by variation in the independent variable. • Example: If R² equals 0.70, that means that 70% of the variation in the dependent variable is “explained” by the variation in the independent variable.
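One common way to compute this number is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that m and b are already the least-squares slope and intercept for the data; the function name `r_squared` and the data are illustrative only.

```python
def r_squared(xs, ys, m, b):
    """Share of the variation in ys explained by the line y = m*x + b."""
    mean_y = sum(ys) / len(ys)
    # Variation left over after the line has been fitted.
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    # Total variation of ys around their own mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = [1, 2, 3, 4, 5]

# Every point lies exactly on y = 2x + 1, so the fit is perfect.
perfect = r_squared(xs, [3, 5, 7, 9, 11], m=2, b=1)

# For this series the least-squares line is flat (m = 0, b = 1.8),
# so x explains none of the variation in y.
unrelated = r_squared(xs, [2, 1, 3, 1, 2], m=0, b=1.8)
```

Here `perfect` comes out at 1 and `unrelated` at zero, the two extremes described on the slide.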

  13. Example of a Trend Line Analysis

  14. Part IV – Some Common Pitfalls of Regression / Trend Line Analysis

  15. Pitfall #1: The Regression or Trend Line that is derived from the OLS method might be meaningful only for a limited range of numbers. • Pitfall #2: The most valid Regression or Trend Line for a particular set of data might not necessarily be linear. • Pitfall #3: Usually, a dependent variable is a function of several independent variables, not just one independent variable.
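Pitfalls #1 and #2 can be seen together in a small made-up case: fitting a straight line to data that actually follows a curve, then using the line outside the observed range. The data below (y = x² for x from 0 to 5) and the helper `fit_line` are assumptions chosen only to make the effect visible.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x

# Hypothetical data that is really curved: y = x squared.
xs = [0, 1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

m, b = fit_line(xs, ys)

# Inside the observed range the line is a tolerable summary,
# but far outside it the prediction falls well short of the true curve.
predicted_at_10 = m * 10 + b
actual_at_10 = 10 ** 2

# At the low end the line even predicts a negative value,
# which would be meaningless for a quantity such as square footage.
predicted_at_0 = m * 0 + b
```

With these numbers the line predicts less than half of the true value at x = 10, which suggests checking a scatter diagram before trusting a linear trend.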

  16. Questions?

  17. Part V – MS Excel Exercise

  18. Background • You work in the Planning Department; your boss comes to you with historical development data showing growth in the square footage of non-residential space. • An intern has compiled the data, and has calculated the square footage, by type of non-residential space, that has been added during a twenty-year time period. • The intern has taken the twenty-year increase and divided that number by twenty in order to derive an average annual increase in each type of square footage. • Your boss has used this average annual increase to estimate the number of square feet, by non-residential type, that the County can expect over the course of the next ten years.
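The intern's and boss's method amounts to two lines of arithmetic. The sketch below spells it out; the function names and the square-footage figures are hypothetical, since the class data set is not reproduced in the slides.

```python
def average_annual_increase(start_sqft, end_sqft, years):
    """Total increase over the period divided evenly across the years."""
    return (end_sqft - start_sqft) / years

def straight_line_projection(annual_increase, years_ahead):
    """New square footage expected if the average increase simply continues."""
    return annual_increase * years_ahead

# Hypothetical industrial square footage at the start and end of the period.
start_sqft = 100_000
end_sqft = 420_000

annual = average_annual_increase(start_sqft, end_sqft, years=20)      # 16,000 per year
ten_year_new_sqft = straight_line_projection(annual, years_ahead=10)  # 160,000
```

This method uses only the first and last observations, so it ignores how growth was distributed across the years in between.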

  19. Background (Cont.) • You are somewhat suspicious of the ten-year projection for industrial space, since the County had a net loss of jobs in the manufacturing sector during the course of the twenty years. • Assignment: • (a) Take the historical data for the industrial square footage and use MS Excel to derive an OLS trend line that fits this data; • (b) Graph the trend line, the trend line equation, and the R² value; and • (c) Using the trend line equation, project the total new industrial square footage that the County can expect during the course of the next ten years.
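The assignment is meant to be done in MS Excel; as a stand-in, the sketch below runs the same kind of calculation in Python. The yearly series is entirely invented (growth that flattens out), the helper names are hypothetical, and the projection assumes that "new square footage" means the change along the trend line between year 20 and year 30.

```python
def fit_line(xs, ys):
    """Return (m, b) for the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    return m, mean_y - m * mean_x

def r_squared(xs, ys, m, b):
    """Share of the variation in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Invented data: cumulative industrial space, in thousands of square feet,
# for years 1 through 20. Growth is fast early on and then levels off.
years = list(range(1, 21))
industrial_sqft = [100, 150, 200, 250, 300, 340, 370, 390, 400, 405,
                   410, 412, 414, 415, 416, 417, 418, 419, 420, 420]

m, b = fit_line(years, industrial_sqft)        # slope of about 14 per year
r2 = r_squared(years, industrial_sqft, m, b)   # well below 1: a loose fit

# Change along the trend line over the next ten years (year 20 to year 30).
new_sqft_next_10_years = (m * 30 + b) - (m * 20 + b)
```

With this made-up series the trend line projects roughly 140 thousand new square feet, and the modest R² is a hint that a straight line may not describe the data well.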

  20. Assignment (Cont.) • Question: Is your estimate different from the estimate that your boss derived? If so, how large is the gap (both in absolute square footage and percentage terms)? • How “tightly” does the data fit around the trend line that you have derived? Do you have much confidence in your trend line?
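The gap between the two estimates can be expressed both ways with a small helper. The sketch below assumes the percentage is taken relative to the boss's estimate (the slide does not say which base to use), and the two input figures are hypothetical.

```python
def estimate_gap(boss_estimate, trend_estimate):
    """Return the gap in square feet and as a percentage of the boss's estimate."""
    absolute = boss_estimate - trend_estimate
    percent = absolute / boss_estimate * 100
    return absolute, percent

# Hypothetical ten-year projections, in square feet.
absolute_gap, percent_gap = estimate_gap(boss_estimate=160_000,
                                         trend_estimate=140_000)
# absolute_gap is 20,000 sq ft; percent_gap is 12.5
```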

  21. Conclusion
