Explore Simple Linear Regression, uses, lurking variables, pitfalls, and residual plots in regression analysis for accurate predictions and detection of lurking variables.
Lecture 18 • Simple Linear Regression (Chapters 18.1-18.5)
Interaction plots in ANOVA • It is a good idea to always look at the interaction plots when doing a two-way ANOVA, regardless of whether or not the test for interactions is significant. Interaction plots display the basic results of the study. • If there really are no interactions, then the interaction plots of the population means will consist of parallel lines; plots of sample means will be only approximately parallel because of sampling variability.
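The parallel-lines idea can be checked numerically with a difference of differences on the cell means. The sketch below is only an illustration: the 2x2 cell means and the helper name interaction_contrast are made up and do not come from the lecture.

```python
# Hypothetical cell means for a 2x2 design (factor A by factor B).
# These numbers are invented for illustration only.

def interaction_contrast(means):
    """Difference of differences: (A2,B2 - A2,B1) - (A1,B2 - A1,B1).

    A value of zero means the two lines in the interaction plot are parallel.
    """
    return (means[("A2", "B2")] - means[("A2", "B1")]) - (
        means[("A1", "B2")] - means[("A1", "B1")]
    )

# Moving from B1 to B2 adds 4 at both levels of A: parallel lines.
parallel = {("A1", "B1"): 10.0, ("A1", "B2"): 14.0,
            ("A2", "B1"): 13.0, ("A2", "B2"): 17.0}

# Moving from B1 to B2 adds 4 at A1 but subtracts 2 at A2: lines cross.
crossing = {("A1", "B1"): 10.0, ("A1", "B2"): 14.0,
            ("A2", "B1"): 13.0, ("A2", "B2"): 11.0}

print(interaction_contrast(parallel))  # 0.0
print(interaction_contrast(crossing))  # -6.0
```

With real data the contrast of sample means will rarely be exactly zero, which is why the plot is read together with the test for interactions.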
Regression Analysis • The goal: Estimate E(Y|X) = conditional mean of Y given X based on a sample. • Simple Linear Regression: Assumes E(Y|X) is a straight line in X.
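As a rough sketch of how the straight line is estimated from a sample, the code below computes the usual least squares slope and intercept. The data values and the function name fit_line are assumptions made for illustration; they are not from the lecture.

```python
def fit_line(xs, ys):
    """Least squares estimates (intercept, slope) for E(Y|X) = b0 + b1 * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up sample, roughly linear in x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

b0, b1 = fit_line(xs, ys)
print(round(b0, 2), round(b1, 2))  # 1.09 1.97
```

The estimated conditional mean at any x in the observed range is then b0 + b1 * x.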
Uses of Regression Analysis • Descriptive. Describe the association between y and x in the population observed. • Passive prediction. Predict y based on x where you do not plan to manipulate x, e.g., predict today’s stock price based on yesterday’s stock price. • Control. Predict what y will be if you change x, e.g., predict what your earnings will be if you obtain different levels of education.
Lurking Variables • A lurking variable is a variable that has an important effect on the relationship among the variables in a study but is not included among the variables studied. • Examples: • Y=Salaries of Presbyterian Ministers over time, X=Price of rum in Havana over time, Lurking Variable = Inflation rate over time. • Y=Pellagra rate in village, X=Amount of flies in village, Lurking Variable = Amount of corn in diet.
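A small simulation can show how a lurking variable produces a strong association between two variables that do not affect each other. Everything here is assumed for illustration: z plays the role of a price level that rises over time, and x and y each depend on z plus independent noise. The helper names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Least squares (intercept, slope) for regressing ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def residuals(xs, ys):
    """Residuals of ys after removing its fitted straight line in xs."""
    b0, b1 = fit_line(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def correlation(us, vs):
    """Pearson correlation coefficient."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    suv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    suu = sum((u - u_bar) ** 2 for u in us)
    svv = sum((v - v_bar) ** 2 for v in vs)
    return suv / (suu * svv) ** 0.5

random.seed(0)
z = [1.0 + 0.03 * t for t in range(200)]              # lurking variable (simulated)
x = [2.0 * zi + random.gauss(0.0, 0.5) for zi in z]   # depends on z only
y = [5.0 * zi + random.gauss(0.0, 1.0) for zi in z]   # depends on z only

raw_corr = correlation(x, y)
# Correlation left over once the straight-line effect of z is removed from both.
adjusted_corr = correlation(residuals(z, x), residuals(z, y))
print(raw_corr, adjusted_corr)
```

The raw correlation is close to 1, while the correlation after adjusting for z should be close to 0, which is the pattern in the ministers' salaries and rum price example.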
Pitfalls in Regression Analysis • (1) Descriptive: If using simple linear regression, need to make sure E(Y|X) is actually approximately a straight line. • (2) Passive Prediction: Need to beware of the pitfall for (1) plus extrapolation and lurking variables. • (3) Control: Need to beware of the pitfalls for (1) and (2) plus extra caution about lurking variables. Requires a cause-and-effect relationship, which is best established through a controlled experiment.
Example of Pitfall • A researcher measures the number of television sets per person X and the average life expectancy Y for the world’s nations. The regression line has a positive slope – nations with many TV sets have higher life expectancies. Could we lengthen the lives of people in Rwanda by shipping them TV sets?
Residual Plots Against Time • Many lurking variables change systematically over time. • Useful method for detecting lurking variables: Plot residuals against the time order of the observations, if it is available. If a systematic pattern is found, an understanding of the background of the data might allow you to guess what the lurking variables are. • Another useful residual plot: Plot residuals vs. location of observations.
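The check described above can be sketched without a plotting library by listing the residuals in time order and looking for a systematic pattern. The data below are invented: the same x values are observed in two periods, and y shifts upward by 4 in the later period. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least squares (intercept, slope) for regressing ys on xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Observations listed in time order (made-up values).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.0, 8.0, 11.0, 14.0, 17.0,    # early period: y = 2 + 3x
      9.0, 12.0, 15.0, 18.0, 21.0]   # later period: y = 6 + 3x

b0, b1 = fit_line(xs, ys)
resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Residuals in time order: a run of negatives followed by a run of
# positives signals that something changed between the two periods.
for t, r in enumerate(resid, start=1):
    print(t, round(r, 2))

early_mean = sum(resid[:5]) / 5
late_mean = sum(resid[5:]) / 5
print(early_mean, late_mean)  # -2.0 2.0
```

A single line fitted to both periods hides the shift; the residuals against time order reveal it.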
Residual Plot vs. Time Example • Goal: Predict elementary mathematics enrollment (Y) at college based on number of freshman students (X). • Linear fit: Math enrollment = -283.8905 + 0.6511345 × Freshman students
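To make the fitted line concrete, the sketch below plugs a freshman count into the equation from the slide and computes a residual. The coefficients are from the slide; the freshman count of 4000 and the observed enrollment of 2500 are hypothetical values chosen only to show the arithmetic.

```python
# Coefficients of the fitted line reported on the slide.
INTERCEPT = -283.8905
SLOPE = 0.6511345

def predict_math_enrollment(freshmen):
    """Predicted elementary math enrollment for a given number of freshmen."""
    return INTERCEPT + SLOPE * freshmen

freshmen = 4000    # hypothetical freshman count, not from the data
observed = 2500.0  # hypothetical observed enrollment, not from the data

predicted = predict_math_enrollment(freshmen)
residual = observed - predicted  # residual = observed - predicted

print(round(predicted, 4))  # 2320.6475
print(round(residual, 4))   # 179.3525
```

A prediction like this is only reasonable for freshman counts within the range seen in the data; outside that range it is extrapolation.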
Residual Plot vs. Time • The residual plot suggests that a change took place between 1994 and 1995 that caused a higher proportion of students to take math courses. • In fact, one of the schools in the university changed its program in 1995 to require entering students to take another math course. • Conclusion: The math dept. shouldn’t use data from before 1995 for predicting future enrollment.