Welcome Back!
EDUC 7610 Chapter 2 The Simple Regression Model Fall 2018 Tyson S. Barrett, PhD
Let’s start with Scatterplots • Each point represents a single observation • The red line is the line of best fit • The line happens to go through each conditional mean • It goes through the mean of y at each value of x • E.g., when x = 1, the mean of y = 2.5 (the conditional mean of y at x = 1 is 2.5)
Conditional Means and Prediction • The open circles are where the Conditional Means are • In this case, all conditional means run along the line • When this happens (or approx. happens) we have linearity • The line is the linear model’s predicted level of y for each level of x
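The idea of conditional means can be sketched in a few lines of Python. The data below are hypothetical (the values behind the slide's scatterplot are not given); they were chosen so that the conditional mean of y at x = 1 is 2.5, as in the example above.

```python
from collections import defaultdict

# Hypothetical data (not from the slides): two observations of y at each x.
data = [(1, 2.0), (1, 3.0), (2, 2.5), (2, 3.5),
        (3, 3.0), (3, 4.0), (4, 3.5), (4, 4.5)]

def conditional_means(pairs):
    """Return {x: mean of y among the observations that share that x}."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: sum(ys) / len(ys) for x, ys in sorted(groups.items())}

cond_means = conditional_means(data)
for x, m in cond_means.items():
    print(f"x = {x}: conditional mean of y = {m}")

# Linearity check: the conditional mean changes by the same amount
# for every one-unit step in x.
xs = list(cond_means)
steps = [cond_means[b] - cond_means[a] for a, b in zip(xs, xs[1:])]
print("change between adjacent conditional means:", steps)
```

Because every step is the same size, a single straight line passes through all of the conditional means, which is what linearity means here.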
Why is that line the “best”? • That line is the line that minimizes the squared error between the predicted values and the observed values • The difference between an observed value and its predicted value is called the “residual” or “error” • This approach is called Ordinary Least Squares (OLS) regression
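A minimal sketch of what "best" means, using hypothetical data and the standard closed-form OLS estimates: the fitted line has a smaller sum of squared residuals than any of the nearby lines tried. The names `sse` and `ols` are illustrative helpers, not part of any library.

```python
# Hypothetical data (not from the slides).
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [2.0, 3.0, 2.5, 3.5, 3.0, 4.0, 3.5, 4.5]

def ols(x, y):
    """Closed-form OLS intercept and slope for one predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def sse(b0, b1):
    """Sum of squared residuals for the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

b0, b1 = ols(x, y)
best = sse(b0, b1)

# Nudge the intercept and slope away from the OLS values and recompute.
others = [sse(b0 + d0, b1 + d1)
          for d0 in (-0.5, 0.0, 0.5)
          for d1 in (-0.2, 0.0, 0.2)
          if (d0, d1) != (0.0, 0.0)]

print(f"OLS line: y-hat = {b0} + {b1} * x, SSE = {best}")
print("smallest SSE among the nudged lines:", min(others))
```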
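Features of the “Best” Line (Simple Regression) • Slope: $b_1 = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} = r \frac{s_Y}{s_X}$ • Intercept: $b_0 = \bar{Y} - b_1 \bar{X}$ • The Line ($\hat{Y}$): $\hat{Y} = b_0 + b_1 X$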
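As a worked illustration, the slope and intercept of the simple regression line can be computed directly from means, variances, and the covariance. The data are hypothetical, and the symbols `b0`, `b1`, and `r` follow the usual textbook notation, which is an assumption about what the original slide showed.

```python
from math import sqrt

# Hypothetical data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mx, my = sum(x) / n, sum(y) / n
cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
var_x = sum((a - mx) ** 2 for a in x) / (n - 1)
var_y = sum((b - my) ** 2 for b in y) / (n - 1)

b1 = cov_xy / var_x                   # slope
b0 = my - b1 * mx                     # intercept
r = cov_xy / sqrt(var_x * var_y)      # Pearson correlation

# The slope can equivalently be written as r * (s_Y / s_X).
b1_from_r = r * sqrt(var_y) / sqrt(var_x)

y_hat = [b0 + b1 * a for a in x]      # predicted values on the line
print(f"b1 = {b1:.3f}, b0 = {b0:.3f}, r = {r:.3f}")
print("predicted values:", [round(v, 2) for v in y_hat])
```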
The “Best” Line and Correlation • $b_1$ is only affected by variables that influence both X and Y while $r$ is also affected by variables that only influence Y • We unstandardize $r$ by multiplying by $s_Y / s_X$ • $r$ has no scale but $b_1$ is in the units of the outcome • $r$ is affected by the range of the variables measured • $b_1$ is the effect of X on Y while $r$ is the relative importance of X on Y
We unstandardize $r$ by multiplying by $s_Y / s_X$ • $b_1 = r \frac{s_Y}{s_X}$; that is, $r$ is the standardized version of $b_1$ • If we standardize our variables before using regression, $b_1$ and $r$ are the same • Why?
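The link between the slope and the correlation can be checked numerically. This sketch uses hypothetical data: it converts both variables to z-scores, refits the line, and compares the new slope with the correlation r.

```python
from math import sqrt

# Hypothetical data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(v)
    return sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def ols(x, y):
    """Closed-form OLS intercept and slope for one predictor."""
    mx, my = mean(x), mean(y)
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    return my - b1 * mx, b1

def standardize(v):
    m, s = mean(v), sd(v)
    return [(a - m) / s for a in v]

b0, b1 = ols(x, y)                                  # raw units
bz0, bz1 = ols(standardize(x), standardize(y))      # z-score units
r = b1 * sd(x) / sd(y)                              # r = b1 * (s_X / s_Y)

print(f"raw slope = {b1:.3f}")
print(f"standardized slope = {bz1:.3f}, r = {r:.3f}")
```

After standardizing, both standard deviations equal 1, so the factor $s_Y / s_X$ drops out and the slope is the correlation; the intercept becomes 0 because both means are 0.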
$r$ has no scale but $b_1$ is in the units of the outcome • $r$ has a range of -1 to 1 • $b_1$ is on the scale of the outcome (approximately) and can in principle run from $-\infty$ to $\infty$ • “For a one unit increase in X there is an associated increase of $b_1$ units in the outcome”
$r$ is affected by the range of the variables measured • The value of $b_1$ is not affected by the range of X (the significance is…) • $r$ is affected by having a less-than-representative range of X • Why?
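A small sketch of range restriction, using hypothetical and deliberately artificial data: at every x there are two observations, one 2 units below and one 2 units above the line y = x. Keeping only the middle of the x range leaves the slope untouched but shrinks the correlation.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def slope_and_r(x, y):
    """Return the OLS slope of y on x and the Pearson correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Hypothetical data: x runs from 0 to 10, with y = x - 2 and y = x + 2 at each x.
full = [(x, x + d) for x in range(11) for d in (-2, 2)]
# Less-than-representative sample: only observations with 4 <= x <= 6.
restricted = [(x, y) for x, y in full if 4 <= x <= 6]

b_full, r_full = slope_and_r(*zip(*full))
b_restricted, r_restricted = slope_and_r(*zip(*restricted))

print(f"full range:       slope = {b_full:.3f}, r = {r_full:.3f}")
print(f"restricted range: slope = {b_restricted:.3f}, r = {r_restricted:.3f}")
```

The scatter around the line is the same in both samples, but the restricted sample has much less variation in X for that scatter to be compared against, so the correlation drops while the slope stays at 1.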
$b_1$ is only affected by variables that influence both X and Y while $r$ is also affected by variables that only influence Y • $b_1$ is the effect of X on Y while $r$ is the relative importance of X on Y • $r$ is a measure of relative importance compared to other variables • If other variables are important, $r$ will be relatively smaller • $b_1$ is a measure of the effect of X on Y and therefore shouldn’t change much based on the range of X • The standard error is affected though (we’ll discuss later)
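To illustrate the contrast between the slope and the correlation, the sketch below adds a hypothetical second variable `z` that influences only Y and is unrelated to X by construction. Making `z` more important (a larger weight `c`) lowers the correlation between X and Y but leaves the slope alone.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def slope_and_r(x, y):
    """Return the OLS slope of y on x and the Pearson correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / sqrt(sxx * syy)

def make_data(c):
    """Hypothetical data: y = x + c * z, with z = -1 and +1 at every x.

    Because z takes both values at each x, it is uncorrelated with x
    and influences only y.
    """
    xs, ys = [], []
    for x in range(11):
        for z in (-1, 1):
            xs.append(x)
            ys.append(x + c * z)
    return xs, ys

b_weak, r_weak = slope_and_r(*make_data(c=1))      # z matters a little
b_strong, r_strong = slope_and_r(*make_data(c=4))  # z matters a lot

print(f"weak z:   slope = {b_weak:.3f}, r = {r_weak:.3f}")
print(f"strong z: slope = {b_strong:.3f}, r = {r_strong:.3f}")
```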
Back to Residuals • The estimate of $b_1$ depends on minimizing the (squared) residuals, so they are kind of a big deal
Back to Residuals • Our $Y$ values can be separated into three parts: $Y_i = b_0 + b_1 X_i + e_i$ • $b_0$: the same for everyone (a constant) • $b_1 X_i$: explained component • $e_i$: unexplained component (residuals)
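The three-part split can be made concrete with a short sketch on hypothetical data: each observed y is rebuilt from the constant (the intercept `b0`), the explained part (`b1 * x`), and the residual. The variable names are illustrative.

```python
# Hypothetical data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = sum(x) / len(x), sum(y) / len(y)
b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
      / sum((a - mx) ** 2 for a in x))
b0 = my - b1 * mx

constant = [b0 for _ in x]                      # the same for everyone
explained = [b1 * a for a in x]                 # depends on each person's x
residual = [b - (b0 + b1 * a) for a, b in zip(x, y)]   # what is left over

# Adding the three parts back together recovers each observed y.
rebuilt = [c + e + u for c, e, u in zip(constant, explained, residual)]

for yi, c, e, u in zip(y, constant, explained, residual):
    print(f"y = {yi}: constant {c:.2f} + explained {e:.2f} + residual {u:+.2f}")
```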
Properties of the Residuals • The mean is exactly zero. • The correlation with X is exactly zero. • The variance is: $\mathrm{Var}(e) = (1 - r^2)\,\mathrm{Var}(Y)$ • $r^2$ is the proportion of variance in Y explained by X • $1 - r^2$ is the proportion of variance in Y not explained by X
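These properties can be verified numerically. The sketch below uses hypothetical data and checks that the residuals average to zero, have zero covariance (and hence zero correlation) with X, and have variance equal to $(1 - r^2)$ times the variance of Y, up to floating-point rounding.

```python
# Hypothetical data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    """Sample covariance (n - 1 in the denominator)."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

b1 = cov(x, y) / cov(x, x)
b0 = mean(y) - b1 * mean(x)
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]   # residuals

r_squared = cov(x, y) ** 2 / (cov(x, x) * cov(y, y))

resid_mean = mean(e)
resid_cov_x = cov(e, x)           # zero covariance means zero correlation
resid_var = cov(e, e)
unexplained_var = (1 - r_squared) * cov(y, y)

print(f"mean of residuals:      {resid_mean:.10f}")
print(f"cov(residuals, x):      {resid_cov_x:.10f}")
print(f"variance of residuals:  {resid_var:.4f}")
print(f"(1 - r^2) * Var(Y):     {unexplained_var:.4f}")
```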
Residuals tell us stuff • Partial relationships, because the residual is what remains in Y after adjusting for X • Residual analysis to detect anomalies • Detect non-linearities • Assess the homoskedasticity assumption
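As one example of residual analysis, the sketch below fits a straight line to hypothetical data that actually follow a curve (y = x squared). The residuals are positive at both ends and negative in the middle, a systematic pattern that signals non-linearity even though their mean is zero.

```python
# Hypothetical curved data: y = x ** 2 for x = 0, 1, ..., 10.
x = list(range(11))
y = [a ** 2 for a in x]

mx, my = sum(x) / len(x), sum(y) / len(y)
b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
      / sum((a - mx) ** 2 for a in x))
b0 = my - b1 * mx

residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]

# Print the sign of each residual to make the U-shaped pattern visible.
for a, e in zip(x, residuals):
    sign = "+" if e > 0 else "-"
    print(f"x = {a:2d}  residual = {e:7.2f}  {sign}")
```

With a correctly specified linear model the signs would look haphazard across x; a run of one sign in the middle flanked by the other sign at the ends suggests a curved relationship.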