PUAF 610 TA Session 10
TODAY • Ideas about Final Review • Regression Review
Final Review • Any ideas about the final review next week? • Go over lectures • Go over problem sets that relate to the exam • Go over extra exercises • Try to get information from instructors • Email me your preferences
Regression • In regression analysis we analyze the relationship between two or more variables. • The relationship between two or more variables could be linear or nonlinear. • Simple Linear Regression: y, x • Multiple Regression: y, x1, x2, x3, …, xk • If a relationship exists, how can we use it to forecast future values?
Regression [Figure: dependent variable (y) plotted against independent variable (x)] • Regression is the attempt to explain the variation in a dependent variable using the variation in independent variables. • Regression is thus often read as an explanation of causation, although a fitted relationship alone does not establish it.
Simple Linear Regression [Figure: dependent variable (y) plotted against independent variable (x), showing the fitted line $y' = b_0 + b_1 x \pm \varepsilon$, the y-intercept $b_0$, and the slope $b_1 = \Delta y / \Delta x$] • The output of a regression is a function that predicts the dependent variable based upon values of the independent variables. • Simple regression fits a straight line to the data.
Simple Linear Regression [Figure: dependent variable (y) plotted against independent variable (x), marking an observation $y$ and its prediction $\hat{y}$] The function will make a prediction for each observed data point. The observation is denoted by $y$ and the prediction is denoted by $\hat{y}$. For each observation, the variation can be described as: $y = \hat{y} + \varepsilon$ (Actual = Explained + Error)
Simple Linear Regression • Simple Linear Regression Model: $y = \beta_0 + \beta_1 x + \varepsilon$ • Simple Linear Regression Equation: $E(y) = \beta_0 + \beta_1 x$ • Estimated Simple Linear Regression Equation: $\hat{y} = b_0 + b_1 x$
Simple Linear Regression • The simplest relationship between two variables is a linear one: • $y = \beta_0 + \beta_1 x$ • x = independent or explanatory variable (“cause”) • y = dependent or response variable (“effect”) • $\beta_0$ = intercept (value of y when x = 0) • $\beta_1$ = slope (change in y when x increases by one unit)
Interpret the slope • Y=0.3+2.6x
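As a small illustration of the slope interpretation, the sketch below evaluates the line y = 0.3 + 2.6x at two x values one unit apart. The function name `predict` and the chosen x values are assumptions made for this example only.

```python
def predict(x, b0=0.3, b1=2.6):
    """Return the predicted y for the line y = b0 + b1 * x."""
    return b0 + b1 * x

# Raising x by one unit changes the prediction by the slope b1,
# so here y is expected to rise by about 2.6 units.
change = predict(2.0) - predict(1.0)
print(round(change, 6))
```

The intercept 0.3 is the predicted y when x = 0; the slope 2.6 is the predicted change in y per one-unit increase in x.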
Regression [Figure: dependent variable (y) plotted against independent variable (x), with a fitted line] • A least squares regression, or OLS, selects the line with the lowest total sum of squared prediction errors. • This value is called the Sum of Squares of Error, or SSE.
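The least squares idea can be sketched in a few lines of Python. This is a minimal illustration using the standard closed-form estimates for one independent variable, not necessarily the procedure used in the course; the data values and the function names `fit_simple_ols` and `sse` are made up for the example.

```python
def fit_simple_ols(xs, ys):
    """Return (b0, b1) minimizing the sum of squared prediction errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: forces the line through the point (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sse(xs, ys, b0, b1):
    """Sum of Squares of Error for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
b0, b1 = fit_simple_ols(xs, ys)
print(b0, b1, sse(xs, ys, b0, b1))
```

Any other line, for example one with a slightly different slope or intercept, gives a larger SSE on the same data.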
Calculating SSR [Figure: dependent variable (y) plotted against independent variable (x), with a horizontal line at the population mean $\bar{y}$] The Sum of Squares Regression (SSR) is the sum of the squared differences between the prediction for each observation and the population mean.
Regression Formulas The Total Sum of Squares (SST) is equal to SSR + SSE. Mathematically, $SSR = \sum (\hat{y} - \bar{y})^2$ (measure of explained variation), $SSE = \sum (y - \hat{y})^2$ (measure of unexplained variation), and $SST = SSR + SSE = \sum (y - \bar{y})^2$ (measure of total variation in y).
The Coefficient of Determination The proportion of total variation (SST) that is explained by the regression (SSR) is known as the Coefficient of Determination, and is often referred to as $R^2$. $R^2 = \frac{SSR}{SST} = \frac{SSR}{SSR + SSE}$ The value of $R^2$ can range between 0 and 1, and the higher its value, the larger the share of the variation in y that the regression model explains.
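The sums of squares and the coefficient of determination can be checked numerically. The sketch below uses a tiny made-up data set: the observations y = 1, 3, 2 at x = 1, 2, 3, whose least squares line is ŷ = 1 + 0.5x. The function names are assumptions for illustration.

```python
def sums_of_squares(ys, y_hats):
    """Return (SSR, SSE, SST) from observations ys and predictions y_hats."""
    y_bar = sum(ys) / len(ys)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hats)          # explained variation
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)                # total variation
    return ssr, sse, sst

def r_squared(ssr, sst):
    """Coefficient of determination: share of total variation explained."""
    return ssr / sst

# Hypothetical observations and their least squares predictions (1 + 0.5x).
ys = [1.0, 3.0, 2.0]
y_hats = [1.5, 2.0, 2.5]
ssr, sse, sst = sums_of_squares(ys, y_hats)
print(ssr, sse, sst)        # 0.5 1.5 2.0
print(r_squared(ssr, sst))  # 0.25
```

The identity SST = SSR + SSE holds here because the predictions come from a least squares line with an intercept; it need not hold for arbitrary predictions.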
Testing for Significance • To test for a significant regression relationship, we must conduct a hypothesis test to determine whether the value of $\beta_1$ is zero. • A t test is commonly used.
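Testing for Significance: t Test • Hypotheses: $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$ • Test Statistic: $t = b_1 / s_{b_1}$, where $s_{b_1}$ is the estimated standard error of $b_1$ • Rejection Rule: Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$, where $t_{\alpha/2}$ is based on a t distribution with n - 2 degrees of freedom.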
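A rough sketch of the slope t test in Python follows. It assumes the usual textbook estimate of the standard error of the slope, s / sqrt(Σ(x − x̄)²) with s = sqrt(SSE / (n − 2)); the data and function names are made up, and the critical value is passed in by hand from a t table rather than computed.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (b1, standard error of b1, t statistic) for H0: beta1 = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))   # standard error of the estimate
    se_b1 = s / math.sqrt(sxx)     # estimated standard error of b1
    return b1, se_b1, b1 / se_b1

def reject_h0(t, t_crit):
    """Two-tailed rule: reject H0 if t < -t_crit or t > t_crit."""
    return t < -t_crit or t > t_crit

# Hypothetical data, made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
b1, se_b1, t = slope_t_statistic(xs, ys)
# 3.182 is the two-tailed 5% critical value for n - 2 = 3 degrees of
# freedom, read from a standard t table.
print(t, reject_h0(t, 3.182))
```

With this data the t statistic is far above the critical value, so the null hypothesis of a zero slope is rejected.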
Multiple Linear Regression • More than one independent variable can be used to explain variance in the dependent variable. • A multiple regression takes the form: • $y = A + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$ • where k is the number of variables, or parameters.
Regression • A unit rise in x produces 0.4 of a unit rise in y, with z held constant. • Interpretation of the t-statistic remains the same: (0.4 - 0)/0.4 = 1, where the denominator is the standard error of the coefficient. Since 1 is below the critical value of 2.02, we fail to reject the null, and x is not significant. • The R-squared statistic indicates that 30% of the variance of y is explained.
Adjusted R-squared Statistic • This statistic is used in a multiple regression analysis, because it does not automatically rise when an extra explanatory variable is added. • Its value depends on the number of explanatory variables. • It is usually written as $\bar{R}^2$ (R-bar squared).
Adjusted R-squared • It has the following formula (n = number of observations, k = number of parameters, counting the intercept): $\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - k}$
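A minimal sketch of the adjusted R-squared calculation, assuming the common form 1 − (1 − R²)(n − 1)/(n − k) in which k counts every estimated parameter including the intercept. That convention, the function name, and the example values of n and k are assumptions, not taken from the slides.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k estimated parameters.

    Assumes k includes the intercept, so n - k is the residual
    degrees of freedom.
    """
    if n <= k:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

# Hypothetical values: R-squared of 0.30, 43 observations, 3 parameters.
print(adjusted_r_squared(0.30, 43, 3))
# Same R-squared with one more parameter: the adjusted value is lower.
print(adjusted_r_squared(0.30, 43, 4))
```

The second call shows the penalty at work: adding a parameter without raising R-squared lowers the adjusted statistic.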
F-test of explanatory power • This is the F-test for the goodness of fit of a regression and in effect tests for the joint significance of the explanatory variables. • It is based on the R-squared statistic. • It is routinely produced by most computer software packages • It follows the F-distribution.
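F-test formula • The formula for the F-test of the goodness of fit, with n observations and k parameters (counting the intercept), is: $F = \frac{R^2 / (k - 1)}{(1 - R^2) / (n - k)}$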
F-statistic • When testing for the significance of the goodness of fit, our null hypothesis is that the coefficients on the explanatory variables are jointly equal to 0. • If our F-statistic is below the critical value, we fail to reject the null, and therefore we say the goodness of fit is not significant.
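The decision rule can be sketched in Python under one common form of the statistic, F = [R² / (k − 1)] / [(1 − R²) / (n − k)], where k counts all parameters including the intercept. That form, the function names, and the example values are assumptions for illustration; the critical value is supplied by hand from an F table.

```python
def f_statistic(r2, n, k):
    """F statistic for overall fit from R-squared.

    Assumes k counts all estimated parameters including the intercept,
    giving k - 1 and n - k degrees of freedom.
    """
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))

def goodness_of_fit_significant(f_stat, f_crit):
    """Reject the joint null only if F exceeds the critical value."""
    return f_stat > f_crit

# Hypothetical values: R-squared of 0.30, 43 observations, 3 parameters.
f_stat = f_statistic(0.30, 43, 3)
# 3.23 is the 5% critical value for (2, 40) degrees of freedom,
# read from a standard F table.
print(f_stat, goodness_of_fit_significant(f_stat, 3.23))
```

Note that a regression can pass this joint test even when an individual coefficient fails its own t test, since the F-test looks at all explanatory variables together.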