Understanding Confidence Intervals in Statistical Analysis

Learn how confidence intervals provide interval estimates of regression coefficients and how they relate to hypothesis tests at a given significance level.

lauren

Presentation Transcript


  1. 4.3 Confidence Intervals
     - Using our CLM assumptions, we can construct CONFIDENCE INTERVALS or CONFIDENCE INTERVAL ESTIMATES of the form: Bjhat ± t* · se(Bjhat)
     - Given a significance level α (which is used to determine t*), we construct 100(1-α)% confidence intervals
     - Across repeated random samples, 100(1-α)% of the confidence intervals constructed this way contain the true value Bj
     - we don’t know whether an individual confidence interval contains the true value
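As a minimal sketch of the interval described on this slide, the Python below computes Bjhat ± t* · se(Bjhat). The function name and the estimate and standard error are illustrative assumptions, not values from the slides; t* is passed in rather than looked up.

```python
def confidence_interval(beta_hat, se, t_crit):
    """Return (lower, upper) for beta_hat +/- t_crit * se."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    margin = t_crit * se
    return beta_hat - margin, beta_hat + margin


# Hypothetical estimate and standard error. t* = 2.704 is the two-sided
# 1% critical value quoted in the Pepsi example on slide 3.
lower, upper = confidence_interval(beta_hat=0.50, se=0.10, t_crit=2.704)
print(f"99% CI: [{lower:.4f}, {upper:.4f}]")
```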

  2. 4.3 Confidence Intervals
     - Confidence intervals are similar to 2-tailed tests in that α/2 is in each tail when finding t*
     - if our hypothesis test and confidence interval use the same α:
       • we cannot reject the null hypothesis (at the given significance level) that Bj=aj if aj is within the confidence interval
       • we can reject the null hypothesis (at the given significance level) that Bj=aj if aj is not within the confidence interval
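The equivalence between the two-sided test and the confidence interval can be checked numerically. This is a sketch under assumed, hypothetical numbers; both function names are made up for illustration.

```python
def two_sided_t_rejects(beta_hat, se, t_crit, a_j):
    """True if H0: Bj = a_j is rejected by a two-sided t test."""
    t_stat = (beta_hat - a_j) / se
    return abs(t_stat) > t_crit


def outside_interval(beta_hat, se, t_crit, a_j):
    """True if a_j lies outside beta_hat +/- t_crit * se."""
    return abs(a_j - beta_hat) > t_crit * se


# Hypothetical estimate (0.5), standard error (0.1) and t* (2.704):
# the two rules give the same decision for each candidate value a_j.
for a_j in (0.0, 0.3, 0.5, 0.9):
    by_test = two_sided_t_rejects(0.5, 0.1, 2.704, a_j)
    by_ci = outside_interval(0.5, 0.1, 2.704, a_j)
    print(a_j, by_test, by_ci)
```

The two conditions are the same inequality, |Bjhat - aj| > t* · se(Bjhat), written in two ways, which is why the decisions always match when the same α is used.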

  3. 4.3 Confidence Example
     - Going back to our Pepsi example, we now look at geekiness:
     - From before, our 2-sided t* with α=0.01 was t*=2.704, therefore our 99% CI is:

  4. 4.3 Confidence Intervals
     - Remember that a CI is only as good as the 6 CLM assumptions:
       1) Omitted variables cause the estimates (Bjhats) to be unreliable, so the CI is not valid
       2) If heteroskedasticity is present, the standard error is not a valid estimate of the standard deviation, so the CI is not valid
       3) If normality fails, the CI MAY not be valid if our sample size is too small

  5. 4.4 Complicated Single Tests
     - In this section we will see how to test a single hypothesis involving more than one Bj
     - Take again our coolness regression:
     - If we wonder if geekiness has more impact on coolness than Pepsi consumption:

  6. 4.4 Complicated Single Tests
     - This test is similar to our one coefficient tests, but our standard error will be different
     - We can rewrite our hypotheses for clarity: H0: B1 - B2 = 0 against Ha: B1 - B2 > 0
     - We can reject the null hypothesis if the estimated difference between B1hat and B2hat is positive enough

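  7. 4.4 Complicated Single Tests
     - Our new t statistic becomes: t = (B1hat - B2hat) / se(B1hat - B2hat)
     - And our test continues as before:
       1) Calculate t
       2) Pick α and find t*
       3) Reject H0 if t>t* (the alternative is one-sided: the difference must be positive enough)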

  8. 4.4 Complicated Standard Errors
     - The standard error in this test is more complicated than before
     - If we simply subtract standard errors, we may end up with a negative value
     - this is theoretically impossible
     - se must always be positive since it estimates a standard deviation

  9. 4.4 Complicated Standard Errors
     - Using the properties of variances, we know that: Var(B1hat - B2hat) = Var(B1hat) + Var(B2hat) - 2Cov(B1hat, B2hat)
     - Where the variances are always added and twice the covariance is always subtracted
     - transferring to standard deviation, this becomes: se(B1hat - B2hat) = {[se(B1hat)]² + [se(B2hat)]² - 2s12}^(1/2)
     - Where s12 is an estimate of the covariance between the coefficients
     - s12 can either be calculated using matrix algebra or be supplied by econometrics programs
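A short sketch of this standard error and the resulting t statistic, assuming s12 has already been obtained from an econometrics program. All numbers and function names below are hypothetical.

```python
import math


def se_of_difference(se1, se2, s12):
    """se(B1hat - B2hat) = sqrt(se1^2 + se2^2 - 2*s12)."""
    variance = se1 ** 2 + se2 ** 2 - 2.0 * s12
    if variance <= 0:
        raise ValueError("inputs do not imply a positive variance")
    return math.sqrt(variance)


def t_for_difference(b1_hat, b2_hat, se1, se2, s12):
    """t statistic for H0: B1 - B2 = 0."""
    return (b1_hat - b2_hat) / se_of_difference(se1, se2, s12)


# Hypothetical estimates, standard errors and covariance estimate s12.
se_diff = se_of_difference(se1=0.3, se2=0.4, s12=0.045)
t_stat = t_for_difference(0.9, 0.5, se1=0.3, se2=0.4, s12=0.045)
print(round(se_diff, 4), round(t_stat, 4))
```

Note that subtracting the standard errors directly (0.3 - 0.4) would give a negative number, which is the problem the previous slide warns about.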

  10. 4.4 Complicated Standard Errors
     - To see how to find this standard error, take our typical regression: y = B0 + B1x1 + B2x2 + B3x3 + u
     - and consider the related equation where θ=B1-B2 or B1=θ+B2: y = B0 + θx1 + B2(x1+x2) + B3x3 + u
     - where x1 and x2 could be related concepts (e.g. sleep time and naps) and x3 could be relatively unrelated (e.g. study time)
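The substitution behind the related equation can be written out step by step. This is a sketch that writes β for the slides' B and assumes the three-regressor model implied by the mention of x1, x2 and x3:

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u \\
  &= \beta_0 + (\theta + \beta_2)\, x_1 + \beta_2 x_2 + \beta_3 x_3 + u
     && \text{substitute } \beta_1 = \theta + \beta_2 \\
  &= \beta_0 + \theta x_1 + \beta_2 (x_1 + x_2) + \beta_3 x_3 + u
     && \text{collect the } \beta_2 \text{ terms}
\end{align*}
```

Regressing y on x1, the constructed variable (x1 + x2) and x3 therefore reports the estimate of θ as the coefficient on x1, and its reported standard error is se(B1hat - B2hat).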

  11. 4.4 Complicated Standard Errors
     - By running this new regression, we can find the standard error for our hypothesis test
     - using an econometric program is easier
     - Empirically:
       • B0hat and se(B0hat) are the same for both regressions
       • B2hat and B3hat are the same for both regressions
       • Only the coefficient on x1 changes (it is now θhat rather than B1hat)
     - given this new standard error, CI’s are created as normal

  12. 4.5 Testing Multiple Restrictions
     - Thus far we have tested whether a SINGLE variable is significant, or how two different variables’ impacts compare
     - In this section we will test whether a SET of variables is significant, i.e. has a partial effect on the dependent variable
     - Even though a group of variables may be individually insignificant, they may be significant as a group due to multicollinearity

  13. 4.5 Testing Multiple Restrictions
     - Consider our general true model and an example measuring reading week utility (rwu):
     - we want to test the hypothesis that B1 and B2 equal zero at the same time, that x1 and x2 have no partial effect simultaneously: H0: B1=0, B2=0
     - in our example, we are testing that positive activities have no effect on reading week utility

  14. 4.5 Testing Multiple Restrictions
     - our null hypothesis had two EXCLUSION RESTRICTIONS
     - this set of MULTIPLE RESTRICTIONS is tested using a MULTIPLE HYPOTHESIS TEST or JOINT HYPOTHESIS TEST
     - the alternate hypothesis is unique:
     - note that we CANNOT use individual t tests to test this multiple restriction; we need to test the restrictions jointly

  15. 4.5 Testing Multiple Restrictions
     - to test joint significance, we need to use SSR and R² values obtained from two different regressions
     - we know that SSR never decreases and R² never increases when variables are dropped from the model
     - in order to conduct our test, we need to run two regressions:
       • An UNRESTRICTED MODEL with all of the variables
       • A RESTRICTED MODEL that excludes the variables in the test

  16. 4.5 Testing Multiple Restrictions
     - Given a hypothesis test with q restrictions, we have the following regressions:
     - Where 4.34 is the UNRESTRICTED MODEL giving us SSRur and 4.35 is the RESTRICTED MODEL giving us SSRr

  17. 4.5 Testing Multiple Restrictions
     - These SSR values combine to give us our F STATISTIC or TEST F STATISTIC: F = [(SSRr - SSRur)/q] / [SSRur/(n-k-1)]
     - Where q is the number of restrictions in the null hypothesis and q=numerator degrees of freedom
     - n-k-1=denominator degrees of freedom (the denominator is the unbiased estimator of σ²)
     - since SSRr≥SSRur, F is never negative
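The SSR form of the F statistic is simple to compute once both regressions have been run. The sketch below assumes the SSR values are already known; the function name and all numbers are hypothetical, not the reading week figures.

```python
def f_statistic(ssr_r, ssr_ur, q, n, k):
    """SSR form of the F statistic for q restrictions.

    n is the sample size and k the number of slope coefficients in the
    unrestricted model, so the denominator has n - k - 1 degrees of freedom.
    """
    df_denom = n - k - 1
    if q <= 0 or df_denom <= 0:
        raise ValueError("q and n - k - 1 must be positive")
    if ssr_r < ssr_ur:
        raise ValueError("restricted SSR cannot be below unrestricted SSR")
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_denom)


# Hypothetical sums of squared residuals.
f_stat = f_statistic(ssr_r=120.0, ssr_ur=100.0, q=2, n=103, k=2)
f_crit = 3.09  # approximate 5% critical value for (2, 100) df from an F table
print(f_stat, f_stat > f_crit)
```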

  18. 4.5 Testing Multiple Restrictions
     - One can think of our test F stat as measuring the relative increase in SSR from moving from the unrestricted model to the restricted model
     - a large F indicates that the excluded variables have much explanatory power
     - using H0 and our CLM assumptions, we know that F has an F distribution with q, n-k-1 degrees of freedom: F~Fq, n-k-1
     - we obtain F* from F tables and reject H0 if: F>F*

  19. 4.5 Multiple Example
     - Given our previous example of reading week utility, a restricted and unrestricted model give us:
     - Which correspond to the hypotheses:

  20. 4.5 Multiple Example
     - We use these SSR to construct a test statistic:
     - given α=0.05, F*2,569=3.00
     - since F>F*, reject H0 at the 5% significance level; positive activities have an impact on reading week utility

  21. 4.5 Multiple Notes
     - Once the degrees of freedom in F’s denominator reach about 120, the F distribution is no longer sensitive to them
     - hence the infinity entry in the F table
     - if H0 is rejected, the variables in question are JOINTLY (STATISTICALLY) SIGNIFICANT at the given alpha level
     - if H0 is not rejected, the variables in question are JOINTLY INSIGNIFICANT at the alpha level
     - due to multicollinearity, an F test can often reject H0 even when the individual t tests do not

  22. 4.5 F, t’s secret identity?
     - the F statistic can also be used to test significance of a single variable
     - in this case, q=1
     - it can be shown that F=t² in this case
     - equivalently, the square of a t variable with n-k-1 degrees of freedom has an F distribution: t²n-k-1 ~ F1, n-k-1
     - this only applies to two-sided tests
     - therefore the t statistic is more flexible since it allows for one-sided tests
     - the t statistic is always best suited for testing a single hypothesis

  23. 4.5 F tests and abuse
     - we have already seen where individually insignificant variables may be jointly significant due to multicollinearity
     - a significant variable can also prove to be jointly insignificant if grouped with enough insignificant variables
     - an insignificant variable can also prove to be jointly significant if grouped with significant variables
     - therefore t tests are much better than F tests at determining individual significance

  24. 4.5 R² and F
     - While SSR can be large, R² is bounded, often making it an easier way to calculate F: F = [(R²ur - R²r)/q] / [(1 - R²ur)/(n-k-1)]
     - Which is also called the R-SQUARED FORM OF THE F STATISTIC
     - since R²ur≥R²r, F is still never negative
     - this form is NOT valid for testing all linear restrictions (as seen later)
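When both models share the same dependent variable (and so the same total sum of squares), the R-squared form and the SSR form agree. This is a sketch with hypothetical numbers; the function names are assumptions.

```python
def f_from_ssr(ssr_r, ssr_ur, q, df_denom):
    """SSR form of the F statistic."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_denom)


def f_from_r_squared(r2_ur, r2_r, q, df_denom):
    """R-squared form of the F statistic."""
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / df_denom)


# Hypothetical values: both models share the same total sum of squares.
sst, ssr_r, ssr_ur = 200.0, 120.0, 100.0
r2_r = 1.0 - ssr_r / sst    # R-squared of the restricted model
r2_ur = 1.0 - ssr_ur / sst  # R-squared of the unrestricted model

f_ssr = f_from_ssr(ssr_r, ssr_ur, q=2, df_denom=100)
f_r2 = f_from_r_squared(r2_ur, r2_r, q=2, df_denom=100)
print(f_ssr, f_r2)  # equal up to floating-point rounding
```

The agreement relies on R² = 1 - SSR/SST with a common SST, which is why the R-squared form breaks down when the dependent variable differs between the two models.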

  25. 4.5 F and p-values
     - similar to t-tests, F tests can produce p-values, which are defined as: p-value = P(ℱ > F), where ℱ is an F random variable with q, n-k-1 degrees of freedom and F is the observed test statistic
     - the p-value is the “probability of observing a value of F at least as large as we did, given that the null hypothesis is true”
     - a small p-value is therefore evidence against H0
     - as before, reject H0 if p<α
     - p-values can give us a more complete view of significance
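For the special case of q = 2 restrictions, the upper tail of the F distribution has a closed form, so the p-value can be sketched without a statistics library. For other values of q a library routine for the F distribution would be needed; the function name and the first set of inputs below are hypothetical.

```python
def f_p_value_two_restrictions(f_stat, df_denom):
    """P(F_{2, df_denom} > f_stat), valid only for q = 2 numerator df."""
    if f_stat < 0 or df_denom <= 0:
        raise ValueError("f_stat must be >= 0 and df_denom must be > 0")
    return (1.0 + 2.0 * f_stat / df_denom) ** (-df_denom / 2.0)


alpha = 0.05
# Hypothetical test statistic with 100 denominator degrees of freedom.
p_value = f_p_value_two_restrictions(f_stat=10.0, df_denom=100)
print(p_value, "reject H0" if p_value < alpha else "do not reject H0")

# Tail probability at the tabulated critical value from slide 20.
print(f_p_value_two_restrictions(3.00, 569))
```

Evaluated at F = 3.00 with 569 denominator degrees of freedom, the tail probability comes out at roughly 0.05, in line with the tabulated critical value used in the reading week example.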

  26. 4.5 Overall significance
     - Often it is useful to test if the model is significant overall
     - the hypothesis that NONE of the explanatory variables have an effect on y is given as: H0: B1=B2=...=Bk=0
     - as before with multiple restrictions, we compare against the restricted model: y = B0 + u

  27. 4.5 Overall significance
     - Since our restricted model has no independent variables, its R² is zero and our F formula simplifies to: F = (R²/k) / [(1 - R²)/(n-k-1)]
     - Which is only valid for this special test
     - this test determines the OVERALL SIGNIFICANCE OF THE REGRESSION
     - if we fail to reject H0 in this test, we need to find other explanatory variables
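The overall significance statistic needs only the R² of the full regression, the sample size and the number of regressors. This is a minimal sketch with hypothetical inputs and an assumed function name.

```python
def overall_f(r2, n, k):
    """F statistic for H0: all k slope coefficients are zero."""
    df_denom = n - k - 1
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    if k <= 0 or df_denom <= 0:
        raise ValueError("k and n - k - 1 must be positive")
    return (r2 / k) / ((1.0 - r2) / df_denom)


# Hypothetical regression: R-squared of 0.5, n = 103 observations, k = 2.
print(overall_f(r2=0.5, n=103, k=2))
```

This is the R-squared form with the restricted R² set to zero and q = k.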

  28. 4.5 Testing General Linear Restrictions
     - Sometimes economic theory (generally using elasticity) requires us to test complicated joint restrictions, such as:
     - Which expects our model:
     - To be of the form:

  29. 4.5 Testing General Linear Restrictions
     - We rewrite this expected model to obtain a restricted model:
     - We then calculate the F statistic using the SSR formula
     - note that since the dependent variable changes between the two models, the R² form of the F statistic is not valid in this case
     - note that the number of restrictions (q) is simply equal to the number of equals signs in the null hypothesis
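To illustrate how a restricted model is obtained, here is a sketch with a purely hypothetical model and null hypothesis (these are not the ones from the original slides), writing β for the slides' B:

```latex
\begin{align*}
\text{Unrestricted: } & y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u \\
H_0: \; & \beta_1 = 1, \quad \beta_2 = 0 \qquad (q = 2) \\
\text{Impose } H_0: \; & y = \beta_0 + x_1 + \beta_3 x_3 + u \\
\text{Restricted: } & y - x_1 = \beta_0 + \beta_3 x_3 + u
\end{align*}
```

The restricted regression uses y - x1 as its dependent variable, so its total sum of squares differs from that of the unrestricted regression. That is the reason the SSR form must be used here.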

  30. 4.6 Reporting Regression Results
     - When reporting single regressions, the proper reporting method is:
     - where R², estimated coefficients, and N MUST be reported (note also the ^ and i’s)
     - either standard errors or t-values must also be reported (se is more robust for tests other than Bk=0)
     - SSR and standard error of the regression can also be reported

  31. 4.6 Reporting Regression Results
     - When multiple, related regressions are run (often to test for joint significance), the results can be expressed in table format, as seen on the next slide
     - whether a simple or table reporting method is used, the meanings and scaling of all the included variables must always be explained in a proper project, e.g.:
       price: average price, measured weekly, in American dollars
       College: dummy variable; 0 if no college education, 1 if college education

  32. 4.6 Reporting Regression Results
