
Time Series Analysis – Chapter 4 Hypothesis Testing

thai

Presentation Transcript


  1. Hypothesis testing is basic to the scientific method, and statistical theory gives us a way of conducting tests of scientific hypotheses. Scientific philosophy today rests on the idea of falsification: for a theory to be a valid scientific theory, it must be possible, at least in principle, to make observations that would prove the theory false. For example, here is a simple theory: all swans are white.

  2. "All swans are white." This is a valid scientific theory because there is a way to falsify it: observing a single black swan would make the theory fall. For more on the history and philosophy of falsification, I suggest reading Karl Popper.

  3. Besides the idea of falsification, we must keep in mind the other basic tenet of the scientific method: all evidence that supports or falsifies a theory must be empirically based and reproducible.

  4. All evidence that supports a theory or falsifies it must be empirically based and reproducible. In other words, data! Just holding a belief (no matter how firm) that a theory is true or false is not a justifiable stance. This chapter gives us the most basic statistical tools for taking data or empirical evidence and using it to substantiate or nullify (show to be false) a hypothesis.

  5. All evidence that supports a theory or falsifies it must be empirically based and reproducible. I have just used the word hypothesis, and this chapter is concerned with hypothesis testing, not theory testing. This is because theories are composed of many hypotheses; usually a theory is not directly supported or attacked, but one or more of its supporting hypotheses are scrutinized.

  6. Discrimination or Not Activity
      Null hypothesis Ho: no discrimination
      Alternative hypothesis Ha: discrimination
      How do we choose which hypothesis to support?

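  8. The p-value
      The p-value is the probability, computed assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one actually observed. The smaller the p-value, the stronger the evidence against the null hypothesis and in favor of the alternative. A typical cutoff (significance level) is 5%, or 0.05: if the p-value is below 0.05, we reject the null hypothesis.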
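To make the definition concrete, here is a minimal sketch in Python (standard library only) that turns a t statistic and its degrees of freedom into a two-sided p-value and applies the 5% rule. It assumes the test statistic follows a t distribution when Ho is true, as in the regression tests later in this chapter. The helper names are my own, and the continued-fraction evaluation of the regularized incomplete beta function is a standard numerical method chosen for this sketch, not a description of how Minitab computes its P column.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges fastest; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t_stat, df):
    """P(|T| >= |t_stat|) when T follows a t distribution with df degrees of freedom."""
    x = df / (df + t_stat * t_stat)
    return reg_inc_beta(df / 2.0, 0.5, x)


def reject_null(p_value, alpha=0.05):
    """Reject Ho when the p-value falls below the significance level alpha."""
    return p_value < alpha


if __name__ == "__main__":
    # Slope on age in the 401K output of this chapter: t = 6.506 with 1531 degrees of freedom.
    p = t_two_sided_p(6.506, 1531)
    print(f"p = {p:.2e}, reject Ho: {reject_null(p)}")
```

With a t statistic of 0 the p-value is 1 (no evidence at all against Ho), and it shrinks as |t| grows.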

  9. Fourth Graders Feet Data Set
      Predictor variable (x): Childs Age
      Response variable (y): Foot Length
      Model: $y = \beta_0 + \beta_1 x + u$
      Test: Ho: $\beta_1 = 0$ -> x has no effect on y
            H1: $\beta_1 \neq 0$ -> x has an effect on y

  10. Fourth Graders Feet Data Set – Minitab Output
      Predictor variable (x): Childs Age
      Response variable (y): Foot Length
      The regression equation is
      Foot Length = 18.1 + 0.0358 Childs Age
      Predictor       Coef   SE Coef     T      P
      Constant      18.138     3.753  4.83  0.000
      Childs Age   0.03575   0.02922  1.22  0.229
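The Coef, SE Coef and T columns can be reproduced by hand. The sketch below, in plain Python, fits y = b0 + b1 x by least squares, estimates the error variance with n - k - 1 = n - 2 degrees of freedom, and forms the t ratios T = Coef / SE Coef. The five data points are made up for illustration; they are not the Fourth Graders Feet data, which are not reproduced here.

```python
import math


def simple_ols(x, y):
    """Fit y = b0 + b1*x by least squares; return coefficients, standard errors and t ratios."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                    # Coef for the predictor
    b0 = y_bar - b1 * x_bar           # Coef for the constant
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)        # n - k - 1 with k = 1
    se_b1 = math.sqrt(sigma2 / sxx)                          # SE Coef for the predictor
    se_b0 = math.sqrt(sigma2 * (1 / n + x_bar ** 2 / sxx))  # SE Coef for the constant
    return {"b0": b0, "se_b0": se_b0, "t_b0": b0 / se_b0,
            "b1": b1, "se_b1": se_b1, "t_b1": b1 / se_b1}


if __name__ == "__main__":
    # Hypothetical data, not the Fourth Graders Feet data set.
    x_values = [1, 2, 3, 4, 5]
    y_values = [2.1, 3.9, 6.2, 7.8, 10.0]
    fit = simple_ols(x_values, y_values)
    print({name: round(value, 4) for name, value in fit.items()})
    # T is just Coef / SE Coef, e.g. the Childs Age row of the Minitab output.
    print(round(0.03575 / 0.02922, 2))  # 1.22
```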

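  11. Fourth Graders Feet Data Set – Minitab Output
      Predictor variable (x): Childs Age
      Response variable (y): Foot Length
      Ho: $\beta_1 = 0$ -> x has no effect on y
      H1: $\beta_1 \neq 0$ -> x has an effect on y
      P-value = 0.229 > 0.05, so we fail to reject Ho: given the model we used, there is no STATISTICALLY significant evidence that a child's age affects foot length. (Failing to reject Ho does not prove that age has no effect.)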

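  12. Fourth Graders Feet Data Set – One-Tailed Alternative
      Predictor variable (x): age
      Response variable (y): foot length
      Ho: $\beta_1 = 0$ -> x has no effect on y
      H1: $\beta_1 > 0$ -> x has a positive effect on y
      P-value = 0.229/2 = 0.1145. Halving the two-sided p-value is valid here because the estimated slope (0.03575) is positive, the direction stated in H1. Since 0.1145 > 0.05, there is no statistically significant evidence of a positive effect of x on y given the model we used.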
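The halving rule only works when the estimate lands on the side named in H1. The minimal sketch below makes that explicit; the function name and the ">"/"<" convention are my own. It takes the two-sided p-value Minitab prints, the estimated coefficient, and the direction of H1, and it relies on the t distribution being symmetric about zero.

```python
def one_tailed_p(two_sided_p, estimate, direction):
    """One-sided p-value from a two-sided one, for a symmetric (t) test statistic.

    direction is ">" for H1: beta > 0 and "<" for H1: beta < 0.
    """
    if direction not in (">", "<"):
        raise ValueError("direction must be '>' or '<'")
    agrees_with_h1 = estimate > 0 if direction == ">" else estimate < 0
    if agrees_with_h1:
        return two_sided_p / 2        # estimate points the way H1 says
    return 1 - two_sided_p / 2        # estimate points the opposite way


if __name__ == "__main__":
    # Childs Age: estimate 0.03575, two-sided p = 0.229, H1: beta_1 > 0.
    print(one_tailed_p(0.229, 0.03575, ">"))
```

If the estimate had the opposite sign from H1, the one-sided p-value would be large (above 0.5), and halving would badly overstate the evidence.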

  13. Statistical vs. Practical Significance
      401K data set
      Predictor variables: x1: mrate, x2: age, x3: totemp
      Response variable (y): prate

  14. Statistical vs. Practical Significance
      The regression equation is: prate = 80.3 + 5.44 mrate + 0.269 age - 0.000130 totemp
      Predictor          Coef     SE Coef       T      P
      Constant        80.2943      0.7777  103.25  0.000
      mrate            5.4414      0.5244   10.38  0.000
      age             0.26941     0.04515    5.97  0.000
      totemp      -0.00012978  0.00003672   -3.53  0.000

  15. Statistical vs. Practical Significance
      The regression equation is: prate = 80.3 + 5.44 mrate + 0.269 age - 0.000130 totemp
      All predictors are statistically significant at the 5% level (every p-value rounds to 0.000).

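  16. Statistical vs. Practical Significance
      The regression equation is: prate = 80.3 + 5.44 mrate + 0.269 age - 0.000130 totemp
      If the total number of employees increases by ten thousand, the predicted participation rate changes by -0.000130 × 10,000 = -1.3 percentage points (other predictors held constant). totemp is statistically significant, yet even a very large change in firm size moves prate only slightly: statistical significance does not imply practical importance.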
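The same arithmetic works for any coefficient: in a model that is linear in x, the predicted change in y is the coefficient times the change in x, holding the other predictors fixed. A small sketch using the coefficients from the 401K output; the 10,000-employee change comes from the slide, and I treat prate as measured in percentage points (an assumption based on the intercept of about 80).

```python
# Coefficients copied from the 401K regression output (prate assumed to be in percent).
COEFS = {"mrate": 5.4414, "age": 0.26941, "totemp": -0.00012978}


def predicted_change(predictor, delta_x):
    """Change in predicted prate when one predictor changes by delta_x, others held fixed."""
    return COEFS[predictor] * delta_x


if __name__ == "__main__":
    # Ten thousand more employees: a huge change in firm size...
    print(round(predicted_change("totemp", 10_000), 2))  # -1.3
    # ...compared with a one-unit increase in the match rate.
    print(round(predicted_change("mrate", 1), 2))  # 5.44
```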

  17. Boeing 747 Jet What does an empty Boeing 747 jet weigh?

  18. Boeing 747 Jet What does an empty Boeing 747 jet weigh? My point estimate: 250,000 lbs Answer: 358,000 lbs I am wrong! A point estimate is almost always wrong!

  19. Boeing 747 Jet What does an empty Boeing 747 jet weigh? My confidence interval estimate: (0, ∞) Answer: 358,000 lbs I am right! But, my interval is not useful!

  20. Point and Interval Estimates – Minitab will compute both
      401K data set
      Predictor variable x1: age
      Response variable (y): prate
      In Minitab, go to Regression -> General Regression and select the model variables, then click on Results and make sure the "Display confidence intervals" box is checked.

  21. Point and Interval Estimates – Minitab will compute both
      401K data set
      Predictor variable x1: age
      Response variable (y): prate
      Regression Equation
      prate = 83.4231 + 0.298893 age
      Coefficients
      Term          Coef    SE Coef        T      P  95% CI
      Constant   83.4231   0.737593  113.102  0.000  (81.9763, 84.8699)
      age         0.2989   0.045938    6.506  0.000  ( 0.2088,  0.3890)

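  22. Confidence Intervals
      General structure of the confidence intervals in this chapter:
      $\hat{\beta}_j \pm c \cdot \mathrm{se}(\hat{\beta}_j)$, that is, point estimate ± (critical value c) × (standard error)
      The standard error is an estimate of the standard deviation of the point estimator.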

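  23. Confidence Intervals
      General structure: point estimate ± (critical value) × (standard error)
      Term          Coef    SE Coef        T      P  95% CI
      Constant   83.4231   0.737593  113.102  0.000  (81.9763, 84.8699)
      age         0.2989   0.045938    6.506  0.000  ( 0.2088,  0.3890)
      0.2989 + 1.960 × 0.045938 ≈ 0.3890
      0.2989 - 1.960 × 0.045938 ≈ 0.2088
      (Minitab uses the unrounded coefficient and the exact critical value, so the last digit of a hand calculation may differ.)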
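A minimal sketch of the interval formula in Python. As an assumption stated here, I use 1.9615 as the critical value: that is the 97.5th percentile of the t distribution with 1531 degrees of freedom, which the slide rounds to 1.960. With this value, the sketch reproduces both intervals in the Minitab output to four decimals.

```python
def conf_int(coef, se, crit):
    """Confidence interval: point estimate minus/plus critical value times standard error."""
    half_width = crit * se
    return coef - half_width, coef + half_width


if __name__ == "__main__":
    crit = 1.9615  # t critical value for a 95% interval with 1531 degrees of freedom
    for term, coef, se in [("Constant", 83.4231, 0.737593), ("age", 0.298893, 0.045938)]:
        lo, hi = conf_int(coef, se, crit)
        print(f"{term}: ({lo:.4f}, {hi:.4f})")
```

Plugging in 1.960 instead changes only the fourth decimal, which is why the slide's hand calculation is written with ≈.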

  24. Confidence Intervals
      General structure: point estimate ± (critical value) × (standard error)
      0.2989 + 1.960 × 0.045938 ≈ 0.3890
      0.2989 - 1.960 × 0.045938 ≈ 0.2088
      Where does 1.960 come from?

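  25. Confidence Intervals
      Where does 1.960 come from? It is a critical value from the t distribution with n - k - 1 degrees of freedom, where n is the number of observations and k is the number of predictors in the model. For our model, n = 1533 and k = 1, so there are 1531 degrees of freedom. We also need to know the confidence level of the interval (typically 95%); a 95% interval uses the 97.5th percentile of that t distribution. Then, use a t table! With 1531 degrees of freedom the value is about 1.9615, which rounds to 1.960: with this many degrees of freedom, the t critical value is essentially the standard normal value.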
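If no t table is at hand, a standard asymptotic (Cornish-Fisher type) expansion of the t quantile around the standard normal quantile z gives the critical value directly. This sketch uses only Python's standard library (`statistics.NormalDist`). It is accurate for large degrees of freedom such as 1531 and still close at 30, but it is an approximation and should not replace exact tables or software when the degrees of freedom are small.

```python
from statistics import NormalDist


def t_critical(conf_level, df):
    """Approximate two-sided t critical value for a conf_level confidence interval.

    Adds the first three correction terms of the asymptotic expansion of the
    t quantile in powers of 1/df to the standard normal quantile z.
    """
    z = NormalDist().inv_cdf(0.5 + conf_level / 2)  # 0.975 quantile for a 95% interval
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / df + g2 / df**2 + g3 / df**3


if __name__ == "__main__":
    n, k = 1533, 1  # observations and predictors in the age model
    print(round(t_critical(0.95, n - k - 1), 4))  # about 1.9615
    print(round(NormalDist().inv_cdf(0.975), 4))  # standard normal value, about 1.96
```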

  26. Testing Linear Combinations of Parameters
      TWOYEAR data set
      Predictor variables
        x1: jc – # years attending a two-year college
        x2: univ – # years attending a four-year college
        x3: exper – months in workforce
      Response variable (y): log(wage)

  27. Testing Linear Combinations of Parameters
      Predictor variables
        x1: jc – # years attending a two-year college
        x2: univ – # years attending a four-year college
        x3: exper – months in workforce
      Response variable (y): log(wage)
      Ho: "one year at a two-year college is worth the same as one year at a four-year college"

  28. Testing Linear Combinations of Parameters
      Predictor variables
        x1: jc – # years attending a two-year college
        x2: univ – # years attending a four-year college
        x3: exper – months in workforce
      Response variable (y): log(wage)
      H1: "one year at a two-year college is worth less than one year at a four-year college"

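  29. Testing Linear Combinations of Parameters
      Predictor variables
        x1: jc – # years attending a two-year college
        x2: univ – # years attending a four-year college
        x3: exper – months in workforce
      Response variable (y): log(wage)
      Model: $\log(wage) = \beta_0 + \beta_1\,jc + \beta_2\,univ + \beta_3\,exper + u$
      Ho: $\beta_1 = \beta_2$ -> Ho: $\beta_1 - \beta_2 = 0$
      H1: $\beta_1 < \beta_2$ -> H1: $\beta_1 - \beta_2 < 0$

  30. Testing Linear Combinations of Parameters
      Let $\theta_1 = \beta_1 - \beta_2$ -> Then $\beta_1 = \theta_1 + \beta_2$, and substituting into the model gives
      $\log(wage) = \beta_0 + \theta_1\,jc + \beta_2\,(jc + univ) + \beta_3\,exper + u$

  31. Testing Linear Combinations of Parameters
      Now, after we create the new variable $jc + univ$ (total years of college), we can conduct the following test:
      Ho: $\beta_1 = \beta_2$ -> Ho: $\beta_1 - \beta_2 = 0$ -> Ho: $\theta_1 = 0$
      H1: $\beta_1 < \beta_2$ -> H1: $\beta_1 - \beta_2 < 0$ -> H1: $\theta_1 < 0$

  32. Ho: $\theta_1 = 0$ vs. H1: $\theta_1 < 0$
      lwage = 1.47233 - 0.0101795 jc + 0.0768763 jc+univ + 0.00494422 exper
      Coefficients
      Term          Coef    SE Coef        T      P  95% CI
      Constant   1.47233  0.0210602  69.9102  0.000  ( 1.43104, 1.51361)
      jc        -0.01018  0.0069359  -1.4677  0.142  (-0.02378, 0.00342)
      jc+univ    0.07688  0.0023087  33.2981  0.000  ( 0.07235, 0.08140)
      exper      0.00494  0.0001575  31.3972  0.000  ( 0.00464, 0.00525)
      The coefficient on jc in this regression is the estimate of $\theta_1$.

  33. Ho: $\theta_1 = 0$ vs. H1: $\theta_1 < 0$
      This is a one-tailed test, and the estimate of $\theta_1$ (-0.01018) is negative, the direction stated in H1, so the p-value is half the two-sided value: 0.142/2 = 0.071.
      Conclusion: since 0.071 > 0.05, we fail to reject the null hypothesis at the 5% level (we would reject it at the 10% level). The data do not provide strong evidence against "one year at a junior college is worth the same as one year at a university," although failing to reject Ho does not prove that the two are worth the same.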
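The reparameterization trick can be checked numerically. The sketch below uses made-up, noise-free data built from hypothetical coefficients (0.06 for jc and 0.08 for univ, not estimates from TWOYEAR) and a tiny pure-Python least-squares solver of my own. Regressing on jc, jc + univ and exper returns the difference of the two college coefficients, -0.02, as the coefficient on jc, which is why the jc row of the regression output serves as the estimate and test of that difference.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    Uses Gaussian elimination with partial pivoting: fine for a handful of
    predictors, not a substitute for a numerically robust library routine.
    """
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    b = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        b[i] = (aug[i][p] - sum(aug[i][j] * b[j] for j in range(i + 1, p))) / aug[i][i]
    return b


# Hypothetical data (not TWOYEAR): years at two-year and four-year colleges, months worked.
jc = [0, 1, 2, 0, 3, 1, 0, 2]
univ = [4, 0, 2, 3, 0, 4, 1, 1]
exper = [10, 5, 8, 12, 6, 9, 15, 7]
# Noise-free outcome from hypothetical coefficients 1.5, 0.06, 0.08, 0.005.
lwage = [1.5 + 0.06 * j + 0.08 * u + 0.005 * e for j, u, e in zip(jc, univ, exper)]

# Original parameterization: regress on jc, univ, exper.
X_orig = [[1.0, j, u, e] for j, u, e in zip(jc, univ, exper)]
# Reparameterized model: regress on jc, (jc + univ), exper.
X_new = [[1.0, j, j + u, e] for j, u, e in zip(jc, univ, exper)]

b_orig = ols(X_orig, lwage)
b_new = ols(X_new, lwage)
print("beta1 - beta2 from the original fit:", round(b_orig[1] - b_orig[2], 6))
print("coefficient on jc after reparameterizing:", round(b_new[1], 6))
```

With real, noisy data the two numbers still agree; the advantage of the reparameterized regression is that its output also reports the standard error, t statistic and p-value for the difference.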

  34. Testing Linear Combinations of Parameters Use the TWOYEAR data set to test the following hypothesis: Ho: H1:

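  35. The ANOVA F Test
      For a multiple regression model
      $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$
      the ANOVA F test is:
      Ho: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
      H1: at least one $\beta_j$ is not equal to 0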
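The slide states the hypotheses; the statistic itself can be written in terms of the model's R-squared as F = (R²/k) / ((1 - R²)/(n - k - 1)), which is algebraically the same as the ANOVA-table ratio of the regression mean square to the error mean square. Under Ho (and the MLR assumptions) it follows an F distribution with k and n - k - 1 degrees of freedom. A minimal sketch; the numbers in the usage line are illustrative only, not taken from any data set in this chapter.

```python
def overall_f(r_squared, n, k):
    """ANOVA F statistic for Ho: beta_1 = ... = beta_k = 0.

    r_squared: R-squared of the fitted model; n: observations; k: predictors.
    """
    if not 0 <= r_squared < 1:
        raise ValueError("R-squared must be in [0, 1)")
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))


if __name__ == "__main__":
    # Illustrative values only: R-squared of 0.5, 12 observations, 1 predictor.
    print(overall_f(0.5, 12, 1))
```

With a single predictor (k = 1), this F statistic equals the square of the slope's t statistic, so the F test and the two-sided t test give the same p-value.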

  36. Multiple Linear Regression Assumptions
      MLR Assumption 1: The model is linear in the parameters: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$

  37. Multiple Linear Regression Assumptions MLR Assumption 2: Data comes from a random sample

  38. Multiple Linear Regression Assumptions
      MLR Assumption 3: None of the independent (predictor) variables is constant, and there is no exact linear relationship among them (no perfect collinearity). If there were, Minitab would not run the regression analysis as specified.

  39. Multiple Linear Regression Assumptions
      MLR Assumption 4: The error, u, has an expected value of zero given any values of the explanatory variables: $E(u \mid x_1, \ldots, x_k) = 0$.

  40. Multiple Linear Regression Assumptions
      MLR Assumption 5: The error, u, has the same variance given any values of the explanatory variables: $\mathrm{Var}(u \mid x_1, \ldots, x_k) = \sigma^2$. This is the assumption of homoskedasticity.

  41. Multiple Linear Regression Assumptions
      MLR Assumption 6: The error, u, is independent of the explanatory (predictor) variables and is normally distributed with mean zero and variance $\sigma^2$: $u \sim \mathrm{Normal}(0, \sigma^2)$.
