Stat 31, Section 1, Last Time

beckyc

Presentation Transcript


  1. Stat 31, Section 1, Last Time • 2-way tables • Testing for Independence • Chi-Square distance between data & model • Chi-Square Distribution • Gives P-values (CHIDIST) • Simpson’s Paradox: • Lurking variables can reverse comparisons • Recall Linear Regression • Fit a line to a scatterplot

  2. Recall Linear Regression Idea: Fit a line to data in a scatterplot Recall Class Example 14 https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg14.xls • To learn about “basic structure” • To “model data” • To provide “prediction of new values”

  3. Inference for Regression Goal: develop • Hypothesis Tests and Confidence Int’s • For slope & intercept parameters, a & b • Also study prediction

  4. Inference for Regression Idea: do statistical inference on: • Slope a • Intercept b Model: $Y_i = a X_i + b + e_i$ Assume: the $e_i$ are random, independent and $N(0, \sigma)$

  5. Inference for Regression Viewpoint: Data generated as: the line $y = ax + b$, with each $Y_i$ chosen from a normal distribution centered on the line above $X_i$ Note: a and b are “parameters”

  6. Inference for Regression Parameters $a$ and $b$ determine the underlying model (distribution) Estimate with the Least Squares Estimates: $\hat{a}$ and $\hat{b}$ (Using SLOPE and INTERCEPT in Excel, based on data)
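
The least squares estimates can also be computed outside Excel. The following is a minimal Python sketch, assuming the line is written y = ax + b as on the slides. The function name `least_squares` and the toy data are hypothetical illustrations and are not taken from the class examples. It uses the standard least squares formulas, which give the same fit that SLOPE and INTERCEPT report.

```python
from statistics import mean


def least_squares(xs, ys):
    """Return (slope, intercept) of the least squares line y = a*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a_hat = sxy / sxx               # slope estimate
    b_hat = y_bar - a_hat * x_bar   # intercept estimate (line passes through the means)
    return a_hat, b_hat


# Hypothetical toy data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a_hat, b_hat = least_squares(xs, ys)
print(f"slope = {a_hat:.3f}, intercept = {b_hat:.3f}")
```

The intercept is computed from the slope so that the fitted line passes through the point of means, which is a property of every least squares line.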

  7. Inference for Regression Distributions of $\hat{a}$ and $\hat{b}$? Under the above assumptions, the sampling distributions are: $\hat{a} \sim N(a, SD_{\hat{a}})$ and $\hat{b} \sim N(b, SD_{\hat{b}})$ • Centerpoints are right (unbiased) • Spreads are more complicated

  8. Inference for Regression Formula for SD of $\hat{a}$: $SD_{\hat{a}} = \frac{\sigma}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$ • Big (small) for $\sigma$ big (small, resp.) • Accurate data → Accurate est. of slope • Small for x’s more spread out • Data more spread → More accurate • Small for more data • More data → More accuracy

  9. Inference for Regression Formula for SD of $\hat{b}$: $SD_{\hat{b}} = \sigma \sqrt{\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$ • Big (small) for $\sigma$ big (small, resp.) • Accurate data → Accurate est. of intercept • Smaller for $\bar{X}$ closer to 0 • Centered data → More accurate intercept • Smaller for more data • More data → More accuracy

  10. Inference for Regression One more detail: Need to estimate $\sigma$ using data For this use: $s = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat{a} X_i - \hat{b})^2}$ • Similar to earlier sd estimate, • Except variation is about fit line • $n - 2$ is similar to $n - 1$ from before
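
As a rough illustration of these spread formulas, here is a Python sketch that estimates σ by the residual standard deviation s (dividing by n − 2) and plugs it into the standard formulas for the spread of the slope and intercept estimates. The function name `regression_standard_errors` and the toy data are assumptions made for this example and do not come from the class spreadsheets.

```python
from math import sqrt
from statistics import mean


def regression_standard_errors(xs, ys):
    """Return (s, se_slope, se_intercept) for the least squares line y = a*x + b."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a_hat = sxy / sxx
    b_hat = y_bar - a_hat * x_bar
    # Residuals: variation of the data about the fit line
    residuals = [y - (a_hat * x + b_hat) for x, y in zip(xs, ys)]
    # Estimate of sigma, with n - 2 in place of the usual n - 1
    s = sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    se_slope = s / sqrt(sxx)
    se_intercept = s * sqrt(1 / n + x_bar ** 2 / sxx)
    return s, se_slope, se_intercept


# Hypothetical toy data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

s, se_slope, se_intercept = regression_standard_errors(xs, ys)
print(f"s = {s:.4f}, SE(slope) = {se_slope:.4f}, SE(intercept) = {se_intercept:.4f}")
```

Both standard errors scale with s, and both shrink as the x values become more spread out, since the sum of squared deviations of x sits in the denominator.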

  11. Inference for Regression Now for Probability Distributions, Since we are estimating $\sigma$ by $s$, Use TDIST and TINV With degrees of freedom = $n - 2$
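
For readers working outside Excel, the sketch below approximates the two-tailed t probability (the role played by TDIST with 2 tails) and the two-tailed critical value (the role played by TINV) using only the Python standard library. It integrates the t density with Simpson's rule and inverts it by bisection. The function names, the search bound of 200, and the summary numbers at the bottom are all assumptions chosen for illustration. A statistics library would normally be used instead.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2)
    return exp(log_c) / sqrt(df * pi) * (1 + t * t / df) ** (-(df + 1) / 2)


def t_two_tail(t, df):
    """Two-tailed probability P(|T| > |t|), comparable to TDIST(|t|, df, 2)."""
    t = abs(t)
    if t == 0:
        return 1.0
    steps = 2 * max(100, int(50 * t))  # even step count, step width at most 0.01
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # Simpson estimate of P(-t < T < t)
    return 1 - central


def t_critical(alpha, df):
    """Two-tailed critical value t* with P(|T| > t*) = alpha, comparable to TINV(alpha, df)."""
    lo, hi = 0.0, 200.0  # assumed search range, wide enough for common cases
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_two_tail(mid, df) > alpha:
            lo = mid  # tail area still too large, so t* is further out
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical summary numbers, for illustration only
n = 5
a_hat, se_slope = 1.99, 0.0597
df = n - 2

t_star = t_critical(0.05, df)
ci_slope = (a_hat - t_star * se_slope, a_hat + t_star * se_slope)
t_stat = a_hat / se_slope            # test statistic for H0: slope a = 0
p_value = t_two_tail(t_stat, df)
print(f"t* = {t_star:.3f}, 95% CI for slope = ({ci_slope[0]:.3f}, {ci_slope[1]:.3f})")
```

The degrees of freedom are n − 2 because two parameters, slope and intercept, are estimated before the residual spread is computed.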

  12. Inference for Regression Convenient Packaged Analysis in Excel: Tools → Data Analysis → Regression Illustrate application using: Class Example 27, Old Text Problem 8.6 (now 10.12)

  13. Inference for Regression Class Example 27, Old Text Problem 8.6 (now 10.12) Utility companies estimate energy used by their customers. Natural gas consumption depends on temperature. A study recorded average daily gas consumption y (100s of cubic feet) for each month. The explanatory variable x is the average number of heating degree days for that month. Data for October through June are:

  14. Inference for Regression Data for October through June are:

  15. Inference for Regression Class Example 27, Old Text Problem 8.6 (now 10.12) Excel Analysis: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg27.xls Good News: Lots of things done automatically Bad News: Different language, so need careful interpretation

  16. Inference for Regression Excel Glossary:

  17. Inference for Regression Excel Glossary:

  18. Inference for Regression Excel Glossary:

  19. Inference for Regression Excel Glossary:

  20. Inference for Regression Some useful variations: Class Example 28, Old Text Problems 10.8 - 10.10 (now 10.13 – 10.15) Excel Analysis: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls

  21. Inference for Regression Class Example 28, (now 10.13 – 10.15) Old 10.8: Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point’s position if the tower were straight, and its actual position, in tenths of a meter, in excess of 2.9 meters. The data are:

  22. Inference for Regression Class Example 28, (now 10.13 – 10.15) Old 10.8: The data are:

  23. Inference for Regression Class Example 28, (now 10.13 – 10.15) Old 10.8: • Plot the data. Does the trend in lean over time appear to be linear? • What is the equation of the least squares fit line? • Give a 95% confidence interval for the average rate of change of the lean. https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls

  24. Inference for Regression HW: 10.3 b,c 10.5

  25. And Now for Something Completely Different Etymology of: “And now for something completely different” Anybody heard of this before?

  26. And Now for Something Completely Different What is “etymology”? Google responses to: define: etymology • The history of words; the study of the history of words. csmp.ucop.edu/crlp/resources/glossary.html • The history of a word shown by tracing its development from another language. www.animalinfo.org/glosse.htm

  27. And Now for Something Completely Different What is “etymology”? • Etymology is derived from the Greek word ἔτυμον (etymon) meaning "a sense" and λόγος (logos) meaning "word." Etymology is the study of the original meaning and development of a word tracing its meaning back as far as possible. www.two-age.org/glossary.htm

  28. And Now for Something Completely Different Google response to: define: and now for something completely different And Now For Something Completely Different is a film spinoff from the television comedy series Monty Python's Flying Circus. The title originated as a catchphrase in the TV show. Many Python fans feel that it excellently describes the nonsensical, non sequitur feel of the program. en.wikipedia.org/wiki/And_Now_For_Something_Completely_Different

  29. And Now for Something Completely Different Google Search for: “And now for something completely different” Gives more than 100 results…. A perhaps interesting one: http://www.mwscomp.com/mpfc/mpfc.html

  30. And Now for Something Completely Different Google Search for: “Stat 31 and now for something completely different” Gives: [PPT] Slide 1 File Format: Microsoft Powerpoint 97 - View as HTML ... But what is missing? And now for something completely different… Review Ideas on State Lotteries, from our study of Expected Value ... https://www.unc.edu/~marron/UNCstat31-2005/Stat31-05-03-31.ppt - Similar pages

  31. Prediction in Regression Idea: Given data $(X_1, Y_1), \ldots, (X_n, Y_n)$ Can find the Least Squares Fit Line, and do inference for the parameters. Given a new X value, say $X_0$, what will the new Y value be?

  32. Prediction in Regression Dealing with variation in prediction: Under the model: $Y_0 = a X_0 + b + e_0$ A sensible guess about $Y_0$, based on the given $X_0$, is: $\hat{Y}_0 = \hat{a} X_0 + \hat{b}$ (point on the fit line above $X_0$)

  33. Prediction in Regression What about variation about this guess? Natural Approach: present an interval (as done with Confidence Intervals) Careful: Two Notions of this: • Confidence Interval for mean of $Y_0$ • Prediction Interval for value of $Y_0$

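  34. Prediction in Regression • Confidence Interval for mean of $Y_0$: Use: $\hat{Y}_0 \pm t^* \, SE_{\hat{\mu}}$ where: $SE_{\hat{\mu}} = s \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$ and where $t^*$ is the t critical value with $n - 2$ degrees of freedom (from TINV)

  35. Prediction in Regression Interpretation of $SE_{\hat{\mu}}$: • Smaller for $X_0$ closer to $\bar{X}$ • But never 0 • Smaller for $X_i$ more spread out • Larger for larger $s$

  36. Prediction in Regression • Prediction Interval for value of $Y_0$ Use: $\hat{Y}_0 \pm t^* \, SE_{\hat{Y}}$ where: $SE_{\hat{Y}} = s \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$ And again $t^*$ is the t critical value with $n - 2$ degrees of freedom

  37. Prediction in Regression Interpretation of $SE_{\hat{Y}}$: • Similar remarks to above … • Additional “1 + ” accounts for added variation in $Y_0$ compared to the mean of $Y_0$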
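
The difference between the two intervals can be seen numerically. The Python sketch below computes, at a new x value, both the confidence interval for the mean response and the wider prediction interval for a single new response, using the standard simple regression formulas. The function name `intervals_at`, the toy data, and the critical value 3.182 (the two-tailed 95% t value for 3 degrees of freedom, as a t table or TINV would give) are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean


def intervals_at(xs, ys, x0, t_star):
    """Return (fitted value, CI for the mean, prediction interval) at x0."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a_hat = sxy / sxx
    b_hat = y_bar - a_hat * x_bar
    # Residual standard deviation about the fit line, with n - 2 degrees of freedom
    s = sqrt(sum((y - (a_hat * x + b_hat)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    y0_hat = a_hat * x0 + b_hat  # point on the fit line above x0
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = s * sqrt(leverage)        # for the mean response at x0
    se_pred = s * sqrt(1 + leverage)    # extra "1 +" for a single new observation
    ci = (y0_hat - t_star * se_mean, y0_hat + t_star * se_mean)
    pred = (y0_hat - t_star * se_pred, y0_hat + t_star * se_pred)
    return y0_hat, ci, pred


# Hypothetical toy data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
t_star = 3.182  # assumed: two-tailed 95% critical value for n - 2 = 3 df

y_mid, ci_mid, pred_mid = intervals_at(xs, ys, 3, t_star)  # at the mean of the x's
y_far, ci_far, pred_far = intervals_at(xs, ys, 6, t_star)  # beyond the observed x's
print("at x0 = 3:", ci_mid, pred_mid)
print("at x0 = 6:", ci_far, pred_far)
```

Comparing the two calls shows both intervals widening as the new x value moves away from the mean of the observed x values, while the prediction interval stays wider than the confidence interval at every point.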

  38. Prediction in Regression Revisit Class Example 28, (now 10.13 – 10.15) Old 10.8: Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point’s position if the tower were straight, and its actual position, in tenths of a meter, in excess of 2.9 meters. The data are listed above…

  39. Prediction in Regression Class Example 28, (now 10.13 – 10.15) Old 10.9: • Plot the data. Does the trend in lean over time appear to be linear? • What is the equation of the least squares fit line? • Give a 95% confidence interval for the average rate of change of the lean. https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls

  40. Prediction in Regression HW: 10.20 and add part: (f) Calculate a 95% Confidence Interval for the mean oxygen uptake of individuals having heart rate 96, and heart rate 115.

  41. Additional Issues in Regression Robustness Outliers via Java Applet HW on outliers
