Stat 31, Section 1, Last Time • 2-way tables • Testing for Independence • Chi-Square distance between data & model • Chi-Square Distribution • Gives P-values (CHIDIST) • Simpson’s Paradox: • Lurking variables can reverse comparisons • Recall Linear Regression • Fit a line to a scatterplot
Recall Linear Regression Idea: Fit a line to data in a scatterplot Recall Class Example 14 https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg14.xls • To learn about “basic structure” • To “model data” • To provide “prediction of new values”
Inference for Regression Goal: develop • Hypothesis Tests and Confidence Int’s • For slope & intercept parameters, a & b • Also study prediction
Inference for Regression Idea: do statistical inference on: • Slope a • Intercept b Model: $Y_i = a X_i + b + e_i$ Assume: $e_i$ are random, independent and $N(0, \sigma_e)$
Inference for Regression Viewpoint: Data generated as: y = ax + b, with each $Y_i$ chosen from the distribution centered on the line above $X_i$ Note: a and b are “parameters”
Inference for Regression Parameters $a$ and $b$ determine the underlying model (distribution) Estimate with the Least Squares Estimates: $\hat{a}$ and $\hat{b}$ (Using SLOPE and INTERCEPT in Excel, based on data)
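The least squares estimates can also be computed outside Excel. The following is a minimal Python sketch, assuming the usual least squares formulas (slope = sum of cross-deviations over sum of squared x-deviations). The function name `least_squares_line` and the data are hypothetical and chosen only for illustration; they are not the class data.

```python
from statistics import mean


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = a*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a_hat, b_hat = least_squares_line(xs, ys)
print(a_hat, b_hat)
```

These two numbers play the same role as the Excel calls SLOPE and INTERCEPT applied to the same columns.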
Inference for Regression Distributions of $\hat{a}$ and $\hat{b}$? Under the above assumptions, the sampling distributions are: $\hat{a} \sim N(a, \sigma_{\hat{a}})$ and $\hat{b} \sim N(b, \sigma_{\hat{b}})$ • Centerpoints are right (unbiased) • Spreads are more complicated
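Inference for Regression Formula for SD of $\hat{a}$: $\sigma_{\hat{a}} = \frac{\sigma_e}{\sqrt{n-1}\, s_x}$, where $s_x$ is the sample SD of the x’s • Big (small) for $\sigma_e$ big (small, resp.) • Accurate data ⇒ Accurate est. of slope • Small for x’s more spread out • Data more spread ⇒ More accurate • Small for more data • More data ⇒ More accuracy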
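Inference for Regression Formula for SD of $\hat{b}$: $\sigma_{\hat{b}} = \sigma_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{(n-1)\, s_x^2}}$ • Big (small) for $\sigma_e$ big (small, resp.) • Accurate data ⇒ Accurate est. of intercept • Smaller for $\bar{x} \approx 0$ • Centered data ⇒ More accurate intercept • Smaller for more data • More data ⇒ More accuracy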
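Inference for Regression One more detail: Need to estimate $\sigma_e$ using data For this use: $s = \sqrt{\frac{1}{n-2} \sum_{i=1}^{n} (Y_i - \hat{a} X_i - \hat{b})^2}$ • Similar to earlier sd estimate • Except variation is about fit line • $n - 2$ is similar to $n - 1$ from before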
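As a rough illustration of how the spread of the residuals about the fit line feeds into the standard errors of the slope and intercept, here is a Python sketch. It assumes the standard textbook formulas (residual SD with divisor n − 2, slope SE = s / sqrt(Sxx), intercept SE = s · sqrt(1/n + x̄² / Sxx), where Sxx is the sum of squared x-deviations). The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def regression_standard_errors(xs, ys):
    """Return (s, se_slope, se_intercept) for the least squares line."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    # Variation is measured about the fit line, with n - 2 in the divisor
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))
    se_slope = s / sqrt(sxx)
    se_intercept = s * sqrt(1 / n + x_bar ** 2 / sxx)
    return s, se_slope, se_intercept


# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
s, se_slope, se_intercept = regression_standard_errors(xs, ys)
print(s, se_slope, se_intercept)
```

Spreading the x values further apart increases Sxx and so shrinks both standard errors, which matches the bullet points on the preceding slides.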
Inference for Regression Now for Probability Distributions, Since are estimating $\sigma_e$ by $s$, Use TDIST and TINV With degrees of freedom = $n - 2$
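A small Python sketch of how the t distribution enters: the t statistic for testing "slope = 0" and a confidence interval for the slope. The critical value `t_star` is passed in by hand; here it is the standard table value 3.182 for 95% confidence with 3 degrees of freedom, which is what TINV(0.05, 3) returns in Excel. The slope and standard error are hypothetical numbers (n = 5 points assumed), and the function name is made up for this example.

```python
def slope_inference(slope, se_slope, t_star):
    """Return (t statistic for H0: slope = 0, confidence interval for slope)."""
    t_stat = slope / se_slope
    margin = t_star * se_slope
    return t_stat, (slope - margin, slope + margin)


# Hypothetical fit from n = 5 points, so degrees of freedom = n - 2 = 3
t_stat, ci = slope_inference(slope=0.6, se_slope=0.2828, t_star=3.182)
print(t_stat, ci)
```

The P-value for the t statistic would come from TDIST with the same degrees of freedom; it is not computed in this sketch.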
Inference for Regression Convenient Packaged Analysis in Excel: Tools → Data Analysis → Regression Illustrate application using: Class Example 27, Old Text Problem 8.6 (now 10.12)
Inference for Regression Class Example 27, Old Text Problem 8.6 (now 10.12) Utility companies estimate energy used by their customers. Natural gas consumption depends on temperature. A study recorded average daily gas consumption y (100s of cubic feet) for each month. The explanatory variable x is the average number of heating degree days for that month. Data for October through June are:
Inference for Regression Class Example 27, Old Text Problem 8.6 (now 10.12) Excel Analysis: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg27.xls Good News: Lots of things done automatically Bad News: Different language, so need careful interpretation
Inference for Regression Excel Glossary:
Inference for Regression Some useful variations: Class Example 28, Old Text Problems 10.8 - 10.10 (now 10.13 – 10.15) Excel Analysis: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls
Inference for Regression Class Example 28, (now 10.13 – 10.15) Old 10.8: Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point’s position if the tower were straight, and its actual position, in tenths of a meter, in excess of 2.9 meters. The data are:
Inference for Regression Class Example 28, (now 10.13 – 10.15) Old 10.8: • Plot the data. Does the trend in lean over time appear to be linear? • What is the equation of the least squares fit line? • Give a 95% confidence interval for the average rate of change of the lean. https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls
Inference for Regression HW: 10.3 b,c 10.5
And Now for Something Completely Different Etymology of: “And now for something completely different” Anybody heard of this before?
And Now for Something Completely Different What is “etymology”? Google responses to: define: etymology • The history of words; the study of the history of words. csmp.ucop.edu/crlp/resources/glossary.html • The history of a word shown by tracing its development from another language. www.animalinfo.org/glosse.htm
And Now for Something Completely Different What is “etymology”? • Etymology is derived from the Greek word ἔτυμον (etymon) meaning "a sense" and λόγος (logos) meaning "word." Etymology is the study of the original meaning and development of a word tracing its meaning back as far as possible. www.two-age.org/glossary.htm
And Now for Something Completely Different Google response to: define: and now for something completely different And Now For Something Completely Different is a film spinoff from the television comedy series Monty Python's Flying Circus. The title originated as a catchphrase in the TV show. Many Python fans feel that it excellently describes the nonsensical, non sequitur feel of the program. en.wikipedia.org/wiki/And_Now_For_Something_Completely_Different
And Now for Something Completely Different Google Search for: “And now for something completely different” Gives more than 100 results…. A perhaps interesting one: http://www.mwscomp.com/mpfc/mpfc.html
And Now for Something Completely Different Google Search for: “Stat 31 and now for something completely different” Gives: [PPT] Slide 1 File Format: Microsoft Powerpoint 97 - View as HTML ... But what is missing? And now for something completely different… Review Ideas on State Lotteries, from our study of Expected Value ... https://www.unc.edu/~marron/UNCstat31-2005/Stat31-05-03-31.ppt - Similar pages
Prediction in Regression Idea: Given data $(X_1, Y_1), \ldots, (X_n, Y_n)$, can find the Least Squares Fit Line, and do inference for the parameters. Given a new X value, say $X_0$, what will the new Y value $Y_0$ be?
Prediction in Regression Dealing with variation in prediction: Under the model: $Y_0 = a X_0 + b + e$ A sensible guess about $Y_0$, based on the given $X_0$, is: $\hat{Y}_0 = \hat{a} X_0 + \hat{b}$ (point on the fit line above $X_0$)
Prediction in Regression What about variation about this guess? Natural Approach: present an interval (as done with Confidence Intervals) Careful: Two Notions of this: • Confidence Interval for mean of $Y_0$ • Prediction Interval for value of $Y_0$
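Prediction in Regression • Confidence Interval for mean of $Y_0$: Use: $\hat{Y}_0 \pm t^* SE_{\hat{\mu}}$ where: $SE_{\hat{\mu}} = s \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$ and where $t^* = \mathrm{TINV}(0.05, n - 2)$ (for a 95% interval)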
Prediction in Regression Interpretation of $SE_{\hat{\mu}}$: • Smaller for $X_0$ closer to $\bar{X}$ • But never 0 • Smaller for $X_i$ more spread out • Larger for larger $s$
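Prediction in Regression • Prediction Interval for value of $Y_0$ Use: $\hat{Y}_0 \pm t^* SE_{\hat{Y}}$ where: $SE_{\hat{Y}} = s \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}$ And again $t^* = \mathrm{TINV}(0.05, n - 2)$ (for a 95% interval)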
Prediction in Regression Interpretation of $SE_{\hat{Y}}$: • Similar remarks to above … • Additional “1 + ” accounts for added variation in $Y_0$ compared to the mean of $Y_0$
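The difference between the two intervals can be seen numerically. Below is a minimal Python sketch, assuming the standard formulas: both intervals are centered at the point on the fit line above the new x value, and the prediction interval has the extra "1 +" under the square root. The function name, the data, and the new x value 4.0 are hypothetical; `t_star = 3.182` is the table value for 95% confidence with n − 2 = 3 degrees of freedom, i.e. TINV(0.05, 3).

```python
from math import sqrt
from statistics import mean


def regression_intervals(xs, ys, x0, t_star):
    """Return (fitted value at x0, CI for the mean of Y, prediction interval)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))

    y0_hat = slope * x0 + intercept  # point on the fit line above x0
    core = 1 / n + (x0 - x_bar) ** 2 / sxx
    se_mean = s * sqrt(core)       # for the mean of Y at x0
    se_pred = s * sqrt(1 + core)   # extra "1 +" for a single new Y
    ci_mean = (y0_hat - t_star * se_mean, y0_hat + t_star * se_mean)
    pred_int = (y0_hat - t_star * se_pred, y0_hat + t_star * se_pred)
    return y0_hat, ci_mean, pred_int


# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
y0_hat, ci_mean, pred_int = regression_intervals(xs, ys, x0=4.0, t_star=3.182)
print(y0_hat, ci_mean, pred_int)
```

The prediction interval always comes out wider than the confidence interval for the mean, since it must cover the scatter of an individual observation and not only the uncertainty in the fit line.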
Prediction in Regression Revisit Class Example 28, (now 10.13 – 10.15) Old 10.8: Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point’s position if the tower were straight, and its actual position, in tenths of a meter, in excess of 2.9 meters. The data are listed above…
Prediction in Regression Class Example 28, (now 10.13 – 10.15) Old 10.9: • Plot the data. Does the trend in lean over time appear to be linear? • What is the equation of the least squares fit line? • Give a 95% confidence interval for the average rate of change of the lean. https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls
Prediction in Regression HW: 10.20 and add part: (f) Calculate a 95% Confidence Interval for the mean oxygen uptake of individuals having heart rate 96, and heart rate 115.
Additional Issues in Regression Robustness Outliers via Java Applet HW on outliers