Lecture 5 Regression


Homework Issues…past • Bad Objective: Conduct an experiment because I have to for this class • Commas – ugh • Do not write out symbols (‘pi’); use the symbol (‘π’) • Summarize results (don’t give me everything and then some) • Report: mean ± std. dev.

Homework Issues…present • A confidence interval should be reported as an interval, e.g., 1.2 – 1.5 • Define abbreviations when first used, e.g., CI • However, there were too many conjunctive adverbs at the start of sentences! • Equation formatting

Homework Issues…present • Do not show 27 digits of accuracy • UNITS!!! UNITS!!! INCLUDE UNITS!!! • Every table and figure should have a caption and be referred to in the text. • A section (e.g., results) should be more than just a table.

On to the lecture…

In Excel… • There are three ways to perform a linear regression: • Built-in functions SLOPE() and INTERCEPT() (no details) • Adding a trendline to a chart and showing the regression equation on the chart (simplest) • Regression analysis using the Analysis ToolPak, Data > Data Analysis > Regression (best option; more information)
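If you want to check Excel's numbers outside a spreadsheet, here is a minimal Python sketch of the ordinary least-squares formulas that SLOPE() and INTERCEPT() compute. The function name `slope_intercept` and the data are hypothetical. The data were chosen so the answer is easy to verify by hand.

```python
def slope_intercept(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x (what SLOPE()/INTERCEPT() return)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squared deviations and cross-deviations about the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical data lying exactly on y = 1 + 2x
b0, b1 = slope_intercept([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(b0, b1)  # 1.0 2.0
```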

Option 3 in Excel

Excel Results • Recall that we forced the intercept to 0 (the “Constant is Zero” option).
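Forcing the intercept to zero changes both the slope formula and its standard error. The sketch below fits y = b1·x. It assumes N − 1 residual degrees of freedom, because only one parameter is fitted. The function name and the two-point data set are hypothetical and were picked so the result can be checked by hand.

```python
import math

def fit_through_origin(x, y):
    """Least-squares slope for y = b1*x (intercept forced to 0) and its standard error."""
    sxx = sum(xi * xi for xi in x)                      # sum of x^2 (not deviations)
    b1 = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    sse = sum((yi - b1 * xi) ** 2 for xi, yi in zip(x, y))
    dof = len(x) - 1                                    # assumption: one fitted parameter
    se_b1 = math.sqrt(sse / dof / sxx)
    return b1, se_b1

# Hypothetical data: slope 7/5 = 1.4, SSE = 0.2, so se = sqrt(0.2 / 1 / 5) = 0.2
b1, se = fit_through_origin([1.0, 2.0], [1.0, 3.0])
print(b1, se)
```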

Interpretation of results… • Excel reports the standard error, not the standard deviation. They are not equal (see “Calculating std. dev.” below). • The P-value is the probability of obtaining a coefficient at least as far from zero as the one observed if the true coefficient were zero, i.e., if the apparent trend were due to random chance alone. The tiny P-value for the slope (1.91 × 10^-25) means that random chance alone is an extremely unlikely explanation for the observed slope. That is, you REALLY NEED the slope term to explain the data.
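To see where a number like 1.91 × 10^-25 comes from, the sketch below computes a two-sided p-value for a slope from t = b1/se. It uses Student's t distribution, expressed through the regularized incomplete beta function. That function is evaluated with the continued fraction described in Numerical Recipes. All helper names are hypothetical, and the inputs are the rounded slide values. The degrees of freedom are assumed to be N − 1 = 19, because the intercept was forced to zero. With rounded inputs, expect the same tiny order of magnitude as Excel but not an identical figure.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for I_x(a, b), modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def slope_p_value(t_stat, dof):
    """Two-sided P(|T| >= |t|) for Student's t: equals I_{dof/(dof+t^2)}(dof/2, 1/2)."""
    return reg_inc_beta(dof / 2.0, 0.5, dof / (dof + t_stat * t_stat))

# Rounded slide values: slope 3.22, standard error 0.0405; assumed DOF = 19
t = 3.22 / 0.0405
print(f"t = {t:.1f}, p = {slope_p_value(t, 19):.2e}")
```

The incomplete-beta route keeps precision in the far tail. The simpler approach of computing 1 − CDF would round a p-value this small to zero.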

Interpretation of results… • The 95% confidence interval for the true value of the slope (the true value of π in this example) is presented in the output table. In this example, with 95% confidence, the true value of π is somewhere between 3.138 and 3.307. • The 90% confidence interval is 3.152 to 3.292, which does not contain the true value!! Possible measurement bias: perhaps the errors are not small, random, and additive?
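A confidence interval for a coefficient is b ± t_crit · se. The sketch below applies that formula to the rounded slide values. The t_crit values are TINV(0.05, 19) = 2.093 and TINV(0.10, 19) = 1.729, which assumes 19 residual degrees of freedom (20 points with the intercept forced to 0). The helper name is hypothetical.

```python
import math

def confidence_interval(b, se, t_crit):
    """Two-sided interval b ± t_crit * se for a regression coefficient."""
    half_width = t_crit * se
    return b - half_width, b + half_width

# Rounded slide values; t_crit from TINV(alpha, 19)
ci95 = confidence_interval(3.22, 0.0405, 2.093)
ci90 = confidence_interval(3.22, 0.0405, 1.729)
for label, (lo, hi) in (("95%", ci95), ("90%", ci90)):
    print(label, f"{lo:.3f} to {hi:.3f}", "contains pi:", lo <= math.pi <= hi)
```

This gives about 3.135 to 3.305 (contains π) and 3.150 to 3.290 (does not). Both are slightly shifted from Excel's intervals because the slope is rounded to 3.22.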

Calculating std. dev. • Slope se = 0.0405 • Slope sd = 0.0405 · √20 = 0.181 • Our experimental results are: • “The experimental value of π was found to be 3.22 ± 0.18.” • “The 95% confidence interval for the true value of π ranges from 3.138 to 3.307.”
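The conversion on this slide is a single multiplication, sd = se · √N. It is sketched below following the lecture's convention, with N = 20. The output is rounded to match the precision of the reported slope. The function name is hypothetical.

```python
import math

def slope_std_dev(se, n):
    """Standard deviation from standard error, sd = se * sqrt(N) (this lecture's convention)."""
    return se * math.sqrt(n)

sd = slope_std_dev(0.0405, 20)
print(f"pi = 3.22 ± {sd:.2f}")  # report only as many digits as the data support
```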

Multivariable Regression • Fit this data to an equation of the form: $y = b_0 + b_1 x_1 + b_2 x_2$

Plot

Multivariable Regression • y is the response variable. • Order of the other columns does not matter.
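For readers curious about what the Regression tool solves when there are several x columns, here is a small sketch. It assumes the model y = b0 + b1·x1 + b2·x2, matching the three coefficients reported in the results. It solves the least-squares normal equations (XᵀX)b = Xᵀy by Gauss-Jordan elimination. The function name and the data are hypothetical. The data lie exactly on y = 1 + 2·x1 + 3·x2.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix X, with a column of 1s for b0
    # Augmented 3x4 matrix [X^T X | X^T y]
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]

# Hypothetical data on y = 1 + 2*x1 + 3*x2; the column order of x1 and x2 does not matter
print(fit_two_predictors([0, 1, 0, 1, 2], [0, 0, 1, 1, 3], [1, 3, 4, 6, 14]))
```

Normal equations are adequate for a small, well-conditioned problem like this one. Numerical libraries usually prefer a QR factorization for better stability.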

In Excel…

Results… (bug?)

Interpretation… • The coefficients ± s are: • b0 = 5.53 ± 20.45 • b1 = 2.12 ± 8.54 • b2 = 3.98 ± 0.78 • For b0 and b1, the standard deviations are much larger than the coefficients themselves. • The p-values for these coefficients are 0.42 and 0.45. • These p-values are well over 0.05, so these terms are not statistically significant at the 5% level. We can regress this data nearly as well with a model that keeps only the b2 term.

p-value? • Recall: the lower the p-value, the less likely a result at least this extreme would be if the null hypothesis were true, and so the more "significant" the result, in the sense of statistical significance. • The null hypothesis here is, simplistically, that the coefficient is zero.

t-Test on a Regression Slope • Comparison of b1 from regression with another value, β. • The t-test is a hypothesis test. Here are the hypotheses for this t-test. • H0 (null hypothesis): The slope, b1, is equal to the known value, β. • H1 (alternative hypothesis): The slope, b1, is not equal to the known value, β.

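t-Statistic • The appropriate t-statistic for this case is calculated as $t_{stat} = \frac{|b_1 - \beta|}{s_{b_1}}$ • where $s_{b_1} = \sqrt{\frac{SSE/(N-2)}{\sum_{i=1}^{N}(x_i - \bar{x})^2}}$ • Because the test is two-sided, t_stat is taken as positive here; if b1 < β, use (β − b1) in the numerator.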
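The sketch below computes the t-statistic for comparing a fitted slope b1 with a known value β. It assumes the textbook form t = |b1 − β| / s_b1, with s_b1 = √(SSE/(N − 2)) / √Σ(x_i − x̄)². The slides do not list the x data behind SSE = 85.954, so the example uses hypothetical numbers chosen to give t = 2 exactly. The function name is hypothetical.

```python
import math

def slope_t_stat(b1, beta, x, sse):
    """t-statistic for H0: slope == beta, given the x data and the fit's SSE."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)        # sum of squared deviations of x
    s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)  # standard error of the slope
    return abs(b1 - beta) / s_b1                     # absolute value: two-sided test

# Hypothetical: sxx = 5, SSE/(N-2) = 5, so s_b1 = 1 and t = |3 - 1| / 1 = 2
print(slope_t_stat(3.0, 1.0, [0.0, 1.0, 2.0, 3.0], 10.0))
```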

Critical t Value • If tstat > tcrit – Reject the null hypothesis that the slope, b1, is equal to the known value, β. • If tstat ≤ tcrit – Fail to reject the null hypothesis. • Get tcrit from a t-Table or Excel (see example). • degrees of freedom, DOF = N-2

Example • We are comparing b1 = 3.22 (first example in lecture) to β = π. • Get SSE = 85.954 from the regression output. • Calculate: tstat = 0.952 • Choose α = 0.05. • DOF = 20 − 2 = 18. • In Excel, TINV(α, DOF) returns tcrit = 2.101 for α = 0.05 and DOF = 18 (T.INV.2T() in newer versions gives the same two-tailed value). • Since tstat ≤ tcrit (0.952 < 2.101), we fail to reject the null hypothesis. • Conclusion? We cannot say with 95% confidence that b1 is not equal to β.
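Here is a minimal stand-in for TINV(α, DOF) and the decision rule, using only the standard library. It integrates the Student t density with Simpson's rule to get the two-sided p-value. It then bisects for the t where that p-value equals α. This is an illustrative sketch and not Excel's algorithm. It is accurate for moderate t but not for extreme tails. All function names are hypothetical.

```python
import math

def t_pdf(t, dof):
    """Density of Student's t distribution with `dof` degrees of freedom."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(t * t / dof))

def two_sided_p(t, dof, steps=1000):
    """P(|T| >= |t|) = 1 - 2 * integral of the density from 0 to |t| (Simpson's rule)."""
    t = abs(t)
    if t == 0.0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, dof) + t_pdf(t, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    return 1.0 - 2.0 * total * h / 3.0

def t_crit(alpha, dof, hi=50.0, iters=60):
    """Two-tailed critical value (what TINV(alpha, dof) returns), found by bisection."""
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if two_sided_p(mid, dof) > alpha:
            lo = mid  # tail area still larger than alpha: critical value lies further out
        else:
            hi = mid
    return 0.5 * (lo + hi)

def reject_null(t_stat, alpha, dof):
    """Decision rule from the slides: reject H0 when t_stat > t_crit."""
    return abs(t_stat) > t_crit(alpha, dof)

# Numbers from the example slide: t_stat = 0.952, DOF = 18
print(round(t_crit(0.05, 18), 3), reject_null(0.952, 0.05, 18))  # expected: 2.101 False
```

Calling `reject_null(0.952, 0.40, 18)` instead uses tcrit ≈ 0.862 and returns True, which matches the α = 0.40 case on the next slide.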

Example • Choose α = 0.40. • DOF = 20 − 2 = 18. • In Excel, TINV(α, DOF) returns tcrit = 0.862 for α = 0.40 and DOF = 18. • Since tcrit < tstat (0.862 < 0.952), we reject the null hypothesis. • Conclusion? We can say with 60% confidence that b1 is not equal to β. • Hmmm… that’s barely better than a coin flip.
