Lecture 3 HSPM J716
Efficiency in an estimator • Efficiency = low bias and low variance • Unbiased with high variance – not very useful • Biased with low variance – worthless
A no-variance, reliable estimator? • The 0 estimator
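A small simulation makes the efficiency comparison concrete. It is a sketch under assumed conditions, not part of any assignment: a hypothetical true line y = 2 + 0.5x with normal errors (standard deviation 1) at seven fixed X values. It applies the least squares slope and the 0 estimator to many simulated samples and reports each one's bias and variance.

```python
import random
import statistics

# Hypothetical setup, not from the lecture: true line y = 2 + 0.5x,
# normal errors with standard deviation 1, seven fixed X values.
TRUE_INTERCEPT, TRUE_SLOPE = 2.0, 0.5
XS = [1, 2, 3, 4, 5, 6, 7]


def least_squares_slope(xs, ys):
    """Least squares slope: sum of cross-deviations over sum of squared X deviations."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def zero_estimator(xs, ys):
    """Ignores the data and always reports a slope of 0."""
    return 0.0


def bias_and_variance(estimator, reps=5000, seed=1):
    """Apply an estimator to many simulated samples; return (bias, variance)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + rng.gauss(0, 1) for x in XS]
        estimates.append(estimator(XS, ys))
    return statistics.mean(estimates) - TRUE_SLOPE, statistics.pvariance(estimates)


ls_bias, ls_var = bias_and_variance(least_squares_slope)
zero_bias, zero_var = bias_and_variance(zero_estimator)
print(f"least squares:  bias {ls_bias:+.3f}, variance {ls_var:.4f}")
print(f"zero estimator: bias {zero_bias:+.3f}, variance {zero_var:.4f}")
```

The 0 estimator's variance is exactly 0, but its bias equals the entire true slope, which is why a perfectly reliable estimator is not automatically a useful one.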
Eyeball vs. Least squares for assignment 1 • http://hspm.sph.sc.edu/COURSES/J716/demos/StudentLines/StudentLines.html
Hypothesis testing – parallels among the coin toss, card trick, and assignment 1A experiments • A statistic calculated from our data • A critical value for that statistic calculated theoretically based on a hypothesis about how the data were generated • If our statistic were greater than the critical value, we would reject the hypothesis.
Hypothesis testing – all about calculating the probability of what you got and drawing an inference • With the coin toss experiment • A statistic calculated from our data • Counted how many tails came up • A critical value for that statistic calculated theoretically based on the hypothesis that the coin was fair • 5 consecutive results that are all the same • When our statistic was greater than the critical value, we rejected the hypothesis
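The coin-toss critical value can be checked with a few lines of arithmetic. This sketch assumes the statistic is a run of consecutive tails: under the fair-coin hypothesis, k tails in a row has probability (1/2)^k, and the critical value is the smallest k whose probability falls below 0.05.

```python
def prob_run_of_tails(k):
    """Probability of k tails in a row if the coin is fair."""
    return 0.5 ** k


def critical_run_length(alpha=0.05):
    """Smallest run of consecutive tails whose probability is below alpha."""
    k = 1
    while prob_run_of_tails(k) >= alpha:
        k += 1
    return k


for k in range(1, 7):
    print(f"{k} tails in a row: probability {prob_run_of_tails(k):.4f}")
print("critical value:", critical_run_length())
```

Four in a row has probability 0.0625, still above 0.05; five in a row has probability 0.03125, which is why the critical value is 5.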
Hypothesis testing – all about calculating the probability of what you got and drawing an inference • With the card experiment • A statistic calculated from our data • Counted how many times I guessed the card • A critical value for that statistic calculated theoretically based on the hypothesis that any of 52 cards could come up • Even one right guess has a probability less than 0.05, so the critical value is 1. • When our statistic was as big as the critical value, we rejected the hypothesis
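The same logic applies to the card trick. Under the hypothesis that any of 52 cards could come up, each guess is right with probability 1/52, about 0.019. This sketch assumes independent guesses and finds the smallest number of correct guesses whose probability is below 0.05; the `n` parameter (number of cards guessed) is a hypothetical generalization of the single-card case on the slide.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(at least k correct) when each of n independent guesses is right with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def critical_count(n, p=1 / 52, alpha=0.05):
    """Smallest number of correct guesses whose chance probability is below alpha."""
    for k in range(n + 1):
        if prob_at_least(k, n, p) < alpha:
            return k
    return None  # no achievable count is rare enough


print(f"one guess, one hit: probability {prob_at_least(1, 1, 1 / 52):.4f}")
print("critical value for one guess:", critical_count(1))
```

`critical_count(1)` returns 1, matching the slide. With five guesses, one hit is no longer rare enough (its probability is about 0.09), and the critical value rises to 2.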
T statistic hypothesis tests calculate a probability and draw an inference • With the assignment 1A spreadsheet • A statistic calculated from our data • The estimated coefficient divided by its standard error • A critical value for that statistic calculated theoretically based on the hypothesis that the true line’s slope is 0 • 2.571, the two-tailed 5% critical value of t with 5 degrees of freedom • When the absolute value of our statistic is greater than the critical value, we reject the hypothesis
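The spreadsheet's t statistic can be reproduced in a few lines. This sketch computes the slope, its standard error (the regression's standard error divided by the square root of the sum of squared X deviations), and their ratio, then compares the absolute value with 2.571. The seven data points are hypothetical, chosen so the degrees of freedom (n - 2 = 5) match that critical value.

```python
import math

CRITICAL_T = 2.571  # two-tailed 5% critical value of t with 5 degrees of freedom


def slope_t_statistic(xs, ys):
    """Return the least squares slope, its standard error, and t = slope / SE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # standard error of the regression
    se_slope = s / math.sqrt(sxx)
    return slope, se_slope, slope / se_slope


# Hypothetical data: seven points, so n - 2 = 5 degrees of freedom.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [3.1, 2.9, 4.2, 4.0, 5.3, 5.1, 6.0]
slope, se, t = slope_t_statistic(xs, ys)
reject = abs(t) > CRITICAL_T  # reject "true slope is 0" at the 5% level
print(f"slope {slope:.4f}, SE {se:.4f}, t {t:.2f}, reject: {reject}")
```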
Not rejecting a false hypothesis • Type II error in assignment 1A part 2
How the assumptions apply to the eyeball line and the least squares line
Assumption 1 is that there is a true line and that what you see differs from the true line because of random errors up or down for each point. • Eyeball line: It's why you drew a line through the points, instead of using a curve or a wiggly line that goes from one point to the next. • Least squares: It’s why you built a spreadsheet that calculates the slope and intercept of a line.
Assumption 2 is that the errors have an expected value of 0. • Eyeball line: it's why you try to draw the line through the middle of the points, rather than off to one side or tilting differently. • Least squares: The average of the residuals is 0. • (The residuals are your estimates of the errors.)
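A quick check of the least squares side of Assumption 2: because the intercept is ȳ minus the slope times x̄, the fitted line passes through the point of means, so the residuals sum to zero whatever the data. This sketch uses hypothetical data; the printed average is zero up to floating-point rounding.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar  # forces the line through (x_bar, y_bar)
    return intercept, slope


# Hypothetical data.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [3.1, 2.9, 4.2, 4.0, 5.3, 5.1, 6.0]
intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / len(residuals)
print(f"average residual: {mean_residual:.2e}")  # zero up to rounding error
```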
Assumption 3 is that the errors all have the same variance. • Eyeball line: It's why you don't favor one point over another in drawing the line. • Least squares: The spreadsheet’s sum and average rows are simple sums and averages. No data row gets a different weight from another.
Assumption 4 is that the errors are independent, not correlated with each other. • Eyeball line: It's why you predict for X=800 using a point on the line. • Least squares: It's why you predict for X=800 with 800*slope + intercept.
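The least squares prediction rule, as code. Because the errors are assumed independent, the error at a nearby point (say, at X=700 or X=900) tells you nothing about the error at X=800, so the prediction uses only the fitted line. The data are hypothetical and lie exactly on y = 10 + 0.2x so the result is easy to verify.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope


def predict(x, intercept, slope):
    """A point on the fitted line: x * slope + intercept."""
    return x * slope + intercept


# Hypothetical data lying exactly on y = 10 + 0.2x.
xs = [500, 600, 700, 900, 1000]
ys = [110, 130, 150, 190, 210]
intercept, slope = fit_line(xs, ys)
y_800 = predict(800, intercept, slope)
print(f"prediction at X=800: {y_800:.2f}")
```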
Confidence interval for a coefficient • Coefficient ± its standard error × t from table • 95% probability that the true coefficient is in the 95% confidence interval? • If you do a lot of studies, you can expect that, for 95% of them, the true coefficient will be in the 95% confidence interval. • If 0 is in the confidence interval, then the coefficient is not significantly different from 0 at that level (for a 95% interval, the 5% level).
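The "lots of studies" interpretation can be checked by simulation. This sketch assumes a hypothetical true line (y = 2 + 0.5x, normal errors with standard deviation 1, seven X values, so t = 2.571), simulates many studies, builds the 95% interval for the slope in each, and counts how often the interval contains the true slope. The share should come out close to 0.95.

```python
import math
import random

T_CRIT = 2.571  # t from the table: 95% two-tailed, 5 degrees of freedom (seven points)
XS = [1, 2, 3, 4, 5, 6, 7]
TRUE_INTERCEPT, TRUE_SLOPE, SIGMA = 2.0, 0.5, 1.0  # hypothetical true line


def slope_confidence_interval(xs, ys, t_crit):
    """Slope plus or minus its standard error times t."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return slope - t_crit * se, slope + t_crit * se


def coverage(reps=20000, seed=3):
    """Fraction of simulated studies whose interval contains the true slope."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        ys = [TRUE_INTERCEPT + TRUE_SLOPE * x + rng.gauss(0, SIGMA) for x in XS]
        low, high = slope_confidence_interval(XS, ys, T_CRIT)
        if low <= TRUE_SLOPE <= high:
            hits += 1
    return hits / reps


print(f"share of intervals containing the true slope: {coverage():.3f}")
```

Within any single study, the coefficient is significant at the 5% level when its interval excludes 0, that is, when `low > 0 or high < 0`.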
Assignment 2 • All regression results are the same • Graphs differ • Need reason to use or doubt least squares prediction • The reason is in the form of rejecting one or more of the assumptions
Durbin-Watson statistic • Serial correlation • Finds significant pattern for clinic 2
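The Durbin-Watson statistic itself is simple to compute from residuals listed in time order (the order matters; this sketch assumes the clinic data are sorted by date). Values near 2 suggest no serial correlation, values well below 2 suggest positive serial correlation, and values well above 2 suggest negative serial correlation. The sketch computes only the statistic; judging significance still needs the lower and upper bounds from a Durbin-Watson table.

```python
def durbin_watson(residuals):
    """Sum of squared differences between successive residuals,
    divided by the sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e**2 for e in residuals)
    return num / den


# Hypothetical residual patterns, in time order.
print(durbin_watson([1.0, 2.0, 1.0, -1.0, -2.0, -1.0]))  # runs of same sign: DW below 2
print(durbin_watson([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]))  # alternating signs: DW above 2
```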
Confidence interval for prediction • The hyperbolic outline
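The hyperbolic outline comes from the (x0 - x̄)² term in the interval's standard error: the band is narrowest at the mean of X and widens the farther x0 is from it. This sketch assumes the interval is for a single new observation at x0 (dropping the leading 1 inside the square root gives the narrower band for the average Y at x0); s is the regression's standard error and Sxx the sum of squared X deviations. The inputs are hypothetical.

```python
import math


def prediction_half_width(x0, xs, s, t_crit):
    """Half-width of the interval for a new observation at x0:
    t * s * sqrt(1 + 1/n + (x0 - x_bar)^2 / Sxx)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)


# Hypothetical inputs: seven X values, regression standard error 1, t = 2.571.
xs = [1, 2, 3, 4, 5, 6, 7]
widths = {x0: prediction_half_width(x0, xs, s=1.0, t_crit=2.571) for x0 in range(-2, 11)}
for x0, w in widths.items():
    print(f"x0 = {x0:3d}: fitted value +/- {w:.3f}")
```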
Formal outlier test? • Use confidence interval of prediction • With and without the suspect point? • How do you predict when your data have an outlier? • Totally ignoring it seems wrong. • So does letting it sway your results too much. • Investigate and use judgment.
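One way to make the slide's suggestion concrete, offered as a sketch rather than a standard named test: set the suspect point aside, fit the line to the remaining points, build the prediction interval at the suspect's X, and see whether the suspect's Y falls outside it. With seven points, six remain after setting one aside, so the critical t has 4 degrees of freedom (2.776). The data are hypothetical, and a point flagged this way still calls for investigation and judgment rather than automatic deletion.

```python
import math

T_4DF = 2.776  # two-tailed 5% critical value of t with 4 degrees of freedom


def outside_prediction_interval(xs, ys, suspect, t_crit):
    """Refit without point `suspect`; report whether that point falls
    outside the prediction interval built from the remaining points."""
    kx = [x for i, x in enumerate(xs) if i != suspect]
    ky = [y for i, y in enumerate(ys) if i != suspect]
    n = len(kx)
    x_bar = sum(kx) / n
    y_bar = sum(ky) / n
    sxx = sum((x - x_bar) ** 2 for x in kx)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(kx, ky)) / sxx
    intercept = y_bar - slope * x_bar
    s = math.sqrt(sum((y - intercept - slope * x) ** 2 for x, y in zip(kx, ky)) / (n - 2))
    x0, y0 = xs[suspect], ys[suspect]
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return abs(y0 - (intercept + slope * x0)) > half_width


# Hypothetical data: the fourth point (index 3) is the suspect.
xs = [1, 2, 3, 4, 5, 6, 7]
ys_typical = [3.1, 2.9, 4.2, 4.0, 5.3, 5.1, 6.0]
ys_suspect = [3.1, 2.9, 4.2, 9.0, 5.3, 5.1, 6.0]
print(outside_prediction_interval(xs, ys_typical, 3, T_4DF))
print(outside_prediction_interval(xs, ys_suspect, 3, T_4DF))
```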
Multiple regression • 3 or more dimensions • 2 or more X variables • Y = α + βX + γZ + error • Y = α + β1X1 + β2X2 + … + βpXp + error
Fitting a plane in 3D space • Linear assumption • Now a flat plane • The effect of a change in X1 on Y is the same at all levels of X1 and X2 and any other X variables. • Residuals are vertical distances from the plane to the data points floating in space.
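Fitting the plane is still least squares, just with more columns. This sketch uses NumPy's `numpy.linalg.lstsq`, an alternative to the course's spreadsheet and not a reproduction of it, on hypothetical data that lie exactly on the plane Y = 1 + 2X1 + 3X2, so the recovered coefficients are easy to check. With real data the residuals would not be zero.

```python
import numpy as np

# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2 (no error),
# so least squares should recover alpha = 1, beta1 = 2, beta2 = 3.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 1 + 2 * x1 + 3 * x2

# Design matrix: a column of ones for the intercept, then one column per X.
A = np.column_stack([np.ones_like(x1), x1, x2])
coef, _, rank, _ = np.linalg.lstsq(A, y, rcond=None)
alpha, beta1, beta2 = coef

# Residuals: vertical distances from the fitted plane to each point.
residuals = y - A @ coef
print(f"alpha {alpha:.3f}, beta1 {beta1:.3f}, beta2 {beta2:.3f}")
print("largest residual:", np.abs(residuals).max())
```

Because the fitted surface is a flat plane, raising X1 by 1 changes the fitted Y by beta1 whatever the current levels of X1 and X2.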
Multiple regression • Separating effects • Example from literature • Example from handout
β interpretation • in Y = α + βX + γZ + error • β is the effect on Y of changing X by 1, holding Z constant. • When X is one unit bigger than you would predict from Z, we expect Y to be β bigger than you would predict from Z. • Those predictions are based on linear relationships.
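The interpretation above can be checked directly. This sketch uses hypothetical data built exactly from Y = 1 + 2X + 3Z, with X correlated with Z. It predicts X from Z and Y from Z with simple regressions, keeps what each prediction misses, and regresses the Y leftovers on the X leftovers. The slope of that last regression equals β (a result known as the Frisch-Waugh-Lovell theorem). The helper names are illustrative.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for a simple regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope


def surprise(target, predictor):
    """How far each value of `target` is from what a line on `predictor` predicts."""
    a, b = fit_line(predictor, target)
    return [t - (a + b * p) for t, p in zip(target, predictor)]


# Hypothetical data: Y = 1 + 2X + 3Z exactly, and X tends to rise with Z.
z = [1, 2, 3, 4, 5, 6]
x = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * xi + 3 * zi for xi, zi in zip(x, z)]

x_surprise = surprise(x, z)  # X beyond what Z predicts
y_surprise = surprise(y, z)  # Y beyond what Z predicts
_, beta = fit_line(x_surprise, y_surprise)  # recovers beta = 2
_, naive = fit_line(x, y)  # ignoring Z mixes Z's effect into the slope
print(f"beta holding Z constant: {beta:.3f}; simple slope of Y on X: {naive:.3f}")
```

The simple slope of Y on X comes out well above 2 because X and Z move together; separating the effects requires holding Z constant.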
LS • Spreadsheet as front end • Word processor as back end • Interpretation of results • Coefficients • Standard errors • T-statistics • P-values • Prediction