MGMT 276: Statistical Inference in Management, Spring 2013. Welcome. Instructor: Suzanne Delaney, Ph.D. Office: 405 “N” McClelland Hall. Phone: 621-2045. Email: delaney@u.arizona.edu. Office hours: 2:00 to 3:30 Mondays and Fridays, and by appointment.
Readings for next exam (Exam 4: April 25th)
Lind
• Chapter 13: Linear Regression and Correlation
• Chapter 14: Multiple Regression
• Chapter 15: Chi-Square
Plous
• Chapter 17: Social Influences
• Chapter 18: Group Judgments and Decisions

Exam 4: Optional Times for Final
• Two options for completing Exam 4
  • Thursday (4/25/13)
  • Tuesday (4/30/13)
• Must sign up to take Exam 4 on Tuesday (4/23)
• Only need to take one exam; these are two optional times

Use this as your study guide (over the next couple of lectures, 4/16/13)
• Logic of hypothesis testing with correlations
• Interpreting correlations and scatterplots
• Simple regression
• Using correlation for predictions
• Regression uses the predictor variable (independent) to make predictions about the predicted variable (dependent)
• Coefficient of correlation is the name for “r”
• Coefficient of determination is the name for “r²” (remember it is never negative, so it carries no direction information)
• Standard error of the estimate is our measure of the variability of the dots around the regression line (the average deviation of each data point from the regression line, like a standard deviation)
• Coefficient of regression is “b” for each variable (like a slope)

Five steps to hypothesis testing
Step 1: Identify the research problem (hypothesis)
• Describe the null and alternative hypotheses
• For correlation, the null is that r = 0 (no relationship)
Step 2: Decision rule
• Alpha level? (α = .05 or .01)
• Critical statistic (e.g. critical r) value from table?
• Degrees of freedom: df = n - 2 (number of pairs minus 2)
Step 3: Calculations
Step 4: Make decision whether or not to reject the null hypothesis
• If the observed r is bigger than the critical r (comparing absolute values), then reject the null
Step 5: Conclusion: tie findings back in to the research problem

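Worked example: hours worked and weekly pay
[Scatterplot: Hours Worked (X axis) by Weekly Pay (Y axis)]

  X      Y      XY      X²       Y²
 10     93     930     100     8,649
 15    171   2,565     225    29,241
 20    204   4,080     400    41,616
 20    156   3,120     400    24,336
 35    261   9,135   1,225    68,121
Σ 100  885  19,830   2,350   171,963

$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} = \frac{(5)(19{,}830) - (100)(885)}{\sqrt{(5)(2{,}350) - (100)^2}\,\sqrt{(5)(171{,}963) - (885)^2}} = \frac{10{,}650}{(41.83)(276.75)} = .9199711 $$

Observed r = .92, df = 3, critical r = .878
Reject the null? Yes. Significant? Yes.
The relationship between the hours worked and weekly pay is a strong positive correlation. This correlation is significant, r(3) = 0.92; p < 0.05
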
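Worked example: number of operators working and wait time
[Scatterplot: Number of operators working (X axis) by Wait Time (Y axis)]

  X      Y      XY     X²       Y²
  4    385   1,540    16    148,225
  5    335   1,675    25    112,225
  6    383   2,298    36    146,689
  7    344   2,408    49    118,336
  8    288   2,304    64     82,944
Σ 30  1,735  10,225   190   608,419

$$ r = \frac{(5)(10{,}225) - (30)(1{,}735)}{\sqrt{(5)(190) - (30)^2}\,\sqrt{(5)(608{,}419) - (1{,}735)^2}} = \frac{-925}{(7.071)(178.52)} = -.73278 $$

Observed r = -.73, df = 3, critical r = .878
Reject the null? No. Significant? No.
The relationship between wait time and number of operators working is negative and strong. This correlation is not significant, r(3) = -0.73; p > 0.05 (n.s.)
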
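The two worked examples above can be checked with a short script. This is a minimal sketch, assuming only the column totals and the critical r of .878 (df = 3, α = .05) shown on the slides; the function names `pearson_r_from_sums` and `is_significant` are illustrative choices, not part of the course materials.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Computational formula for Pearson's r using column totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def is_significant(r, critical_r):
    """Decision rule: reject the null when |observed r| exceeds critical r."""
    return abs(r) > critical_r


# Critical value taken from the slides (df = n - 2 = 3, alpha = .05).
CRITICAL_R = 0.878

# Hours worked (X) and weekly pay (Y): totals from the first example.
r_pay = pearson_r_from_sums(5, 100, 885, 19_830, 2_350, 171_963)

# Operators working (X) and wait time (Y): totals from the second example.
r_wait = pearson_r_from_sums(5, 30, 1_735, 10_225, 190, 608_419)

print(round(r_pay, 2), is_significant(r_pay, CRITICAL_R))    # 0.92 True
print(round(r_wait, 2), is_significant(r_wait, CRITICAL_R))  # -0.73 False
```

Comparing the absolute value of r with the critical value is what lets the same rule handle both the positive and the negative correlation.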
Homework due Thursday (April 18th)
On class website: please print and complete homework worksheet #21, Hypothesis testing with ANOVA - Original Research
Hand in your homework
Please click in: My last name starts with a letter somewhere between
A. A - D
B. E - L
C. M - R
D. S - Z
Please double check: all cell phones and other electronic devices are turned off and stowed away

Correlation: Independent and dependent variables
• When used for prediction, we refer to the predicted variable as the dependent variable and the predictor variable as the independent variable
[Figure: two example plots, each asking “What are we predicting?” and marking the dependent variable and the independent variable]

Correlation: What are we predicting?
Positive correlation: as values on one variable go up, so do values for the other variable
Negative correlation: as values on one variable go up, the values for the other variable go down
Example: temperature by time spent outside in Tucson in summer (negative correlation)
[Scatterplot: Temperature by Time outside]
Example: height by average driving speed (zero correlation)
[Scatterplot: Height by Average speed]
Example: amount Healthtex spends per month on advertising by sales in the month (positive correlation)
[Scatterplot: Amount of sales by Amount spent on advertising]
Example: yearly income by expenses per year (positive correlation)
[Scatterplot: Yearly income by Expenses per year]

Correlation: What do we need to define a line?
• Y-intercept = “a” (also “b0”): where the line crosses the Y axis
• Slope = “b” (also “b1”): how steep the line is
• The predicted variable goes on the “Y” axis and is called the dependent variable
• The predictor variable goes on the “X” axis and is called the independent variable
[Scatterplot: Yearly income (Y axis) by Expenses per year (X axis), with a prediction line: if you spend this much, you probably make this much]

Assumptions Underlying Linear Regression
• For each value of X, there is a group of Y values
• These Y values are normally distributed
• The means of these normal distributions of Y values all lie on the straight line of regression
• The standard deviations of these normal distributions are equal

Correlation: What do we need to define a line?
• Y-intercept = “a”: where the line crosses the Y axis
• Slope = “b”: how steep the line is
[Figure: Yearly income by Expenses per year, with candidate lines; in one the Y-intercept is good but the slope is wrong, in another the Y-intercept is wrong but the slope is good]
[Figure: Brushing teeth by Number of cavities, with candidate lines; in one the Y-intercept is wrong but the slope is good, in another the Y-intercept is good but the slope is wrong]

Correlation: let’s do another one
Does brushing your teeth correlate with fewer cavities?
Step 1: Draw scatterplot
Step 2: Data table
Step 3: Estimate r and prediction line
Step 4: Find r

X (number of cavities)   Y (number of times per day teeth are brushed)
1                        5
3                        4
2                        3
3                        2
5                        1
[Scatterplot: Number of cavities (X axis, 0 to 5) by Number of times per day teeth are brushed (Y axis, 0 to 5)]

Find r: r = -0.85
Draw a regression line and write the regression equation
r = -0.85, b = -0.91 (slope), a = 5.5 (intercept)

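The slope, intercept, and r above can be reproduced from the five data pairs with the usual least-squares formulas. This is a sketch under the assumption that X is the number of cavities and Y is the number of times per day teeth are brushed, as in the data table; `fit_line` is an illustrative helper name, not something from the slides.

```python
import math

# Data pairs from the teeth-brushing example (X = cavities, Y = brushing).
cavities = [1, 3, 2, 3, 5]
brushing = [5, 4, 3, 2, 1]


def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = fit_line(cavities, brushing)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r = {r:.2f}")
# slope = -0.91, intercept = 5.55, r = -0.85
```

The unrounded intercept is about 5.55; the slides round it to 5.5, which is one source of the small rounding differences that show up later in the predicted values.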
Interpreting the regression equation
Prediction line: Y’ = a + b1X1
Y’ = 842 + (-37.5)X1, that is, Sales = 842 - 37.5 Price (Y-intercept = 842, slope = -37.5)
a) Interpret the slope of the fitted regression line. Notice that in this case it is negative: a slope of -37.5 suggests that raising “Price” by 1 unit will reduce “Sales” by 37.5 units.
b) If “Price” = 20, what is the prediction for “Sales”?
Sales = 842 - (37.5)(20) = 842 - 750 = 92
If Price = 20, Sales will probably be about 92 units: the point (20, 92) on the prediction line.
[Plot: Sales (Y axis) by Price of product (X axis)]

Interpreting the regression equation
Prediction line: Y’ = a + b1X1
Y’ = 2,277 + (.0307)X1, that is, NetIncome = 2,277 + .0307 Revenue (Y-intercept = 2,277, slope = .0307)
a) Interpret the slope. Notice that in this case it is positive: a slope of .0307 suggests that when “Revenue” is raised by 1 dollar, NetIncome will rise by about 3 cents.
b) If “Revenue” = 1,000, what is the prediction for “NetIncome”?
NetIncome = 2,277 + (.0307)(1,000) = 2,277 + 30.7 = 2,307.7
If Revenue = 1,000, NetIncome will be about 2,307.70: the point (1,000, 2,307.7) on the prediction line.
[Plot: NetIncome (Y axis) by Revenue (X axis)]

Other problems
Prediction line: Y’ = a + b1X1
Cost = 15.22 + 19.96 Persons (Y-intercept = 15.22, slope = 19.96)
If “Persons” = 4, what is the prediction for “Cost”?
Cost = 15.22 + 19.96 (4) = 15.22 + 79.84 = 95.06
The expected cost of dinner for two couples (4 people) would be $95.06.
If “Persons” = 1, what is the prediction for “Cost”?
Cost = 15.22 + 19.96 (1) = 15.22 + 19.96 = 35.18
[Plot: Cost (Y axis) by People (X axis)]

Other problems
Prediction line: Y’ = a + b1X1
Rent = 150 + 1.05 SqFt (Y-intercept = 150, slope = 1.05)
If “SqFt” = 800, what is the prediction for “Rent”?
Rent = 150 + 1.05 (800) = 150 + 840 = 990
The expected rent on an 800 square foot apartment is $990.
If “SqFt” = 2500, what is the prediction for “Rent”?
Rent = 150 + 1.05 (2500) = 150 + 2,625 = 2,775
[Plot: Rent (Y axis) by Square Feet (X axis)]

Other problems
Prediction line: Y’ = a + b1X1
Frequency of teeth brushing = 5.5 + (-.91) Cavities (Y-intercept = 5.5, slope = -.91)
If “Cavities” = 3, what is the prediction for “Frequency of teeth brushing”?
Frequency of teeth brushing = 5.5 + (-.91) (3) = 5.5 + (-2.73) = 2.77
The expected frequency of teeth brushing for someone with three cavities is about 2.77 times per day: the point (3.0, 2.77) on the prediction line.

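All of the prediction problems above follow the same pattern: plug X into Y’ = a + bX. A minimal sketch, using the intercepts and slopes given on the slides; the function name `predict` is an illustrative choice.

```python
def predict(intercept, slope, x):
    """Predicted Y' for a given X on the line Y' = intercept + slope * X."""
    return intercept + slope * x


# (label, intercept a, slope b, value of X), all taken from the slides.
problems = [
    ("Sales at Price = 20", 842, -37.5, 20),
    ("NetIncome at Revenue = 1,000", 2277, 0.0307, 1000),
    ("Cost at Persons = 4", 15.22, 19.96, 4),
    ("Cost at Persons = 1", 15.22, 19.96, 1),
    ("Rent at SqFt = 800", 150, 1.05, 800),
    ("Rent at SqFt = 2500", 150, 1.05, 2500),
    ("Brushing at Cavities = 3", 5.5, -0.91, 3),
]

for label, a, b, x in problems:
    # Two decimals are enough to compare with the hand calculations.
    print(f"{label}: {predict(a, b, x):.2f}")
```

Rounding to two decimals when printing hides tiny floating-point differences, so the output can be compared directly with the hand-worked answers (92, 2,307.7, 95.06, 35.18, 990, 2,775, and 2.77).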
Draw a regression line and regression equation
Prediction line: Y’ = b1X1 + b0
Y’ = (-.91)X1 + 5.5
b0 = 5.5 (intercept), b1 = -0.91 (slope), r = -0.85

Correlation: let’s predict how often they brushed their teeth
Find the prediction line: Y’ = b1X + b0, so Y’ = (-0.91)X + 5.5
Plot the line by predicting Y’ from X:
• Pick an X. Let’s try X of 1: Y’ = (-0.91)(1) + 5.5 = 4.59 (plot 1, 4.59)
• Pick another X. Let’s try X of 5: Y’ = (-0.91)(5) + 5.5 = 0.95 (plot 5, 0.95)
[Scatterplot: Number of cavities (X axis) by Number of times per day teeth are brushed (Y axis), with the prediction line]

r = -0.85, b1 = -0.91, b0 = 5.5
Y’ = b1X + b0 = (-0.91)X + 5.5

X   Y   Y’
1   5   (-0.91)(1) + 5.5 = 4.59
3   4   (-0.91)(3) + 5.5 = 2.77
2   3   (-0.91)(2) + 5.5 = 3.68
3   2   (-0.91)(3) + 5.5 = 2.77
5   1   (-0.91)(5) + 5.5 = 0.95
[Scatterplot: Number of cavities (X axis) by Number of times per day teeth are brushed (Y axis), with the predicted values plotted]

Correlation: Evaluating the prediction line
Prediction line: Y’ = b1X1 + b0, so Y’ = (-.91)X1 + 5.5
Does the prediction line perfectly predict the Ys from the Xs? No, let’s see. How much “error” is there, exactly?
Residuals: the green lines show how much “error” there is in our prediction line, that is, how much we are wrong in our predictions
[Scatterplot: Number of cavities (X axis) by Number of times per day teeth are brushed (Y axis), with green residual lines from each dot to the prediction line]

Correlation
The more closely the dots approximate a straight line (the less spread out they are), the stronger the relationship is.
Perfect correlation = +1.00 or -1.00
• One variable perfectly predicts the other
• No variability in the scatterplot
• The dots approximate a straight line
• Any residuals?

A note about curvilinear relationships and patterns of the residuals
How well does the prediction line predict the Ys from the Xs?
Residuals
• Shorter green lines suggest better prediction (smaller error)
• Longer green lines suggest worse prediction (larger error)
• Why are the green lines vertical?
  • Remember, we are predicting the variable on the Y axis
  • So, error would be how we are wrong about Y (vertical)
[Scatterplot: Number of cavities (X axis) by Number of times per day teeth are brushed (Y axis), with green residual lines]

How well does the prediction line predict the Ys from the Xs?
Residuals
• Slope doesn’t give “variability” info
• Intercept doesn’t give “variability” info
• Correlation “r” does give “variability” info
• Residuals do give “variability” info
[Scatterplot: Number of cavities (X axis) by Number of times per day teeth are brushed (Y axis)]

Finding the standard error of the estimate (line)
Sound familiar? What if we want to know the “average deviation score”?
Standard error of the estimate:
• a measure of the average amount of predictive error
• the average amount that Y’ scores differ from Y scores
• a mean of the lengths of the green lines

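The “average” described above is usually computed as a square root of averaged squared residuals, in the same way a standard deviation is. A common form, consistent with the slides’ later division of the summed squared residuals by 3 for five data pairs, is sketched below; the symbol s with subscript Y·X is a label chosen here and does not come from the slides.

```latex
s_{Y \cdot X} = \sqrt{\frac{\sum (Y - Y')^2}{n - 2}}
```

Squaring keeps positive and negative residuals from cancelling, and the divisor n - 2 matches the degrees of freedom used for the correlation test.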
A note on rounding errors
r = -0.85, b1 = -0.91, b0 = 5.5
Y’ = b1X + b0 = (-0.91)X + 5.5
These are our “predicted values” (Y’) for each X score, with the residual (Y - Y’) for each:

X   Y   Y’                               Y - Y’
1   5   (-0.91)(1) + 5.5 = 4.59           0.41
3   4   (-0.91)(3) + 5.5 = 2.77           1.23
2   3   (-0.91)(2) + 5.5 = 3.68          -0.68
3   2   (-0.91)(3) + 5.5 = 2.77          -0.77
5   1   (-0.91)(5) + 5.5 = 0.95           0.05
[Scatterplot: Number of cavities (X axis) by Number of times per day teeth are brushed (Y axis), with each residual marked: .41, 1.23, -.68, -.77, 0.05]

“Standard Error of the Estimate”
r = -0.85, b1 = -0.91, b0 = 5.5
Y’ = b1X + b0 = (-0.91)X + 5.5

X   Y   Y’     Y - Y’   (Y - Y’)²
1   5   4.59    0.41    0.168
3   4   2.77    1.23    1.513
2   3   3.68   -0.68    0.462
3   2   2.77   -0.77    0.593
5   1   0.95    0.05    0.0025
                    Σ = 2.739

$$ \sqrt{\frac{2.739}{3}} \approx 0.96 $$

(0.95 if the slope and intercept are not rounded first.) This is like the average (or standard) size of our residual.
[Scatterplot: Number of cavities (X axis) by Number of times per day teeth are brushed (Y axis), with each residual marked: .41, 1.23, -.68, -.77, 0.05]

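The residual table and the standard error of the estimate can be reproduced in a few lines. This is a sketch that assumes the rounded coefficients from the slides (b0 = 5.5, b1 = -0.91) and a divisor of n - 2 = 3; the variable names are illustrative. With these rounded coefficients the result is about 0.96, while the slide reports 0.95, which is what the unrounded slope and intercept give.

```python
import math

# Data pairs from the teeth-brushing example (X = cavities, Y = brushing).
cavities = [1, 3, 2, 3, 5]
brushing = [5, 4, 3, 2, 1]

# Rounded coefficients as used on the slides.
B0 = 5.5    # intercept
B1 = -0.91  # slope

# Predicted value Y' and residual (Y - Y') for each data pair.
predicted = [B1 * x + B0 for x in cavities]
residuals = [y - y_hat for y, y_hat in zip(brushing, predicted)]

# Sum of squared residuals, then the standard error of the estimate.
sum_squared = sum(e ** 2 for e in residuals)
standard_error = math.sqrt(sum_squared / (len(cavities) - 2))

for x, y, y_hat, e in zip(cavities, brushing, predicted, residuals):
    print(f"X={x}  Y={y}  Y'={y_hat:.2f}  residual={e:+.2f}")
print(f"sum of squared residuals = {sum_squared:.3f}")  # 2.739
print(f"standard error of the estimate = {standard_error:.2f}")  # 0.96
```

Dividing by n - 2 rather than n mirrors the degrees of freedom used in the hypothesis test for r.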
Thank you! See you next time!!