Exam Feb 28: sets 1, 2 • Set 1 due Thurs • Memo C-1 due Feb 14 • Free tutoring will be available next week: Plan A: MW 4-6 PM OR Plan B: TT 2-4 PM • VOTE for Plan A or Plan B • Results announced Thurs
Kinderman Supplement • Ch 2: Multiple Regression • Ch 3: Analysis of Variance
MULTIPLE REGRESSION Kinderman, Ch 2
Example • Reference: Statistics for Managers • By David M. Levine, Berenson, and Stephan • Second edition (1999) • Prentice Hall
Y = dependent variable = heating oil sales (gallons) • X1 = temperature (degrees) • X2 = insulation (inches) • X1 and X2 are independent variables • Y = b0 + b1X1 + b2X2 • Enter the data into Excel and use Data Analysis > Regression • NOTE: If you can't find Data Analysis, enable the Analysis ToolPak under Add-Ins
Fitted equation: Y = 562 - 5X1 - 20X2 • Source: bottom table, Coefficients column
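Excel's Regression tool finds b0, b1 and b2 by least squares. The sketch below, a minimal pure-Python illustration rather than what Excel runs internally, solves the normal equations (X'X)b = X'y with Gaussian elimination. The eight data points are hypothetical (NOT the Levine et al. data set): they are generated exactly from Y = 562 - 5X1 - 20X2, so the fit should recover those coefficients up to rounding.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_two_predictors(x1, x2, y):
    """Least-squares (b0, b1, b2) for Y = b0 + b1*X1 + b2*X2 via the normal equations."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix with an intercept column
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve(xtx, xty)


# Hypothetical observations, generated from the fitted equation with no noise.
x1 = [40, 27, 40, 73, 64, 34, 9, 8]   # temperature (degrees)
x2 = [3, 3, 10, 6, 6, 6, 10, 3]       # insulation (inches)
y = [562 - 5 * a - 20 * b for a, b in zip(x1, x2)]

b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(round(b0, 2), round(b1, 2), round(b2, 2))
```

With real, noisy data the estimates would not match any "true" equation exactly; that leftover scatter is what the standard error and R Square summarize.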
Interpret coefficients • Intercept = b0 = 562: If temperature = 0 and insulation = 0, estimated heating oil sales = 562 gallons • b1 = -5: For all homes with the same insulation, each 1-degree increase in temperature is estimated to decrease heating oil sales by 5 gallons • b2 = -20: For all homes with the same temperature, each additional inch of insulation is estimated to decrease sales by 20 gallons
Categorical Variables • X = 0 or 1 • Example: 0 if male, 1 if female • Example: 1 if graduate, 0 if dropout • Example: 1 if citizen, 0 if alien • NOTE: not used in this fuel oil example
Estimate sales if temp = 30 and insulation = 6 • Y = 562 - 5(30) - 20(6) = 292 gal
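The same estimate can be written as a small function. This is a sketch that hard-codes the rounded coefficients from the slide as defaults; the function name `predict_sales` is hypothetical.

```python
def predict_sales(temp, insulation, b0=562, b1=-5, b2=-20):
    """Estimated heating oil sales (gallons) from the rounded fitted equation."""
    return b0 + b1 * temp + b2 * insulation


print(predict_sales(30, 6))  # 292
```

Passing different coefficients (for example, unrounded values from Excel) gives a slightly different estimate.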
Standard Error = 26 (top table) • Interpret: A typical home's fuel oil sales were about 26 gal away from the average (predicted) fuel oil sales for homes with the same temperature and insulation
COEFFICIENT OF MULTIPLE DETERMINATION • Top table, R Square • Interpret: 96% of the total variation in fuel oil sales can be explained by variation in temperature and insulation
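Both top-table numbers come from the residuals. A minimal sketch, assuming the predicted values come from a least-squares fit with an intercept and k independent variables: standard error = sqrt(SSE / (n - k - 1)) and R Square = 1 - SSE/SST. The five observations below are hypothetical, chosen only to keep the arithmetic easy to check.

```python
import math


def regression_summary(y, y_hat, k):
    """Return (standard error of the estimate, R square) for k independent variables."""
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation around the mean
    std_error = math.sqrt(sse / (n - k - 1))
    r_square = 1 - sse / sst
    return std_error, r_square


# Hypothetical actual and predicted values with k = 2 independent variables.
se, r2 = regression_summary([10, 12, 15, 19, 24], [11, 12, 14, 20, 23], k=2)
print(f"{se:.2f} {r2:.3f}")  # 1.41 0.968
```

Here SSE = 4 and SST = 126, so about 97% of the variation is explained, the same reading as the 96% on the slide.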
Is there a relationship between the dependent variable and the independent variables taken together? • Ho: Null hypothesis: All slope coefficients = 0 (β1 = β2 = 0) Ho: NO relationship • H1: Alternative hypothesis: At least one slope coefficient is not zero H1: There is a relationship
Computer output describes sample data • Hypotheses are about population parameters • Ho: Population parameters = 0, even though the sample data may make it appear that there is a relationship • Simple regression: Ho: zero slope vs H1: slope is not zero (positive or negative)
Exponents • 10^-1 = 0.1 • 10^-2 = 0.01
Decision Rule • Reject Ho if “Significance F” < alpha • Middle table • Fuel oil example: Significance F = 1.6E-09 • Excel: E = exponent • 1.6E-09 = 1.6 * 10^-9 = 0.0000000016 • Essentially zero
Significance F = p-value for the F test • Excel labels a column “P-value” only for t tests (bottom table) • Significance F = probability of getting an F greater than the sample F if Ho is true
Assume alpha = .05 • Since 1.6E-09 (essentially 0) < .05, reject Ho • We conclude there IS a relationship between fuel oil sales and the independent variables
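The decision rule is a single comparison. In this sketch (function name hypothetical), Excel's E notation is parsed directly by Python's `float`, which shows that 1.6E-09 is the tiny decimal above.

```python
def reject_null(p_value, alpha=0.05):
    """Decision rule: reject Ho when the p-value (Excel's Significance F) is below alpha."""
    return p_value < alpha


significance_f = float("1.6E-09")    # E notation: 1.6 * 10^-9
print(f"{significance_f:.10f}")      # 0.0000000016
print(reject_null(significance_f))   # True
```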
Which independent variables seem to be important factors? • Ho: Temperature is not an important factor • H1: Temperature is important • Reject Ho if p-value < alpha • Bottom table: P-value column, X1 row • P-value = 1.6E-09, essentially zero • Reject Ho • Temperature is important
Insulation • Ho: Insulation is unimportant • H1: Insulation is important • Bottom table: P-value column, X2 row • P-value = 1.9E-06, essentially zero • Reject Ho • Insulation is important
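The same test is repeated for each row of the bottom table. A minimal sketch that applies the rule to every independent variable at once; the function name is hypothetical and the p-values are the two quoted on these slides.

```python
def important_variables(p_values, alpha=0.05):
    """Names of independent variables whose p-value is below alpha (Ho rejected)."""
    return [name for name, p in p_values.items() if p < alpha]


# P-value column of the bottom table, X1 and X2 rows.
p_values = {"Temperature (X1)": 1.6e-09, "Insulation (X2)": 1.9e-06}
for name, p in p_values.items():
    decision = "reject Ho: important" if p < 0.05 else "do not reject Ho"
    print(f"{name}: p-value = {p:.1e} -> {decision}")
print(important_variables(p_values))
```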
Analysis of Variance (ANOVA) Kinderman, Ch 3
Hypothesis Testing • Ho: µ1 = µ2 = µ3 • H1: Not all means are equal • H1: There are differences among the 3 populations • H1: Average number of accidents differs depending on where you live
This course: manual calculations • With computer software you could have as many populations as needed • Homework, exam: 3 populations • Computer: 4 or more populations • Example: ethnic classifications at CSUN
Sample Sizes • Column 1: n1 = number of drivers sampled from policyholders living in the city = 3 • Column 2: n2 = number sampled from suburban drivers = 3 • Column 3: n3 = number sampled from rural drivers = 3 • Each sample size = number of rows of data in its column • Kinderman example: different sample sizes
n = n1 + n2 + n3 • n = 3 + 3 + 3 = 9
Hypotheses • Ho: Differences in sample means are due to chance; there would be no differences if ALL drivers were included (Prop 103) • H1: Population means are different (for example, city drivers have more accidents)
SSB = Sum of Squares Between • Variation between the 3 groups • Explained variation • Here: variation in number of accidents explained by where you live (city, suburb, rural) • If where you live did not affect accidents, we would expect SSB close to 0 (SSB = 0 exactly when all sample means are equal) • Next slide: SSB formula
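This example • Sample means: city = 2, suburb = 1, rural = 1/3 ≈ 0.33; grand mean = 10/9 ≈ 1.11 • SSB = 3(2 - 1.11)^2 + 3(1 - 1.11)^2 + 3(0.33 - 1.11)^2 ≈ 4.2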
MSB = Mean Square Between • MSB = SSB/2 • The denominator is the number of groups minus 1 (3 - 1 = 2 here), so problems with more groups have a bigger denominator • MSB = 4.2/2 = 2.1
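SSB and MSB can be checked with exact fractions. In this sketch the accident counts (city 1, 3, 2; suburban 2, 0, 1; rural 1, 0, 0) are read off the worked SSE calculation in this section, and SSB = Σ n_j (group mean - grand mean)^2 is the standard one-way ANOVA formula; the function name is hypothetical. Exact arithmetic gives 38/9 ≈ 4.22 and 19/9 ≈ 2.11, which the slide rounds to 4.2 and 2.1.

```python
from fractions import Fraction

city = [1, 3, 2]
suburb = [2, 0, 1]
rural = [1, 0, 0]
groups = [city, suburb, rural]


def ss_between(groups):
    """SSB = sum over groups of n_j * (group mean - grand mean)^2, computed exactly."""
    all_values = [x for g in groups for x in g]
    grand_mean = Fraction(sum(all_values), len(all_values))  # 10/9
    return sum(len(g) * (Fraction(sum(g), len(g)) - grand_mean) ** 2 for g in groups)


ssb = ss_between(groups)
msb = ssb / (len(groups) - 1)   # k - 1 = 2 for 3 groups
print(f"{float(ssb):.2f} {float(msb):.2f}")  # 4.22 2.11
```

Using `Fraction` avoids the rounding problem of plugging 1.1 and 0.3 into the formula, which would give about 4.38 instead of 4.22.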
SSE = Sum of Squares Error • Variation within each group • Ex: variation within the group of city drivers • Unexplained variation • If every city driver had the same number of accidents, the city group would contribute 0 to SSE • Formula on next slide
SSE = [(1 - 2)^2 + (3 - 2)^2 + (2 - 2)^2] (city) + [(2 - 1)^2 + (0 - 1)^2 + (1 - 1)^2] (suburb) + [(1 - 0.33)^2 + (0 - 0.33)^2 + (0 - 0.33)^2] (rural) ≈ 4.67
MSE = Mean Square Error, also called Mean Square Within • Next slide: formula for this course • Bigger problems have a bigger denominator
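SSE and MSE for the driver data can be checked the same way. This sketch computes SSE exactly; for MSE it assumes the standard one-way ANOVA denominator n - k (9 - 3 = 6 here), which may differ in form from the course formula on the next slide. The group data are read off the worked SSE calculation above; the function name is hypothetical.

```python
from fractions import Fraction

groups = {"city": [1, 3, 2], "suburb": [2, 0, 1], "rural": [1, 0, 0]}


def ss_error(groups):
    """SSE = sum of squared deviations of each value from its own group mean."""
    total = Fraction(0)
    for values in groups.values():
        mean = Fraction(sum(values), len(values))
        total += sum((x - mean) ** 2 for x in values)
    return total


sse = ss_error(groups)
n = sum(len(v) for v in groups.values())  # 9 drivers in total
k = len(groups)                           # 3 groups
mse = sse / (n - k)                       # assumed denominator: n - k = 6
print(f"{float(sse):.2f} {float(mse):.2f}")  # 4.67 0.78
```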