Top Ten #1 Descriptive Statistics NOTE! This Power Point file is not an introduction, but rather a checklist of topics to review
Location: central tendency • Population Mean = µ = Σx/N = (5+1+6)/3 = 12/3 = 4 • Algebra: Σx = N*µ = 3*4 = 12 • Do NOT use the mean if N is small and the data include extreme values • Ex: Do NOT use the mean if 3 houses sold this week and one was a mansion
Location • Median = middle value • Ex: 5,1,6 • Step 1: Sort data: 1,5,6 • Step 2: Middle value = 5 • OK even if extreme values • Home sales: 100K,200K,900K, so mean =400K, but median = 200K
Location • Mode: most frequent value • Ex: female, male, female • Mode = female • Ex: 1,1,2,3,5,8: mode = 1
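The three location measures can be checked with Python's standard `statistics` module. This is a minimal sketch that reuses the numbers from the slides above; the variable names are chosen here for illustration only.

```python
from statistics import mean, median, mode

# Home sales in thousands of dollars (from the median slide)
home_sales = [100, 200, 900]

sales_mean = mean(home_sales)      # pulled upward by the 900K sale
sales_median = median(home_sales)  # middle value after sorting

# Mode works for categories as well as numbers
gender_mode = mode(["female", "male", "female"])
number_mode = mode([1, 1, 2, 3, 5, 8])

print(sales_mean, sales_median, gender_mode, number_mode)
```

The mean (400) sits well above the median (200) because a single extreme sale dominates a sample of three.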
Relationship • Case 1: if symmetric (ex: bell, normal), then mean = median = mode • Case 2: if positively skewed (long tail to the right), then mode < median < mean • Case 3: if negatively skewed (long tail to the left), then mean < median < mode
Dispersion • How much spread of data • How much uncertainty • Range = Max − Min ≥ 0 • But range is affected by unusual values • Ex: Santa Monica = 105 degrees once a century, but range would be 105 − min
Standard Deviation • Better than range because all data used • Population SD = square root of variance = sigma = σ • SD ≥ 0 (SD = 0 only if all values are identical)
Empirical Rule • Applies to mound or bell-shaped curves • Ex: normal distribution • 68% of data within ± one SD of mean • 95% of data within ± two SD of mean • 99.7% of data within ± three SD of mean
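For an exactly normal distribution, the empirical rule percentages can be reproduced from the normal CDF. The sketch below uses `statistics.NormalDist` (Python 3.8+); the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.1%}")
```

The printed shares round to the 68%, 95% and 99.7% quoted in the rule.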
Standard Deviation • Total variation = Σ(x − sample mean)² = 34 • Sample variance = total variation/(n − 1) = 34/4 = 8.5 • Sample standard deviation = square root of 8.5 = 2.9
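The data behind this slide are not shown, so the sketch below uses a hypothetical five-value sample chosen so that its total variation also equals 34. It walks through the same steps: squared deviations, divide by n − 1, take the square root.

```python
from statistics import mean, variance

# Hypothetical sample (NOT from the slides); picked so total variation = 34
data = [1, 4, 6, 5, 9]

xbar = mean(data)
total_variation = sum((x - xbar) ** 2 for x in data)
sample_variance = total_variation / (len(data) - 1)  # divide by n - 1
sample_sd = sample_variance ** 0.5

print(total_variation, sample_variance, round(sample_sd, 1))

# Cross-check against the library routine, which also divides by n - 1
assert abs(variance(data) - sample_variance) < 1e-9
```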
Graphical Tools • Line chart: trend over time • Scatter diagram: relationship between two variables • Bar chart: frequency for each category • Histogram: frequency for each class of measured data (graph of frequency distribution) • Box plot: graphical display based on quartiles, which divide data into 4 parts
Top Ten #2 • Hypothesis Testing
Ho: Null Hypothesis • Population mean=µ • Population proportion=π • Never include sample statistic in hypothesis
HA: Alternative Hypothesis • ONE-TAIL ALTERNATIVE • Right tail: µ > number (smog check), π > fraction (% defectives) • Left tail: µ < number (weight in box of crackers), π < fraction (unpopular President's % approval low)
Two-tail Alternative • Population mean not equal to number (too hot or too cold) • Population proportion not equal to fraction(% alcohol too weak or too strong)
Reject null hypothesis if • Absolute value of test statistic > critical value • Reject Ho if |Z Value| > critical Z • Reject Ho if | t Value| > critical t • Reject Ho if p-value < significance level (note that direction of inequality is reversed) • Reject Ho if very large difference between sample statistic and population parameter in Ho
Example: Smog Check • Ho: µ = 80 • HA: µ > 80 • If test statistic =2.2 and critical value = 1.96, reject Ho, and conclude that the population mean is likely > 80 • If test statistic = 1.6 and critical value = 1.96, do not reject Ho, and reserve judgment about Ho
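The rejection rules can be written as two small predicates. This is only a sketch of the rules as stated in this review; the function names are hypothetical, and the smog check numbers are taken from the example above.

```python
def reject_by_critical_value(test_statistic, critical_value):
    """Reject Ho when the absolute test statistic exceeds the critical value."""
    return abs(test_statistic) > critical_value

def reject_by_p_value(p_value, significance_level):
    """Reject Ho when the p-value is below the significance level."""
    return p_value < significance_level

# Smog check: Ho: mu = 80, HA: mu > 80
print(reject_by_critical_value(2.2, 1.96))  # True: reject Ho
print(reject_by_critical_value(1.6, 1.96))  # False: do not reject Ho
```

Note the reversed inequality in the p-value rule: a small p-value corresponds to a large test statistic.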
Type I vs Type II error • Alpha=α = P(type I error) = Significance level = probability you reject true null hypothesis • Ex: Ho: Defendant innocent • α = P(jury convicts innocent person) • Beta= β = P(type II error) = probability you do not reject a null hypothesis, given Ho false • β =P(jury acquits guilty person)
Top Ten #3 • Confidence Intervals: Mean and Proportion
Confidence Interval: Mean • Use normal distribution (Z table) if population standard deviation (sigma) is known and either (1) or (2) holds: • (1) Normal population • (2) Sample size > 30
Confidence Interval: Mean • If normal table, then µ = (Σx/n) ± Z(σ/√n), where √n is the square root of n
Normal table • Tail = .5(1 – confidence level) • NOTE! Different statistics texts have different normal tables • This review uses the tail of the bell curve • Ex: 95% confidence: tail = .5(1-.95)= .025 • Z.025 = 1.96
Example • n = 49, Σx = 490, σ = 2, 95% confidence • µ = (490/49) ± 1.96(2/7) = 10 ± .56 • 9.44 < µ < 10.56
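The worked example can be reproduced in a few lines. The sketch below gets Z from the tail area with `statistics.NormalDist` instead of a printed table; the function name `z_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sum_x, n, sigma, confidence):
    """Confidence interval for the mean when the population sigma is known."""
    tail = 0.5 * (1 - confidence)        # e.g. .025 for 95% confidence
    z = NormalDist().inv_cdf(1 - tail)   # about 1.96 for 95% confidence
    xbar = sum_x / n
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

low, high = z_interval(sum_x=490, n=49, sigma=2, confidence=0.95)
print(f"{low:.2f} < mu < {high:.2f}")
```

With the slide's inputs the interval rounds to 9.44 < µ < 10.56.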
Conf. Interval: Mean t distribution • Use if normal population but population standard deviation (σ) not known • If you are given the sample standard deviation (s), use t table, assuming normal population • If one population, n-1 degrees of freedom
t distribution • µ = (Σx/n) ± t(s/√n), where t is taken from the t table with n−1 degrees of freedom
Conf. Interval: Proportion • Use if success or failure (ex: defective or ok) • Normal approximation to binomial ok if (n)(π) > 5 and (n)(1−π) > 5, where n = sample size and π = population proportion • NOTE! NEVER use the t table if proportion!!
Confidence Interval: proportion • π = p ± Z√(p(1−p)/n) • Ex: 8 defectives out of 100, so p = .08 and n = 100, 95% confidence • π = .08 ± 1.96√(.08*.92/100) = .08 ± .05
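The defectives example follows the same pattern as the interval for a mean, with the standard error built from p. A minimal sketch, assuming the normal approximation applies; `proportion_interval` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - 0.5 * (1 - confidence))
    margin = z * sqrt(p * (1 - p) / n)
    return p, margin

p, margin = proportion_interval(successes=8, n=100, confidence=0.95)
print(f"{p:.2f} ± {margin:.3f}")
```

The margin comes out near .053, which the slide rounds to .05.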
Interpretation • If 95% confidence, then 95% of all confidence intervals will include the true population parameter • NOTE! Never use the term “probability” when estimating a parameter!! (ex: Do NOT say “Probability that population mean is between 23 and 32 is .95” because a parameter is not a random variable)
Point vs Interval Estimate • Point estimate: statistic (single number) • Ex: sample mean, sample proportion • Each sample gives different point estimate • Interval estimate: range of values • Ex: Population mean = sample mean ± error • Parameter = statistic ± error
Width of Interval • Ex: sample mean = 23, error = 3 • Point estimate = 23 • Interval estimate = 23 ± 3, or (20,26) • Width of interval = 26 − 20 = 6 • Wide interval: point estimate unreliable
Wide interval if • (1) small sample size (n) • (2) large standard deviation (σ) • (3) high confidence level (ex: 99% confidence interval wider than 95% confidence interval) • If you want a narrow interval, you need a large sample size, a small standard deviation, or a low confidence level
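Each of the three effects can be seen by computing the width 2·Z·σ/√n for a Z interval. The inputs below are hypothetical and chosen only to show the direction of each change.

```python
from math import sqrt
from statistics import NormalDist

def interval_width(n, sigma, confidence):
    """Width of a Z confidence interval for the mean: 2 * Z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - 0.5 * (1 - confidence))
    return 2 * z * sigma / sqrt(n)

base = interval_width(n=100, sigma=2, confidence=0.95)
smaller_sample = interval_width(n=25, sigma=2, confidence=0.95)
larger_sigma = interval_width(n=100, sigma=4, confidence=0.95)
higher_confidence = interval_width(n=100, sigma=2, confidence=0.99)

# All three changes widen the interval relative to the baseline
print(base, smaller_sample, larger_sigma, higher_confidence)
```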
Top Ten #4: Linear Regression • Regression equation: y = b0 + b1x • y = dependent variable = predicted value • x = independent variable • b0 = y-intercept = predicted value of y if x = 0 • b1 = slope = regression coefficient = change in y per unit change in x
Slope vs correlation • Positive slope (b1>0): positive correlation between x and y (y incr if x incr) • Negative slope (b1<0): negative correlation (y decr if x incr) • Zero slope (b1=0): no correlation(predicted value for y is mean of y), no linear relationship between x and y
Simple linear regression • Simple: one independent variable, one dependent variable • Linear: graph of regression equation is straight line
Coefficient of determination • R² = % of total variation in y that can be explained by variation in x • Measure of how closely the linear regression line fits the points in a scatter diagram • R² = 1: max possible value: perfect linear relationship between y and x (straight line) • R² = 0: min value: no linear relationship
example • Y = salary (female manager, in thousands of dollars) • X = number of children • n = number of observations
Slope = −6.500 • Method of Least Squares formulas not on 301 exam • b1 = −6.500 given
Interpret slope • If one female manager has 1 more child than another, her predicted salary is $6,500 lower
Intercept • b0 = ȳ − b1x̄, where ȳ and x̄ are the sample means of y and x
Intercept • b0 = 44.33 − (−6.5)(2.33) = 59.5
Interpret intercept If number of children is zero, expected salary is $59,500
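The intercept arithmetic and both interpretations can be checked directly. This sketch uses only the summary numbers given on the slides (mean salary 44.33, mean number of children 2.33, slope −6.5); the underlying observations are not shown, and the `predict` helper is hypothetical.

```python
mean_salary = 44.33    # mean of y, in thousands of dollars
mean_children = 2.33   # mean of x
b1 = -6.5              # slope, given on the slide

# The least-squares line passes through the point of means
b0 = mean_salary - b1 * mean_children

def predict(children):
    """Predicted salary in thousands of dollars."""
    return b0 + b1 * children

print(round(b0, 1))                        # intercept, about 59.5
print(round(predict(1) - predict(0), 1))   # one more child: 6.5 lower
```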