Chapter 3 Selected Basic Concepts in Statistics

Chapter 3 Selected Basic Concepts in Statistics • Expected Value, Variance, Standard Deviation • Numerical summaries of selected statistics • Sampling distributions

Expected Value • Weighted average of the possible values of y, weighted by their probabilities: E(y) = μ = Σ y p(y) • Not the value of y you “expect”; a long-run average

E(y) Example 1 Toss a fair die once. Let y be the number of dots on upper face.
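The die example can be worked as a weighted average. The sketch below is only an illustration; the helper name `expected_value` is hypothetical and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def expected_value(dist):
    """Return the sum of y * p(y) over a {value: probability} mapping."""
    return sum(y * p for y, p in dist.items())

# Fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
mu_die = expected_value(die)
print(mu_die, float(mu_die))  # 7/2 3.5
```

Note that 3.5 is not a value the die can show; it is the long-run average of many tosses.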

E(y) Example 2: GreenMountain Lottery Choose 3 digits between 0 and 9. Repeats allowed, order of digits counts. If your 3-digit number is selected, you win $500. Let y be your winnings (assume ticket cost $0)
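A minimal sketch of the lottery calculation, assuming (as the slide states) that all 10 × 10 × 10 ordered 3-digit numbers are equally likely and the ticket is free. The variable names are illustrative.

```python
from fractions import Fraction

# 10 * 10 * 10 = 1000 equally likely ordered 3-digit numbers, one of which wins.
p_win = Fraction(1, 10 ** 3)
winnings = {500: p_win, 0: 1 - p_win}

mu_lottery = sum(y * p for y, p in winnings.items())
print(mu_lottery, float(mu_lottery))  # 1/2 0.5
```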

American Roulette: 0 and 00 (The European version has only one 0.) US Roulette Wheel and Table • The roulette wheel has alternating black and red slots numbered 1 through 36. • There are also 2 green slots numbered 0 and 00. • A bet on any one of the 38 numbers (1-36, 0, or 00) pays odds of 35:1; that is . . . • If you bet $1 on the winning number, you receive $36, so your winnings are $35

US Roulette Wheel: Expected Value of a $1 bet on a single number • Let y be your winnings resulting from a $1 bet on a single number; y has 2 possible values: y = -1 with p(y) = 37/38, and y = 35 with p(y) = 1/38 • E(y) = -1(37/38) + 35(1/38) = -2/38 ≈ -.05 • So on average the house wins about 5 cents on every such bet. A “fair” game would have E(y)=0. • The roulette wheels are spinning 24/7, winning big $$ for the house, resulting in …
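To connect the exact value with the "long-run average" reading, the sketch below computes E(y) exactly and then averages many simulated $1 bets. The seed, the number of bets, and the choice of slot 0 as "our" number are arbitrary assumptions; the simulated average will only be close to the exact value.

```python
import random
from fractions import Fraction

# Exact expected winnings for a $1 single-number bet.
exact = -1 * Fraction(37, 38) + 35 * Fraction(1, 38)
print(exact, round(float(exact), 4))  # -1/19 -0.0526

# Long-run average of simulated bets (illustrative settings).
random.seed(0)
n_bets = 200_000
total = 0
for _ in range(n_bets):
    # 38 equally likely slots; slot 0 stands in for the number we bet on.
    total += 35 if random.randrange(38) == 0 else -1
average = total / n_bets
print(average)
```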

Variance and Standard Deviation • Measure spread around the middle, where the middle is measured by μ

Variance Example Toss a fair die once. Let y be the number of dots on upper face. Recall μ = 3.5
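The variance of the die can be computed directly from the definition V(y) = Σ (y − μ)² p(y). This is a sketch with illustrative variable names.

```python
from fractions import Fraction
from math import sqrt

die = {face: Fraction(1, 6) for face in range(1, 7)}
mu = sum(y * p for y, p in die.items())                    # 7/2

# Probability-weighted squared deviations from the mean.
var_die = sum((y - mu) ** 2 * p for y, p in die.items())
sd_die = sqrt(var_die)
print(var_die, round(sd_die, 3))  # 35/12 1.708
```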

V(y) Example 2: GreenMountain Lottery Recall μ = .50
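For the lottery, one way to get the variance is the shortcut V(y) = E(y²) − μ², which is algebraically equivalent to the definition. The slide's own working is not in the extracted text, so this is a sketch of one possible route.

```python
from fractions import Fraction
from math import sqrt

p_win = Fraction(1, 1000)
winnings = {500: p_win, 0: 1 - p_win}

mu = sum(y * p for y, p in winnings.items())             # 1/2
e_y2 = sum(y ** 2 * p for y, p in winnings.items())      # 250
var_lottery = e_y2 - mu ** 2
print(var_lottery, round(sqrt(var_lottery), 2))  # 999/4 15.8
```

The standard deviation (about $15.80) is large relative to the mean ($0.50) because almost all of the probability sits at $0 with a rare $500 payout.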

Estimators for μ, σ², σ • s²: “average” squared deviation from the middle • Automate these calculations • Examples
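One way to automate these calculations is Python's standard `statistics` module. The data below are hypothetical, and the sketch assumes the usual n − 1 divisor for s² (the slide's formula is not in the extracted text).

```python
import statistics

y = [2, 4, 4, 5, 7, 8]          # hypothetical sample

ybar = statistics.mean(y)       # sample mean; estimates mu
s2 = statistics.variance(y)     # sample variance, divides by n - 1; estimates sigma^2
s = statistics.stdev(y)         # square root of s2; estimates sigma
print(ybar, s2, round(s, 3))
```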

Linear Transformations of Random Variables and Sample Statistics • Random variable y with E(y) and V(y) • Linear transformation y* = a + by: what are E(y*) and V(y*) in terms of the original E(y) and V(y)? • Data y1, y2, …, yn with mean ȳ and standard deviation s • Linear transformation y* = a + by gives new data y1*, y2*, …, yn*: what are ȳ* and s* in terms of ȳ and s?

Linear Transformations • Rules for E(y*), V(y*) and SD(y*): • E(y*) = E(a+by) = a + bE(y) • V(y*) = V(a+by) = b²V(y) • SD(y*) = SD(a+by) = |b|SD(y) • Rules for ȳ*, s*², and s*: • ȳ* = a + bȳ • s*² = b²s² • s* = |b|s
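The sample versions of these rules can be checked numerically. In this sketch the data and the constants a and b are hypothetical; b is negative on purpose, to show why the standard deviation scales by |b| rather than b.

```python
import statistics

y = [2, 4, 4, 5, 7, 8]              # hypothetical data
a, b = 10, -3                       # hypothetical constants; b < 0 on purpose
y_star = [a + b * yi for yi in y]   # transformed data y* = a + b*y

ybar, s = statistics.mean(y), statistics.stdev(y)
ybar_star, s_star = statistics.mean(y_star), statistics.stdev(y_star)

print(ybar_star, a + b * ybar)      # mean of y* agrees with a + b * ybar
print(s_star, abs(b) * s)           # SD of y* agrees with |b| * s (up to rounding)
```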

Expected Value and SD of Linear Transformation a + by • Let y = number of repairs a new computer needs each year. Suppose E(y) = 0.20 and SD(y) = 0.55. • The service contract for the computer offers unlimited repairs for $100 per year plus a $25 service charge for each repair. What are the mean and standard deviation of the yearly cost of the service contract? • Cost = $100 + $25y • E(cost) = E($100 + $25y) = $100 + $25E(y) = $100 + $25(0.20) = $100 + $5 = $105 • SD(cost) = SD($100 + $25y) = SD($25y) = $25 SD(y) = $25(0.55) = $13.75

Addition and Subtraction Rules for Random Variables • E(X+Y) = E(X) + E(Y) • E(X−Y) = E(X) − E(Y) • When X and Y are independent random variables: • Var(X+Y) = Var(X) + Var(Y) • SD(X+Y) = √(Var(X) + Var(Y)); SD’s do not add: SD(X+Y) ≠ SD(X) + SD(Y) • Var(X−Y) = Var(X) + Var(Y) • SD(X−Y) = √(Var(X) + Var(Y)); SD’s do not subtract: SD(X−Y) ≠ SD(X) − SD(Y), and SD(X−Y) ≠ SD(X) + SD(Y)
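These rules can be verified exactly on a small case. The sketch below takes X and Y to be two independent fair dice (an example chosen here, not from the slides) and enumerates all 36 equally likely outcomes.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

faces = range(1, 7)
p = Fraction(1, 36)                      # each (x, y) pair is equally likely

def variance(values):
    """Variance of a variable taking each listed value with probability p."""
    mean = sum(v * p for v in values)
    return sum((v - mean) ** 2 * p for v in values)

pairs = list(product(faces, faces))
var_x = variance([x for x, _ in pairs])          # variance of one die
var_sum = variance([x + y for x, y in pairs])    # Var(X + Y)
var_diff = variance([x - y for x, y in pairs])   # Var(X - Y)

print(var_x, var_sum, var_diff)                              # 35/12 35/6 35/6
print(round(sqrt(var_sum), 3), round(2 * sqrt(var_x), 3))    # 2.415 3.416
```

Variances add for both the sum and the difference, while SD(X+Y) ≈ 2.415 is smaller than SD(X) + SD(Y) ≈ 3.416.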

Example: rv’s NOT independent • X=number of hours a randomly selected student from our class slept between noon yesterday and noon today. • Y=number of hours the same randomly selected student from our class was awake between noon yesterday and noon today. Y = 24 – X. • What are the expected value and variance of the total hours that a student is asleep and awake between noon yesterday and noon today? • Total hours that a student is asleep and awake between noon yesterday and noon today = X+Y • E(X+Y) = E(X+24-X) = E(24) = 24 • Var(X+Y) = Var(X+24-X) = Var(24) = 0. • We don't add Var(X) and Var(Y) since X and Y are not independent.

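Pythagorean Theorem of Statistics for Independent X and Y [Figure: right triangle with legs a = SD(X) and b = SD(Y) and hypotenuse c = SD(X+Y); the squares on the sides have areas a² = Var(X), b² = Var(Y), c² = Var(X+Y).] • a² + b² = c², so Var(X) + Var(Y) = Var(X+Y) • a + b ≠ c, so SD(X) + SD(Y) ≠ SD(X+Y)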

Pythagorean Theorem of Statistics for Independent X and Y [Figure: the same triangle with SD(X) = 3, SD(Y) = 4, SD(X+Y) = 5 and Var(X) = 9, Var(Y) = 16, Var(X+Y) = 25.] • 3² + 4² = 5², so Var(X) + Var(Y) = 9 + 16 = 25 = Var(X+Y) • 3 + 4 ≠ 5, so SD(X) + SD(Y) ≠ SD(X+Y)

Example: meal plans • Regular plan: X = daily amount spent • E(X) = $13.50, SD(X) = $7 • Expected value and standard deviation of the total spent in 2 consecutive days? (assume independent) • E(X1+X2) = E(X1) + E(X2) = $13.50 + $13.50 = $27 • SD(X1+X2) = √(Var(X1) + Var(X2)) = √(7² + 7²) = √98 ≈ $9.90 • Note SD(X1+X2) ≠ SD(X1) + SD(X2) = $7 + $7 = $14

Example: meal plans (cont.) • Jumbo plan for football players: Y = daily amount spent • E(Y) = $24.75, SD(Y) = $9.50 • Amount by which a football player’s spending exceeds regular student spending is Y−X • E(Y−X) = E(Y) − E(X) = $24.75 − $13.50 = $11.25 • SD(Y−X) = √(Var(Y) + Var(X)) = √(9.50² + 7²) = √139.25 ≈ $11.80 (assuming X and Y are independent) • Note SD(Y−X) ≠ SD(Y) − SD(X) = $9.50 − $7 = $2.50
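Both meal-plan standard deviations follow the Pythagorean pattern, so they can be checked with `math.hypot`, which returns √(a² + b²). This is a sketch that assumes independence, as the slides do.

```python
from math import hypot

sd_x, sd_y = 7.00, 9.50            # SD(X), SD(Y) in dollars, from the slides

sd_two_days = hypot(sd_x, sd_x)    # SD(X1 + X2) = sqrt(7^2 + 7^2)
sd_excess = hypot(sd_y, sd_x)      # SD(Y - X)   = sqrt(9.5^2 + 7^2)
print(round(sd_two_days, 2), round(sd_excess, 2))  # 9.9 11.8
```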

For random variables, X+X≠2X • Let X be the annual payout on a life insurance policy. From mortality tables E(X)=$200 and SD(X)=$3,867. • If the payout amounts are doubled, what are the new expected value and standard deviation? • Double payout is 2X. E(2X) = 2E(X) = 2($200) = $400 • SD(2X) = 2SD(X) = 2($3,867) = $7,734 • Suppose insurance policies are sold to 2 people. The annual payouts are X1 and X2. Assume the 2 people behave independently. What are the expected value and standard deviation of the total payout? • E(X1 + X2) = E(X1) + E(X2) = $200 + $200 = $400 • SD(X1 + X2) = √(Var(X1) + Var(X2)) = √(2 × 3,867²) ≈ $5,469 • The risk to the insurance company when doubling the payout (2X) is not the same as the risk when selling policies to 2 people.
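The contrast between scaling one policy and adding independent policies can be sketched as two small functions. The function names are hypothetical; the inputs are the slide's E(X) and SD(X).

```python
from math import sqrt

mean_x, sd_x = 200.0, 3867.0   # E(X), SD(X) in dollars, from the slide

def scaled_payout(k=2):
    """Mean and SD of kX: one policy with the payout multiplied by k."""
    return k * mean_x, k * sd_x

def independent_policies(n=2):
    """Mean and SD of X1 + ... + Xn for n independent policies."""
    return n * mean_x, sqrt(n) * sd_x

print(scaled_payout())                      # (400.0, 7734.0)
mean_total, sd_total = independent_policies()
print(mean_total, round(sd_total, 2))       # 400.0 5468.76
```

The means agree, but the standard deviation grows like n for a scaled payout and only like √n for a sum of independent policies.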

Estimator of population mean μ • ȳ will vary from sample to sample • What are the characteristics of this sample-to-sample behavior?

Numerical Summary of Sampling Distribution of ȳ • Unbiased: a statistic is unbiased if it has expected value equal to the population parameter.

Numerical Summary of Sampling Distribution of ȳ

Standard Error • Standard error - square root of the estimated variance of a statistic • important building block for statistical inference
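As an illustration for the sample mean, the sketch below assumes the estimated variance of ȳ is s²/n (simple random sampling from a large population; no finite population correction) and uses hypothetical data.

```python
import math
import statistics

y = [2, 4, 4, 5, 7, 8]                      # hypothetical sample
n = len(y)

est_var_ybar = statistics.variance(y) / n   # estimated variance of ybar: s^2 / n
se_ybar = math.sqrt(est_var_ybar)           # standard error of ybar
print(round(se_ybar, 4))  # 0.8944
```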

Shape? • We have numerical summaries of the sampling distribution of ȳ • What about the shape of the sampling distribution of ȳ?

THE CENTRAL LIMIT THEOREM The World is Normal Theorem

The Central Limit Theorem (for the sample mean ȳ) • If a random sample of n observations is selected from a population (any population), then when n is sufficiently large, the sampling distribution of ȳ will be approximately normal. (The larger the sample size, the better will be the normal approximation to the sampling distribution of ȳ.)
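A small simulation can make the theorem concrete. The sketch below draws repeated samples from a strongly skewed population (exponential with mean 1 and SD 1); the population, seed, sample size, and number of repetitions are all arbitrary choices for illustration.

```python
import random
import statistics

random.seed(1)
n, reps = 50, 5000

# Sample means from a skewed population: exponential with mean 1 and SD 1.
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(reps)]

center = statistics.fmean(means)     # should be near mu = 1
spread = statistics.stdev(means)     # should be near sigma / sqrt(n), about 0.141
# Share of sample means within one sigma/sqrt(n) of mu; about 0.68 for a normal shape.
within_1 = sum(abs(m - 1) < 1 / n ** 0.5 for m in means) / reps
print(round(center, 3), round(spread, 3), round(within_1, 3))
```

A histogram of `means` would look roughly bell-shaped even though the population itself is far from normal.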

The Importance of the Central Limit Theorem • When we select simple random samples of size n, the sample means we find will vary from sample to sample. We can model the distribution of these sample means with a probability model that is approximately normal, with mean μ and standard deviation σ/√n. • Shape of population is irrelevant

Estimating the population total τ

Estimating the population total τ • Expected value

Estimating the population total τ • Variance, standard deviation, standard error

Finite population case • Example: sampling w/ replacement to estimate τ

Finite population case • Example: sampling w/ replacement to estimate τ • From the table:

Finite population case • Example: sampling w/ replacement to estimate τ

Finite population case • Example: sampling w/ replacement to estimate τ • Example Summary

Finite population case • Sampling w/ replacement to estimate pop. total τ • In general

Finite population case • Sampling w/ replacement to estimate pop. total τ

Finite population case • Sampling w/ replacement to estimate pop. total τ • In reality, we do not know the value of yi for every item in the population. • BUT we can choose the selection probability of item i proportional to a known measurement highly correlated with yi.
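The slide's formulas did not survive extraction, so the sketch below assumes the standard with-replacement estimator τ̂ = (1/n) Σ yi/pi, where pi is the probability of selecting item i on each draw. The population values and both sets of probabilities are hypothetical. It enumerates every ordered sample of size n = 2 to get the exact mean and variance of τ̂.

```python
from fractions import Fraction
from itertools import product

y = [1, 2, 3, 4]                 # hypothetical population values; tau = 10
n = 2

def estimator_moments(probs):
    """Exact mean and variance of tau_hat = (1/n) * sum(y_i / p_i)
    over all ordered with-replacement samples of size n."""
    mean = second = Fraction(0)
    for sample in product(range(len(y)), repeat=n):
        prob = Fraction(1)
        for i in sample:
            prob *= probs[i]         # draws are independent
        tau_hat = sum(Fraction(y[i]) / probs[i] for i in sample) / n
        mean += prob * tau_hat
        second += prob * tau_hat ** 2
    return mean, second - mean ** 2

equal = [Fraction(1, 4)] * 4
# Probabilities only roughly proportional to y (assumed for illustration).
rough = [Fraction(1, 10), Fraction(1, 10), Fraction(4, 10), Fraction(4, 10)]

print(*estimator_moments(equal))   # 10 10
print(*estimator_moments(rough))   # 10 25/4
```

Under this assumed estimator, both designs are unbiased for τ = 10, but probabilities that track y more closely give a smaller variance (25/4 versus 10).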

Finite population case • Sampling w/ replacement to estimate pop. total τ

Finite population case • Sampling without replacement to estimate pop. total τ • Thus far we have assumed a population that does not change when the first item is selected. • Example: population {1, 2, 3, 4}; n=2, suppose equally likely. • Prob. of selecting 3 on first draw is ¼. • Prob. of selecting 3 on second draw depends on first draw (probability is 0 or 1/3)
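The probabilities in this example can be checked by listing all ordered samples of size 2 drawn without replacement. The helper `prob` is hypothetical and assumes every ordered sample is equally likely.

```python
from fractions import Fraction
from itertools import permutations

population = [1, 2, 3, 4]
samples = list(permutations(population, 2))   # 12 equally likely ordered samples

def prob(event, given=lambda s: True):
    """P(event | given) when every ordered sample is equally likely."""
    pool = [s for s in samples if given(s)]
    return Fraction(sum(event(s) for s in pool), len(pool))

second_is_3 = lambda s: s[1] == 3
print(prob(lambda s: s[0] == 3))                        # 1/4
print(prob(second_is_3, given=lambda s: s[0] == 3))     # 0
print(prob(second_is_3, given=lambda s: s[0] != 3))     # 1/3
print(prob(second_is_3))                                # 1/4
```

The last line shows that, before the first draw is known, the chance of getting 3 on the second draw is still 1/4; only the conditional probabilities change.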

Finite population case • Sampling without replacement to estimate pop. total τ • Worksheet

End of Chapter 3
