Raoul LePage Professor STATISTICS AND PROBABILITY stt.msu/~lepage click on STT315_F06

Raoul LePage Professor STATISTICS AND PROBABILITY www.stt.msu.edu/~lepage click on STT315_F06 Week 9-25-06 and some preparation for exam 2.

Suggested exercises (solutions given in text): 3-33, 3-41, 3-42 (except b, c, h, m, n), 3-43, 3-49, 3-57 (except c, d), 3-59, 3-61, 3-63, 3-65. Textbook exercises are not comprehensive.

PROBABILITY MODELS HAVING BROAD APPLICATION: NORMAL DISTRIBUTION, BERNOULLI TRIALS, BINOMIAL DISTRIBUTION, POISSON DISTRIBUTION

NORMAL DISTRIBUTION: WHERE ARE THE MEAN AND STANDARD DEVIATION IN THIS PICTURE? note the point of inflexion note the balance point

IQ DISTRIBUTION: ~NORMAL, MEAN = 100, STANDARD DEVIATION = 15. The point of inflexion lies one SD (15) from the mean (100).

DISTRIBUTION OF THE NUMBER OF HEADS IN 100 COIN TOSSES: APPROXIMATELY NORMAL, MEAN 50, STD DEVIATION 5

DISTRIBUTION OF THE NUMBER OF ACCIDENTS IN ONE MONTH IF WE AVERAGE 39.7 PER MONTH: APPROXIMATELY NORMAL, MEAN 39.7, STD DEVIATION 6.3

NORMAL DISTRIBUTIONS ARE ALIKE IN SD UNITS FROM THE MEAN ~ 68% WITHIN 1 SD OF MEAN ~ 95% WITHIN 2 SD OF MEAN Illustrated for the Standard Normal Mean=0, SD=1 ~68%

NORMAL DISTRIBUTIONS ARE ALIKE IN SD UNITS FROM THE MEAN ~ 68% WITHIN 1 SD OF MEAN ~ 95% WITHIN 2 SD OF MEAN Illustrated for the Standard normal Mean=0, SD=1 ~95%

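IQ DISTRIBUTION: ~NORMAL, MEAN 100, STANDARD DEVIATION 15. About 68%/2 = 34% of IQs fall between 85 and 100 (within 1 SD below the mean); about 95%/2 = 47.5% fall between 100 and 130 (within 2 SD above the mean).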

STANDARD SCORES CONVERT TO MEAN 0, SD 1: Z = (IQ - 100) / 15, so IQ = 100 corresponds to Z = 0 and one SD (15 IQ points) corresponds to one unit of Z. Z follows the Standard Normal.

Z - TABLE CUT AND PASTE P(Z > 0) = P(Z < 0 ) = 0.5 P(Z > 2.66) = 0.5 - P(0 < Z < 2.66) = 0.5 - 0.4961 = 0.0039 P(Z < 1.92) = 0.5 + P(0 < Z < 1.92) = 0.5 + 0.4726 = 0.9726
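The table lookups above can be cross-checked numerically. The following is a minimal sketch, assuming the standard normal CDF is computed from Python's `math.erf`; the helper name `phi` is an illustration and does not come from the slides.

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF, P(Z < z), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_upper = 1.0 - phi(2.66)        # P(Z > 2.66), table value 0.0039
p_lower = phi(1.92)              # P(Z < 1.92), table value 0.9726
within_1 = phi(1.0) - phi(-1.0)  # roughly 68% within 1 SD of the mean
within_2 = phi(2.0) - phi(-2.0)  # roughly 95% within 2 SD of the mean

print(round(p_upper, 4), round(p_lower, 4))
print(round(within_1, 3), round(within_2, 3))
```

The same function handles any normal distribution after converting to standard scores, e.g. `phi((130 - 100) / 15)` for P(IQ < 130).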

BERNOULLI DISTRIBUTION (0 < p < 1, q = 1 - p)
x      p(x)
1      p     (1 denotes “success”)
0      q     (0 denotes “failure”)
total  1

Notation: BERNOULLI RANDOM VARIABLE X P(success) = P(X = 1) = p P(failure) = P(X = 0) = q e.g. X = “sample voter is Democrat” Population has 48% Dem. p = 0.48, q = 0.52 P(X = 1) = 0.48

INDEPENDENT BERNOULLI-p TRIALS: "S" denotes success, "F" denotes failure. P(S1 S2 F3 F4 F5 F6 S7) = p^3 q^4; just write P(SSFFFFS) = p^3 q^4. “The answer only depends upon how many of each, not their order.” e.g. 48% Dem, 5 sampled, with-repl: P(Dem Rep Dem Dem Rep) = 0.48^3 0.52^2.

BINOMIAL DISTRIBUTION FOR THE TOTAL NUMBER OF SUCCESSES IN INDEPENDENT p-BERNOULLI TRIALS. e.g. P(exactly 2 Dems out of sample of 4) = P(DDRR) + P(DRDR) + P(DRRD) + P(RDDR) + P(RDRD) + P(RRDD) = 6 (0.48^2) (0.52^2) ~ 0.374. There are 6 ways to arrange 2D 2R.

BINOMIAL DISTRIBUTION FOR THE TOTAL NUMBER OF SUCCESSES IN INDEPENDENT p-BERNOULLI TRIALS. e.g. P(exactly 3 Dems out of sample of 5) = P(DDDRR) + P(DDRDR) + P(DDRRD) + P(DRDDR) + P(DRDRD) + P(DRRDD) + P(RDDDR) + P(RDDRD) + P(RDRDD) + P(RRDDD) = 10 (0.48^3) (0.52^2) ~ 0.299. There are 10 ways to arrange 3D 2R. Same as the number of ways to select 3 from 5.

COUNTING ARRANGEMENTS 5! ways to arrange 5 things in a line Do it thus (1:1 with arrangements): select 3 of the 5 to go first in line, arrange those 3 at the head of line then arrange the remaining 2 after. 5! = (ways to select 3 from 5) 3! 2! So num ways must be 5! /( 3! 2!) = 10.
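The counting argument above can be checked by brute force. This is a small sketch, assuming we simply enumerate every distinct ordering of the letters DDDRR and compare the count with 5!/(3! 2!); the variable names are illustrative only.

```python
import math
from itertools import permutations

# All distinct orderings of 3 D's and 2 R's (the set removes repeated orderings).
arrangements = sorted(set("".join(p) for p in permutations("DDDRR")))
count = len(arrangements)

# The slide's formula: 5! / (3! 2!).
by_formula = math.factorial(5) // (math.factorial(3) * math.factorial(2))

print(count, by_formula)
print(arrangements)
```

Enumeration is only practical for short strings; for larger n the factorial formula (or `math.comb(n, x)`) is the sensible route.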

BINOMIAL FORMULA: Let random variable X denote the number of “S” in n independent Bernoulli p-trials. By definition, X has a Binomial Distribution and for each of x = 0, 1, 2, …, n: P(X = x) = (n!/(x! (n-x)!)) p^x q^(n-x). e.g. P(44 Dems in sample of 100 voters) = (100!/(44! 56!)) 0.48^44 0.52^(100-44) = 0.05812.
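The binomial formula above translates directly into a few lines of code. This is a minimal sketch, assuming Python 3.8+ for `math.comb`; the function name `binomial_pmf` is a hypothetical helper, not something from the course materials.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X = number of successes in n independent Bernoulli-p trials."""
    q = 1.0 - p
    return math.comb(n, x) * p**x * q**(n - x)

two_of_four = binomial_pmf(2, 4, 0.48)      # slide value ~ 0.374
three_of_five = binomial_pmf(3, 5, 0.48)    # slide value ~ 0.299
dems_44_of_100 = binomial_pmf(44, 100, 0.48)  # slide value 0.05812

print(round(two_of_four, 3), round(three_of_five, 3), round(dems_44_of_100, 5))
```

Summing `binomial_pmf(x, n, p)` over x = 0 through n should give 1, which is a quick sanity check on any implementation.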

Caveats: n!/(x! (n-x)!) is the count of how many arrangements there are of a string of x letters “S” and n-x letters “F.” p^x q^(n-x) is the shared probability of each string of x letters “S” and n-x letters “F.” (Define 0! = 1, p^0 = q^0 = 1 and the formula goes through for every one of x = 0 through n.) The Binomial Coefficient “n choose x” is short for the arrangement count n!/(x! (n-x)!).

Normal Approx of Binomial. Poisson and its Normal Approx. Aspects of Random Sampling.

Normal Approx of Binomial: n = 10, p = 0.4, mean = n p = 4, sd = root(n p q) ~ 1.55
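To see how close the normal approximation is for n = 10, p = 0.4, one can compare an exact binomial probability with the matching normal area. This is a sketch only: the half-unit "continuity correction" used here is an assumption added for the illustration and is not stated on the slide.

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p = 10, 0.4
q = 1.0 - p
mean = n * p                  # 4
sd = math.sqrt(n * p * q)     # ~ 1.55

# Exact binomial probability of exactly 4 successes.
exact = math.comb(n, 4) * p**4 * q**(n - 4)

# Normal area over [3.5, 4.5] (continuity correction: an assumption of this sketch).
approx = phi((4.5 - mean) / sd) - phi((3.5 - mean) / sd)

print(round(mean, 2), round(sd, 2), round(exact, 4), round(approx, 4))
```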

Poisson Distribution Governing Counts of Rare Events: p(x) = e^(-mean) mean^x / x! for x = 0, 1, 2, ... ad infinitum

Poisson e.g. X = number of times the ace of spades turns up in 104 tries. X ~ Poisson with mean 104 (1/52) = 2. p(x) = e^(-mean) mean^x / x!, e.g. p(3) = e^(-2) 2^3 / 3! ~ 0.18

Poisson e.g. X = number of raisins in MY cookie. Batter has 400 raisins and makes 144 cookies. E X = 400/144 ~ 2.78 per cookie. p(x) = e^(-mean) mean^x / x!, e.g. p(2) = e^(-2.78) 2.78^2 / 2! ~ 0.24 (around 24% of cookies have 2 raisins).
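Both Poisson examples above can be reproduced with a short function. This is a minimal sketch; `poisson_pmf` is a hypothetical helper name, and the raisin mean is taken as the unrounded 400/144.

```python
import math

def poisson_pmf(x: int, mean: float) -> float:
    """p(x) = e^(-mean) mean^x / x! for x = 0, 1, 2, ..."""
    return math.exp(-mean) * mean**x / math.factorial(x)

ace_p3 = poisson_pmf(3, 2.0)         # ace of spades example, slide value ~ 0.18
raisin_mean = 400 / 144              # ~ 2.78 raisins per cookie
raisin_p2 = poisson_pmf(2, raisin_mean)  # slide value ~ 0.24
raisin_sd = math.sqrt(raisin_mean)   # Poisson sd = root(mean) ~ 1.67

print(round(ace_p3, 2), round(raisin_p2, 2), round(raisin_sd, 2))
```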

Poisson: THE FIRST BEST THING ABOUT THE POISSON IS THAT THE MEAN ALONE TELLS US THE ENTIRE DISTRIBUTION! Note: Poisson sd = root(mean).

400 raisins, 144 COOKIES: E X = 400/144 ~ 2.78 raisins per cookie, sd = root(mean) = 1.67 (for Poisson).

Poisson: THE SECOND BEST THING ABOUT THE POISSON IS THAT FOR A MEAN AS SMALL AS 3 THE NORMAL APPROXIMATION WORKS WELL. Here mean = 2.78 and sd = root(mean) = 1.67 (special to Poisson).

WE AVERAGE 127.8 ACCIDENTS PER MO. E X = 127.8 accidents. If Poisson then sd = root(127.8) = 11.3049 (sd = root(mean) is special to Poisson), and the approx dist is ~ Normal with mean 127.8 accidents and sd 11.3.
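One way to probe the normal approximation for the accident example is to ask how much exact Poisson probability lies within 2 SD of the mean, which the normal rule of thumb puts at about 95%. This is a sketch under that framing; the log-space formula with `math.lgamma` is used only to avoid floating-point overflow and is an implementation choice of this example.

```python
import math

def poisson_pmf(x: int, mean: float) -> float:
    """Poisson p(x), computed in log space so large counts do not overflow."""
    return math.exp(-mean + x * math.log(mean) - math.lgamma(x + 1))

mean = 127.8
sd = math.sqrt(mean)                 # ~ 11.3049, special to Poisson

low = math.ceil(mean - 2 * sd)       # smallest count within 2 SD of the mean
high = math.floor(mean + 2 * sd)     # largest count within 2 SD of the mean
within_2sd = sum(poisson_pmf(x, mean) for x in range(low, high + 1))

print(round(sd, 4), low, high, round(within_2sd, 3))
```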

Aspects of Random Sampling

THE GREAT TRICK OF STATISTICS: The overwhelming majority of samples of n from a population of N can stand in for the population. Population of N = 5: ATT, Sysco, Pepsico, GM, Dow. Sample of n = 2: ATT, Pepsico.

GREAT TRICK: SOME CAVEATS. Sample size n must be “large.” It works for only a few characteristics at a time, such as profit, sales, dividend. SPECTACULAR FAILURES MAY OCCUR! Population of N = 5: ATT 12, Sysco 21, Pepsi 42, GM 8, Dow 9. Sample of n = 2.

HOW ARE WE SAMPLING? With replacement. Population of N = 5: ATT 12, Sysco 21, Pepsi 42, GM 8, Dow 9. Sample of n = 2: Pepsi 42, Pepsi 42.

HOW ARE WE SAMPLING? With replacement vs without replacement. Population of N = 5: ATT 12, Sysco 21, Pepsi 42, GM 8, Dow 9. Sample of n = 2.

GREAT TRICK: SOME CAVEATS. This sample is obviously “not representative.” Population of N = 5: ATT 12, Sysco 21, Pepsi 42, GM 8, Dow 9. Sample of n = 2: Sysco 21, Pepsi 42.

DOES IT MAKE A DIFFERENCE? With vs without replacement: SAME? Rule of thumb: with and without replacement are about the same if root[(N-n)/(N-1)] ~ 1, for a population of N and a sample of n.
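The rule of thumb above is easy to evaluate for the populations used in these slides. This is a minimal sketch; the function name `replacement_factor` is made up for the illustration.

```python
import math

def replacement_factor(N: int, n: int) -> float:
    """root[(N - n) / (N - 1)]: near 1 means with and without replacement are about the same."""
    return math.sqrt((N - n) / (N - 1))

tiny = replacement_factor(5, 2)                # N = 5 companies, n = 2: noticeably below 1
huge = replacement_factor(500_000_000, 600)    # N = 500 million, n = 600: essentially 1

print(round(tiny, 3), round(huge, 7))
```

For the five-company population the factor is well below 1, so the sampling scheme matters there; for n = 600 out of 500 million it makes no practical difference.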

CORRECTION TO PAGE 25 OF TEXT. Population: ATT 12, IBM 42, AAA 9, Pepsi 42, GM 8, Dow 9. Sample: Pepsi 42, Pepsi 42. They would have you believe the population is {8, 9, 12, 42} and the sample is {42}. A SET is a collection of distinct entities. WE SAMPLE COMPANIES; NUMBERS COME WITH THEM.

THE ROLE OF RANDOM SAMPLING: IF THE OVERWHELMING MAJORITY OF SAMPLES ARE “GOOD SAMPLES” THEN WE CAN OBTAIN A “GOOD” SAMPLE BY RANDOM SELECTION.

HOW TO SAMPLE RANDOMLY? SELECTING A LETTER AT RANDOM
Digits are made to correspond to letters: a = 00-02, b = 03-05, …, z = 75-77.
Random digits then give random letters:
1559 9068 … (Table 14, pg. 809)
15 59 90 68 etc… (split into pairs)
f t * w etc… (take chosen letters; * marks a pair above 77, which matches no letter)
For samples without replacement just pass over any duplicates.
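The digit-to-letter scheme above can be sketched in code. This is an illustration under stated assumptions: the digit string is typed in by hand from the slide (1559 9068), pairs from 78 to 99 are simply skipped, and the function names are hypothetical.

```python
import string

def pair_to_letter(pair: str):
    """Map a two-digit string to a letter: 00-02 -> a, 03-05 -> b, ..., 75-77 -> z.
    Pairs 78-99 match no letter and return None."""
    value = int(pair)
    if value > 77:
        return None
    return string.ascii_lowercase[value // 3]

def letters_from_digits(digits: str, without_replacement: bool = False):
    """Split digits into pairs and return the chosen letters in order."""
    pairs = [digits[i:i + 2] for i in range(0, len(digits) - 1, 2)]
    chosen = []
    for pair in pairs:
        letter = pair_to_letter(pair)
        if letter is None:
            continue  # no letter for this pair; move on to the next one
        if without_replacement and letter in chosen:
            continue  # pass over duplicates when sampling without replacement
        chosen.append(letter)
    return chosen

sample = letters_from_digits("15599068")
print(sample)
```

In practice the digits would come from a random digit table or a random number generator rather than a fixed string.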

The Great Trick is far more powerful than we have seen. A typical sample closely estimates such things as a population mean or the shape of a population density. But it goes beyond this to reveal how much variation there is among sample means and sample densities. A typical sample not only estimates population quantities. It estimates the sample-to-sample variations of its own estimates.

EXAMPLE: ESTIMATING A MEAN. The average account balance is $421.34 for a random with-replacement sample of 50 accounts. We estimate from this sample that the average balance is $421.34 for all accounts. From this sample we also estimate and display a “margin of error”: $421.34 +/- $65.22, where the margin of error is 1.96 s / root(n) and s denotes the "sample standard deviation."

SAMPLE STANDARD DEVIATION NOTE: Sample standard deviation s may be calculated in several equivalent ways, some sensitive to rounding errors, even for n = 2.

EXAMPLE: MARGIN OF ERROR CALCULATION. The following margin of error calculation for n = 4 is only an illustration. A sample of four would not be regarded as large enough.
Profits per sale = {12.2, 15.3, 16.2, 12.8}. Mean = 14.125, s = 1.92765, root(4) = 2.
Margin of error = +/- 1.96 (1.92765 / 2). Report: 14.125 +/- 1.8891.
A precise interpretation of margin of error will be given later in the course, including the role of 1.96. The interval 14.125 +/- 1.8891 is called a “95% confidence interval for the population mean.”
We used: (12.2-14.125)^2 + (15.3-14.125)^2 + (16.2-14.125)^2 + (12.8-14.125)^2 = 11.1475.
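The arithmetic in this example can be reproduced step by step. This is a minimal sketch, assuming the sample standard deviation uses the n - 1 divisor (which is what reproduces s = 1.92765 from the sum of squares 11.1475); variable names are illustrative.

```python
import math
import statistics

profits = [12.2, 15.3, 16.2, 12.8]
n = len(profits)

mean = statistics.mean(profits)                    # 14.125
sum_sq = sum((x - mean) ** 2 for x in profits)     # 11.1475
s = math.sqrt(sum_sq / (n - 1))                    # sample sd with n - 1 divisor
margin = 1.96 * s / math.sqrt(n)                   # 1.96 (s / root(n))

print(round(mean, 3), round(sum_sq, 4), round(s, 5), round(margin, 4))
print(f"Report: {mean:.3f} +/- {margin:.4f}")
```

`statistics.stdev(profits)` applies the same n - 1 divisor and should agree with `s` up to rounding.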

EXAMPLE: ESTIMATING A PERCENTAGE. A random with-replacement sample of 50 stores participated in a test marketing. In 39 of these 50 stores (i.e. 78%) the new package design outsold the old package design. We estimate from this sample that 78% of all stores will sell more of new vs old. We also estimate a “margin of error” of +/- 11.5%. Figured in the Binomial setup: 1.96 root(pHAT qHAT)/root(n) = 1.96 root(.78 .22)/root(50) = 0.114823.
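The margin of error for a percentage follows the same pattern as for a mean. This is a small sketch of the calculation on the slide; `proportion_margin` is a hypothetical helper name.

```python
import math

def proportion_margin(successes: int, n: int, z: float = 1.96) -> float:
    """z * root(pHAT * qHAT) / root(n) for a sample proportion pHAT = successes / n."""
    p_hat = successes / n
    q_hat = 1.0 - p_hat
    return z * math.sqrt(p_hat * q_hat) / math.sqrt(n)

p_hat = 39 / 50                       # 0.78
margin = proportion_margin(39, 50)    # slide value 0.114823

print(f"{p_hat:.0%} +/- {margin:.1%}")
```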

SAMPLING ONLY 600 FROM 500 MILLION? A sample of only n = 600 from a population of N = 500 million (FINE resolution). Sample of n = 600: sample mean = 32.84. POP mean = 32.02. FINE resolution densities very close. population of N = 500,000 with a sample of n = 600
