
Raoul LePage Professor STATISTICS AND PROBABILITY stt.msu/~lepage click on STT315_F06



Presentation Transcript


  1. Raoul LePage Professor STATISTICS AND PROBABILITY www.stt.msu.edu/~lepage click on STT315_F06 Week 9-25-06 and some preparation for exam 2.

  2. Suggested exercises (solutions given in text): 3-33, 3-41, 3-42 (except b, c, h, m, n), 3-43, 3-49, 3-57 (except c, d), 3-59, 3-61, 3-63, 3-65. Textbook exercises are not comprehensive.

  3. PROBABILITY MODELS HAVING BROAD APPLICATION: NORMAL DISTRIBUTION, BERNOULLI TRIALS, BINOMIAL DISTRIBUTION, POISSON DISTRIBUTION

  4. NORMAL DISTRIBUTION: WHERE ARE THE MEAN AND STANDARD DEVIATION IN THIS PICTURE? [Figure: normal density curve; note the balance point and note the point of inflexion.]

  5. IQ DISTRIBUTION: ~NORMAL, MEAN 100, STANDARD DEVIATION 15. [Figure: IQ density with MEAN = 100 at the balance point; the point of inflexion lies SD = 15 from the mean.]

  6. DISTRIBUTION OF THE NUMBER OF HEADS IN 100 COIN TOSSES: APPROXIMATELY NORMAL, MEAN 50, STD DEVIATION 5. [Figure: curve centered at 50 with SD 5 marked.]

  7. DISTRIBUTION OF THE NUMBER OF ACCIDENTS IN ONE MONTH IF WE AVERAGE 39.7 PER MONTH: APPROXIMATELY NORMAL, MEAN 39.7, STD DEVIATION 6.3. [Figure: curve centered at 39.7 with SD 6.3 marked.]

  8. NORMAL DISTRIBUTIONS ARE ALIKE IN SD UNITS FROM THE MEAN: ~68% WITHIN 1 SD OF MEAN, ~95% WITHIN 2 SD OF MEAN. [Figure: illustrated for the standard normal, mean = 0, SD = 1; the central ~68% is shaded.]

  9. NORMAL DISTRIBUTIONS ARE ALIKE IN SD UNITS FROM THE MEAN: ~68% WITHIN 1 SD OF MEAN, ~95% WITHIN 2 SD OF MEAN. [Figure: illustrated for the standard normal, mean = 0, SD = 1; the central ~95% is shaded.]
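The ~68% and ~95% figures can be checked without a table. Below is a minimal Python sketch that relies on the standard identity P(|Z| < k) = erf(k / root(2)) for a standard normal Z. The helper name `prob_within` is ours and does not come from the course.

```python
import math

def prob_within(k: float) -> float:
    """P(|Z| < k) for a standard normal Z (mean 0, SD 1)."""
    return math.erf(k / math.sqrt(2))

print(round(prob_within(1), 4))  # 0.6827, i.e. ~68% within 1 SD
print(round(prob_within(2), 4))  # 0.9545, i.e. ~95% within 2 SD
```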

  10. IQ DISTRIBUTION: ~NORMAL, MEAN 100, STANDARD DEVIATION 15. [Figure: axis marked at 85, 100, 130 (SD = 15); shaded areas ~68/2 = 34% and ~95/2 = 47.5%.]


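  12. STANDARD SCORES CONVERT TO 0 MEAN, SD 1. [Figure: the IQ scale (mean 100, SD 15) lined up with the Z scale of the standard normal (mean 0, SD 1).]

  13. STANDARD SCORES CONVERT TO 0 MEAN, SD 1: $Z = (X - \text{mean})/\text{SD}$. For example, an IQ of 130 has $Z = (130 - 100)/15 = 2$.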
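Standardizing turns every normal question into a standard-normal question. The sketch below uses Python's `statistics.NormalDist` (Python 3.8+) and the slides' ~normal(100, 15) IQ model. `z_score` is a hypothetical helper. The same areas come out whether you work on the IQ scale or on the Z scale.

```python
from statistics import NormalDist

IQ_MEAN, IQ_SD = 100, 15

def z_score(x: float, mean: float = IQ_MEAN, sd: float = IQ_SD) -> float:
    """Standard score: how many SDs x lies from the mean."""
    return (x - mean) / sd

iq = NormalDist(IQ_MEAN, IQ_SD)
std_normal = NormalDist()  # mean 0, SD 1

print(z_score(130), z_score(85))  # 2.0 -1.0
# One SD below the mean, computed on the IQ scale
print(round(iq.cdf(100) - iq.cdf(85), 4))  # 0.3413
# Two SD above the mean, computed on the Z scale
print(round(std_normal.cdf(z_score(130)) - std_normal.cdf(z_score(100)), 4))  # 0.4772
```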

  14. Z-TABLE (CUT AND PASTE). P(Z > 0) = P(Z < 0) = 0.5. P(Z > 2.66) = 0.5 - P(0 < Z < 2.66) = 0.5 - 0.4961 = 0.0039. P(Z < 1.92) = 0.5 + P(0 < Z < 1.92) = 0.5 + 0.4726 = 0.9726.
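These table lookups can be reproduced with `statistics.NormalDist().cdf`, which returns P(Z < z) directly. This is a cross-check sketch, not the textbook's table method.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1

p_above = 1 - Z.cdf(2.66)  # P(Z > 2.66)
p_below = Z.cdf(1.92)      # P(Z < 1.92)
print(Z.cdf(0), round(p_above, 4), round(p_below, 4))  # 0.5 0.0039 0.9726
```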

  15. BERNOULLI DISTRIBUTION (0 < p < 1, q = 1 - p)

        x    p(x)
        1    p      (1 denotes “success”)
        0    q      (0 denotes “failure”)
             __
             1

  16. Notation: BERNOULLI RANDOM VARIABLE X. P(success) = P(X = 1) = p; P(failure) = P(X = 0) = q. E.g. X = “sample voter is Democrat”: the population has 48% Dem., so p = 0.48, q = 0.52, and P(X = 1) = 0.48.

  17. INDEPENDENT BERNOULLI-p TRIALS. "S" denotes success, "F" denotes failure. $P(S_1 S_2 F_3 F_4 F_5 F_6 S_7) = p^3 q^4$; just write $P(SSFFFFS) = p^3 q^4$. “The answer only depends upon how many of each, not their order.” E.g. 48% Dem, 5 sampled, with replacement: $P(\text{Dem Rep Dem Dem Rep}) = 0.48^3\, 0.52^2$.
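Only the counts matter, not the order. The Python sketch below shows this by multiplying the per-trial probabilities of a D/R string, which is valid under independence. The function `sequence_prob` is hypothetical.

```python
import math

def sequence_prob(seq: str, p: float, success: str = "D") -> float:
    """Probability of one specific sequence of independent Bernoulli-p trials."""
    q = 1 - p
    return math.prod(p if ch == success else q for ch in seq)

a = sequence_prob("DRDDR", 0.48)  # Dem Rep Dem Dem Rep
b = sequence_prob("DDDRR", 0.48)  # same counts, different order
print(round(a, 4))                # 0.0299
print(math.isclose(a, b), math.isclose(a, 0.48**3 * 0.52**2))  # True True
```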

  18. BINOMIAL DISTRIBUTION FOR THE TOTAL NUMBER OF SUCCESSES IN INDEPENDENT p-BERNOULLI TRIALS. E.g. P(exactly 2 Dems out of sample of 4) = P(DDRR) + P(DRDR) + P(DRRD) + P(RDDR) + P(RDRD) + P(RRDD) = $6 \times 0.48^2\, 0.52^2 \approx 0.374$. There are 6 ways to arrange 2 D and 2 R.

  19. BINOMIAL DISTRIBUTION FOR THE TOTAL NUMBER OF SUCCESSES IN INDEPENDENT p-BERNOULLI TRIALS. E.g. P(exactly 3 Dems out of sample of 5) = P(DDDRR) + P(DDRDR) + P(DDRRD) + P(DRDDR) + P(DRDRD) + P(DRRDD) + P(RDDDR) + P(RDDRD) + P(RDRDD) + P(RRDDD) = $10 \times 0.48^3\, 0.52^2 \approx 0.299$. There are 10 ways to arrange 3 D and 2 R, the same as the number of ways to select 3 from 5.

  20. COUNTING ARRANGEMENTS. There are 5! ways to arrange 5 things in a line. Count them another way (1:1 with arrangements): select 3 of the 5 to go first in line, arrange those 3 at the head of the line, then arrange the remaining 2 after them. So 5! = (ways to select 3 from 5) × 3! × 2!, and the number of ways must be 5!/(3! 2!) = 120/12 = 10.
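The counting argument can also be checked by brute force in Python. The sketch lists every ordering of D, D, D, R, R, discards repeats, and compares the result with 5!/(3! 2!). Note that `math.comb` needs Python 3.8+.

```python
import math
from itertools import permutations

# All 5! = 120 orderings of the letters; the set keeps only the distinct ones
arrangements = set(permutations("DDDRR"))

print(len(arrangements))                                             # 10
print(math.factorial(5) // (math.factorial(3) * math.factorial(2)))  # 10
print(math.comb(5, 3))                                               # 10
```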

  21. BINOMIAL FORMULA. Let random variable X denote the number of “S” in n independent Bernoulli p-trials. By definition, X has a Binomial Distribution, and for each of x = 0, 1, 2, …, n: $P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x}$. E.g. P(44 Dems in sample of 100 voters) = $\frac{100!}{44!\,56!}\, 0.48^{44}\, 0.52^{100-44} = 0.05812$.

  22. Caveats: The binomial coefficient $\binom{n}{x}$ is short for the arrangement count $n!/(x!\,(n-x)!)$, the number of arrangements of a string of x letters “S” and n-x letters “F.” The factor $p^x q^{n-x}$ is the shared probability of each such string. (Define $0! = 1$ and $p^0 = q^0 = 1$, and the formula goes through for every one of x = 0 through n.)
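The binomial formula translates directly into code. This is a minimal sketch that reproduces the examples above. The name `binomial_pmf` is ours. Python's `math.comb(n, 0) == 1` and `p**0 == 1.0` match the 0! = 1 and p^0 = 1 conventions.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p): arrangement count times p^x q^(n-x)."""
    q = 1 - p
    return math.comb(n, x) * p**x * q**(n - x)

print(round(binomial_pmf(2, 4, 0.48), 3))     # 0.374
print(round(binomial_pmf(3, 5, 0.48), 3))     # 0.299
print(round(binomial_pmf(44, 100, 0.48), 5))  # 0.05812
```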

  23. Normal Approx of Binomial; Poisson and its Normal Approx; Aspects of Random Sampling

  24. Normal Approx of Binomial, n = 10, p = 0.4: mean = n p = 4, sd = root(n p q) ~ 1.55

  25. Normal Approx of Binomial, n = 30, p = 0.4: mean = n p = 12, sd = root(n p q) ~ 2.683

  26. Normal Approx of Binomial, n = 100, p = 0.4: mean = n p = 40, sd = root(n p q) ~ 4.89898
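How good is the normal approximation in these three cases? The sketch below compares the exact binomial P(X ≤ np) with the normal curve of the same mean and sd. The +0.5 continuity correction is a standard refinement that the slides do not mention, so treat it as our assumption.

```python
import math
from statistics import NormalDist

def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

p = 0.4
results = {}
for n in (10, 30, 100):
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    k = round(mean)                              # ask for P(X <= mean)
    exact = binomial_cdf(k, n, p)
    approx = NormalDist(mean, sd).cdf(k + 0.5)   # continuity correction
    results[n] = (sd, exact, approx)
    print(f"n={n:3d} mean={mean:5.1f} sd={sd:.4f} exact={exact:.4f} normal={approx:.4f}")
```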

  27. Poisson Distribution, Governing Counts of Rare Events: $p(x) = e^{-\text{mean}}\,\text{mean}^{x}/x!$ for x = 0, 1, 2, … ad infinitum

  28. Poisson e.g. X = number of times the ace of spades turns up in 104 tries. X ~ Poisson with mean 2 (104 × 1/52 = 2). $p(x) = e^{-\text{mean}}\,\text{mean}^{x}/x!$; e.g. $p(3) = e^{-2}\, 2^3/3! \approx 0.18$.

  29. Poisson e.g. X = number of raisins in MY cookie. The batter has 400 raisins and makes 144 cookies, so E X = 400/144 ~ 2.78 per cookie. $p(x) = e^{-\text{mean}}\,\text{mean}^{x}/x!$; e.g. $p(2) = e^{-2.78}\, 2.78^2/2! \approx 0.24$ (around 24% of cookies have 2 raisins).
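Here is the Poisson formula in Python, applied to the two worked examples. For the ace of spades we also compute the exact Binomial(104, 1/52) probability. This assumes each of the 104 tries is an independent draw from a full 52-card deck, which is our assumption: the slide does not say how the tries are made.

```python
import math

def poisson_pmf(x: int, mean: float) -> float:
    """p(x) = e^(-mean) * mean^x / x!"""
    return math.exp(-mean) * mean**x / math.factorial(x)

ace = poisson_pmf(3, 2)              # ace of spades, mean 2
raisins = poisson_pmf(2, 400 / 144)  # raisins, mean ~2.78
print(round(ace, 2), round(raisins, 2))  # 0.18 0.24

# Exact count under the stated assumption: Binomial(104, 1/52)
exact_ace = math.comb(104, 3) * (1 / 52)**3 * (51 / 52)**101
print(round(exact_ace, 3), round(ace, 3))
```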

  30. Poisson: THE FIRST BEST THING ABOUT THE POISSON IS THAT THE MEAN ALONE TELLS US THE ENTIRE DISTRIBUTION! Note: Poisson sd = root(mean).

  31. 400 raisins, 144 COOKIES: E X = 400/144 ~ 2.78 raisins per cookie, sd = root(mean) = 1.67 (for Poisson).
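Why should raisin counts come out roughly Poisson, with sd = root(mean)? The small simulation below assumes each raisin lands in one of the 144 cookies independently and uniformly at random. That is a modeling assumption, not something the slides state. The batch count and seed are arbitrary choices.

```python
import random
import statistics

RAISINS, COOKIES, BATCHES = 400, 144, 200
rng = random.Random(315)  # fixed seed so reruns give the same numbers

counts = []
for _ in range(BATCHES):
    per_cookie = [0] * COOKIES
    for _ in range(RAISINS):
        per_cookie[rng.randrange(COOKIES)] += 1  # drop each raisin into a random cookie
    counts.extend(per_cookie)

mean = statistics.fmean(counts)  # exactly 400/144 per batch, up to float rounding
sd = statistics.pstdev(counts)   # should land near root(2.78) ~ 1.67
frac_two = counts.count(2) / len(counts)  # share of cookies with exactly 2 raisins
print(round(mean, 2), round(sd, 2), round(frac_two, 2))
```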

  32. Poisson: THE SECOND BEST THING ABOUT THE POISSON IS THAT FOR A MEAN AS SMALL AS 3 THE NORMAL APPROXIMATION WORKS WELL. [Figure: Poisson with mean 2.78 and its normal approximation; 1.67 = sd = root(mean), special to Poisson.]

  33. WE AVERAGE 127.8 ACCIDENTS PER MO.: E X = 127.8 accidents. If Poisson, then sd = root(127.8) = 11.3049, and the approximate distribution is normal with mean 127.8 and sd 11.3. [Figure: normal curve, mean 127.8 accidents, sd = root(mean) = 11.3, special to Poisson.]
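The sketch below checks the normal approximation for the accident count. It computes the exact Poisson probability of 117 to 139 accidents, a hypothetical range of roughly one sd either side of 127.8. It then compares that with the normal curve of mean 127.8 and sd root(127.8). The pmf is evaluated in log form to avoid overflow, and the ±0.5 continuity correction is our addition.

```python
import math
from statistics import NormalDist

MEAN = 127.8  # average accidents per month, from the slide

def poisson_pmf(x: int, mean: float) -> float:
    """Poisson p(x), computed via logs so large x and mean do not overflow."""
    return math.exp(-mean + x * math.log(mean) - math.lgamma(x + 1))

sd = math.sqrt(MEAN)
lo, hi = 117, 139  # hypothetical range, roughly mean +/- 1 sd
exact = sum(poisson_pmf(x, MEAN) for x in range(lo, hi + 1))
normal = NormalDist(MEAN, sd)
approx = normal.cdf(hi + 0.5) - normal.cdf(lo - 0.5)  # continuity correction

print(round(sd, 4))  # 11.3049
print(round(exact, 4), round(approx, 4))
```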

  34. Aspects of Random Sampling

  35. THE GREAT TRICK OF STATISTICS: The overwhelming majority of samples of n from a population of N can stand in for the population. [Figure: population of N = 5 (ATT, Sysco, Pepsico, GM, Dow); sample of n = 2.]

  36. THE GREAT TRICK OF STATISTICS: The overwhelming majority of samples of n from a population of N can stand in for the population. [Figure: population of N = 5 (ATT, Sysco, Pepsico, GM, Dow); sample of n = 2: ATT, Pepsico.]

  37. GREAT TRICK: SOME CAVEATS. Sample size n must be “large.” The trick holds for only a few characteristics at a time, such as profit, sales, dividend. SPECTACULAR FAILURES MAY OCCUR! [Figure: population of N = 5 (ATT 12, Sysco 21, Pepsi 42, GM 8, Dow 9); sample of n = 2.]

  38. HOW ARE WE SAMPLING? With replacement. [Figure: population of N = 5 (ATT 12, Sysco 21, Pepsi 42, GM 8, Dow 9); sample of n = 2: Pepsi 42, Pepsi 42.]

  39. HOW ARE WE SAMPLING? With replacement vs. without replacement. [Figure: population of N = 5 (ATT 12, Sysco 21, Pepsi 42, GM 8, Dow 9); sample of n = 2.]

  40. GREAT TRICK: SOME CAVEATS. This sample is obviously “not representative.” [Figure: population of N = 5 (ATT 12, Sysco 21, Pepsi 42, GM 8, Dow 9); sample of n = 2: Sysco 21, Pepsi 42.]

  41. DOES IT MAKE A DIFFERENCE? With vs. without replacement: SAME? Rule of thumb: with and without replacement are about the same if $\sqrt{(N-n)/(N-1)} \approx 1$. [Figure: population of N; sample of n.]
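The rule-of-thumb factor root[(N - n)/(N - 1)] is the finite population correction. Multiplying the with-replacement standard error of a sample mean by this factor gives the without-replacement standard error. The sketch evaluates the factor for the 5-company example and for a sample of 600 from 500 million. `fpc` is a hypothetical helper name.

```python
import math

def fpc(N: int, n: int) -> float:
    """Finite population correction root[(N - n) / (N - 1)]."""
    return math.sqrt((N - n) / (N - 1))

print(round(fpc(5, 2), 3))              # 0.866: with vs. without replacement differ
print(round(fpc(500_000_000, 600), 6))  # 0.999999: essentially the same
```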

  42. CORRECTION TO PAGE 25 OF TEXT. The text would have you believe the population is {8, 9, 12, 42} and the sample is {42}. But a SET is a collection of distinct entities. WE SAMPLE COMPANIES; THE NUMBERS COME WITH THEM. [Figure: population ATT 12, IBM 42, AAA 9, Pepsi 42, GM 8, Dow 9; sample Pepsi 42, Pepsi 42.]
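Here is the same point in Python terms. Collapsing the companies' numbers into a set loses the duplicates. Keeping the sampled companies, and the numbers that come with them, does not. The sketch uses the six companies from the slide.

```python
# Each company carries its number with it (values from the slide)
population = {"ATT": 12, "IBM": 42, "AAA": 9, "Pepsi": 42, "GM": 8, "Dow": 9}
sample = ["Pepsi", "Pepsi"]  # with-replacement sample of n = 2 companies

values_as_set = set(population.values())        # duplicates 42 and 9 collapse
sample_values = [population[c] for c in sample]  # both draws are kept

print(len(population), sorted(values_as_set))  # 6 [8, 9, 12, 42]
print(sample_values, set(sample_values))       # [42, 42] {42}
```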

  43. THE ROLE OF RANDOM SAMPLING: IF THE OVERWHELMING MAJORITY OF SAMPLES ARE “GOOD SAMPLES” THEN WE CAN OBTAIN A “GOOD” SAMPLE BY RANDOM SELECTION.

  44. HOW TO SAMPLE RANDOMLY? SELECTING A LETTER AT RANDOM. Digits are made to correspond to letters: a = 00-02, b = 03-05, …, z = 75-77. Random digits then give random letters: 1559 9068 … (Table 14, pg. 809) is split into pairs 15 59 90 68 etc…, and these give the chosen letters f t * w etc… (90 matches no letter). For samples without replacement, just pass over any duplicates.
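The digit-pair scheme is easy to mechanize. The sketch below reproduces the slide's mapping a = 00-02, …, z = 75-77 and skips pairs 78-99, which match no letter. It can also pass over duplicates when sampling without replacement. The function name is hypothetical, and only the digits printed on the slide are used.

```python
import string

def letters_from_digits(digits, without_replacement=False):
    """Turn a string of random digits into letters, two digits at a time."""
    digits = digits.replace(" ", "")
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        pair = int(digits[i:i + 2])
        if pair > 77:
            continue  # 78-99 match no letter ("*" on the slide): skip
        letter = string.ascii_lowercase[pair // 3]  # a = 00-02, b = 03-05, ..., z = 75-77
        if without_replacement and letter in chosen:
            continue  # pass over duplicates
        chosen.append(letter)
    return chosen

print(letters_from_digits("1559 9068"))  # ['f', 't', 'w']
```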

  45. The Great Trick is far more powerful than we have seen. A typical sample closely estimates such things as a population mean or the shape of a population density. But it goes beyond this to reveal how much variation there is among sample means and sample densities. A typical sample not only estimates population quantities; it estimates the sample-to-sample variation of its own estimates.

  46. EXAMPLE: ESTIMATING A MEAN. The average account balance is $421.34 for a random with-replacement sample of 50 accounts. From this sample we estimate that the average balance is $421.34 for all accounts. From this sample we also estimate and display a “margin of error”: $421.34 +/- $65.22 = sample mean +/- 1.96 s / root(n), where s denotes the "sample standard deviation."

  47. SAMPLE STANDARD DEVIATION: $s = \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n-1)}$. NOTE: Sample standard deviation s may be calculated in several equivalent ways, some of which are sensitive to rounding errors, even for n = 2.
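The slide does not list the alternative formulas, so we take two standard ones as an assumption. The first is the two-pass definition. The second is the one-pass shortcut root[(Σx² - n·mean²)/(n - 1)]. They agree algebraically, but in floating point the shortcut can lose everything to rounding, even for n = 2. The shifted data below are hypothetical, chosen only to expose the problem.

```python
import math

def s_two_pass(xs):
    """Sample SD: root( sum (x - mean)^2 / (n - 1) )."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

def s_shortcut(xs):
    """Same quantity via sum of squares minus n * mean^2 (rounding-sensitive)."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum(x * x for x in xs) - n * mean * mean
    return math.sqrt(max(ss, 0.0) / (n - 1))  # rounding can even make ss negative

profits = [12.2, 15.3, 16.2, 12.8]
print(round(s_two_pass(profits), 5), round(s_shortcut(profits), 5))  # 1.92765 1.92765

shifted = [1e9 + 0.1, 1e9 + 0.3]  # hypothetical data; true s = 0.2 / root(2) ~ 0.1414
print(s_two_pass(shifted), s_shortcut(shifted))
```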

  48. EXAMPLE: MARGIN OF ERROR CALCULATION. The following margin of error calculation for n = 4 is only an illustration; a sample of four would not be regarded as large enough. Profits per sale = {12.2, 15.3, 16.2, 12.8}. Mean = 14.125, s = 1.92765, root(4) = 2. Margin of error = +/- 1.96 (1.92765 / 2). Report: 14.125 +/- 1.8891. A precise interpretation of margin of error, including the role of 1.96, will be given later in the course. The interval 14.125 +/- 1.8891 is called a “95% confidence interval for the population mean.” We used: $(12.2-14.125)^2 + (15.3-14.125)^2 + (16.2-14.125)^2 + (12.8-14.125)^2 = 11.1475$.
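This reproduces the arithmetic with Python's `statistics` module. `stdev` uses the n - 1 divisor, which matches s = root(11.1475 / 3). The sketch is meant for checking hand calculations, not as a general-purpose routine.

```python
import math
import statistics

profits = [12.2, 15.3, 16.2, 12.8]
n = len(profits)
mean = statistics.mean(profits)
s = statistics.stdev(profits)     # sample SD, divisor n - 1
margin = 1.96 * s / math.sqrt(n)

print(round(mean, 3), round(s, 5))     # 14.125 1.92765
print(f"{mean:.3f} +/- {margin:.4f}")  # 14.125 +/- 1.8891
```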

  49. EXAMPLE: ESTIMATING A PERCENTAGE. A random with-replacement sample of 50 stores participated in a test marketing. In 39 of these 50 stores (i.e. 78%) the new package design outsold the old package design. We estimate from this sample that 78% of all stores will sell more of new vs. old. We also estimate a “margin of error” of +/- 11.5%, figured in the Binomial setup as 1.96 root(pHAT qHAT)/root(n) = 1.96 root(.78 × .22)/root(50) = 0.114823.
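The percentage case follows the same margin-of-error recipe, with the sample proportion in place of the sample mean. Here is a sketch that reproduces the numbers above.

```python
import math

successes, n = 39, 50
p_hat = successes / n  # 0.78
q_hat = 1 - p_hat
margin = 1.96 * math.sqrt(p_hat * q_hat) / math.sqrt(n)

print(p_hat, round(margin, 6))  # 0.78 0.114823
```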

  50. SAMPLING ONLY 600 FROM 500 MILLION? A sample of only n = 600 from a population of N = 500 million (FINE resolution). [Figure: population of N = 500,000 with a sample of n = 600; sample mean = 32.84, POP mean = 32.02; at FINE resolution the sample and population densities are very close.]
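Here is a simulation sketch of the Great Trick at this scale. The population is hypothetical: 500,000 exponential values with mean about 32, not the data behind the slide. We draw many with-replacement samples of n = 600. For each one we check whether the interval sample mean +/- 1.96 s / root(n) captures the population mean. This is the sense in which a typical sample estimates its own sample-to-sample variation. The seed and repetition count are arbitrary.

```python
import math
import random

rng = random.Random(2006)
N, n, REPS = 500_000, 600, 500

# Hypothetical skewed population; NOT the data behind the slide
population = [rng.expovariate(1 / 32) for _ in range(N)]
pop_mean = sum(population) / N

def mean_and_s(xs):
    """Sample mean and sample SD (divisor n - 1), computed in two passes."""
    m = sum(xs) / len(xs)
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

covered = 0
for _ in range(REPS):
    sample = rng.choices(population, k=n)  # random sample with replacement
    m, s = mean_and_s(sample)
    if abs(m - pop_mean) <= 1.96 * s / math.sqrt(n):
        covered += 1

print(round(pop_mean, 2), covered / REPS)  # coverage typically lands near 0.95
```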
