Statistical Sampling


Presentation Transcript


  1. Statistical Sampling

  2. Population vs. Sample • Population • The collection of units (be they people, plants, cities, etc.) to which we want to generalize a set of findings or a statistical model • Sample • A smaller (but hopefully representative) collection of units from a population, used to draw conclusions about that population

  3. Population vs. Sample

  4. Sample vs. Population

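  5. Standard Error • Central Limit Theorem: if we draw many samples of a sufficiently large size, the sample means are approximately normally distributed around the population mean, so we can estimate the population mean without measuring the entire population. • The standard error is the variation across sample means: standard error = standard deviation / √(sample size), i.e. $SE = \sigma / \sqrt{n}$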
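A small simulation can make the standard-error formula concrete. The sketch below assumes a hypothetical right-skewed population of 10,000 exponential values. It draws 2,000 samples of size n = 30 with replacement and compares the spread of the sample means with $\sigma/\sqrt{n}$. The function name `simulate_sample_means` is illustrative only.

```python
import math
import random
import statistics

def simulate_sample_means(population, n, repetitions, seed=0):
    """Draw `repetitions` random samples of size n (with replacement); return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(repetitions)]

# Hypothetical right-skewed population: 10,000 exponential values with mean near 10
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1 / 10) for _ in range(10_000)]

mu = statistics.fmean(population)       # population mean
sigma = statistics.pstdev(population)   # population SD (divides by N)
n = 30

means = simulate_sample_means(population, n, repetitions=2_000)
predicted_se = sigma / math.sqrt(n)     # SE = sigma / sqrt(n)
observed_se = statistics.stdev(means)   # actual spread of the simulated sample means

print(f"population mean {mu:.2f}, average sample mean {statistics.fmean(means):.2f}")
print(f"predicted SE {predicted_se:.3f}, observed SE {observed_se:.3f}")
```

Sampling with replacement keeps the draws independent, which is the setting where $\sigma/\sqrt{n}$ holds exactly. Sampling a large share of a finite population without replacement would make the spread of the means somewhat smaller.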

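  6. Population and Sample Mean • Population: $X_1, X_2, \ldots, X_N$, with population mean $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$ • Sample: $x_1, x_2, \ldots, x_n$, with sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$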
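Both means apply the same arithmetic operation to different sets of units. Here is a minimal Python sketch. It assumes a hypothetical population made of the integers 1 to 100 and a simple random sample of 10 units drawn without replacement.

```python
import random

def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    values = list(values)
    return sum(values) / len(values)

population = list(range(1, 101))       # hypothetical population X_1..X_N, N = 100
mu = mean(population)                  # population mean (50.5 here)

rng = random.Random(1)
sample = rng.sample(population, k=10)  # simple random sample x_1..x_n, n = 10, no replacement
x_bar = mean(sample)                   # sample mean, used as an estimate of mu
print(mu, x_bar)
```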

  7. Sample Size and Error

  8. Stratified Sampling (example strata: size of company, location)
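One common way to carry out stratified sampling is proportional allocation: split the population into strata, then draw the same fraction at random from each one. The sketch below assumes hypothetical company records tagged with a size and a location. The `stratified_sample` helper and the data are illustrative and do not come from the slides.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=0):
    """Proportional stratified sample: group units by stratum, then draw the same
    fraction (rounded, at least one unit) from every stratum without replacement."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for key in sorted(strata):          # fixed order so results are reproducible
        members = strata[key]
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical companies (id, size, location): 60 small/urban, 60 small/rural,
# 40 large/urban, 40 large/rural
companies = [
    (f"c{i}", "small" if i < 120 else "large", "urban" if i % 2 == 0 else "rural")
    for i in range(200)
]

# Stratify on both size of company and location, sampling 10% of each stratum
sample = stratified_sample(companies, lambda c: (c[1], c[2]), fraction=0.10)
print(len(sample))
```

Each stratum is represented in proportion to its size (6, 6, 4 and 4 companies here). A simple random sample of 20 could, by chance, under-represent one group.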

  9. n = 30

  10. Normal Distribution

  11. Z-Score

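  12. Z-Score and Standard Deviation • A z-score gives the position of a score X within its distribution: $z = (X - \mu)/\sigma$ • Two distributions of exam scores: for both, $\mu = 70$, but for one $\sigma = 3$ and for the other $\sigma = 12$. The position of X = 76 is very different for these two distributions: • $\sigma = 12$: $z = (76 - 70)/12 = 0.5$ • $\sigma = 3$: $z = (76 - 70)/3 = 2$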
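The exam-score comparison reduces to a one-line function. This Python sketch reproduces the two z-scores from the slide. The name `z_score` is a hypothetical helper, not part of any library.

```python
def z_score(x, mu, sigma):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# The exam-score example: same mean (70), different standard deviations
print(z_score(76, 70, 12))  # 0.5
print(z_score(76, 70, 3))   # 2.0
```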

  13. Z and t distribution

  14. Normal Distribution

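  15. Outliers (plots: Offense, Defense)

      > off.mean
      [1] 23.41875
      > off.sd
      [1] 4.361373
      > off.mean + 3*off.sd
      [1] 36.50287
      > max(nfl$OffPtsA)
      [1] 37.9

  The maximum value (37.9) lies above the mean plus three standard deviations (36.50287), so it is flagged as a potential outlier.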
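The three-standard-deviation rule in the R session can also be written as a small reusable check. This Python sketch is a hypothetical `three_sd_outliers` helper. It first re-checks the slide's summary numbers, then applies the rule to made-up data, since the `nfl` data set itself is not reproduced here.

```python
import statistics

def three_sd_outliers(values, k=3.0):
    """Return the values lying more than k sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator), as R's sd() uses
    lower, upper = mean - k * sd, mean + k * sd
    return [v for v in values if v < lower or v > upper]

# Re-check the slide's summary numbers: is the maximum above mean + 3 * SD?
upper_cutoff = 23.41875 + 3 * 4.361373
print(round(upper_cutoff, 5), 37.9 > upper_cutoff)

# Hypothetical data: twenty 10s and a single 50
print(three_sd_outliers([10] * 20 + [50]))  # [50]
```

In small samples, a single extreme point inflates the standard deviation it is measured against. The fixed k = 3 cutoff can therefore miss outliers that the IQR-based box-plot rule would catch.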

  16. Indexing
  • nfl[2,3]: second team, third stat
  • nfl[2,]: all stats for the second team
  • nfl[c(1,2,5),]: first, second and fifth teams
  • nfl[10:13,]: tenth through thirteenth teams
  • nfl[-2,]: stats for all teams except the second

      # Remove a data point (row): copy nfl without its tenth team
      nfl2 <- nfl[-10, ]

  17. Box-and-Whisker Plot A box-and-whisker plot (sometimes called a boxplot) is a graph that displays a five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. It does not show a distribution in as much detail as a stem-and-leaf plot or histogram, but it is especially useful for indicating whether a distribution is skewed and whether the data set contains potential unusual observations (outliers).

  18. Box Plot • Q1 = 25th percentile • Q3 = 75th percentile • Interquartile range: IQR = Q3 − Q1 • Upper limit for the whisker: Q3 + 1.5 × IQR
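The quantities on these box-plot slides can be computed directly. Here is a minimal Python sketch with some stated assumptions:

* It uses the `method="inclusive"` quartile convention of `statistics.quantiles` (Python 3.8+).
* Quartile conventions differ between tools. R's `boxplot()` uses Tukey's hinges, so Q1 and Q3 can differ slightly on small data sets.
* The lower fence, Q1 − 1.5 × IQR, is included as the usual counterpart of the upper limit shown on the slide.
* `box_plot_stats` is a hypothetical helper name.

```python
import statistics

def box_plot_stats(data):
    """Five-number summary plus the 1.5 x IQR fences and the points outside them."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "min": min(data), "q1": q1, "median": median, "q3": q3, "max": max(data),
        "iqr": iqr, "lower_fence": lower_fence, "upper_fence": upper_fence,
        "outliers": outliers,
    }

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]          # hypothetical data with one extreme value
stats = box_plot_stats(data)
print(stats["q1"], stats["median"], stats["q3"])  # 3.25 5.5 7.75
print(stats["upper_fence"], stats["outliers"])    # 14.5 [100]
```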

  19. Box Plot
