Statistical Sampling
Population vs. Sample
• Population
  • The collection of units (be they people, plants, cities, etc.) to which we want to generalize a set of findings or a statistical model
• Sample
  • A smaller (but hopefully representative) collection of units from a population, used to determine truths about that population
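The distinction above can be sketched in R. This is only an illustration: the population is simulated (10,000 invented exam scores), and the names `population` and `smp` are assumptions, not anything from the slides.

```r
# Hypothetical population: 10,000 simulated exam scores (not from the slides)
set.seed(1)
population <- rnorm(10000, mean = 70, sd = 12)

# A simple random sample of 100 units, drawn without replacement
smp <- sample(population, size = 100)

mean(population)  # population mean (normally unknown in practice)
mean(smp)         # sample mean, used as an estimate of the population mean
```

With a representative sample, the two means should be close, though they will rarely be identical.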
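Standard Error
• Central Limit Theorem: as the sample size grows, the distribution of sample means approaches a normal distribution centered on the population mean, so we can estimate the population mean without having to sample the entire population
• Standard error: the variation across sample means
• Standard Error = standard deviation / square root of the sample size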
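A small simulation can make the standard error formula concrete. The sketch below assumes a normal population with mean 70 and standard deviation 12 (illustrative values) and draws many samples of size 36. The spread of the resulting sample means should land near the standard deviation divided by the square root of the sample size.

```r
set.seed(42)
sigma <- 12   # assumed population standard deviation
n     <- 36   # size of each sample

# Draw 5,000 samples and record the mean of each one
sample_means <- replicate(5000, mean(rnorm(n, mean = 70, sd = sigma)))

sd(sample_means)   # empirical variation across sample means
sigma / sqrt(n)    # standard error from the formula: 12 / 6 = 2
mean(sample_means) # centers near the population mean of 70
```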
Population and Sample Mean
• Population: X1, X2, …, XN, with population mean μ
• Sample: x1, x2, …, xn, with sample mean x̄
Stratified Sampling
• Example strata: size of company, location
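One way stratified sampling by company size and location might look in R is sketched below. The `companies` data frame, its column names, and the 10% sampling fraction are all invented for illustration. The idea is to split the sampling frame into strata and take a simple random sample within each one.

```r
set.seed(7)
# Hypothetical sampling frame of 200 companies (invented for illustration)
companies <- data.frame(
  id       = 1:200,
  size     = rep(c("small", "medium", "large"), times = c(120, 60, 20)),
  location = rep(c("urban", "rural"), length.out = 200)
)

# One stratum per size/location combination
strata <- split(companies, list(companies$size, companies$location), drop = TRUE)

# Draw 10% of each stratum (at least one unit) by simple random sampling
stratified <- do.call(rbind, lapply(strata, function(s) {
  k <- max(1, round(0.10 * nrow(s)))
  s[sample(nrow(s), k), ]
}))

# Every stratum is represented in the result
table(stratified$size, stratified$location)
```

Sampling within strata guarantees that small groups (here, large companies) appear in the sample, which a simple random sample of the whole frame does not.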
Z-Score and Standard Deviation
• Two distributions of exam scores. For both distributions, μ = 70, but for one distribution, σ = 3, and for the other, σ = 12. The position of X = 76 is very different for these two distributions.
• σ = 12: Z = (76 - 70)/12 = 0.5
• σ = 3: Z = (76 - 70)/3 = 2
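The two calculations on this slide follow the usual z-score definition, Z = (X - μ)/σ. A minimal R sketch is below. The helper name `z_score` is an assumption for illustration, not a built-in function.

```r
# Z-score: how many standard deviations x lies from the mean
z_score <- function(x, mu, sigma) (x - mu) / sigma

z_score(76, mu = 70, sigma = 12)  # 0.5
z_score(76, mu = 70, sigma = 3)   # 2
```

The same raw score of 76 is unremarkable when scores are widely spread (σ = 12) and well above average when they are tightly clustered (σ = 3).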
Outliers
Plot panels: Offense, Defense
> off.mean
[1] 23.41875
> off.sd
[1] 4.361373
> off.mean + 3*off.sd
[1] 36.50287
> max(nfl$OffPtsA)
[1] 37.9
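In the output above, the maximum of 37.9 exceeds the mean plus three standard deviations (36.50287), so that team would be flagged as an outlier under a three-standard-deviation rule. The same check might be written as below. The `pts` vector is invented for illustration and is not the slides' `nfl` data.

```r
# Hypothetical points-per-game values (invented; not the slides' nfl data)
pts <- c(21.5, 24.0, 19.8, 26.1, 22.7, 23.3, 20.9, 25.4, 18.6, 24.8,
         22.1, 23.9, 21.0, 26.7, 20.2, 24.4, 19.1, 25.0, 22.9, 60.0)

# Cutoffs three standard deviations above and below the mean
cutoff_hi <- mean(pts) + 3 * sd(pts)
cutoff_lo <- mean(pts) - 3 * sd(pts)

# Values more than three standard deviations from the mean
outliers <- pts[pts > cutoff_hi | pts < cutoff_lo]
outliers
```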
Indexing
• nfl[2,3]: second team, third stat
• nfl[2,]: set of all stats for the second team
• nfl[c(1,2,5),]: first, second and fifth teams
• nfl[10:13,]: tenth through thirteenth teams
• nfl[-2,]: stats for all teams except the second
# remove a data point / row
nfl2 <- nfl
nfl2 <- nfl2[-10,]
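To try these indexing forms without the original data, a small stand-in data frame can be used. The sketch below is hypothetical throughout: the values, the `Team` and `DefPtsA` columns, and the 13-row size are invented. Only the names `nfl`, `nfl2`, and `OffPtsA` come from the slides.

```r
# Hypothetical stand-in for the slides' nfl data frame (values invented)
nfl <- data.frame(
  Team    = paste0("Team", 1:13),
  OffPtsA = c(23.4, 19.8, 27.1, 21.0, 25.6, 18.9, 22.3,
              24.7, 20.5, 37.9, 26.2, 21.8, 23.0),
  DefPtsA = c(21.2, 24.5, 19.9, 22.8, 20.1, 25.3, 23.6,
              18.7, 24.0, 22.2, 19.5, 23.1, 21.7)
)

nfl[2, 3]          # row 2, column 3: a single value
nfl[c(1, 2, 5), ]  # rows 1, 2 and 5, all columns
nfl[10:13, ]       # rows 10 through 13

# A negative index drops that row; the original data frame is unchanged
nfl2 <- nfl[-10, ]
nrow(nfl)   # 13
nrow(nfl2)  # 12
```

Working on a copy such as `nfl2` keeps the full data available if the removed row is needed again.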
Box-and-Whisker Plot
A box-and-whisker plot (sometimes called a boxplot) is a graph that presents information from a five-number summary. It does not show a distribution in as much detail as a stem-and-leaf plot or histogram does, but it is especially useful for indicating whether a distribution is skewed and whether there are potential unusual observations (outliers) in the data set.
Box Plot
• Q3 + 1.5×IQR: upper cutoff for potential outliers
• Q3: 75th percentile
• Q1: 25th percentile
• IQR = Q3 – Q1
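The quantities on this slide can be computed directly in R. The sketch below uses an invented vector `x`. The lower cutoff, Q1 - 1.5×IQR, is the standard counterpart of the upper cutoff shown on the slide and is included here as an assumption about the intended rule.

```r
# Hypothetical data (invented for illustration)
x <- c(12, 15, 17, 18, 20, 21, 22, 24, 25, 27, 45)

q1  <- unname(quantile(x, 0.25))  # 25th percentile
q3  <- unname(quantile(x, 0.75))  # 75th percentile
iqr <- q3 - q1                    # interquartile range, same as IQR(x)

upper <- q3 + 1.5 * iqr
lower <- q1 - 1.5 * iqr

# Points beyond the cutoffs are potential outliers
flagged <- x[x > upper | x < lower]
flagged

# boxplot.stats() applies the same 1.5 x IQR rule using Tukey's hinges,
# which can differ slightly from quantile() on other data sets
boxplot.stats(x)$out
```

Calling `boxplot(x)` would draw the corresponding plot, with flagged points shown individually beyond the whiskers.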