populations vs. samples

populations vs. samples • we want to describe both samples and populations • the latter is a matter of inference…

“outliers” • minority cases, so different from the majority that they merit separate consideration • are they errors? • are they indicative of a different pattern? • think about possible outliers with care, but beware of mechanical treatments… • significance of outliers depends on your research interests

summaries of distributions • graphic vs. numeric • graphic may be better for visualization • numeric are better for statistical/inferential purposes • resistance to outliers is usually an advantage in either case

general characteristics • kurtosis [“peakedness”] • ‘leptokurtic’: sharply peaked • ‘platykurtic’: flat-topped

skew (skewness) • right (positive) skew • left (negative) skew
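Skew and kurtosis can also be expressed numerically. The following is a minimal sketch using the common moment-based definitions; the helper names `skewness` and `excess_kurtosis` and the data vectors are made up for illustration (base R has no built-in functions with these names).

```r
# Moment-based shape summaries (illustrative helpers, not base R functions).
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5        # > 0: right skew, < 0: left skew
}

excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3      # > 0: leptokurtic, < 0: platykurtic
}

# Hypothetical data: one with a long right tail, one symmetric.
x_right <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 20)
x_sym   <- c(1, 2, 3, 4, 5)

skewness(x_right)        # positive: long tail toward high values
skewness(x_sym)          # 0: symmetric
excess_kurtosis(x_sym)   # negative: flatter than a normal distribution
```

With very small samples these values are unstable, so they are best read alongside a plot of the distribution.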

central tendency • measures of central tendency • provide a sense of the value expressed by multiple cases, over all… • mean • median • mode

mean • center of gravity • evenly partitions the sum of all measurements among all cases; average of all measures

mean – pro and con • crucial for inferential statistics • mean is not very resistant to outliers • a “trimmed mean” may be better for descriptive purposes

mean R: mean(x)

trimmed mean R: mean(x, trim=.1)
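To see why a trimmed mean can be the better descriptive summary, here is a small sketch with made-up measurements (in mm) where one case is far larger than the rest. The data are hypothetical.

```r
# Hypothetical lengths in mm; 95 is an outlier.
x <- c(18, 20, 21, 22, 23, 24, 25, 26, 27, 95)

mean(x)              # 30.1, pulled upward by the single outlier
mean(x, trim = 0.1)  # 23.5, drops the lowest and highest 10% of cases first
```

With ten cases, `trim = 0.1` removes one case from each end before averaging, so the outlier no longer affects the result.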

median • 50th percentile… • less useful for inferential purposes • more resistant to effects of outliers…

median
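The resistance of the median can be checked directly. This is a sketch with hypothetical data: dropping one extreme case moves the mean a great deal and the median hardly at all.

```r
# Hypothetical lengths in mm; 95 is an outlier.
x <- c(18, 20, 21, 22, 23, 24, 25, 26, 27, 95)
x_clean <- x[x != 95]          # same data with the outlier removed

median(x)                      # 23.5
unname(quantile(x, 0.5))       # 23.5, the median is the 50th percentile
median(x_clean)                # 23

mean(x) - mean(x_clean)        # the mean shifts by about 7 mm
median(x) - median(x_clean)    # the median shifts by 0.5 mm
```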

mode • the most numerous category • for ratio data, often implies that data have been grouped in some way • can be more or less created by the grouping procedure • for theoretical distributions—simply the location of the peak on the frequency distribution

[figure: frequencies of settlement categories (isolated scatters, hamlets, villages, regional centers); modal class = ‘hamlets’]
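Base R has no function that returns the statistical mode (`mode()` reports an object's storage type), so the modal category is usually read off a frequency table. The sketch below uses made-up site types and measurements; `modal_class` is a hypothetical helper. The second part illustrates how the modal class of grouped ratio data depends on where the class boundaries fall.

```r
# Hypothetical site classifications.
site_type <- c("isolated scatter", "hamlet", "hamlet", "village", "hamlet",
               "regional center", "village", "isolated scatter", "hamlet")
counts <- table(site_type)
names(counts)[counts == max(counts)]   # "hamlet"

# Hypothetical helper: modal class of ratio data after grouping with cut().
modal_class <- function(x, breaks) {
  counts <- table(cut(x, breaks = breaks))
  names(counts)[counts == max(counts)]
}

lengths_mm <- c(11, 14, 16, 17, 18, 19, 21, 22, 23, 28)  # made-up values
modal_class(lengths_mm, breaks = c(10, 20, 30))      # "(10,20]"
modal_class(lengths_mm, breaks = c(8, 16, 24, 32))   # "(16,24]"
```

The same ten measurements give a different modal class once the boundaries shift, which is the sense in which the mode can be created by the grouping procedure.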

dispersion • measures of dispersion • summarize degree of clustering of cases, esp. with respect to central tendency… • range • variance • standard deviation

range R: range(x) • would be better to use midspread (interquartile range)…
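A short sketch of why the midspread is often preferred to the range, using hypothetical data with one outlier. `IQR()` is the base R function for the interquartile range, computed with R's default quantile method.

```r
# Hypothetical lengths in mm; 95 is an outlier.
x <- c(18, 20, 21, 22, 23, 24, 25, 26, 27, 95)
x_clean <- x[x != 95]

range(x)              # 18 95
diff(range(x))        # 77, determined entirely by the two most extreme cases
diff(range(x_clean))  # 9 once the outlier is removed

IQR(x)                # 4.5
IQR(x_clean)          # 4, barely changed by the outlier
```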

variance R: var(x) • analogous to average deviation of cases from mean • in fact, based on sum of squared deviations from the mean (“sum-of-squares”)
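The sum-of-squares idea can be worked through by hand and compared with `var(x)`. The data below are hypothetical. Note that `var()` divides by n - 1 (the sample variance), not by n, so it is only approximately an "average" squared deviation.

```r
# Hypothetical lengths in mm.
x <- c(14, 17, 19, 21, 22, 23, 24, 26, 28, 32)
n <- length(x)

dev <- x - mean(x)     # deviation of each case from the mean
ss  <- sum(dev^2)      # sum of squared deviations ("sum-of-squares")

ss / (n - 1)           # sample variance computed by hand
var(x)                 # same value from the built-in function
```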

variance • computational form:
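The formula itself did not survive on this slide. As a reference, the standard definitional and computational forms are given below, assuming the sample variance with an n - 1 denominator (which is what R's `var(x)` computes); the original slide may have used a different notation or denominator.

```latex
s^2 \;=\; \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
    \;=\; \frac{\sum_{i=1}^{n} x_i^2 \;-\; \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}
```

The right-hand version needs only the sum of the values and the sum of their squares, which is why it is called the computational form.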

note: units of variance are squared… • this makes variance hard to interpret • ex.: projectile point sample: mean = 22.6 mm, variance = 38 mm² • what does this mean???

standard deviation • square root of variance:
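The formula is missing from this slide. Under the same assumption of a sample variance with an n - 1 denominator (the convention R's `sd(x)` follows), it would read:

```latex
s \;=\; \sqrt{s^2} \;=\; \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```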

standard deviation • units are in same units as base measurements • ex.: projectile point sample: mean = 22.6 mm, standard deviation = 6.2 mm • mean +/- sd (16.4 to 28.8 mm) • should give at least some intuitive sense of where most of the cases lie, barring major effects of outliers
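A brief sketch of the mean +/- sd interval in R. The values are made up for illustration and are not the projectile point sample quoted above.

```r
# Hypothetical lengths in mm (not the sample from the slide).
x <- c(14, 17, 19, 21, 22, 23, 24, 26, 28, 32)

m <- mean(x)
s <- sd(x)            # same as sqrt(var(x)); units are mm, not mm squared

c(lower = m - s, upper = m + s)      # interval of mean +/- one sd
mean(x >= m - s & x <= m + s)        # share of cases falling inside it
```

Here a little over half of the cases fall within one standard deviation of the mean; with strongly skewed data or outliers the interval can be a much poorer guide.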

populations vs. samples