populations vs. samples • we want to describe both samples and populations • the latter is a matter of inference…
“outliers” • minority cases, so different from the majority that they merit separate consideration • are they errors? • are they indicative of a different pattern? • think about possible outliers with care, but beware of mechanical treatments… • significance of outliers depends on your research interests
summaries of distributions • graphic vs. numeric • graphic may be better for visualization • numeric are better for statistical/inferential purposes • resistance to outliers is usually an advantage in either case
general characteristics • kurtosis (“peakedness”) • ‘leptokurtic’: sharply peaked • ‘platykurtic’: flat-topped
skew (skewness) • right (positive) skew: long tail toward high values • left (negative) skew: long tail toward low values
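Base R has no built-in skewness or kurtosis function, so the sketch below defines two hypothetical helpers (`skewness`, `excess_kurtosis`) from the usual moment formulas. The data are made up, and other definitions (with small-sample corrections) exist.

```r
# Hypothetical helpers, not part of base R: moment-based shape measures.
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3/2)
}
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3   # 0 for a normal curve
}

# Made-up measurements with a long tail toward high values
x <- c(2, 3, 3, 4, 4, 4, 5, 5, 6, 15)

skewness(x)         # positive: right (positive) skew
skewness(-x)        # negative: left (negative) skew
excess_kurtosis(x)  # > 0 suggests leptokurtic, < 0 platykurtic
```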
central tendency • measures of central tendency • provide a sense of the value expressed by multiple cases, overall… • mean • median • mode
mean • center of gravity • evenly partitions the sum of all measurements among all cases; average of all measures
mean – pro and con • crucial for inferential statistics • mean is not very resistant to outliers • a “trimmed mean” may be better for descriptive purposes
mean R: mean(x)
trimmed mean R: mean(x, trim=.1)
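A small illustration of why the trimmed mean is more resistant, using made-up lengths (in mm) with one deliberate outlier. With `trim = 0.1` and ten cases, R drops one case from each end before averaging.

```r
# Made-up lengths in mm; the last value is a deliberate outlier
x <- c(18, 20, 21, 22, 23, 24, 25, 26, 27, 94)

mean(x)              # 30   (pulled upward by the single extreme case)
mean(x, trim = 0.1)  # 23.5 (lowest and highest case dropped first)
```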
median • 50th percentile… • less useful for inferential purposes • more resistant to effects of outliers…
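R: `median(x)`. A sketch on the same kind of made-up data, comparing how far the median and the mean move when a single outlier is removed.

```r
# Made-up lengths in mm; the last value is a deliberate outlier
x <- c(18, 20, 21, 22, 23, 24, 25, 26, 27, 94)
no_outlier <- x[-length(x)]   # same data without the extreme case

median(x)           # 23.5
quantile(x, 0.5)    # the same value, expressed as the 50th percentile
median(no_outlier)  # 23

# The mean shifts far more than the median does
mean(x) - mean(no_outlier)
median(x) - median(no_outlier)
```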
mode • the most numerous category • for ratio data, often implies that data have been grouped in some way • can be more or less created by the grouping procedure • for theoretical distributions—simply the location of the peak on the frequency distribution
[figure: frequencies by category (isolated scatters, hamlets, villages, regional centers); modal class = ‘hamlets’]
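Base R has no function for the statistical mode (`mode()` reports a storage type), so one possible approach is to tabulate and take the largest count. The counts and measurements below are made up, and `modal_class` is a hypothetical helper. The second half shows how, for ratio data, the modal class can shift with the grouping.

```r
# Made-up counts for the categories named above
sites <- c("isolated scatters", "hamlets", "hamlets", "villages", "hamlets",
           "regional centers", "villages", "isolated scatters", "hamlets")
counts <- table(sites)
names(counts)[which.max(counts)]   # "hamlets"

# Hypothetical helper: group ratio data into classes, return the fullest one
modal_class <- function(x, breaks) {
  grouped <- table(cut(x, breaks = breaks))
  names(grouped)[which.max(grouped)]
}

# Made-up measurements: same data, two different groupings
x <- c(11, 12, 13, 19, 20, 21, 22, 29, 30, 31)
modal_class(x, breaks = seq(10, 40, by = 5))   # "(10,15]"
modal_class(x, breaks = seq(5, 35, by = 10))   # "(15,25]"
```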
dispersion • measures of dispersion • summarize degree of clustering of cases, esp. with respect to central tendency… • range • variance • standard deviation
range • R: range(x) • would be better to use midspread (interquartile range)…
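A sketch comparing the range with the midspread on made-up data. Note that `range(x)` returns the minimum and maximum rather than a single number. `IQR(x)` is base R's interquartile range, using R's default quantile method.

```r
# Made-up lengths in mm; the last value is a deliberate outlier
x <- c(18, 20, 21, 22, 23, 24, 25, 26, 27, 94)
no_outlier <- x[-length(x)]

range(x)                 # 18 94
diff(range(x))           # 76
diff(range(no_outlier))  # 9

IQR(x)                   # 4.5
IQR(no_outlier)          # 4
```

One extreme case changes the range from 9 to 76, while the midspread barely moves.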
variance • R: var(x) • analogous to average deviation of cases from mean • in fact, based on sum of squared deviations from the mean (“sum-of-squares”)
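A quick check of the sum-of-squares idea on made-up values. One point to keep in mind: R's `var(x)` divides by n − 1 rather than n, so it is only analogous to an average.

```r
# Made-up values with mean 5
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

ss <- sum((x - mean(x))^2)  # sum of squared deviations: 32
ss / (n - 1)                # sample variance by hand
var(x)                      # same value from R
```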
variance • computational form:
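One common way to write it, assuming the sample (n − 1) denominator that `var(x)` uses. The symbols are conventional choices, not taken from the slide. The first expression is the definition and the second is the computational form, which avoids computing each deviation:

```latex
s^2 \;=\; \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
    \;=\; \frac{\sum_{i=1}^{n} x_i^2 \;-\; \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}
```

For example, for the values 2, 4, 4, 4, 5, 5, 7, 9 the sum of squares is 232 and the squared sum over n is 1600/8 = 200, so the numerator is 32 either way.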
note: units of variance are squared… • this makes variance hard to interpret • ex.: projectile point sample: mean = 22.6 mm, variance = 38 mm² • what does this mean???
standard deviation • square root of variance:
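Written out with conventional symbols, again assuming the n − 1 (sample) denominator:

```latex
s \;=\; \sqrt{s^2} \;=\; \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
```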
standard deviation • units are in same units as base measurements • ex.: projectile point sample: mean = 22.6 mm, standard deviation = 6.2 mm • mean ± sd (16.4 to 28.8 mm) • should give at least some intuitive sense of where most of the cases lie, barring major effects of outliers
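R: `sd(x)`. The sketch below first reproduces the arithmetic of the projectile point example from its quoted summary values, then uses made-up raw data (the original measurements are not given) to show how `sd(x)` relates to `var(x)`.

```r
# Summary values quoted for the projectile point sample
m <- 22.6          # mean, mm
v <- 38            # variance, mm^2
s <- round(sqrt(v), 1)
s                  # 6.2 (mm, same units as the measurements)
c(m - s, m + s)    # 16.4 28.8

# Made-up raw measurements (mm), only to show the functions
x <- c(15, 18, 20, 21, 22, 23, 24, 26, 28, 31)
sd(x)              # same as sqrt(var(x))
sqrt(var(x))

# Share of cases falling within mean +/- sd
mean(abs(x - mean(x)) <= sd(x))
```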