Jan. 29 • “Statistics” for one quantitative variable… • Mean and standard deviation (last week!) • “Robust” measures of location (median and its friends) • Quartiles, IQR, five-number summary, Box plots • Percentiles • Transforming data… • Rescale: Y = c times X • Recenter: Y = X plus a • other transformations • adding variables to each other • Standardizing data…
Population vs. Sample
A statistic is anything that can be computed from data.
STATISTICS of a single quantitative variable • MEAN • MEDIAN • QUARTILES ( Q1, Q3 ) • Five-number summary • Boxplots • Interquartile range • PERCENTILES / QUANTILES / FRACTILES • STANDARD DEVIATION • VARIANCE
Statistics of one variable… • Median: the middle value • (when values are ranked, smallest to largest) • (or, with an even number of values, the average of the two middle values) • “Robust” • Trimmed mean • Midmean • Geometric mean • “RMS mean”
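The measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not a standard implementation: the function names, the trimming rule (drop the same fraction from each end, rounded down), and the sample data are all assumptions made for the example.

```python
import math
import statistics

def trimmed_mean(values, trim_fraction):
    """Mean after dropping trim_fraction of the values from EACH end (assumed rule)."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)   # how many values to drop per end
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)

def midmean(values):
    """Average of the middle half of the data (a 25% trimmed mean)."""
    return trimmed_mean(values, 0.25)

def geometric_mean(values):
    """n-th root of the product; only defined for strictly positive values."""
    return math.exp(statistics.mean([math.log(v) for v in values]))

def rms_mean(values):
    """Square root of the mean of the squared values."""
    return math.sqrt(statistics.mean([v * v for v in values]))

# Hypothetical data with one extreme value
data = [1, 2, 3, 4, 100, 5, 6, 7]
print("mean   ", statistics.mean(data))
print("median ", statistics.median(data))
print("midmean", midmean(data))
```

In this made-up data set the single extreme value drags the mean far from the bulk of the data, while the median and the midmean stay in the middle.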
Mean vs. Median • Long tails (extreme values) pull the mean more than the median. • So, typically: • Right-skewed distribution ⇒ mean lies to the right of the median • Left-skewed distribution ⇒ mean lies to the left of the median
Colleges – Datadesk histogram • median = 5 • mean = 5.36
Salaries • median = 60,000 • mean = 106,875
So, which measure of “center” is best? • All the measures agree (roughly) when the distribution is symmetrical • Mean has attractive mathematical properties • Also, the mean is related to the total, if that’s what you care about • Median may be more “typical” when the distribution is non-symmetrical • A measure is “robust” if it works reasonably well under a wide variety of circumstances • Medians are robust
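To make "medians are robust" concrete, the sketch below compares the mean and the median before and after one value is replaced by an extreme one. The salary figures are hypothetical, invented for this illustration; they are not the data behind the slides.

```python
import statistics

# Hypothetical salaries: a symmetric group, then the same group
# with the top salary replaced by an extreme value.
typical = [40_000, 50_000, 60_000, 70_000, 80_000]
with_outlier = [40_000, 50_000, 60_000, 70_000, 1_000_000]

for label, salaries in [("typical", typical), ("with outlier", with_outlier)]:
    print(label,
          "mean =", statistics.mean(salaries),
          "median =", statistics.median(salaries))
```

In the symmetric group the two measures agree. After the change the median does not move at all, while the mean ends up above four of the five salaries.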
Computing percentiles • To calculate 20-th percentile: • Rank the values from smallest to largest • Compute 20% of n… 20% of 72 = 14.4 • Count off that many values (from lowest)… • The value at which you stop is the 20-th percentile. • What if you stop between values ?
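The counting recipe can be sketched as follows. The slide leaves open what to do when the count stops between values; this sketch assumes one common convention (if p% of n is not a whole number, move up to the next value; if it is a whole number, average that value with the next one). Statistical software uses several different interpolation rules, so results may differ slightly from package to package. The function name is hypothetical.

```python
import math

def percentile_by_counting(values, p):
    """p-th percentile (0 < p <= 100) using the rank-and-count-off recipe."""
    ordered = sorted(values)              # rank from smallest to largest
    n = len(ordered)
    position = n * p / 100                # e.g. 20% of 72 = 14.4
    if position != int(position):
        # Stopped between values: assumed rule is to move up to the next one
        return ordered[math.ceil(position) - 1]
    k = int(position)
    if k == n:                            # the 100th percentile is the maximum
        return ordered[-1]
    # Stopped exactly on a value: assumed rule is to average it with the next
    return (ordered[k - 1] + ordered[k]) / 2

# 72 hypothetical values: 1, 2, ..., 72
values = list(range(1, 73))
print(percentile_by_counting(values, 20))   # 14.4 rounds up to the 15th value
```

With this convention the 50th percentile of an even number of values is the average of the two middle values, which matches the usual definition of the median.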
QUARTILES • Lower quartile (Q1) = 25-th percentile • Upper quartile (Q3) = 75-th percentile • ( What’s Q2 ? ) • INTERQUARTILE RANGE ( IQR ) = Q3 minus Q1
Five-number summary • — maximum (or, say, 95 %ile) • — Q3 • — median • — Q1 • — minimum (or, say, 5 %ile)
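A five-number summary and the IQR can be computed with Python's standard library, as in the sketch below. It assumes Python 3.8 or later for `statistics.quantiles`, uses that function's default quartile convention (other packages may give slightly different quartiles), and uses made-up data; the helper name `five_number_summary` is hypothetical.

```python
import statistics

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum (quartiles by the library default)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "Q1": q1, "median": median,
            "Q3": q3, "max": max(values)}

# Hypothetical data
data = [7, 1, 5, 3, 6, 2, 4]
summary = five_number_summary(data)
iqr = summary["Q3"] - summary["Q1"]      # interquartile range = Q3 minus Q1

print(summary)
print("IQR =", iqr)
```

These five numbers are exactly what a box plot draws: the box runs from Q1 to Q3 with a line at the median, and the whiskers reach toward the minimum and maximum.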
Linear Transformations • If you MULTIPLY or DIVIDE a variable by a constant… • Y = c times X Y = X / c • then… • measures of center are multiplied or divided by c • measures of spread (standard deviation, IQR) are multiplied or divided by |c|; the variance by c squared • If you ADD or SUBTRACT a constant from a variable… • Y = X + a Y = X – a • then… • measures of center are increased (decreased) by a • measures of spread are UNCHANGED.
More transformations • ADDING VARIABLES: • W = X + Y • Mean(W) = Mean(X) + Mean(Y) • Standard deviation of W: depends on how X and Y vary together (anywhere from |SD(X) – SD(Y)| up to SD(X) + SD(Y)) • OTHER TRANSFORMATIONS: • Y = X squared ? • Y = log(X) ? • …NO RELIABLE RULES for mean or std. dev.
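A small numerical sketch shows why the standard deviation of a sum has no simple rule while the mean does. The data are hypothetical: the same X is added to a Y that moves with it and to a Y that moves against it. The last lines illustrate, for the log transformation only, that the mean of the transformed values is not the transform of the mean.

```python
import math
import statistics

# Hypothetical data: two Y variables with the same mean and spread
x = [1, 2, 3, 4, 5]
y_same = [1, 2, 3, 4, 5]        # rises when X rises
y_opposite = [5, 4, 3, 2, 1]    # falls when X rises

w_same = [xi + yi for xi, yi in zip(x, y_same)]
w_opposite = [xi + yi for xi, yi in zip(x, y_opposite)]

# The means add in both cases
print(statistics.mean(w_same), statistics.mean(w_opposite))

# The standard deviations do not follow a single rule
print(statistics.stdev(w_same), statistics.stdev(w_opposite))

# Nonlinear transformation: mean of the logs vs. log of the mean
mean_of_logs = statistics.mean([math.log(v) for v in x])
log_of_mean = math.log(statistics.mean(x))
print(mean_of_logs, log_of_mean)
```

When Y moves with X the spreads reinforce each other; when Y moves against X they cancel, here completely, since every value of W is the same.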
Standardized Variables • Write X̄ and S for the mean and standard deviation of X • Then form the transformed variable: • Z = (X – X̄) / S • Then… • mean (Z) = 0 • std dev (Z) = 1 • Z answers the question: How many standard deviations is this value above (or below) the mean?