Statistics Workshop: Univariate Descriptive Statistics
J-Term 2009
Bert Kritzer
Description vs. Inference
• Summarizing a set of information
  • Central tendency & Dispersion
  • Shape of Distribution
  • Relationships
    • Form/nature of relationship
    • Strength of relationship (“correlation”)
• Inference from a sample to a “population”
  • Statistics as estimates of population “parameters”
• Inference about processes: random vs. systematic
  • Inference using population data
  • Inference as separating what’s observed into “systematic” and “random” components
    Observation = Systematic + Random
  • Random can reflect sampling and/or process
Types of Variables
• “Nominal” or “Categorical” (qualitative)
  • Unordered categories
  • Dichotomies (e.g., gender, win/lose)
  • Polytomies (e.g., religion, race)
• “Interval” (quantitative)
  • Continuous and discrete
• “Ordinal”
  • Ordered, but values indicate only ordering
  • Grouped (5-point scales) and ungrouped (class rank, seniority)
Describing Categorical Variables: Percentages & Simple Graphics
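
As a rough illustration of the percentage table this slide refers to, the sketch below tabulates a categorical variable. The responses are made-up data, and the text bar chart is only a stand-in for the simple graphics shown in the workshop.

```python
from collections import Counter

# Hypothetical responses for a categorical variable (made-up data).
religion = ["Protestant", "Catholic", "None", "Protestant", "Jewish",
            "Catholic", "Protestant", "None", "Protestant", "Catholic"]

counts = Counter(religion)
total = sum(counts.values())

# Percentage of cases falling in each category.
percentages = {category: 100 * count / total
               for category, count in counts.items()}

# Frequency table with a crude text bar chart, largest category first.
for category, count in counts.most_common():
    pct = percentages[category]
    print(f"{category:<12}{count:>3}{pct:>7.1f}%  {'#' * count}")
```

The largest category in the table is the "modal category" for this variable.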
Central Tendency: Other Measures
• Median or “middle” case
  • Median is a “positional measure”
  • Median is the 50th percentile
• Mean vs. Median
  • “Skewed” data
  • Jury verdicts example
• Mode: most commonly occurring value
  • “Modal category”
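
To see why the mean and median can diverge in skewed data, here is a minimal sketch using Python's standard `statistics` module. The award amounts are invented for illustration and are not the jury verdict data used in the workshop.

```python
import statistics

# Hypothetical jury awards in dollars (made-up numbers, one very large award).
awards = [10_000, 25_000, 25_000, 40_000, 60_000, 90_000, 5_000_000]

mean_award = statistics.mean(awards)      # pulled upward by the single large award
median_award = statistics.median(awards)  # the middle case of the sorted values
mode_award = statistics.mode(awards)      # the most commonly occurring value

print(f"mean   = {mean_award:,.0f}")
print(f"median = {median_award:,.0f}")
print(f"mode   = {mode_award:,.0f}")
```

In this made-up sample the single large award raises the mean far above the median, which is the usual pattern in right-skewed data.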
Dispersion
• Positional measure: Interquartile Range
  • Difference between the 25th and 75th percentiles (1st and 3rd quartiles)
  • Midspread: range from 25th to 75th percentiles (contains 50% of the cases)
• Variance and Standard Deviation
  • Variance: mean squared deviation, SSD/n (SSD = sum of squared deviations from the mean)
  • Standard deviation: square root of the variance
  • n vs. n−1 as denominator
  • Percent of cases within one standard deviation depends on the distribution
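
The variance and standard deviation definitions above can be sketched directly. The data values are hypothetical; the variable names are chosen here for illustration only.

```python
import math

# Hypothetical data values (made up for illustration).
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = sum(data) / n
ssd = sum((x - mean) ** 2 for x in data)  # sum of squared deviations from the mean

variance_n = ssd / n          # mean squared deviation (denominator n)
variance_n1 = ssd / (n - 1)   # denominator n-1, the usual choice for sample data
sd_n = math.sqrt(variance_n)
sd_n1 = math.sqrt(variance_n1)

print(f"mean = {mean}, SSD = {ssd}")
print(f"variance (n)   = {variance_n:.3f}, sd = {sd_n:.3f}")
print(f"variance (n-1) = {variance_n1:.3f}, sd = {sd_n1:.3f}")
```

The two denominators give noticeably different answers only when n is small; with a few hundred cases the difference is negligible.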
Five Number Summary & Boxplot (for age in a set of data)
• Minimum (17)
• First quartile (33) (25th percentile)
• Median (45)
• Third quartile (57) (75th percentile)
• Maximum (91)
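
A five number summary can be computed with the standard library, as in the sketch below. The ages are a small made-up list chosen so the summary matches the figures on the slide; they are not the workshop's data set. Quartile conventions differ between packages, so other software may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical ages (made up; not the data set behind the slide).
ages = [17, 25, 33, 40, 45, 50, 57, 70, 91]

# method="inclusive" treats the data as the whole set of cases being described.
q1, median, q3 = statistics.quantiles(ages, n=4, method="inclusive")

five_number_summary = (min(ages), q1, median, q3, max(ages))
iqr = q3 - q1  # interquartile range: the spread of the middle 50% of cases

print("min, Q1, median, Q3, max =", five_number_summary)
print("interquartile range =", iqr)
```

A boxplot draws the box from the first to the third quartile with a line at the median, so the box length is the interquartile range.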
Distributions
• Histogram displays the distribution
• Theoretical distributions: derived from probability theory
  • Uniform
    • 58% within one standard deviation; 100% within 1.73 standard deviations
  • Binomial (e.g., series of coin flips)
    • Concentration depends on probability parameter
  • Normal (bell-shaped)
    • 68% within one standard deviation; 95% within two
• Empirical distributions: what is observed in practice
• Chebyshev's Theorem: regardless of the distribution, no more than 25% of the observations can be more than 2 standard deviations from the mean
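
The percentages quoted for the uniform and normal distributions, and the Chebyshev bound, can be checked with a short calculation. This is a sketch with function names invented here; the uniform case assumes a distribution on the interval [0, 1], though the shares are the same for any interval.

```python
import math
from statistics import NormalDist

def uniform_share_within(k):
    """Share of a uniform distribution within k standard deviations of its mean."""
    sd = 1 / math.sqrt(12)       # standard deviation of a uniform on [0, 1]
    return min(1.0, 2 * k * sd)  # interval of width 2*k*sd, capped at the full range

def normal_share_within(k):
    """Share of a normal distribution within k standard deviations of its mean."""
    z = NormalDist()             # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

def chebyshev_max_share_beyond(k):
    """Chebyshev upper bound on the share more than k standard deviations away."""
    return 1 / k ** 2

print(f"uniform, within 1 sd: {uniform_share_within(1):.3f}")
print(f"normal,  within 1 sd: {normal_share_within(1):.3f}")
print(f"normal,  within 2 sd: {normal_share_within(2):.3f}")
print(f"any distribution, beyond 2 sd: at most {chebyshev_max_share_beyond(2):.2f}")
```

The uniform distribution reaches 100% at the square root of 3 (about 1.73) standard deviations, which is where the 1.73 figure comes from.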
Distributions (continued)
• Symmetrical distributions
  • Mean = Median
• Asymmetrical distributions
  • “Skew”
2008 Salaries of Major League Baseball Players (in $1,000’s)
Mean: $3,112,101
Median: $1,000,000
Time Plots: Showing Change Over Time
Federal Civil Filings, 1975-2007

Time Plots: Showing Change Over Time
Women Law Graduates & Women SC Clerks

Base Year Issue
Percent of Incumbents Facing Competition in State Supreme Court Elections
Centigrade to Fahrenheit: Change of Scale (Linear Transformation)
• General linear transformation: Y = a + bX
• Transforming from centigrade to Fahrenheit, a = 32 and b = 1.8.
• Transforming from inches to centimeters, a = 0 and b = 2.54. Dollars to euros?
• A mean of 5° in centigrade would be a mean of 41° in Fahrenheit.
• A 10° standard deviation in centigrade would be an 18° standard deviation in Fahrenheit.
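
The effect of a change of scale on the mean and standard deviation can be sketched as follows. The temperatures are made-up values chosen to have a mean of 5 and a standard deviation of 10 in centigrade, and `linear_transform` is a helper name introduced here for illustration.

```python
import statistics

def linear_transform(values, a, b):
    """Apply the linear transformation a + b*x to every value."""
    return [a + b * x for x in values]

# Hypothetical temperatures in centigrade (mean 5, standard deviation 10).
centigrade = [-5, -5, 15, 15]

# Centigrade to Fahrenheit uses a = 32 and b = 1.8.
fahrenheit = linear_transform(centigrade, a=32, b=1.8)

mean_c = statistics.mean(centigrade)
sd_c = statistics.pstdev(centigrade)   # denominator n
mean_f = statistics.mean(fahrenheit)
sd_f = statistics.pstdev(fahrenheit)

# The mean is shifted and rescaled: 32 + 1.8 * mean_c.
# The standard deviation is only rescaled: 1.8 * sd_c (the constant a drops out).
print(f"centigrade: mean = {mean_c}, sd = {sd_c}")
print(f"Fahrenheit: mean = {mean_f}, sd = {sd_f}")
```

Adding a constant moves every case by the same amount, so it changes the mean but not the spread; only the multiplier b affects the standard deviation.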
Standard Scores (Z-scores)
• Mean of 0 and standard deviation of 1
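
A standard score is conventionally computed by subtracting the mean from each value and dividing by the standard deviation. The sketch below applies that to made-up test scores and confirms that the results have a mean of 0 and a standard deviation of 1; it uses the n denominator, which is an assumption since the slide does not say which one was intended.

```python
import math

# Hypothetical test scores (made up for illustration).
scores = [52, 61, 70, 79, 88]
n = len(scores)

mean = sum(scores) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)  # denominator n (assumed)

# Z-score: distance from the mean measured in standard deviations.
z_scores = [(x - mean) / sd for x in scores]

z_mean = sum(z_scores) / n
z_sd = math.sqrt(sum((z - z_mean) ** 2 for z in z_scores) / n)

print("z-scores:", [round(z, 2) for z in z_scores])
print(f"mean of z = {z_mean:.2f}, sd of z = {z_sd:.2f}")
```

Standardizing is itself a linear transformation, with a = −mean/sd and b = 1/sd, which is why the shape of the distribution is unchanged.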