MPP Stats Bootcamp W (12:00-1:50), Week 7. Instructor: Dr. Alison Johnston (Alison.Johnston@oregonstate.edu) http://oregonstate.edu/cla/polisci/alison-johnston. SOC 516 Class Tutor: Daniel Hauser (hauserd@onid.orst.edu)
Bootcamp Outline • Week 7: Descriptive Statistics • Means, medians, standard deviations, and standard errors • Cross tabs/Contingency tables
Samples and Statistical Inference • Recall from last week that we use (random) samples to make statistical inferences about a population • The most basic statistical inferences one can make about a population relate to its descriptive statistics • Mean • Median • Variance • Standard Deviation • Standard Error (Sample only)
Descriptive Statistics: The Explanation • The mean of a sample is its arithmetic average • For a sample from a symmetric distribution (e.g. the normal distribution), the mean approximately equals the median • If a sample is skewed by an outlier, the mean is pulled further toward the outlier than the median is • Outliers can distort estimators’ capacity to estimate population parameters (i.e. means, beta coefficients, etc.) • The sample mean is an unbiased estimator of the population mean: averaged across repeated random samples, it equals the population mean
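The effect of an outlier on the mean and the median can be checked by hand. The sketch below uses Python's standard library rather than Stata, and the income figures are made up purely for illustration.

```python
from statistics import mean, median

# Hypothetical sample of incomes (in $1,000s); the values are invented for illustration.
sample = [30, 32, 35, 38, 40]

# The same sample with one extreme value (an outlier) added.
with_outlier = sample + [400]

# Symmetric sample: the mean and the median coincide.
print("no outlier:   mean =", mean(sample), " median =", median(sample))

# Skewed sample: the mean jumps toward the outlier, the median barely moves.
print("with outlier: mean =", round(mean(with_outlier), 2),
      " median =", median(with_outlier))
```

One extreme observation moves the mean by a large amount while the median shifts only slightly, which is why the median is often reported alongside the mean for skewed data.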
Descriptive Statistics: The Explanation • Sample means on their own are unhelpful as estimators if we have little idea about the spread of our data • The variance and SD of a sample indicate the spread/variability of the data around the mean; the SE indicates the uncertainty in the sample mean itself • Estimators are more accurate (i.e. have a high probability of being close to the population mean) if the variance/SD/SE is SMALL
Standard Errors vs Standard Deviations • Though a standard error is a “type” of standard deviation, the two are NOT equivalent • “The standard error is the standard deviation of the sampling distribution of a statistic” • Again, statistical inference is the use of sample estimates (i.e. a sample mean) to guess what the equivalent population parameter is • BUT sample means can vary from sample to sample. The distribution of the sample mean across samples is called the sampling distribution • The standard error is the standard deviation of THIS sampling distribution! • It measures the precision of our sample mean
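To make the distinction concrete, the sketch below computes the variance, standard deviation, and standard error of the mean for a small made-up sample. It is written in Python rather than Stata, and it assumes the usual formula for the standard error of a sample mean, SE = SD / sqrt(n), which the slides do not state explicitly.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical sample of five test scores (invented for illustration).
scores = [4, 8, 6, 5, 7]
n = len(scores)

avg = mean(scores)
var = variance(scores)   # sample variance (divides by n - 1)
sd = stdev(scores)       # sample standard deviation = sqrt(variance)
se = sd / sqrt(n)        # standard error of the mean (assumed formula: SD / sqrt(n))

print("mean =", avg)
print("variance =", var)
print("SD =", round(sd, 4))
print("SE =", round(se, 4))
```

The SD describes how spread out the individual scores are; the SE is smaller because it describes how much the sample mean itself would vary from sample to sample.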
The Law of Large Numbers • Statistical inference is the use of sample data to estimate population parameters • Two desired features • Estimators should be unbiased (i.e. equal to the parameter value on average) • Estimators should be accurate (i.e. low standard error) • How can we achieve these properties simultaneously? • Increase sample size! • LLN: The mean of the results obtained from a large number of trials (observations) should be close to the population mean, and will approach the population mean as the number of trials increases
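A small simulation can illustrate both ideas. This is a sketch in Python, not part of the Stata lab: the "population" is a hypothetical fair six-sided die (population mean 3.5), and the helper names `sample_mean` and `spread_of_sample_means` are invented for this example.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n rolls of a fair six-sided die (population mean is 3.5)."""
    return mean(random.randint(1, 6) for _ in range(n))

def spread_of_sample_means(n, repeats=200):
    """Standard deviation of the sample mean across many repeated samples of size n.

    This is a simulated estimate of the standard error for that sample size.
    """
    return stdev(sample_mean(n) for _ in range(repeats))

small = spread_of_sample_means(10)     # many small samples
large = spread_of_sample_means(1000)   # many large samples
big_sample_mean = sample_mean(100_000)

print("spread of sample means, n = 10:  ", round(small, 3))
print("spread of sample means, n = 1000:", round(large, 3))
print("mean of one sample of 100,000 rolls:", round(big_sample_mean, 3))
```

The sample means from larger samples cluster much more tightly around 3.5, which is the smaller standard error, and a single very large sample lands close to the population mean, as the LLN predicts.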
STATA LAB EXERCISES • How to calculate the mean of a variable • How to calculate the median of a variable • How to calculate the variance, standard deviation and standard error of a variable • Examining how a larger sample size influences our standard error
Problems with descriptive statistics and categorical data • Descriptive statistics are frequently used on data that are of interval scale • Numerical values that mean something • If our data are nominal (i.e. categories whose values cannot be ranked), means and measures of variance will be relatively meaningless • Cross tabs to save the day!
Cross-Tabs/Contingency Tables • A cross tab is a two-way frequency table (a matrix with one row per category of the first variable and one column per category of the second) for two and ONLY two variables (i.e. how many observations lie within each combination of categories) • Dependent variable: The row variable • Independent variable: The column variable • The variables CANNOT BE INTERVAL DATA (i.e. they MUST be categories, either nominal or ordinal data) • Cross-tabs provide a summary of how observations are distributed (i.e. their frequency) between the categories of two variables
Cross-Tabs/Contingency Tables: How they work • Say we are conducting a fitness survey across three states and want to determine whether people in some states are more likely to work out than those in others • Dependent (Row) variable: Do you work out? (Y/N) • Independent (Column) variable: In what state do you live? • How do we calculate a cross tab? • Make your matrix first (here 2 rows x 3 columns) • Then count… • Thank goodness for the “tabulate” command in STATA…
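The "make the matrix, then count" procedure can be sketched by hand as follows. This is a Python illustration rather than the Stata workflow, and the survey responses and state labels are invented.

```python
from collections import Counter

# Hypothetical survey responses as (works_out, state) pairs; the data are made up.
responses = [
    ("Y", "OR"), ("N", "OR"), ("Y", "OR"),
    ("Y", "WA"), ("N", "WA"),
    ("N", "CA"), ("N", "CA"), ("Y", "CA"),
]

rows = ["Y", "N"]           # dependent (row) variable: do you work out?
cols = ["OR", "WA", "CA"]   # independent (column) variable: state

# Count how many observations fall in each (row, column) combination.
counts = Counter(responses)
table = [[counts[(r, c)] for c in cols] for r in rows]

# Print the cross tab with row and column labels.
print("     " + "  ".join(f"{c:>3}" for c in cols))
for r, row in zip(rows, table):
    print(f"{r:>3}  " + "  ".join(f"{v:>3}" for v in row))
```

Each cell holds the number of respondents with that answer in that state; the cell counts add up to the total number of respondents.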
Cross-Tabs and (percentage) Frequencies • We can also use cross-tabs to create the following (percentage) frequencies • Within column frequencies: What percentage of observations for a given IV (column) category fall in each DV (row) category • Within row frequencies: What percentage of observations for a given DV (row) category stem from each IV (column) category • Relative frequencies: What percentage of TOTAL observations lie within a particular cell? • (Percentage) frequencies are helpful when wanting to understand how observations are distributed • Is one cell over-represented in the sample?
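The three kinds of percentages differ only in what each cell count is divided by. The sketch below, in Python and using an invented 2 x 3 table of counts (work out Y/N by three hypothetical states), shows each one.

```python
# Hypothetical cross tab of counts: rows = works out (Y, N), columns = state.
rows = ["Y", "N"]
cols = ["OR", "WA", "CA"]
table = [
    [2, 1, 1],   # Y
    [1, 1, 2],   # N
]

row_totals = [sum(row) for row in table]
col_totals = [sum(table[i][j] for i in range(len(rows))) for j in range(len(cols))]
total = sum(row_totals)

# Within-column percentages: each cell divided by its column total.
col_pct = [[100 * table[i][j] / col_totals[j] for j in range(len(cols))]
           for i in range(len(rows))]

# Within-row percentages: each cell divided by its row total.
row_pct = [[100 * table[i][j] / row_totals[i] for j in range(len(cols))]
           for i in range(len(rows))]

# Relative (cell) percentages: each cell divided by the grand total.
cell_pct = [[100 * table[i][j] / total for j in range(len(cols))]
            for i in range(len(rows))]

print("Of OR respondents, % who work out:", round(col_pct[0][0], 1))
print("Of those who work out, % from OR: ", round(row_pct[0][0], 1))
print("% of all respondents in the (Y, OR) cell:", round(cell_pct[0][0], 1))
```

The same cell count of 2 yields three different percentages, so it matters which total is in the denominator when reading a cross tab.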
STATA LAB EXERCISES • How to create a cross-tab (mind the row and column variable) • How to create a cross-tab with column-frequencies • How to create a cross-tab with row-frequencies • How to create a cross-tab with relative-frequencies