
MPP Stats Bootcamp W (12:00-1:50), Week 7

MPP Stats Bootcamp W (12:00-1:50), Week 7. Instructor: Dr. Alison Johnston (Alison.Johnston@oregonstate.edu) http://oregonstate.edu/cla/polisci/alison-johnston SOC 516 Class Tutor: Daniel Hauser (hauserd@onid.orst.edu). Bootcamp Outline. Week 7: Descriptive Statistics



  1. MPP Stats Bootcamp W (12:00-1:50), Week 7 Instructor: Dr. Alison Johnston (Alison.Johnston@oregonstate.edu) http://oregonstate.edu/cla/polisci/alison-johnston SOC 516 Class Tutor: Daniel Hauser (hauserd@onid.orst.edu)

  2. Bootcamp Outline • Week 7: Descriptive Statistics • Means, medians, standard deviations, and standard errors • Cross tabs/Contingency tables

  3. Samples and Statistical Inference • Recall from last week that we use (random) samples to make statistical inferences about a population • The most basic statistical inferences one can make about a population relate to its descriptive statistics • Mean • Median • Variance • Standard Deviation • Standard Error (Sample only)

  4. Descriptive Statistics: The Explanation • The mean of a sample is its average • For a sample that is normally distributed (or otherwise symmetric), the mean is approximately equal to the median • If a sample is skewed by the presence of an outlier, the mean will be pulled closer to the outlier than the median will • Outliers can distort estimators’ capacity to estimate population parameters (e.g. means, beta coefficients, etc.) • The sample mean is an unbiased estimator if its expected value (its average across repeated random samples) equals the population mean
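The effect of an outlier on the mean and median can be illustrated with a minimal Python sketch. The data are hypothetical (made-up weekly workout hours), and Python is used here only for illustration; it is not the software used in the lab.

```python
from statistics import mean, median

# Hypothetical weekly workout hours for seven respondents (illustrative data only).
symmetric = [2, 3, 4, 5, 6, 7, 8]
# Same sample, but the largest value is replaced by an extreme outlier.
skewed = [2, 3, 4, 5, 6, 7, 40]

# In the symmetric sample the mean and the median coincide.
print(mean(symmetric), median(symmetric))
# In the skewed sample the mean moves toward the outlier; the median does not.
print(mean(skewed), median(skewed))
```

Only one value changed between the two samples, yet the mean shifts noticeably while the median stays put, which is why the median is often reported for skewed data.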

  5. Means, Medians and Normal Distributions

  6. Means, Medians and Skewed Distributions

  7. Descriptive Statistics: The Explanation • Sample means on their own are unhelpful as estimators if we have little idea about the spread of our data • The variance/SD of a sample indicates the spread/variability of the data around the mean, and the SE indicates the uncertainty in the sample mean itself • Estimators are more precise (i.e. have a high probability of being close to the population mean they estimate) if the variance/SD/SE is SMALL
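To see why a mean alone says little, consider two hypothetical samples with the same mean but very different spreads. This is a small Python sketch with made-up numbers, not data from the course.

```python
from statistics import mean, stdev

# Two hypothetical samples (illustrative values only) with identical means.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, sample in (("tight", tight), ("wide", wide)):
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    print(name, "mean:", mean(sample), "sd:", round(stdev(sample), 2))
```

Both samples report a mean of 50, but a single observation from the second sample could plausibly sit far from 50, so the mean is a much less informative summary there.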

  8. Standard Deviations and Spreads

  9. Standard Errors vs Standard Deviations • Though standard errors are a “type” of standard deviation, they are NOT equivalent • “The standard error is the standard deviation of the sampling distribution of a statistic” • Again, statistical inference is the use of sample estimators (e.g. a mean) to guess what the equivalent population parameter is • BUT sample means can vary from sample to sample. The distribution of the sample mean across samples is called the sampling distribution • It is THIS sampling distribution whose standard deviation the standard error represents! • The standard error measures the precision of our sample mean
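The idea of a sampling distribution can be made concrete with a simulation. The sketch below assumes a hypothetical population of 10,000 normally distributed values and arbitrary choices for the sample size and number of samples; none of these numbers come from the slides.

```python
import random
from statistics import mean, pstdev, stdev

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 values from a normal distribution (mean 50, SD 10).
population = [random.gauss(50, 10) for _ in range(10_000)]

n = 25               # size of each sample (arbitrary choice)
num_samples = 2_000  # number of samples drawn (arbitrary choice)

# Draw many random samples and record the mean of each one.
sample_means = [mean(random.sample(population, n)) for _ in range(num_samples)]

# The standard deviation of those sample means is the empirical standard error.
empirical_se = stdev(sample_means)
# Textbook value for comparison: population SD divided by the square root of n.
theoretical_se = pstdev(population) / n ** 0.5

print(round(empirical_se, 2), round(theoretical_se, 2))
```

The two printed values should be close to each other and much smaller than the population SD, which is the distinction the slide draws between a standard error and a standard deviation.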

  10. Descriptive Statistics: The Math…
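The formulas on this slide were not preserved in the transcript. For reference, the standard textbook definitions for a sample $x_1, \dots, x_n$ are given below; the notation is an assumption and may differ from what the slide used.

```latex
\begin{align*}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i
  && \text{(sample mean)} \\
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
  && \text{(sample variance)} \\
s &= \sqrt{s^2}
  && \text{(sample standard deviation)} \\
SE(\bar{x}) &= \frac{s}{\sqrt{n}}
  && \text{(standard error of the mean)}
\end{align*}
```

Because $n$ appears under a square root in the last line, quadrupling the sample size roughly halves the standard error when the spread $s$ stays about the same.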

  11. The Law of Large Numbers • Statistical inference is the use of sample data to estimate population parameters • Two desired features • Estimators should be unbiased (i.e. centered on the parameter value) • Estimators should be precise (i.e. low standard error) • How can we achieve these properties simultaneously? • Increase sample size! • LLN: The mean of the results obtained from a large number of trials (samples) should be close to the population mean, and will approach the population mean as the number of trials increases
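The law of large numbers can be watched in action with simulated die rolls. This is a minimal sketch: the fair six-sided die (population mean 3.5) and the function name `sample_mean` are illustrative assumptions, not part of the course material.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: rolls of a fair six-sided die, whose mean is 3.5.
population_mean = 3.5

def sample_mean(n):
    """Return the mean of n simulated die rolls."""
    return mean([random.randint(1, 6) for _ in range(n)])

# As n grows, the sample mean tends to settle closer to 3.5.
for n in (10, 100, 1_000, 10_000, 100_000):
    m = sample_mean(n)
    print(f"n={n:>6}  sample mean={m:.3f}  error={abs(m - population_mean):.3f}")
```

Individual small samples can land well away from 3.5, while the largest ones should sit very close to it.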

  12. STATA LAB EXERCISES • How to calculate the mean of a variable • How to calculate the median of a variable • How to calculate the variance, standard deviation and standard error of a variable • Examining how a larger sample size influences our standard error
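The quantities in these exercises can also be computed by hand to check one's understanding. The sketch below uses Python rather than STATA, and the `describe` helper and the `ages` data are hypothetical.

```python
from math import sqrt
from statistics import mean, median, stdev, variance

def describe(values):
    """Return the descriptive statistics listed above for one variable."""
    n = len(values)
    sd = stdev(values)  # sample SD (n - 1 in the denominator)
    return {
        "n": n,
        "mean": mean(values),
        "median": median(values),
        "variance": variance(values),
        "sd": sd,
        "se": sd / sqrt(n),  # standard error of the mean
    }

# Hypothetical variable: ages of eight survey respondents.
ages = [23, 25, 31, 35, 38, 42, 47, 55]
small = describe(ages)
# Artificial device: repeating the values keeps the spread similar but quadruples n.
large = describe(ages * 4)

print(small)
print(large["se"] / small["se"])  # roughly one half
```

Repeating the data is only a convenient way to hold the spread roughly constant while changing the sample size; with real data one would compare a small and a large random sample.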

  13. Problems with descriptive statistics and categorical data • Descriptive statistics are frequently used on interval-scale data • Numerical values that mean something • If our data is nominal (i.e. categories whose values cannot be ranked), means and measures of variance will be relatively meaningless • Cross tabs to save the day!
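A short Python sketch shows why a mean is uninformative for nominal data. The state names and numeric codes are hypothetical; any other coding would be equally valid, which is exactly the problem.

```python
from collections import Counter
from statistics import mean

# Hypothetical nominal variable: state of residence stored as arbitrary numeric codes.
codes = {1: "Oregon", 2: "Washington", 3: "Idaho"}
responses = [1, 1, 3, 3, 3, 1, 2, 3]

# The mean depends entirely on the arbitrary coding, so it describes nothing real.
print(mean(responses))

# A frequency count is the meaningful summary for nominal data.
counts = Counter(codes[r] for r in responses)
print(counts.most_common())
```

Relabeling Idaho as 1 and Oregon as 3 would change the mean but leave the frequency counts untouched.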

  14. Cross-Tabs/Contingency Tables • A cross tab is a frequency table (a two-way matrix, 2x2 in the simplest case) for two and ONLY two variables (i.e. how many observations lie within a particular combination of categories) • Dependent variable: The row variable • Independent variable: The column variable • The variables CANNOT BE INTERVAL DATA (i.e. they MUST be categories – either nominal or ordinal data) • Cross-tabs provide a summary of how observations are distributed (i.e. their frequency) between the categories of two variables

  15. Cross-Tabs/Contingency Tables: How they work • Say we are conducting a fitness survey across three states and want to determine whether people in some states are more likely to work out than those in others • Dependent (Row) variable: Do you work out? (Y/N) • Independent (Column) variable: In what state do you live? • How do we calculate a cross tab? • Make your matrix first (here 2 rows x 3 columns: Y/N by three states) • Then count… • Thank goodness for the “tabulate” command in STATA…
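The "make the matrix, then count" procedure can be sketched in a few lines of Python. The responses and state names below are hypothetical, and `crosstab` is an illustrative helper, not a reproduction of the STATA command.

```python
from collections import Counter

# Hypothetical survey responses: (state of residence, works out?).
responses = [
    ("Oregon", "Y"), ("Oregon", "Y"), ("Oregon", "N"),
    ("Washington", "Y"), ("Washington", "N"), ("Washington", "N"),
    ("Idaho", "Y"), ("Idaho", "N"), ("Idaho", "N"), ("Idaho", "N"),
]

def crosstab(pairs):
    """Count observations per (row, column) cell; rows = DV, columns = IV."""
    return Counter((works_out, state) for state, works_out in pairs)

table = crosstab(responses)
states = ["Oregon", "Washington", "Idaho"]

# Print the table with the dependent variable in rows and states in columns.
print("works out".ljust(10) + "".join(s.rjust(12) for s in states))
for answer in ("Y", "N"):
    print(answer.ljust(10) + "".join(str(table[(answer, s)]).rjust(12) for s in states))
```

Each cell holds the number of respondents with that combination of answers, and the cell counts add up to the total number of respondents.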

  16. Cross-Tabs/Contingency Tables

  17. Cross-Tabs and (percentage) Frequencies • We can also use cross-tabs to create the following (percentage) frequencies • Within-column frequencies: Of the observations in a given IV (column) category, what percentage falls in each DV (row) category? • Within-row frequencies: Of the observations in a given DV (row) category, what percentage falls in each IV (column) category? • Relative frequencies: What percentage of TOTAL observations lie within a particular cell? • (Percentage) frequencies are helpful when wanting to understand how observations are distributed • Is one cell over-represented in the sample?
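The three kinds of percentage can be computed from the cell counts as follows. This Python sketch assumes hypothetical counts and follows the usual convention that column percentages sum to 100 within each column and row percentages sum to 100 within each row; the `percentages` helper is illustrative.

```python
# Hypothetical cell counts: rows = "Do you work out?" (DV), columns = state (IV).
counts = {
    "Y": {"Oregon": 2, "Washington": 1, "Idaho": 1},
    "N": {"Oregon": 1, "Washington": 2, "Idaho": 3},
}

def percentages(table):
    """Return column, row, and relative (cell / total) percentages for a two-way table."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    total = sum(row_totals.values())
    column_pct = {r: {c: 100 * table[r][c] / col_totals[c] for c in cols} for r in rows}
    row_pct = {r: {c: 100 * table[r][c] / row_totals[r] for c in cols} for r in rows}
    relative_pct = {r: {c: 100 * table[r][c] / total for c in cols} for r in rows}
    return column_pct, row_pct, relative_pct

column_pct, row_pct, relative_pct = percentages(counts)
print(column_pct["Y"]["Oregon"])    # share of Oregon respondents who work out
print(row_pct["Y"]["Oregon"])       # share of those who work out who live in Oregon
print(relative_pct["Y"]["Oregon"])  # share of all respondents in this cell
```

The same cell yields three different percentages depending on the denominator, so it matters which one answers the research question: comparing states calls for the within-column figure.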

  18. Within Column Frequencies

  19. Within Row Frequencies

  20. Relative Frequencies

  21. STATA LAB EXERCISES • How to create a cross-tab (mind the row and column variable) • How to create a cross-tab with column-frequencies • How to create a cross-tab with row-frequencies • How to create a cross-tab with relative-frequencies
