
Topics, Summer 2008



Presentation Transcript


  1. Topics, Summer 2008
     Day 1. Introduction
     Day 2. Samples and populations
       • Measures of central tendency and dispersion
       • Evaluating differences between sample means to estimate differences between populations – normal distribution and t-test
     Day 3. Evaluating relationships
       • Scatterplots
       • Correlation
     Day 4. Regression and Analysis of Variance
     Day 5. Logistic regression

  2. Distributions for nominal variables
     • Counts (i.e., frequency)
       How many Xs do I have?
     • Proportions (i.e., probability density)
       How many Xs do I have out of the total number of observations?
     Example:
     • How many of the clauses tagged in the Switchboard portion of the Bresnan et al. (2007) dataset show the PP realization of the recipient?
     • What proportion of the Switchboard observations …?
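A minimal sketch of the count and proportion questions above. The tag list is invented for illustration and is not taken from the Bresnan et al. (2007) dataset.

```python
from collections import Counter

# Hypothetical recipient realizations (illustrative only, not real corpus data).
realizations = ["PP", "NP", "PP", "PP", "NP", "PP", "NP", "PP"]

counts = Counter(realizations)        # frequency of each category
total = sum(counts.values())          # total number of observations
proportions = {tag: n / total for tag, n in counts.items()}

print("PP count:", counts["PP"])             # how many Xs do I have?
print("PP proportion:", proportions["PP"])   # ... out of all observations?
```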

  3. Frequency, probability, odds
     Frequency and expectation:
     • Of the 17 students who received financial support to attend the LSA Summer Meeting, how many do we expect to be women?
     • If 7 were women, is this deviation from the expected value of 8.5 larger than we could expect by chance?
     Evaluating frequency differences:
     • Of the tagged clauses in the Switchboard portion of the Bresnan et al. (2007) dataset, 79% show the PP realization of the recipient.
     • Is the proportion of PP realizations the same in the Wall Street Journal portion of the dataset?
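One way to approach the "7 of 17" question is an exact two-sided binomial test. The sketch below assumes each student is a woman with probability 0.5 (the assumption implied by the expected value of 8.5). The slides do not say which test was used, so treat the choice of test and the helper names as illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_two_sided_p(k, n, p=0.5):
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    observed = binom_pmf(k, n, p)
    return sum(
        binom_pmf(i, n, p)
        for i in range(n + 1)
        if binom_pmf(i, n, p) <= observed * (1 + 1e-9)  # tolerance for float ties
    )

n_students, n_women = 17, 7
expected = n_students * 0.5                  # 8.5, as on the slide
p_value = binom_two_sided_p(n_women, n_students)

print(f"expected = {expected}, two-sided p = {p_value:.3f}")
```

Under these assumptions the p-value comes out at roughly 0.63, so observing 7 rather than 8.5 is well within what chance alone would produce.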

  4. Distributions for ratio variables
     • Raw counts of values are not very useful
       How many Xs are equal to n₁?
       How many Xs are more than n₁ but less than n₂?
     • Proportions
       What percentage of Xs satisfy n₁ < x < n₂?
     • Histogram: X = {x₁, x₂, …, xₙ}, breaks = {b₁, b₂, …, bₘ}
       What percentage of Xs satisfy x ≤ b₁?
       What percentage of Xs satisfy b₁ < x ≤ b₂?
       …
       What percentage of Xs satisfy bₘ₋₁ < x ≤ bₘ?
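The histogram questions above amount to binning each value into a half-open interval closed on the right. A rough sketch, with a hypothetical helper name and made-up data:

```python
from bisect import bisect_left

def histogram_proportions(values, breaks):
    """Share of values in (-inf, b1], (b1, b2], ..., (b_{m-1}, b_m].

    `breaks` must be sorted in ascending order.
    """
    counts = [0] * len(breaks)
    for x in values:
        i = bisect_left(breaks, x)    # index of the first break b with x <= b
        if i < len(breaks):           # values above the last break are not binned
            counts[i] += 1
    return [c / len(values) for c in counts]

values = [1, 2, 2, 3, 5, 8, 9, 10]    # hypothetical ratio-scale observations
breaks = [2, 5, 10]
props = histogram_proportions(values, breaks)
print(props)
```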

  5. Summary measures
     • Central tendency (expected value)
       • mode
       • median
       • mean
     • Dispersion (reliability of expectation)
       • range
       • inter-quartile range
       • variance
       • standard deviation
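Each of the measures listed above can be computed with Python's standard `statistics` module. The data are hypothetical. Note that quartile conventions differ between tools, so the inter-quartile range from `statistics.quantiles` may not match other software exactly.

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 9, 11]      # hypothetical sample

# Central tendency
mode = statistics.mode(data)
median = statistics.median(data)
mean = statistics.mean(data)

# Dispersion
value_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4)   # quartiles (default "exclusive" method)
iqr = q3 - q1
variance = statistics.variance(data)           # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)               # square root of the sample variance

print(mode, median, mean, value_range, iqr, variance, std_dev)
```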

  6. Descriptive vs inferential statistics
     • descriptive statistics
       • summary of your sample
       • examples:
         • calculate sample mean (written “x-bar”)
         • calculate sample variance (s²)
     • inferential statistics
       • generalization from your sample to the population from which your sample was drawn
       • examples:
         • use x-bar to estimate population mean (μ)
         • use s² to estimate population variance (σ²)
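To make the distinction concrete, the sketch below builds an artificial population whose μ and σ² are known, draws one sample, and compares x-bar and s² with the population values. The population, its size, and the sample size are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(1)   # fixed seed so the sketch is repeatable

# Hypothetical population with mean near 50 and standard deviation near 10.
population = [random.gauss(50, 10) for _ in range(20_000)]
mu = statistics.fmean(population)           # population mean
sigma2 = statistics.pvariance(population)   # population variance (divides by N)

# Descriptive step: summarize one random sample.
sample = random.sample(population, 200)
x_bar = statistics.fmean(sample)            # sample mean
s2 = statistics.variance(sample)            # sample variance (divides by n - 1)

# Inferential step: x_bar and s2 serve as estimates of mu and sigma2.
print(f"mu = {mu:.2f}, x-bar = {x_bar:.2f}")
print(f"sigma^2 = {sigma2:.2f}, s^2 = {s2:.2f}")
```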

  7. Distribution families
     • Uniform distribution
       Example: Expected value for throw of one die
     • Binomial distribution
       Example: Expected number of heads when n coins are tossed
     • Normal distribution
       Example: Expected total value for throw of n=many dice
       Expected value for many variables that are the cumulative result of many independent influences
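The first two expected values above can be worked out directly from the definitions. This sketch assumes a fair six-sided die and fair coins, and picks n = 10 coins arbitrarily.

```python
from fractions import Fraction
from math import comb

# Uniform: each face of a fair die has probability 1/6.
die_expected = sum(face * Fraction(1, 6) for face in range(1, 7))

# Binomial: k heads among n fair coins has probability C(n, k) / 2**n.
n_coins = 10
heads_expected = sum(
    k * Fraction(comb(n_coins, k), 2**n_coins) for k in range(n_coins + 1)
)

print(die_expected)     # expected value of one throw
print(heads_expected)   # expected number of heads, equal to n_coins / 2
```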

  8. Central Limit Theorem
     • Because the mean value of a large random sample is the cumulative result of many independent influences, the distribution of mean values of large random samples taken from a population will approximate a normal curve whatever the shape of the population distribution.
     • Example:
       • distribution of values in a random throw of a die vs distribution of mean values calculated for a set of random throws of 10,000 dice
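A small simulation in the spirit of the die example. To keep it quick, it uses 100 dice per sample and 2,000 samples rather than 10,000 dice. Those sizes and the seed are arbitrary choices, not values from the slides.

```python
import random
import statistics

random.seed(42)   # fixed seed so the sketch is repeatable

def roll_mean(n_dice):
    """Mean face value of n_dice fair six-sided dice."""
    return statistics.fmean([random.randint(1, 6) for _ in range(n_dice)])

single_throws = [random.randint(1, 6) for _ in range(2000)]   # flat distribution
sample_means = [roll_mean(100) for _ in range(2000)]          # bell-shaped distribution

centre = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)

# For a normal curve, about 68% of values fall within one standard deviation.
within_one_sd = sum(abs(m - centre) <= spread for m in sample_means) / len(sample_means)

print(f"single throws: mean {statistics.fmean(single_throws):.2f}, "
      f"sd {statistics.stdev(single_throws):.2f}")
print(f"sample means:  mean {centre:.2f}, sd {spread:.3f}, "
      f"share within 1 sd {within_one_sd:.2f}")
```

The single throws stay roughly uniform over 1 to 6, while the sample means cluster tightly around 3.5 with a much smaller spread.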

  9. Hypothesis testing
     • Null hypothesis (H0)
       • examples:
         • mean F4 for Detroit vowels is 3500 Hz (written H0: μ = 3500 Hz)
         • mean F4 of Detroit men’s vowels is 3500 Hz
         • mean F4 of men’s vowels is the same as mean F4 of women’s vowels
     • Alternative hypothesis (H1)
       • examples (matching those above):
         • mean F4 for Detroit vowels is not 3500 Hz (written H1: μ ≠ 3500 Hz)
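For the first pair of hypotheses, a one-sample t statistic is one common way to measure how far a sample mean lies from the hypothesized 3500 Hz. The F4 values below are invented for illustration, and the slide itself does not specify the test. The resulting statistic would be compared against a t distribution with n - 1 degrees of freedom.

```python
import math
import statistics

# Hypothetical F4 measurements in Hz (illustrative only, not real data).
f4_values = [3420, 3510, 3480, 3550, 3390, 3460, 3530, 3470]
mu_0 = 3500                      # value stated by the null hypothesis

n = len(f4_values)
x_bar = statistics.fmean(f4_values)       # sample mean
s = statistics.stdev(f4_values)           # sample standard deviation (n - 1)
standard_error = s / math.sqrt(n)         # estimated SD of the sample mean

t_stat = (x_bar - mu_0) / standard_error  # distance from mu_0 in standard errors
df = n - 1                                # degrees of freedom

print(f"x-bar = {x_bar:.2f} Hz, t = {t_stat:.3f}, df = {df}")
```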
