Topics, Summer 2008

Day 1. Introduction
Day 2. Samples and populations
• Measures of central tendency and dispersion
• Evaluating differences between sample means to estimate differences between populations: normal distribution and t-test
Day 3. Evaluating relationships
• Scatterplots
• Correlation
Day 4. Regression and Analysis of Variance
Day 5. Logistic regression

Distributions for nominal variables
• Counts (i.e., frequency)
  How many Xs do I have?
• Proportions (i.e., relative frequency, an estimate of probability)
  How many Xs do I have out of the total number of observations?
Example:
• How many of the clauses tagged in the Switchboard portion of the Bresnan et al. (2007) dataset show the PP realization of the recipient?
• What proportion of the Switchboard observations …?
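A minimal sketch of the count vs. proportion distinction. The six tags below are made-up illustrative values, not data from Bresnan et al. (2007), and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical recipient realizations for six clauses (illustrative only).
realizations = ["PP", "NP", "NP", "PP", "NP", "NP"]

counts = Counter(realizations)           # frequency of each category
total = sum(counts.values())             # total number of observations
proportions = {tag: n / total for tag, n in counts.items()}

print(counts["PP"])        # how many PP realizations do I have?
print(proportions["PP"])   # PP realizations out of all observations
```

The proportions always sum to 1 across the categories, which is what makes them comparable across samples of different sizes.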

Frequency, probability, odds
Frequency and expectation:
• Of the 17 students who received financial support to attend the LSA Summer Meeting, how many do we expect to be women?
• If 7 were women, is this deviation from the expected value of 8.5 (17 × 0.5, assuming women and men are equally likely to be supported) larger than we could expect by chance?
Evaluating frequency differences:
• Of the tagged clauses in the Switchboard portion of the Bresnan et al. (2007) dataset, 79% show the PP realization of the recipient.
• Is the proportion of PP realizations the same in the Wall Street Journal portion of the dataset?
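One way to put a number on "larger than we could expect by chance" for the 7-of-17 example is an exact binomial test. The slide does not name a test, so this is a sketch under the assumption that each student is a woman with probability 0.5; the function names are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def two_sided_binomial_p(observed, n, p=0.5):
    """Sum the probabilities of all outcomes no more likely than the observed one."""
    p_obs = binomial_pmf(observed, n, p)
    return sum(
        binomial_pmf(k, n, p)
        for k in range(n + 1)
        if binomial_pmf(k, n, p) <= p_obs * (1 + 1e-9)  # tolerance for float ties
    )

n_students, n_women = 17, 7
expected = n_students * 0.5                      # 8.5
p_value = two_sided_binomial_p(n_women, n_students)
print(expected, round(p_value, 3))               # 8.5 0.629
```

A p-value this large says that a count as far from 8.5 as 7 is entirely ordinary under the equal-probability assumption.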

Distributions for ratio variables
• Raw counts of values are not very useful
  How many Xs are equal to n₁? How many Xs are more than n₁ but less than n₂?
• Proportions
  What percentage of Xs are such that n₁ < x < n₂?
• Histogram: X = {x₁, x₂, …, xₙ}, breaks = {b₁, b₂, …, bₘ}
  What percentage of Xs are such that x ≤ b₁?
  What percentage of Xs are such that b₁ < x ≤ b₂?
  …
  What percentage of Xs are such that bₘ₋₁ < x ≤ bₘ?
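The histogram questions above can be sketched as a small function that reports the percentage of observations in each bin, with bins closed on the right as in the slide. The data and break values are hypothetical, and `bin_percentages` is an illustrative name.

```python
from bisect import bisect_left

def bin_percentages(values, breaks):
    """Percentage of values with x <= b1, b1 < x <= b2, ..., b(m-1) < x <= bm.

    `breaks` must be sorted in increasing order.
    """
    counts = [0] * len(breaks)
    for x in values:
        i = bisect_left(breaks, x)   # index of the first break that is >= x
        if i < len(breaks):          # values above the last break are not counted
            counts[i] += 1
    return [100 * c / len(values) for c in counts]

values = [1.0, 2.5, 3.0, 4.2, 5.0, 7.5, 8.0, 9.9]   # hypothetical measurements
breaks = [2.5, 5.0, 7.5, 10.0]
print(bin_percentages(values, breaks))               # [25.0, 37.5, 12.5, 25.0]
```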

Summary measures
• Central tendency (expected value)
  • mode
  • median
  • mean
• Dispersion (reliability of expectation)
  • range
  • inter-quartile range
  • variance
  • standard deviation
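Each of these measures can be computed with Python's standard `statistics` module. The sample below is hypothetical. Quartile conventions differ between tools, so the inter-quartile range shown is the one given by the module's default method and may not match other software exactly.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]   # hypothetical sample

# Central tendency
mode = statistics.mode(data)
median = statistics.median(data)
mean = statistics.mean(data)

# Dispersion
value_range = max(data) - min(data)
q1, _, q3 = statistics.quantiles(data, n=4)   # quartile cut points
iqr = q3 - q1
variance = statistics.variance(data)          # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)              # square root of the sample variance

print(mode, median, mean, value_range, iqr, variance, round(std_dev, 3))
```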

Descriptive vs inferential statistics
• descriptive statistics
  • summary of your sample
  • examples:
    • calculate sample mean (written “x-bar”)
    • calculate sample variance (s²)
• inferential statistics
  • generalization from your sample to the population from which your sample was drawn
  • examples:
    • use x-bar to estimate population mean (μ)
    • use s² to estimate population variance (σ²)
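For reference, the usual formulas for the two sample statistics, for a sample x_1, …, x_n. Dividing by n − 1 rather than n is the standard convention that makes s² an unbiased estimate of the population variance; the slide itself does not spell out the divisor.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\quad\text{(estimates } \mu\text{)},
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
\quad\text{(estimates } \sigma^2\text{)}
```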

Distribution families
• Uniform distribution
  Example: values for a throw of one die
• Binomial distribution
  Example: number of heads when n coins are tossed
• Normal distribution
  Example: total value for a throw of many dice (large n)
  More generally: many variables that are the cumulative result of many independent influences
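The three examples can be computed exactly for small cases. This is a sketch assuming a fair six-sided die and fair coins; the function names are illustrative. With more dice, the distribution of the total takes on the bell shape of the normal curve.

```python
from fractions import Fraction
from math import comb

# Uniform: each face of a fair die has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
die_mean = sum(face * p for face, p in die.items())      # 7/2

# Binomial: number of heads in n tosses of a fair coin.
def heads_pmf(n):
    return {k: Fraction(comb(n, k), 2**n) for k in range(n + 1)}

coins_mean = sum(k * p for k, p in heads_pmf(10).items())  # n * 1/2 = 5

# Total of n dice: convolve the single-die distribution n times.
def dice_total_pmf(n):
    total = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for s, ps in total.items():
            for face, pf in die.items():
                nxt[s + face] = nxt.get(s + face, 0) + ps * pf
        total = nxt
    return total

two_dice = dice_total_pmf(2)
print(die_mean, coins_mean, two_dice[7])   # 7/2 5 1/6
```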

Central Limit Theorem
• Because the mean of a large random sample is the cumulative result of many independent influences, the distribution of the means of large random samples taken from a population will approximate a normal curve whatever the shape of the population distribution (provided the population variance is finite).
• Example:
  • distribution of the values in random throws of one die vs distribution of the mean values calculated for repeated random throws of 10,000 dice
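A small simulation of the die example. To keep it fast, this sketch uses 100 throws per sample rather than 10,000; the seed, sample sizes, and function name are arbitrary assumptions. Single throws are uniform on 1 to 6, while the sample means cluster tightly around 3.5.

```python
import random
import statistics

def sample_means(n_throws, n_samples, seed=0):
    """Mean of n_throws fair-die throws, repeated n_samples times."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.randint(1, 6) for _ in range(n_throws)])
        for _ in range(n_samples)
    ]

means = sample_means(n_throws=100, n_samples=2000)

# The means should centre on the die's mean (3.5), with a spread near
# sqrt(35/12) / sqrt(100), i.e. the population standard deviation
# divided by the square root of the sample size.
print(statistics.fmean(means))
print(statistics.stdev(means))
```

Plotting a histogram of `means` would show the bell shape directly; increasing `n_throws` narrows it further.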

Hypothesis testing
• Null hypothesis (H0)
  • examples:
    • mean F4 for Detroit vowels is 3500 Hz (written H0: μ = 3500 Hz)
    • mean F4 of Detroit men’s vowels is 3500 Hz
    • mean F4 of men’s vowels is the same as mean F4 of women’s vowels
• Alternative hypothesis (H1)
  • examples (matching those above):
    • mean F4 for Detroit vowels is not 3500 Hz (written H1: μ ≠ 3500 Hz)
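For a null hypothesis of the form H0: μ = 3500 Hz, one common choice is the one-sample t statistic, which measures how many standard errors the sample mean lies from the hypothesized value. This slide does not name the test, so this is a sketch under that assumption. The F4 values are hypothetical and `one_sample_t` is an illustrative name.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - mu0) / standard_error
    return t, n - 1

f4_hz = [3420, 3510, 3475, 3560, 3390, 3605, 3480, 3530]   # hypothetical F4 values
t, df = one_sample_t(f4_hz, mu0=3500)
print(t, df)
```

The resulting t is then compared against the t distribution with `df` degrees of freedom; a value far from 0 in either direction counts as evidence against H0 in favour of the two-sided alternative.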
