Economics 105: Statistics

Review #1 is due next Tuesday in class.
Go over GH 7 & 8. No GHs are due until next Thur.! GH 9 and 10 are due next Thur.
Still go to lab this week. The lab is due 2 weeks after your lab session (so you’ll have 2 labs due that week, assuming you don’t complete it ahead of time).

Sampling
Start with the population, which is the set of all possible persons, firms, countries, etc. for the particular frame of reference.
For each research question, define the relevant population:
  What is the average income in the United States?
  What is the average height in Krakozhia?
  Who will win the presidential election in November?
  What is the average number of volunteer hours per student?
  What percent of left-handed people have blue eyes?
A sample is the subset of the population selected for analysis.
  It must be representative of the population to avoid biased estimates.
The U.S. census is taken every 10 years, as required by the Constitution.
  The first one was in 1790 (3.9 million residents; today 312 million).
  http://www.archives.gov/exhibits/charters/constitution_transcript.html
  http://www.ipums.org/?
  http://usa.ipums.org/usa-action/variables/group

Simple Random Sampling
The most straightforward way to achieve representativeness is simple random sampling, where each person has an equal, and independent, chance of being selected.
Also called i.i.d. sampling, for independent and identically distributed (since all draws come from the same population).
Say we want to know how many magazines a household currently purchases.
  Choose 1000 names from ________?
  Suppose it is a good idea; now we contact them … if they’re not available, we just scratch them from our list. Or we go to the next name on the list until we find someone who is available. Any problems?
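To make the idea of an equal chance of selection concrete, here is a minimal sketch using Python's standard library. The household IDs and the frame size of 50,000 are invented for illustration and do not come from the slides. Note that `random.sample` draws without replacement, which is only approximately i.i.d. when the population is large relative to the sample; `random.choices` would draw with replacement.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # seeded only so the example is repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical sampling frame: household IDs 1..50,000 (made up for illustration).
households = list(range(1, 50_001))
sample = simple_random_sample(households, 1000, seed=105)

print(len(sample), "households drawn;", len(set(sample)), "distinct")
```

Scratching unavailable households from this list, or moving on to the next name, would change who ends up in the sample and could undo the representativeness the random draw was meant to provide.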

Systematic Sample
Partition the population of N items into n groups with k members each (k = N/n).
Randomly choose one item from the first group of k.
Take every kth item after that.
Faster and easier than a simple random sample.
Examples: telephone book, class roster, items from an assembly line, etc.
Greater chance of selection bias if there’s a pattern in the population.
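The three steps (compute k = N/n, pick a random start in the first group, take every kth item) can be sketched as follows. The roster of 100 IDs is a hypothetical example, and the sketch assumes n divides N evenly.

```python
import random

def systematic_sample(population, n, seed=None):
    """Random start within the first k items, then every kth item after that."""
    N = len(population)
    k = N // n                    # group size k = N/n (assumes n divides N)
    rng = random.Random(seed)
    start = rng.randrange(k)      # random position inside the first group
    return population[start::k][:n]

# Hypothetical class roster of 100 student IDs (invented for illustration).
roster = list(range(1, 101))
picked = systematic_sample(roster, 10, seed=1)
print(picked)
```

If the roster were ordered with a repeating pattern of period k, every selected item would sit at the same point of the pattern, which is the selection-bias risk noted above.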

Stratified Random Sampling
Hypothetical research question: What % of students will vote in the election?
Only have time & money to survey 100 students. You do so, but get only 2 political science majors in your sample. Problems?
Solution: Stratified Random Sampling
  If a subgroup, or stratum, of the population is particularly relevant to the research question, one may break the population down into strata and take a simple random sample from each stratum.
  Each person can belong to only one stratum.
  Ensures a reasonable sample size for the subpopulation of interest or concern.
  Can stratify on more than one characteristic, e.g., major and gender.
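A minimal sketch of the stratified approach, assuming a made-up student body of 1,000 with 30 political science majors and an arbitrary allocation of 20 and 80 interviews. The function name, the data, and the allocation are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Split units into strata, then take a simple random sample within each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)   # each unit lands in exactly one stratum
    return {
        s: rng.sample(members, min(n_per_stratum[s], len(members)))
        for s, members in strata.items()
    }

# Hypothetical student body: (id, major), with 30 political science majors.
students = [(i, "polisci" if i < 30 else "other") for i in range(1000)]
by_major = stratified_sample(
    students, lambda s: s[1], {"polisci": 20, "other": 80}, seed=7
)
print({major: len(group) for major, group in by_major.items()})
```

Because political science majors are deliberately oversampled here (20% of the sample versus 3% of this invented population), an estimate for all students would need to weight each stratum by its population share.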

Cluster Sampling
Hypothetical survey of rural families spread over a wide area.
Hypothetical survey of homeless individuals in a large city.
Problems?
  Hard to get an accurate list of population members
  In-person interviews too costly
  Mail surveys might lead to really high non-response
Solution: Cluster Sampling
  Divide the population into geographically small units, or clusters.
    For example, political wards or residential blocks for a city.
  Then take a simple random sample of clusters.
  Each person or household in a chosen cluster is then contacted; that is, a complete census of the chosen clusters.
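The two-step procedure (randomly choose clusters, then census each chosen cluster) might look like the sketch below. The 20 blocks of 5 households each are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Simple random sample of clusters, then a complete census of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # sample cluster names
    return {c: list(clusters[c]) for c in chosen}       # keep every member

# Hypothetical city: 20 residential blocks with 5 households each.
blocks = {f"block_{b}": [f"hh_{b}_{h}" for h in range(5)] for b in range(20)}
surveyed = cluster_sample(blocks, 4, seed=3)

for block, households in surveyed.items():
    print(block, households)
```

Only a list of clusters is needed up front, not a list of every individual, which is what makes the method workable when no accurate population list exists.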

Sources of Error from a Survey
Sampling errors
  come from having info on only a subset of the population
  statistical theory is used to quantify them
Non-sampling errors
  can occur even with a complete census of the population
  possible sources:
    Population sampled is not the relevant one, or the list is incomplete (coverage error, sample selection bias)
    Measurement error
    Inaccurate or dishonest answers
    Halo effect
    Poor wording of questions
    Non-response (to whole survey or some questions)
  try to minimize at the outset & check up on some answers

Sample Statistics
Population parameter    Sample statistic

Sample Statistics
Denote an i.i.d. sample by X1, X2, X3, . . . , Xn.
  What exactly is an Xi?
Actual outcomes are x1, x2, x3, . . . , xn.
  How many samples could we take?
  How many samples do we actually take?
A sample statistic is formed by taking some function of the random variables X1, X2, X3, . . . , Xn:
  A = f(X1, X2, X3, . . . , Xn)
Examples
The point estimate of the population parameter is a single number rather than a range.
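As an illustration of A = f(X1, . . . , Xn), the sketch below computes two common statistics, the sample mean and the sample variance, on several realized samples. The population (10,000 draws from a normal distribution with mean 50 and standard deviation 10) and the sample size of 25 are assumptions made up for this example.

```python
import random
import statistics

rng = random.Random(105)
# Hypothetical population of 10,000 values (invented for illustration).
population = [rng.gauss(50, 10) for _ in range(10_000)]

def draw_sample(n):
    """One realized i.i.d. sample x1, ..., xn (draws with replacement)."""
    return rng.choices(population, k=n)

# Two example statistics A = f(X1, ..., Xn): sample mean and sample variance.
samples = [draw_sample(25) for _ in range(5)]
means = [statistics.mean(x) for x in samples]
variances = [statistics.variance(x) for x in samples]   # divides by n - 1

for m, v in zip(means, variances):
    print(f"sample mean = {m:6.2f}   sample variance = {v:7.2f}")
print(f"population mean = {statistics.mean(population):.2f}")
```

Each sample yields a different point estimate, even though the population and the function f are the same every time. In practice we usually take just one of the many samples we could have taken.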

Sampling Distribution
Sample statistic A is a random variable! Why?
Thus, a sample statistic has a probability distribution, known as a sampling distribution.
Example: Let S = {0,1,2,3,4,5,6}. Graph the sampling distribution of the sample mean $\bar{X}$ for n = 2.
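One way to work the S = {0,1,2,3,4,5,6} example is to enumerate every possible sample. The sketch below assumes the statistic is the sample mean, that each value in S is equally likely, and that the two draws are independent (with replacement), giving 49 equally likely ordered samples. Those assumptions are not stated on the slide.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

S = range(7)   # the values 0, 1, ..., 6
n = 2

# Count how many of the 7**n equally likely ordered samples give each mean.
counts = Counter(Fraction(sum(draw), n) for draw in product(S, repeat=n))
total = len(S) ** n
dist = {xbar: Fraction(c, total) for xbar, c in sorted(counts.items())}

# Text "graph": one # per ordered sample that produces that sample mean.
for xbar, p in dist.items():
    print(f"{float(xbar):>4.1f}  {str(p):>5}  {'#' * counts[xbar]}")
```

Under these assumptions the distribution is triangular, peaking at a sample mean of 3 with probability 7/49, even though the population itself is flat.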

Central Limit Theorem
Rough statement of CLT: “Sample means are eventually, approximately normally distributed.”
Formal statement of CLT: Let X1, X2, X3, . . . , Xn, where Xi is a random variable denoting the outcome of the ith observation, be an i.i.d. sample from ANY population distribution with mean $\mu$ and finite variance $\sigma^2$; then as n becomes large,
  $\bar{X} \sim N(\mu, \sigma^2/n)$, approximately.
Graphically (page 236 in BLK, 10th edition, has a nice visual)
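A small simulation can make the theorem tangible. This sketch assumes an exponential population with mean 1 and variance 1, chosen only because it is strongly skewed. It checks that the sample means center on the population mean and that their variance is close to the population variance divided by n. The sample sizes and the number of replications are arbitrary.

```python
import random
import statistics

rng = random.Random(105)

def sample_mean(n):
    """Mean of n i.i.d. draws from an exponential population (mean 1, variance 1)."""
    return statistics.mean([rng.expovariate(1.0) for _ in range(n)])

def simulate(n, reps=2000):
    """Mean and variance of many simulated sample means of size n."""
    means = [sample_mean(n) for _ in range(reps)]
    return statistics.mean(means), statistics.variance(means)

results = {n: simulate(n) for n in (2, 30)}
for n, (m, v) in results.items():
    print(f"n={n:>2}  mean of sample means={m:.3f}  "
          f"variance={v:.4f}  population variance / n={1 / n:.4f}")
```

Plotting a histogram of the simulated means for each n would show the skew at n = 2 fading into a roughly bell-shaped curve at n = 30.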

CLT!
