
The eternal tension in statistics...

The eternal tension in statistics is between what you really, really want (the population) but can never get to, and what you have to make do with (the sample). You can estimate the population and make educated guesses, but the bottom line is “you can never have the population”.

edrennen

Presentation Transcript


  1. The eternal tension in statistics...

  2. Between what you really, really want (the population) but can never get to...

  3. So you have to make do with the sample: you can estimate the population and make educated guesses,

  4. but the bottom line is “you can never have the population”

  5. An investigator usually wants to generalize about a class of individuals/things (the population). For example: in forecasting the results of elections, population = voters; for the Furniture.com class group, population = all potential users.

  6. Parameters: Usually there are some numerical facts about the population which you want to estimate • Statistic: You can do that by measuring the same aspect in the sample (Descriptive Statistics) • Depending on the accuracy of measurement, and representativeness of your sample, you can make inferences about the population (Inferential Statistics)

  7. One person’s sample is another person’s population • IS 271 students are a sample of the larger student population of UC Berkeley • IS 271 students could be the population for some other study

  8. The 1936 election: the Literary Digest poll • Candidates: Democrat F. D. Roosevelt and Republican Alfred Landon • The Literary Digest had called the winner in every election since 1916 • Its prediction: Roosevelt will get 43% • It polled 2.4 million people!

  9. The election results (Roosevelt’s percentage of the vote) • The election result: 62 • The Digest prediction: 43 • Gallup’s prediction of the Digest prediction: 44 • Gallup’s prediction of the election result: 56

  10. Why the Digest went wrong: How they picked their sample • Selection Bias: A systematic tendency on the part of the sampling procedure to exclude one kind of person or another from the sample • Sample Size: When a selection procedure is biased, making the sample larger does not help; it just repeats the mistake on a larger scale
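The point about sample size can be illustrated with a small simulation. This is only a sketch: the 62% figure is the election result quoted in the slides, while the inclusion probabilities (`keep_supporter`, `keep_other`), the sample sizes, and the function names are hypothetical values chosen for illustration, not a model of the Digest's actual procedure.

```python
import random

TRUE_SUPPORT = 0.62  # Roosevelt's actual share, as given in the slides


def random_poll(n, rng):
    """Unbiased poll: every voter is equally likely to be sampled."""
    return sum(rng.random() < TRUE_SUPPORT for _ in range(n)) / n


def biased_poll(n, rng, keep_supporter=0.45, keep_other=1.0):
    """Biased poll: supporters are less likely to end up in the sample.

    The keep_* probabilities are hypothetical; they stand in for a sampling
    procedure that systematically excludes one kind of person.
    """
    votes = []
    while len(votes) < n:
        supporter = rng.random() < TRUE_SUPPORT
        keep = keep_supporter if supporter else keep_other
        if rng.random() < keep:
            votes.append(supporter)
    return sum(votes) / n


rng = random.Random(1936)  # fixed seed so the run is repeatable
small_random = random_poll(3_000, rng)
small_biased = biased_poll(3_000, rng)
large_biased = biased_poll(200_000, rng)

print(f"random sample, n=3,000:   {small_random:.3f}")
print(f"biased sample, n=3,000:   {small_biased:.3f}")
print(f"biased sample, n=200,000: {large_biased:.3f}")
```

Under these assumptions the large biased sample lands close to the small biased one and far from the true value: a bigger sample shrinks the chance error, not the bias.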

  11. How they picked their sample • Non-Response Bias: Non-respondents differ from respondents • At the very least, they did not respond, whereas the respondents did! • Lower-income and upper-income people tend not to respond, so the middle class is over-represented. • Correcting for Non-Response Bias: One can give more weight to people who were available but hard to get.

  12. For Example: Predicting Elections • Non-Voters: Gallup uses a few questions to predict whether people will vote at all. The election forecast is based only on those likely to vote. • Undecided: Ask people who they are leaning towards as of today. • Non-Response Bias: One can give more weight to people who were available but hard to get. • Ratio Estimation: Look at the sample obtained and compare it to the population. If there are too many educated people, give them less weight. • Interviewer Bias: Build redundancy into the questionnaire to check for consistency. Also reinterview a small sample to check for consistency.
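The ratio-estimation idea (down-weighting groups that are over-represented in the sample) can be sketched as follows. All numbers here are hypothetical, as are the group labels and variable names; the sketch only shows the reweighting arithmetic, not Gallup's actual procedure.

```python
# Hypothetical share of each education group in the population.
population_share = {"college": 0.30, "no_college": 0.70}

# Hypothetical sample: college-educated people are over-represented.
sample_counts = {"college": 600, "no_college": 400}

# Hypothetical fraction of each sampled group supporting a candidate.
sample_support = {"college": 0.40, "no_college": 0.60}

n = sum(sample_counts.values())
sample_share = {g: c / n for g, c in sample_counts.items()}

# Weight = population share / sample share.
# Over-represented groups get a weight below 1, under-represented above 1.
weights = {g: population_share[g] / sample_share[g] for g in sample_counts}

# Unweighted estimate: every respondent counts equally.
raw = sum(sample_counts[g] * sample_support[g] for g in sample_counts) / n

# Weighted estimate: each respondent counts according to the group weight.
weighted_total = sum(weights[g] * sample_counts[g] for g in sample_counts)
weighted = sum(
    weights[g] * sample_counts[g] * sample_support[g] for g in sample_counts
) / weighted_total

print(f"weights:           {weights}")
print(f"raw estimate:      {raw:.2f}")
print(f"weighted estimate: {weighted:.2f}")
```

With these made-up figures the raw estimate is 0.48 and the weighted estimate is 0.54, because the over-sampled college group is given half weight.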

  13. Distribution of brown M&M’s • Yellow 20% • Brown 30% • Orange 10% • Blue 10% • Red 20% • Green 10%

  14. The distribution of the population

  15. Sample 1

  16. Sample 2

  17. Sample 3

  18. [Figure: the population distribution shown alongside Sample 1, Sample 2, and Sample 3; 5 samples]
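The way individual samples wander around the population distribution can be reproduced with a short simulation. This is a sketch only: the color percentages are the ones listed on the M&M's slide, while the sample size of 50, the seed, and the helper name `draw_sample` are assumptions made for illustration.

```python
import random
from collections import Counter

# Population distribution of colors, as listed on the slide.
POPULATION = {"Yellow": 0.20, "Brown": 0.30, "Orange": 0.10,
              "Blue": 0.10, "Red": 0.20, "Green": 0.10}


def draw_sample(n, rng):
    """Draw n candies at random and return the observed color proportions."""
    colors = list(POPULATION)
    draws = rng.choices(colors, weights=[POPULATION[c] for c in colors], k=n)
    counts = Counter(draws)
    return {c: counts[c] / n for c in colors}


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [draw_sample(50, rng) for _ in range(3)]

for i, sample in enumerate(samples, start=1):
    row = "  ".join(f"{color} {share:.0%}" for color, share in sample.items())
    print(f"Sample {i}: {row}")
```

Each sample's proportions differ somewhat from the population percentages and from one another; that difference is the chance error the next slides discuss.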

  19. How much is each sample going to deviate from the population? (How big is the chance error for each sample likely to be?) Computation of Standard Error: Standard Error = √(number of samples) × SD of sample • Values: 9, 7, 6, 9, 11, 12 • Mean = 9 • Standard Deviation ≈ 2.1 • Standard Error = √6 × 2.1 ≈ 5.1
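The computation on this slide can be checked with a few lines of Python. This sketch assumes the multiplier is the square root of the number of values and that the SD is taken in its population form (dividing by n); the sample form (dividing by n - 1) is printed as well, since the choice of convention and rounding both change the final figure.

```python
import math
import statistics

values = [9, 7, 6, 9, 11, 12]

mean = statistics.mean(values)
sd = statistics.pstdev(values)      # population form: divide by n
sd_plus = statistics.stdev(values)  # sample form: divide by n - 1

# Assumed formula: SE = sqrt(number of values) x SD
se = math.sqrt(len(values)) * sd

print(f"mean              = {mean}")
print(f"SD (divide by n)  = {sd:.2f}")
print(f"SD (divide by n-1)= {sd_plus:.2f}")
print(f"SE                = {se:.2f}")
```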

  20. Why is knowing the chance error important? • Allows us to estimate the accuracy of our estimates and whether we are justified in using inferential statistics. • Allows us to make inferences about the population

  21. If there is a lot of spread in the samples, the SD is big and it will be hard to predict how accurate the sample will be. So the standard error will be big as well. Standard Deviation (SD) and Standard Error (SE): SD refers to a list of numbers: how far are most numbers from the mean? SE refers to the variability in samples: how variable is each sample going to be?

  22. Should the sample for Texas be larger than that for Rhode Island?

  23. Surprisingly: No. Analogy: Suppose you took a drop of liquid for analysis. If the liquid is well mixed, it would not matter whether the drop came from a small or a large bottle, or whether the sample is 1% or 0.1% of the population. The statistical rationale: The accuracy of sampling is related to the standard deviation of the sample. Example: Election of 1992, % of voters who chose Clinton • 46% of voters in New Mexico, SD = 0.50 • 37% of voters in Texas, SD = 0.48 • Therefore the accuracy of a sample in Texas and in New Mexico will be similar
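The two SD figures follow from the standard formula for a list of 0s and 1s, SD = √(p × (1 − p)), where p is the fraction of 1s (here, Clinton voters). The sketch below reproduces them and, as an added assumption, compares the standard error of the sample percentage for a hypothetical poll of 2,500 people in each state; the function names and the poll size are illustrative only.

```python
import math


def sd_zero_one(p):
    """SD of a list of 0s and 1s in which a fraction p are 1s."""
    return math.sqrt(p * (1 - p))


def se_percent(p, n):
    """SE of the sample percentage for a simple random sample of size n.

    Note that the size of the population does not appear in the formula.
    """
    return 100 * sd_zero_one(p) / math.sqrt(n)


clinton_share = {"New Mexico": 0.46, "Texas": 0.37}
n = 2_500  # hypothetical sample size, the same in both states

for state, p in clinton_share.items():
    print(f"{state}: SD = {sd_zero_one(p):.2f}, "
          f"SE of percentage with n={n} is {se_percent(p, n):.2f} points")
```

Both states come out at roughly one percentage point of standard error for the same sample size, even though Texas has far more voters.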

  24. Types of Samples • The convenient sample: The more convenient elementary units are chosen from a population. • The judgement sample: Units are chosen according to a judgement made by someone who is familiar with the relevant characteristics of the population. • The random sample: Units are chosen randomly with a known probability.

  25. Quota Sampling: Each interviewer is assigned a fixed quota of subjects fitting certain demographic characteristics. Within each quota, the interviewer picks subjects by judgement, so the result is a judgement sample. • Problems: quotas might not be representative, and judgement sampling leaves room for the interviewer’s bias.

  26. Types of Random Sample • Simple Random Sample: Every unit of the population has an equal chance of being chosen. • A systematic random sample: One unit is chosen on a random basis, and additional elementary units are taken at evenly spaced intervals until the desired number of units is obtained.

  27. The stratified random sample: Obtained by independently selecting a separate simple random sample from each population stratum. A population can be divided into different groups based on some characteristic or variable like income or education. • The cluster sample: Obtained by selecting clusters from the population on the basis of simple random sampling. The sample comprises a census of each random cluster selected. For example, a cluster may be something like a village, a school, or a state.
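The four random sampling schemes described on the last two slides can be sketched in a few lines. The population of 100 numbered units, the two strata, the clusters of 10, and all sample sizes are hypothetical choices made only to keep the example small.

```python
import random

rng = random.Random(42)           # fixed seed so the run is repeatable
population = list(range(1, 101))  # hypothetical units numbered 1..100

# Simple random sample: every unit has an equal chance of being chosen.
simple = rng.sample(population, 10)

# Systematic random sample: one random starting unit, then every k-th unit.
k = len(population) // 10
start = rng.randrange(k)
systematic = population[start::k]

# Stratified random sample: a separate simple random sample from each stratum.
strata = {"low": population[:50], "high": population[50:]}  # hypothetical strata
stratified = {name: rng.sample(units, 5) for name, units in strata.items()}

# Cluster sample: choose clusters at random, then take a census of each one.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [unit for cluster in chosen_clusters for unit in cluster]

print("simple:    ", sorted(simple))
print("systematic:", systematic)
print("stratified:", {name: sorted(units) for name, units in stratified.items()})
print("cluster:   ", cluster_sample)
```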
