STA 291 Fall 2009 Lecture 2 Dustin Lueker
Basic Terminology • Parameter • Numerical characteristic of the population • Calculated using the whole population • Statistic • Numerical characteristic of the sample • Calculated using the sample
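A minimal Python sketch of the distinction. The ages below are made-up data for a hypothetical population of eight people, and the variable names are illustrative assumptions, not part of the lecture:

```python
import random
from statistics import mean

# Hypothetical population: ages of all 8 members (made-up data).
population = [19, 20, 20, 21, 22, 23, 25, 30]

# Parameter: a numerical characteristic computed from the whole population.
parameter = mean(population)

# Statistic: the same characteristic computed from a sample only.
rng = random.Random(291)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 3)
statistic = mean(sample)

print("parameter (population mean):", parameter)
print("statistic (sample mean):", statistic)
```

The parameter is a fixed number, while the statistic changes whenever a different sample is drawn.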
Simple Random Sampling (SRS) • Each possible sample has the same probability of being selected • The sample size is usually denoted by n
Example of SRS • Population of 4 students: Alf, Buford, Charlie, Dixie • Select an SRS of size n = 2 to ask them about their smoking habits • 6 possible samples of size 2 • A,B • A,C • A,D • B,C • B,D • C,D
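The six samples listed above can be enumerated with a short Python sketch. The variable names are illustrative; `itertools.combinations` and `math.comb` are standard-library functions:

```python
from itertools import combinations
from math import comb

students = ["Alf", "Buford", "Charlie", "Dixie"]
n = 2  # sample size

# Every unordered sample of size n from the population.
samples = list(combinations(students, n))
for s in samples:
    print(s)

# The count agrees with "4 choose 2".
print(len(samples), comb(len(students), n))
```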
How to choose an SRS? • Each of the six possible samples has to have the same probability of being selected • How could we do this? • Roll a die • Random number generator
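Both suggestions can be sketched in Python, assuming the four-student population from the previous example. The function names `choose_by_die` and `choose_by_rng` are hypothetical helpers introduced only for illustration:

```python
import random
from itertools import combinations

students = ["Alf", "Buford", "Charlie", "Dixie"]
samples = list(combinations(students, 2))  # the 6 possible samples

def choose_by_die(rng):
    """Roll a fair six-sided die; face k selects the k-th sample."""
    roll = rng.randint(1, 6)
    return samples[roll - 1]

def choose_by_rng(rng, n=2):
    """Let the random number generator draw n distinct students directly."""
    return tuple(rng.sample(students, n))

rng = random.Random()  # unseeded: a different sample on each run
print("die roll picks:", choose_by_die(rng))
print("generator picks:", choose_by_rng(rng))
```

Because a fair die has six equally likely faces and there are exactly six possible samples, each sample is selected with probability 1/6.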
Common Problems when Sampling • Convenience sample • Selecting subjects that are easily accessible to you • Volunteer sample • Selecting the first two subjects who volunteer to take the survey • What are the problems with these samples? • Proper representation of the population • Bias • Examples • Mall interview • Street corner interview
Example • A survey of 300 random individuals conducted in Louisville revealed that President Obama had an approval rating of 67%. • Is 67% a statistic or parameter? • The surveyors stated that only 67% of Kentuckians approved of President Obama. • What is the problem with this statement? • Why might the surveyors have chosen Louisville as their sampling location?
Famous Example • 1936 presidential election of Alfred Landon vs. Franklin Roosevelt • Literary Digest sent out over 10 million questionnaires in the mail to predict the election outcome • What type of sample is this? • 2 million responses predicted a landslide victory for Alfred Landon • George Gallup used a much smaller random sample and predicted a clear victory for FDR • FDR won with 62% of the vote
Other Examples • TV, radio call-in polls • “Should the UN headquarters continue to be located in the United States?” • ABC poll with 186,000 callers: 67% no • Scientific random sample of 500: 28% no • Which sample is more trustworthy? • Would any of you call in to give your opinion? Why or why not?
Other Examples • Another advantage of random samples • Inferential statistical methods can be applied to state that “the true percentage of all Americans who want the UN headquarters out of the United States is between 24% and 32%” • These methods cannot be applied to a volunteer sample
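The lecture does not say which inferential method gives the 24% to 32% range. As a sketch, assuming a 95% normal-approximation interval for a proportion, applied to the random sample of 500 with 28% answering no, the numbers come out close to that range. The function name `proportion_interval` is a hypothetical helper:

```python
from math import sqrt

def proportion_interval(p_hat, n, z=1.96):
    """Normal-approximation interval for a proportion.

    p_hat: sample proportion, n: sample size,
    z: critical value (1.96 corresponds to about 95% confidence).
    """
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

# Random sample of 500 with 28% answering "no" (figures from the slides).
low, high = proportion_interval(0.28, 500)
print(f"{low:.1%} to {high:.1%}")  # roughly 24% to 32%
```

This calculation relies on the sample being random. Applying it to the 186,000 call-in responses would give a very narrow interval around 67%, but that interval would not be meaningful because the callers selected themselves.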
Don’t Trust Bad Samples • Whenever you see results from a poll, check whether they come from a random sample • Preferably, it should be stated • Who sponsored and conducted the poll? • How were the questions worded? • How was the sample selected? • How large was it? • If not, the results may not be trustworthy
Question Wording • Kalton et al. (1978), England • Two groups get questions with slightly different wording • Group 1 • “Are you in favor of giving special priority to buses in the rush hour or not?” • Group 2 • “Are you in favor of giving special priority to buses in the rush hour or should cars have just as much priority as buses?”
Question Wording • Result: Proportion of people saying that priority should be given to buses.
Question Order • Two questions asked in different order during the Cold War • (1) “Do you think the U.S. should let Russian newspaper reporters come here and send back whatever they want?” • (2) “Do you think Russia should let American newspaper reporters come in and send back whatever they want?” • When question (1) was asked first, 36% answered “Yes” • When question (2) was asked first, 73% answered “Yes” to question (1)
‘Flavors’ of Statistics • Descriptive Statistics • Summarizing the information in a collection of data • Inferential Statistics • Using information from a sample to make conclusions/predictions about the population
Example • 71% of individuals surveyed believed that the Kentucky Football team will return to a bowl game in 2009 • Is 71% an example of descriptive or inferential statistics? • From the same sample it is concluded that at least 85% of Kentucky Football fans approve of Coach Brooks’ job here at UK • Is 85% an example of descriptive or inferential statistics?
Qualitative Variables • Nominal • Gender, nationality, hair color, state of residence • Nominal variables have a scale of unordered categories • It does not make sense to say, for example, that green hair is greater/higher/better than orange hair • Ordinal • Disease status, company rating, grade in STA 291 • Ordinal variables have a scale of ordered categories; they are often treated in a quantitative manner (A = 4.0, B = 3.0, etc.) • One unit can have more of a certain property than does another unit
Quantitative Variables • Quantitative • Age, income, height • Quantitative variables are measured numerically, that is, for each subject a number is observed • The scale for quantitative variables is called an interval scale
Example • A survey of Kentucky Football fans obtained the following information • Age • Whether they preferred the new blue helmet or the old white helmet • The number of games they think the team will win in 2009 • How they felt the UK vs. U of L game would turn out • U of L in a blowout • U of L in a close game • UK in a close game • UK in a blowout • Are these qualitative or quantitative variables and what is the scale for each?
Discrete and Continuous • A variable is discrete if it can take on a finite number of values • Gender • Favorite MLB team • Qualitative variables are discrete • Continuous variables can take an infinite continuum of possible real number values • Time spent studying for STA 291 per day • 27 minutes • 27.487 minutes • 27.48682 minutes • Can be subdivided into more accurate values • Therefore continuous
Observational Study • An observational study observes individuals and measures variables of interest but does not attempt to influence the responses • Purpose of an observational study is to describe/compare groups or situations • Example: Select a sample of men and women and ask whether he/she has taken aspirin regularly over the past 2 years, and whether he/she had suffered a heart attack over the same period
Experiment • An experiment deliberately imposes some treatment on individuals in order to observe their responses • Purpose of an experiment is to study whether the treatment causes a change in the response • Example: Randomly select men and women, divide the sample into two groups. One group would take aspirin daily, the other would not. After 2 years, determine for each group the proportion of people who had suffered a heart attack.
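The slide does not say how the sample is divided into two groups. A minimal Python sketch, assuming the division is done at random (the usual practice in experiments) and using made-up subject labels; `assign_groups` is a hypothetical helper name:

```python
import random

def assign_groups(subjects, seed=None):
    """Shuffle the subjects and split them into two equal-sized groups.

    Assumption: treatment is assigned at random, so neither the
    researcher nor the subject chooses who takes aspirin.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical sample of 20 randomly selected people.
subjects = [f"subject_{i}" for i in range(1, 21)]
aspirin_group, control_group = assign_groups(subjects, seed=2009)

print("aspirin daily:", aspirin_group)
print("no aspirin:   ", control_group)
```

Contrast this with the observational study, where people decide for themselves whether to take aspirin and the researcher only records what they did.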
Which is Preferred? • Observational Studies • Passive data collection • We observe, record, or measure, but don’t interfere • Experiments • Active data production • Actively intervene by imposing some treatment in order to see what happens • Experiments are preferable if they are possible • We are able to control more things and be sure our data isn’t tainted