AP Statistics Unit 5: Producing Data
Section 5.1 Designing Samples
The “Big Picture” of Statistics: What is the Purpose?
• What is a statistical question you want answered?
• How do you collect the data to help answer your question?
• How do you analyze the data so that it relates to your inquiry?
• How do you make a conclusion based on the gathering and analysis of your data?
How can we gather information to answer a statistical question? Observational studies: watching/observing individuals and recording observations of interest. Observers do not try to influence a response. Experiments: individuals are randomly assigned to groups where some treatment is imposed to determine cause and effect.
Who do we gather data from? Population: entire group of individuals we are interested in. Parameters are measurements from a population. Sample: a subset of a population. Statistics are measurements from a sample.
In what way do we collect the data? Sampling: asking, experimenting on, or observing a portion of the population. Census: asking, experimenting on, or observing every individual in the population.
To better understand the characteristics of a population, statisticians and researchers often take a sample from that population and make inferences based on summary results from the sample. Polling is one example: a sample is surveyed to estimate characteristics of the whole population. Because we make inferences about a population from the sample, it is very important that the sample is collected appropriately and is representative of the population being studied. Several different sampling methods can be used to obtain a representative sample. Poor sampling methods, however, can produce misleading conclusions.
Sampling Methods to Collect Data 1) Convenience Sampling: uses subjects that are readily available. Example: To get an idea of what students think of the new school policy, the principal stands outside the library and asks a few students for their opinions. Advantages: Disadvantages:
Sampling Methods to Collect Data 2) Voluntary Response Sampling: a sample is obtained by allowing subjects to decide whether or not to respond (also known as a self-selected survey). Example: After the State of the Union speech, ABC tells its audience to call 1-800-555-1234 if they thought the speech was good and 1-800-555-7890 if they thought the speech was bad (there is a 50-cent charge for the call). Advantages: Disadvantages:
Sampling Methods to Collect Data 3) Simple Random Sample (SRS): consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance of being the sample actually selected. This is often the best and most appropriate way to collect data for a sample. Example: To determine how happy students are with their education at BCHS, the principal assigns each student a number from 1 to 1653 (the number of students at the school) and then uses a random number generator to choose 100 distinct numbers from 1 to 1653. She then surveys all the students with the chosen numbers.
Sampling Methods to Collect Data 3) Simple Random Sample (SRS)— Advantages: Disadvantages:
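The principal's procedure in the SRS example can be sketched in Python. This is only an illustration under stated assumptions: the function name `simple_random_sample` and the seed are invented here, and the slides themselves use a calculator or a random number table rather than code.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw an SRS of ID numbers from 1..population_size, with no repeats."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so every set of
    # sample_size ID numbers is equally likely to be the sample.
    return rng.sample(range(1, population_size + 1), sample_size)

# 1653 students, 100 surveyed (numbers taken from the BCHS example).
chosen_ids = simple_random_sample(1653, 100, seed=1)
print(sorted(chosen_ids)[:10])  # the ten smallest ID numbers selected
```

The seed is fixed only so the sketch is repeatable; a real survey would leave it unset.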
Sampling Methods to Collect Data 4) Stratified Sampling: divide the population into groups of similar individuals (strata), then select an SRS within each stratum. Combine the SRSs from all strata to form your full sample. Example: To get a better idea of what BCHS athletes thought about homecoming last year, the director divides all BCHS athletes into the teams they play for, and then selects a random sample from each sports team. His full sample consists of the combined random samples from each team.
Sampling Methods to Collect Data 4) Stratified Sampling— Advantages: Disadvantages:
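A minimal Python sketch of stratified sampling, assuming the strata are given as a dictionary of rosters. The team names, roster sizes, and the `stratified_sample` helper are hypothetical and are not taken from the BCHS example.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Take an SRS of per_stratum members from each stratum and combine them."""
    rng = random.Random(seed)
    full_sample = []
    for name, members in strata.items():
        # Never ask for more members than the stratum contains.
        k = min(per_stratum, len(members))
        full_sample.extend(rng.sample(members, k))
    return full_sample

# Hypothetical rosters: team names and sizes are invented for illustration.
teams = {
    "soccer": [f"soccer_{i}" for i in range(1, 19)],
    "tennis": [f"tennis_{i}" for i in range(1, 11)],
    "track": [f"track_{i}" for i in range(1, 31)],
}
sample = stratified_sample(teams, per_stratum=5, seed=1)
print(len(sample))  # 15: five athletes from each of the three teams
```

Taking the same number from every stratum is one possible design choice; sampling in proportion to stratum size is another.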
Sampling Methods to Collect Data 5) Cluster Sampling: divide the population into sections (clusters), then randomly choose a few of those clusters and select every member of the clusters chosen. Example: A school counselor collects a sample by first dividing the students into their respective classes (senior, junior, sophomore, freshman), and then she selects two classes at random and surveys every student within those chosen classes. Advantages: Disadvantages:
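Cluster sampling can be sketched the same way. The class sizes and the `cluster_sample` helper below are hypothetical; only the idea of choosing two of four classes and surveying everyone in them comes from the example.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample

# Hypothetical class sizes, invented for illustration.
classes = {
    "senior": [f"senior_{i}" for i in range(1, 401)],
    "junior": [f"junior_{i}" for i in range(1, 411)],
    "sophomore": [f"sophomore_{i}" for i in range(1, 421)],
    "freshman": [f"freshman_{i}" for i in range(1, 424)],
}
chosen_classes, surveyed = cluster_sample(classes, n_clusters=2, seed=1)
print(chosen_classes, len(surveyed))
```

Unlike stratified sampling, the randomness here is in which groups are chosen, not in who is chosen within a group.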
Sampling Methods to Collect Data 6) Systematic Random Sampling: randomly select a starting point, and then select every kth member of the population. Example: HP selects every 200th computer off the assembly line and inspects it for quality control. Advantages: Disadvantages:
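A short Python sketch of systematic sampling with a random start, following the definition above. The production run of 2000 computers and the `systematic_sample` helper are assumptions made for illustration; the HP example does not give a run size.

```python
import random

def systematic_sample(items, k, seed=None):
    """Pick a random start among the first k items, then take every kth item."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0..k-1
    return items[start::k]

# Hypothetical run of 2000 computers, numbered in production order.
computers = list(range(1, 2001))
inspected = systematic_sample(computers, k=200, seed=1)
print(inspected)  # 10 serial numbers, each 200 apart
```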
More About Simple Random Samples… There are several ways we could generate a completely random sample. For example, we could throw people’s names in a hat, and then draw them out at random. Since that is not very practical when we have a large number to choose from, we have a couple of other ways to do it:
1. Random Number Generator on the calculator.
2. Random Number Table (Table B in your text).
Example 1: You have 10 marbles. You can’t decide which are your favorites, so you decide to select 5 at random using a random number table. First, you number each marble from 0 to 9 (with no repeats). Use the first line of the random number table to randomly select 5 marbles.
49563 12872 14063 93104 78483 72717 68714 18048 25005 04151 64208 48237 41701 73117 33242 42314 83049 21933 92813 04763 51486 72875 38605 29341 80749 80151 33835 52602 79147 08868 99756 26360 64516 17971 48478 09610 04638 17141 09227 10606 71325 55217 13015 72907 00431 45117 33827 92873 02953 85474
Example 2: There are 23 students in an AP Statistics class. Each student is assigned a number from 1 to 23 (with no repeats). Use the second line of the random number table to randomly select 5 students from this class.
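Reading a random number table, as in Examples 1 and 2, can be sketched as follows. The function name and parameters are hypothetical. The sketch assumes the usual convention: read the digits left to right in groups as wide as the largest label, skip groups that are not valid labels, and skip repeats.

```python
def select_from_digit_table(digits, labels, n, width):
    """Read `digits` in groups of `width`; keep the first n distinct valid labels."""
    digits = "".join(ch for ch in digits if ch.isdigit())  # drop the spaces
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        # Skip groups that are not labels in use, and skip repeats.
        if label in labels and label not in chosen:
            chosen.append(label)
            if len(chosen) == n:
                break
    return chosen

# Example 1: marbles labeled 0-9, read one digit at a time
# from the start of the table's first line.
first_groups = "49563 12872 14063"
marbles = select_from_digit_table(first_groups, range(0, 10), n=5, width=1)
print(marbles)  # [4, 9, 5, 6, 3]
```

For Example 2, the same function would be called with `width=2` and `labels=range(1, 24)` on the digits of the table's second line. The row breaks of the table are not recoverable from this text, so that call is not shown.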
Steps to Follow When Choosing a SRS: 1. 2. 3. 4.
In one study, 100 subjects are randomly assigned to two types of diet pills: 50 people are given the type A pill and a strict diet to maintain, and 50 people are given the type B pill with no specific diet instructions. In a second study, people responded to a questionnaire asking about the average hours they exercised and the number of pounds they lost. Which of the following statements is true?
A) The first was an experiment, while the second was an observational study.
B) The first was an observational study, while the second was a controlled experiment.
C) Both studies are experiments.
D) Both studies are observational studies.
E) None of the above statements are true.
Which of the following are true statements?
I. A census aims to obtain information about an entire population by studying a small sample of the population.
II. Sample surveys are experiments.
III. A treatment is imposed in an observational study.
A) I only
B) II only
C) III only
D) II and III
E) None of the above.
A large university is considering introducing a new major in Economic Geography and wishes to poll the current student body for their opinion of the feasibility of introducing such a major. The Office of Public Relations mails a questionnaire on this issue to an SRS of 2000 students currently enrolled in the university. Of the 2000 questionnaires mailed, 532 were returned, and 219 of those respondents support the new major. Which of the following represents the population for this study?
A) The 2000 students receiving the questionnaire
B) The 532 students who responded
C) The 219 students who support the new major
D) The 2000 students selected to represent a sample of the population of all currently enrolled students
E) All students who are currently enrolled and all past alumni of the university
Creating a sample of students by first dividing the entire student body into four groups (seniors, juniors, sophomores, and freshmen) and then selecting a simple random sample (SRS) from each of the four groups is an example of
A) Random sampling
B) Cluster sampling
C) Stratified sampling
D) Systematic sampling
E) Convenience sampling
A market research firm is hired by a nationally known cosmetics company to test new formulations of moisturizer. Using its extensive list of possible subjects, the market research firm first divides the subjects into 5 age groups and then randomly selects names from each age group to participate in the study. This is an example of
A) A simple random sample
B) Stratified sampling
C) Cluster sampling
D) Convenience sampling
E) Systematic sampling
Creating a sample of students by starting with the second name in the student directory and selecting every 15th name after that best describes
A) Random sampling
B) Cluster sampling
C) Stratified sampling
D) Systematic sampling
E) Convenience sampling
A researcher plans a study to examine the attitudes of residents of California towards a proposal in Congress to declare English to be the official language of the state. He obtains a random sample of 50 residents of one community in California and all agree to participate. Which of the following statements is true?
A) This is a poorly designed survey because it is a voluntary response sample.
B) The design of the study may be biased because the sample may not represent the population of interest.
C) It is a well-designed survey because of the 100% response rate.
D) As long as the respondents were randomly selected, there is no bias.
E) A more accurately designed study would have included opinions on this issue from residents in other states.
Sources of Bias
• Samples are biased if they are systematically not representative of the desired population.
• Undercoverage: occurs when some groups in the population are left out of the process of choosing a sample.
  • Example: Because they are generally fearful of government intrusion, many immigrants from Latin America did not return their census questionnaire during the 1990 census.
Sources of Bias
• Samples are biased if they are systematically not representative of the desired population.
• Non-Response: occurs when an individual chosen for a sample can’t be contacted or refuses to respond. Non-response is a big problem in mail surveys.
  • Example: The BCHS administration sends surveys to a sample of 100 BCHS parents in order to gauge their attitudes toward the school. Only 23 surveys are returned. We have a non-response rate of 77%.
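The 77% figure is simply the share of surveys that did not come back. A small Python check, with a hypothetical helper name, applied to the BCHS numbers and to the university questionnaire from the earlier practice question (2000 mailed, 532 returned):

```python
def non_response_rate(sent, returned):
    """Fraction of surveys sent out that were never returned."""
    return (sent - returned) / sent

print(f"{non_response_rate(100, 23):.1%}")    # 77.0% (BCHS parents)
print(f"{non_response_rate(2000, 532):.1%}")  # 73.4% (university questionnaire)
```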
Sources of Bias
• Response Bias: caused by the behavior of the respondent or the interviewer.
• Untruthful answers: people give untruthful answers for several reasons:
  • Sensitive questions
    • How often do you cheat on your spouse?
  • Socially acceptable answers
    • Do you use corporal punishment with your children?
  • Telling the interviewer what he or she wants to hear
    • One year after the Detroit race riots of 1967, interviewers asked a sample of black residents in Detroit if they felt they could trust most white people, some white people, or none at all. When the interviewer was white, 35% answered “most”; when the interviewer was black, 7% answered “most”.
• The fix: secret ballots, anonymous surveys, “sensitive question” techniques.
Sources of Bias
• Response bias is caused by the behavior of the respondent or the interviewer.
• Ignorance: people will give uninformed answers so that they won’t appear to know nothing about the subject.
  • In a study, educators were asked how they would rank Princeton’s undergraduate business program. In every case, it was rated among the top departments in the country, even though Princeton doesn’t offer an undergraduate business major.
Sources of Bias
• Response bias is caused by the behavior of the respondent or the interviewer.
• Lack of memory: giving a wrong answer simply because the respondent doesn’t remember the correct answer.
  • Students were asked to report their grade point averages. Researchers then determined actual GPAs. Over 17% of the students reported a GPA that was 0.4 or more above their actual average, and about 2% reported a GPA more than 0.4 below their actual GPA.
Sources of Bias
• Response bias is caused by the behavior of the respondent or the interviewer.
• Timing: When a survey is taken can have an impact on the answers.
  • In January, the National Football League reported a poll that revealed football as the nation’s favorite sport (this is at the time of the Super Bowl).
Sources of Bias
• Response bias is caused by the behavior of the respondent or the interviewer.
• Phrasing of a question: Subtle differences in phrasing make large differences in the results.
  • Should the president have the line-item veto to eliminate waste? 97% said “yes”
  • Should the president have the line-item veto? 57% said “yes”
• Sampling Error: The difference between a sample result and the true population result. This error results from chance variation.
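Sampling error can be seen in a small simulation. Everything below is hypothetical: the population of 1653 students with 1000 "satisfied" answers is invented so that the true proportion is known, which is never the case in a real survey.

```python
import random

rng = random.Random(0)

# Hypothetical population: 1 = satisfied, 0 = not satisfied.
population = [1] * 1000 + [0] * 653
true_proportion = sum(population) / len(population)

def sampling_error(sample_size):
    """Sample proportion from one SRS minus the true population proportion."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size - true_proportion

# Five separate SRSs of 100 students each give five different errors.
errors = [sampling_error(100) for _ in range(5)]
for e in errors:
    print(f"{e:+.3f}")
```

Each SRS is drawn properly, yet the sample proportions still differ from the true value and from one another. That chance variation is sampling error, and it is distinct from the bias sources listed above.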