Chris Morgan, MATH G160, csmorgan@purdue.edu, April 6, 2012. Lecture 27. Chapter 1 (and 7.8 for some reason): Statistical Applications and Types of Data
A few definitions:
- Data: measurements, facts, and figures that are collected, analyzed, and summarized, and from which information and knowledge are derived
- Data set: a collection of data, usually arranged as a table
- Element: a single cell in a data set
- Observation: a subject on which data are being collected; observations make up the rows of a data set
- Variable: any characteristic of an observation; variables make up the columns of a data set
C. Morgan, STAT 225, Fall 2011
An example of a data set:
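The table from the original slide is not available here, so the following is only a minimal sketch of how the definitions fit together. Every name and value in it (`data_set`, the students, their heights) is made up for illustration and does not come from the lecture.

```python
# Hypothetical data set: all values are invented for illustration only.
data_set = [
    {"student": "A", "class_year": "sophomore", "height_in": 66.0},
    {"student": "B", "class_year": "senior", "height_in": 71.5},
    {"student": "C", "class_year": "freshman", "height_in": 64.0},
]

# Each row (one dict) is an observation.
observations = data_set

# Each column name is a variable.
variables = list(data_set[0].keys())

# An element is a single cell: one variable's value for one observation.
element = data_set[1]["height_in"]

print(len(observations), "observations")
print("variables:", variables)
print("element (row 2, height_in):", element)
```

Here `class_year` is a qualitative variable and `height_in` is a quantitative one.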
Types of Data (part I):
Quantitative (numerical)
- can be measured (length, volume, weight, cost, etc.)
- includes interval and ratio data (ratios, percentages)
- interval data do not have a “natural” zero (e.g., temperature in °F)
- ratio data do have a “natural” zero (e.g., weight, cost)
Qualitative (categorical)
- is observed rather than measured (beauty, taste, texture, smell, color, etc.)
- labels or names used to identify an attribute of each element
- nominal: order does not matter (gender, religion, race)
- ordinal: order does matter (class year, pain rating, salsa hotness)
Quantitative
- Height of wheat thin: 1’ ¼’’
- Weight of wheat thin: 1.06 oz
- 22 servings per container
- 11 wheat thins = 1 serving size
Qualitative
- Yellow box and brown wheat thins
- Texture is smooth and yet slightly bumpy
- Chris is obsessed with them
- Incredibly delicious!
What type of variable is… (qualitative or quantitative)
- GPA
- The amount of goodness in every wheat thin
- Time it takes to run a mile
- How many wheat thins I can stuff in my mouth at once
- Smoking status
- Income
- The number of places you’d rather be than here
Types of Data (part II):
Cross-sectional data
- observes many subjects at one point in time
- e.g., how many wheat thins each of you can eat at once
- e.g., number of people who fall asleep in class today
- e.g., the class’s opinion on the best ice cream flavor
- e.g., your height today
Time series
- observes one subject or many subjects over time
- e.g., average number of wheat thins each of you can eat, recorded every week
- e.g., number of students who fall asleep at least once this semester
- e.g., a student’s test scores over the semester
- e.g., your height from age 7 to 22
Data Collection
• Existing Sources
• Surveys
• Observational Studies
• Experiments
Existing Sources
• look at what others have already collected
• many people and companies already have existing databases:
  • www.census.gov
  • www.swivel.com
  • www.who.org
  • www.cdc.gov
Surveys
• go out and ask people for their opinion
• ask people for information
Observational Studies
- watch subjects over time and record results
- e.g., comparing sales of different grocery stores in West Lafayette (we simply observe their sales records and do not apply a treatment to any group)
- look up past data and analyze outcomes
Experiments
• design a study to answer specific questions
• set up a specific treatment to see whether it changes the outcome
• have a control group
• use random samples
Statistical Inferences
• Population: the set of all elements of interest in a particular study
• Sample: a subset of the population
• Census: the process of conducting a survey to collect data for the entire population
• Sample Survey: the process of conducting a survey to collect data for a sample
Why sample? Logistics, cost, limitations, etc.
Statistical Inference: using data from a sample to estimate a characteristic of a population
Statistical Inference Example:
• Take a census by counting the number of “e”s in the given paragraph.
• Take a sample by randomly selecting a line, counting the number of “e”s in it, and then multiplying by the number of lines in the paragraph.
• How close are we?
Statistical Inference Example:
Elegant, extravagant elephants entertain every evening at seven. They serve escargot and eggs benedict. Eight elderly elegant elephants elevate themselves to the expensive entrance with elevators exceeding expectations. Eating everything edible, elephants expand exponentially. “Excellent!” the entertained elephants express after the entertaining entrees were served. Everything was expedited by the energetic efforts of the executive elephant empress. Everyone was entertained to excess and enjoyed the edible endeavors immensely. The evening ended enchantedly with Echinacea herbal tea.
Statistical Inference Example:
• Total “e” count (census): 126
• I randomly chose line #3, with an “e” count of 12; multiplying by the 12 lines on the slide gives 12 × 12 = 144
• I randomly chose line #10, with an “e” count of 11; 11 × 12 = 132
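The arithmetic in this example can be sketched in a few lines of Python. The counts (126, 12, 11) and the multiplier of 12 lines are taken from the slide; the function name `scale_up_estimate` is an assumption introduced only for this sketch.

```python
def scale_up_estimate(count_in_sampled_line: int, number_of_lines: int) -> int:
    """Estimate the population total from the count in one sampled line."""
    return count_in_sampled_line * number_of_lines


TRUE_TOTAL = 126       # census count of "e"s, from the slide
NUMBER_OF_LINES = 12   # lines in the paragraph as laid out on the slide

# (line number, "e" count in that line), both from the slide
sampled_lines = [(3, 12), (10, 11)]

for line_number, e_count in sampled_lines:
    estimate = scale_up_estimate(e_count, NUMBER_OF_LINES)
    error = estimate - TRUE_TOTAL
    print(
        f"line {line_number}: estimate {estimate}, "
        f"error {error:+d} ({error / TRUE_TOTAL:+.1%})"
    )
```

Both single-line samples overshoot the census count, by 18 and by 6, which shows how much an estimate built on one sampled line can vary from draw to draw.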
Sampling Methods
• Simple Random Sampling (SRS)
• Stratified Sampling
• Cluster Sampling
• Systematic Sampling
• Convenience Sampling
• Judgment Sampling
Sampling Methods – Simple Random Sampling (SRS)
• Finite population: a sample of size n from a finite population of size N is selected such that each possible sample of size n has the same probability of being selected.
• Infinite population: a sample is selected from a population in such a way that each element has the same probability of being selected.
• Sampling With Replacement: elements are put back in the population after being selected, so the same element can be chosen more than once.
• Sampling Without Replacement: elements are not put back after being selected, so each element can appear in the sample only once.
Sampling Methods – SRS example
Say I want to take a sample of NFL football teams:
1. make a list of all the teams
2. randomly select 8 teams
   - without replacement: select one team at a time and then remove the chosen team from the list
   - with replacement: select one team at a time, but do not remove the chosen team from the list
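A minimal sketch of the two selection schemes, using Python's standard `random` module. The team labels are placeholders standing in for the 32 NFL teams, and the seed is an arbitrary choice made only so the run is repeatable.

```python
import random

# Placeholder labels standing in for the 32 NFL teams (step 1: list all teams).
teams = [f"Team {i}" for i in range(1, 33)]

rng = random.Random(2012)  # arbitrary seed, only for reproducibility

# Step 2, without replacement: a chosen team cannot be chosen again.
without_replacement = rng.sample(teams, k=8)

# Step 2, with replacement: every draw is made from the full list,
# so the same team may show up more than once.
with_replacement = rng.choices(teams, k=8)

print("without replacement:", without_replacement)
print("with replacement:   ", with_replacement)
```

`sample` always returns 8 distinct teams, while `choices` may return duplicates.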
Stratified Sampling
- divides the population into groups called strata
- takes a simple random sample (SRS) from each stratum
- e.g., divide students by class year and take a random sample from each year
Cluster Sampling
• divides the population into groups called clusters
• takes an SRS of clusters
• every element in a chosen cluster is part of the sample
Systematic Sampling
• number the units in the population from 1 to N and decide on the sample size n that you want or need
• set k = N/n
• first, randomly select one of the first k elements
• then select every kth element thereafter
Convenience Sampling
- usually the cheapest and easiest method to implement
- e.g., fliers on campus asking people to participate in surveys or other studies
- e.g., picking a single box of wheat thins to judge quality instead of sampling from twenty boxes
- not a probability sample
Judgment Sampling
• not scientific at all
• based on one sampler’s opinion
• does this one sample (one observation in this case) represent the whole population? Why or why not?
Sampling Methods: example (20 subjects, numbered 1 to 20)
• Convenience Sample: select subjects 1-4
• Stratified Random Sample: divide the 20 subjects into 4 non-overlapping groups of 5 subjects each, and randomly choose 1 subject from every group
• Cluster Sample: divide the 20 subjects into 10 non-overlapping groups of 2 subjects each, and randomly choose 2 of these groups; all subjects in the 2 chosen groups are in the sample
• Systematic Sample: randomly choose 1 of the first 5 subjects, for example 4, then take subjects 4, 9, 14, 19 as the sample
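The 20-subject example can be sketched in Python as follows. Two things here are assumptions made for illustration: the strata and clusters are formed from consecutive subject numbers (in practice they would be defined by a characteristic such as class year or location), and `systematic_sample` is a helper name invented for this sketch.

```python
import random

subjects = list(range(1, 21))   # subjects numbered 1 to 20
rng = random.Random(27)         # arbitrary seed, only for reproducibility

# Stratified: 4 strata of 5 subjects, then an SRS of size 1 from each stratum.
strata = [subjects[i:i + 5] for i in range(0, 20, 5)]
stratified = [rng.choice(stratum) for stratum in strata]

# Cluster: 10 clusters of 2 subjects, then an SRS of 2 clusters;
# every subject in a chosen cluster enters the sample.
clusters = [subjects[i:i + 2] for i in range(0, 20, 2)]
chosen_clusters = rng.sample(clusters, k=2)
cluster_sample = [s for cluster in chosen_clusters for s in cluster]


def systematic_sample(population, n, start):
    """Take every k-th unit, k = N // n, beginning at the 1-based position `start`."""
    k = len(population) // n
    if not 1 <= start <= k:
        raise ValueError("start must be one of the first k positions")
    return population[start - 1::k]


# Systematic: k = 20 / 4 = 5; starting at subject 4 reproduces the slide's sample.
systematic = systematic_sample(subjects, n=4, start=4)

print("stratified:", stratified)
print("cluster:   ", cluster_sample)
print("systematic:", systematic)   # [4, 9, 14, 19]
```

In a real systematic sample the starting position would itself be drawn at random, for example with `rng.randint(1, k)`; it is fixed at 4 here only to match the slide.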
Bias
Bias is any deviation of the expected survey result from the true population value.
Sources of bias include:
- poorly worded questions
- bad communication
- sensitive questions that some may not want to answer, or may answer incorrectly
- entry error (human error)
Avoiding Bias
• Confusing wording?
  – If you have to read it more than once to understand what it’s saying
• Asking something no one would remember?
  – What were you doing between 8 and 8:15 on Tuesday November 5th 2005?
• Leading the question to a certain answer
  – Would you advocate a recycling plan that would help reduce landfill mass?
  – Would you pass a bill outlawing the shipment of oil from Alaska to Russia due to the large death rate of the baby seals?
• Something really embarrassing that they wouldn’t answer honestly
  – Do you always wash your hands after using the restroom?
  – Have you ever cheated on a test?
  – Have you ever done drugs?
• Date-sensitive question
  – How safe do you feel at Purdue University? (What if this was asked right after the Virginia Tech shootings?)