Sampling, Causal Research. Market Intelligence Julie Edell Britton Session 6 September 5, 2009. Today’s Agenda. Announcements Sampling Sampling Error Milan Food Case WSJ/Harris Interactive Survey Causal Research – Experiments Pre-experimental Designs True Experiments
Sampling, Causal Research • Market Intelligence • Julie Edell Britton • Session 6 • September 5, 2009
Today’s Agenda • Announcements • Sampling • Sampling Error • Milan Food Case • WSJ/Harris Interactive Survey • Causal Research – Experiments • Pre-experimental Designs • True Experiments • Factorial Designs and Interaction Effects
Announcements • Lots to do between now and next class: • Midterm Exam – take online between Sunday, 9/6 and Wednesday, 9/16 at 8pm • Open book, open notes – 3 hours from opening exam to submitting it. • WEMBA A case – with your team – due on Sunday, 9/20 by 8pm • WEMBA B case – with your team – due on Thursday, 9/24 by 8pm • Prepare Entitle Direct case – but no slides
Announcements • WEMBA A case – What to submit: • Body – 1 page, single-spaced Executive Summary describing the key decisions to be made and the information needed to make those decisions • Appendix A: Proposed sampling plan and Survey Mode • Appendix B: Proposed draft Questionnaire • Appendix C: Key dummy tables and how to turn the data into action • Everyone should come to the team meeting with question ideas & draft items • Heed point distribution on grading to guide your time allocation across subtasks • Once WEMBA A is submitted, I will send WEMBA B to all members of the team. • WEMBA B - SUBMIT 5 slides -- 1 slide with your dummy tables & action standards for each of Dan Nagy’s 5 questions
Sampling Process • Define population • Elements, extent, time • Identify a good sampling frame • costly to create for yourself • Determine sample size • budget, accuracy needs • Select sampling procedure • way to select elements from the frame • Physically select sample
Probability Samples • Each element in population has known, nonzero chance of being sampled • Simple random sample: all elements have the same n/N chance of being sampled, where n is the sample size and N the population size (e.g., cold caller) • Systematic sample: start with randomly selected element and take every kth element (e.g., teams in this class) • Cluster sampling: randomly pick groups of elements (city blocks, census tracts, schools), then randomly select n elements from each chosen cluster • Stratified sampling: divide frame into strata according to a characteristic (e.g., gender), then sample randomly from each stratum
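The four procedures on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a production sampler: the frame of 100 numbered IDs, the odd/even stratifier, and the blocks of 10 used as clusters are all made-up assumptions.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every element has the same chance of selection (n out of len(frame))."""
    return rng.sample(frame, n)

def systematic_sample(frame, step, rng):
    """Random start within the first `step` elements, then every step-th element."""
    start = rng.randrange(step)
    return frame[start::step]

def stratified_sample(frame, stratum_of, fraction, rng):
    """Independent simple random sample of the same fraction from each stratum."""
    strata = {}
    for element in frame:
        strata.setdefault(stratum_of(element), []).append(element)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(fraction * len(members))))
    return sample

def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Randomly pick clusters, then randomly pick elements within each one."""
    chosen = rng.sample(list(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return sample

rng = random.Random(6)
frame = list(range(1, 101))  # hypothetical frame: 100 household IDs

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
# Hypothetical stratifier: odd vs. even ID stands in for a real characteristic
stratified = stratified_sample(frame, lambda x: x % 2, 0.10, rng)
# Hypothetical clusters: ten "blocks" of ten consecutive IDs
clusters = {b: frame[b * 10:(b + 1) * 10] for b in range(10)}
clustered = cluster_sample(clusters, 3, 4, rng)

print(len(srs), len(systematic), len(stratified), len(clustered))
```

Note that only the stratified draw guarantees the sample mix (here, five odd and five even IDs); the other three leave the mix to chance.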
Complex Sampling Procedures • Simple random sampling almost never used in practice • Stratified Sampling -- Lowers error • Cluster Sampling -- Lowers cost of getting frames and of data collection
Stratified Random Sample • Have frames sorted on some stratification variable believed to influence the variable you are estimating. • Lower variance within each subgroup than across population in general • By ensuring that each subgroup is represented in right mix, extreme overall means less likely -- i.e., smaller std. error.
Steps for Stratified Random Sample • Divide Population into mutually exclusive and exhaustive categories. • Decide what sampling fraction f = n / N to use. • Draw an independent simple random sample of size f * N(stratum) from each stratum. • Compute stratum mean for each • Estimate overall pop mean as weighted average of stratum means • Estimate SE as weighted combo of SEs in each
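The last three steps (stratum means, weighted overall mean, weighted SE) can be sketched as follows. The stratum sizes and spending values are invented for illustration, and the SE formula assumes the sampling fraction is small enough to ignore the finite population correction.

```python
import math
import statistics

def stratified_estimate(strata):
    """strata: list of (stratum_population_size, sampled_values).

    Returns (estimated population mean, estimated standard error).
    Weights are each stratum's share of the population.
    """
    total = sum(size for size, _ in strata)
    mean = 0.0
    variance = 0.0
    for size, values in strata:
        weight = size / total
        mean += weight * statistics.mean(values)
        # Variance of a stratum mean is s^2 / n_h; weights enter squared
        variance += weight ** 2 * statistics.variance(values) / len(values)
    return mean, math.sqrt(variance)

# Hypothetical data: two strata (e.g., no kids vs. kids), weekly food spend in $
strata = [
    (600, [30, 34, 38]),
    (400, [50, 60, 70]),
]
est_mean, est_se = stratified_estimate(strata)
print(f"mean = {est_mean:.2f}, SE = {est_se:.2f}")
```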
Cluster Sampling • Typically “clusters” are geographic territories. • Start with list of clusters, randomly select subset, and survey only subset. • Cheaper travel cost, cost per interview • Loss of effective sample size if people in cluster more alike than if in different cluster
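One common way to quantify the "loss of effective sample size" is Kish's design effect, which is not given on the slide and is offered here only as a standard approximation: deff = 1 + (m - 1) * rho, where m is the number of interviews per cluster and rho is the intra-cluster correlation. The numbers below are hypothetical.

```python
def effective_sample_size(n, cluster_size, icc):
    """Approximate effective sample size under cluster sampling.

    n            -- total number of interviews
    cluster_size -- interviews per cluster (assumed equal across clusters)
    icc          -- intra-cluster correlation (0 = people in a cluster are no
                    more alike than people in different clusters)
    """
    design_effect = 1 + (cluster_size - 1) * icc
    return n / design_effect

# Hypothetical: 400 interviews, 20 per cluster, modest similarity within clusters
n_eff = effective_sample_size(400, 20, 0.05)
print(f"effective sample size = {n_eff:.0f}")
```

With no within-cluster similarity (icc = 0) nothing is lost; the more alike people in a cluster are, the fewer "independent" interviews the sample is worth.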
Non-Probability Samples • Convenience • Judgment • Pick especially informative elements • Quota • Sample matches population on key control characteristics correlated with behavior under study. • Match only really matters for control variables related to thing you are trying to estimate.
Sampling Errors vs. Biases • Sampling Error: variation in estimates of a population parameter (e.g., awareness of X) due simply to variations among different random samples chosen by following the same basic procedure. • Sample Biases: Expected value of estimated population parameter differs from true value because of unwitting under-sampling or oversampling of certain types of sampling elements • Availability biases (1-900 polls, Web surveys) • Frame errors (Literary Digest)
Milan Foods • Purpose is to illustrate things about sampling • If you had the population data, you would use it rather than sample from it
Precision in Simple Random • Statistics review • Distribution of original scores • Mean = Y-bar • Variance -- Average squared deviation from mean • Standard Deviation -- Square root of variance • Distribution of means of samples of size n • SD of Y-bar distribution • Std Error = (est. SD of pop.) / sqrt(n)
Sampling Distribution of Means of Samples of Size n < N • Milan Foods (FoodExp$) • Population Mean = $43.30; SD = $20.91 • What about distribution of sample means for n < N? If sample size = 100, • Std Error = SD of means of 100-case samples in pop. = pop SD/sqrt(100) = $20.91/10 = $2.09 • About 95% of all sample means for samples of size 100 are within $43.30 +/- (1.96*2.09): $39.20 to $47.40.
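The claim about the sampling distribution can be checked by simulation. The sketch below does not use the Milan Foods data; it generates a hypothetical, normally distributed population with roughly the slide's mean and SD, draws many samples of 100, and counts how many sample means land inside the mean +/- 1.96 standard errors.

```python
import math
import random
import statistics

rng = random.Random(2009)

# Hypothetical population (NOT the Milan Foods data): 10,000 households
population = [rng.gauss(43.30, 20.91) for _ in range(10_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 100
std_error = pop_sd / math.sqrt(n)
low = pop_mean - 1.96 * std_error
high = pop_mean + 1.96 * std_error

# Draw many simple random samples and record each sample mean
reps = 2000
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]
share_inside = sum(low <= m <= high for m in sample_means) / reps

print(f"SE = {std_error:.2f}; interval = ({low:.2f}, {high:.2f})")
print(f"SD of simulated sample means = {statistics.pstdev(sample_means):.2f}")
print(f"share of sample means inside the interval = {share_inside:.3f}")
```

The SD of the simulated sample means should come out close to the computed standard error, which is the point of the slide: the standard error is the SD of the distribution of sample means.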
In Milan Foods • Simple Random, SE (for n = 25) = 20.91/sqrt(25) = 4.18 • Simple Random, SE (for n = 100) = 20.91/sqrt(100) = 2.09 (quadrupling n halves the SE) • Stratified on I (Any Kids 6-18), SE (for n = 100) = 19.13/sqrt(100) = 1.91
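A short calculation makes the trade-off concrete. It uses only the SD figures quoted on the slide; the "equivalent simple random sample size" is a derived illustration, not something the slide reports.

```python
import math

def std_error(sd, n):
    """Standard error of a sample mean for a sample of size n."""
    return sd / math.sqrt(n)

pop_sd = 20.91      # population SD of FoodExp$ (from the slide)
strat_sd = 19.13    # SD figure the slide uses for the stratified design

se_25 = std_error(pop_sd, 25)
se_100 = std_error(pop_sd, 100)
se_strat_100 = std_error(strat_sd, 100)

# How large would a simple random sample have to be to match the stratified SE?
equivalent_n = (pop_sd / se_strat_100) ** 2

print(f"SE n=25: {se_25:.2f}  SE n=100: {se_100:.2f}  ratio: {se_25 / se_100:.1f}")
print(f"Stratified SE n=100: {se_strat_100:.2f}")
print(f"Simple random sample size for the same SE: about {equivalent_n:.0f}")
```

Under these figures, stratifying on "Any Kids 6-18" buys about as much precision as adding roughly 19 interviews to a simple random sample of 100.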
Precision of Estimate of Pop. Mean From Sample Mean • In practice, we don’t know pop. SD so we treat sample SD as our best guess • n = 100, sample mean = $42.41, SD = $18.34 • Est. Std. Error = $18.34 / sqrt(100) = $1.834 • 95% CI: $42.41 +/- (1.96*1.834) = ($38.82 to $46.00)
Same Thing, n = 25 • n = 25: sample mean = $45.10, SD = $18.13 • Treat as our best guesses of pop. parameters • Est. Std. Error = $18.13/sqrt(25) = $3.63 • 95% CI: $45.10 +/- (1.96*3.63) = ($37.99 to $52.21) • Note the comparison of n = 100 to n = 25 • n = 100 ($38.82 to $46.00) • n = 25 ($37.99 to $52.21)
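The two confidence intervals can be reproduced from the sample summaries on these slides. The function name and the default multiplier of 1.96 are choices made for this sketch; using exactly 1.96 gives slightly narrower intervals than rounding the multiplier up to 2.

```python
import math

def confidence_interval(sample_mean, sample_sd, n, z=1.96):
    """Approximate 95% CI for the population mean, using the sample SD
    as the best guess of the population SD."""
    est_std_error = sample_sd / math.sqrt(n)
    half_width = z * est_std_error
    return sample_mean - half_width, sample_mean + half_width

ci_100 = confidence_interval(42.41, 18.34, 100)   # sample of 100 from the slide
ci_25 = confidence_interval(45.10, 18.13, 25)     # sample of 25 from the slide

width_100 = ci_100[1] - ci_100[0]
width_25 = ci_25[1] - ci_25[0]

print(f"n=100: ${ci_100[0]:.2f} to ${ci_100[1]:.2f} (width ${width_100:.2f})")
print(f"n=25:  ${ci_25[0]:.2f} to ${ci_25[1]:.2f} (width ${width_25:.2f})")
```

The n = 25 interval is about twice as wide as the n = 100 interval, as the square-root rule predicts when the two sample SDs are similar.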
Wall Street Journal • Sampling Error • Sample size per school too small for meaningful comparisons (n = 20 to qualify). No evidence that schools ranked 6th-50th differed significantly from each other in ratings. • Sample Bias • Sample of recruiters open to manipulation • Respondents picked which of the many schools they recruited at they would rate. Leads to further selection bias, like a 1-900 call-in poll. • Sampling method underweights views of large recruiters who visit many campuses • Response rate not reported, but appears to be 7%.
Key Sampling Takeaways • Probability v Non-Probability Samples • For Probability Samples, Standard Error is the measure of precision • Precision increases with the square root of the sample size n • More precision with Stratified if and only if stratifier is correlated with thing estimated • Same principle for Quota samples. Quotas only help if correlated with the variable being estimated
Experiments • Best way to test causal hypotheses • Independent Variable = hypothesized cause • Manipulated by the researcher/manager • Example: Send a color or black and white brochure • Dependent Variable = effect • Measured (observed) by researcher/manager • Example: New accounts secured • Random assignment of subjects to conditions • Example: receive color or receive b&w brochure
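Random assignment for the brochure example might look like the sketch below. The prospect IDs and condition labels are hypothetical; the only point is that chance, not the researcher or the subject, decides who gets which brochure.

```python
import random

def randomly_assign(subjects, conditions, rng):
    """Shuffle the subjects, then deal them out to conditions in turn,
    so group sizes differ by at most one."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return {
        condition: shuffled[i::len(conditions)]
        for i, condition in enumerate(conditions)
    }

rng = random.Random(24)
prospects = [f"prospect_{i:03d}" for i in range(100)]  # hypothetical mailing list
groups = randomly_assign(prospects, ["color", "black_and_white"], rng)

for condition, members in groups.items():
    print(condition, len(members))
```

After the mailing, the dependent variable (new accounts secured) would be compared across the two groups.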
Pre-experimental Designs • One group, after-only design • One group, before & after design • Unmatched control group design • Matched control group design • All have threats to validity not present in a true experiment with random assignment to treatments.
Validity • The strength of our conclusions • i.e., Is what we conclude from our experiment correct? • Threats to Validity • History: an event occurring around same time as treatment that has nothing to do with treatment • Maturation: people change pre to post • Testing: pretest causes change in response • Instrumentation: measures changed meaning • Statistical Regression: Original measure was due to a random peak (SI Cover Curse) or valley
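Statistical regression is the least intuitive of these threats, so a small simulation may help. Everything here is hypothetical: each subject has a stable true level, each measurement adds independent random noise, and nothing is done to anyone between the two measurements. Subjects selected because they peaked on the first measure still score lower, on average, on the second.

```python
import random
import statistics

rng = random.Random(28)

# Hypothetical subjects: stable true level plus independent noise per measurement
true_level = [rng.gauss(50, 10) for _ in range(5000)]
first = [t + rng.gauss(0, 10) for t in true_level]
second = [t + rng.gauss(0, 10) for t in true_level]

# Select the top 10% on the first measurement (the "random peak" group)
cutoff = sorted(first)[int(0.9 * len(first))]
top = [i for i, score in enumerate(first) if score >= cutoff]

mean_first = statistics.fmean([first[i] for i in top])
mean_second = statistics.fmean([second[i] for i in top])

print(f"top group, first measure:  {mean_first:.1f}")
print(f"top group, second measure: {mean_second:.1f}")
```

A one-group before-and-after design would misread that drop as a treatment effect if subjects had been chosen for their extreme pretest scores.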
One Group After Only • We propose a change in MBA Core, to move Finance and Marketing up to Term 2 from their position in Term 3. • One major motive for this is that students interview for internships in Term 3, and if they want jobs in marketing or finance, they have no background at the time of the interview. Thus, we perceive that we are at a competitive disadvantage because those courses are in Term 3. • Design: EG: X O (Mean = 50%) • X = Marketing in Term 3, O = Did/Did Not Get Desired Internship • Key: Lacks a baseline, so worthless.
One Group Pre-Post Design • Breckenridge Brewery wants to assess the efficacy of TV ad spots for its new amber ale. • Time 1 (O1): Duke undergrads are brought to the lab and asked to rate their frequency of buying a series of brands in various categories over the past week. The list includes Breckenridge Amber Ale. Mean = 0.2 packs per week. • Time 2 (X): Two weeks of ads for Breckenridge Ale. • Time 3 (O2): Same Duke undergrads brought back to lab to rate frequency of buying same set of brands over past week. Mean = 1.3 packs per week. • 1.3 - 0.2 = 1.1. We attribute an increase of 1.1 packs per week to the ad.
Takeaways for the Day • Probability v Non-Probability Samples • More precision with Stratified if and only if stratifier is correlated with thing estimated • Threats to validity in pre-experimental and quasi-experimental designs