Estimating Phone Service and Usage Percentages: How to Weight the Data from a Local, Dual-Frame Sample Survey of Cellphone and Landline Telephone Users in the United States
Presented at AAPOR 2009, Hollywood, FL, May 14, 2009
Thomas M. Guterbock, TomG@virignia.edu
The Problem
• Dual-frame telephone surveys are becoming more prevalent in U.S. survey research.
• The rising percentages and distinctive demographics of cellphone-only [CPO] households make it imperative that sample designs cover them.
• Landline RDD + cellphone RDD sample frames
• Result: sample data for 3 phone-service segments
    • CPO; overlap (dual-phone); landline-only [LLO]
• Problem: what is the correct population distribution across the 3 phone-service segments?
National data? No problem
• National Health Interview Survey [NHIS] data are the ‘gold standard’.
    • Uses a very large N, continuous sampling, and in-person mode to establish household phone service.
• NHIS provides fairly current data on cellphone coverage, percent CPO, and phone segment distributions.
• NHIS data are available for the U.S. and for four census regions.
• State estimates released in 2009 using CPS + NHIS
• SOLUTION: Weight phone-service segments in the national sample to NHIS percents for the U.S.
What about local studies?
• We cannot assume that the local phone-service segment distribution is the same as national or regional averages.
• Cellphone penetration and CPO lifestyle adoption vary considerably across areas.
    • Cell penetration is higher in high-density areas, metro areas, high-income areas, flat terrain, and near interstates.
    • CPO percentage varies with age, ethnicity, urbanicity, and landline phone costs.
• NHIS: strong phone service variation across regions and states
• Variation within states is probably similar in magnitude.
Why not use percents from the local sample data?
• In a local dual-frame sample, we will directly observe % CPO in the cell sample and % LLO in the landline sample.
• But estimation from these observed percents is problematic for several reasons:
    • If we just combine the two samples, we overlook the fact that overlap households are double-sampled.
    • It’s not intuitively obvious how to calculate the percentages for the combined sample from the split sample results.
Why not use percents from the local sample data? (continued)
• Cellphone-only cases are substantially overcounted in a cellphone sample.
• CPOs have different telephone behaviors. They are more likely than dual-phone users . . .
    • To have the phone with them
    • To have the phone turned on
    • To accept calls from unknown numbers
• Cellphone samples are usually kept small because of higher per-completion cost.
• So we can’t just add up the segment counts from the two samples.
Can we use the local sample data?
• Collected data from the two realized, local samples surely contain useful information about local phone-service segments.
• Overcounts of CPO and LLO distort these data.
• We have to do the math correctly.
• IDEA: Estimate the amount of CPO and LLO overcount in national dual-frame studies, and then apply an adjustment to the local sample data to arrive at local estimates for %CPO and %LLO.
Overview: A proposed solution
• Develop an algebraic solution for combining the two sample results from a dual-frame design into an overall phone service segment distribution, assuming equal response rates.
• Develop an algebraic solution for combining the two samples when response rates are NOT equal.
    • Higher response rates (overcounts) are assumed for CPO and LLO (compared to overlap).
• Compare 2007 CHIS to 2007 NHIS (West region) to estimate ‘response rate ratios’ that correspond to the observed overcount.
• Apply these ratios to newly collected dual-frame survey data from three counties in Virginia.
• Result: plausible, locality-specific estimates of phone segments
Key assumptions
• Local phone-service segment distributions vary.
    • Forcing NHIS segment distributions onto local data would distort results.
• Response rate ratios (rates of overcount) are constant across surveys.
    • If fielding and screening procedures are similar
• Sampling variability is ignorable.
    • In comparison of NHIS to CHIS
    • In projection from the local samples to the local population
How to combine dual-frame sample results (equal response rates)
Cell phone samples include some households that are also in the RDD frame
[Figure: Cell phones (Frame 1) cover 81.1% of households; landline-only households are excluded.]
RDD samples cover all landline households
[Figure: RDD (Frame 2) covers 86.8% of households; cell-phone-only households are excluded.]
RDD and cell samples overlap, yield complete coverage
These proportions define the population distribution of segments:
• a: CPO (cell only), 13.2%, PaT = .132
• ab: OVERLAP (cell + landline), 67.9%, PabT = .679
• b: LLO (landline only), 18.9%, PbT = .189
All percentages are from 2007 NHIS data (West region).
With equal response rates, cell sample would show:
• Cell phones (Frame 1) = CPO + OVERLAP = .132 + .679 = 81.1%
• OVERLAP as percent of Frame 1: Pab′ = .679/.811 = .837
• CPO as percent of Frame 1: Pa′ = .132/.811 = .163
All percentages are from 2007 NHIS data (West region).
With equal response rates, RDD sample would show:
• RDD (Frame 2) = LLO + OVERLAP = .189 + .679 = 86.8%
• OVERLAP as percent of Frame 2: Pab″ = .679/.868 = .783
• LLO as percent of Frame 2: Pb″ = .189/.868 = .218
All percentages are from 2007 NHIS data (West region).
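The arithmetic on the last two slides can be checked with a short sketch. The function and key names below are illustrative choices, not part of the original presentation; the inputs are the rounded 2007 NHIS West-region shares quoted above, so results agree with the slide figures only to within rounding.

```python
# Rounded NHIS 2007 West-region segment shares quoted on the slides.
P_CPO, P_OVERLAP, P_LLO = 0.132, 0.679, 0.189


def expected_sample_shares(p_cpo, p_overlap, p_llo):
    """Shares each frame's sample would show if all segments responded equally."""
    cell_frame = p_cpo + p_overlap   # Frame 1 coverage (cell phones)
    rdd_frame = p_llo + p_overlap    # Frame 2 coverage (landline RDD)
    return {
        "cell_frame": cell_frame,
        "rdd_frame": rdd_frame,
        "cpo_in_cell": p_cpo / cell_frame,          # Pa'
        "overlap_in_cell": p_overlap / cell_frame,  # Pab'
        "llo_in_rdd": p_llo / rdd_frame,            # Pb''
        "overlap_in_rdd": p_overlap / rdd_frame,    # Pab''
    }


shares = expected_sample_shares(P_CPO, P_OVERLAP, P_LLO)
for name, value in shares.items():
    print(f"{name}: {value:.3f}")
```

Within each frame the two shares sum to 1, because every respondent in a frame is either single-service or overlap.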
How do we get from observed percentages to population percents?
Formulas for calculating underlying population distribution
• With PabT + PaT evaluated, we have:
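The formulas themselves did not survive in this transcript. One way to recover them from the definitions on the preceding slides, offered as a reconstruction rather than the author's exact expressions: since Pab′ = PabT / (PabT + PaT), Pab″ = PabT / (PabT + PbT), and the three segments sum to 1, the Frame 1 coverage works out to PabT + PaT = Pab″ / (Pab″ + Pab′ · Pb″). The segment shares then follow directly. The function name is hypothetical.

```python
def population_shares_equal_rates(cpo_in_cell, llo_in_rdd):
    """Recover (CPO, overlap, LLO) population shares from the two observed
    single-service shares, assuming equal response rates in every segment.

    cpo_in_cell: observed CPO share of the cell sample (Pa')
    llo_in_rdd:  observed LLO share of the RDD sample (Pb'')
    """
    overlap_in_cell = 1.0 - cpo_in_cell  # Pab'
    overlap_in_rdd = 1.0 - llo_in_rdd    # Pab''
    # Frame 1 coverage, PabT + PaT (reconstructed identity, see lead-in).
    cell_frame = overlap_in_rdd / (overlap_in_rdd + overlap_in_cell * llo_in_rdd)
    p_overlap = overlap_in_cell * cell_frame
    p_cpo = cpo_in_cell * cell_frame
    p_llo = 1.0 - cell_frame             # everything outside Frame 1
    return p_cpo, p_overlap, p_llo


# Feeding back the rounded equal-rate shares from the slides should return
# approximately the NHIS West-region distribution (.132, .679, .189).
p_cpo, p_overlap, p_llo = population_shares_equal_rates(0.163, 0.218)
print(f"CPO={p_cpo:.3f} overlap={p_overlap:.3f} LLO={p_llo:.3f}")
```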
Combining dual-frame sample results when response rates are not equal
Three segments, four response rates
• Cell sample response rate for CPOs (segment a): ra
• Cell sample response rate for overlap (segment ab): rab′
• RDD sample response rate for LLOs (segment b): rb
• RDD sample response rate for overlap (segment ab): rab″
4 response rates, 2 response rate ratios
• Reduction in base response for dual-phone in the cell sample is: r1 = rab′ / ra
• This is the ‘response rate ratio’ that applies to the cellphone sample.
• Reduction in base response for dual-phone in the RDD sample is: r2 = rab″ / rb
• This is the response rate ratio for the RDD sample.
It follows that . . .
• And our expressions for calculating true population phone service segments are modified by incorporating the response rate ratios:
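The modified expressions are not preserved in this transcript. A minimal sketch of what they imply, assuming r1 is the overlap-to-CPO response rate ratio in the cell sample and r2 is the overlap-to-LLO ratio in the RDD sample: the observed odds in each sample are then the population odds divided by the ratio, so PaT / PabT = r1 · Pa′ / Pab′ and PbT / PabT = r2 · Pb″ / Pab″. Requiring the three shares to sum to 1 gives PabT = 1 / (1 + r1 · Pa′ / Pab′ + r2 · Pb″ / Pab″). This algebraic form and the function name are a reconstruction, not necessarily what appeared on the slide.

```python
def population_shares(cpo_in_cell, llo_in_rdd, r1=1.0, r2=1.0):
    """Estimate (CPO, overlap, LLO) population shares from observed sample
    shares, allowing overlap households to respond at a lower rate.

    r1: overlap response rate / CPO response rate in the cell sample
    r2: overlap response rate / LLO response rate in the RDD sample
    With r1 = r2 = 1 this reduces to the equal-response-rate case.
    """
    if not (0.0 <= cpo_in_cell < 1.0 and 0.0 <= llo_in_rdd < 1.0):
        raise ValueError("observed shares must lie in [0, 1)")
    # Population odds of each single-service segment relative to overlap.
    cpo_to_overlap = r1 * cpo_in_cell / (1.0 - cpo_in_cell)
    llo_to_overlap = r2 * llo_in_rdd / (1.0 - llo_in_rdd)
    p_overlap = 1.0 / (1.0 + cpo_to_overlap + llo_to_overlap)
    return cpo_to_overlap * p_overlap, p_overlap, llo_to_overlap * p_overlap


# Hypothetical observed shares, for illustration only (not survey results).
est = population_shares(cpo_in_cell=0.35, llo_in_rdd=0.30, r1=0.368, r2=0.598)
print("CPO={:.3f} overlap={:.3f} LLO={:.3f}".format(*est))
```

Ratios below 1 shrink the single-service estimates relative to the naive equal-rate calculation, which is the correction for overcounting that the slides describe.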
How to calculate response rate ratios
• Now assume that we have observed results from a dual-frame phone survey.
• We also know the true population distribution.
• We can calculate the response rate ratios:
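The expressions for the ratios are missing from this transcript. Under the same assumed model (observed odds in each sample equal population odds divided by the response rate ratio), they can be sketched by inverting the relationship: r1 = (PaT / PabT) · (Pab′ / Pa′) and r2 = (PbT / PabT) · (Pab″ / Pb″). The function name and the observed share used in the second call are hypothetical, since the CHIS figures are not reproduced here.

```python
def response_rate_ratios(p_cpo, p_overlap, p_llo, cpo_in_cell, llo_in_rdd):
    """Back out (r1, r2) from a known population distribution and the
    single-service shares observed in each sample of a dual-frame survey."""
    # Ratio of population odds to observed odds, per sample.
    r1 = (p_cpo / p_overlap) * (1.0 - cpo_in_cell) / cpo_in_cell
    r2 = (p_llo / p_overlap) * (1.0 - llo_in_rdd) / llo_in_rdd
    return r1, r2


# Sanity check: if the samples show exactly the equal-rate shares,
# both ratios come out as 1 (no overcount).
r1_eq, r2_eq = response_rate_ratios(0.132, 0.679, 0.189,
                                    0.132 / 0.811, 0.189 / 0.868)
print(f"r1={r1_eq:.3f} r2={r2_eq:.3f}")

# A hypothetical cell sample with 30% CPO (well above the 16.3% expected
# under equal rates) implies r1 below 1, i.e. overlap under-response.
r1_hyp, _ = response_rate_ratios(0.132, 0.679, 0.189, 0.30, 0.189 / 0.868)
print(f"r1 (hypothetical)={r1_hyp:.3f}")
```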
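CHIS 2007: California Health Interview Survey
• Observed CPO share of the cellphone sample ≠ 16.3% (the share expected under equal response rates)
• Observed LLO share of the RDD sample ≠ 21.7% (the share expected under equal response rates)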
From these data we can evaluate r1 and r2
• In the cellphone sample, the overlap response rate is only 37% of the CPO rate.
• In the RDD sample, the overlap response rate is about 60% of the LLO rate.
• Overcount of CPOs is greater than overcount of LLOs.
• This shows: many dual-phone users still use the cellphone as a secondary device.
Calculating local area estimates of population phone-service segment distributions
2008 Prince William County Survey
• Citizen satisfaction survey in a large, suburban county in Northern Virginia
• n = 1,666
• Triple-frame design: cellphone, landline RDD, and directory-listed sample
• Here we combine the landline samples and treat it as a dual-frame design.
• Screening questions patterned after those on CHIS
Apply formulas given above
• Calculations based on: r1 = .368, r2 = .598
2008 Albemarle County Survey
• Citizen satisfaction survey
• Suburban and rural county surrounding the City of Charlottesville, VA
• Triple-frame design similar to the PWC survey
• Smaller sample size: n = 700
2008 Chesterfield County Survey
• Citizen satisfaction survey
• Suburban county adjacent to Richmond, VA
• Triple-frame design similar to the PWC survey
• Treated as dual-frame here
• n = 1,600
Using the estimated segment distribution to weight the sample data
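The weighting procedure itself is not preserved in this transcript. As a generic illustration only, the sketch below applies simple post-stratification by phone-service segment (weight = target share / sample share) and reports Kish's approximate design effect, which shows the precision cost of weighting a small segment up. The sample counts and target shares are hypothetical, not results from the Virginia surveys, and the function names are illustrative.

```python
def segment_weights(sample_counts, target_shares):
    """Post-stratification weight per segment: target share / sample share."""
    n = sum(sample_counts.values())
    return {seg: target_shares[seg] / (count / n)
            for seg, count in sample_counts.items()}


def kish_design_effect(sample_counts, weights):
    """Kish's approximation: n * sum(w^2) / (sum(w))^2 over all cases."""
    n = sum(sample_counts.values())
    sum_w = sum(sample_counts[s] * weights[s] for s in sample_counts)
    sum_w2 = sum(sample_counts[s] * weights[s] ** 2 for s in sample_counts)
    return n * sum_w2 / sum_w ** 2


# Hypothetical combined-sample counts and estimated population shares.
counts = {"CPO": 100, "overlap": 700, "LLO": 200}
targets = {"CPO": 0.15, "overlap": 0.70, "LLO": 0.15}

weights = segment_weights(counts, targets)
deff = kish_design_effect(counts, weights)
for seg in counts:
    print(f"{seg}: weight={weights[seg]:.3f}")
print(f"design effect={deff:.4f}")
```

The larger the gap between a segment's sample share and its target share, the larger the design effect, which is one reason to be cautious about weighting a small cellphone sample up heavily.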
Problem and solution
• We don’t have ‘gold standard’ data by which to weight the results of a dual-frame telephone survey in a local area.
• Weighting to national or state averages might not be accurate.
• We developed the needed formulas that relate observed percentages to underlying population phone segment distributions.
• We calculated ‘response rate ratios’ by comparing CHIS 2007 to regional NHIS 2007 results.
• We applied these ratios to calculate underlying distributions in three local telephone surveys.
Results
• The estimates for three suburban counties in Virginia are quite different from national phone-segment distributions, and from each other.
• Cellphone penetration is higher in Northern Virginia than in downstate suburbs or in national estimates.
• The CPO lifestyle has been adopted by fewer people in the downstate suburbs.
• The estimates can guide weighting of sample data.
• But we must be cautious about weighting our cellphone samples up too much.
• Larger cellphone samples are needed in the future.
Future research
• This is a time of rapid change in the telephone system.
• We are just learning how to deal with the weighting issues in cellphone surveys.
• We need to look at optimization of our dual-frame designs (cf. Hartley 1962).
• Estimates of response rate ratios can be updated using more current national phone surveys compared to NHIS.
• Results would be strengthened if external local data were available to validate the estimates.