The Past, Present, and Future of Social Surveys Graham Kalton grahamkalton@westat.com
Context • A personal perspective on developments in survey research since the late 1950’s. • Main focus: household surveys and survey statistics. • To set the scene, I start with a brief review of the early history of survey sampling. • I then focus on the changes in the field that have taken place since the 1950’s. • I end with speculations about future directions. • “Those who don’t know history are doomed to repeat it.” ‒ Edmund Burke
Early History of Survey Sampling • Kiaer (1895): Representative method. • Bowley (1913): Systematic sampling of buildings in Reading (and then four other towns). Also confidence intervals. • ISI (1926): Representative method: random or purposive selection. • Neyman (1934) “On the two different aspects of the representative method: the method of stratified sampling, and the method of purposive selection.”
Survey Research in the 1950’s and 1960’s • Major developments in survey sampling took place after Neyman’s paper, leading to books by Yates (1949), Deming (1950), Hansen, Hurwitz & Madow (1953), Cochran (1953), and Sukhatme (1954). • Probability sampling was well established for government surveys in the U.S. and U.K. in the 1950’s. • Survey research was much simpler: face-to-face interviewing was the norm; response rates for uncomplicated surveys were generally as high as 85% to 90%; and there were few mail surveys because of response rate concerns.
Quota Sampling • Quota sampling was―and still is―widely used in market research (Stephan & McCarthy, 1958). • Quota controls are subgroups with known population sizes, such as age, sex and employment status. • Interviewers are instructed to obtain specified numbers of respondents in each subgroup. • Cost and speed benefits. • Sudman (1965) suggests biases of the order of 3%-5%. • Fit for purpose.
Drivers of Major Developments since the 1960’s • Arrival of computers, the rapid and continuing advances in computing power, and the development of computer programs. • New modes of data collection. • More complex data collections. • More sophisticated user community that wants more complex estimates. • Demand for surveys of specialized subpopulations. • Declining response rates. • Rising costs. • The need for model-dependent methods.
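Computers and Data Collection • BC (before computers): PAPI (paper-and-pencil interviewing), with a simple questionnaire. • AC (after computers): CAPI (computer-assisted personal interviewing): use of tablets for data collection; CARI (computer-assisted recorded interviewing) for quality checking; CATI (computer-assisted telephone interviewing): use of laptops for telephone data collection; CASI and ACASI (computer-assisted and audio computer-assisted self-interviewing): valuable for sensitive questions; smartphones; Web; GPS for detecting fabrication.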
Sampling Developments within the Design-Based Framework: Two Examples 1. Variance estimation • BC: Simple Taylor series for means and proportions (Keyfitz, 1957; Kish, 1957); simple replication (Mahalanobis, 1946; Deming, 1960). • AC: Linear substitute extension of the Taylor series method; balanced repeated replications and jackknife repeated replications; bootstrap. 2. Analytic techniques • BC: Totals, means, proportions. • AC: A full range of techniques, including multiple regression, chi-square tests, multi-level modeling, Cox’s proportional hazards models.
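The jackknife repeated replication idea mentioned on this slide can be illustrated with a minimal sketch. It assumes an unstratified design in which the primary sampling units (PSUs) are treated as sampled with replacement, and it uses the delete-one-PSU form of the jackknife for a weighted mean. The function names and the data layout are illustrative assumptions, not taken from the talk; real surveys usually need the stratified version and the survey's own replicate weights.

```python
def weighted_mean(pairs):
    """Weighted mean of (value, weight) pairs."""
    total_weight = sum(w for _, w in pairs)
    return sum(y * w for y, w in pairs) / total_weight


def jackknife_variance(psus):
    """Delete-one-PSU jackknife for a weighted mean (illustrative sketch).

    psus: list of PSUs, each a list of (value, weight) pairs.
    Assumes an unstratified design. Returns (estimate, variance estimate).
    """
    n = len(psus)
    if n < 2:
        raise ValueError("need at least two PSUs")

    def estimate(groups):
        return weighted_mean([pair for group in groups for pair in group])

    full = estimate(psus)
    # Drop each PSU in turn. The usual n/(n-1) rescaling of the remaining
    # weights cancels in a ratio such as a weighted mean, so it is omitted.
    replicates = [estimate(psus[:j] + psus[j + 1:]) for j in range(n)]
    variance = (n - 1) / n * sum((r - full) ** 2 for r in replicates)
    return full, variance


# Four single-element PSUs with equal weights: the jackknife variance
# then coincides with the textbook s^2 / n for a simple mean.
example = [[(1.0, 1.0)], [(2.0, 1.0)], [(3.0, 1.0)], [(4.0, 1.0)]]
mean_hat, var_hat = jackknife_variance(example)
```

The same loop works for any estimator that can be recomputed on a reduced sample, which is why replication methods extended so easily to the analytic techniques listed above.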
Developments with a Model-Dependent Component: Compensating for Missing Data • Unit nonresponse and noncoverage • BC: Ignore (MCAR) or simple cell adjustments (MAR). Raking (Deming & Stephan, 1940). • AC: A wide range of methods including raking, propensity score weighting, calibration methods, use of CHAID, methods for MNAR situations. • Imputation • BC: Complete case analysis (MCAR). • Do not fabricate data • AC: Hot deck imputation, regression imputation methods, tree-cell imputation, cyclical imputation, multiple imputation, fractional imputation.
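Raking, which this slide cites from Deming & Stephan (1940), can be sketched as iterative proportional fitting of unit weights to two sets of known margins. The sketch below is a simplified illustration: the function name, the two-margin restriction, and the convergence rule are assumptions made for the example, and it requires every category to contain at least one respondent and both sets of totals to sum to the same population size.

```python
def rake(weights, rows, cols, row_totals, col_totals, max_iter=100, tol=1e-10):
    """Two-way raking of unit weights (illustrative sketch).

    weights: starting weights, one per respondent.
    rows, cols: each respondent's category on the two raking dimensions.
    row_totals, col_totals: dicts of known population totals per category.
    """
    def category_sums(labels, current):
        sums = {}
        for w, k in zip(current, labels):
            sums[k] = sums.get(k, 0.0) + w
        return sums

    def scale_to(labels, targets, current):
        # Multiply each weight by (target total / current weighted total).
        sums = category_sums(labels, current)
        return [w * targets[k] / sums[k] for w, k in zip(current, labels)]

    w = list(weights)
    for _ in range(max_iter):
        w = scale_to(rows, row_totals, w)
        w = scale_to(cols, col_totals, w)
        # Column margins now match exactly; stop once the rows do too.
        row_sums = category_sums(rows, w)
        if max(abs(row_sums[k] - row_totals[k]) for k in row_totals) < tol:
            break
    return w


# Hypothetical example: four respondents raked to sex and age margins.
raked = rake(
    [1.0, 2.0, 3.0, 4.0],
    ["m", "m", "f", "f"],
    ["young", "old", "young", "old"],
    {"m": 40.0, "f": 60.0},
    {"young": 30.0, "old": 70.0},
)
```

Raking matches the margins but not the joint distribution, so it removes nonresponse bias only to the extent that the implied response model holds.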
Developments with a Model-Dependent Component: Small Area Estimation (SAE) • Early example in Hansen, Hurwitz & Madow (1953). • With Fay & Herriot (1979), SAE began to advance. • Still a reluctance to accept model-dependent estimates. • Now SAE is well-established (Rao & Molina, 2015). • Hierarchical Bayesian models can be fitted with MCMC methods with modern-day computing power.
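The core of an area-level model of the Fay-Herriot type is a composite estimate that shrinks each area's direct survey estimate toward a model-based (synthetic) prediction. The sketch below shows only that shrinkage step and assumes the between-area model variance and the synthetic prediction are already known. In practice both are estimated from the data, and the function and parameter names here are hypothetical.

```python
def composite_estimate(direct, sampling_var, synthetic, model_var):
    """Shrinkage estimate for one small area (illustrative sketch).

    direct: design-based direct estimate for the area.
    sampling_var: sampling variance of the direct estimate (treated as known).
    synthetic: model-based prediction for the area (assumed given).
    model_var: between-area model variance (assumed known here).
    """
    if sampling_var < 0 or model_var < 0:
        raise ValueError("variances must be non-negative")
    if sampling_var + model_var == 0:
        raise ValueError("at least one variance must be positive")
    # gamma is the weight on the direct estimate: close to 1 when the
    # direct estimate is precise, close to 0 when it is noisy.
    gamma = model_var / (model_var + sampling_var)
    return gamma * direct + (1.0 - gamma) * synthetic


# Equal variances give equal weight to the two components.
estimate = composite_estimate(direct=10.0, sampling_var=4.0, synthetic=6.0, model_var=4.0)
```

Areas with large samples keep essentially their direct estimates, while areas with very small samples borrow strength from the model, which is the source of both the gain in precision and the reluctance noted on the slide.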
Nonprobability/Quasi-Probability Sample Designs Less costly: • Quota sampling • Random route sampling • WHO’s EPI methodology for childhood immunization • Web samples • Probability and nonprobability web panels. Hard-to-survey populations: • Location or venue-based sampling: men who have sex with men; illegal immigrants; homeless; nomads. • Snowball (chain-referral) sampling: IV drug users. • Respondent-driven sampling: Heckathorn (1997).
Internet Data Collection • Nonprobability Internet surveys (Couper, 2000) • Web surveys as entertainment, with no claim of scientific validity • Unrestricted web surveys • Volunteer opt-in panels: Volunteers sign up and, in return for payment, respond to surveys from time to time. • Web scraping from such sites as Facebook and Twitter, and from searches on Google. • Weighting adjustments are used to attempt to compensate for the very unrepresentative samples.
Administrative Data as a Substitute for Survey Data (Hand, 2018) • Attractions: • Less costly; data are available for all covered; data may be of higher quality and more current; may reflect what people do rather than what they say; they may provide tighter definitions. • Challenges: • Likely need extensive data cleaning; the records may not cover the full target population; and the quality of the record data may not be adequate. • When data are needed from more than one dataset: • Probabilistic record linkage methods may be needed. • Privacy and confidentiality concerns are severe. • Legislation may be needed.
Administrative Data Linked to Surveys (Citro, 2014) Combined with surveys, administrative data can: • Reduce respondent burden; • Provide longitudinal data for the time before and after the survey data collection; • Provide accurate record data; • Be used for nonresponse weighting adjustments; • Be used to provide auxiliary variables in small area estimation.
Classification of Survey Estimates • Descriptive estimates: • Totals, means and proportions • Analytic estimates: • Measures of association for evaluating cause-effect relationships, e.g., regression coefficients, differences in means • The role of the finite population concept is less obvious for analytic estimates.
Mixed Mode of Inference for Descriptive Statistics • Some dependence on models is now inevitable when analyzing survey data. • For descriptive estimation, I view the need to use models as a response to an injury: • As a crutch to be used only to the extent that the survey data cannot fully support the desired estimates. • Dependence on models has consequences for how the quality of survey estimates should be presented to users.
Describing the Uncertainty in Survey Estimates • With a perfectly executed probability sample, the precision of a survey estimate is measured by its estimated standard error or confidence interval. • The additional variance arising from nonresponse and noncoverage weighting adjustments can be captured by replicating the adjustments. • Currently this design-based approach is used. • However, under this approach the underlying MAR model is assumed true: • “All models are wrong, but some are useful.” ‒ George Box
Measuring Uncertainty in Model-Dependent Estimates • Nonresponse and noncoverage adjustments are imperfect • They may reduce but they do not eliminate bias. • Weights for nonprobability samples are often adjusted so that the weighted sample counts conform to some external totals: • The biases that remain can be large and they go unmeasured. • Can better measures of uncertainty be produced? Bayesian model averaging (Lohr & Brick, 2017)?
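The adjustment described on this slide, in which weights are set so that weighted sample counts conform to external totals, can be sketched as simple poststratification. The function name and the toy data are assumptions for illustration only. The sketch computes a point estimate and says nothing about the bias that remains within cells.

```python
def poststratified_mean(values, cells, population_totals):
    """Poststratified mean for a sample with no design weights (sketch).

    values: outcome for each respondent.
    cells: adjustment cell of each respondent (e.g. an age-by-sex group).
    population_totals: dict of external population counts per cell.
    Returns (estimate, weights).
    """
    counts = {}
    for c in cells:
        counts[c] = counts.get(c, 0) + 1
    # Each respondent in cell c stands for N_c / n_c population members,
    # so the weighted sample count in every cell equals its external total.
    weights = [population_totals[c] / counts[c] for c in cells]
    estimate = sum(w * y for w, y in zip(weights, values)) / sum(weights)
    return estimate, weights


# Hypothetical sample that over-represents cell "a".
estimate, weights = poststratified_mean(
    values=[1, 1, 1, 0],
    cells=["a", "a", "a", "b"],
    population_totals={"a": 50, "b": 50},
)
```

The adjustment corrects the imbalance between cells, but if respondents differ from nonrespondents within a cell, that bias is untouched and, as the slide notes, goes unmeasured.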
Transportability of Measures of Association • For many years, internal validity was the dominant focus for randomized experiments, clinical trials, observational studies, and evaluation studies: • Do the effects apply to the study subjects? • External validity (aka generalization or transportability to other populations and subpopulations) was a lesser concern. • Transportability may be a reasonable assumption― at least approximately so―in many medical studies, but less so for social investigations. • In recent years, greater attention has been given to estimating a population average treatment effect for a specific population and to subgroup estimates.
Two Large Cohort Studies • U.K. Biobank Study: 500,000 men and women aged 40-69 randomly recruited in certain areas of the U.K. between 2006 and 2010 with a 5% response rate. Electronic health records included in the follow-ups. • A “healthy volunteer” selection bias (Fry et al., 2017). • All-cause mortality half that of UK population. • Estimates of disease prevalence and incidence rates are not safely transportable to the general population (even with weighting adjustments). • With the large sample, many subpopulations can be studied. Which associations might be transportable? • The U.S. “All of Us” Program aims to recruit a million volunteers.
Regression Analysis of Survey Data • Should the complex sample design be taken into account in conducting regression analyses? • Should survey weights be used and how should the variances of regression coefficients be computed? • If the regression model were correct, standard methods in general statistics should be used. • Incorporating weights simply lowers the precision of the estimates. • Since no model is correct, weights may be incorporated to estimate the best fitting (incorrect) prediction model for the finite population. • Or rather, for the superpopulation of which the finite population is a random realization.
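The effect of weights on a fitted regression can be seen in a small sketch of weighted least squares for a single predictor. It is illustrative only: the function name and toy data are assumptions, and it produces point estimates without the design-based variance estimation a real analysis would need.

```python
def weighted_line(xs, ys, ws):
    """Weighted least-squares fit of y = intercept + slope * x (sketch).

    ws are survey weights; equal weights give the ordinary least-squares fit.
    Returns (intercept, slope).
    """
    total = sum(ws)
    x_bar = sum(w * x for x, w in zip(xs, ws)) / total
    y_bar = sum(w * y for y, w in zip(ys, ws)) / total
    sxx = sum(w * (x - x_bar) ** 2 for x, w in zip(xs, ws))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for x, y, w in zip(xs, ys, ws))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


xs = [0.0, 1.0, 2.0, 3.0]
unequal = [1.0, 1.0, 1.0, 10.0]
equal = [1.0, 1.0, 1.0, 1.0]

# Exactly linear data: the fit is the same whatever the weights.
linear_fit = weighted_line(xs, [1.0, 3.0, 5.0, 7.0], unequal)

# Curved data with a straight-line model: the fitted line now depends
# on which part of the population the weights emphasize.
curved = [0.0, 1.0, 4.0, 9.0]
fit_equal = weighted_line(xs, curved, equal)
fit_weighted = weighted_line(xs, curved, unequal)
```

When the linear model holds exactly, weighting leaves the coefficients unchanged; when it does not, the weighted fit targets the best-fitting line for the weighted population, which is the distinction drawn on the slide.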
Current State of Survey Research + Ever-increasing demand for social data for evidence-based policy making and for research. + An international survey research profession has emerged. + Many specialist journals and conferences • Statistics in Transition ‒ Falling response rates and increasing costs. • Adaptive designs and other methods have not had great success in counteracting these effects ‒ Competition from administrative and Internet data.
Where Now? • Administrative and other data sources cannot completely replace surveys: • Some data (opinions, leisure activities, etc.) can only be obtained in person. • Surveys collect many variables that can place the responses to a particular item in context. • Surveys can produce a time series of estimates based on a common measuring instrument, whereas the rules for administrative data can change over time. • Administrative data have their own sources of error. • Researchers using administrative data need to develop metrics for evaluating the error dimensions in a way similar to the total survey error model.
Nonprobability/Quasi-Probability Sampling • For cost and speed reasons, the use of these methods will likely grow, especially with Internet surveys and surveys of hard-to-survey populations. • If nonprobability or quasi-probability sampling methods are used, the data collector should provide data users with full details of the sampling method. • The reports of survey findings should be accompanied by warnings about the unknown biases in the estimates. • The user needs to be able to assess the fitness for use of the results for the intended purpose.
In Summary • I do not foresee household surveys being replaced by administrative data. Rather, I expect the demand for high-quality surveys to continue to grow. • The critical problem is declining response rates. • More research on the interface between design-based and model-dependent inference is needed. • For nonresponse • For nonprobability/quasi-probability samples • With the ease of conducting Internet data collections, survey researchers need to be prepared to make the case for high-quality surveys.
Concluding Remarks • The practice of survey research has shown great adaptability to changing circumstances over the past 60 years. • The profession is well equipped to adapt to the major changes now taking place.