Two Approaches to the Use of Administrative Records to Reduce Respondent Burden and Data Collection Costs John L. Eltinge Office of Survey Methods Research U.S. Bureau of Labor Statistics 12th Meeting of the Group of Experts on Business Registers September 14-15, 2011
Acknowledgements and Disclaimer The author thanks Tony Barkume, Rick Clayton, Mike Davern, Bob Fay, Jenna Fulton, Gerry Gates, Pat Getz, Bill Iwig, Shelly Martinez, Bill Mockovak, Polly Phipps, John Ruser, and members of the FCSM Subcommittee on Administrative Records for many helpful discussions of the topics considered here. The views expressed here are those of the author and do not necessarily reflect the policies of the U.S. Bureau of Labor Statistics, nor the FCSM Subcommittee on Statistical Uses of Administrative Records.
Overview I. Conceptual Background II. Two Approaches to Integration of Survey and Administrative Record Data A. Survey Core B. Administrative Record Core III. Methodological Issues IV. Empirical Issues V. Managerial Issues
I. Conceptual Background A. Primary Question For a specified resource base, can we improve the balance of quality/cost/burden/risk in official statistics by integrating survey and administrative record data?
I. Conceptual Background (continued) B. Possible Example: U.S. Consumer Expenditure Survey 1. Goal: Collect data on a wide range of consumer expenditures and related demographics 2. Current approach a. Household sample survey – complex design b. Personal visit and telephone collection
I. Conceptual Background (continued) 3. Issues re cost and perceived burden (60+ minutes average interview time; cognitive complexity) 4. BLS currently exploring a wide range of redesign options 5. Prospective use of administrative-record data, and its long-term impact on the balance of quality, cost, burden and risk?
I. Conceptual Background (continued) C. Possible Cases 1. Sales data from retailers, other sources - Aggregated across customers, by item - Possible basis for imputation of missing items or disaggregation of global reports 2. Collection of some data (with permission) through administrative records (e.g., grocery loyalty cards), linked with sample consumer units
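The first case above (aggregated sales data as a basis for disaggregating global reports) can be pictured as proportional allocation: split a respondent's lump-sum report across items according to administrative sales shares. The sketch below is purely illustrative; item names and dollar figures are invented, not CE data:

```python
def disaggregate(global_total, admin_item_sales):
    """Split a global expenditure report across items in proportion
    to aggregated administrative sales shares."""
    total_sales = sum(admin_item_sales.values())
    return {item: global_total * sales / total_sales
            for item, sales in admin_item_sales.items()}

# A household reports $120 in "groceries" with no item detail;
# aggregated retailer data supply item-level sales shares.
shares = {"produce": 30_000, "dairy": 10_000, "meat": 20_000}
detail = disaggregate(120.0, shares)
print(detail)  # {'produce': 60.0, 'dairy': 20.0, 'meat': 40.0}
```

The same proportional logic could also serve as a donor model for imputing individual missing items, with obvious caveats about heterogeneity across households.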
I. Conceptual Background (continued) D. Framework: Population of Consumer Purchases Defined by Cross-Classification of: - Classification of product/service, time, geography - Characteristics of purchaser (Consumer? Demographics?) - Admin: Outlet, intermediaries (financial, other) E. How to modify estimation methods to incorporate administrative data? - Weighting and imputation for CPI cost weights, commonly produced tables - Construction of public-use datasets
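One standard way administrative data enter the weighting step in (E) is poststratification: design weights are ratio-adjusted so that weighted counts match administrative control totals within each cell. A minimal sketch, with invented strata labels and totals:

```python
def poststratify(weights, strata, admin_totals):
    """Adjust design weights so weighted counts match administrative
    control totals within each stratum (simple ratio adjustment)."""
    survey_totals = {}
    for w, g in zip(weights, strata):
        survey_totals[g] = survey_totals.get(g, 0.0) + w

    return [w * admin_totals[g] / survey_totals[g]
            for w, g in zip(weights, strata)]

weights = [10.0, 10.0, 20.0, 20.0]
strata = ["urban", "urban", "rural", "rural"]
admin_totals = {"urban": 30.0, "rural": 30.0}  # from administrative counts

new_w = poststratify(weights, strata, admin_totals)
print(new_w)  # [15.0, 15.0, 15.0, 15.0]
```

In production settings this would be generalized to raking over several margins, but the single-margin case shows the mechanism.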
II. Two Approaches to Integration of Survey and Administrative Record Data A. Survey Core 1. Relatively standard sample survey design a. Possible use of administrative record data for frames, selection probabilities, weights b. Primary data collection through standard survey methodology 2. Supplement survey data with administrative records a. Problematic variables (burden, data quality) Current Example: U.S. National Immunization Survey b. Quality checks (microdata or aggregate levels)
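The supplementation idea in (2.a) can be pictured as a record-level fallback rule: keep the survey response when present, otherwise substitute the linked administrative value for respondents who consented to linkage. This toy function sketches the logic only; it is not any agency's production rule:

```python
def supplement(survey_value, admin_value, consented):
    """Prefer the survey response; fall back to the linked
    administrative value only when the respondent consented."""
    if survey_value is not None:
        return survey_value, "survey"
    if consented and admin_value is not None:
        return admin_value, "admin"
    return None, "missing"

print(supplement(4, 5, True))      # (4, 'survey')
print(supplement(None, 5, True))   # (5, 'admin')
print(supplement(None, 5, False))  # (None, 'missing')
```

Tracking the source flag alongside the value supports the quality checks in (2.b), since survey/admin discrepancies can then be tabulated directly.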
II. Two Approaches to Integration of Survey and Administrative Record Data (Continued) B. Administrative Record Core 1. Access administrative record data (at microdata or partially aggregated levels) 2. Per Lessler (2006), supplement as needed for inferential goals a. Fill in for incomplete population unit coverage b. Collect variables not captured in administrative records c. Adjust for data quality issues (e.g., timeliness or aggregation effects)
II. Two Approaches to Integration of Survey and Administrative Record Data (Continued) C. Design Features 1. Generally differ substantially between the survey-core and administrative record core approaches 2. Need to consider both methodological and managerial components of design
III. Methodological Issues Comparison and contrast of the “Survey Core” and “Administrative Record Core” approaches will involve a wide range of methodological issues A. Methods for Evaluation of Properties of Prospective Administrative Record Sources 1. Population aggregates (means, totals) 2. Variable relationships (regression, GLM) 3. Cross-sectional and temporal stability of (1) and (2)
III. Methodological Issues (Continued) B. Methods for Integration of Sample Survey and Administrative Record Data: Adaptation of Methods from: 1. Partitioned designs (“multiple matrix sampling”) in education, health statistics 2. Multiple-frame designs (e.g., Lohr and Rao, 1999, 2003) - Frames may capture subpopulations through fundamentally different classification structures
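The multiple-frame machinery in (B.2) can be illustrated with a Hartley-type dual-frame estimator: domain estimates from each frame are combined, with a mixing weight θ applied to the overlap domain. The numbers below are invented weighted domain totals, not results from any survey:

```python
def hartley_dual_frame(y_a, y_ab_A, y_ab_B, y_b, theta=0.5):
    """Hartley-type dual-frame total: y_a and y_b are estimates for the
    domains covered by only one frame; y_ab_A and y_ab_B are the two
    frames' estimates for the overlap domain, mixed with weight theta."""
    return y_a + theta * y_ab_A + (1 - theta) * y_ab_B + y_b

# Frame A: survey frame; frame B: administrative frame (illustrative).
est = hartley_dual_frame(y_a=100.0, y_ab_A=250.0, y_ab_B=270.0, y_b=40.0)
print(est)  # 400.0
```

In practice θ would be chosen to reduce variance (or by relative data quality of the two frames), and the harder problem flagged on the slide remains: the two frames may not even classify overlap units the same way.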
III. Methodological Issues (Continued) C. Importance of Clarity on Sources of Variability Considered in Evaluation of Bias, Variance and Other Properties 1. Sources: Superpopulation effects Sample design (e.g., subsampling, matching) Unit, wave and item missingness or time lags Aggregation effects (temporal, cross-sectional) Reporting error (definitional, temporal, other) Imputation effects (including model lack of fit) 2. Conditioning and integration
III. Methodological Issues (Continued) D. Working Model for Methodological Properties X = Frame, weight information Y = Sample survey data Z = Additional administrative record data Properties of estimator based on variability from: 1. Population structure 2. Administrative and survey collection processes (“filters”) 3. Homogeneity of (1) and (2) across cases
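The working model can be mimicked in a small simulation: generate a population (source 1), then pass it through a survey "filter" (small sample with reporting error) and an administrative "filter" (broad coverage with an assumed constant definitional bias). Every parameter here is invented for illustration:

```python
import random

random.seed(2011)

# (1) Population structure: true values for N units.
N = 1000
true_vals = [random.gauss(50.0, 10.0) for _ in range(N)]

# (2a) Survey collection "filter": small sample, reporting error.
sample = random.sample(range(N), 50)
y = [true_vals[i] + random.gauss(0.0, 5.0) for i in sample]

# (2b) Administrative "filter": near-complete coverage,
# but an assumed constant definitional bias of +2.0.
covered = [i for i in range(N) if random.random() < 0.9]
z = [true_vals[i] + 2.0 for i in covered]

survey_est = sum(y) / len(y)   # unbiased, noisy
admin_est = sum(z) / len(z)    # precise, biased
print(round(survey_est, 1), round(admin_est, 1))
```

Repeating this across filter settings is one way to study point (3), homogeneity of the collection processes across cases.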
III. Methodological Issues (Continued) E. Formal Evaluation of Properties Evaluate expected mean squared error with respect to each component in (D.1) and (D.2) Current information available at conceptual, empirical levels? Critical importance of understanding the underlying processes for collection and reporting of administrative data Ex: Propensity of a household or business to provide informed consent to link? Ex: Homogeneity of data quality characteristics over time?
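The expected-MSE evaluation in (E) can be approximated by Monte Carlo, separating squared bias from variance with respect to a chosen source of variability. A generic sketch; the Gaussian estimator is a stand-in, not a model of any actual series:

```python
import random

def mc_mse(estimator, true_value, reps=2000, seed=0):
    """Monte Carlo approximation of expected MSE, decomposed as
    (squared bias) + (variance) over repeated realizations."""
    rng = random.Random(seed)
    draws = [estimator(rng) for _ in range(reps)]
    mean = sum(draws) / reps
    var = sum((d - mean) ** 2 for d in draws) / reps
    bias = mean - true_value
    return bias ** 2 + var, bias ** 2, var

# Stand-in estimator: unbiased but noisy around a true value of 50.
mse, b2, v = mc_mse(lambda rng: rng.gauss(50.0, 3.0), true_value=50.0)
print(round(b2, 2), round(v, 2))
```

The same harness applied to a biased-but-precise "administrative" estimator makes the quality/cost trade between the two cores concrete.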
III. Methodological Issues (Continued) F. Prior Literature (Examples) Davern (2007, 2009) Demers (2009) Federal Committee on Statistical Methodology (1980) Fulton et al. (2009) Herzog, Winkler and Scheuren (2007) Jabine and Scheuren (1985) Jeskanen-Sundstrom (2007) Ord and Iglarsh (2007) Penneck (2007) Royce (2007) Winkler (2009)
III. Methodological Issues (Continued) G. Prior literature: Two concepts of data quality 1. Per Davern (2007), extend usual ideas of “total survey error” (TSE) to administrative data: (Estimator) – (True value) = (frame error) + (sampling error) + (incomplete-data effects) + (measurement error) + (processing effects)
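The TSE decomposition is additive, so a toy accounting check makes it concrete: each component below is an invented figure, and their sum reproduces the gap between estimator and true value:

```python
# Toy numeric check of the additive TSE decomposition
# (all component values are invented for illustration).
true_value = 100.0
components = {
    "frame error":            -1.5,
    "sampling error":          2.0,
    "incomplete-data effects": -0.8,
    "measurement error":       1.2,
    "processing effects":     -0.4,
}

estimator = true_value + sum(components.values())
print(round(estimator, 6))               # 100.5
print(round(estimator - true_value, 6))  # 0.5
```

For administrative data the analogous components (coverage of the administrative universe, lags, definitional differences) replace the survey-specific ones, which is the extension Davern proposes.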
III. Methodological Issues (Continued) 2. Broader definitions of data quality, e.g., Brackstone (1999): Accuracy (all components of TSE) AND: Timeliness, Relevance, Interpretability, Accessibility, Coherence 3. Risk: Degradation in any component of data quality a. Aggregate risk: Historical focus of quantitative work b. Systemic risk: Often important for statistical programs - cf. “complex and tightly coupled systems” (Perrow, 1984, 1999)
III. Methodological Issues (Continued) H. Cost Structures 1. Statistical products (including surveys and administrative records) require substantial investments (often in intangibles) a. Data originators: - Initial administrative purpose - Accommodate statistical agency (data quality, learning curve, systems) b. Statistical agencies - Learning curves - Systems for acquisition, edit/imputation - Disclosure limitation
III. Methodological Issues (Continued) 2. Broad acknowledgement of substantial costs 3. Less empirical information generally available on: a. Relative magnitudes of specific cost components b. Extent of homogeneity of results from (a) with respect to: - Type of administrative/business organization - Type of administrative records - Subpopulation - Other factors
III. Methodological Issues (Continued) 4. Level of precision available on cost information: a. Purely qualitative b. Order of magnitude c. Relatively precise 5. Practical uses of cost information a. Qualitative decisions among options b. Fine-tuning specific procedure 6. Sources of information (F. LaFlamme, 2008) a. Special studies (risks: Hawthorne, incomplete accounting) b. Cost-recovery contract accounting
III. Methodological Issues (Continued) I. Burden: 1. Respondent burden a. Elapsed time for collection, related activities b. Cognitive complexity c. Perceived sensitivity d. Informed consent - Direct access and linkage with survey - Obtained during original administrative- record work?
III. Methodological Issues (Continued) 2. Organizational burden a. Informed consent b. Record linkage c. Data management d. Data quality evaluation and adjustment
IV. Empirical Issues A. Properties of Input Data and Final Estimators B. Cost Structures 1. Obtaining Data: Contractual costs with provider Agency personnel (expertise) 2. Modification and maintenance of production systems C. Case studies are important, but may not allow inference to broader populations, variables
V. Managerial Issues A. Central Issue: Management of Costs and Risks - Methodological risks (commonly studied) - Operational risks (“execution risks”) B. Contractual Structure: 1. Performance Criteria and Incentives for Data Provider (Timely Delivery, Quality, Notice on Changes) 2. Stability of Prospective Sources (AOL in 1999, 2011) 3. Changes in Agency Requests (New Products, New Channels) C. Agency Personnel: Skills, Incentives and Institutional Culture
V. Managerial Issues (Continued) D. Contrast Between 1. Incremental risks (per standard statistical methodology) 2. Systemic risks cf. literature from Perrow (1984, 1999) and others on risks in “complex and tightly coupled systems”
VI. Closing Remarks A. Design Issues in Integration of Survey and Administrative Record Data B. Goal: Improve Balance Among Quality, Cost, Burden and Risk C. Distinction Between “Survey Core” and “Administrative Record Core” Approaches D. Impact on Methodological Design and Management Design E. Importance of Development of a Spectrum of Empirical Results
John L. Eltinge
Associate Commissioner
Office of Survey Methods Research
U.S. Bureau of Labor Statistics
www.bls.gov
202-691-7404
Eltinge.John@bls.gov