Assessment of Misclassification Error in Stratification Due to Incomplete Frame Information

Donsig Jang, Xiaojing Lin, Amang Sukasih (Mathematica Policy Research, Inc.)
Steve Cohen, Kelly Kang (National Science Foundation)
ITSEW 2008, Research Triangle Park, NC, June 2, 2008

Disclaimer
The opinions and assertions are those of the authors and do not reflect the views or policies of the National Science Foundation.

Survey Data Collection
• Involves many complex processes, including:
  • Sampling frame construction
  • Sample selection
  • Data collection
  • Data processing
  • Estimation
• Each process is subject to error
• Attempt to decompose the total survey error into components attributable to each stage of the process

Total Survey Errors
[Diagram: stages Parameter, Sampling Frame, Sample, Respondent, Data, Estimator; error components Misclassification error, Coverage error, Sampling error, Nonresponse error, Measurement error, Estimation error]

Misclassification Error in Stratification
• Focus of this talk
• A part of non-sampling error
• Important but often overlooked component

Stratification in Sampling
• Enhance precision of survey estimates
• Precision requirements for analytic domains
• Often imperfect information on stratification variables
• Misclassification in stratification
  • Trade-off: cost of gathering stratification information at frame construction vs. optimal sample allocation
  • Loss of effective sample size for some analytic domains

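Misclassification Matrix
• True classification: $A$
• Stratification classification: $A^*$
• Misclassification matrix $\Theta = [\theta_{jk}]$, where $\theta_{jk} = \Pr(A^* = j \mid A = k)$ is the proportion of units classified as category $j$ among units in true category $k$, and $\sum_j \theta_{jk} = 1$ for each $k$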
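As an illustration of how such a matrix could be estimated when both classifications are observed on a sample, here is a minimal sketch. The function name, the weighted-share estimator, and the six-unit data set are assumptions made for this example, not taken from the slides.

```python
from collections import defaultdict


def misclassification_matrix(true_labels, frame_labels, weights, categories):
    """Return theta[j][k]: weighted share of units in true category k
    that the frame classified as category j (each column k sums to 1)."""
    cell = defaultdict(float)       # weighted count for (frame j, true k)
    col_total = defaultdict(float)  # weighted count for true category k
    for a, a_star, w in zip(true_labels, frame_labels, weights):
        cell[(a_star, a)] += w
        col_total[a] += w
    return {
        j: {
            k: (cell[(j, k)] / col_total[k]) if col_total[k] > 0 else float("nan")
            for k in categories
        }
        for j in categories
    }


# Hypothetical sample: one male graduate is recorded as female on the frame.
true_labels = ["M", "M", "M", "F", "F", "F"]
frame_labels = ["M", "M", "F", "F", "F", "F"]
weights = [1.0, 1.0, 2.0, 1.0, 1.0, 2.0]

theta = misclassification_matrix(true_labels, frame_labels, weights, ["M", "F"])
```

Here `theta["F"]["M"]` is the weighted share of truly male units that the frame placed in the female stratum.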

Measures for Misclassification Effects • Bias • Effective sample size change

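Bias Due to Misclassification
$$\mathrm{Bias}(\hat{P}^*) = (\Theta - I)\,P$$
where
• $P$ = true population proportions
• $I$ = identity matrix
• $\hat{P}^*$ = sample proportions, $\hat{P}^*_j = \sum_{i \in s} w_i\, I(A^*_i = j) \big/ \sum_{i \in s} w_i$
• $s$ denotes the sample, $w_i$ the sampling weight for unit $i$, and $I(\cdot)$ the indicator function
(Kuha and Skinner 1997)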
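A small worked example may make the size of this bias concrete. It assumes the bias takes the form $(\Theta - I)P$, with $\Theta$ the misclassification matrix and $P$ the true proportions; the two-category numbers are hypothetical and chosen only for easy arithmetic.

```latex
\Theta = \begin{pmatrix} 0.95 & 0.10 \\ 0.05 & 0.90 \end{pmatrix},
\qquad
P = \begin{pmatrix} 0.60 \\ 0.40 \end{pmatrix}

\Theta P
= \begin{pmatrix} 0.95(0.60) + 0.10(0.40) \\ 0.05(0.60) + 0.90(0.40) \end{pmatrix}
= \begin{pmatrix} 0.61 \\ 0.39 \end{pmatrix}

(\Theta - I)\,P = \Theta P - P
= \begin{pmatrix} 0.01 \\ -0.01 \end{pmatrix}
```

In this example the first category is overstated by one percentage point, a relative bias of $0.01 / 0.60 \approx 1.7\%$.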

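Bias Estimation
If the true classification is available from the sample:
$$\widehat{\mathrm{Bias}}(\hat{P}^*) = (\hat{\Theta} - I)\,\hat{P}$$
where
$$\hat{\theta}_{jk} = \frac{\sum_{i \in s} w_i\, I(A^*_i = j,\ A_i = k)}{\sum_{i \in s} w_i\, I(A_i = k)}, \qquad \hat{P}_k = \frac{\sum_{i \in s} w_i\, I(A_i = k)}{\sum_{i \in s} w_i}$$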
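When both classifications are observed on the sample, one simple way to estimate the bias is to compare the weighted proportions under each classification. The sketch below does this; the function names and data are hypothetical, and it assumes the bias of interest is the difference between the frame-based and true-based weighted proportions.

```python
def weighted_proportions(labels, weights, categories):
    """Weighted share of the sample falling in each category."""
    total = sum(weights)
    return {
        c: sum(w for a, w in zip(labels, weights) if a == c) / total
        for c in categories
    }


def estimated_bias(true_labels, frame_labels, weights, categories):
    """Return (bias, relative bias) of frame-based proportions per category."""
    p_true = weighted_proportions(true_labels, weights, categories)
    p_frame = weighted_proportions(frame_labels, weights, categories)
    bias = {c: p_frame[c] - p_true[c] for c in categories}
    rel_bias = {
        c: (bias[c] / p_true[c]) if p_true[c] > 0 else float("nan")
        for c in categories
    }
    return bias, rel_bias


# Hypothetical sample: one male graduate is recorded as female on the frame.
true_labels = ["M", "M", "M", "F", "F", "F"]
frame_labels = ["M", "M", "F", "F", "F", "F"]
weights = [1.0, 1.0, 2.0, 1.0, 1.0, 2.0]

bias, rel_bias = estimated_bias(true_labels, frame_labels, weights, ["M", "F"])
```

The relative bias divides by the proportion under the true classification, which is the sense in which a figure such as "relative bias for the male proportion" is read in this sketch.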

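Effective Sample Sizes and Variance Inflation Factors
$$\mathrm{VIF}_d = \frac{n_d \sum_{i \in s_d} w_i^2}{\left(\sum_{i \in s_d} w_i\right)^2}, \qquad n_{\mathrm{eff},d} = \frac{n_d}{\mathrm{VIF}_d}$$
• Computed for domain $d$ constructed based on the true value
• Computed for domain $d$ constructed based on the misclassified value
• The variance inflation factor measures the inflation of variance due to weight variation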
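The effect of weight variation on a domain can be sketched in a few lines. This example assumes Kish's approximation, VIF = n Σw² / (Σw)² with effective sample size n / VIF, which may differ from the exact formulas on the original slide. The weights are hypothetical.

```python
def variance_inflation_factor(weights):
    """Kish's approximate variance inflation due to unequal weights."""
    if not weights:
        raise ValueError("domain has no sampled units")
    n = len(weights)
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    return n * sum_w2 / (sum_w * sum_w)


def effective_sample_size(weights):
    """Nominal sample size deflated by the variance inflation factor."""
    return len(weights) / variance_inflation_factor(weights)


# Hypothetical domain as defined by frame values: equal weights.
frame_domain_weights = [10.0, 10.0, 10.0, 10.0]
# Same nominal size, but one unit moved in from a stratum sampled at a lower rate.
true_domain_weights = [10.0, 10.0, 10.0, 40.0]

vif_frame = variance_inflation_factor(frame_domain_weights)
vif_true = variance_inflation_factor(true_domain_weights)
neff_frame = effective_sample_size(frame_domain_weights)
neff_true = effective_sample_size(true_domain_weights)
```

With equal weights the inflation factor is 1 and the effective sample size equals the nominal one. A single large weight from a different stratum raises the factor and shrinks the effective sample size, even though the nominal count is unchanged.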

Example: National Survey of Recent College Graduates (NSRCG)
• Sponsored by the National Science Foundation
• Collects education, employment, and demographic information from recent graduates with a Bachelor's or Master's degree in science, engineering, or health fields
• For details, see http://www.nsf.gov/statistics/srvyrecentgrads

NSRCG (Continued)
• Two-stage sample design: school sample at the first stage and graduate sample at the second stage
• Crucial to collect key sampling variables (degree date, degree level, field of major, race/ethnicity, and gender) from schools for eligibility determination and stratification (frame variables)
• Sample was designed to have moderate weight variation within domains while meeting certain sample size thresholds
• Quality of sampling variables is compromised by schools' reluctance to release student information, non-standard formats used by schools, and inaccurate or incomplete administrative data (Jang and Lin, 2007 JSM)

NSRCG (Continued)
• The same information (degree date, degree level, field of major, race/ethnicity, and gender) was also collected from sampled graduates
• This makes it possible to measure the quality of school-provided stratification information by assessing discrepancies between school-provided and self-reported values
• Data from two survey cycles are examined (2003 and 2006 NSRCG)

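Misclassification for Gender
[Tables: gender misclassification matrices for NSRCG 2003 and NSRCG 2006; cell values not preserved]
• NSRCG 2003: relative bias for P_Male = -0.01%
• NSRCG 2006: relative bias for P_Male = 0.50%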

Misclassification for Race/Ethnicity
[Tables: race/ethnicity misclassification matrices for NSRCG 2003 and NSRCG 2006; cell values not preserved]

Effective Sample Sizes and Variance Inflation Factors
• What if reported values are used for discrepant cases?
• Results in more weight variation within domains based on reported values, due to unequal selection probabilities across classes
• Check domain-specific sample sizes and variance inflation factors

[Figure: Variance Inflation Factors. Domain: race/ethnicity by degree level by major field by gender. Panels: NSRCG 2003, NSRCG 2006. Legend: White, Asian, Minority]

[Figure: Ratio of Sample Size, n_R / n_F. Domain: race/ethnicity by degree level by major field by gender. Panels: NSRCG 2003, NSRCG 2006. Legend: White, Asian, Minority]

[Figure: Ratio of Effective Sample Size, n_R / n_F. Domain: race/ethnicity by degree level by major field by gender. Panels: NSRCG 2003, NSRCG 2006. Legend: White, Asian, Minority]

[Figure: Variance Inflation Factors. Domain: race/ethnicity by degree level by major field. Panels: NSRCG 2003, NSRCG 2006. Legend: White, Asian, Minority]

[Figure: Ratio of Sample Size, n_R / n_F. Domain: race/ethnicity by degree level by major field. Panels: NSRCG 2003, NSRCG 2006. Legend: White, Asian, Minority]

[Figure: Ratio of Effective Sample Size, n_R / n_F. Domain: race/ethnicity by degree level by major field. Panels: NSRCG 2003, NSRCG 2006. Legend: White, Asian, Minority]

[Figure: Variance Inflation Factors. Domain: race/ethnicity by gender. Panels: NSRCG 2003, NSRCG 2006. Legend: White, Asian, Minority]

[Figure: Ratio of Sample Size, n_R / n_F. Domain: race/ethnicity by gender. Panels: NSRCG 2003, NSRCG 2006. Legend: White, Asian, Minority]

[Figure: Ratio of Effective Sample Size, n_R / n_F. Domain: race/ethnicity by gender. Panels: NSRCG 2003, NSRCG 2006. Legend: White, Asian, Minority]

Summary
• Misclassification in stratification may reduce the effective sample sizes for domains that were sampled at high rates
• Good classification in stratification is crucial, especially when selection probabilities are substantially unequal

Next Steps
• Population counts for key domains are available but are based on the misclassified classification
• Estimation of population counts:
  • Weighted sums of correct classification from the sample
  • Use of misclassification parameter estimates, $\hat{N} = \hat{\Theta}^{-1} N^*$, where $N^*$ is the vector with population counts of domains defined by $A^*$
• Raking adjustments of the weights using $\hat{N}$
• Comparison of key estimates
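The second estimation option can be sketched as solving a small linear system. This assumes the frame-based counts relate to the true counts through the misclassification matrix (frame counts = Θ × true counts), so the true counts are recovered by inverting that relationship. The solver, the matrix, and the counts below are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; copy rows so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("misclassification matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


# Hypothetical estimated misclassification matrix (rows: frame class,
# columns: true class) and frame-based population counts.
theta_hat = [[0.95, 0.10],
             [0.05, 0.90]]
frame_counts = [6100.0, 3900.0]

true_counts = solve_linear_system(theta_hat, frame_counts)
```

Solving the system directly avoids forming the inverse explicitly. The result is unstable when misclassification is heavy, because the matrix is then close to singular.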
