29th International Traffic Records Forum. Using Multiple Imputation to Resolve the Missing Data Issue.
“There are known “knowns” (i.e., there are things we know we know). There are known “unknowns” (i.e., there are things we know we don’t know) and there are unknown “unknowns” (i.e., there are things we don’t know we don’t know).” Donald Rumsfeld
Some Known Knowns • How many crash records there are • How many records are in the files we are matching
Some Known Unknowns • How many crash records should there be? • How many people are hospitalized for motor vehicle crashes? • How many people are transported by EMS for motor vehicle crashes?
An Unknown Unknown • What is the effect of missing data and missing links on the analysis?
Conditions for Multiple Imputation • Data must be “missing at random.” • The model used to generate the imputed values must be “correct.” • The analytic model must match up with the model used in the imputation.
Missing Completely at Random (MCAR) • The missing values are simply a random sample of all values. • For example, in a data set of crash records, safety belt usage would be MCAR if two things held: people with safety belt usage reported had, on average, the same level of safety belt usage as people for whom it was not reported, and each of the other variables in the data set was, on average, the same for both groups. • In the case of MCAR, imputation is not needed, but missing data are rarely MCAR and there is no definitive way to confirm that data are MCAR.
Missing at Random (MAR) • The “missingness” of data for variable Y is unrelated to the value of Y but may be related to other variables in the data. • For example, in a crash data set, safety belt usage would be MAR if the probability of reporting safety belt usage is related to gender, but within each category of gender the probability of missing safety belt information is unrelated to the person’s safety belt usage. • In the case of MAR, imputation can provide better estimates of variances, measures of central tendency, confidence intervals, and standard deviations.
“Nonignorable” Missing Data • The “missingness” of data for variable Y is related to the value of Y. • For example, in a crash data set, safety belt usage would be “nonignorable” if the probability of safety belt usage being reported was related to whether a safety belt was used. • Imputation is not appropriate in the case of nonignorable missing data.
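The three mechanisms above can be compared with a small simulation. This is only a sketch under assumed numbers: the belt-use rates (70% for men, 90% for women), the reporting probabilities, and the function name `belt_rates` are all hypothetical and are not taken from any crash file. It reports the true belt-use rate, the rate among records with belt use reported, and a rate adjusted for gender (which is assumed to be always recorded).

```python
import random

def belt_rates(p_report, n=100_000, seed=0):
    """Simulate n crash-involved people and return (true belt-use rate,
    rate among reported records, gender-adjusted rate).
    p_report(male, belted) is the probability that belt use is reported."""
    rng = random.Random(seed)
    total = {True: 0, False: 0}        # people per gender (always recorded)
    reported = {True: 0, False: 0}     # people with belt use reported
    belted_rep = {True: 0, False: 0}   # belted people among those reported
    belted_all = 0
    for _ in range(n):
        male = rng.random() < 0.5
        # Assumed belt-use rates: 70% for men, 90% for women.
        belted = rng.random() < (0.70 if male else 0.90)
        total[male] += 1
        belted_all += belted
        if rng.random() < p_report(male, belted):
            reported[male] += 1
            belted_rep[male] += belted
    true_rate = belted_all / n
    complete_case = sum(belted_rep.values()) / sum(reported.values())
    # Weight each gender's reported rate by that gender's share of all people.
    adjusted = sum(total[g] / n * belted_rep[g] / reported[g]
                   for g in (True, False))
    return true_rate, complete_case, adjusted

# Hypothetical reporting mechanisms, one per slide above.
MECHANISMS = {
    "MCAR": lambda male, belted: 0.7,                      # same for everyone
    "MAR": lambda male, belted: 0.5 if male else 0.9,      # depends on gender only
    "nonignorable": lambda male, belted: 0.9 if belted else 0.5,  # depends on belt use
}

if __name__ == "__main__":
    for name, mechanism in MECHANISMS.items():
        true_rate, complete_case, adjusted = belt_rates(mechanism)
        print(f"{name:13s} true={true_rate:.3f} "
              f"reported-only={complete_case:.3f} adjusted={adjusted:.3f}")
```

Under these assumptions the reported-only rate should track the truth for MCAR, drift for MAR until gender is taken into account, and stay biased for the nonignorable case even after adjusting for gender.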
Conditions for Multiple Imputation • Data must be “missing at random.” • The model used to generate the imputed values must be “correct.” • The analytic model must match up with the model used in the imputation.
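When these conditions hold, the analytic model is run once on each imputed data set and the results are pooled. The presentation does not spell out the pooling step; the sketch below uses the standard combining rules from the multiple imputation literature (Rubin's rules). The function name `pool` and the example numbers are illustrative only.

```python
from statistics import mean, variance

def pool(estimates, variances):
    """Combine results from m imputed data sets using Rubin's rules.
    estimates: the point estimate from each imputed data set
    variances: the squared standard error from each imputed data set
    Returns (pooled estimate, total variance)."""
    m = len(estimates)
    q_bar = mean(estimates)            # pooled point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Hypothetical belt-use estimates from five imputed data sets.
estimate, total_var = pool([0.80, 0.82, 0.78, 0.81, 0.79], [0.0004] * 5)
print(estimate, total_var ** 0.5)
```

The between-imputation term is what carries the extra uncertainty due to the missing data; a single imputation would leave it out and understate the standard error.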
Imputed Matches • Requires a good estimate of how many true matches are possible if the imputation model is to be “correct” and the number of imputed matches is to be plausible. • A simple solution would be to look at how many records are in each data set.
The Ideal World • Crash Records N = 360,000 • EMS Records N = 68,500 • Inpatient Records N = 17,000 • Every EMS and inpatient record should link to a crash record
How Many Crash Records Should There Be? • Ideally, there should be one crash record for each person involved in a crash. • Crash reporting systems vary from state to state (some collect information on all persons involved, some collect information only on drivers, others collect information only on injured people, others collect information only for injury crashes).
Are There Duplicate Crash Records? • Duplicates often occur through data processing methods (e.g., updating of records, re-submission of data, multiple reports) • Duplicates are a relatively easy problem to deal with.
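For duplicates created by updates or re-submissions, one common approach is to keep only the most recent version of each person-level record. The sketch below assumes each record carries a crash identifier, a person number, and an update date; the field names `crash_id`, `person_no`, and `updated` are hypothetical and will differ from state to state.

```python
def drop_duplicates(records, key_fields=("crash_id", "person_no"),
                    version_field="updated"):
    """Keep the latest version of each record, where records sharing the
    same key_fields are treated as duplicates of one another."""
    latest = {}
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        # Replace the stored record if this one is at least as recent.
        if key not in latest or rec[version_field] >= latest[key][version_field]:
            latest[key] = rec
    return list(latest.values())

# Hypothetical example: person 1 was submitted twice, the second time with
# belt use filled in.
crash_records = [
    {"crash_id": 1, "person_no": 1, "updated": "2003-01-05", "belt": None},
    {"crash_id": 1, "person_no": 1, "updated": "2003-02-01", "belt": "yes"},
    {"crash_id": 1, "person_no": 2, "updated": "2003-01-05", "belt": "no"},
]
print(drop_duplicates(crash_records))
```

ISO-formatted date strings are assumed so that plain string comparison orders the versions correctly.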
Are There Missing Crash Records? • Missing records can occur through data processing methods. • Missing records can also occur due to failure to report or reporting thresholds. • Some people are injured in crashes that occur out-of-state (i.e., they may appear in the inpatient hospital file but not in the crash file).
Some Methods to Check for Missing Crash Records • Check number of records by date and submitting entity • Cross-reference data sets • Population-based rates • Historical trends • Check referential integrity of the data
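The first check in the list above, counting records by date and submitting entity, can be sketched as follows. The field names `agency` and `date`, the monthly grouping, and the 50% threshold are assumptions made for illustration; a real check would use whatever identifiers and reporting periods the state's files provide.

```python
from collections import Counter
from statistics import median

def low_volume_months(records, months, threshold=0.5):
    """Flag (agency, month, count) combinations where an agency submitted
    fewer than threshold * its own median monthly count."""
    # Dates are assumed to be ISO strings, so the first 7 characters are YYYY-MM.
    counts = Counter((r["agency"], r["date"][:7]) for r in records)
    agencies = sorted({agency for agency, _ in counts})
    flagged = []
    for agency in agencies:
        monthly = [counts[(agency, m)] for m in months]  # 0 when nothing was submitted
        typical = median(monthly)
        for month, count in zip(months, monthly):
            if count < threshold * typical:
                flagged.append((agency, month, count))
    return flagged

# Hypothetical example: agency "A" submitted nothing in March.
months = ["2003-01", "2003-02", "2003-03", "2003-04"]
records = [{"agency": "A", "date": f"{m}-15"}
           for m in ("2003-01", "2003-02", "2003-04") for _ in range(10)]
print(low_volume_months(records, months))
```

A flagged month is only a prompt to look further; a low count may reflect a real drop in crashes rather than missing submissions.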
Are There Duplicate EMS Records? • Duplicates can occur through data processing methods (e.g., updating of records, re-submission of data, multiple reports) • Duplicates may result from the way data is reported. • Duplicates may result from there really being more than a single event.
More Than a Single EMS Event • Often there are multiple providers involved in EMS services (e.g., ALS, BLS, Air and ground, inter-facility transport). • Sometimes records for the same event can be identified by a common incident number or by looking at response outcome or incident type. • CODES 2000 can be used to do a self-match of records.
Are There Missing EMS Records? • Missing records can occur through data processing methods • Missing records can also occur due to failure to report or reporting thresholds.
Some Methods to Check for Missing EMS Records • Check number of records by date and submitting entity • Cross-reference data sets • Population-based rates • Historical trends • Check referential integrity of the data
Are There Duplicate Inpatient Records? • Duplicates can occur through data processing methods (e.g., updating of records, re-submission of data, multiple reports) • Duplicates may result from the way data is reported. • Duplicates may result from there really being more than a single event.
More Than a Single Discharge • Patients may be discharged and re-admitted multiple times (e.g., complications, late effects, rehabilitation, surgery). • The “frequent flyer” phenomenon. • About 10% of motor vehicle crash victims have more than a single admission. • Routines are available in SAS, SPSS and Perl to array a patient's multiple records into a single record.
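The idea of arraying several discharge records into one record per patient can be illustrated as below. The slide refers to SAS, SPSS and Perl routines; this Python sketch is a stand-in for those, not a port of them, and the field names `patient_id`, `admit`, and `discharge` are hypothetical.

```python
from datetime import date

def array_admissions(discharges):
    """Collapse multiple discharge records into one summary record per patient."""
    by_patient = {}
    # Process in admission order so first_admit is the earliest admission.
    for rec in sorted(discharges, key=lambda r: r["admit"]):
        row = by_patient.setdefault(rec["patient_id"], {
            "patient_id": rec["patient_id"],
            "admissions": 0,
            "total_days": 0,
            "first_admit": rec["admit"],
            "last_discharge": rec["discharge"],
        })
        row["admissions"] += 1
        row["total_days"] += (rec["discharge"] - rec["admit"]).days
        row["last_discharge"] = max(row["last_discharge"], rec["discharge"])
    return list(by_patient.values())

# Hypothetical example: patient 7 is re-admitted for rehabilitation.
discharges = [
    {"patient_id": 7, "admit": date(2003, 3, 1), "discharge": date(2003, 3, 6)},
    {"patient_id": 7, "admit": date(2003, 3, 20), "discharge": date(2003, 3, 30)},
    {"patient_id": 9, "admit": date(2003, 4, 2), "discharge": date(2003, 4, 3)},
]
print(array_admissions(discharges))
```

With one record per patient, each hospitalized person can contribute at most one link to the crash file, which keeps the count of possible matches plausible.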
Are There Missing Inpatient Records? • Missing records can occur through data processing methods • Missing records can also occur due to failure to report. • Missing hospital records can also occur if the patient is hospitalized out of state
More Information on Multiple Imputation • Little RJA, Rubin DB – Statistical Analysis with Missing Data • Schafer JL – Analysis of Incomplete Multivariate Data • Rubin DB – Multiple Imputation After 18+ Years, Journal of the American Statistical Association, June 1996, pp. 473-481 • NHTSA – Transitioning to Multiple Imputation – New Method to Impute Blood Alcohol Concentration in FARS • WWW.stat.psu.edu/~jls/mifaq/html