This presentation discusses the challenges and successes of linking state data files, specifically in terms of traffic records. The goal is to make high-quality exposure data accessible to DOT, state agencies, and the public, and to develop a standardized state data repository accessible on the web. The presentation covers the development of standardized data structures, data transformation challenges, and the process of linking state files. Conclusions highlight the importance of a standardized repository for exposure data at the county and city level.
Linking State Data Files: Challenges and Successes Demetra Collia, M.S., M.H.S. 30th International Traffic Record Forum Nashville, Tennessee U.S. Department of Transportation Bureau of Transportation Statistics
Purpose • Make high-quality exposure data available to the U.S. DOT, state agencies, and the public • Evaluate the feasibility and desirability of developing a standardized state data repository accessible on the web
Goal • Transform files into a common format • Facilitate analysis at the state level, and comparisons across states • Make state data accessible to all with minimum effort
Key Factors • Technology is available • Process is manageable (administratively, financially) • Stakeholders and states are interested in participating
Scope • Develop standardized files of traffic records: crash, vehicle registration, driver licensing/history • Combine data across states • Add other state data
Pilot States • Alabama, Alaska, Arizona, Connecticut, Iowa, Kentucky, Louisiana, Ohio, Wisconsin, West Virginia
Phase I: Develop Data Structures • Crash files (person, vehicle, event) • Vehicle registration • Driver – licensing, history
Phase II: Data Transformation • Easier for some data fields than others (e.g., gender, age, race, fuel type)
Phase II: Challenges • Data not collected • Data collected at a more aggregate level than the standardized structure requires • Lack of internal QC data checks
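To illustrate the Phase II transformation step, here is a minimal sketch of recoding a state-specific field into a common format. The state names, the code tables, and the function `standardize_gender` are all hypothetical and are not taken from the presentation; real state codebooks will differ. Values a state does not collect, or codes that cannot be mapped, come back as `None` rather than being guessed.

```python
# Hypothetical per-state code tables (illustrative only; not from the slides).
STATE_GENDER_CODES = {
    "state_a": {"1": "M", "2": "F"},             # numeric coding
    "state_b": {"M": "M", "F": "F", "U": None},  # letter coding, "U" = unknown
}


def standardize_gender(state, raw_value):
    """Map a state's raw gender code to the common coding.

    Returns "M", "F", or None when the value is unknown, unmapped,
    or the state has no code table at all.
    """
    table = STATE_GENDER_CODES.get(state, {})
    key = str(raw_value).strip().upper()
    return table.get(key)


if __name__ == "__main__":
    print(standardize_gender("state_a", 1))
    print(standardize_gender("state_b", " f "))
```

Keeping one lookup table per state makes the easy fields a matter of data entry, while fields collected at a more aggregate level than the standard structure requires cannot be recovered by recoding alone.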
Phase III: Linking State Files Deterministic Linking vs. Probabilistic Linking
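The contrast between the two approaches can be sketched as follows. Deterministic linking declares a match only when a chosen key agrees exactly; probabilistic linking sums agreement and disagreement weights across several fields and compares the total with a threshold (a Fellegi-Sunter style score). This is an illustration under stated assumptions, not the method used in the project: the field names, the m/u probabilities, and the threshold are all hypothetical.

```python
import math


def deterministic_match(rec_a, rec_b, key="driver_license_number"):
    """Exact-key match: both records must carry the same non-missing key."""
    a, b = rec_a.get(key), rec_b.get(key)
    return a is not None and a == b


# Hypothetical (m, u) pairs per field:
#   m = P(field agrees | records are a true match)
#   u = P(field agrees | records are not a match)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "date_of_birth": (0.97, 0.003),
    "address": (0.80, 0.02),
}


def match_weight(rec_a, rec_b, probs=FIELD_PROBS):
    """Sum log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in probs.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def probabilistic_match(rec_a, rec_b, threshold=8.0):
    """Declare a match when the summed weight reaches the (assumed) threshold."""
    return match_weight(rec_a, rec_b) >= threshold


if __name__ == "__main__":
    crash = {"driver_license_number": "D128", "name": "JANE DOE",
             "date_of_birth": "1970-01-01", "address": "1 MAIN ST"}
    licence = {"driver_license_number": "D123", "name": "JANE DOE",
               "date_of_birth": "1970-01-01", "address": "9 OAK AVE"}
    print(deterministic_match(crash, licence))
    print(probabilistic_match(crash, licence))
```

In the usage example the driver license number differs by one character, so the deterministic rule rejects the pair, while agreement on name and date of birth is enough for the probabilistic rule to accept it under the assumed weights.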
Matching Data Fields • SSN • Name • Date of Birth • Driver License Number • Address • License Plate Number • VIN
Linking Files: Results • Crash - Driver licensing/history files • Driver license number: 93.6% • Additional fields: 98% • For current-year data, driver license number is a better matching field than SSN
Linking Files: Results • Crash - Vehicle registration files • Plate number: 62.3% • For current-year data, plate number is a better matching field than owner SSN
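The slides do not state how the percentages above were computed. One plausible reading is that a match rate is the share of crash records for which a record with the same key value exists in the other file. The sketch below computes that quantity on toy data; the function `match_rate`, the field name `plate`, and the records are hypothetical and do not reproduce the reported figures.

```python
def match_rate(crash_records, reference_records, key):
    """Percentage of crash records whose key value appears in the reference file.

    Records with a missing key count as unmatched, so gaps in data
    collection lower the rate.
    """
    if not crash_records:
        return 0.0
    reference_keys = {r[key] for r in reference_records if r.get(key)}
    matched = sum(1 for c in crash_records if c.get(key) in reference_keys)
    return 100.0 * matched / len(crash_records)


if __name__ == "__main__":
    crashes = [{"plate": "ABC123"}, {"plate": "XYZ999"},
               {"plate": None}, {"plate": "LMN456"}]
    registrations = [{"plate": "ABC123"}, {"plate": "LMN456"}]
    print(match_rate(crashes, registrations, "plate"))
```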
Conclusions • Linking state files can be done. • Linking crash data to vehicle registration data is still a challenge. • A standardized repository of vehicle registration files is a useful source of exposure data at the county and city level.
Questions? Demetra Collia Bureau of Transportation Statistics 202-366-1610 demetra.collia@bts.gov