July 2009 RPUG Meeting, July 16, 2009
Link Plus Version 3 Update
Kathleen Thoburn (kthoburn@cdc.gov), CDC/NPCR Contractor
David Gu (dgu@cdc.gov), CDC/NPCR Contractor
Joe Rogers (jrogers@cdc.gov), CDC
Division of Cancer Prevention and Control
National Center for Chronic Disease Prevention and Health Promotion
Coordinating Center for Health Promotion
Centers for Disease Control and Prevention
Atlanta, Georgia
Link Plus Software
• Stand-alone probabilistic record linkage program
• Combines ease of use with statistical sophistication
• Detects duplicates within a data file, or links two data files together
• Supports fixed-width files, delimited files, and North American Association of Central Cancer Registries (NAACCR) format files
• Provides powerful support for manual review of uncertain matches
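The slides do not spell out Link Plus's scoring model. As a rough illustration of what "probabilistic record linkage" usually means, the sketch below assumes the classic Fellegi-Sunter formulation: each compared field adds log2(m/u) when it agrees and log2((1-m)/(1-u)) when it disagrees, where m and u are the probabilities of agreement among true matches and among non-matches. The field names and m/u values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class FieldModel:
    """Hypothetical m/u probabilities for one comparison field."""
    name: str
    m: float  # P(field agrees | pair is a true match)
    u: float  # P(field agrees | pair is not a match)

    def agreement_weight(self) -> float:
        return math.log2(self.m / self.u)

    def disagreement_weight(self) -> float:
        return math.log2((1 - self.m) / (1 - self.u))


def pair_score(rec_a: dict, rec_b: dict, fields: list) -> float:
    """Sum per-field Fellegi-Sunter weights into one linkage score."""
    total = 0.0
    for f in fields:
        a, b = rec_a.get(f.name), rec_b.get(f.name)
        if a in (None, "") or b in (None, ""):
            continue  # missing value: contributes 0 in this sketch
        total += f.agreement_weight() if a == b else f.disagreement_weight()
    return total


# Example with made-up probabilities:
# fields = [FieldModel("last", 0.95, 0.01), FieldModel("dob", 0.97, 0.003)]
# pair_score({"last": "SMITH", "dob": "19600101"},
#            {"last": "SMYTH", "dob": "19600101"}, fields)
```

Higher totals indicate pairs more likely to refer to the same person; a real program would estimate m and u from the data rather than hard-coding them.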
Link Plus Is Free $0.00
Link Plus Linkage Overview
Two main types of linkage:
• External linkage
   • Probabilistically link one file to another file
• Deduplication
   • Special case of record linkage
   • Records in the same file are blocked, compared, and scored against each other
   • Result is a ranked list of record pairs
   • High-scoring pairs may be duplicates
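The deduplication steps on this slide (block, compare, score, rank) can be sketched as follows. The blocking key (first letter of last name plus birth year), the field names, and the +1/-1 score are hypothetical placeholders chosen for illustration, not Link Plus's actual settings.

```python
from collections import defaultdict
from itertools import combinations


def block_key(rec: dict) -> tuple:
    # Hypothetical blocking key: first letter of last name plus birth year.
    return (rec["last"][:1].upper(), rec["dob"][:4])


def simple_score(a: dict, b: dict, fields=("last", "first", "dob", "sex")) -> int:
    # Placeholder score: +1 per agreeing field, -1 per disagreeing field.
    return sum(1 if a[f] == b[f] else -1 for f in fields)


def dedup_candidates(records: list) -> list:
    """Return (score, i, j) for record pairs within each block, highest score first."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[block_key(rec)].append(idx)
    pairs = []
    for members in blocks.values():
        # Only records that share a blocking key are compared.
        for i, j in combinations(members, 2):
            pairs.append((simple_score(records[i], records[j]), i, j))
    pairs.sort(key=lambda t: t[0], reverse=True)
    return pairs
```

Blocking keeps the number of compared pairs far below the n(n-1)/2 of an all-pairs comparison, at the cost of missing duplicates whose blocking fields disagree.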
Link Plus V3 Enhancements
• Removes the size limitation on File 2 (File 1 remains limited to roughly 4.5-4.8 million records)
• Users can choose whether to write all potential matches, or only the matches with the highest score, to the linkage report
• Accepts various date formats for date comparison
• Accepts quoted field values in delimited files
• “Confirmation-like” method for address variables: agreement contributes positively to the linkage score, while disagreement contributes 0 weight
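A minimal sketch of the "confirmation-like" address rule, assuming it behaves like an ordinary agreement weight that is simply floored at zero when addresses differ (presumably because people move, so a different address is weak evidence against a match). The normalization step and the m/u values are illustrative assumptions.

```python
import math
import re


def normalize_address(addr: str) -> str:
    # Illustrative normalization only: uppercase, drop punctuation, collapse spaces.
    addr = re.sub(r"[^\w\s]", " ", addr.upper())
    return " ".join(addr.split())


def confirmation_weight(a: str, b: str, m: float, u: float) -> float:
    """Positive log2(m/u) on agreement; 0 (not a penalty) on disagreement or missing."""
    if not a or not b:
        return 0.0
    if normalize_address(a) == normalize_address(b):
        return math.log2(m / u)
    return 0.0
```

Compared with a standard field weight, the only change is that the disagreement branch returns 0 instead of a negative log2((1-m)/(1-u)).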
Link Plus V3 Enhancements
• Provides an SSN-like matching method for generic IDs
• Incorporates phonetic codes into name matching methods
• New name matching method that is more robust to outlier or misspelled names (yields a more robust linkage score and should eventually allow the cutoff value to be determined automatically in production mode)
• Name matching methods for multiple names
• Users can supply their own name frequency files for use by the name matching methods
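The slides do not say which phonetic code Link Plus uses. As one example, the sketch below uses American Soundex and gives a hypothetical partial weight when two names differ in spelling but share a Soundex code; the weight values are placeholders.

```python
# American Soundex digit groups; letters not listed (vowels, H, W, Y) get no digit.
_SOUNDEX_DIGITS = {
    ch: digit
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6"))
    for ch in letters
}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, or "" if name has no letters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    code = [first]
    prev = _SOUNDEX_DIGITS.get(first, "")
    for ch in letters[1:]:
        digit = _SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != prev:
            code.append(digit)
        if ch not in "HW":
            # Vowels reset prev (repeated digits across a vowel are kept);
            # H and W do not (repeated digits across H/W collapse).
            prev = digit
    return ("".join(code) + "000")[:4]


def name_weight(a: str, b: str, agree=8.0, phonetic=4.0, disagree=-3.0) -> float:
    """Hypothetical three-level name comparison: exact, phonetic, or disagreement."""
    a, b = a.strip().upper(), b.strip().upper()
    if not a or not b:
        return 0.0  # missing name contributes nothing
    if a == b:
        return agree
    if soundex(a) == soundex(b):
        return phonetic
    return disagree
```

A three-level comparison like this lets a misspelling such as SMYTH for SMITH earn partial credit instead of the full disagreement penalty.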
Link Plus V3 Enhancements
Manual Review
• Can automatically assign non-match status to the current view based on previous non-match results
• Option to assign match status by score without overwriting existing match statuses
Export
• Users can export the results of manual review to a NAACCR format file
• Users can save export settings and layouts
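One way to read the two manual-review options is sketched below: statuses already set by a reviewer are never overwritten, pairs that were non-matches in a previous review are carried forward as non-matches, and the remaining pairs are assigned by score. The pair keys, status strings, and cutoff parameters are hypothetical; this is not Link Plus code.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical pair key: (record ID in file 1, record ID in file 2).
Pair = Tuple[str, str]


def assign_status(
    scores: Dict[Pair, float],
    status: Dict[Pair, Optional[str]],
    previous_nonmatches: Set[Pair],
    match_cutoff: float,
    nonmatch_cutoff: float,
) -> Dict[Pair, Optional[str]]:
    """Fill in missing statuses without touching ones a reviewer already set."""
    result = dict(status)  # copy so the caller's statuses are not mutated
    for pair, score in scores.items():
        if result.get(pair):
            continue  # existing status is never overwritten
        if pair in previous_nonmatches:
            result[pair] = "non-match"  # carried forward from an earlier review
        elif score >= match_cutoff:
            result[pair] = "match"
        elif score < nonmatch_cutoff:
            result[pair] = "non-match"
        else:
            result[pair] = None  # left for manual review
    return result
```

Pairs scoring between the two cutoffs stay unassigned, which is the band a reviewer would still need to examine by hand.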
Advanced Linkage Configuration
User-specified Name Frequency File
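Name frequency files are typically used so that agreement on a rare name counts for more than agreement on a common one. A minimal sketch, assuming a hypothetical two-column NAME,COUNT file and using each name's relative frequency as its u probability in log2(m/u); the actual file format Link Plus expects is not described in these slides, and the m and default u values are placeholders.

```python
import csv
import io
import math


def load_name_frequencies(text: str) -> dict:
    """Parse a hypothetical two-column 'NAME,COUNT' file into relative frequencies."""
    counts = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2 or not row[1].strip().isdigit():
            continue  # skip the header and malformed lines
        counts[row[0].strip().upper()] = int(row[1])
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()} if total else {}


def frequency_agreement_weight(name: str, freqs: dict,
                               m: float = 0.95, default_u: float = 1e-5) -> float:
    """log2(m / u), where u is the name's relative frequency (rarer name, larger weight)."""
    u = freqs.get(name.strip().upper(), default_u)
    return math.log2(m / u)
```

With this scheme, two records that agree on a rare surname receive a much larger boost than two that agree on a common one.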
Link Plus Future Development
• Allow CRS Plus users to select additional variables for manual review and export
• Develop an API so that other software can call Link Plus
• Develop additional features to enable use in production mode, including pre-analysis to select the most effective cutoff
• Write papers (including research on record linkage methods)
CDC–NPCR Link Plus Contacts
Kathleen K. Thoburn, CDC/NPCR Contractor
E-mail: kthoburn@cdc.gov
David Gu, CDC/NPCR Contractor
E-mail: dgu@cdc.gov
Tom Rawson, CDC Computer Programmer
Thank you
The findings and conclusions in this presentation are those of the author(s) and do not necessarily represent the views of the Centers for Disease Control and Prevention.
Kathleen Thoburn
518-966-5143
kthoburn@cdc.gov