
Link Plus Version 3 Update

July 2009 RPUG Meeting, July 16, 2009. Link Plus Version 3 Update. Kathleen Thoburn (kthoburn@cdc.gov), CDC/NPCR Contractor; David Gu (dgu@cdc.gov), CDC/NPCR Contractor; Joe Rogers (jrogers@cdc.gov), CDC Division of Cancer Prevention and Control.


Presentation Transcript


  1. July 2009 RPUG Meeting, July 16, 2009
     Link Plus Version 3 Update
     Kathleen Thoburn (kthoburn@cdc.gov), CDC/NPCR Contractor
     David Gu (dgu@cdc.gov), CDC/NPCR Contractor
     Joe Rogers (jrogers@cdc.gov), CDC
     Division of Cancer Prevention and Control, National Center for Chronic Disease Prevention and Health Promotion, Coordinating Center for Health Promotion, Centers for Disease Control and Prevention, Atlanta, Georgia

  2. Link Plus Software
     • Stand-alone probabilistic record linkage program
     • Combines ease of use and statistical sophistication
     • Detects duplicates within a data file, or links two data files together
     • Supports fixed-width files, delimited files, and North American Association of Central Cancer Registries (NAACCR) files
     • Provides powerful support for manual review of uncertain matches

  3. Link Plus Is Free $0.00

  4. Link Plus Linkage Overview
     Two main types of linkage:
     • External Linkage
         • Probabilistically link one file to another file
     • Deduplication
         • Special case of record linkage
         • Records in the same file are blocked, compared, and scored against each other
         • Result is a ranked list of record pairs
         • High-scoring pairs may be duplicates
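The deduplication steps on this slide (block, compare, score, rank) can be sketched roughly as follows. This is an illustration only, not Link Plus code: the field names, the `FIELD_PROBS` m/u probabilities, and the log-ratio weights are assumptions chosen to show the general shape of a probabilistic scorer.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical per-field probabilities (assumed for illustration):
# m = P(field agrees | records are the same person)
# u = P(field agrees | records are different people)
FIELD_PROBS = {
    "first": (0.95, 0.01),
    "dob": (0.97, 0.003),
}

def pair_score(a, b):
    """Sum log2 agreement/disagreement weights over the compared fields."""
    score = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a[field] == b[field]:
            score += math.log2(m / u)              # agreement adds weight
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement subtracts
    return score

def deduplicate(records, block_key):
    """Block records on block_key, score pairs within each block, rank them."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[rec[block_key]].append(idx)
    pairs = []
    for ids in blocks.values():
        for i, j in combinations(ids, 2):
            pairs.append((pair_score(records[i], records[j]), i, j))
    return sorted(pairs, reverse=True)  # highest-scoring pairs first

records = [
    {"last": "SMITH", "first": "JOHN", "dob": "19500102"},
    {"last": "SMITH", "first": "JOHN", "dob": "19500102"},
    {"last": "SMITH", "first": "JANE", "dob": "19620730"},
    {"last": "JONES", "first": "JOHN", "dob": "19500102"},
]
ranked = deduplicate(records, "last")
for score, i, j in ranked:
    print(f"{score:7.2f}  record {i} vs record {j}")
```

Only records that share the blocking value are compared, so the JONES record is never paired with a SMITH record. The top-ranked pair is the candidate duplicate a reviewer would look at first.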

  5. Link Plus V3 Enhancements
     • Removes the size limitation on File 2 (a 4.5-4.8 million record limit applies to File 1)
     • Users can choose whether to write all potential matches, or only the highest-scoring matches, to the linkage report
     • Accepts various date formats for date comparison
     • Accepts quoted field values from delimited files
     • "Confirmation-like" method for address variables: agreement contributes positively to the linkage score, while disagreement carries a weight of 0
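The "confirmation-like" behavior described for address variables can be illustrated with a small sketch. The log-ratio weights and the m/u values are assumptions, not the weights Link Plus computes; the point is only that a disagreeing address contributes 0 instead of a penalty.

```python
import math

def field_weight(value_a, value_b, m, u, confirmation_like=False):
    """Return the score contribution of one field comparison.

    m and u are assumed agreement probabilities among true matches and
    non-matches. With confirmation_like=True, disagreement is neutral.
    """
    if value_a == value_b:
        return math.log2(m / u)            # agreement: positive weight
    if confirmation_like:
        return 0.0                         # disagreement: no penalty
    return math.log2((1 - m) / (1 - u))    # ordinary disagreement penalty

# A changed address should not count against an otherwise good pair.
print(field_weight("12 MAIN ST", "12 MAIN ST", 0.9, 0.01, confirmation_like=True))
print(field_weight("12 MAIN ST", "8 OAK AVE", 0.9, 0.01, confirmation_like=True))
print(field_weight("12 MAIN ST", "8 OAK AVE", 0.9, 0.01, confirmation_like=False))
```

This fits fields such as address, where people legitimately move: a match is useful evidence, but a mismatch says little about whether two records describe the same person.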

  6. Link Plus V3 Enhancements
     • Provides an SSN-like matching method for generic IDs
     • Incorporates phonetic codes into the name matching methods
     • New name matching method that is more robust against outlier or misspelled names (gives a more robust linkage score, and should eventually allow the cutoff value to be determined automatically for production mode)
     • Name matching methods for multiple names
     • Users can provide their own name frequency files for use by the name matching methods
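To show how a phonetic code can feed into name matching, here is a minimal sketch that uses American Soundex as a stand-in. The slides do not say which phonetic code Link Plus uses, and the three agreement levels returned by `name_agreement` are hypothetical.

```python
# Soundex digit groups; vowels, H, W, and Y have no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(name):
    """Return the 4-character American Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                # H and W do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code                 # a vowel resets prev to ""
    return "".join(out)[:4].ljust(4, "0")

def name_agreement(a, b):
    """Classify a name comparison as 'exact', 'phonetic', or 'none'."""
    if a.strip().upper() == b.strip().upper():
        return "exact"
    if soundex(a) and soundex(a) == soundex(b):
        return "phonetic"
    return "none"

print(soundex("Robert"), soundex("Rupert"))   # both R163
print(name_agreement("Thoburn", "Thoburne"))  # phonetic
```

In a scorer, a phonetic agreement would typically earn a smaller positive weight than an exact agreement, so that misspelled names still contribute evidence without counting as a full match.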

  7. Link Plus V3 Enhancements
     Manual Review
     • Can automatically assign non-match status in the current view based on previous non-match results
     • Option that lets users assign match status by score without overwriting existing match statuses
     Export
     • Users can export the results of manual review to a NAACCR-format file
     • Users can save export settings and layouts
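The manual review option that assigns match status by score without overwriting existing decisions might behave roughly like the sketch below. The dictionary layout, the status labels, and the `overwrite` flag are all hypothetical; Link Plus exposes this through its review screen, not through code.

```python
def assign_status_by_score(pairs, cutoff, overwrite=False):
    """Set a status on each pair from its score.

    Pairs that already carry a status (for example, from an earlier
    manual review) are left alone unless overwrite=True.
    """
    for pair in pairs:
        if pair.get("status") is not None and not overwrite:
            continue  # keep the reviewer's earlier decision
        pair["status"] = "match" if pair["score"] >= cutoff else "non-match"
    return pairs

pairs = [
    {"id": 1, "score": 21.4, "status": None},
    {"id": 2, "score": 18.9, "status": "non-match"},  # reviewed by hand earlier
    {"id": 3, "score": 4.2, "status": None},
]
assign_status_by_score(pairs, cutoff=15.0)
for pair in pairs:
    print(pair["id"], pair["status"])
```

Pair 2 scores above the cutoff but keeps its manually assigned non-match status, which is the behavior the slide describes.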

  8. Link Plus Linkage Configuration

  9. Advanced Linkage Configuration: User-specified Name Frequency File

  10. Link Plus Manual Review

  11. Link Plus Future Development
     • Allow CRS Plus users to select additional variables for manual review and export
     • Develop an API so that Link Plus can be called from other software
     • Develop additional features to enable use in production mode, including pre-analysis to select the most effective cutoff
     • Write papers (including research on record linkage methods)

  12. CDC–NPCR Link Plus Contacts
     • Kathleen K. Thoburn, CDC/NPCR Contractor. E-mail: kthoburn@cdc.gov
     • David Gu, CDC/NPCR Contractor. E-mail: dgu@cdc.gov
     • Tom Rawson, CDC Computer Programmer

  13. Thank you
     The findings and conclusions in this presentation are those of the author(s) and do not necessarily represent the views of the Centers for Disease Control and Prevention.
     Kathleen Thoburn, 518-966-5143, kthoburn@cdc.gov
