
LINK Plus Version 3.0: A Probabilistic Deduplication and Linkage Program

LINK Plus is a free, easy-to-use program developed by CDC to detect potential duplicates within cancer registry databases and other data files, and to link records across files. It offers user-friendly, flexible options for probabilistic deduplication and linkage, and can handle files of up to roughly 4 to 4.5 million records. Cancer registries and other users have used it extensively for over ten years.
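The presentation does not describe how Link Plus computes its match scores internally. As a rough illustration of what probabilistic deduplication and linkage generally involve, the following Python sketch scores record pairs with Fellegi-Sunter style log2 agreement and disagreement weights, and runs in the two modes the slides mention: comparing pairs within one file (deduplication) and comparing pairs across two files (linkage). The field names, m/u probabilities, threshold, and sample records are all hypothetical, not Link Plus settings.

```python
import math
from itertools import combinations, product

# HYPOTHETICAL per-field probabilities (illustrative only, not Link Plus defaults):
#   m = P(field agrees | the two records refer to the same person)
#   u = P(field agrees | the two records refer to different people)
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "age": (0.85, 0.10),
    "district": (0.90, 0.20),
}
THRESHOLD = 5.0  # HYPOTHETICAL cutoff on the total log2 weight


def normalize(value):
    """Case- and whitespace-insensitive comparison key."""
    return str(value).strip().upper()


def pair_score(a, b):
    """Sum log2 agreement/disagreement weights over the compared fields."""
    score = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        x, y = normalize(a.get(field, "")), normalize(b.get(field, ""))
        if x == "" or y == "":
            continue  # missing value: no evidence either way
        if x == y:
            score += math.log2(m / u)  # agreement raises the score
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return score


def _score_pairs(pairs):
    """Keep (i, j, score) for every candidate pair at or above the threshold."""
    matches = []
    for (i, a), (j, b) in pairs:
        s = pair_score(a, b)
        if s >= THRESHOLD:
            matches.append((i, j, round(s, 2)))
    return matches


def deduplicate(records):
    """Deduplication mode: compare every pair of records within one file."""
    return _score_pairs(combinations(enumerate(records), 2))


def link(file_a, file_b):
    """Linkage mode: compare every record in file_a with every record in file_b."""
    return _score_pairs(product(enumerate(file_a), enumerate(file_b)))


if __name__ == "__main__":
    # Hypothetical sample records, not real case data.
    cases = [
        {"last_name": "Doe", "first_name": "Jane", "age": "34", "district": "Kenema"},
        {"last_name": "DOE", "first_name": "Jane", "age": "35", "district": "Kenema"},
        {"last_name": "Smith", "first_name": "John", "age": "34", "district": "Bo"},
    ]
    print(deduplicate(cases))  # -> [(0, 1, 10.32)]
```

Here records 0 and 1 are flagged because three fields agree and only age disagrees. Scoring every pair grows quadratically with file size; production linkage tools typically add blocking to limit the number of comparisons, which this sketch omits.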





Presentation Transcript

  1. LINK PLUS VERSION 3.0: A probabilistic deduplication and linkage program developed by the CDC Division of Cancer Prevention and Control. Scott Van Heest, December 6, 2014. Sierra Leone CDC Ebola Response. National Center for Chronic Disease Prevention and Health Promotion, Division of Cancer Prevention and Control

  2. Background • District staff and the CDC EPI team have performed many tasks to clean and de-duplicate the data • I did a visual review in Excel of the national and district *.ecs files, which indicated potential duplicates • The CDC Cancer Registries developers I work with have created a tool that can run in two modes: detecting duplicates in a cancer registry database or any other data file, and linking cancer registry files or any other data files • Link Plus version 3.0 is free and relatively easy to use

  3. Link Plus Capabilities • Link Plus can run linkages on files of up to 4 to 4.5 million records • It can be used with any type of data in fixed-width or delimited format • Here is an example using the 11-27 national case file:

  4. Link Plus Functions • I imported the national file National11.27.14.ecs into VHF and exported it as an Excel file • I cleaned up column headers that had embedded commas and deleted rows with inconsistent data • I started Link Plus and opened the Excel file:

  5. Set up Link Plus data import

  6. Set up and Run Link Plus deduplication

  7. Select additional Fields for Review

  8. Link Plus Deduplication Output

  9. Link Plus Also Works for Linking • Link Plus provides a great tool for linking data files • It should be used after deduplication • This is what we do to enrich our cancer registry data, linking it to many external data sources • This will work well here once we get access to additional data sources (more complete holding center and treatment center data files)

  10. Conclusions • Link Plus provides user-friendly, flexible options for probabilistic deduplication and linkage • It has been used extensively by cancer registries and other users for over ten years • Help topics within Link Plus provide assistance for the most commonly performed tasks • It compares favorably with expensive commercial software programs • Limited assistance is available for users outside of cancer registries • I can provide a link to the FTP site, or you can obtain it by requesting it through the CDC website, which will also provide an MS Word file to help users with the new features in version 3.0

  11. Thank You! Scott Van Heest, sgv1@cdc.gov. The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention. National Center for Chronic Disease Prevention and Health Promotion, Division of Cancer Prevention and Control
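Slide 4 describes preparing the exported case file by hand before opening it in Link Plus: fixing column headers that contained embedded commas and deleting rows with inconsistent data. A minimal Python sketch of the same preparation follows. It assumes the export has been saved as comma-delimited text with a header row (the slides used an Excel file), and it treats a row as inconsistent when its field count differs from the header's. Both that rule and the underscore used to replace header commas are assumptions for illustration, not part of the original workflow.

```python
import csv
import io


def clean_header(name):
    """Replace embedded commas, e.g. 'Date, onset' -> 'Date_onset'."""
    return "_".join(part.strip() for part in name.split(","))


def clean_export(text, delimiter=","):
    """Return (headers, rows): comma-free headers and only well-formed rows."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    headers = [clean_header(h) for h in next(reader)]
    rows = []
    for row in reader:
        # ASSUMED rule: a row is inconsistent if its field count differs
        # from the header's (blank lines, with zero fields, are dropped too).
        if len(row) == len(headers):
            rows.append(row)
    return headers, rows


def to_delimited(headers, rows, delimiter=","):
    """Serialize the cleaned table as delimited text for the linkage tool."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical sample export, not real case data; row "2" is missing a field.
    sample = 'ID,"Date, onset",District\n1,2014-11-01,Kenema\n2,2014-11-02\n3,2014-11-03,Bo\n'
    headers, rows = clean_export(sample)
    print(to_delimited(headers, rows), end="")
```

Because slide 3 notes that Link Plus accepts delimited files generally, passing `delimiter="\t"` to `to_delimited` would write tab-delimited output instead, which sidesteps commas inside values altogether.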
