Matching Data for EHDI Tracking Program. Cathy Gunderson EHDI/NEST Project Manager. Colorado Department of Public Health and Environment Denver, Colorado. Faculty Disclosure Information.
Matching Data for EHDI Tracking Program Cathy Gunderson EHDI/NEST Project Manager Colorado Department of Public Health and Environment Denver, Colorado
Faculty Disclosure Information In the past 12 months, I have not had a significant financial interest or other relationship with the manufacturer(s) of the product(s) or provider(s) of the service(s) that will be discussed in my presentation. This presentation will (not) include discussion of pharmaceuticals or devices that have not been approved by the FDA, or of unapproved or "off-label" uses of pharmaceuticals or devices.
De-Duplicating Person Data in a Centralized Database Cathy Gunderson, Project Manager
Overview • Overview of EHDI/NEST Project • Person De-duplication Process • SOUNDEX • Special considerations • Scoring
A few FLAs (Four/Five Letter Acronyms) • CDPHE – Colorado Department of Public Health and Environment • EHDI – Early Hearing Detection and Intervention • NEST – Newborn Evaluation, Screening and Tracking • CHIRP – Clinical Health Information Records of Patients • CSHCN – Children with Special Health Care Needs – In Colorado: HCP – Title V • HCP – Health Care Program for Children with Special Needs • CRCSN – Colorado Responds to Children with Special Needs – Birth Defects registry
Project Goals • Develop a comprehensive statewide EHDI program from screening to intervention • Implement a system that has a database that integrates information from NBH with PKU and Sickle Cell Disease • Create and maintain a centralized database which will help prove the efficacy of NBS • Implement the system
Colorado’s Newborn Screening • Colorado screens for 8 conditions • Hemoglobinopathies – Sickle Cell • Inherited Metabolic Diseases: • Phenylketonuria (PKU) • Galactosemia • Biotinidase Deficiency • Cystic Fibrosis • Endocrine Diseases: • Hypothyroidism • Congenital Adrenal Hyperplasia (CAH) • Newborn Hearing • Tandem Mass Spectrometry - 2006
First Step–Understand your Data • Electronic Birth Certificate (EBC) • Reported by clerks at birthing hospitals • Reported by clerks at CDPHE for non-birthing hospitals • Reporting is required within 10 days of birth • Laboratory Services Division (LSD) • State Laboratory that processes blood spot screening • Forms mailed to LSD, processed within 24 hours • Results reported within 3 days of receipt • Transactions from other agencies
Data from the Electronic Birth Certificate (Vital Records) • Unique identifier is Birth Certificate Number • Data are not 'cleaned' yet • May be duplicates if hospital sends information more than once • Fields exist for NBH screening results, already associated with the newborn • Newborn information for babies born out of state, born in transit or born at home as well as in birthing hospitals
Data from EBC (cont.) Daily: • EBC processed the night before Weekly: • Infant death records • Voided Records Annually: • Correction tape for resident county
Data from the Newborn Metabolic Screen (State Lab) Daily: • Unique identifier is accession number and form number • Data are final results from each screen • Demographic data on baby and mother • Information on hospital and doctor • Second screen may/may not have original form number (may have been lost) • Second screen may have new doctor
Transaction Data • Input from any CHIRP or CHIRP-like application • Standard Format • Based on the type of event, e.g., birth, Dx, communication, status change • Data sent out from NEST in same format
Second Step – Process the Data Daily: • Validate the data • Validate a unique identifier in input • Must be the same person • De-duplicate - SOUNDEX routine • Assign unique identifier: NEST_PID • Retain/record original EBC data • Retain/record original lab data • Retain/record original transaction data • Record all screening results (activities)
De-Duplication Routine • If a potential unique number is received: • Verify that it is the same person • If not, it is an exception • Unique Numbers: • SSN – most babies don’t have one yet • EBC • Blood spot form number • Accession number combined with date
SOUNDEX: Find Potential Matches • Find best selections for matching based on SOUNDEX keys • Base on the type of data you receive • Some sources provide better data than others • Reliability of data coming in • EBC considered ‘most right’
Build a SOUNDEX Key for Input • Treat as all lower case • If first two letters are ei or ai change to i • Change all c to k • Change all ch to k • Change all ph to f • Change all z to s • Change all y to i
Build a SOUNDEX Key for Input (cont.) • Remove all duplicated letters • Remove all special characters (apostrophes, periods, hyphens and spaces) • Keep first letter of each name part • Remove all vowels after 1st letter • Use first 4 remaining letters for last name • Use first 3 remaining letters for first name • Use middle initial • Put DOB in CCYYMMDD order
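The key-building rules on these two slides can be sketched in Python as follows. This is a minimal illustration, not the NEST implementation: the function names `name_key` and `person_key` are hypothetical, the digraphs (ch, ph) are replaced before single letters so that "c to k" does not turn "ch" into "kh", "duplicated letters" is read as runs of the same letter, and the order in which the pieces are concatenated is an assumption.

```python
import re
from datetime import date


def name_key(name: str, length: int) -> str:
    """Phonetic key for one name part, following the slide rules."""
    s = name.lower()
    # A leading "ei" or "ai" becomes "i".
    if s[:2] in ("ei", "ai"):
        s = "i" + s[2:]
    # Assumption: handle digraphs first, then single letters.
    s = s.replace("ch", "k").replace("ph", "f")
    s = s.replace("c", "k").replace("z", "s").replace("y", "i")
    # Drop special characters and spaces (anything that is not a-z).
    s = re.sub(r"[^a-z]", "", s)
    # Assumption: "duplicated letters" means runs of the same letter.
    s = re.sub(r"(.)\1+", r"\1", s)
    if not s:
        return ""
    # Keep the first letter, remove vowels from the remainder, then truncate.
    rest = re.sub(r"[aeiou]", "", s[1:])
    return (s[0] + rest)[:length]


def person_key(last: str, first: str, middle: str, dob: date) -> str:
    """Hypothetical layout: last(4) + first(3) + middle initial + CCYYMMDD."""
    mi = middle[:1].lower()
    return name_key(last, 4) + name_key(first, 3) + mi + dob.strftime("%Y%m%d")


print(person_key("Gonzalez", "Maria", "E", date(2005, 3, 7)))
print(person_key("Gonzales", "Maria", "E", date(2005, 3, 7)))
```

Both calls print the same key, so the two spellings would be selected as candidates for each other. Note that this sketch discards non-ASCII letters such as ñ; a real routine would need to decide how to map them.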
Special Considerations • Hispanic Surnames • Can be a composite: • Father’s last name • Mother’s last name • A composite name will be treated as 3 last names • Marital Status or Insurance restrictions • LAB under Mother’s last name at birth • Unmarried mom • Married mom but insurance still under maiden name • EBC under Father’s last name
Added Considerations • SOUNDEX might select a candidate that earns no points when the actual data fields are compared: • Lopes vs. Lopez • Gonzalez vs. Gonzales • Gomez vs. Gomes • We allow for points on a SOUNDEX match
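One way to picture "points on a SOUNDEX match" is a two-tier comparison: full points when the actual names agree, fewer points when only their phonetic keys agree. The sketch below is illustrative only; the point values, the function name `name_points`, and the stand-in key function (lower-casing plus z to s) are assumptions rather than values from the NEST application.

```python
from typing import Callable


def simple_key(name: str) -> str:
    """Stand-in phonetic key (assumption): lower-case and map z to s."""
    return name.lower().replace("z", "s")


def name_points(a: str, b: str,
                key_fn: Callable[[str], str] = simple_key,
                exact_points: int = 10,
                phonetic_points: int = 6) -> int:
    """Full points for an exact match, partial points for a key-only match."""
    if a.lower() == b.lower():
        return exact_points
    if key_fn(a) == key_fn(b):
        return phonetic_points
    return 0


print(name_points("Lopez", "Lopez"))
print(name_points("Lopes", "Lopez"))
print(name_points("Lopez", "Garcia"))
```

Passing the key function as a parameter keeps the scoring logic independent of whichever key-building rules are actually in use.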
SOUNDEX Key Types • 5 Key Types: • A) LastName FirstName MiddleInit Gender DOB (YYMM) • B) LastName Gender DOB (YYMMDD) TOB (HHMM) • C) LastName LastName FirstName DOB (YYMM) • D) LastName SOUNDEX • E) FirstName SOUNDEX
SOUNDEX Keys • Up to 24 Different Values for the 5 Types • Example: LastName • Child’s Last Name • Child’s AKA Last Name • Child’s Last Name Part 1 • Child’s Last Name Part 2 • Mother’s Last Name • Mother’s Last Name Part 1 • Mother’s Last Name Part 2 • Father’s Last Name • Father’s Last Name Part 1 • Father’s Last Name Part 2
Average Number of Keys • No child has all 24 keys • If Child and Mom and Dad all have same last name, Key is only created once – no duplicate SOUNDEX Keys for a child • Missing Data • On average, each of our children has 12 keys.
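The reason a child ends up with fewer than 24 keys can be illustrated with a set: each last-name source contributes its full form plus, for a two-part composite, each part, and identical values collapse. This is a sketch under stated assumptions: the function names are hypothetical, the lower-cased name stands in for the real SOUNDEX key, and only two-part composites are split.

```python
def last_name_variants(last_name):
    """Return the full name plus each part for a two-part composite surname."""
    parts = last_name.replace("-", " ").split()
    if not parts:
        return []
    full = " ".join(parts)
    if len(parts) == 2:
        # A composite name is treated as 3 last names: whole, part 1, part 2.
        return [full, parts[0], parts[1]]
    return [full]


def distinct_last_names(child, child_aka, mother, father):
    """Collect variants from every source; the set drops duplicates and gaps."""
    names = set()
    for raw in (child, child_aka, mother, father):
        if raw:  # missing data contributes no keys
            for variant in last_name_variants(raw):
                names.add(variant.lower())  # stand-in for the SOUNDEX key
    return names


print(sorted(distinct_last_names("Garcia Lopez", None, "Lopez", "Garcia")))
print(sorted(distinct_last_names("Smith", None, "Smith", "Smith")))
```

In the first call five candidate values reduce to three distinct entries; in the second, three identical last names produce a single entry.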
Scoring Routine • After a potential match is found, individual fields are compared and points awarded for matches • Actual Data Fields are compared: • Last name, first name, middle name, DOB, TOB • Mother’s last name, first name, maiden name, DOB • AKA names • Father’s last name, first name, DOB • Any unique identifiers recorded with the input and on the database (e.g., Birth Cert #, NBS form #)
Scoring Routine (cont.) • A score above the upper threshold indicates the same person – same NEST_PID assigned • A score under the lower threshold indicates a different person – new NEST_PID is generated • A score between those thresholds cannot be resolved by the application and needs human intervention
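The two-threshold decision can be sketched as below. The field names, point values, and threshold numbers are invented for illustration; the slides do not give the actual weights, and the real routine compares many more fields.

```python
# Hypothetical weights and thresholds, chosen only to make the example work.
POINTS = {"last_name": 20, "first_name": 15, "dob": 25, "birth_cert_no": 40}
MATCH_THRESHOLD = 70        # above this: same person, reuse NEST_PID
NEW_PERSON_THRESHOLD = 30   # below this: different person, new NEST_PID


def score(incoming: dict, candidate: dict) -> int:
    """Sum the points for every field present on both records and equal."""
    total = 0
    for field, points in POINTS.items():
        a, b = incoming.get(field), candidate.get(field)
        if a is not None and a == b:
            total += points
    return total


def classify(total: int) -> str:
    """Map a score to one of the three outcomes described on the slide."""
    if total > MATCH_THRESHOLD:
        return "match"
    if total < NEW_PERSON_THRESHOLD:
        return "new"
    return "review"  # needs human intervention


incoming = {"last_name": "Lopez", "first_name": "Ana",
            "dob": "20050307", "birth_cert_no": "123"}
candidate = {"last_name": "Lopez", "first_name": "Maria", "dob": "20050307"}
print(score(incoming, candidate), classify(score(incoming, candidate)))
```

Here only last name and DOB agree, so the pair lands between the thresholds and is routed to manual review. A missing field earns no points in this sketch; whether it should subtract points is a tuning decision.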
Fine Tuning the De-duplication Routine • Make adjustments to the thresholds • Too many duplicates being added • Make adjustments to the points awarded for matches • Twins! Use birth type and order • Take away points for non-matches? • What if MI is present on one record and not on the other – some points? • Constant vigilance!
Human Intervention • Can help fine tune the De-duplication Routine • Three options: • Override: • Add as a new person • Indicate a match and update information • Resubmit • Thread of processing / timing / Twins!
Colorado Contact Cathy Gunderson, EHDI/NEST Project Manager Colorado Department of Public Health and Environment FCHSD-HCP-A4 4300 Cherry Creek Drive South Denver, CO 80246-1530 cathy.gunderson@state.co.us 303-692-2145