Health Care Data Analytics

Health Care Data Analytics Patient Identification Lecture a This material (Comp 24 Unit 5) was developed by Oregon Health & Science University, funded by the Department of Health and Human Services, Office of the National Coordinator for Health Information Technology under Award Number 90WT0001. This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.

Patient Identification Learning Objectives - 1 • Define the key attributes of patient identifiers (Lecture a) • Describe the challenges of duplicate and overlaid records (Lecture a) • Discuss the pros and cons of standard identifiers vs. linking records (Lecture a)

Patient Identification Learning Objectives - 2 • Describe the methods used for patient record-matching (Lecture a) • Match a sample set of patient records (Lecture a) • Discuss the benefits and limitations of de-identified data (Lecture b)

Patient Identification • Patient identifiers • Duplicate and overlaid records • Standard identifiers • Record linkage methods • De-identification of data

Patient Identifiers - 1 • Benefits • Easy linkage of records • Facilitate health information exchange • Reduce errors and costs arising from duplicate and overlaid records

Patient Identifiers - 2 • Benefits • Easy linkage of records • Facilitate health information exchange • Reduce errors and costs arising from duplicate and overlaid records • Risks • Easy linkage of records • Potentially compromise privacy and confidentiality

Duplicate and Overlaid Records • Duplicate records – when a single individual has more than one identifier • Overlaid records – when more than one individual shares the same identifier
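
The two definitions above can be illustrated with a small sketch. The identifiers and person labels below are hypothetical, and the "person" label is a ground truth that a real system would not have; it is used here only to make the two error types visible.

```python
from collections import defaultdict

# Hypothetical (identifier, person) assignments. The person label is a
# ground-truth value available only in this toy example.
assignments = [
    ("MRN001", "alice"),
    ("MRN002", "alice"),  # one person, two identifiers: duplicate record
    ("MRN003", "bob"),
    ("MRN003", "carol"),  # two people, one identifier: overlaid record
    ("MRN004", "dave"),
]

def find_problems(pairs):
    """Return (duplicates, overlays) found in identifier assignments."""
    ids_by_person = defaultdict(set)
    persons_by_id = defaultdict(set)
    for identifier, person in pairs:
        ids_by_person[person].add(identifier)
        persons_by_id[identifier].add(person)
    # Duplicate: a single individual holds more than one identifier.
    duplicates = {p: sorted(ids) for p, ids in ids_by_person.items() if len(ids) > 1}
    # Overlay: more than one individual shares the same identifier.
    overlays = {i: sorted(ps) for i, ps in persons_by_id.items() if len(ps) > 1}
    return duplicates, overlays

duplicates, overlays = find_problems(assignments)
print(duplicates)
print(overlays)
```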

Problems with Duplicate and Overlaid Records - 1 • Identifier errors compromise quality of care and can be costly • $4,500 to correct duplicate patient records in operating room • 325 minutes of work to correct duplicate records in hospital • Cost increases with length of time error not identified • Duplicate records likely to be associated with missed abnormal test results

Problems with Duplicate and Overlaid Records - 2 • High potential for overlaid records • Study of five large academic centers found • Occurrence of matching first and last name was 16.5-40.7%, reduced to 0.2-15.5% when date of birth added • Highly variable policies for preventing, detecting, and removing duplicate records, and for mitigating errors
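
To see why adding date of birth lowers the rate of shared identifying fields, the following sketch counts how many records share a key with at least one other record. The records are invented for illustration and do not reproduce the study's data or its exact method.

```python
from collections import Counter

# Hypothetical records for distinct patients: (first name, last name, date of birth).
records = [
    ("Maria", "Garcia", "1980-02-14"),
    ("Maria", "Garcia", "1975-09-30"),
    ("Maria", "Garcia", "1980-02-14"),
    ("John", "Smith", "1962-07-04"),
    ("John", "Smith", "1990-11-23"),
    ("Wei", "Chen", "1988-01-05"),
]

def shared_key_rate(rows, key):
    """Fraction of rows whose key value also appears on at least one other row."""
    counts = Counter(key(r) for r in rows)
    return sum(1 for r in rows if counts[key(r)] > 1) / len(rows)

# Key on first and last name only.
name_rate = shared_key_rate(records, lambda r: (r[0].lower(), r[1].lower()))
# Key on first name, last name, and date of birth.
name_dob_rate = shared_key_rate(records, lambda r: (r[0].lower(), r[1].lower(), r[2]))

print(f"name only: {name_rate:.2f}, name + DOB: {name_dob_rate:.2f}")
```

In this toy set, five of six records share a name with another record, but only two of six still collide once date of birth is added.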

Key Attributes for Patient Identifiers • Unique: Only one person has a particular identifier • Non-disclosing: Discloses no personal information • Permanent: Will never be re-used • Ubiquitous: Everyone has one • Canonical: Each person has only one • Invariable: Will not change over time

How are Patient Identifiers Assigned? • Usually at organization or enterprise level through a Master Patient Index (MPI) • Numbering systems may be assigned • Serial: Usually from a numbering system • Derived: Identifier derived from one or more personal traits of an individual • Composite: Combination of above • Check digit to improve accuracy in data entry, transmission, and retrieval
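
As an illustration of how a check digit catches data entry errors, here is a minimal sketch using the Luhn algorithm. Luhn is one widely used scheme and is assumed here for illustration only; the slide does not say which check digit scheme any particular MPI uses.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit starting with the rightmost.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return (10 - total % 10) % 10

def is_valid(identifier: str) -> bool:
    """True if the last digit is the Luhn check digit of the preceding digits."""
    if not identifier.isdigit() or len(identifier) < 2:
        return False
    return luhn_check_digit(identifier[:-1]) == int(identifier[-1])

# Hypothetical serial number with its check digit appended.
serial = "7992739871"
identifier = serial + str(luhn_check_digit(serial))
print(identifier, is_valid(identifier))
# Swapping the last two digits is detected as an error.
print(is_valid("79927398731"))
```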

One Solution: Government-Issued Patient Identifiers • New Zealand National Health Index • Iceland Health Sector Database • Created national genetic database • Singapore • National Registration Identity Card (NRIC) to all citizens • Foreign Identification Number (FIN) to all long-term visitors • Western European countries use them

Government-Issued Patient Identifiers in the U.S.? • Originally HIPAA mandated creation of patient identifiers • Use SSN as a national health identifier? • Technical problems • Other problems • Some advocate voluntary identifiers • Those agreeing to them would sign consent form outlining benefits and risks

Unnecessary and Politically Infeasible? • Against: • “Not worth the fight” • Politically impossible to deploy in U.S. • Other ways to achieve goals for national identifiers • Expenses up front; benefits accrue later • For: • Unique patient identifier would reduce errors • Improve system interoperability in U.S. • Costs would be substantial ($3.9-9.2 billion) • Would not significantly increase risk for security breaches over other options

Alternative: Algorithmic Matching - 1 • Attributes to link patient records • Name • Address • Date of birth • Phone • Approaches include • Deterministic • Fuzzy • Probabilistic

Alternative: Algorithmic Matching - 2 • Results in • Match • Possible match • Non-match • Use case prioritizes trade-off between false-positives and false-negatives
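
The three outcomes can be sketched as two thresholds applied to a similarity score. The threshold values below are hypothetical; in practice they would be tuned to the use case and its tolerance for false positives versus false negatives.

```python
def classify(score: float, lower: float = 0.6, upper: float = 0.9) -> str:
    """Classify a similarity score in [0, 1] using two assumed thresholds."""
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if score >= upper:
        return "match"
    if score >= lower:
        return "possible match"  # typically routed to human review
    return "non-match"

for score in (0.95, 0.75, 0.30):
    print(score, classify(score))

# Raising the upper threshold trades false positives for more manual review:
# the same score is no longer accepted automatically.
print(classify(0.92, upper=0.95))
```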

Algorithmic Matching Steps • Preparing or cleaning data • Detecting errors and deviations (field comparators) • Separating likely matches from unlikely matches (blocking) • Configuring matching algorithms to classify record pairs as reflecting the same individual or entity
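
The blocking step can be sketched as follows. The blocking key used here (birth year plus first letter of the last name) and the sample records are assumptions chosen for illustration; real systems choose keys based on their own data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records.
records = [
    {"id": 1, "last": "Hersh", "dob": "1958-03-02"},
    {"id": 2, "last": "Hersch", "dob": "1958-03-02"},
    {"id": 3, "last": "Smith", "dob": "1990-11-23"},
    {"id": 4, "last": "Smyth", "dob": "1990-11-23"},
    {"id": 5, "last": "Chen", "dob": "1958-07-19"},
]

def blocking_key(rec):
    """Assumed key: birth year plus first letter of the last name."""
    return (rec["dob"][:4], rec["last"][:1].upper())

def candidate_pairs(recs):
    """Only records that share a blocking key are compared with each other."""
    blocks = defaultdict(list)
    for rec in recs:
        blocks[blocking_key(rec)].append(rec["id"])
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)

pairs = candidate_pairs(records)
all_pairs = len(records) * (len(records) - 1) // 2
print(f"{len(pairs)} candidate pairs instead of {all_pairs}: {pairs}")
```

Blocking reduces the number of detailed comparisons, at the risk of missing true matches whose blocking fields disagree.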

Data Quality Issues - 1 • Data errors take form of • Phonetic errors: Word or name has multiple spoken representations, e.g., Hersh, Hersch, Hirsch • Typographical errors: Word or name recorded incorrectly as result of omitted, inserted, or transposed characters, e.g., her, ehr, hr • Morphological confusion: Characters appearing similar, e.g., 0/O, l/I
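
One common way to handle phonetic errors is a phonetic code such as Soundex, which maps similar-sounding names to the same value. The sketch below is a plain implementation of American Soundex, offered as one possible approach rather than the method used by any particular system.

```python
# Letter-to-digit table for American Soundex.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit

def soundex(name: str) -> str:
    """Return the four-character Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (result + "000")[:4]

for name in ("Hersh", "Hersch", "Hirsch"):
    print(name, soundex(name))
```

All three spellings from the slide receive the same code, so a comparison on the code treats them as agreeing.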

Data Quality Issues - 2 • Personal traits may change over time • Name: Usually through marriage or divorce • Address: 11-15% change in an average year • Cultural: Use of multiple family names as well as order, particles, etc. • Different date formats • Use of month names • Date order • 2-digit or 4-digit year
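
Differences in date format can be reduced by normalizing dates before comparison. The sketch below assumes a fixed list of accepted formats, assumes month-first order for numeric dates, and relies on English month names in the default locale. Dates with 2-digit years are deliberately left unparsed, since the century cannot be known from the value alone.

```python
from datetime import datetime

# Assumed list of accepted input formats, tried in order.
FORMATS = (
    "%Y-%m-%d",   # 1958-03-02
    "%m/%d/%Y",   # 03/02/1958 (month-first order is an assumption)
    "%d %B %Y",   # 2 March 1958
    "%B %d, %Y",  # March 2, 1958
    "%d-%b-%Y",   # 02-Mar-1958
)

def normalize_date(text: str):
    """Return the date as YYYY-MM-DD, or None if no assumed format applies."""
    cleaned = text.strip()
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None  # left for manual review

for raw in ("1958-03-02", "03/02/1958", "2 March 1958",
            "March 2, 1958", "02-Mar-1958", "03/02/58"):
    print(repr(raw), "->", normalize_date(raw))
```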

Field Comparison • Records linked by comparison of fields • Deterministic matching: Rules for exact or close matching of one or more fields • Fuzzy methods: Some range of disagreement among fields • Birth within a certain period of time • Common variations in names • Probabilistic methods: String comparators for one or more fields, with measure of similarity with cut-off threshold
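
The difference between deterministic and fuzzy comparison can be sketched on a single pair of records. The nickname table, the 31-day birth date window, and the sample records are all hypothetical choices made for illustration.

```python
from datetime import date

# Hypothetical table of common name variants.
NICKNAMES = {"bill": "william", "will": "william", "bob": "robert", "liz": "elizabeth"}

def norm(text: str) -> str:
    return text.strip().lower()

def canonical_first(name: str) -> str:
    """Map a known variant to its canonical form, else return the normalized name."""
    key = norm(name)
    return NICKNAMES.get(key, key)

def deterministic_match(a, b) -> bool:
    """Exact agreement on first name, last name, and date of birth."""
    return (norm(a["first"]) == norm(b["first"])
            and norm(a["last"]) == norm(b["last"])
            and a["dob"] == b["dob"])

def fuzzy_match(a, b, max_days: int = 31) -> bool:
    """Tolerates common name variants and a birth date within an assumed window."""
    same_first = canonical_first(a["first"]) == canonical_first(b["first"])
    same_last = norm(a["last"]) == norm(b["last"])
    close_dob = abs((a["dob"] - b["dob"]).days) <= max_days
    return same_first and same_last and close_dob

rec_a = {"first": "William", "last": "Taylor", "dob": date(1970, 3, 2)}
rec_b = {"first": "Bill", "last": "Taylor", "dob": date(1970, 3, 12)}

print("deterministic:", deterministic_match(rec_a, rec_b))
print("fuzzy:", fuzzy_match(rec_a, rec_b))
```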

Approaches to Probabilistic Matching • Many methods show relatively high level of accuracy • “Best” methods vary by most desired attribute • Highest sensitivity: Jaro-Winkler comparator • Best area under sensitivity-specificity curve: Longest common substring and root mean square of multiple scores • Research still required for problems with non-standardized (“dirty”) data and missing data
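
For reference, the Jaro-Winkler comparator named above can be sketched in a few lines. This is a textbook-style implementation with the commonly used prefix scale of 0.1 and a maximum prefix length of 4; it is not taken from the cited studies, and production systems may differ in detail.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3

def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings that share a common prefix."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)

print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
print(round(jaro_winkler("HERSH", "HERSCH"), 4))
```

A score like this would then be compared against a cut-off threshold to decide whether two field values agree.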

Current State of Patient Record-Matching • Imperative for patient safety, care coordination, and data quality • Benefit from standardizing patient-identifying attributes in record: • First/given, middle/second given, last/family, suffix • Date of birth • Current and historical addresses • Phone number – all known • Gender – from HL7 value set: M, F, UN • Need process for handling changes across health care system

Patient Identification Summary – Lecture a • Patient identifiers, including national identifiers, have benefits and risks • Methods for linking patient records algorithmically work well but can be challenged by data standards, data quality, and inconclusive matches

