
Panel: Automatic Clinical Text De-Identification: Is It Worth It, and Could It Work for Me?



Presentation Transcript


  1. Panel: Automatic Clinical Text De-Identification: Is It Worth It, and Could It Work for Me? Hercules Dalianis, Clinical Text Mining Group, Department of Computer and Systems Sciences (DSV), hercules@dsv.su.se

  2. Background • Starting 2007: Karolinska University Hospital, Stockholm • Greater Stockholm (City Council): 2 million inhabitants • 1800 beds/inpatients • 550 clinical units Hercules Dalianis, MEDINFO 2013

  3. TakeCare EPR system • Swedish electronic patient record (EPR) system, now owned by CompuGroup Medical • Centralized, text-file based • Built on the APL programming language • Data transferred to a MySQL database to make it manageable (Intelligence)

  4. Ethical permission • What type of research will be carried out • How it will be carried out • No social security numbers • No personal names • Safeguarding of data

  5. Encryption and safeguards • Encrypted server • Password protected • Locked in an alarmed room • Server locked to a rack • No Internet connection • Only a few people have access to the server, and they must sign a security agreement => Probably safer than at the hospital

  6. Trust, trust and more trust • Good contacts with hospital management • They decide for the whole hospital/all clinical units • No psychiatric or venereal diseases, no undocumented refugees

  7. Stockholm EPR Corpus • We obtained 1 million patient records from 550 clinical units, covering the years 2006-2010 • Delivered in several extracts, with further extracts continuing • Each patient has a unique social security number, from birth to death; it is replaced by a serial number • All patient names removed • The rest, including sensitive text, is present
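Replacing each social security number with a serial number keeps a patient's records linked only if the same number always maps to the same serial. A minimal sketch of that idea in Python, assuming records are dictionaries with a hypothetical `ssn` field; the names `SerialNumberMapper` and `pseudonymize` and the sample identifiers are made up for illustration and are not the group's actual pipeline:

```python
import itertools


class SerialNumberMapper:
    """Hypothetical helper: give each distinct identifier a stable serial number."""

    def __init__(self, start=1):
        self._counter = itertools.count(start)
        self._mapping = {}

    def serial_for(self, identifier):
        # The same identifier always gets the same serial, so a patient's
        # records stay linked after the identifier itself is dropped.
        if identifier not in self._mapping:
            self._mapping[identifier] = next(self._counter)
        return self._mapping[identifier]


def pseudonymize(records, mapper):
    """Return copies of the records with 'ssn' replaced by 'patient_serial'."""
    result = []
    for record in records:
        cleaned = {key: value for key, value in record.items() if key != "ssn"}
        cleaned["patient_serial"] = mapper.serial_for(record["ssn"])
        result.append(cleaned)
    return result


# Made-up identifiers, for illustration only.
records = [
    {"ssn": "000000-0001", "note": "first visit"},
    {"ssn": "000000-0002", "note": "referral"},
    {"ssn": "000000-0001", "note": "follow-up"},
]
for row in pseudonymize(records, SerialNumberMapper()):
    print(row)
```

The first and third records both get `patient_serial` 1. The mapping table itself re-identifies patients, so it needs the same protection as the original data, or should be discarded once the extraction is done.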

  8. De-identification (DEID) work • Yes, we did de-identification too, partly to get an overview of the problems that may occur • We followed HIPAA*) but adapted it to Swedish conditions *) Health Insurance Portability and Accountability Act

  9. The Stockholm EPR PHI*) corpus • 100 electronic patient records (EPRs) in Swedish • Five clinics: Neurology, Orthopaedics, Infection, Dental Surgery and Nutrition • 20 patients from each clinic, 50% men, 50% women • 380 000 tokens • Three annotators annotated the whole corpus *) Protected Health Information

  10. 28 PHI classes: Account_Number, Age, Age_Over_89, Biometric_Identifier, Date_Part, Full_Date, Year, First_Name, Last_Name, Patient_First_Name, Patient_Last_Name, Relative_First_Name, Relative_Last_Name, Clinician_First_Name, Clinician_Last_Name, Location, Country, Municipality, Organization, Street_Address, Town, Health_Care_Unit, Device_Identifier_and_Serial_Number, Ethnicity, Fax_Number, Phone_Number, Relation, Uncertain

  11. [Image-only slide]

  12. Consensus on eight annotation classes • Age • Date_Part • Full_Date • First_Name • Last_Name • Health_Care_Unit • Location • Phone_Number

  13. Annotation classes and instances • Age: 56 • Full date: 710 • Date part: 500 • First name: 923 • Last name: 928 • Location: 1 021 • Health care unit: 148 • Phone number: 135 • Sum: 4 421

  14. 380 000 tokens, 4 421 sensitive instances: ~1 percent sensitive information
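The ~1 percent figure follows from the instance counts on slide 13 divided by the corpus size on slide 9. A quick Python check (the variable names are illustrative):

```python
# Instance counts for the eight consensus classes (slide 13).
counts = {
    "Age": 56,
    "Full_Date": 710,
    "Date_Part": 500,
    "First_Name": 923,
    "Last_Name": 928,
    "Location": 1021,
    "Health_Care_Unit": 148,
    "Phone_Number": 135,
}
TOTAL_TOKENS = 380_000  # corpus size reported on slide 9

total_phi = sum(counts.values())
share = total_phi / TOTAL_TOKENS
print(f"{total_phi} PHI instances, {share:.2%} of tokens")  # 4421 PHI instances, 1.16% of tokens
```

Since a single PHI instance such as a full date or a name can span more than one token, this is a ratio of instances to tokens rather than an exact share of sensitive tokens.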

  15. Training and testing on the eight annotation classes using Stanford NER (CRF)
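Stanford NER's CRF classifier is typically trained from a plain-text file with one token per line and tab-separated columns (word, then label), declared in its properties file with `map = word=0,answer=1`; tokens outside any entity conventionally get the background label `O`. A minimal Python sketch that turns character-offset PHI annotations into that layout, assuming a hypothetical `(start, end, label)` span format and a simple regex tokenizer (neither is taken from the original work):

```python
import re

# Simple tokenizer: runs of word characters, or single punctuation marks.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def to_token_label_lines(text, spans):
    """Return one 'token<TAB>label' line per token.

    spans is a list of (start, end, label) character offsets; tokens that do
    not overlap any span get the background label 'O'.
    """
    lines = []
    for match in TOKEN_PATTERN.finditer(text):
        start, end = match.span()
        label = "O"
        for span_start, span_end, span_label in spans:
            # A token takes the label of the first annotated span it overlaps.
            if start < span_end and end > span_start:
                label = span_label
                break
        lines.append(f"{match.group()}\t{label}")
    return lines


example = "Sonen Johan är med."
print("\n".join(to_token_label_lines(example, [(6, 11, "First_Name")])))
```

The sentence comes from the manually anonymized example record in this deck; only `Johan` is labelled `First_Name`, and every other token gets `O`.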

  16. Conditional Random Fields à la Stanford NER • Precision 0.95-0.74 • Recall 0.83-0.36 • F-score 0.90-0.49 • We supply the 8 annotation classes and the words • The rest is a black box, e.g. • Window size • Distance between words, etc.
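Precision, recall and F-score are computed per class from true positives (TP), false positives (FP) and false negatives (FN). Assuming the F-score here is the balanced F1, it is the harmonic mean 2PR/(P+R). A minimal Python sketch with made-up counts:

```python
def prf(tp, fp, fn):
    """Precision, recall and balanced F-score (F1) from entity-level counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical counts for one class, for illustration only.
p, r, f = prf(tp=80, fp=10, fn=20)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=0.89 R=0.80 F=0.84
```

Because F1 is a harmonic mean, it is pulled towards the lower of the two scores, so a class with low recall ends up with a low F-score even when its precision is high.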

  17. Research on the Stockholm EPR Corpus • De-identification (DEID) and resynthesis • Factuality level detection of diagnoses • Negation detection • Detecting the number of hospital-acquired infections (HAI) • Detection of adverse drug events • Comorbidities
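Assuming "resynthesis" here means replacing detected PHI with realistic surrogates of the same class (pseudonymisation), a minimal Python sketch looks like this. The surrogate lists, the `(start, end, label)` span format and the `resynthesize` function are hypothetical; they illustrate the idea and do not reproduce any published method:

```python
import random

# Hypothetical surrogate pools; a real system would need much larger,
# class-appropriate lists.
SURROGATES = {
    "First_Name": ["Anna", "Erik", "Karin", "Lars"],
    "Last_Name": ["Berg", "Lund", "Holm", "Ek"],
}


def resynthesize(text, spans, rng):
    """Replace annotated PHI spans with surrogates of the same class.

    spans holds non-overlapping (start, end, label) character offsets.
    Classes without a surrogate pool are replaced by a bracketed class tag.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # untouched text before the span
        pool = SURROGATES.get(label)
        pieces.append(rng.choice(pool) if pool else f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])  # untouched text after the last span
    return "".join(pieces)


rng = random.Random(0)  # fixed seed for reproducible output
print(resynthesize("Sonen Johan är med.", [(6, 11, "First_Name")], rng))
```

A fuller version would also keep replacements consistent within a record (the same real name always gets the same surrogate) and preserve the formats of dates and phone numbers.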

  18. Conclusion • It is preferable to work on the original data • De-identifying data is too costly and difficult • De-identification is not safe enough • De-identification makes the data too noisy

  19. References • Velupillai, S., H. Dalianis, M. Hassel and G. H. Nilsson. 2009. Developing a standard for de-identifying electronic patient records written in Swedish: precision, recall and F-measure in a manual and computerized annotation trial. International Journal of Medical Informatics. doi:10.1016/j.ijmedinf.2009.04.005 • Dalianis, H. and S. Velupillai. 2010. De-identifying Swedish clinical text: refinement of a gold standard and experiments with Conditional Random Fields. Journal of Biomedical Semantics 1:6 (12 April 2010).

  20. Alfalahi, A., S. Brissman and H. Dalianis. 2012. Pseudonymisation of person names and other PHIs in an annotated clinical Swedish corpus. In Proceedings of the Third Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2012), held in conjunction with LREC 2012, Istanbul, 26 May 2012, pp. 49-54.

  21. Comorbidities in the Comorbidity View • Which ICD-10 codes co-occur with which other codes
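Counting which ICD-10 codes co-occur amounts to counting unordered code pairs per patient. A minimal Python sketch using only the standard library; the patient identifiers and code lists are hypothetical, and counting each pair at most once per patient is an assumption about how such a view might be computed:

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(patient_codes):
    """Count how many patients have each unordered pair of ICD-10 codes."""
    pairs = Counter()
    for codes in patient_codes.values():
        # Deduplicate and sort so (A, B) and (B, A) count as the same pair.
        for pair in combinations(sorted(set(codes)), 2):
            pairs[pair] += 1
    return pairs


# Hypothetical patients and codes, for illustration only.
patients = {
    "p1": ["I50.9", "I48.9", "I20.9"],
    "p2": ["I50.9", "I48.9"],
    "p3": ["I50.9", "E11.9", "I50.9"],
}
for (a, b), n in cooccurrence_counts(patients).most_common(3):
    print(a, b, n)
```

`Counter.most_common` then ranks the most frequent pairs, which is the kind of listing a comorbidity view could display.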

  22. Comorbidity View

  23. [Image-only slide]

  24. [Image-only slide]

  25. Example record (anonymized manually) 123 H - IVA 322916614D 2007-08-21 9:12 1944 Kvinna Anamnesis Kvinna med hjrtsvikt, förmaksflimmer, angina pectoris. Ensamstående änka. Tidigare CVL med sequelae högersidig hemipares och afasi. Tidigare vårdad för krampanfall misstänkt apoplektisk. Inkommer nu efter att ha blivit hittad på en stol och sannolikt suttit så över natten. Inkommer nu för utredning. Sonen Johan är med.

  26. 23 H - IVA 322916614D 2008-08-21 10:54 1944 Kvinna Bedömning Grav hjärtsvikt efter hjärtinfarkt x 2 inklusive eoisod med asystoli och HLR. EF 20-25%. Neurologisk påverkan med hösidig svaghet. Blodprov. Odlingar tas i blod och urin. Remiss skickas pulm-rtg enl dr Svenssons anteckning. Atelektaser. Pneumoni, I110. Hjärtinsufficiens, ospecificerad, I509

  27. (English translation) 123 H - IVA 322916614D 2008-08-21 9:12 1944 Woman Anamnesis Woman with heart failure, atrial fibrillation and angina pectoris. Widow, living alone. Previous cerebrovascular lesion (CVL) with sequelae: right-sided hemiparesis and aphasia. Previously hospitalized for seizures, suspected to be apoplectic. Now admitted after being found in a chair, where she had probably been sitting overnight. Admitted for further investigation. Accompanied by her son Johan.

  28. 123 H - IVA 322916614D 2008-08-21 10:54 1944 Woman Assessment/Plan Severe heart failure after myocardial infarction x 2, including an episode of asystole and cardiopulmonary resuscitation (CPR). Ejection fraction (EF) 20-25%. Neurological involvement with right-sided weakness. Blood samples. Blood and urine cultures taken. Referral for chest X-ray sent, according to Dr Svensson's note. Atelectases. Pneumonia, I110. Heart failure, unspecified, I509.
