We are developing a Natural Language Processing system that extracts adverse drug effects from medical texts, specifically discharge summaries. Our system aims to bridge the gap between clinical trials and actual drug use, ensuring patient safety.
Extraction of Adverse Drug Effects from Clinical Records Our material is Discharge Summary E. ARAMAKI* Ph.D., Y. MIURA **, M. TONOIKE ** Ph.D., T. OHKUMA ** Ph.D., H. MASHUICHI ** Ph.D., K.WAKI * Ph.D. M.D., K.OHE * Ph.D. M.D., * University of Tokyo, Japan ** Fuji Xerox, Japan
Background • The use of Electronic Health Records (EHR) in hospitals is increasing rapidly everywhere • They contain much clinical information about a patient's health • BUT: many natural language texts! Extracting clinical information from the reports is difficult because they are written in natural language
NLP-based Adverse Effect Detection System • We are developing an NLP system that extracts medical information, especially adverse effects, from the natural language parts • INPUT • a medical text (discharge summary) • OUTPUT • Date Time • Medication Event • Adverse Effect Event (≒ i2b2 Medication Challenge, but our target focuses only on adverse effects) • Adverse Effect Relation (AER)
Why Adverse Effect Relations? • Clinical trials usually target only a single drug. • BUT: real patients sometimes take multiple medications, leading to a gap between clinical trials and the actual use of drugs • To ensure patient safety, it is extremely important to capture new/unknown AEs at an early stage.
A demo is available at http://mednlp.jp
Estimation of Adverse Effect Relations: System Demo
Estimation of Adverse Effect Relations: System Demo Medication has no complications at the time of diagnosis 6/23-25 FOLFOX6 2nd. 6/24, 25: moderate fever (38℃) again. a fever reducer…. Relation Adverse Effect
The Point of This Study • (1) Preliminary Investigation: How much information actually exists? • We annotated adverse effect information in discharge summaries • (2) NLP Challenge: Can current NLP retrieve it? • We investigated the accuracy with which current techniques could extract adverse effect information
Outline • Introduction • Preliminary Investigation • How much information actually exists in discharge summaries? • NLP Challenge • Conclusions
Material & Method • Material: 3,012 Japanese Discharge Summaries • 3 human annotators marked possible adverse effects in the following 2 steps
Step 1: Event Annotation (XML tag = Event)
<D>Lasix</D> for <S>hypertension</S> is stopped due to <S>his headache</S>.
Step 2: Relation Annotation (XML attribute = Relation)
<D rel="1">Lasix</D> for <S>hypertension</S> is stopped due to <S rel="1">his headache</S>.
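The two annotation layers can be read back mechanically. The following is a minimal sketch, assuming each annotated sentence is well-formed XML once wrapped in a root element; the function name `extract_relations` and the wrapper tag `sent` are hypothetical and not part of the original annotation scheme.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Example sentence from the slide, with closing tags written out.
annotated = (
    '<D rel="1">Lasix</D> for <S>hypertension</S> '
    'is stopped due to <S rel="1">his headache</S>.'
)

def extract_relations(sentence):
    """Return (medication, symptom) pairs whose tags share a rel id."""
    # Wrap the fragment in a root element (hypothetical name) so it parses.
    root = ET.fromstring(f"<sent>{sentence}</sent>")
    meds = defaultdict(list)
    symptoms = defaultdict(list)
    for elem in root:
        rel = elem.get("rel")
        if rel is None:
            continue  # event that takes part in no relation
        if elem.tag == "D":
            meds[rel].append(elem.text)
        elif elem.tag == "S":
            symptoms[rel].append(elem.text)
    return [
        (m, s)
        for rel in meds
        for m in meds[rel]
        for s in symptoms.get(rel, [])
    ]

relations = extract_relations(annotated)
print(relations)
```

Here "hypertension" is tagged as an event but carries no `rel` attribute, so only the Lasix/headache pair is returned.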
Annotation Policy & Process • We regard only MedDRA/J (adverse effect terminology) terms as events. • We regarded even a suspicion of an adverse effect as positive data. • Annotating the entire data set is time-consuming → We split the data into 2 sets • SET-A (event-rich part): contains keywords such as Stop, Change, Adverse effect, Side effect; fully annotated • SET-B: the rest; randomly sampled & annotated
SET-A and SET-B combined: 14.5% × 53.5% (SET-A) + 85.5% × 11.3% (SET-B) = 17.4%
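The 17.4% figure is a weighted average over the two sets. A small sketch of the arithmetic follows, assuming 14.5% and 85.5% are the shares of SET-A and SET-B among all summaries and 53.5% and 11.3% are the fractions of each set that contain adverse effect information; the variable names are illustrative.

```python
# Figures from the slide; which number belongs to which set is an assumption.
share_a, rate_a = 0.145, 0.535  # SET-A: keyword-containing summaries
share_b, rate_b = 0.855, 0.113  # SET-B: remaining summaries (sampled)

# Weighted average of the per-set rates.
overall = share_a * rate_a + share_b * rate_b
print(f"{overall:.1%}")  # prints 17.4%
```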
Results of Preliminary Investigation • About 17% of discharge summaries contain adverse effect information. • Even considering that the result includes mere suspicions of adverse effects, the summaries are a valuable resource for AE information. • We can say that discharge summaries are suitable resources for our purpose.
Outline • Introduction • Preliminary Investigation • NLP Challenge • Could the current NLP technique retrieve the AEs? • Conclusions
Combination of 2 NLP Steps • The 2 NLP steps correspond directly to the 2 annotation steps • Example: Lasix for hyperpiesia is stopped due to the pain in the head. • Event Annotation (Medication, symptom, symptom) ≒ Named Entity Recognition task • Relation Annotation (Adverse Effect Relation) = Relation Extraction task, which is one of the hottest NLP research topics.
Step1: Event Identification • Machine Learning Method • CRF (Conditional Random Field) based Named Entity Recognition (state-of-the-art method at the i2b2 de-identification task) • Feature (standard feature set) • Lexicon (Stemming), POS, Dictionary based feature (MedDRA), window size=5 • Material • SET-A Corpus with Event Annotations
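The feature set above can be pictured as a per-token feature dictionary of the kind CRF toolkits typically consume. This is a sketch under several assumptions, not the authors' feature extractor: "window size=5" is read as the current token plus two tokens on each side, lowercasing stands in for stemming, a tiny hand-made set stands in for the MedDRA dictionary, and the POS tags are supplied by hand.

```python
def token_features(tokens, pos_tags, i, lexicon, window=5):
    """Features for token i drawn from a window of `window` tokens."""
    half = window // 2  # assumption: window=5 means offsets -2..+2
    feats = {}
    for offset in range(-half, half + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            word = tokens[j].lower()  # stand-in for stemming
            feats[f"w[{offset}]"] = word
            feats[f"pos[{offset}]"] = pos_tags[j]
            feats[f"dict[{offset}]"] = word in lexicon
    return feats

# Toy inputs (hypothetical): hand-assigned POS tags and a two-word lexicon.
tokens = "Lasix for hypertension is stopped due to his headache".split()
pos_tags = ["NNP", "IN", "NN", "VBZ", "VBN", "JJ", "TO", "PRP$", "NN"]
toy_lexicon = {"hypertension", "headache"}

feats = token_features(tokens, pos_tags, 2, toy_lexicon)
print(feats)
```

One such dictionary per token, paired with a label sequence, is the usual input shape for a CRF sequence labeller.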
Step1: Result of Event Identification • Result Summary
Cat. of Event      Precision (%)  Recall (%)  F-measure
Medication Event   86.99          81.34       0.84
AE Event           85.56          80.24       0.82
• All accuracies (P, R) > 80%, F > 0.80, demonstrating the feasibility of our approach • Considering that the corpus is small (435 summaries), we can say that event detection is an easy task
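The F-measure column can be related to precision and recall as follows. This sketch assumes the balanced F-measure (harmonic mean of precision and recall), which the slide does not state explicitly; the results land close to the reported values, with small differences in the last digit that may come from rounding or truncation.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall from the slide, converted from percent.
f_med = f_measure(0.8699, 0.8134)  # Medication Event
f_ae = f_measure(0.8556, 0.8024)   # AE Event
print(f"Medication: {f_med:.3f}, AE: {f_ae:.3f}")
```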
Step2: Relation Extraction Method • Basic Approach ≒ Protein-Protein Interaction (PPI) task [BioNLP2009 Shared Task] • Example • For each m (Medications) • For each a (Adverse Effects) • judge_it_has_AER (m, a) • Lasix for hypertension is stopped due to his headache • (1) judge_it_has_AER (Lasix, hypertension) • (2) judge_it_has_AER (Lasix, headache)
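The nested loop above amounts to judging every (medication, adverse effect) candidate pair. A minimal sketch follows; `extract_aers` is a hypothetical name, and the `toy_judge` function is only a placeholder for whichever judgment method is plugged in.

```python
from itertools import product

def extract_aers(medications, adverse_effects, judge):
    """Judge every (medication, adverse effect) pair and keep the positives."""
    return [
        (m, a)
        for m, a in product(medications, adverse_effects)
        if judge(m, a)
    ]

# Events from the example sentence on the slide.
medications = ["Lasix"]
adverse_effects = ["hypertension", "headache"]

# Placeholder judge (hypothetical): accepts only the pair marked in the example.
def toy_judge(m, a):
    return (m, a) == ("Lasix", "headache")

aers = extract_aers(medications, adverse_effects, toy_judge)
print(aers)
```

The number of candidate pairs grows as medications × adverse effects, and most pairs are negative, which is one source of class imbalance in this kind of setup.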
Two Judgment Methods • (1) PTN-BASED: heuristic rules using a set of keywords & word distance • Example: ..is on ACTOS but stopped for relief of the edema . • <medication> [n=1] keyword [n=4] <adverse effect> • judge_it_has_AER (m, a, keyword=stopped, windowsize=5) • (2) SVM-BASED: machine learning approach • Feature: distance & words between the two events (medication & adverse effect) • See the proceedings for details
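The PTN-BASED rule can be sketched as below. This is one plausible reading, not the authors' implementation: a pair is accepted when a keyword lies between the two events and the number of intervening words on each side is within the window. The signature, which takes token indices rather than the slide's `(m, a, keyword, windowsize)` arguments, is an assumption.

```python
def judge_it_has_aer(tokens, med_idx, ae_idx, keywords, window=5):
    """True if a keyword sits between the two events, within `window` words of each."""
    lo, hi = sorted((med_idx, ae_idx))
    for k in range(lo + 1, hi):
        if tokens[k].lower() in keywords:
            gap_left = k - lo - 1   # words between first event and keyword
            gap_right = hi - k - 1  # words between keyword and second event
            if gap_left <= window and gap_right <= window:
                return True
    return False

tokens = "is on ACTOS but stopped for relief of the edema .".split()
med_idx = tokens.index("ACTOS")
ae_idx = tokens.index("edema")

# Gaps are 1 ("but") and 4 ("for relief of the"), matching n=1 and n=4.
accepted = judge_it_has_aer(tokens, med_idx, ae_idx, {"stopped"}, window=5)
rejected = judge_it_has_aer(tokens, med_idx, ae_idx, {"stopped"}, window=3)
print(accepted, rejected)
```

A wider window or a larger keyword set accepts more pairs, which is consistent with a rule-based method that favours recall over precision.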
Step2: Result of Relation Extraction
            Precision  Recall  F-measure
PTN-BASED   41.1%      91.7%   0.650
SVM-BASED   57.6%      62.3%   0.598
• Both PTN & SVM accuracies are low (F ≤ 0.65) → the relation extraction task is difficult! • SVM accuracy is significantly (p=0.05) lower than PTN • (1) Corpus size is small (2) positive data << negative data → machine learning suffers from such small, imbalanced data
Outline • Introduction • Preliminary Investigation • NLP Challenge • Discussions • (1) Overall Accuracy • (2) Controllable Performance • (3) Event Distribution • Conclusions
Discussion (1/3) Overall Accuracy • The overall accuracy is estimated by combining the accuracies of step1 & step2: Overall = step1 × step2
Precision  0.289 (= 0.855 × 0.869 × 0.390)
Recall     0.597 (= 0.802 × 0.813 × 0.917)
• Neither NLP step is perfect, so combining their imperfect results leads to low accuracy (especially many false positives, i.e. low precision)
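The combined estimate is a plain product of the per-step figures, which implicitly treats the errors of each step as independent. A short sketch of the arithmetic, using the factors from the slide (variable names are illustrative); the products agree with the reported 0.289 and 0.597 to within one unit in the third decimal place.

```python
from math import prod

# Factors from the slide: AE event, medication event, relation extraction.
precision_factors = [0.855, 0.869, 0.390]
recall_factors = [0.802, 0.813, 0.917]

overall_precision = prod(precision_factors)
overall_recall = prod(recall_factors)
print(f"precision={overall_precision:.4f} recall={overall_recall:.4f}")
```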
Discussion (2/3) Performance is Controllable • The balance between recall & precision can be controlled (high-precision setting vs. high-recall setting) • That is a strong advantage of NLP • Figure: Precision & Recall curve in SVM
Discussion (3/3) Event Distribution • We investigated the overall AE frequency for each medication category. • Figure: AE frequency distribution of Drug #1 (distribution acquired from annotated real data vs. distribution acquired from our system results)
Discussion (3/3) AER Distribution • Then, we applied a goodness-of-fit test, which measures the similarity between two distributions
         P-value
Med. 1   0.023
Med. 2   0.013
Med. 3   0.010
Med. 4   0.006
Med. 5   0.005
Total    0.011
• High p-value (p=0.011 > 0.01) indicates that the two distributions are similar.
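One common way to obtain such a trade-off is to move the decision threshold applied to a classifier's score. The sketch below assumes that mechanism; the slide does not say how the settings were produced, and the scores and labels are hypothetical toy values, not results from the study.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when pairs scoring >= threshold are accepted."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical toy decision scores and gold labels for six candidate pairs.
scores = [1.2, 0.8, 0.3, -0.1, -0.4, -0.9]
labels = [True, True, False, True, False, False]

high_precision = precision_recall_at(scores, labels, threshold=0.5)
high_recall = precision_recall_at(scores, labels, threshold=-0.2)
print(high_precision, high_recall)
```

Raising the threshold trades recall for precision, and lowering it does the opposite.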
Outline • Introduction • Preliminary Investigation • NLP Challenge • Discussions • Conclusions
Conclusions (1/2) • Preliminary Investigation: • About 17% of discharge summaries contain adverse effect information. • We can say that discharge summaries are suitable resources for AERs • NLP Challenge: • Could NLP retrieve the AE information? • Difficult! Overall accuracy is low
Conclusions (2/2) • BUT: 2 positive findings: (1) We can control the performance balance (2) Even though the accuracy is low, the aggregation of the results is similar to the real distribution • IN THE FUTURE: • A practical system using the above advantages • A more accurate method for relation extraction
Thank you Contact Info • Eiji ARAMAKI Ph.D. • University of Tokyo • eiji.aramaki@gmail.com • http://mednlp.jp