Determining the Syntactic Structure of Medical Terms in Clinical Notes

Determining the Syntactic Structure of Medical Terms in Clinical Notes. Bridget T. McInnes¹, Ted Pedersen² and Serguei V. Pakhomov¹. ¹University of Minnesota, ²University of Minnesota Duluth. Syntactic Structure of Terms: Monolithic, Non-branching, Left-branching, Right-branching.


Presentation Transcript


  1. Determining the Syntactic Structure of Medical Terms in Clinical Notes Bridget T. McInnes¹ Ted Pedersen² and Serguei V. Pakhomov¹ University of Minnesota¹ University of Minnesota Duluth²

  2-6. Syntactic Structure of Terms • Monolithic (w1, w2 and w3 all dependent), e.g. difficulty finding words • Non-branching (w1, w2 and w3 all independent), e.g. serum dioxin level • Left-branching ([w1 w2] w3), e.g. urinary tract infection • Right-branching (w1 [w2 w3]), e.g. low back pain • Legend: black = independence, green = dependence

  7. Goal Simple but effective approach to identify the syntactic structure of three-word medical terms

  8. Motivation • Potentially improve the analysis of unrestricted medical text • Unsupervised syntactic parsing • Mapping of medical terms to standardized terminologies

  9. Related Work • Previously • Resnik, 1993 • Resnik and Hirst, 1993 • Pustejovsky, Anick and Bergler, 1993 • Lauer, 1995 • Currently • Lapata and Keller, 2004 • Nakov and Hirst, 2005 • Medical Domain • Nakov and Hirst, 2005

  10. Example small bowel obstruction

  11. Syntactic Structure • small bowel obstruction under each structure: Monolithic, Non-branching, Left-branching ([small bowel] obstruction), Right-branching (small [bowel obstruction])

  12. Method used to determine the structure of a term • The Log Likelihood Ratio is based on the ratio between the observed probability of a term and the probability with which it would be expected to occur

  13-15. Log Likelihood Ratio • The expected probability of a term is often based on the Non-branching (Independence) Model • Observed probability: P(small bowel obstruction) • Expected probability: P(small) P(bowel) P(obstruction)

  16. Extended Log Likelihood Ratio • The expected probability can also be calculated using two other models • Non-branching: P(small) P(bowel) P(obstruction) • Left-branching: P(small bowel) P(obstruction) • Right-branching: P(small) P(bowel obstruction)

  17. Three Log Likelihood Ratio Equations • Non-branching: P(small bowel obstruction) / [P(small) P(bowel) P(obstruction)] • Left-branching: P(small bowel obstruction) / [P(small bowel) P(obstruction)] • Right-branching: P(small bowel obstruction) / [P(small) P(bowel obstruction)]
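To make the three ratios concrete, here is a small Python sketch using made-up corpus counts (hypothetical, not the Mayo Clinic data) and relative-frequency probability estimates. The names `COUNTS`, `prob` and `observed_expected_ratios` are illustrative only. It computes just the pointwise observed/expected ratio for the term itself; the Log Likelihood Ratio used for model fitting also takes the other cells of the contingency table into account.

```python
import math

# Hypothetical counts for illustration only (NOT the Mayo Clinic data).
N = 1_000_000  # total number of n-gram positions in the toy corpus
COUNTS = {
    "small": 2_000,
    "bowel": 3_000,
    "obstruction": 2_500,
    "small bowel": 1_500,
    "bowel obstruction": 1_000,
    "small bowel obstruction": 800,
}

def prob(key, counts=COUNTS, n=N):
    """Relative-frequency (maximum-likelihood) probability estimate."""
    return counts[key] / n

def observed_expected_ratios():
    """Observed probability of the term divided by each model's expected probability."""
    observed = prob("small bowel obstruction")
    expected = {
        "non-branching": prob("small") * prob("bowel") * prob("obstruction"),
        "left-branching": prob("small bowel") * prob("obstruction"),
        "right-branching": prob("small") * prob("bowel obstruction"),
    }
    return {model: observed / e for model, e in expected.items()}

for model, r in observed_expected_ratios().items():
    print(f"{model:16s} ratio = {r:10.2f}  log ratio = {math.log(r):5.2f}")
```

A ratio closer to 1 means the model's expected probability is closer to what was observed. With these made-up counts the left-branching ratio (about 213) is the smallest and the non-branching ratio (about 53,333) the largest.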

  18. Expected Probability • The expected probability of a term differs from model to model, and so does its Log Likelihood Ratio • Non-branching: P(small) P(bowel) P(obstruction) • Left-branching: P(small bowel) P(obstruction) • Right-branching: P(small) P(bowel obstruction) • Resulting Log Likelihood Ratios: 5,169.81, 8,532.90 and 11,635.45

  19. Model Fitting • The model with the lowest Log Likelihood Ratio is the one that best describes the underlying structure of the term

  20. Recap • The Log Likelihood Ratio is calculated for each possible model • Non-branching • Left-branching • Right-branching • The probabilities for each model are calculated using frequency counts from a corpus • The term is assigned the structure whose model has the lowest Log Likelihood Ratio
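The procedure above can be sketched in Python under several assumptions: the term's counts are arranged as a 2 × 2 × 2 contingency table (each word present or absent in its position), the Log Likelihood Ratio is the usual G² statistic, and the expected counts follow the three expected-value formulas (non-branching: n_x++ · n_+y+ · n_++z / n_+++²; left-branching: n_xy+ · n_++z / n_+++; right-branching: n_x++ · n_+yz / n_+++). All function names and counts are hypothetical; this is not the authors' released software (NSP) or their data.

```python
import math
from itertools import product

CELLS = list(product((0, 1), repeat=3))  # (x, y, z); 1 = word present, 0 = absent

def table_from_counts(n111, n11p, n1p1, np11, n1pp, np1p, npp1, nppp):
    """Build the 2x2x2 table t[x][y][z] from corpus counts.

    'p' stands for '+' (summed over that position): n11p counts "w1 w2 *",
    npp1 counts w3 in third position, nppp is the total number of trigrams.
    """
    t = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
    t[1][1][1] = n111
    t[1][1][0] = n11p - n111
    t[1][0][1] = n1p1 - n111
    t[0][1][1] = np11 - n111
    t[1][0][0] = n1pp - n111 - t[1][1][0] - t[1][0][1]
    t[0][1][0] = np1p - n111 - t[1][1][0] - t[0][1][1]
    t[0][0][1] = npp1 - n111 - t[1][0][1] - t[0][1][1]
    t[0][0][0] = nppp - sum(t[x][y][z] for x, y, z in CELLS)
    return t

def expected_counts(t, model):
    """Expected count m[x, y, z] for every cell under one structural model."""
    n = sum(t[x][y][z] for x, y, z in CELLS)
    nx = [sum(t[x][y][z] for y in (0, 1) for z in (0, 1)) for x in (0, 1)]
    ny = [sum(t[x][y][z] for x in (0, 1) for z in (0, 1)) for y in (0, 1)]
    nz = [sum(t[x][y][z] for x in (0, 1) for y in (0, 1)) for z in (0, 1)]
    m = {}
    for x, y, z in CELLS:
        if model == "non-branching":      # w1, w2, w3 independent
            m[x, y, z] = nx[x] * ny[y] * nz[z] / n**2
        elif model == "left-branching":   # [w1 w2] w3
            m[x, y, z] = (t[x][y][0] + t[x][y][1]) * nz[z] / n
        elif model == "right-branching":  # w1 [w2 w3]
            m[x, y, z] = nx[x] * (t[0][y][z] + t[1][y][z]) / n
        else:
            raise ValueError(f"unknown model: {model}")
    return m

def log_likelihood(t, model):
    """G^2 = 2 * sum n * ln(n / m); empty cells contribute 0."""
    m = expected_counts(t, model)
    return 2 * sum(t[x][y][z] * math.log(t[x][y][z] / m[x, y, z])
                   for x, y, z in CELLS if t[x][y][z] > 0)

MODELS = ("non-branching", "left-branching", "right-branching")

def assign_structure(t):
    """Return the model with the lowest G^2, plus every model's score."""
    scores = {model: log_likelihood(t, model) for model in MODELS}
    return min(scores, key=scores.get), scores

# Hypothetical counts for "small bowel obstruction" (NOT the Mayo Clinic data).
table = table_from_counts(n111=800, n11p=1500, n1p1=820, np11=1000,
                          n1pp=2000, np1p=3000, npp1=2500, nppp=1_000_000)
best, scores = assign_structure(table)
for model, score in scores.items():
    print(f"{model:16s} LL = {score:10.2f}")
print("assigned structure:", best)
```

With these made-up counts the left-branching model ([small bowel] obstruction) gets the lowest score. On any given table the non-branching model can never score lower than either branching model, because each branching model contains the independence model as a special case.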

  21. Test Set • 708 three-word terms from SNOMED-CT • Monolithic: 73 terms • Non-branching: 6 terms • Left-branching: 251 terms • Right-branching: 378 terms

  22. Test Set • Syntactic structure determined by two medical text indexers • Kappa = 0.704 • Frequency counts obtained from over 10,000 clinical notes from the Mayo Clinic

  23. Results with Monolithic Terms • Percentage agreement with human experts, by technique: 74.8, 53.4 and 35.5

  24. Results without Monolithic Terms • Percentage agreement with human experts, by technique: 83.5, 59.5 and 39.5
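Percentage agreement is presumably the share of test terms whose assigned structure matches the indexers' label. A minimal sketch under that assumption, with hypothetical labels and a hypothetical `percent_agreement` helper, also shows why dropping monolithic terms can raise the score: a method that never outputs Monolithic counts every monolithic term as a miss.

```python
def percent_agreement(gold, predicted, exclude=None):
    """Percentage of terms whose predicted structure matches the human label.

    Terms whose gold label equals `exclude` are dropped before scoring.
    """
    pairs = [(g, p) for g, p in zip(gold, predicted) if g != exclude]
    if not pairs:
        raise ValueError("no terms left to score")
    return 100.0 * sum(g == p for g, p in pairs) / len(pairs)

# Hypothetical labels for five terms (NOT the SNOMED-CT test set).
gold = ["left", "right", "monolithic", "right", "non"]
pred = ["left", "right", "left", "left", "non"]
print(percent_agreement(gold, pred))                # 60.0 (with monolithic terms)
print(percent_agreement(gold, pred, "monolithic"))  # 75.0 (without)
```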

  25. Limitations • Does not identify Monolithic Terms • Collocation extraction • Dictionary lookup • As the number of words in a term grows, so does the number of models • Limit length of terms to 5 words

  26. Conclusions • Simple but effective method for identifying the syntactic structure of three-word terms • Method uses the Log Likelihood Ratio • Easily extended to four- and five-word terms

  27. Future Work • Improve accuracy • Explore other measures of association • Dice coefficient, phi ... • Incorporate multiple measures • Extend method to four- and five-word terms

  28. Thank you • Software: Ngram Statistics Package (NSP) www.d.umn.edu/~tpederse/nsp.html • Log Likelihood Ratio Models www.cs.umn.edu/~bthomson/mti.html

  29. Log Likelihood Equation
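The equation itself is not preserved in this transcript. Assuming the slide shows the standard log-likelihood ratio (G²) over the 2 × 2 × 2 table of observed counts n_xyz and model expected counts m_xyz, where x, y and z indicate whether each of the three words occurs in its position, it reads as follows (cells with n_xyz = 0 contribute 0):

```latex
G^2 = 2 \sum_{x,y,z} n_{xyz} \ln\!\left(\frac{n_{xyz}}{m_{xyz}}\right)
```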

  30. Expected Values • Non-branching • Left-branching • Right-branching

  31. Non-branching: $m_{xyz} = \frac{n_{x++}\, n_{+y+}\, n_{++z}}{n_{+++}^{2}}$ • Left-branching: $m_{xyz} = \frac{n_{xy+}\, n_{++z}}{n_{+++}}$ • Right-branching: $m_{xyz} = \frac{n_{x++}\, n_{+yz}}{n_{+++}}$
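Each model's expected counts should add up to the total number of trigrams n_+++. Summing over all cells is a quick consistency check, and it shows why the non-branching formula needs n_+++ squared in the denominator (the right-branching case works exactly like the left-branching one):

```latex
\sum_{x,y,z} \frac{n_{x++}\, n_{+y+}\, n_{++z}}{n_{+++}^{2}}
  = \frac{\bigl(\sum_x n_{x++}\bigr)\bigl(\sum_y n_{+y+}\bigr)\bigl(\sum_z n_{++z}\bigr)}{n_{+++}^{2}}
  = \frac{n_{+++}^{3}}{n_{+++}^{2}} = n_{+++},
\qquad
\sum_{x,y,z} \frac{n_{xy+}\, n_{++z}}{n_{+++}}
  = \frac{\bigl(\sum_{x,y} n_{xy+}\bigr)\bigl(\sum_z n_{++z}\bigr)}{n_{+++}}
  = \frac{n_{+++}\, n_{+++}}{n_{+++}} = n_{+++}
```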
