A UMLS-Based System for Literature-Based Discovery in Medicine

A UMLS-Based System for Literature-Based Discovery in Medicine. Matteo Gabetta. MEDINFO, Copenhagen, August 21st 2013. Literature-Based Discovery (LBD): discover unknown relationships among scientific knowledge.

Presentation Transcript


  1. A UMLS-Based System for Literature-Based Discovery in Medicine • Matteo Gabetta • MEDINFO, Copenhagen, August 21st 2013

  2. Literature Based Discovery (LBD) • Discover unknown relationships among scientific knowledge • Swanson DR: “Fish oil, Raynaud’s syndrome, and undiscovered public knowledge”. Perspectives in Biology and Medicine 1986, 30(1):7-18.

  3. Literature Based Discovery • Methods of discovery: OPEN vs. CLOSED • Sources of knowledge: Abstract, Full Text, MeSH, … • Knowledge representation: Concepts, (groups of) words • Knowledge extraction: Text mining techniques • Relationship measurement: Citation frequency, association rules, … • Process automation: User interaction level • Swanson DR: “Fish oil, Raynaud’s syndrome, and undiscovered public knowledge”. Perspectives in Biology and Medicine 1986, 30(1):7-18.

  12. System characteristics • Methods of discovery: OPEN discovery • Sources of knowledge: Abstract • Knowledge representation: UMLS concepts • Knowledge extraction: Text mining techniques • Relationship measurement: Support/Confidence from association rule theory • Process automation: Highly interactive discovery process

  19. System characteristics • Moreover: Co-cited UMLS concepts = related concepts; Semantic Types used for filtering; Literature-Mining Database as a persistence layer • Technologies: Java; Entrez Programming Utilities (eUtils); GWT (Google Web Toolkit)
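
As an illustration of how a Java client might ask eUtils for a date-restricted PubMed literature, the sketch below only builds an ESearch request URL; it does not send it. The class and method names are hypothetical and are not taken from the presented system. The query parameters (`db`, `term`, `datetype`, `mindate`, `maxdate`, `retmax`) are standard ESearch parameters, and the term and dates are borrowed from the validation example later in the deck.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not from the presentation): builds an ESearch URL only.
class ESearchUrlSketch {
    static final String BASE =
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

    // minDate / maxDate use the YYYY/MM form accepted by ESearch.
    static String build(String term, String minDate, String maxDate, int retMax) {
        return BASE
            + "?db=pubmed"
            + "&term=" + URLEncoder.encode(term, StandardCharsets.UTF_8)
            + "&datetype=pdat"          // restrict by publication date
            + "&mindate=" + minDate
            + "&maxdate=" + maxDate
            + "&retmax=" + retMax;      // maximum number of PMIDs returned
    }

    public static void main(String[] args) {
        System.out.println(build("dilated cardiomyopathy", "1982/04", "1999/11", 100));
    }
}
```

Keeping URL construction separate from the network call makes the date window easy to check before any request is issued.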

  21. System Workflow

  22. System Workflow (AB)

  23. System Workflow (BC)

  24. System Workflow (final)
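
The workflow diagrams did not survive in the transcript, so the following is only a minimal sketch of the generic open-discovery (A → B → C) pattern that the AB and BC slide titles suggest, not the author's implementation. It assumes a single in-memory corpus in which each document is the set of concept identifiers found in one abstract; the presented system instead retrieves separate A and B literatures from PubMed. All names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of A -> B -> C open discovery over a toy corpus.
class OpenDiscoverySketch {
    // All concepts that appear in at least one document together with `concept`.
    static Set<String> coOccurring(String concept, List<Set<String>> docs) {
        Set<String> out = new TreeSet<>();
        for (Set<String> doc : docs) {
            if (doc.contains(concept)) {
                out.addAll(doc);
            }
        }
        out.remove(concept);
        return out;
    }

    // C candidates: reachable through some B, but never co-cited with A itself.
    static Set<String> candidates(String a, List<Set<String>> docs) {
        Set<String> bConcepts = coOccurring(a, docs);
        Set<String> cConcepts = new TreeSet<>();
        for (String b : bConcepts) {
            cConcepts.addAll(coOccurring(b, docs));
        }
        cConcepts.remove(a);
        cConcepts.removeAll(bConcepts); // already explicitly linked to A
        return cConcepts;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
            Set.of("A", "B1"), Set.of("B1", "C1"),
            Set.of("A", "B2"), Set.of("A", "B2", "C2"));
        System.out.println(candidates("A", docs));
    }
}
```

Removing the B concepts from the result keeps only concepts with no direct co-citation with A, which is what makes a C concept a candidate "undiscovered" relationship.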

  25. Support & Confidence
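
The body of this slide is missing from the transcript, so the exact formulas used in the system are not known. As a hedged illustration, the sketch below uses the textbook association-rule definitions over a set of N documents: support(A, B) = n(A and B) / N and confidence(A → B) = n(A and B) / n(A), where n(·) counts the documents mentioning the given concepts. Class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: textbook support and confidence over concept-annotated documents.
class AssociationMeasuresSketch {
    // Number of documents that mention every one of the given concepts.
    static long count(List<Set<String>> docs, String... concepts) {
        List<String> needed = Arrays.asList(concepts);
        return docs.stream().filter(d -> d.containsAll(needed)).count();
    }

    // Fraction of all documents in which a and b are co-cited.
    static double support(List<Set<String>> docs, String a, String b) {
        return docs.isEmpty() ? 0.0 : (double) count(docs, a, b) / docs.size();
    }

    // Fraction of the documents citing a that also cite b.
    static double confidence(List<Set<String>> docs, String a, String b) {
        long withA = count(docs, a);
        return withA == 0 ? 0.0 : (double) count(docs, a, b) / withA;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
            Set.of("A", "B"), Set.of("A", "B"), Set.of("A"), Set.of("B"), Set.of("C"));
        System.out.println(support(docs, "A", "B"));
        System.out.println(confidence(docs, "A", "B"));
    }
}
```

Support is symmetric in A and B, while confidence is directional, so the two measures can rank the same candidate concepts differently.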

  27. The INHERITANCE project: Integrated Heart Research In Translational Genetics of Cardiomyopathies in Europe • Dilated cardiomyopathies • 3-year health research project • Funded under the European Commission's Seventh Framework Programme (FP7) • 11 European centers

  28. Validation • “Re-discover” DCM/gene association • Only literature prior to 1st explicit DCM/gene association

  30. Validation: idea • “Re-discover” DCM/gene association • Only literature prior to 1st explicit DCM/gene association • Timeline: DCM (Nov 1975) • Angiology. 1975 Nov;26(10):723-33. The differential diagnosis of congestive cardiomyopathy and ischemic cardiomyopathy by echocardiography. Shors CM, et al.

  31. Validation: idea • “Re-discover” DCM/gene association • Only literature prior to 1st explicit DCM/gene association • Timeline: DCM (Nov 1975), LMNA (Apr 1982) • J Biol Chem. 1982 Apr 25;257(8):4328-32. Oligomeric structure of the major nuclear envelope protein lamin B. Shelton KR, et al.

  32. Validation: idea • “Re-discover” DCM/gene association • Only literature prior to 1st explicit DCM/gene association • Timeline: DCM (Nov 1975), LMNA (Apr 1982), LMNA+DCM (Dec 1999) • N Engl J Med. 1999 Dec 2;341(23):1715-24. Missense mutations in the rod domain of the lamin A/C gene as causes of dilated cardiomyopathy and conduction-system disease. Fatkin D, et al.
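
The validation idea amounts to a date window on the literature: keep only articles published from the first LMNA paper (Apr 1982) up to the month before the first explicit LMNA+DCM paper (Dec 1999). A minimal sketch of such a filter follows; the `Article` type, its fields and the article identifiers are hypothetical, and month granularity is an assumption.

```java
import java.time.YearMonth;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: keep only articles published inside an inclusive month window.
class CutoffFilterSketch {
    static final class Article {
        final String id;              // placeholder identifier, not a real PMID
        final YearMonth published;

        Article(String id, YearMonth published) {
            this.id = id;
            this.published = published;
        }
    }

    static List<Article> within(List<Article> articles, YearMonth from, YearMonth to) {
        return articles.stream()
            .filter(a -> !a.published.isBefore(from) && !a.published.isAfter(to))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Article> all = List.of(
            new Article("a1", YearMonth.of(1975, 11)),
            new Article("a2", YearMonth.of(1990, 6)),
            new Article("a3", YearMonth.of(1999, 12)));
        // Window taken from the timeline: Apr 1982 to Nov 1999.
        for (Article a : within(all, YearMonth.of(1982, 4), YearMonth.of(1999, 11))) {
            System.out.println(a.id);
        }
    }
}
```

Excluding December 1999 is what prevents the paper that first states the association from leaking into the literature used to "re-discover" it.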

  34. Validation: an example • A string: “Dilated cardiomyopathy” • A concept: “Cardiomyopathy, Dilated (C0007193)” • Query dates: Apr 1982 – Nov 1999 • Literature A obtained • B concepts: Semantic Type filter (21 types allowed); Support & Confidence (greater than average)
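
The B-concept selection above combines a Semantic Type filter with a "greater than average" threshold on Support and Confidence. The sketch below is one possible reading, under two labeled assumptions: the averages are taken over the candidates that pass the type filter, and a candidate must exceed both averages. The slide does not list the 21 allowed types, so the type codes and concept identifiers used here are placeholders.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of B-concept selection; thresholds follow the assumptions stated above.
class BConceptFilterSketch {
    static final class Candidate {
        final String cui;            // placeholder concept identifier
        final String semanticType;   // placeholder semantic type code
        final double support;
        final double confidence;

        Candidate(String cui, String semanticType, double support, double confidence) {
            this.cui = cui;
            this.semanticType = semanticType;
            this.support = support;
            this.confidence = confidence;
        }
    }

    static List<Candidate> select(List<Candidate> all, Set<String> allowedTypes) {
        // Step 1: keep only candidates whose semantic type is allowed.
        List<Candidate> typed = all.stream()
            .filter(c -> allowedTypes.contains(c.semanticType))
            .collect(Collectors.toList());
        // Step 2: compute the averages over the type-filtered candidates (assumption).
        double avgSupport = typed.stream().mapToDouble(c -> c.support).average().orElse(0.0);
        double avgConfidence = typed.stream().mapToDouble(c -> c.confidence).average().orElse(0.0);
        // Step 3: keep candidates strictly above both averages (assumption).
        return typed.stream()
            .filter(c -> c.support > avgSupport && c.confidence > avgConfidence)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Candidate> all = List.of(
            new Candidate("B1", "TYPE_X", 0.30, 0.60),
            new Candidate("B2", "TYPE_X", 0.10, 0.20),
            new Candidate("B3", "TYPE_Y", 0.20, 0.10),
            new Candidate("B4", "TYPE_Z", 0.90, 0.90));
        for (Candidate c : select(all, Set.of("TYPE_X", "TYPE_Y"))) {
            System.out.println(c.cui);
        }
    }
}
```

A mean-based threshold adapts to each literature, but it is sensitive to a few very frequent concepts, which may relate to the "threshold problem" raised in the discussion slide.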

  38. Validation: an example • Query dates: Apr 1982 – Nov 1999 • Literature B obtained • C concepts: One Semantic Type: “Gene or Genome – T028”

  40. Validation: an example • Query dates: Apr 1982 – Nov 1999 • Literature B obtained • C concepts: One Semantic Type: “Gene or Genome – T028” • Is LMNA among the C concepts? • Evaluation of Support and Score
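
The final check, whether LMNA appears among the C concepts and how highly it ranks, can be sketched as a rank lookup over any numeric measure. The heuristic Score is not defined in the transcript, so the code takes a generic map from concept to value; the class name, the other gene labels and all numbers are invented toy data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: 1-based rank of a target concept when C concepts are sorted by a measure.
class CConceptRankSketch {
    // Returns -1 when the target is not among the C concepts.
    static int rankOf(String target, Map<String, Double> valueByConcept) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(valueByConcept.entrySet());
        entries.sort((x, y) -> {
            int byValue = Double.compare(y.getValue(), x.getValue()); // descending value
            return byValue != 0 ? byValue : x.getKey().compareTo(y.getKey()); // ties by name
        });
        for (int i = 0; i < entries.size(); i++) {
            if (entries.get(i).getKey().equals(target)) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Toy values only; they are not results from the presentation.
        Map<String, Double> toy = Map.of("GENE_X", 0.5, "LMNA", 0.3, "GENE_Y", 0.1);
        System.out.println(rankOf("LMNA", toy));
    }
}
```

The same lookup can be run once with Support and once with the Score, so that the two rankings of the target gene can be compared side by side.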

  41. Validation: results

  48. Discussion and Future Developments • Effective in ranking DCM related genes • Heuristic score good alternative to Support • Limitation: fails for C concepts with small literature • Analyze in depth the “threshold problem” • Practical comparison with other systems • Improve effectiveness of Text Mining system

  49. Discussion and Future Developments • Effective in ranking DCM related genes • Heuristic score good alternative to Support • Limitation: fails for C concepts with small literature • Overcome the empirical set-up of some parameters • Practical comparison with other systems • Improve effectiveness of Text Mining system

  50. In loving memory of Gilles Belley • Thank you.
