A UMLS-Based System for Literature-Based Discovery in Medicine. Matteo Gabetta. MEDINFO Copenhagen, August 21st 2013. Literature Based Discovery (LBD). Discover unknown relationships among scientific knowledge.
A UMLS-Based System for Literature-Based Discovery in Medicine
Matteo Gabetta
MEDINFO Copenhagen, August 21st 2013
Literature Based Discovery (LBD)
Discover unknown relationships among scientific knowledge
Swanson DR: “Fish oil, Raynaud’s syndrome, and undiscovered public knowledge”. Perspectives in Biology and Medicine 1986, 30(1):7-18.
Literature Based Discovery
• Methods of discovery: OPEN vs. CLOSED
• Sources of knowledge: abstract, full text, MeSH, …
• Knowledge representation: concepts, (groups of) words
• Knowledge extraction: text mining techniques
• Relationship measurement: citation frequency, association rules…
• Process automation: user interaction level
System characteristics
• Methods of discovery: OPEN discovery
• Sources of knowledge: abstract
• Knowledge representation: UMLS concepts
• Knowledge extraction: text mining techniques
• Relationship measurement: Support/Confidence from association rule theory
• Process automation: highly interactive discovery process
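
As an illustration of the relationship measurement listed above, here is a minimal Java sketch of support and confidence computed from co-citation sets. It assumes the textbook association-rule definitions, with each abstract treated as one transaction; the slides do not give the exact formulas used by the system. The class name `AssociationMeasures` and its method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Association-rule measures over co-citation sets (illustrative sketch, not the system's code). */
final class AssociationMeasures {

    /** Number of documents that cite both concepts. */
    static int coCitations(Set<String> docsA, Set<String> docsB) {
        Set<String> both = new HashSet<>(docsA);
        both.retainAll(docsB); // set intersection
        return both.size();
    }

    /** support(A, B) = (documents citing both A and B) / (all documents considered). */
    static double support(Set<String> docsA, Set<String> docsB, int totalDocs) {
        if (totalDocs <= 0) {
            throw new IllegalArgumentException("totalDocs must be positive");
        }
        return (double) coCitations(docsA, docsB) / totalDocs;
    }

    /** confidence(A -> B) = (documents citing both A and B) / (documents citing A); 0 if A is never cited. */
    static double confidence(Set<String> docsA, Set<String> docsB) {
        if (docsA.isEmpty()) {
            return 0.0;
        }
        return (double) coCitations(docsA, docsB) / docsA.size();
    }
}
```

Each set holds the identifiers (for example PMIDs) of the abstracts in which a concept was found. Confidence is directional, so `confidence(docsA, docsB)` and `confidence(docsB, docsA)` generally differ.
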
System characteristics
• Moreover:
  • Co-cited UMLS concepts = related concepts
  • Semantic Types used for filtering
  • Literature-Mining Database as a persistence layer
• Technologies:
  • Java
  • Entrez Programming Utilities (eUtils)
  • GWT (Google Web Toolkit)
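
To give an idea of how literature might be retrieved through eUtils, the sketch below builds a date-restricted PubMed `esearch` URL. The endpoint and the `db`, `term`, `datetype`, `mindate`, `maxdate` and `retmax` parameters belong to the public eUtils interface; the class `PubMedQuery`, its method, and the choice of publication date (`pdat`) as the date field are assumptions, since the slides only state that eUtils is used. The code only builds the URL and performs no network call.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Builds an eUtils esearch URL for PubMed (illustrative sketch, not the system's code). */
final class PubMedQuery {

    static final String ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

    /**
     * @param term    free-text query
     * @param minDate lower bound of the date window, as YYYY, YYYY/MM or YYYY/MM/DD
     * @param maxDate upper bound of the date window, same formats
     * @param retMax  maximum number of PMIDs to return
     */
    static String esearchUrl(String term, String minDate, String maxDate, int retMax) {
        return ESEARCH
                + "?db=pubmed"
                + "&term=" + encode(term)
                + "&datetype=pdat" // assumption: restrict on publication date
                + "&mindate=" + encode(minDate)
                + "&maxdate=" + encode(maxDate)
                + "&retmax=" + retMax;
    }

    /** URL-encodes one query parameter value as UTF-8. */
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

The returned PMIDs would then be passed to the text-mining step that maps abstracts to UMLS concepts, which is outside the scope of this sketch.
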
The INHERITANCE project
Integrated Heart Research In Translational Genetics of Cardiomyopathies in Europe
• Dilated cardiomyopathies
• 3-year health research project
• Funded under the European Commission's 7th Framework Programme (FP7)
• 11 European centers
Validation: idea
“Re-discover” DCM/gene association
• Only literature prior to 1st explicit DCM/gene association
Timeline:
• Nov 1975, DCM: Angiology. 1975 Nov;26(10):723-33. The differential diagnosis of congestive cardiomyopathy and ischemic cardiomyopathy by echocardiography. Shors CM, et al.
• Apr 1982, LMNA: J Biol Chem. 1982 Apr 25;257(8):4328-32. Oligomeric structure of the major nuclear envelope protein lamin B. Shelton KR, et al.
• Dec 1999, LMNA+DCM: N Engl J Med. 1999 Dec 2;341(23):1715-24. Missense mutations in the rod domain of the lamin A/C gene as causes of dilated cardiomyopathy and conduction-system disease. Fatkin D, et al.
Validation: an example
• A string: “Dilated cardiomyopathy”
• A concept: “Cardiomyopathy, Dilated – (C0007193)”
• Query dates: Apr 1982 – Nov 1999
• Literature A obtained
• B concepts:
  • Semantic Type filter (21 types allowed)
  • Support & Confidence (greater than average)
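
The B-concept selection above can be sketched as a two-stage filter: first keep candidates whose Semantic Type is allowed, then keep those whose support and confidence both exceed the average. Several details are assumptions not stated on the slide: the averages are taken over the candidates that pass the type filter, both thresholds must hold at once, and each candidate carries a single Semantic Type identifier (a real UMLS concept can have more than one). `BConceptSelector` and `Candidate` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Selects B concepts by Semantic Type and above-average measures (illustrative sketch). */
final class BConceptSelector {

    /** A candidate concept with its precomputed association measures. */
    static final class Candidate {
        final String cui;        // UMLS concept identifier
        final String tui;        // Semantic Type identifier (simplified to one per concept)
        final double support;
        final double confidence;

        Candidate(String cui, String tui, double support, double confidence) {
            this.cui = cui;
            this.tui = tui;
            this.support = support;
            this.confidence = confidence;
        }
    }

    static List<Candidate> select(List<Candidate> candidates, Set<String> allowedTuis) {
        // Stage 1: Semantic Type filter.
        List<Candidate> typed = new ArrayList<>();
        for (Candidate c : candidates) {
            if (allowedTuis.contains(c.tui)) {
                typed.add(c);
            }
        }
        // Stage 2: thresholds set to the mean support and mean confidence (assumption).
        double sumSupport = 0.0;
        double sumConfidence = 0.0;
        for (Candidate c : typed) {
            sumSupport += c.support;
            sumConfidence += c.confidence;
        }
        double meanSupport = typed.isEmpty() ? 0.0 : sumSupport / typed.size();
        double meanConfidence = typed.isEmpty() ? 0.0 : sumConfidence / typed.size();

        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : typed) {
            if (c.support > meanSupport && c.confidence > meanConfidence) {
                kept.add(c);
            }
        }
        return kept;
    }
}
```

With a strict "greater than average" rule, a candidate list whose measures are all equal yields no B concepts, which is one way such an empirical threshold can misbehave.
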
Validation: an example
• Query dates: Apr 1982 – Nov 1999
• Literature B obtained
• C concepts:
  • One Semantic Type: “Gene or Genome – T028”
• Is LMNA among the C concepts?
• Evaluation of Support and Score
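
The A → B → C chain used in this example can be sketched as follows. This follows the classic open-discovery pattern: B concepts are those co-cited with A, and C candidates are concepts co-cited with at least one B but never with A. The ranking shown here, the number of distinct B concepts linking A to C, is only a stand-in; the slides do not define the heuristic Score. The concept-to-documents map is a hypothetical simplification of the literature-mining database, and the Semantic Type and threshold filters are omitted for brevity.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Open discovery over a concept-to-documents index (illustrative sketch, not the system's code). */
final class OpenDiscovery {

    /** True when the two concepts are cited together in at least one document. */
    static boolean coCited(Set<String> x, Set<String> y) {
        return !Collections.disjoint(x, y);
    }

    /** Maps each candidate C concept to the number of B concepts that link it to A. */
    static Map<String, Integer> linkCounts(String a, Map<String, Set<String>> index) {
        Set<String> docsA = index.getOrDefault(a, Collections.<String>emptySet());
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<String>> b : index.entrySet()) {
            if (b.getKey().equals(a) || !coCited(docsA, b.getValue())) {
                continue; // not a B concept: never cited together with A
            }
            for (Map.Entry<String, Set<String>> c : index.entrySet()) {
                String name = c.getKey();
                if (name.equals(a) || name.equals(b.getKey())) {
                    continue;
                }
                if (coCited(docsA, c.getValue())) {
                    continue; // already directly linked to A, so not a new hypothesis
                }
                if (coCited(b.getValue(), c.getValue())) {
                    counts.merge(name, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

In the validation setting, the question "Is LMNA among the C concepts?" amounts to checking whether the gene's concept appears as a key in the returned map when the index is built only from literature in the chosen date window.
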
Discussion and Future Developments
• Effective in ranking DCM-related genes
• Heuristic score is a good alternative to Support
• Limitation: fails for C concepts with small literature
• Analyze in depth the “threshold problem”
• Overcome the empirical set-up of some parameters
• Practical comparison with other systems
• Improve effectiveness of the Text Mining system
In loving memory of Gilles Belley
Thank you.