This paper explores the use of the Unified Medical Language System (UMLS) to develop an ontology for entomology, specifically focusing on mapping and matching terms between the Torre-Bueno Glossary of Entomology and the UMLS Metathesaurus. The study demonstrates the potential of using existing biomedical ontologies, such as UMLS, to seed new domain-specific ontologies.
Finding Bugs in People: Developing an Entomology Ontology from the UMLS • Indra Neil Sarkar, PhD • Lewis B. & Dorothy Cullman Bioinformatics Associate • Division of Invertebrate Zoology • American Museum of Natural History • NKOS Workshop • 10 June 2005
Phenotypes • Structural Data • Sequence Data • Morphology • Total Evidence • Tree of Life
Statements of Homology • Sequence Data • Multiple Sequence Alignments • CLUSTAL, T-COFFEE, MUSCLE • Non-sequence Data • Ontologies
Ontologies • Color • Red • White • Blue • “White” “Blanc” “Weiss”
Ogden-Richards Semiotic Triangle • Thought/Reference • Symbols: “White” “Blanc” “Weiss” • Referent
Ontology Development • Protégé • http://protege.stanford.edu • “Frame-based”
Ontologies in Phylogenetics • Forelimb • Foreleg • Wing • Arm • “Wing” “Aile” “Flügel”
Ontologies in Phylogenetics • Forelimb • Foreleg (1) • Wing (2) • Arm (3) • [Character matrix figure: taxa CAT, BAT, BIRD; characters Forelimb, [Gene 1], [Gene 2], …; cell values as extracted, layout not recoverable: 1 1 1 1 3 2]
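The coding idea on this slide can be sketched as follows: labels in different languages resolve to one ontology concept, and each concept carries a numeric character state. The state codes (Foreleg 1, Wing 2, Arm 3) come from the slide; the `SYNONYMS` table, the function name, and the per-taxon observations are illustrative assumptions, not the author's data.

```python
# Minimal sketch: resolve free-text labels to ontology concepts, then to
# numeric character states for a phylogenetic matrix.

# Assumed synonym table (lowercased label -> concept under "Forelimb").
SYNONYMS = {
    "wing": "Wing",
    "aile": "Wing",      # French
    "flügel": "Wing",    # German
    "foreleg": "Foreleg",
    "arm": "Arm",
}

# State codes as shown on the slide.
STATE_CODES = {"Foreleg": 1, "Wing": 2, "Arm": 3}


def code_forelimb(label: str) -> int:
    """Map an observed label to its character state code."""
    concept = SYNONYMS[label.strip().lower()]
    return STATE_CODES[concept]


# Hypothetical observations recorded in different languages.
observations = {"CAT": "foreleg", "BAT": "Flügel", "BIRD": "Aile"}
matrix = {taxon: code_forelimb(label) for taxon, label in observations.items()}
```

Because the synonyms collapse to one concept before coding, "Flügel" and "Aile" receive the same state, which is what makes the resulting column comparable across sources.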
Ontologies in Phylogenetics • Genetic Information • 99% of Earth’s biota are extinct! • Morphological Information • Fossil record • Morphological studies from extant organisms
Ontologies in Phylogenetics • Ontology Development • Web Ontology Language (OWL) • Structured Descriptive Data (SDD) • Can be exported to NEXUS, DELTA, Lucid • Ontology Acquisition and Markup • Archival Resources • Natural Language Processing
Unified Medical Language System (UMLS) • Metathesaurus • One Million Concepts • 100+ Biomedical Terminologies/Ontologies • Semantic Network • 135 Semantic Types • 15 Coarse Semantic Groups • SPECIALIST Lexicon • English + Biomedical Words
Torre-Bueno Glossary of Entomology (TBGE) • Common Entomology Phrases • 300 Primary Sources • 15,010 Terms/Phrases
TBGE to UMLS • Question 1: Is Entomology Language Different than Biomedical Language? • TBGE to SPECIALIST • Question 2: Can UMLS Be Used to Seed an Ontology for Entomology? • TBGE to UMLS Metathesaurus • Organize Results According to Semantic Network
Q1: Is Entomology a Unique Language? • “Look-up” Individual Word Atoms in SPECIALIST • Complete Look-up • 48% Coverage • Partial Look-up • 66% Coverage • Not found • 34% Not covered
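A rough sketch of how the look-up categories above could be computed. It assumes that "complete" means every word atom of a term is in the lexicon, "partial" means at least one but not all are, and "not found" means none are; the slide does not define these precisely. The tiny `lexicon` and `terms` are placeholders, not SPECIALIST or TBGE data.

```python
import re


def word_atoms(term: str) -> list[str]:
    """Split a term into lowercase word atoms."""
    return re.findall(r"[a-z0-9]+", term.lower())


def classify(term: str, lexicon: set[str]) -> str:
    """Classify a term by how many of its word atoms appear in the lexicon."""
    hits = [atom in lexicon for atom in word_atoms(term)]
    if hits and all(hits):
        return "complete"
    if any(hits):
        return "partial"
    return "not_found"


def coverage(terms: list[str], lexicon: set[str]) -> dict[str, float]:
    """Return the fraction of terms falling in each look-up category."""
    counts = {"complete": 0, "partial": 0, "not_found": 0}
    for term in terms:
        counts[classify(term, lexicon)] += 1
    return {name: count / len(terms) for name, count in counts.items()}


# Placeholder data for illustration only.
lexicon = {"wing", "hind", "eye", "compound"}
terms = ["hind wing", "compound eye", "wing venation", "elytron"]
result = coverage(terms, lexicon)
```

Under this reading, the 66% partial figure on the slide would correspond to the complete and partial fractions added together, since it and the 34% not covered sum to 100%.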
Q2: Can UMLS Be Used to Seed Entomology Ontology? • Three-Tiered Mapping Approach • Tier 1: Direct Mapping • Exact & Normalized String Matching • Tier 2: Direct Mapping after Demodification • Remove nominal and adjectival modifiers • Exact & Normalized String Matching • Tier 3: Approximate Matching • MetaMap Application
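The three tiers can be sketched as a fall-through pipeline. Everything here is a stand-in: the concept index and its IDs are hypothetical placeholders (not real Metathesaurus entries), normalization is reduced to lowercasing and word sorting, demodification simply drops leading words instead of identifying modifiers linguistically, and `difflib` substitutes for MetaMap, which is a separate application.

```python
import difflib
import re

# Hypothetical toy index: concept string -> placeholder concept ID.
CONCEPTS = {
    "wing": "X001",
    "antenna": "X002",
    "compound eye": "X003",
    "abdomen": "X004",
}


def normalize(text: str) -> str:
    """Crude normalization: lowercase, strip punctuation, sort words."""
    return " ".join(sorted(re.findall(r"[a-z0-9]+", text.lower())))


NORM_INDEX = {normalize(name): cid for name, cid in CONCEPTS.items()}


def direct_match(term: str):
    """Exact string match first, then normalized string match."""
    if term in CONCEPTS:
        return CONCEPTS[term]
    return NORM_INDEX.get(normalize(term))


def demodified(term: str):
    """Yield shorter candidates by dropping leading words (assumed modifiers)."""
    words = term.split()
    for i in range(1, len(words)):
        yield " ".join(words[i:])


def map_term(term: str):
    """Return (tier, concept_id), or (None, None) if no tier matches."""
    cid = direct_match(term)                      # Tier 1
    if cid:
        return 1, cid
    for candidate in demodified(term):            # Tier 2
        cid = direct_match(candidate)
        if cid:
            return 2, cid
    close = difflib.get_close_matches(            # Tier 3 (MetaMap stand-in)
        normalize(term), list(NORM_INDEX), n=1, cutoff=0.8
    )
    if close:
        return 3, NORM_INDEX[close[0]]
    return None, None
```

Recording the tier alongside the concept keeps later analysis honest: Tier 1 hits can be trusted more than approximate Tier 3 hits, which would need expert review.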
Q2: Can UMLS Be Used to Seed Entomology Ontology? • [Results figure; labels not recoverable. Values as extracted: 20 20 86 86 37 78 49 74 23 61 41 71]
Q2: Can UMLS Be Used to Seed Entomology Ontology? • Source Terminologies • [Figure not recoverable]
TBGE-UMLS Implications • UMLS Semantic Network is a good Seed Ontology for Biological Domain Ontologies • Best Term-Concept Mappings into Anatomy
In Summary… • Ontologies are Needed for Phylogenetics • Existing Biomedical Ontologies Are Useful for New Domain Ontologies (especially UMLS) • Top-Down Strategy using UMLS is Tractable
End Goal • Phenotypes • Structural Data • Sequence Data • Morphology • SDD • OWL
Next Steps • Represent Seed Entomology Ontology in OWL • Link OWL Representation to SDD for Use in Taxonomic Descriptions • Involve Team of Experts for Validation • Go Beyond Morphology: Location, Biodiversity Data, etc.
Acknowledgements • Tom Moritz • Rob DeSalle • Mark Siddall • David Figurski • Susan Perkins • Paul Planet • Gloria Coruzzi • Olivier Bodenreider • Carol Friedman • Jim Cimino • Bob Morris • Mark Musen • National Institutes of Health • National Science Foundation • American Museum of Natural History
Thank you! • Indra Neil Sarkar, Cullman Bioinformatics Associate • American Museum of Natural History • sarkar@amnh.org • http://www.GenomeCurator.org/people/sarkar