László van den Hoek, Barend Mons, Erik van Mulligen (Erasmus MC Rotterdam)
Elena Beißwanger, Stefan Schulz, Holger Stenzhorn (Freiburg/Jena/IFOMIS)
Formalizing heritage? Bridging UMLS and BioTop for text mining
Example UMLS SN BioTop
Improving the state of the art • State-of-the-art NLP offers 85% precision/recall • When a term has multiple possible interpretations, help eliminate options • Improve on the state of the art by incorporating additional information sources • Factual links from GO • Factual / “Factual” links from SwissProt • Our goal: see whether the UMLS SN can help in a similar manner
WikiProteins • Can accommodate multiple datasets • Stores concepts and the expressions describing them
External information Allows concepts to be annotated
Wiki Filtering OK “Fern is in kingdom plant”
Wiki Filtering Error “Fern is in kingdom bicycle”
Wiki Filtering “Fern is in kingdom plant” Did you mean… (manufacturing plant)
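The filtering behaviour shown on the three Wiki Filtering slides can be sketched in Python as follows. This is an illustration only: the sense inventory `SENSES`, the constraint table `ALLOWED_OBJECT_TYPES`, and the function `check_statement` are hypothetical and are not part of Wikifier or WikiProteins.

```python
# Hypothetical sense inventory: each term maps to candidate senses,
# each sense carrying one semantic type (labels are illustrative).
SENSES = {
    "plant": [("plant (organism)", "Plant"),
              ("manufacturing plant", "Manufactured Object")],
    "bicycle": [("bicycle", "Manufactured Object")],
}

# Assumed constraint: the object of "is in kingdom" must have one of these types.
ALLOWED_OBJECT_TYPES = {"is in kingdom": {"Plant"}}


def check_statement(relation, obj):
    """Split the senses of `obj` into those accepted and rejected for `relation`."""
    allowed = ALLOWED_OBJECT_TYPES[relation]
    accepted = [sense for sense, st in SENSES[obj] if st in allowed]
    rejected = [sense for sense, st in SENSES[obj] if st not in allowed]
    status = "OK" if accepted else "Error"
    return status, accepted, rejected
```

For "plant" the organism sense is accepted and the manufacturing sense is the one a "Did you mean…" prompt would flag; for "bicycle" no sense fits, so the statement is reported as an error.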
Background • UMLS Semantic Network • BioTop/OWL-DL
Background: UMLS Semantic Network (SN) • Classification scheme • Interface terminology for UMLS • Based on frame-based logic • Classes: Semantic Types (ST) • Relations are defined between ST’s
Issues with the UMLS ST’s • Some ST’s are defined ambiguously or vaguely • Some arbitrary divisions are present • Categories have relatively low granularity • As is intended, to maintain usability
Why UMLS? • Often used for named entity recognition • Peregrine, MetaMap, etc. • Widely used in practice • 1.2M UMLS concepts are tagged with one or more of the 135 ST’s • Coding system, e.g. for electronic health records • Adding ontological rigor may extend its applicability
Background: BioTop • Mid-to-upper-level domain ontology • Rooted at BFO, expands into biomedicine • Intended umbrella for OBO Foundry • Written in OWL-DL • Formal rigor from BFO • Unambiguous • Allows reasoning/consistency checking • Still undergoing development
Top: Basic Formal Ontology • Middle downwards: BioTop (or DOLCE)
Prototype procedure • Map UMLS ST’s onto BioTop • Translate relationships to properties • Answer questions with reasoner • What ST’s are related to “plant” (fern)? • (How) are “plant” and “manufactured object” (bicycle) related? • Evaluate disambiguation using ontology
Mapping • The mapping file contains owl:imports for both the UMLS ST tree and BioTop [Diagram: the Mapping file imports BioTop and UMLS ST]
Mapping • Within the mapping file, equivalent classes are defined by owl:equivalentClass • umls:plant ≡ biotop:plant, biotop:plant ≡ umls:plant
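A minimal sketch of how such equivalences could be kept as data and written out as Turtle-style statements. Only the plant pair comes from the slide; the list `EQUIVALENTS` and both function names are assumptions made for illustration, and the prefixes `umls:`, `biotop:` and `owl:` are taken to be declared elsewhere.

```python
# Illustrative ST-to-BioTop pairs; only the plant pair appears on the slide.
EQUIVALENTS = [("umls:plant", "biotop:plant")]


def equivalence_axioms(pairs):
    """Return one Turtle-style owl:equivalentClass statement per mapped pair."""
    return [f"{st} owl:equivalentClass {bt} ." for st, bt in pairs]


def are_equivalent(a, b, pairs):
    """owl:equivalentClass is symmetric, so accept the pair in either order."""
    return (a, b) in pairs or (b, a) in pairs
```

Because the property is symmetric, stating the equivalence once is enough for a reasoner to conclude it in both directions.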
Mapping • If no equivalent class can be found, confer with the BioTop authors; either… • umls:machine activity ≡ ?
Mapping • …an appropriate class is added to the BioTop core and the equivalence is stated, or… • umls:machine activity ≡ biotop:MachineAction
Mapping • …a helper class is added to the mapping itself • umls:Chlamydia or Rickettsia ≡ mapping:ChlamydialCell ∪ mapping:Rickettsialcell • Each helper class is a subclass of BioTop:BacterialCell
Considerations • Equal name doesn’t mean equivalence • Different name doesn’t mean difference • Some things can’t be translated into classes • A logically sound ontology may still contain real-world contradictions
Mapping results • Initially: • 10 ST’s match directly with a BioTop class • 14 classes were “close enough” • After iterative revisions of BioTop: • 3 ST’s defined as conjunctions • Many classes added to core BioTop • Some are straight matches, others are not • ~70 ST’s remain unmapped • Mainly “Event” and “Phenomenon or Process” trees
Many things can be defined by their role (*Role): • ArtefactRole • DiagnosticRole • FindingRole • FoodRole • OccupationalRole • PoisonRole • ResearchRole • SignallingRole • SignOrSymptomRole • TherapeuticRole • DrugRole • VitaminRole
Mapping SN relationships • Some reinterpretation is necessary due to underspecification • Implicit semantics, domain expert required • What does the presence of a link mean? • Some/some, some/all, all/some, all/only, all/each • Naïve approach: add relationships as properties of classes
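To make the difference between these readings concrete, the sketch below checks two of them, all/some and all/only, against a toy set of instance-level links. In OWL these two readings correspond to owl:someValuesFrom and owl:allValuesFrom restrictions. The instance names and the `links` data are invented for illustration.

```python
def all_some(domain, range_, links):
    """all/some: every domain instance is linked to at least one range instance."""
    return all(any((x, y) in links for y in range_) for x in domain)


def all_only(domain, range_, links):
    """all/only: every link that starts at a domain instance ends in the range."""
    return all(y in range_ for x, y in links if x in domain)


# Toy data: two abnormalities, one organism, one physiologic function f1.
abnormalities = {"a1", "a2"}
organisms = {"o1"}
links = {("a1", "o1"), ("a2", "o1"), ("a1", "f1")}
```

With this data the all/some reading of "Anatomical Abnormality affects Organism" holds, while the all/only reading fails because a1 also affects f1. The same SN link therefore yields different axioms depending on the interpretation chosen.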
Biologic Function | affects | Organism
Cell Component | affects | Physiologic Function
… …
Anatomical Abnormality | affects | Organism
Anatomical Abnormality | affects | Physiologic Function

<owl:ObjectProperty rdf:ID="affects_anatomical_abnormality">
  <rdfs:domain rdf:resource="&umls;#Anatomical_Abnormality"/>
  <rdfs:range>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <rdf:Description rdf:about="&umls;#Organism"/>
        <rdf:Description rdf:about="&umls;#Physiologic_Function"/>
      </owl:unionOf>
    </owl:Class>
  </rdfs:range>
  <rdfs:subPropertyOf>
    <owl:ObjectProperty rdf:ID="affects"/>
  </rdfs:subPropertyOf>
</owl:ObjectProperty>
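The grouping step behind this naïve approach, collecting all range classes of one relation per domain class so they can be placed in a single owl:unionOf, might look like the following Python sketch. The triples are the ones listed on the slide; the function names and the naming rule in `property_name` are assumptions inferred from the identifier `affects_anatomical_abnormality`.

```python
from collections import defaultdict

# SN relationships as (domain, relation, range), taken from the slide.
SN_TRIPLES = [
    ("Biologic_Function", "affects", "Organism"),
    ("Cell_Component", "affects", "Physiologic_Function"),
    ("Anatomical_Abnormality", "affects", "Organism"),
    ("Anatomical_Abnormality", "affects", "Physiologic_Function"),
]


def group_ranges(triples):
    """Map (relation, domain) to the sorted range classes that would be unioned."""
    grouped = defaultdict(set)
    for domain, relation, range_ in triples:
        grouped[(relation, domain)].add(range_)
    return {key: sorted(ranges) for key, ranges in grouped.items()}


def property_name(relation, domain):
    """Name the sub-property after its relation and its domain class."""
    return f"{relation}_{domain.lower()}"
```

Each key of the result becomes one sub-property of the general relation, and each value becomes the member list of that property's union range.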
SPARQL woes

<owl:ObjectProperty rdf:ID="affects_anatomical_abnormality">
  <rdfs:domain rdf:resource="&umls;#Anatomical_Abnormality"/>
  <rdfs:range>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <rdf:Description rdf:about="&umls;#Organism"/>
        <rdf:Description rdf:about="&umls;#Physiologic_Function"/>
      </owl:unionOf>
    </owl:Class>
  </rdfs:range>
  <rdfs:subPropertyOf>
    <owl:ObjectProperty rdf:ID="affects"/>
  </rdfs:subPropertyOf>
</owl:ObjectProperty>
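SPARQL query

# Prefix declarations for rdfs: and umls: are omitted.
SELECT DISTINCT ?d ?p ?r
WHERE {
  ?p rdfs:domain ?d .          # ?d: domain class of property ?p
  ?rs rdfs:subClassOf ?r .     # ?rs: a class subsumed by the range ?r
  ?p rdfs:range ?r
  FILTER ( ?rs = umls:Organism )
}
ORDER BY ?d

Or: what classes are in the domain of a property that also has Organism in its range?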
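The question stated in plain words on this slide, which classes are in the domain of a property that has Organism in its range, can be sketched over a toy in-memory model in Python. The dictionaries `PROPERTIES` and `SUBCLASS_OF` and both function names are hypothetical; they stand in for what a reasoner would derive from the OWL file.

```python
# Toy model: property name -> (domain class, set of classes in the union range).
PROPERTIES = {
    "affects_anatomical_abnormality": (
        "Anatomical_Abnormality", {"Organism", "Physiologic_Function"}),
    "affects_cell_component": (
        "Cell_Component", {"Physiologic_Function"}),
}

# Toy subclass links, child -> parent.
SUBCLASS_OF = {"Plant": "Organism"}


def ancestors(cls):
    """Return `cls` together with all of its superclasses."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return set(chain)


def domains_with_range(target):
    """List (domain, property) pairs whose range covers `target` or a superclass."""
    return sorted((domain, prop)
                  for prop, (domain, ranges) in PROPERTIES.items()
                  if ranges & ancestors(target))
```

The membership test against the union's members is exactly the inference that the query needs the reasoner to perform when the range is an anonymous owl:unionOf class.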
Current status • Can’t follow the owl:unionOf approach because of current reasoner limitations • Stop-gap solution: make one property for each SN relationship • Avoids owl:unionOf • Allows us to get actual results • Not in the spirit of OWL • Makes maintenance more difficult
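Under one possible reading of the stop-gap, each SN relationship (one domain, one relation, one range) gets its own property, so no range ever needs a union. A short Python sketch of that reading follows; the function name and the property naming scheme are invented for illustration and may differ from the actual mapping file.

```python
def one_property_per_relationship(triples):
    """Return (property_name, domain, range) with a single class on each side."""
    return [(f"{relation}_{domain.lower()}_{range_.lower()}", domain, range_)
            for domain, relation, range_ in triples]


# Two SN relationships that would otherwise share one union-ranged property.
triples = [
    ("Anatomical_Abnormality", "affects", "Organism"),
    ("Anatomical_Abnormality", "affects", "Physiologic_Function"),
]
properties = one_property_per_relationship(triples)
```

The number of properties now equals the number of SN relationships, which illustrates why this route makes maintenance more difficult.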
Issues • “Untranslatable” classes • Hidden semantics • Interpretation • Relations: properties or classes? • Lack of a proper query language
Discussion • BioTop used as a framework for formalizing UMLS Semantic Network • Tap into existing resources • Mapping has not yet been put to the test
Future directions • Evaluate effect of ontology on precision/recall of tagging • User interface aid • Extend ontology as needed • Feed back findings to improve UMLS SN
Special thanks • Ronald Cornet (AMC) • Olivier Bodenreider (NIH) • Jeen Broekstra (WUR)
URLs • Mapping: http://purl.org/biotop/umls-mapping • BioTop: http://purl.org/biotop/ • Wikifier: http://wikifier.wikiprofessional.org/ • WikiProteins: http://proteins.wikiprofessional.org/ • Biosemantics group: http://www.biosemantics.org/