250 likes | 361 Views
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases. Document Navigation: Ontologies or Knowledge Organisation Systems. Simon Jupp Bio-Health Informatics Group University of Manchester, UK Bio-ontologies SIG, ISMB 2007 Vienna, Austria.
E N D
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases Document Navigation: Ontologies or Knowledge Organisation Systems Simon Jupp Bio-Health Informatics Group University of Manchester, UK Bio-ontologies SIG, ISMB 2007 Vienna, Austria
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases Introduction Document navigation over a Semantic Web Ontologies are being developed as background knowledge to drive the Semantic Web. Message: Formal ontologies are not the only knowledge artefact needed, artefacts with weaker semantics have their role and are the best solution in some circumstances
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases COHSE (Conceptual Open Hypermedia SErvice) Navigation via Hypertext is a mainstay of WWW Problem:Links are typically embedded to Web pages; hard-coding, format restrictions, ownership, legacy resources, maintenance, Unary targets etc.
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases COHSE Architecture HTML Document in Ontology Knowledge Service SKOS DLS Agent Search Engine Resource Service Linked HTML Document out Annotation DB
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases Sealife project • A realisation of a Semantic Grid Browser, which links the currentWebto the emerging eScienceinfrastructure • Built on existing tools developed by partners: • COHSE, but also GoPubMed and others. • Applications: from cells via tissue to patients • Evidence-based medicine • Patent and literature mining • Molecular biology www.biotec.tu-dresden.de/sealife
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases What is a Sealife browser? A Sealife browser is a browser that identifies concepts in texts and offers appropriate services based on these concepts. Three main components: • an underlying “ontology” or set of “ontologies” • a text mining component • a service composition module
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases Sealife use case - study of disease NeLI: National electronic Library of Infection portal. Range of users but few links. Given a document about Tuberculosis, where would users want to navigate to next? http://www.neli.org.uk
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases Background Knowledge What style of knowledge model is best suited for navigation? Do we need strict semantics, are they a help or hindrance? All I want to say is this concept “has something to do with” this concept - this doesn’t fit well with formal ontology e.g Polio has something to do with polio virus / polio disease / polio treatment - get me documents about all of them!
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases Sealife background knowledge To cover molecular biology through to medicine we need a large knowledge artefact to serve as background knowledge for SeaLife. This artefact must support sensible navigation between documents on the web. Luckily…
-Mosquito gross anatomy -Mouse adult gross anatomy -Mouse gross anatomy and development -C. elegans gross anatomy -Arabidopsis gross anatomy -Cereal plant gross anatomy -Drosophila gross anatomy -Dictyostelium discoideum anatomy -Fungal gross anatomy FAO -Plant structure -Maize gross anatomy -Medaka fish anatomy and development -Zebrafish anatomy and development • Protein covalent bond • Protein domain • UniProt taxonomy -Pathway ontology -Event (INOH pathway ontology) -Systems Biology -Protein-protein interaction • Sequence types and features • Genetic Context BRENDA tissue / enzyme source Phenotype Proteins Sequence Pathways Anatomy Phenotype Development Plasmodium life cycle Transcript Gene products Cell type -NCI Thesaurus -Mouse pathology -Human disease -Cereal plant trait -PATO PATO attribute and value.obo -Mammalian phenotype -Habronattus courtship -Loggerhead nesting -Animal natural history and life history -Arabidopsis development -Cereal plant development -Plant growth and developmental stage -C. elegans development -Drosophila development FBdv fly development.obo OBO yes yes -Human developmental anatomy, abstract version -Human developmental anatomy, timed version - Molecule role - Molecular Function - Biological process - Cellular component eVOC (Expressed Sequence Annotation for Humans)
FMA GALEN DM&D UMLS OPCS3 OPCS4 OPCS4.3 CTV3 READ SNOMEDInternational ICPC SNOMED-CT SNOMED-2 SNOMED-RT 1975 1985 1995 2005 SNOPCPT OPCSEmTree SynopsisNosologiaeMethodicae History of Medical Vocabularies MeSH ICD ICD9 1603 1700 1785 1855 1900 1975 OPCSSNOPCPTMESHICD
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases What do we need for navigation? The bio-medical domain is rich in vocabularies and ontologies. Large lexical resource including textual definitions and synonyms There is a varying degree of semantics, expressivity and formality in these vocabularies (e.g. MeSH) and ontologies (e.g FMA). Most include some form of hierarchy. Hierarchies are well suited for driving navigation. Question: Do we want strict sub/super class relationships? Or, do we want looser notations such as broader/narrower?
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases Ontology or Vocabulary? Initial approach to COHSE and SeaLife was to represent everything in OWL The strict semantics of OWL do not always lend themselves to sensible navigation, conversion from vocabularies to OWL are difficult. It’s hard to model some things in OWL… MeSH OBO/OWL Nucleuspart_ofCell Head <-- Ear <-- Nose Cellhas_partNucleus - Not always True Accident <-- Traffic Accident <-- Accident Prevention PolioDisease causedby PolioVirus
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases SKOS (Simple Knowledge Organisation System) Purpose: Subject Metadata and information retrieval e.g. This document is about tuberculosis Model for representing concept schemes, thesauri, classification system, taxonomies etc… SKOS Concept is an individual of the class skos:concept, relationships are just asserted facts between (not restrictions!) Model ideally suited for our task e.g. prefLabe, altLabel, broader, narrower, related. RDF/XML representation http://www.w3.org/2004/02/skos/
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases Conversion to SKOS • relationship:part_of --> skos:broader ( e.g. finger part_of hand) • relationship:contains --> skos:narrower ( e.g. skull contains brain) • relationship:causes --> skos:related ( e.g. PolioDisease causes PolioVirus) Sub properties: inverse skos:narrower skos:broader obo:part_of obo:has_part Leaves us open to migration back to OWL
SKOS A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases Conversion to SKOS OBO ontologies MeSH OWL ontologies SNOMED Concept Schemas Taxonomies NeLI Thesauri Other..
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases Advantage of this approach For a given concept e.g. “Polio Virus”, we can query multiple resources and bring related concepts together. • Rapid (and cheap!) generation of knowledge artefact • Take advantage efforts in multiple biomedical communities • We don’t have to make any strong ontological distinctions
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases Disadvantage of this approach Trade off: Lose the inferential power when querying a knowledge resource Unwanted concepts & relationship - especially from OWL conversion e.g. ‘Physical Entity’, ‘Continuant’ etc…. Linking overload! Inability to do inconsistency checking Potentially large redundancy in our knowledge base Maintenance and scalability (>1000000 concepts) - especially for dynamic hyper-linking.
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases Plug for Manchester’s SKOS plug-in - Protégé 4 • Instance hierarchy viewer • OBO or OWL --> SKOS wizards • Various rendering options
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases Conclusion We need a large knowledge artefact that support navigation between related web resources Rapid generation and reuse of existing terminologies (cheap) Loosening the semantics of our model enables this with acceptable trade off SKOS is a suitable model to represent our knowledge artefact We have strong use cases from the life sciences - demos available Still some open issues ….
A Semantic Grid Browser for the Life Sciences Applied to the Study of Infectious Diseases Acknowledgments Manchester Robert Stevens Sean Bechhofer Yeliz Yesilada Other COHSE developers Sealife project NeLI Patty Kostkova Thank you.