Ontologically Modeling Sample Variables in Gene Expression Data

Ontologically Modeling Sample Variables in Gene Expression Data. James Malone (malone@ebi.ac.uk), EBI, Cambridge, UK. Overview: application background; motivation for ontologies (questions we want to answer); methodology; ontology and application; future work and things we’d like to do.

Presentation Transcript


  1. Ontologically Modeling Sample Variables in Gene Expression Data. James Malone, malone@ebi.ac.uk, EBI, Cambridge, UK

  2. Overview
     • Application background
     • Motivation for ontologies: questions we want to answer
     • Methodology
     • Ontology and application
     • Future work/things we’d like to do

  3. Gene Expression: Archive to Atlas
     (Diagram labels: ArrayExpress; Curation; AE/GEO acquire; Re-annotate & summarize; Atlas. >250,000 assays, >10,000 experiments)

  4. Gene Expression Sample Variable Annotations

  5. Use Cases
     • Query support (e.g., query for ‘cancer’ and also get ‘leukemia’)
     • Data visualisation, e.g., presenting an ontology tree to the user of what is in the database
     • Data integration by ontology terms, e.g., we assume that ‘kidney’ in independent studies roughly means the same, so we can count how many kidney samples we have in the database
     • Intelligent template generation for different experiment types in submission or data presentation
     • Summary level data
     • Nonsense detection, e.g., telling us that something marked as cancer cannot be marked as healthy
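Two of the use cases above (query support and nonsense detection) can be sketched in a few lines of Python. This is only an illustration, not the Atlas implementation: the tiny hierarchy, the parent term ‘disease status’ and the `DISJOINT` list are assumptions made up for the example.

```python
# Hypothetical child -> parent hierarchy (not taken from EFO).
PARENT = {
    "leukemia": "cancer",
    "adenocarcinoma": "cancer",
    "cancer": "disease status",
    "healthy": "disease status",
}
# Hypothetical pairs of terms that must not annotate the same sample.
DISJOINT = [("cancer", "healthy")]


def ancestors_or_self(term):
    """Return the term together with everything above it in the hierarchy."""
    seen = {term}
    while term in PARENT:
        term = PARENT[term]
        seen.add(term)
    return seen


def expand_query(term):
    """Query support: a query for a term also matches all of its subclasses."""
    all_terms = set(PARENT) | set(PARENT.values())
    return {t for t in all_terms if term in ancestors_or_self(t)}


def conflicts(annotations):
    """Nonsense detection: list disjoint pairs implied by a sample's annotations."""
    covered = set()
    for annotation in annotations:
        covered |= ancestors_or_self(annotation)
    return [pair for pair in DISJOINT if pair[0] in covered and pair[1] in covered]
```

Here `expand_query("cancer")` also returns ‘leukemia’ and ‘adenocarcinoma’, and `conflicts(["leukemia", "healthy"])` reports the cancer/healthy clash because ‘leukemia’ is a kind of ‘cancer’.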

  6. Questions we want to answer
     • Diverse nature of annotations on data
     • Need to support complex queries which contain semantic information
     • E.g., which genes are under-expressed in brain samples in human or mouse?
     • If we annotate with [term shown in figure], do we get this data? (Figure labels: cancer, adenocarcinoma)

  7. Primary Question: Where to place our semantics?
     (Figure labels: Atlas/AE, cancer, adenocarcinoma)

  8. Decoupling knowledge from data
     (Figure label: Atlas/AE)

  9. Methodology: Reference vs Application Ontology
     • Debate in the community about the difference; here is our thesis
     • A reference ontology describes a knowledge space: an explicitly delineated part of a domain.
     (Figure labels: Cell type, Human Anatomy, GO Process, Biomedicine)

  10. Methodology: Reference vs Application Ontology
     • An application ontology describes an application or data space: an explicitly delineated part of a domain.
     • It should consume reference ontologies to meet application needs
     (Figure labels: Cell type, Human Anatomy, GO Process, Biomedicine)

  11. Building the Experimental Factor Ontology
     • We consume parts of reference ontologies from the domain
     • We construct new classes and relations to answer our use cases
     • The aim is reuse of existing resources, shared frameworks and mapping of equivalencies where they exist
     (Figure labels: Chemical Entities of Biological Interest (ChEBI), Relation Ontology, Ontology for Biomedical Investigations, Text mining, Various Species Anatomy Ontologies, Anatomy Reference Ontology, Disease Ontology, EFO)

  12. Identify Upper Level Structure
     • We have taken a BFO-lite approach, hiding labels from users for application purposes and sometimes using a different definition
     • Upper-level classes: information content entity (IAO), site (BFO), processual entity (BFO), material entity (BFO), specifically dependent continuant (BFO)
     • Specifically dependent continuant: A continuant [snap:Continuant] that inheres in or is borne by other entities. Every instance of A requires some specific instance of B which must always be the same.
     • Material property: A property or characteristic of some other entity. For example, the mouse has the colour white.

  13. Adding New Classes @ www.ebi.ac.uk/efo/tools
     • We wish to maximise our interoperability
     • Submitters and other groups use many ontologies
     • Trade-off: being open to their data and preferences vs imposing a more ordered view on semantics
     • Our goal: where orthogonality exists we aim to import only that class. Where it does not, we perform ‘mappings’ in our EFO classes via annotation property references (in a similar way to xrefs)
     • E.g., for ChEBI classes, import the ChEBI URI; for ‘cancer’, create an EFO class and add multiple mappings
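The import-or-map rule above could be sketched as follows. Everything here is hypothetical: the `EfoClass` record, the `import_or_map` helper, the `example.org` URIs, and the simplification that "orthogonal" means exactly one source ontology defines the class.

```python
from dataclasses import dataclass, field


@dataclass
class EfoClass:
    uri: str
    label: str
    # xref-like annotation values pointing at equivalent external classes
    mappings: list = field(default_factory=list)


def import_or_map(label, sources, efo_namespace="http://example.org/efo/"):
    """sources maps an ontology name to the URI of its class for this label."""
    if len(sources) == 1:
        # Assumed reading of "orthogonal": only one ontology covers the class,
        # so its URI is reused as-is.
        (uri,) = sources.values()
        return EfoClass(uri=uri, label=label)
    # Overlapping (or missing) sources: mint a local class and record mappings.
    local_id = label.replace(" ", "_")
    mappings = sorted(f"{name}:{uri}" for name, uri in sources.items())
    return EfoClass(uri=efo_namespace + local_id, label=label, mappings=mappings)
```

With one source the external URI is kept; with several (as for ‘cancer’) a local class is created and each external class is recorded as a mapping annotation.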

  14. Creating Class Mappings
     • For overlapping ontologies, we aim to create a ‘mapping class’
     • Use a semi-automated text mining “double-metaphone” algorithm
     • Perform matching of our values in the database to ontology class labels and definitions
     • Also perform mappings from EFO to other ontologies, so that EFO: cancer = NCI: cancer, DO: cancer et al.
     • Sanity checking over mappings before adding to the ontology
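The idea of matching database values to class labels by sound rather than by exact spelling can be illustrated with a short sketch. Note the substitution: a full double-metaphone implementation is too long to show here, so the simpler Soundex code stands in for it. The function names and sample values are assumptions for illustration only.

```python
# Soundex digit for each consonant group.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Classic four-character Soundex key (stand-in for double metaphone)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate repeated codes
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # vowels reset prev, so a repeated code is kept
    return (key + "000")[:4]


def phonetic_key(text):
    """Key a multi-word value token by token."""
    return tuple(soundex(token) for token in text.split())


def candidate_mappings(db_values, class_labels):
    """Propose labels whose phonetic key equals that of each database value."""
    index = {}
    for label in class_labels:
        index.setdefault(phonetic_key(label), []).append(label)
    return {value: index.get(phonetic_key(value), []) for value in db_values}
```

For instance, `candidate_mappings(["leukaemia", "kidny"], ["leukemia", "kidney"])` pairs each misspelt or variant value with its label. Phonetic keys over-match, which is why the candidates still need the sanity check mentioned above before they enter the ontology.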

  15. Keeping Up To Date with External Classes
     • Use of a tool to automatically update metadata every release (monthly)
     • Uses BioPortal web services to access the latest class URI/ID, definition, synonyms

  16. Detecting Change in External Ontologies
     • Bubastis tool for detecting axiomatic changes between two ontologies (in our case, two versions of the same ontology)
     • To do: detect annotation property changes
     • We also detect missing annotation properties with the Watchman tool (not released yet); presently used mainly for labels
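To make the notion of an axiomatic diff concrete, here is a minimal sketch of comparing two versions of an ontology. It is not how Bubastis is implemented: it assumes each version has already been reduced to a hypothetical dictionary from class ID to a set of axiom strings.

```python
def diff_axioms(old, new):
    """Compare two versions given as {class_id: set of axiom strings}."""
    changed = {}
    for class_id in old.keys() & new.keys():
        gained = new[class_id] - old[class_id]
        lost = old[class_id] - new[class_id]
        if gained or lost:
            changed[class_id] = {"added": sorted(gained), "removed": sorted(lost)}
    return {
        "new_classes": sorted(new.keys() - old.keys()),
        "deleted_classes": sorted(old.keys() - new.keys()),
        "changed_classes": changed,
    }


# Hypothetical example versions.
v1 = {"cancer": {"SubClassOf disease"}, "site": {"SubClassOf entity"}}
v2 = {"cancer": {"SubClassOf neoplasm"}, "leukemia": {"SubClassOf cancer"}}
report = diff_axioms(v1, v2)
```

The report separates classes that appeared or disappeared from classes whose axioms changed, which is the part that matters when an imported class silently shifts meaning between releases.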

  17. Creating Relations and Equivalent Classes
     (Figure labels: species (human), organism part (cervix), cell line (HeLa), cell type (epithelial), disease (cervical adenocarcinoma))

  18. Structure for queries

  19. Gene Expression Atlas
     • Linking data to the ontology
     (Diagram labels: Query, OWL Model, Database formulated query, Ontology Term Table, Sample Table, Assay Table)
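One way to picture a database query formulated from the ontology is the SQLite sketch below. The schema is an assumption: the slide names an ontology term table, a sample table and an assay table, but the columns, the `parent_id` encoding of the hierarchy and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ontology_term (term_id TEXT PRIMARY KEY, label TEXT, parent_id TEXT);
CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, term_id TEXT);
CREATE TABLE assay (assay_id INTEGER PRIMARY KEY, sample_id INTEGER);
INSERT INTO ontology_term VALUES
  ('T1', 'disease', NULL),
  ('T2', 'cancer', 'T1'),
  ('T3', 'adenocarcinoma', 'T2'),
  ('T4', 'leukemia', 'T2');
INSERT INTO sample VALUES (1, 'T3'), (2, 'T4'), (3, 'T1');
INSERT INTO assay VALUES (10, 1), (11, 2), (12, 3);
""")


def assays_for(label):
    """Return assay IDs whose sample is annotated with the term or a subclass."""
    rows = conn.execute(
        """
        WITH RECURSIVE subtree(term_id) AS (
            SELECT term_id FROM ontology_term WHERE label = ?
            UNION
            SELECT t.term_id
            FROM ontology_term t JOIN subtree s ON t.parent_id = s.term_id
        )
        SELECT a.assay_id
        FROM assay a JOIN sample sm ON sm.sample_id = a.sample_id
        WHERE sm.term_id IN (SELECT term_id FROM subtree)
        ORDER BY a.assay_id
        """,
        (label,),
    ).fetchall()
    return [row[0] for row in rows]
```

Asking for ‘cancer’ returns the assays on the adenocarcinoma and leukemia samples even though neither sample is annotated with ‘cancer’ directly; the hierarchy lives in the term table rather than in the data tables.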

  20. Gene Expression Atlas @ www.ebi.ac.uk/gxa
     (Screenshot labels: query for cell adhesion genes in all ‘organism parts’; ‘View on EFO’)

  21. ArrayExpress Archive @ www.ebi.ac.uk/arrayexpress

  22. Future Work: Linked Data
     • Linking data by dereferenceable URI for human and machine
     • http://www.ebi.ac.uk/gxa/Experiment12345

  23. Future Work: RDF Triple Store @ www.ebi.ac.uk/efo/semanticweb/atlas
     • Q: Is a SPARQL query against an RDF triple store quicker than a SPARQL query translated into SQL?
     (Diagram labels: RDF Triple Store, SPARQL, RDFizer, OWL Ontology, SQL Translation Layer, Atlas Data)

  24. Future Work: Data Integration
     • Consuming reference ontologies and mapping to multiple ontologies where overlap exists offers us maximum interoperability
     • The advantage of triple stores is not yet evident
     • Impetus required: “should we champion this technology?”
     (Diagram labels: Query, RDF triples, Atlas, Amino Acid Ontology, SwissProt)

  25. Summary
     • We have created a sustainable approach to consuming multiple reference ontologies
     • Tooling solutions to expedite the process
     • We consider EFO to be a ‘view’ of such ontologies for our application needs
     • The primary aim of this work is to enable novel research with the experimental data we have
     • Specifically, we can answer new questions, integrate across our data resources, visualise and summarise the data
     • Our belief is that describing such data should be the driving force behind ontology development
     • Future work will look at linked data and RDF triple stores

  26. Acknowledgements
     • Ontology creation: James Malone, Tomasz Adamusiak, Ele Holloway, Helen Parkinson, Jie Zheng (U Penn)
     • Ontology mapping tools and text mining evaluation: Tim Rayner, Holly Zheng, Margus Lukk
     • GUI development: Misha Kapushesky, Pasha Kurnosov, Anna Zhukova, Nikolay Kolesinkov
     • External review and anatomy: Jonathan Bard, Jie Zheng
     • ArrayExpress Production Staff
     • EBI Rebholz Group (Whatizit text mining tool)
     • Many source ontologies for terms and definitions, esp. Disease Ontology, Cell Type Ontology, FMA, NCIT, OBI
     • Funders: EC (Gen2Phen, FELICS, MUGEN, EMERALD, ENGAGE, SLING), EMBL, NIH
     • Eric Neumann, Joanne Luciano and Alan Ruttenberg
     • W3C & HCLS Group: Eric Prud'hommeaux and Scott Marshall
     • OBI developers