
PRIDE Part II: The Ontology Lookup Service Proteome Harvest: Getting Data into PRIDE

PRIDE Part II: The Ontology Lookup Service. Proteome Harvest: Getting Data into PRIDE. Phil Jones, EMBL-EBI. Introduction: Why did we make OLS? OLS is the Ontology Lookup Service. The original pressure to develop this service was a consequence of the extensive use of ontologies in PRIDE.


Presentation Transcript


  1. PRIDE Part II: The Ontology Lookup Service. Proteome Harvest: Getting Data into PRIDE. Phil Jones, EMBL-EBI

  2. Introduction: Why did we make OLS? • OLS is the Ontology Lookup Service • The original pressure to develop this service was a consequence of the extensive use of ontologies in PRIDE. • Using ontologies and controlled vocabularies is very powerful, ensuring that standard terminology is used to annotate submissions, allowing intelligent and powerful query of the data…

  3. Ontologies can be complex… • Ontologies (and more complex controlled vocabularies) are not just lists of terms. • At the very least, they involve the construction of a potentially complex directed acyclic graph of terminology that defines the relationships between terms. Each term may have zero, one or many parents, unlike a simple hierarchy. • Well-produced ontologies use stable identifiers for each concept (the actual term may change, but the concept it describes should not). • The most common relationship types are: • ‘A is a B’, indicating that A is a specialisation of B • ‘A is part of B’, indicating a whole / part relationship between B and A. • Less commonly you may find: • ‘A develops from B’, which describes a temporal relationship between A and B, such as used in ontologies describing • Ontology developers can also define their own relationships between terms.
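
The graph structure described on this slide can be illustrated with a minimal Python sketch. It is not taken from OLS or from any real ontology: the term identifiers (`T:0` to `T:3`) and the helper names `build_parents` and `paths_to_root` are hypothetical and exist only for illustration. It shows a term with two parents, reached through different relationship types, and enumerates every path from that term up to a root.

```python
from collections import defaultdict

# Hypothetical toy ontology; identifiers are illustrative, not real accessions.
# Each edge reads (child, relationship, parent).
EDGES = [
    ("T:3", "is_a", "T:1"),
    ("T:3", "part_of", "T:2"),
    ("T:1", "is_a", "T:0"),
    ("T:2", "is_a", "T:0"),
]


def build_parents(edges):
    """Map each child term to its list of (relationship, parent) pairs."""
    parents = defaultdict(list)
    for child, relationship, parent in edges:
        parents[child].append((relationship, parent))
    return parents


def paths_to_root(term, parents):
    """Return every path from `term` up to a root term, as lists of term ids."""
    if not parents.get(term):
        # A term with no parents is a root, so the path ends here.
        return [[term]]
    paths = []
    for _relationship, parent in parents[term]:
        for tail in paths_to_root(parent, parents):
            paths.append([term] + tail)
    return paths


PARENTS = build_parents(EDGES)
ALL_PATHS = paths_to_root("T:3", PARENTS)
```

Because `T:3` has two parents, `ALL_PATHS` holds two distinct routes to the root `T:0`, which a simple tree could not represent.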

  4. Example… • From the OBO Gene Ontology (GO) • ‘A is_a B’, indicating that A is a specialisation of B • ‘A part_of B’, indicating a whole / part relationship between B and A. • Note that the arrows on this diagram are misleading – to read the relationship between terms, read against the arrow. • Rule: All the paths from a term to the root terms must be true at all times.

  5. So what’s the problem? • These relationships have consequences when querying a database annotated using the ontology. • What happens when I ask for PRIDE experiments describing the proteome of brain tissue?

  6. The Use of Controlled Vocabularies and Ontologies in PRIDE • Require controlled vocabularies / ontologies to define the search space: • Species: Newt / NCBI Taxonomy ID • Tissue / organ / cell type: BRENDA Tissue ontology, Cell Type ontology; • Sub-cellular component: Gene Ontology: GO; • Disease: Human Disease: DOID; • Genotype: GO; • Sample Processing: PSI Ontology; • Mass Spectrometry: PSI-MS Ontology; • Protein Modifications: PSI-MOD Ontology; • Terms that fit nowhere else!? - PRIDE CV. OBO Ontologies

  7. Introduction to Open Biomedical Ontologies http://obo.sourceforge.net • OBO is a central web location for accessing well-structured controlled vocabularies and ontologies for use in the biological and medical sciences. • OBO provides a simple format for ontologies that can encode terms, relationships between terms and definitions of terms, including those taken from external ontologies. (Not all OBO ontologies use this format, however.)

  8. Scope of Open Biomedical Ontologies • Anatomy • Animal natural history and life history • Chemical • Development • Ethology • Evidence codes • Experimental conditions • Genomic and proteomic • Metabolomics • OBO relationship types • Phenotype • Taxonomic classification • Vocabularies

  9. What is OLS? • A single point of query for currently 47 ontologies. • Ontologies are updated daily from CVS repositories, including the OBO CVS repository and the PRIDE CVS repository. • A tool that offers interactive and programmatic interfaces for queries on term names, synonyms, relationships, annotations and database cross-references.

  10. What is OLS? • Consists of a Java API, a database back end for data storage, and both web-application and SOAP front ends. • Built entirely from component-based, best-of-breed open source projects, and is open source itself. • Web interface - AJAX for a rapid and enjoyable experience. • SOAP interface - programmatic hook into OLS.

  11. OLS Usage examples • What does GO:0005739 actually mean? • A: search by term accession • What is the accession for “mitochondrion” in GO? In MeSH? • A: search by term name in a specific ontology or across all • I’m looking for a term to annotate my protocol step but I’m not sure what term to use. • A: browse an ontology

  12. OLS Usage examples • I’m looking for all the experiments done on liver tissue. • A: get all child terms of liver and query on those as well • My data set was annotated with GO version 123, but that was a long time ago. • A: get updated term names for the identifiers you have and see if any have been made obsolete
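
The liver query above can be sketched in Python under some loud assumptions: the tissue terms, the experiment labels and the function names `descendants` and `experiments_for` are all hypothetical, and the sketch works on an in-memory edge list rather than calling OLS or PRIDE. It collects every term below the query term and then matches experiments annotated with any of them.

```python
from collections import defaultdict, deque

# Hypothetical tissue terms; each edge reads (child, relationship, parent).
EDGES = [
    ("hepatocyte", "part_of", "liver"),
    ("liver lobe", "part_of", "liver"),
    ("liver", "part_of", "digestive system"),
    ("kidney", "part_of", "excretory system"),
]

# Hypothetical experiment annotations: experiment label -> tissue term.
ANNOTATIONS = {"exp1": "liver", "exp2": "hepatocyte", "exp3": "kidney"}


def descendants(term, edges):
    """Return all terms below `term`, found by a breadth-first walk."""
    children = defaultdict(set)
    for child, _relationship, parent in edges:
        children[parent].add(child)
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children[current]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def experiments_for(term, edges, annotations):
    """Return experiments annotated with `term` or any of its descendants."""
    wanted = {term} | descendants(term, edges)
    return sorted(exp for exp, tissue in annotations.items() if tissue in wanted)


LIVER_EXPERIMENTS = experiments_for("liver", EDGES, ANNOTATIONS)
```

A query on the literal term alone would return only `exp1`; expanding to child terms also picks up `exp2`, which was annotated with the more specific term.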

  13. The OLS Web Pages

  14. Getting Data into PRIDE: Proteome Harvest and the ProDaC Grant

  15. Proteome Harvest • BBSRC grant to develop Excel spreadsheet based data submission tools for proteomics. • PRIDE data submission spreadsheet developed to assist institutions with smaller / less frequent submissions and perhaps limited bioinformatics infrastructure.

  16. Proteome Harvest • Allows direct use of the Ontology Lookup Service at the EBI – assists with complete annotation of the experiment using controlled vocabulary terms. • Directly generates valid PRIDE XML that can be submitted to PRIDE without modification. • Allows embedding of externally generated mzData files.

  17. Proteome Harvest Web Page: http://www.ebi.ac.uk/pride/proteomeharvest/index.html

  18. The Proteome Harvest PRIDE Spreadsheet

  19. The ProDaC Grant • Grant to further the development of data exchange standards for Proteomics, based upon HUPO PSI. • The creation of reusable data pipelines to submit data into PRIDE from several major data producers. • Will increase the quality and standardisation of data being submitted to PRIDE. • Supports the further development of PRIDE to implement the relevant standards as they are ratified and improve the submission process.

  20. Summary: Future Developments in PRIDE • HUPO PSI Data Format Integration • mzData & mzXML Merger • AnalysisXML & GelML • PRIDE as a FuGE Implementation? • Spin out Projects – Solving significant problems for PRIDE • Protein Identification Mapping • Getting Data out of PRIDE • ‘Global’ Experiment Comparison • DAS, Dasty2 and SPICE Integration • PRIDE BioMart • Getting Data into PRIDE: • The Proteome Harvest Project • The ProDaC Grant
