
Provenance in a Collaborative Bio-database RAASWiki

Donald Dunbar & Jon Manning, Queen’s Medical Research Institute, University of Edinburgh. Use Cases for Provenance, April 20th 2009.


Presentation Transcript


  1. Provenance in a Collaborative Bio-database: RAASWiki. Donald Dunbar & Jon Manning, Queen’s Medical Research Institute, University of Edinburgh. Use Cases for Provenance, April 20th 2009

  2. Provenance in Bio-databases including RAASWiki. Donald Dunbar & Jon Manning, Queen’s Medical Research Institute, University of Edinburgh. Use Cases for Provenance, April 20th 2009

  3. Plan • bio-databases • provenance • RAASWiki • collaborative knowledgebases

  4. Biological databases • Sequences: Ensembl, Entrez • Structure: PDB • Expression: GEO, ArrayExpress • Function: Gene Ontology • Interaction: MINT, BIND, KEGG • ‘Warehouses’: GeneCards, IUPHAR • Literature: PubMed

  5. How do they handle provenance? Ensembl produces genome databases for vertebrates and other eukaryotic species, and makes this information freely available online. • ‘Gene’ ID histories (with stable ID) • Evidence for gene predictions • Links to other databases (e.g. UniProt)

  6. How do they handle provenance? The PDB archive contains information about experimentally determined structures of proteins, nucleic acids, and complex assemblies. • Primary citation • History: deposition and last update • Raw data and protocols

  7. How do they handle provenance? Gene Expression Omnibus: a gene expression/molecular abundance repository supporting MIAME-compliant data submissions, and a curated, online resource for gene expression data browsing, query and retrieval. • Standards compliance (protocols, data…) • Links within database (microarrays, protocols) • Raw data and protocols

  8. How do they handle provenance? The Gene Ontology project provides a controlled vocabulary to describe gene and gene product attributes in any organism. • Evidence for gene annotation (experimental, computational) • Links to original publications • No versioning, just updates

  9. How do they handle provenance? PubMed is a free search engine for accessing the MEDLINE database of citations, abstracts and some full-text articles on life sciences and biomedical topics. • Original source material, authors, abstracts • Unique PubMed ID (used by other databases) • Continual updates (new papers), occasional retractions

  10. How do they handle provenance? GeneCards® is a searchable, integrated database of human genes that provides concise genomic, proteomic, transcriptomic, genetic and functional information on all known and predicted human genes. • Lots of data from other databases • IDs/keys from sources • Lots of data integration based on IDs

  11. How do they handle provenance? The IUPHAR database (IUPHAR-DB) integrates peer-reviewed pharmacological, chemical, genetic, functional and anatomical information on GPCRs, ligand-gated ion channels and voltage-gated-like ion channel subunits encoded by the human, rat and mouse genomes. • Curated by experts • Original sources plus curation provenance • Suggested citations

  12. Newer developments: WikiGenes is the first wiki system to combine the collaborative and largely altruistic possibilities of wikis with explicit authorship. In view of the extraordinary success of Wikipedia there remains no doubt about the potential of collaborative publishing, yet its adoption in science has been limited. Here I discuss a dynamic collaborative knowledge base for the life sciences that provides authors with due credit and that can evolve via continual revision and traditional peer review into a rigorous scientific tool. but…

  13. RAASWiki: RAASWiki is a knowledgebase of information on the renin-angiotensin-aldosterone system. While much of the seed data were derived from pre-existing databases such as KEGG and OMIM, supplementary data not easily available through such resources are also included. This includes short textual reports on the genes involved, and more experimentally oriented information such as animal models. • Important biology: hypertension • Automatic seeding of database (BioKB) • Collaborative editing (wiki-based, useful functionality) • Genes, publications, animal models, datasets…

  14. RAASWiki – provenance • Seeded data are tagged with source database and date • Edits are tagged with editor and date • Comments are tagged with name and date • Wiki functionality allows versioning and rollback • Identifiers for source databases preserve provenance • ‘Crowd wisdom’ will hopefully ensure good quality

  15. RAASWiki – provenance issues • How much detail (each edit, granularity, versions)? • Who will use provenance data? • Different focus depending on data (who, when, confidence) • How much should we rely on sources for provenance? • Annotation & comments vs changing data • Public vs private data • Likely to become a big issue

  16. What provenance do we need? Example: Gene expression in a transgenic animal. gene expression measurements gene annotation where, when when, what, how public databases output from machine which identifiers how integration processing what and how did we select genes data mining …

  17. What provenance do we need? Example: Curated gene database. database links curation contributor, date source, identifiers, dates curator input development verify, add, delete, modify schema & interface changes archive versions, dates Curated database

  18. Collaborative knowledgebases databases knowledge knowledgebase experiments papers

  19. Collaborative knowledgebase provenance issues • Confidence in data • Tracking data to its (real) source • Published papers do not contain all information • When is something (knowledge) finished? • Citing of knowledgebase records • Linking between knowledgebase records • Some sort of dynamic publication

  20. Conclusions • In biology provenance is a mixed bag • We use mainly static databases • Usually source is clear but not much else • RAASWiki contains static and curated data • We have implemented a very rudimentary provenance scheme • Collaborative knowledgebases will need to address provenance in new ways
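
The rudimentary provenance scheme described for RAASWiki on slide 14 (seeded data tagged with source database and date, edits tagged with editor and date, versioning and rollback) could be sketched roughly as follows. This is only an illustration in Python, not RAASWiki's actual implementation: the class names, field names, editor names and the source identifier are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Revision:
    """One version of a record's text plus who or what produced it."""
    text: str
    when: date
    editor: Optional[str] = None      # set for human edits
    source_db: Optional[str] = None   # set for seeded data
    source_id: Optional[str] = None   # identifier in the source database


@dataclass
class Record:
    """A wiki-style record that keeps every revision (hypothetical model)."""
    name: str
    history: List[Revision] = field(default_factory=list)

    def seed(self, text: str, source_db: str, source_id: str, when: date) -> None:
        # Seeded data carry the source database, its identifier and the date.
        self.history.append(
            Revision(text, when, source_db=source_db, source_id=source_id)
        )

    def edit(self, text: str, editor: str, when: date) -> None:
        # Edits carry the editor and the date.
        self.history.append(Revision(text, when, editor=editor))

    def roll_back(self, editor: str, when: date) -> None:
        # Restore the previous text as a NEW revision, so history is never lost.
        if len(self.history) < 2:
            raise ValueError("nothing to roll back to")
        previous = self.history[-2]
        self.history.append(Revision(previous.text, when, editor=editor))

    @property
    def current(self) -> Revision:
        return self.history[-1]


# Usage with made-up values ("example-id" is a placeholder, not a real accession).
record = Record("REN")
record.seed("renin", source_db="KEGG", source_id="example-id", when=date(2009, 4, 1))
record.edit("renin; aspartyl protease", editor="curator_a", when=date(2009, 4, 10))
record.roll_back(editor="curator_b", when=date(2009, 4, 20))
```

Recording a rollback as a new revision, rather than deleting the latest one, keeps the full who/when trail intact, which matches the slide's point that versioning and source identifiers are what preserve provenance.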
