
EcoliWiki and GONUTS

EcoliWiki and GONUTS: Wiki-based Systems for Community Annotation. Jim Hu, Dept. of Biochemistry and Biophysics, Texas A&M University. Overview: EcoliWiki and the central problem in genome annotation; Gene Ontology and the Gene Ontology Normal Usage Tracking System (GONUTS).


Presentation Transcript


  1. EcoliWiki and GONUTS: Wiki-based Systems for Community Annotation • Jim Hu, Dept. of Biochemistry and Biophysics, Texas A&M University

  2. Overview • EcoliWiki and the central problem in genome annotation • Gene Ontology and the Gene Ontology Normal Usage Tracking System (GONUTS) • Live demos/Discussion

  3. Annotation • Goals for annotation: • Coverage • Accuracy • Usefulness • for scientists (human-readable) • for machine inference generation (computer-understandable) • Annotation is a moving target!

  4. The need for Annotation is growing

  5. People are the limiting factor for annotation • Major genome databases employ large numbers of people • This model is problematic • Curators are expensive • NIH and NSF cannot afford to staff every organism at this level • Broad expertise across all areas is hard • Curators have to read papers in areas they were not trained in • Curators may not recognize the significance of papers in areas they were not trained in • Can we make it: • cheaper? • faster? • better?

  6. The Wikipedia approach • Get your user community to work for free! • aka "Community annotation" or "Community curation"

  7. EcoliWiki http://ecoliwiki.org or .net or .com (most of our hits come from Google)

  8. “What is true of Escherichia coli is true of the elephant” - Jacques Monod “Thanks to annotation creep, what’s false for E. coli is false for the elephant too” - Jim Hu http://www.pasteur.fr/infosci/archives/mon/im_ele.html

  9. EcoliWiki philosophy • Any registered user can edit • Any registered user can register new users • Any registered user can create new pages • It's easier to revise than to create new content • Seed content from other sites, mostly EcoCyc

  10. GenBank's managers are dead set against letting users into GenBank's files, however. They say there already are procedures to deal with errors in the database, and researchers themselves have created secondary databases that improve on what GenBank has to offer. "That we would wholesale start changing people's records goes against our idea of an archive," says David Lipman, director of the National Center for Biotechnology Information (NCBI), GenBank's home in Bethesda, Maryland. "It would be chaos." But won't that invite chaos?

  11. Correct compared to what? NCBI RefSeq: Wikipedia:

  12. Correct compared to what? NCBI RefSeq: Wikipedia:

  13. Correct compared to what? NCBI RefSeq: Wikipedia:

  14. Correct compared to what?

  15. This is how biology achieves fidelity [image: a collage of books I haven’t read]

  16. Biology Wikis are proliferating

  17. Participation is the major challenge • Anyone can edit ≠ Anyone will edit • Wikipedia: a tiny fraction of the users edit anything • A tiny fraction of those do major editing • Really big denominator • Outreach to increase our user base

  18. Participation is the major challenge • Tools to make it easier to edit

  19. Participation is the major challenge • Biggest difference from other systems: partial annotations are wanted • It doesn't matter if you don't know the wiki markup • It doesn't matter if what you're adding isn't fully worked out • Someone else can fix it • And you can fix what others write

  20. Making it machine-friendly: ontologies • Ontology: • In philosophy: a metaphysical system for studying being • In biology/bioinformatics: a structured representation of biological knowledge • NCBO = National Center for Biomedical Ontology • OBO = Open Biological Ontologies • Examples • MeSH • Sequence Ontology = SO • Phenotype and Trait Ontology = PATO • Gene Ontology = GO • see the EBI ontology browser: http://www.ebi.ac.uk/ontology-lookup/

  21. What is an ontology? • Controlled vocabulary with • Term identifiers • GO:0000075 • Name • cell cycle checkpoint • Definitions • "A point in the eukaryotic cell cycle where progress through the cycle can be halted until conditions are suitable for the cell to proceed to the next stage." [GOC:mah, ISBN:0815316194] • Relationships • is_a GO:0000074 ! regulation of progression through cell cycle • Terms arranged in a Directed Acyclic Graph (DAG)
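The four parts of a term listed on slide 21 (identifier, name, definition, relationships) correspond to tags in the OBO flat-file format that GO is distributed in. The sketch below is a deliberately simplified parser for a single [Term] stanza, assuming one "tag: value" pair per line; it ignores every other tag and is not a full OBO parser. The stanza is assembled from the slide's fields and may not match current GO releases. Because a term can carry several is_a lines, the parents are kept as a list, which is what lets terms form a DAG rather than a tree.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """The parts of an ontology term listed on the slide."""
    term_id: str = ""
    name: str = ""
    definition: str = ""
    is_a: list = field(default_factory=list)  # parent term IDs


def parse_term_stanza(text: str) -> Term:
    """Parse one [Term] stanza (simplified: one 'tag: value' pair per line)."""
    term = Term()
    for line in text.strip().splitlines():
        if ":" not in line:
            continue  # skips the "[Term]" header line
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            term.term_id = value
        elif tag == "name":
            term.name = value
        elif tag == "def" and value.startswith('"'):
            # def: "definition text" [dbxrefs]; keep only the quoted text
            term.definition = value.split('"')[1]
        elif tag == "is_a":
            # is_a: GO:0000074 ! label; the text after "!" is a comment for humans
            term.is_a.append(value.split("!", 1)[0].strip())
    return term


# Stanza assembled from the fields on the slide (may not match current GO releases).
STANZA = """
[Term]
id: GO:0000075
name: cell cycle checkpoint
def: "A point in the eukaryotic cell cycle where progress through the cycle can be halted until conditions are suitable for the cell to proceed to the next stage." [GOC:mah, ISBN:0815316194]
is_a: GO:0000074 ! regulation of progression through cell cycle
"""

term = parse_term_stanza(STANZA)
print(term.term_id, term.name, term.is_a)
# GO:0000075 cell cycle checkpoint ['GO:0000074']
```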

  22. Pros and Cons of Ontologies • Pros • facilitate comparison across systems • facilitate computer-based reasoning systems • Good for data mining! • Cons • Large and unwieldy • Difficult to understand • Difficult to use • May never capture knowledge accurately • Ontology development lags behind the field it tries to capture • Example of a theme of genomics: imperfect tools can still be very powerful!

  23. GO = Gene Ontology • 3 ontologies for gene products • Biological Process • Molecular Function • Cellular Component • Used to make annotations • aka gene associations • Term + qualifiers + evidence code + reference, etc. [Figure: GO terms linked by is_a and part_of relationships; from GO Consortium presentations]
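The is_a and part_of links on slide 23 let one term have several parents, and GO's true path rule means an annotation to a term also implies its ancestors. A minimal sketch of walking up such a graph, assuming a toy cellular-component fragment whose edges are simplified for illustration and are NOT copied from a GO release:

```python
from collections import deque

# Toy fragment: child -> list of (relation, parent). Illustrative edges only.
EDGES = {
    "mitochondrial inner membrane": [("is_a", "membrane"), ("part_of", "mitochondrion")],
    "mitochondrion": [("is_a", "organelle"), ("part_of", "cytoplasm")],
    "cytoplasm": [("part_of", "cell")],
    "organelle": [],
    "membrane": [],
    "cell": [],
}


def ancestors(term: str, relations=("is_a", "part_of")) -> set:
    """Return every term reachable upward from `term` via the given relation types."""
    seen = set()
    queue = deque([term])
    while queue:  # breadth-first search; `seen` prevents revisiting shared ancestors
        current = queue.popleft()
        for relation, parent in EDGES.get(current, []):
            if relation in relations and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors("mitochondrial inner membrane")))
# ['cell', 'cytoplasm', 'membrane', 'mitochondrion', 'organelle']
print(sorted(ancestors("mitochondrial inner membrane", relations=("is_a",))))
# ['membrane']
```

Following only is_a versus both relation types gives very different ancestor sets, which is why tools that propagate annotations have to say which relationships they traverse.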

  24. Cellular Component • where a gene product acts

  25. Cellular Component

  26. Molecular Function • activities or “jobs” of a gene product • e.g. glucose-6-phosphate isomerase activity

  27. Molecular Function • e.g. insulin binding • e.g. insulin receptor activity

  28. Molecular Function • A gene product may have several functions • Sets of functions make up a biological process.

  29. Biological Process • a commonly recognized series of events • e.g. cell division

  30. Biological Process • e.g. transcription

  31. GO annotation • Find papers • Read them • Find what genes are mentioned • What assertions are made about the product? • What GO terms are applicable? • GO term browsers • AmiGO http://amigo.geneontology.org/cgi-bin/amigo/go.cgi • GONUTS http://gowiki.tamu.edu • New term needed? • What evidence code should be used to record the assertion? • Record gene associations in the MOD database • Send gene associations to GO Consortium • Downloadable files that users doing electronic analysis can parse
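The downloadable gene association files mentioned on slide 31 are distributed by the GO Consortium as tab-delimited GAF files, with header and comment lines starting with "!". A minimal sketch of reading one, assuming the first 15 columns of the GAF 2.x layout; the sample line is made up (placeholder database, gene, reference and taxon) and is not a real annotation.

```python
import io

# Names for the first 15 columns of a GAF 2.x file, in column order.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "reference", "evidence_code", "with_from", "aspect", "db_object_name",
    "db_object_synonym", "db_object_type", "taxon", "date", "assigned_by",
]


def parse_gaf(handle):
    """Yield one dict per annotation line, skipping '!' header/comment lines."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        fields = line.split("\t")
        # zip() silently drops optional trailing columns beyond the first 15
        yield dict(zip(GAF_COLUMNS, fields))


# Made-up example line: placeholder values throughout.
SAMPLE = (
    "!gaf-version: 2.0\n"
    "ExampleDB\tX0001\tgeneX\t\tGO:0000075\tPMID:0000000\tIDA\t\tP\t"
    "hypothetical gene X\t\tprotein\ttaxon:0\t20080101\tExampleDB\n"
)

for row in parse_gaf(io.StringIO(SAMPLE)):
    print(row["db_object_symbol"], row["go_id"], row["evidence_code"])
# geneX GO:0000075 IDA
```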

  32. Human vs Electronic GO annotations • What is the basis for making a gene association? • Human • Experimental Evidence Codes • EXP: Inferred from Experiment • IDA: Inferred from Direct Assay • IPI: Inferred from Physical Interaction • IMP: Inferred from Mutant Phenotype • IGI: Inferred from Genetic Interaction • IEP: Inferred from Expression Pattern • Computational Analysis Evidence Codes • ISS: Inferred from Sequence or Structural Similarity • ISO: Inferred from Sequence Orthology • ISA: Inferred from Sequence Alignment • ISM: Inferred from Sequence Model • IGC: Inferred from Genomic Context • RCA: Inferred from Reviewed Computational Analysis • Author Statement Evidence Codes • TAS: Traceable Author Statement • NAS: Non-traceable Author Statement • Curator Statement Evidence Codes • IC: Inferred by Curator • ND: No biological Data available • Automatically-assigned Evidence Codes • IEA: Inferred from Electronic Annotation
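Users doing electronic analysis often filter annotations by evidence code, for example keeping only experimentally supported ones. A small sketch that encodes the grouping on slide 32 as a lookup table; the group labels are our own shorthand, not official GO identifiers, and "human" here follows the slide's split (everything except IEA).

```python
# Evidence codes grouped as on the slide; the group names are shorthand labels.
EVIDENCE_GROUPS = {
    "experimental": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "computational": {"ISS", "ISO", "ISA", "ISM", "IGC", "RCA"},
    "author_statement": {"TAS", "NAS"},
    "curator_statement": {"IC", "ND"},
    "automatic": {"IEA"},
}

# Invert the table once so each lookup is a single dict access.
CODE_TO_GROUP = {code: group for group, codes in EVIDENCE_GROUPS.items() for code in codes}


def evidence_group(code: str) -> str:
    """Return the slide's category for an evidence code (KeyError if unknown)."""
    return CODE_TO_GROUP[code.upper()]


def is_human_assigned(code: str) -> bool:
    """Per the slide, every code except IEA reflects a human judgment."""
    return evidence_group(code) != "automatic"


codes = ["IDA", "IEA", "ISS", "TAS"]
print([c for c in codes if evidence_group(c) == "experimental"])  # ['IDA']
print([c for c in codes if is_human_assigned(c)])  # ['IDA', 'ISS', 'TAS']
```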

  33. GONUTS (http://gowiki.tamu.edu) • Started as a wiki-based usage guide • Each ontology term is a MediaWiki (MW) Category • MW supports DAGs as Categories! • Each term page has a notes area for user notes on usage • Term pages list examples of genes that were annotated to this term
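In MediaWiki, putting a [[Category:X]] tag on a category page makes it a subcategory of X, and a page may carry several such tags, so categories can have multiple parents, as GO terms do. MediaWiki also lists the pages tagged with a category on that category's own page, which is presumably how a term page can show annotated genes. The sketch below generates that wikitext; the page layout, the "Usage notes" heading and the use of bare GO IDs as category names are hypothetical choices, not GONUTS's actual conventions.

```python
def category_page_wikitext(term_id: str, name: str, parent_ids: list, notes: str = "") -> str:
    """Wikitext for a hypothetical category page representing one GO term."""
    lines = [f"'''{name}''' ({term_id})", "", "== Usage notes ==", notes or "(none yet)", ""]
    # One category tag per parent term: several tags give the term several parents.
    lines += [f"[[Category:{parent}]]" for parent in parent_ids]
    return "\n".join(lines)


def gene_page_category_tags(term_ids: list) -> str:
    """Tags that place a gene page in the category of each term it is annotated to."""
    return "\n".join(f"[[Category:{t}]]" for t in term_ids)


print(category_page_wikitext("GO:0000075", "cell cycle checkpoint", ["GO:0000074"]))
print(gene_page_category_tags(["GO:0000075"]))
```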

  34. MOD gene pages • Gene pages from established Model Organism Databases provide examples of best practices

  35. Responding to community needs

  36. User-created gene pages • Annotation pages based on UniProt IDs
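Keying user-created annotation pages on UniProt IDs means the input can be checked before a page is made. A minimal sketch, assuming the accession-number pattern given in UniProt's help pages; the "UniProtKB:" page-title scheme is a hypothetical illustration, not how GONUTS names its pages.

```python
import re

# Accession-number pattern as documented by UniProt.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def annotation_page_title(accession: str) -> str:
    """Validate an accession and build a page title (hypothetical naming scheme)."""
    acc = accession.strip().upper()
    if not UNIPROT_ACCESSION.fullmatch(acc):  # fullmatch: the whole string must match
        raise ValueError(f"not a UniProt accession: {accession!r}")
    return f"UniProtKB:{acc}"


print(annotation_page_title("p12345"))  # UniProtKB:P12345
```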

  37. Supporting Annotation Jamborees in Cyberspace • RefGenome subgroup of GO Consortium • collaboration on annotation consistency • Electronic Jamborees via teleconference • Uses GONUTS to collect and compare

  38. Supporting Annotation Jamborees in Cyberspace • RefGenome subgroup of GO Consortium • collaboration on annotation consistency • Electronic Jamborees via teleconference • Uses GONUTS to collect and compare
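Comparing annotation sets from different groups, as in the RefGenome jamborees, comes down to set arithmetic over annotations. A sketch assuming each annotation is reduced to a (gene, GO term, evidence code) tuple; the gene name is a placeholder, and the Jaccard-style agreement score is our own illustrative choice of consistency measure, not a metric the RefGenome group is known to use.

```python
def compare_annotations(a, b, ignore_evidence=False):
    """Compare two annotation sets of (gene, go_id, evidence_code) tuples."""
    if ignore_evidence:
        # Count two annotations as the same if gene and term match, whatever the evidence.
        a = {(gene, term) for gene, term, _ in a}
        b = {(gene, term) for gene, term, _ in b}
    else:
        a, b = set(a), set(b)
    union = a | b
    return {
        "shared": a & b,
        "only_a": a - b,
        "only_b": b - a,
        "agreement": len(a & b) / len(union) if union else 1.0,
    }


curator_a = {("geneX", "GO:0000075", "IDA"), ("geneX", "GO:0000074", "IMP")}
curator_b = {("geneX", "GO:0000075", "IDA"), ("geneX", "GO:0000074", "ISS")}
print(compare_annotations(curator_a, curator_b)["agreement"])  # 0.333...
print(compare_annotations(curator_a, curator_b, ignore_evidence=True)["agreement"])  # 1.0
```

Reporting agreement both with and without evidence codes separates "chose different terms" from "chose the same term on different grounds".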

  39. EcoliWiki/GONUTS Team • Nathan Liles, Brenley McIntosh, Debby Siegele, Daniel Renfro, Anand Venkatraman, Adrienne Zweifel • GO Consortium • EcoliHub Team Leaders • Barry Wanner, PI, Purdue • Walid Aref, co-PI, Purdue • Tyrell Conway, co-PI, Oklahoma • Mike Gribskov, co-PI, Purdue • Peter Karp, co-PI, SRI • Daisuke Kihara, co-PI, Purdue • Funding • NIH U24-GM077905 • URLs: http://ecolihub.org http://ecoliwiki.org http://gowiki.tamu.edu
