CACAO: Literature-based Functional Annotation as an Intercollegiate Competition
ASM/JGI Functional Genomics Workshop 2011
Jim Hu, Texas A&M, PortEco/EcoliWiki
ASM/JGI Functional Genomics 2011 slide 1

Objectives
• Our goals for this unit are to train you to be able to:
  • make functional annotations to the Gene Ontology (GO) based on the literature
  • teach your students to do GO annotation with your supervision
  • participate in the international Community Assessment of Community Annotation with Ontologies (CACAO) competitions
• After this unit, participants should
  • be able to distinguish different levels of annotation
  • be able to explain how the Gene Ontology (GO) represents gene function, and why it is valuable
  • be able to make annotations to GO, finding terms with online term browsers (GONUTS, AmiGO)
  • be able to describe how CACAO couples GO annotation with undergraduate education
  • be able to incorporate GO into any papers that come from future functional genomics experiments

Additional materials
http://gowiki.tamu.edu/wiki/index.php/ASM-JGI_2011
Includes:
• This presentation
• An extended unit covering similar material in more detail

What is annotation?
L. Stein (2001) Nature Reviews Genetics 2:493

Classic MODel
[Diagram labels: Literature, Database, Curators (rate limiting), Datasets]

Classic MODel is Expensive

Community Annotation with Students
• Goal: Recruit more community participation
• Problem: Incentives to participate are weak
• Approach: Couple GO* annotation to teaching
• Do this as a competition
  • Teams
  • Students get points for annotations
  • Students can take points for correcting each other
  • Different scales: within a course, within a campus, between schools
*GO = Gene Ontology

What are Ontologies and why use them?
• What?
  • Controlled vocabulary
  • Relationships
• Why?
  • Standardization
  • facilitate comparison across systems
  • facilitate computer-based reasoning systems
  • Good for data mining!
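
To make the "controlled vocabulary plus relationships" idea concrete, here is a minimal Python sketch of how relationships let a computer reason over annotations. The term graph below is a toy, heavily simplified fragment written for illustration, not the real GO graph, and the gene names and function names (`ancestors`, `annotated_to`) are hypothetical. It assumes relationship types such as is_a and part_of link each term to its parents.

```python
from collections import deque

# Toy, simplified term graph for illustration only (NOT the real GO graph).
# Each term maps to a list of (relationship, parent term) pairs.
TOY_ONTOLOGY = {
    "glucose-6-phosphate isomerase activity": [("is_a", "isomerase activity")],
    "isomerase activity": [("is_a", "catalytic activity")],
    "catalytic activity": [("is_a", "molecular_function")],
    "molecular_function": [],
    "nucleolus": [("part_of", "nucleus")],
    "nucleus": [("is_a", "organelle")],
    "organelle": [("is_a", "cellular_component")],
    "cellular_component": [],
}

# Hypothetical gene products, each annotated to one specific term.
ANNOTATIONS = [
    ("geneA", "glucose-6-phosphate isomerase activity"),
    ("geneB", "nucleolus"),
]


def ancestors(term, ontology):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relationship, parent in ontology.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def annotated_to(term, annotations, ontology):
    """Gene products annotated to `term` itself or to any more specific term."""
    return sorted(
        gene
        for gene, annotated_term in annotations
        if annotated_term == term or term in ancestors(annotated_term, ontology)
    )


print(annotated_to("catalytic activity", ANNOTATIONS, TOY_ONTOLOGY))
print(annotated_to("nucleus", ANNOTATIONS, TOY_ONTOLOGY))
```

Because every annotation uses a term from the shared vocabulary, a query for a general term also finds gene products annotated to its more specific descendants, which is what makes comparison across systems and data mining practical.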

GO = Gene Ontology
• 3 ontologies for gene products
  • Biological Process
  • Molecular Function
  • Cellular Component
• Used to make annotations
  • aka Gene associations
  • Term + qualifiers + evidence code + reference etc.
[Figure from GO consortium presentations; relationship labels shown: is_a, part_of]
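
As a rough illustration of "Term + qualifiers + evidence code + reference", the record below sketches one annotation as a Python data structure. The class name, field names, and all example values are assumptions made for this sketch (the identifiers are placeholders, not real accessions), and this is not the file format the GO consortium uses.

```python
from dataclasses import dataclass

# The three GO ontologies named on the slide.
ASPECTS = {"biological_process", "molecular_function", "cellular_component"}


@dataclass(frozen=True)
class GOAnnotation:
    """One gene association; field names are illustrative assumptions."""

    gene_product: str        # e.g. a UniProt accession
    go_term: str             # GO term identifier
    aspect: str              # which of the three ontologies the term is in
    evidence_code: str       # e.g. "IDA" (Inferred from Direct Assay)
    reference: str           # e.g. a PubMed identifier
    qualifiers: tuple = ()   # optional qualifiers such as "NOT"

    def __post_init__(self):
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {self.aspect!r}")


# Placeholder values only; none of these identifiers is real.
example = GOAnnotation(
    gene_product="UniProt:X00000",
    go_term="GO:0000000",
    aspect="molecular_function",
    evidence_code="IDA",
    reference="PMID:00000000",
)
print(example)
```

Keeping the term, evidence code, and reference together in one record is what lets a reader trace any functional claim back to the experiment and paper that support it.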

Cellular Component
• where a gene product acts

Molecular Function
• activities or “jobs” of a gene product
[Figure from GO consortium presentations: glucose-6-phosphate isomerase activity]

Biological Process
• a commonly recognized series of events
[Figure from Nature Reviews Microbiology 6, 28-40 (January 2008): cell division]

Set up GONUTS accounts
• Go to http://gowiki.tamu.edu
• Log in as Demo (a user that cannot edit, but that can create accounts)
  • Click "log in" in the upper right corner
  • Username: Demo
  • Password:
• Create an account for yourself
  • Click Login/Create Account on the left sidebar
  • Click Create Account
  • Enter your information
• Log in as yourself

GONUTS
• wiki for Gene Ontology (unofficial)
• Kinds of pages:
  • GO terms (Categories)
  • Gene products (proteins) from UniProt
    • where the annotations go
  • Publications from PubMed
  • Misc. other

GONUTS GO term browsing demo
• Enter "anhydrase" in the search box
• Click Search
• Restrict the search to Category pages
  • All GO terms are categories
• Go to the term page and review the sections
• Edit the usage notes

Annotation on GONUTS
• First we need a gene page to hold the annotations
• Users can create gene pages for anything in UniProt
• New gene pages are populated with information, including previous GO annotations

Key elements of a GO annotation
[Figure labels: Submitted to GO consortium; Viewable on GONUTS]

Community Assessment of Community Annotation with Ontologies
• Teams of students curate
• Faculty supervision
• Support from our team
• Intramural or intercollegiate competition
• Distributed annotation jamborees
• Assessment via surveys and wiki data-mining
CACAO: coupling annotation to teaching credit

CACAO is competitive
• Teams get points for complete annotations
  • GO term (right level of specificity)
  • reference
  • evidence code
  • identify where in the paper the evidence comes from
• Teams can take away points from competitors by challenging annotations
  • finding a problem
  • suggesting a better alternative
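
The scoring idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the one-point value, and the rule that an upheld challenge moves the point from the annotating team to the challenging team are all hypothetical, since the slides do not give the actual point values or transfer rules.

```python
from collections import Counter

# The four elements the slide lists for a complete annotation.
# Field names are assumptions made for this sketch.
REQUIRED_FIELDS = ("go_term", "reference", "evidence_code", "evidence_location")

# Hypothetical point value; the real CACAO value is not stated in the slides.
POINTS_PER_ANNOTATION = 1


def missing_fields(annotation):
    """List the required fields that are absent or blank."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(annotation.get(name) or "").strip()
    ]


def is_complete(annotation):
    return not missing_fields(annotation)


def tally(annotations, upheld_challenges):
    """Score teams; an upheld challenge moves the points to the challenger.

    `upheld_challenges` is a list of (annotation id, challenging team) pairs.
    The transfer rule is an assumption, not the documented CACAO rule.
    """
    scores = Counter()
    owner = {}
    for annotation in annotations:
        if is_complete(annotation):
            scores[annotation["team"]] += POINTS_PER_ANNOTATION
            owner[annotation["id"]] = annotation["team"]
    for annotation_id, challenger in upheld_challenges:
        loser = owner.pop(annotation_id, None)
        if loser is not None:
            scores[loser] -= POINTS_PER_ANNOTATION
            scores[challenger] += POINTS_PER_ANNOTATION
    return dict(scores)


# Placeholder data; identifiers and teams are made up.
demo_annotations = [
    {"id": 1, "team": "A", "go_term": "GO:0000000", "reference": "PMID:00000000",
     "evidence_code": "IDA", "evidence_location": "Figure 2"},
    {"id": 2, "team": "B", "go_term": "GO:0000000", "reference": "PMID:00000000",
     "evidence_code": "IDA", "evidence_location": ""},
    {"id": 3, "team": "B", "go_term": "GO:0000000", "reference": "PMID:00000000",
     "evidence_code": "IMP", "evidence_location": "Table 1"},
]
print(tally(demo_annotations, upheld_challenges=[(1, "B")]))
```

In this toy run, team B's second annotation scores nothing because it does not say where in the paper the evidence comes from, and B's upheld challenge takes team A's point.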

Tracking the players
• An extension tag added to a user page identifies all the annotations made by that user
• Exercise: Edit your user page to add <myAnnotations/>

Tracking the teams
• Team members are assigned to a wiki category
• An extension identifies all the annotations made by team members

Submitting challenges
• Clicking "submit challenge" brings up a challenge form

Responding to challenges
• Pending challenges are shown in the Team pages

Responding to challenges
• A form records responses to challenges

Overall scoreboard
• A scoreboard page gathers information about all teams and challenges

Judgement
• Mentors with curator experience judge the challenges/rebuttals

History
• We just completed our 3rd semester of CACAO
• EcoliWiki GO annotation by our staff would be ~250-400 annotations/semester

Evolution of CACAO
• What's changed since the first cycle?
  • Multiple rounds
  • More time for challenges
  • Development of our online scoreboard system
  • Changes in allowed evidence types
    • no IPI, EXP
  • More documentation
  • Rule tweaks as we learn how students will game the system
• Outreach: In the Spring 2011 round we had instructors who are new to GO

Spring 2011
• Training
• Annotation
  • 4 rounds + "World Series"
    • 1 week annotation
    • 1 week challenges + rebuttals
    • Judgement and posting of scores
  • Some did a subset of the rounds (18-25 students in rounds 1-4)
  • All did the World Series (105 students)
• Assessment
  • Every annotation was reviewed by us
  • Parallel training of a grad student in judging

Spring 2011
• PortEco/EcoliWiki provided
  • web infrastructure via GONUTS
  • training and guest lectures via Skype
  • handouts
  • powerpoints
  • instructor manuals
  • coaching students via email, Skype, Google chat
  • online surveys for assessment
• We restricted the kinds of annotations that would be scored
• We did not restrict what genes/organisms to use
  • GONUTS only allows what's in UniProt
• We promoted certain areas
  • Provided reviews
  • Predictions from computational methods

Results: All rounds

Results: By organism

How we view the results
• Overall, we think CACAO works
  • Lots of annotations
  • The students love it
• Quality remains a challenge, but
  • quality seems correlated with experience
  • QC is relatively fast

Plans and Challenges
• Adjust the system to promote better annotations and challenges
• Improve the scoreboard/tracking system
  • More flexible
  • Improve UI
  • Data mining needs improvement
• More documentation
  • Analyze common errors
• Improve assessment
  • We want to do serious assessment, which means human subjects and IRBs. We need a collaborator for this.
• Outreach
  • We want more participants, but can we handle them?

People
• TAMU: Brenley McIntosh, Adrienne Zweifel, Mahitha Rajendran, Daniel Renfro, Debby Siegele
• UCL: Ruth Lovering, Varsha Khodiyar
• Miami (Ohio): Iddo Friedberg
• Univ. of N. Texas: Lee Hughes
• Michigan State: Rob Britton
• Penn State: Sarah Ades

CACAO Supplemental slides (not shown in the meeting)