This presentation discusses the importance of community involvement in increasing Gene Ontology (GO) annotation, targeting annotations, and improving annotation quality scores.
Increasing GO Annotation Through Community Involvement Fiona McCarthy*, Nan Wang*, Susan Bridges** and Shane Burgess** http://www.agbase.msstate.edu/ GO Consortium User Meeting, 10 Sept 2006
Who Are We? Dr Shane Burgess Dr Susan Bridges Divyaswetha Peddinti Nan Wang Bryce Magee Teresia Buza
MGI: Judith Blake, David Hill, Mary Dolan GOA: Evelyn Camon, Dan Barrell GO Editorial Office: Jennifer Clark, Midori Harris dictyBase: Rex Chisholm, Eric Just GO Consortium Member mentor
Overview • From genes to function • How much GO do I need? • Getting more GO annotation: targeted GO annotation and community involvement • Targeting GO annotations • GO annotation quality scores • Community directed annotation • Community annotation • The hook: what’s in it for me?
"Today’s challenge is to realise greater knowledge and understanding from the data-rich opportunities provided by modern high-throughput genomic technology."- Professor Andrew Cossins Consortium for Post-Genome Science
Use GO for… • Grouping gene products by biological function • Determining which classes of gene products are over-represented or under-represented • Focusing on particular biological pathways and functions (hypothesis-driven data interrogation) • Relating a protein’s location to its function
(PubMed 06/09/06) The number of publications using GO is increasing exponentially. Use of GO is increasing in species for which there is a dedicated GO annotation effort.
How Much GO Do I Need? Compiled 15 June 06 using GOProfiler.
How Much GO Do I Need? Compiled 15 June 06 using GOProfiler and PubMed.
Analyzing Microarray Data: • ‘breadth’ of annotation • how many gene products have GO annotation? • for each gene product, how many annotations? • ‘depth’ of annotation • how detailed is the GO annotation? • How can we effectively use our resources to get the best GO annotation?
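The two ‘breadth’ questions above can be made concrete with a small sketch. This is an illustration only, not a description of how GOProfiler or AgBase computes anything: the function name `annotation_breadth`, the gene product identifiers, and the input layout (a list of gene products plus a list of gene product/GO term pairs) are all assumptions made for the example.

```python
from collections import Counter

def annotation_breadth(all_gene_products, annotations):
    """Summarize 'breadth' of GO annotation (hypothetical helper).

    all_gene_products: list of gene product identifiers on the array.
    annotations: list of (gene_product, go_term) pairs.
    Returns (fraction of gene products with at least one annotation,
             mean number of annotations per annotated gene product).
    """
    per_product = Counter(gp for gp, _ in annotations)
    annotated = [gp for gp in all_gene_products if per_product[gp] > 0]
    if not all_gene_products or not annotated:
        return 0.0, 0.0
    coverage = len(annotated) / len(all_gene_products)
    mean_per_annotated = sum(per_product[gp] for gp in annotated) / len(annotated)
    return coverage, mean_per_annotated

# Illustrative data: four made-up gene products, three annotations.
products = ["p1", "p2", "p3", "p4"]
pairs = [("p1", "GO:0008150"), ("p1", "GO:0005575"), ("p2", "GO:0003674")]
coverage, mean_per_annotated = annotation_breadth(products, pairs)
```

Here half of the gene products carry at least one annotation, and the annotated ones average 1.5 annotations each. ‘Depth’ needs the position of each term in the GO graph, which this sketch does not model.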
Getting more GO annotation • electronic GO annotations (IEA) • get many annotations quickly but lack detail (get ‘breadth’ but lack ‘depth’) • manual GO annotations • literature curation (gold standard) • slower • many, many more researchers publishing papers than biocurators reading them • cleverer GO annotation: • targeted GO annotation • community involvement: researchers are the ‘local’ experts
Targeting GO Annotation • target gene products with poor GO annotation • determining GO annotation quality • target gene products most interesting for the community • community feedback & prioritization
GO Annotation Quality (GAQ) For a single gene product: GAQ = no. of annotations × DAG depth × evidence code • calculate overall GAQ score for an organism • calculate GAQ for functional subsets of gene products • target GO annotation efforts to genes in functional subsets with low GAQ score
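A minimal sketch of how a GAQ-style score might be computed follows. The slide gives only the three factors, so this example makes two assumptions: each annotation contributes its DAG depth multiplied by a numeric weight for its evidence code, and a gene product's score is the sum over its annotations (so the number of annotations enters through the sum). The weights in `EVIDENCE_WEIGHT` are invented for illustration and are not AgBase's values.

```python
from collections import defaultdict

# Hypothetical evidence-code weights, for illustration only.
EVIDENCE_WEIGHT = {"IEA": 1, "ISS": 2, "IDA": 3}

def gaq_scores(annotations):
    """Return {gene_product: score} under the assumptions stated above.

    annotations: iterable of (gene_product, dag_depth, evidence_code).
    """
    scores = defaultdict(int)
    for gene_product, dag_depth, evidence_code in annotations:
        scores[gene_product] += dag_depth * EVIDENCE_WEIGHT[evidence_code]
    return dict(scores)

def mean_gaq(scores):
    """Average score over a set of gene products (organism or functional subset)."""
    return sum(scores.values()) / len(scores) if scores else 0.0

# Illustrative data: made-up gene products, depths, and evidence codes.
example = [
    ("geneA", 3, "IEA"),
    ("geneA", 6, "IDA"),
    ("geneB", 2, "IEA"),
]
scores = gaq_scores(example)
subset_mean = mean_gaq(scores)
```

With these inputs, geneA scores 21 and geneB scores 2, so geneB would be the more obvious candidate for targeted annotation. Restricting the input to one functional subset gives the per-subset comparison described above.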
Comparative GAQ: Scores for Chicken and Mouse
Comparative GAQ: Scores for Chicken Cell Processes
Community Directed Annotation • AgBase web form to enable community requests • prioritization: • requests prioritized for each species • one gene product request equals one count • gene products with the most counts have higher priority • annotation time for each species is split proportionally based on the number of requests for each species • requests annotated using both ISS and available published literature • higher priority for gene products submitted to the community gene association file • priority for gene products commonly represented on arrays • prioritization modified based on community input
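The counting rules above (one request equals one count, most counts first, annotation time split in proportion to requests per species) can be sketched as follows. The function names, the gene symbols, and the alphabetical tie-break are assumptions for the example; the further adjustments for array representation and community input are not modelled.

```python
from collections import Counter

def rank_requests(requests):
    """Rank gene products within each species by number of requests.

    requests: list of (species, gene_product) pairs, one per request.
    Ties are broken alphabetically (an assumption, not from the slides).
    """
    counts = {}
    for species, gene_product in requests:
        counts.setdefault(species, Counter())[gene_product] += 1
    return {
        species: [gp for gp, _ in sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))]
        for species, c in counts.items()
    }

def time_shares(requests):
    """Fraction of annotation time per species, proportional to its requests."""
    per_species = Counter(species for species, _ in requests)
    total = sum(per_species.values())
    return {species: n / total for species, n in per_species.items()}

# Illustrative requests; the gene symbols are placeholders.
requests = [
    ("chicken", "TLR4"),
    ("chicken", "TLR4"),
    ("chicken", "MYOD1"),
    ("cow", "LEP"),
]
ranked = rank_requests(requests)
shares = time_shares(requests)
```

In this example chicken receives three quarters of the annotation time, and TLR4 heads the chicken queue because it was requested twice.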
Community Annotation • login allows annotations to be acknowledged and enables notification when the request is completed • users may request annotation of a gene product/PubMed reference • submitters assist in selecting pertinent literature for curation • allows annotators to focus on GO annotations most important for the community • submitters notified when the request is completed • advanced submission allows trained users to upload gene association files for inclusion in the Community annotation file • quality checks before annotations are transferred from the Community file to the GOC file
Targeting GO Annotation • use GAQ to target poorly annotated processes • determining GO annotation quality • use community requests to prioritize annotations BUT still many, many more researchers publishing papers than biocurators reading them… …and researchers need their data analyzed yesterday...
Community GO Annotation AgBase provides 2 annotation files: • “GO Consortium” file: fully quality checked annotations that meet current GO Consortium guidelines • “Community” file: • annotations for ‘predicted proteins’ without UniProtKB identifiers (until 10 July 2006, were not supported by EBI-GOA) • ISS annotations to evidence codes no longer accepted (as of April 2006) • annotations from community researchers that have not been fully quality checked by a trained GO curator • these annotations will eventually be transferred to the GOC file or (for ISS) superseded by higher quality literature annotations
AgBase Annotation Files GOC file = 1,508 GO associations Community file = 5,146 GO associations
What’s in it for ME? • two-tier annotation file system provides the most comprehensive annotation in instances where there are few annotations available • researchers can choose which annotation file best suits their system • researchers with GO training may submit (& be acknowledged for) their own annotations • researchers with specific knowledge of particular gene products can add to the annotation of their gene products of interest via the AgBase request page