Summary of Genome Annotation Assessment in Drosophila melanogaster by Reese, M. G., et al.
Summary by: Joe Reardon, Swathi Appachi, Max Masnick
Complexity of Eukaryotic Genomes
• Complexity of genomic data:
  • Transposons
  • Both strands of DNA may code
Levels of Genome Annotation Quality Assessment
• Base Level:
  A T C G T A C C C A T G
  Y N N N Y Y Y Y Y Y Y N
• Exon Level:
• Whole Gene Level:
  • Whether all of a gene’s exons are properly identified and assembled
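Base-level assessment can be sketched as a per-nucleotide comparison of two Y/N label strings. The sketch below is illustrative only: it assumes the Y/N row on the slide is the reference annotation, the `predicted` string is made up, and the function names are hypothetical. It uses the convention common in gene-finding evaluations, where sensitivity is TP/(TP+FN) and "specificity" is TP/(TP+FP).

```python
def base_level_counts(annotated, predicted):
    """Compare two equal-length Y/N label strings base by base.

    Returns (tp, fp, fn, tn), where "Y" marks a base labeled as coding.
    """
    if len(annotated) != len(predicted):
        raise ValueError("label strings must have the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(annotated, predicted):
        if a == "Y" and p == "Y":
            tp += 1
        elif a == "N" and p == "Y":
            fp += 1
        elif a == "Y" and p == "N":
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Fraction of annotated coding bases that were predicted as coding."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tp, fp):
    """Fraction of predicted coding bases that are annotated as coding."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Labels for the 12 bases A T C G T A C C C A T G
annotated = "YNNNYYYYYYYN"  # Y/N row from the slide (assumed to be the reference)
predicted = "YYNNYYYYYNNN"  # hypothetical gene-finder output

tp, fp, fn, tn = base_level_counts(annotated, predicted)
print(tp, fp, fn, tn)
print(sensitivity(tp, fn), specificity(tp, fp))
```

The same counting idea extends to the exon level by comparing exon boundaries instead of single bases.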
Impediments to Gene-Finder Quality Assessment
• Underlying biology is still poorly understood
• cDNA libraries must be very complete, which often requires multiple passes to generate a complete library
*Diagram courtesy of University of Miami, http://fig.cox.miami.edu/~cmallery/150/gene/sf16x5.jpg
Impediments to Gene-Finder Quality Assessment, cont’d
• Even the most experienced experts make errors
  • Example: 4 “genes” were found to be untranslated regions
• Genome annotation software often identifies genes that the experts missed
Approaches to Locating Genomic Features
• Comparison to cDNA libraries
  • Problem: Can only compare to existing libraries; cDNA libraries for the target organism probably don’t exist
  • Highly effective, though
• Protein homology (utilizing Swiss-Prot, BLAT, etc.)
  • Ineffective overall
Approaches to Locating Genomic Features, cont’d
• Hidden Markov Models:
  • Complex statistical analyses
  • Assign probabilities to nucleotides having certain functions (exon, intron, promoter, suppressor, etc.); compute probabilities in aggregate to determine functions of specific regions of the genome
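The HMM idea above can be sketched with a toy two-state model decoded by the Viterbi algorithm. This is a minimal illustration, not the model used by Genie or any other gene finder: the states, the start, transition, and emission probabilities, and the function name `viterbi` are all made-up assumptions chosen so that G/C-rich stretches favor "exon" and A/T-rich stretches favor "intron".

```python
import math

# Toy model: every number below is an illustrative assumption.
STATES = ("exon", "intron")
START = {"exon": 0.5, "intron": 0.5}
TRANS = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
}
EMIT = {
    "exon": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def viterbi(seq):
    """Return the most probable state label for each base in seq."""
    if not seq:
        return []
    # Log-probability of the best path ending in each state at the first base.
    scores = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best previous state when base i + 1 is in state s
    for base in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            )
            pointers[s] = prev
        scores = new_scores
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


for base, label in zip("ATATATATGCGCGCGCGC", viterbi("ATATATATGCGCGCGCGC")):
    print(base, label)
```

Working in log space avoids numerical underflow on long sequences; real gene finders use many more states (promoters, splice sites, and so on) and probabilities trained on annotated data.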
Promoters, Repeats
• Identifying Promoters:
  • Site-specific identification (binding sites)
  • Statistical identification (similar to HMM)
  • Locate gene and then guess
• Repeat Sequences
  • Must be able to identify even with point mutations, insertions/deletions, etc.
  • Useful for determining evolutionary significance
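As a rough illustration of finding a known site or repeat unit despite point mutations, the sketch below slides a motif along a sequence and reports every window within a chosen number of mismatches (Hamming distance). It is an assumption-laden simplification: the function name and example sequence are hypothetical, and it does not handle insertions or deletions, which would need an alignment-based method.

```python
def find_approximate_matches(seq, motif, max_mismatches):
    """Return (position, mismatches) for each window of seq that matches
    motif with at most max_mismatches substitutions."""
    hits = []
    width = len(motif)
    if width == 0:
        return hits
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        mismatches = sum(1 for a, b in zip(window, motif) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Hypothetical sequence: the middle copy of ACGT carries one point mutation.
print(find_approximate_matches("ACGTACGAACGT", "ACGT", 1))
```

Raising `max_mismatches` finds more diverged copies at the cost of more chance hits, the same sensitivity versus false-positive trade-off that applies to gene finders.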
And the Winner Is…
• Genie EST: most effective overall gene finder; relies on EST (Expressed Sequence Tag) data (somewhat like cDNA data)
• Genie: identifies fewer genes, but has fewer false positives
Best Gene Annotation Programs, continued (Table from Reese, et al.)
Conclusions
• Field is still in its infancy
• As the amount of genome data continues to grow exponentially, genome annotation software will grow in importance.
• Researchers will rely on programs like Genie for annotations as quality improves.
Illustration courtesy of Genbank, http://www.ncbi.nlm.nih.gov/Genbank/index.html