“Gene Finding in Novel Genomes” by Ian Korf

“Gene Finding in Novel Genomes” by Ian Korf. Presented by: Christine Lee, SoCAL BSI 2004. Outline: Background and Motivation; Existing gene finder programs; SNAP as ab initio high-performance gene finder; Novel genome gene prediction; The Data; Genome compositional differences

Presentation Transcript


  1. “Gene Finding in Novel Genomes” by Ian Korf. Presented by: Christine Lee, SoCAL BSI 2004

  2. Outline • Background and Motivation • Existing gene finder programs • SNAP as an ab initio, high-performance gene finder • Novel genome gene prediction • The Data • Genome compositional differences • Parameter estimation in novel genomes • Conclusion

  3. Background and motivation • Rapid genome sequencing • Key task: identifying the structure of protein-coding regions • Ab initio gene prediction • Gene finders • Annotation of novel genomes • SNAP & de novo species-specific parameter estimation

  4. Existing gene finder programs • Genscan: performs as well as recent gene finders designed for Arabidopsis • HMMGene and Genefinder: well-established gene prediction programs for C. elegans • Augustus: one of the latest, shown to outperform Genscan, GENIE, and GENEID in Drosophila

  5. SNAP ab initio performance

  6. Data Set. Data set characteristics. Abbreviations: At, Arabidopsis thaliana; Ce, Caenorhabditis elegans; Dm, Drosophila melanogaster; Os, Oryza sativa.

  7. Codon Frequency

  8. Performance of foreign and bootstrapped parameters. The boldface values are determined by 5-fold cross-validation within the same species. Abbreviations: At, Arabidopsis thaliana; Ce, Caenorhabditis elegans; Dm, Drosophila melanogaster; Os, Oryza sativa. Sensitivity (NSN) and specificity (NSP) are reported at the nucleotide level. The bootstrapped values (bottom part of the table) are derived from parameters estimated from gene predictions rather than from actual data. In these experiments, only inter-species gene parameters were used; dashes mark cells that would contain intra-species predictions.

  9. Conclusion and Future Work • Feasibility of using a gene finder as a bootstrap predictor • Improved results through implementation of advanced statistical methods • Apply gene prediction to large genomes
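
Slide 8 reports sensitivity and specificity at the nucleotide level but does not spell out how they are computed. The following is a minimal sketch, not code from the presentation or from SNAP. It assumes the convention common in gene-prediction evaluation, where sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP), and it assumes a hypothetical mask encoding in which each nucleotide is labeled `C` (coding) or `N` (non-coding). The function name `nucleotide_accuracy` is illustrative only.

```python
def nucleotide_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) at the nucleotide level.

    Both arguments are equal-length strings over {'C', 'N'}:
    'C' marks a coding nucleotide, 'N' a non-coding one
    (a hypothetical encoding chosen for this sketch).
    """
    if len(annotated) != len(predicted):
        raise ValueError("masks must have the same length")

    tp = fn = fp = 0
    for a, p in zip(annotated, predicted):
        if a == "C" and p == "C":
            tp += 1  # coding nucleotide correctly predicted as coding
        elif a == "C":
            fn += 1  # coding nucleotide missed by the prediction
        elif p == "C":
            fp += 1  # non-coding nucleotide wrongly predicted as coding

    # Assumed convention: specificity is TP / (TP + FP),
    # i.e. the fraction of predicted coding nucleotides that are real.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    annotated = "NNCCCCNNNN"
    predicted = "NCCCNNNNNN"
    sn, sp = nucleotide_accuracy(annotated, predicted)
    print(f"NSN = {sn:.3f}, NSP = {sp:.3f}")
```

In this toy example the prediction recovers two of the four annotated coding nucleotides and adds one false coding call, so sensitivity is 2/4 and specificity is 2/3.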
