“Gene Finding in Novel Genomes” by Ian Korf
Presented by: Christine Lee, SoCAL BSI 2004
Outline
• Background and Motivation
• Existing gene finder programs
• SNAP as an ab initio high-performance gene finder
• Novel genome gene prediction
• The Data
• Genome compositional differences
• Parameter estimation in novel genomes
• Conclusion
Background and Motivation
• Rapid genome sequencing
• Key task: identifying the structure of protein-coding regions
• Ab initio gene prediction
• Gene finders
• Annotation of novel genomes
• SNAP and de novo species-specific parameter estimation
Existing gene finder programs
• Genscan: performs as well as recent gene finders designed for Arabidopsis
• HMMGene and Genefinder: well-established gene prediction programs for C. elegans
• Augustus: one of the latest; shown to outperform Genscan, GENIE, and GENEID in Drosophila
Data Set
Data set characteristics. Species abbreviations: At, Arabidopsis thaliana; Ce, Caenorhabditis elegans; Dm, Drosophila melanogaster; Os, Oryza sativa.
Performance of foreign and bootstrapped parameters. Boldface values are determined by 5-fold cross-validation within the same species. Species abbreviations: At, Arabidopsis thaliana; Ce, Caenorhabditis elegans; Dm, Drosophila melanogaster; Os, Oryza sativa. Sensitivity (NSN) and specificity (NSP) are reported at the nucleotide level. The bootstrapped values (bottom part of the table) come from parameter estimates based solely on gene predictions, with no actual data. In these experiments, only inter-species gene parameters were used; dashes mark cells that would contain intra-species predictions.
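To make the nucleotide-level measures concrete, here is a minimal Python sketch. The slides do not define NSN and NSP, so this assumes the usual gene-prediction convention: sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP), counted over coding nucleotides. The function names, the interval representation, and the example coordinates are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: a coding interval as 1-based, inclusive (start, end).
Interval = Tuple[int, int]


def coding_positions(intervals: Iterable[Interval]) -> Set[int]:
    """Expand coding intervals into the set of coding nucleotide positions."""
    positions: Set[int] = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_accuracy(annotated: Iterable[Interval],
                        predicted: Iterable[Interval]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the nucleotide level.

    Assumed convention (not stated in the slides):
      sensitivity = TP / (TP + FN)
      specificity = TP / (TP + FP)
    """
    truth = coding_positions(annotated)
    guess = coding_positions(predicted)
    tp = len(truth & guess)   # coding and predicted coding
    fn = len(truth - guess)   # coding but missed
    fp = len(guess - truth)   # predicted coding but not annotated
    sensitivity = tp / (tp + fn) if truth else 0.0
    specificity = tp / (tp + fp) if guess else 0.0
    return sensitivity, specificity


# Made-up coordinates, for illustration only.
annotated = [(101, 200), (301, 400)]
predicted = [(151, 200), (301, 450)]
nsn, nsp = nucleotide_accuracy(annotated, predicted)
print(f"NSN = {nsn:.2f}, NSP = {nsp:.2f}")  # NSN = 0.75, NSP = 0.75
```

Expanding intervals into sets keeps the sketch short; for genome-scale sequences an interval-overlap computation would use far less memory.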
Conclusion and Future Work
• Feasibility of gene finder as bootstrap predictor
• Improved results through implementation of advanced statistical methods
• Apply gene prediction to large genomes