
Presentation Transcript


  1. Different types of gene finder:
   ab initio: bases predictions on a statistical profile calculated from a training set (criteria: consensus sequences at start sites and splice junctions; sequence composition at the codon and DNA level for coding regions, introns and non-coding regions; intron length distribution; exon length distribution)
   comparative: bases predictions on sequence similarity to coding regions in a related organism, and uses a statistical profile from a training set to a much lesser extent

Exploring different gene-finding tools for P. knowlesi was motivated by the complex and slow process of manually building a training set for an unannotated organism from an annotated relative (P. falciparum).

Gene-finding tools in use for P. knowlesi:
   SNAP: ab initio
   gene modeller: comparative; runs a sensitive BLAST search, then tries to find start, stop and splice sites near the ends of the BLAST hits; needs refinement
   Projector: comparative

Running several tools gives us a good opportunity to evaluate the strengths and weaknesses of each. Trial: an ordered contig set for P. knowlesi chromosome 6, which had already been annotated.

Projector
   The precise alignment step of the algorithm needs a lot of memory, so it cannot process an entire sequence in one go.
   Before we can feed it the reference and query sequences, we need to:
      align the corresponding chromosome contigs (Ellen's script);
      identify which gene plus surrounding sequence in the annotated genome corresponds to which section of the unannotated one (Ellen's script and gene modeller can provide hints for this);
      give the two linked regions, from the unannotated and the reference genome, to Projector as input.
   It can only predict genes in the regions it has been given.
   At the moment it can only be run by the person who wrote it, but it is being calibrated for us.
   It can show where it observed sequence-level conservation between the two sequences (for untranslated regions, exons and introns).

How do they compare to the annotated P. knowlesi sequence?
SNAP finds the most genes from the reference set (154), although, as an ab initio tool, it was trained on the manual annotation; gene modeller follows (143), then Projector (128).
In the number of introns predicted, gene modeller is closest to the manual annotation and Projector the least close.
Projector is best at getting stops right, SNAP is best at getting starts right, and Projector is best at getting all coordinates of a gene right.

Future
New runs on the latest contig set using Projector with extended functionality, which allows regions to be locked as coding on the basis of independent evidence such as homology to entries in protein databases or EST hits.
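The comparison against the manual annotation counts, per tool, how many reference genes were found, how often starts, stops and all coordinates matched, and how many introns were predicted. The presentation does not say exactly how these were scored, so the following is only a minimal Python sketch under stated assumptions: the GeneModel class, the compare() function and the matching rules (a reference gene counts as found if a same-strand prediction overlaps it; the best-overlapping prediction is then checked for identical start, stop and exon coordinates) are all hypothetical, not the procedure behind the numbers quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical minimal gene model: strand ("+" or "-") and exon
    intervals (1-based, inclusive), sorted by ascending genomic position."""
    strand: str
    exons: tuple

    @property
    def span(self):
        return self.exons[0][0], self.exons[-1][1]

    @property
    def start(self):
        # Translation start lies at the 5' end on the gene's own strand.
        lo, hi = self.span
        return lo if self.strand == "+" else hi

    @property
    def stop(self):
        lo, hi = self.span
        return hi if self.strand == "+" else lo

    @property
    def n_introns(self):
        return len(self.exons) - 1


def overlap_len(a, b):
    """Number of shared bases between two same-strand models (0 otherwise)."""
    if a.strand != b.strand:
        return 0
    (a0, a1), (b0, b1) = a.span, b.span
    return max(0, min(a1, b1) - max(a0, b0) + 1)


def compare(reference, predicted):
    """Assumed scoring: a reference gene is 'found' if any prediction
    overlaps it; the best-overlapping prediction is then checked for an
    identical start, stop and full exon structure."""
    stats = dict(found=0, start_ok=0, stop_ok=0, exact=0,
                 ref_introns=0, pred_introns=0)
    for ref in reference:
        hits = [p for p in predicted if overlap_len(ref, p) > 0]
        if not hits:
            continue
        best = max(hits, key=lambda p: overlap_len(ref, p))
        stats["found"] += 1
        stats["start_ok"] += best.start == ref.start
        stats["stop_ok"] += best.stop == ref.stop
        stats["exact"] += best.exons == ref.exons
        stats["ref_introns"] += ref.n_introns
        stats["pred_introns"] += best.n_introns
    return stats


# Toy example with made-up coordinates (not data from the presentation).
reference = [GeneModel("+", ((100, 200), (300, 450))),
             GeneModel("-", ((1000, 1100), (1200, 1350)))]
predicted = [GeneModel("+", ((100, 200), (300, 450))),      # identical model
             GeneModel("-", ((1000, 1100), (1200, 1400)))]  # start differs
print(compare(reference, predicted))
# {'found': 2, 'start_ok': 1, 'stop_ok': 2, 'exact': 1, 'ref_introns': 2, 'pred_introns': 2}
```

Running compare() once per tool against the same reference set yields counts of the same kind as those quoted for SNAP, gene modeller and Projector. A real evaluation would also need to handle a single prediction spanning several reference genes (merged genes), which this sketch simply counts once per reference gene it overlaps.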
