Splicing Exons: A Eukaryotic Challenge to Gene Prediction Ian McCoy
Gene Prediction • Genes must be identified to make the genome useful • Computational Problem: Take a seemingly random sequence of characters, millions or billions of bases long, and find the genes.
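A Serious Complication • Only about 3% of the human genome codes for proteins. • In eukaryotes, a gene is split into exons separated by noncoding introns, so its coding sequence is not contiguous in the genome.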
Similarity-Based Approach • Instead of searching for a gene from scratch, use a known protein from a related organism as the target. • Find all local similarities between the genomic sequence and the target protein sequence. • Every genomic substring that reaches a chosen level of similarity is called a putative exon.
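The brute-force search for putative exons can be sketched as follows. This is a deliberately simplified illustration, not the method from the slides: it compares nucleotide strings directly and without gaps, whereas the real approach aligns genomic DNA against a protein target and allows gaps. The function names, the `min_len` and `threshold` parameters, and the scoring (matches minus mismatches) are all assumptions made for this example.

```python
def best_ungapped_score(window, target):
    """Best (matches - mismatches) of `window` against any same-length slice of `target`.

    Assumption: ungapped comparison of plain strings stands in for local alignment.
    Returns None when the window is longer than the target.
    """
    n = len(window)
    best = None
    for start in range(len(target) - n + 1):
        piece = target[start:start + n]
        matches = sum(a == b for a, b in zip(window, piece))
        score = matches - (n - matches)
        if best is None or score > best:
            best = score
    return best


def putative_exons(genome, target, min_len=5, threshold=5):
    """Enumerate every genomic substring and keep those scoring at least `threshold`.

    Each putative exon is returned as (l, r, w) with inclusive 0-based ends.
    """
    exons = []
    for l in range(len(genome)):
        for r in range(l + min_len - 1, len(genome)):
            score = best_ungapped_score(genome[l:r + 1], target)
            if score is not None and score >= threshold:
                exons.append((l, r, score))
    return exons


# Toy data: the target occurs exactly at genome positions 4..10.
exons = putative_exons("TTTTACGTACGTTTTT", "ACGTACG")
```

Even on this toy input the search returns many overlapping candidates, which is why a separate step is needed to pick a consistent subset of them.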
Exon-Chaining Problem • Use brute force to generate a set of putative exons. • Represent each exon with three parameters (l, r, w): its left end, its right end, and its weight (for example, its similarity score). • Find a maximum-weight chain of nonoverlapping putative exons.
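To make the problem statement concrete, here is a minimal sketch that treats each putative exon as an `(l, r, w)` tuple and finds the best chain by trying every subset. It is an exponential-time baseline, useful only for checking faster methods on tiny inputs. The function names and the example intervals are made up for illustration, and two exons are assumed to overlap when they share any position (ends are inclusive).

```python
from itertools import combinations


def is_chain(exons):
    """True if the (l, r, w) intervals are pairwise nonoverlapping."""
    ordered = sorted(exons)  # sort by left end
    return all(a[1] < b[0] for a, b in zip(ordered, ordered[1:]))


def best_chain_brute_force(exons):
    """Try every subset and return (weight, chain) for the heaviest valid chain."""
    best_weight, best_chain = 0, ()
    for size in range(1, len(exons) + 1):
        for subset in combinations(exons, size):
            if is_chain(subset):
                weight = sum(w for _, _, w in subset)
                if weight > best_weight:
                    best_weight, best_chain = weight, subset
    return best_weight, best_chain


# Hypothetical putative exons (l, r, w).
example = [(1, 5, 5), (2, 3, 3), (4, 8, 6), (6, 12, 10), (9, 10, 1), (11, 15, 7)]
weight, chain = best_chain_brute_force(example)
# weight is 17, from the chain (2,3,3), (4,8,6), (9,10,1), (11,15,7)
```

Note that the heaviest single exon, (6, 12, 10), is not part of the best chain: picking it blocks too many neighbours.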
Formulate as Graph Problem • Create a graph G with 2n vertices: n vertices are the starting (left) positions of exons and n vertices are the ending (right) positions of exons. • The set of left and right interval ends is sorted into increasing order. • There is an edge from l_i to r_i of weight w_i for each i from 1 to n, plus 2n-1 additional edges of weight 0 connecting adjacent vertices. • A maximum-weight chain corresponds to a longest path through G.
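The graph construction described above can be sketched like this. The representation (a sorted endpoint list plus an edge list of `(from, to, weight)` triples with 0-based vertex indices) and the name `build_exon_graph` are choices made for this example. Ties between a left end and a right end at the same position are broken by placing the left end first, which is an assumption; the slides do not say how ties are handled.

```python
def build_exon_graph(intervals):
    """Build the exon-chaining graph for a list of (l, r, w) intervals.

    Returns (endpoints, edges):
      endpoints: sorted list of (position, is_right, interval_index)
      edges:     list of (from_vertex, to_vertex, weight)
    """
    endpoints = sorted(
        [(l, 0, k) for k, (l, r, w) in enumerate(intervals)]
        + [(r, 1, k) for k, (l, r, w) in enumerate(intervals)]
    )
    # Map (interval index, is_right) to the vertex index in sorted order.
    vertex_of = {(k, is_right): i for i, (_, is_right, k) in enumerate(endpoints)}

    edges = []
    # 2n-1 zero-weight edges between adjacent vertices.
    for i in range(len(endpoints) - 1):
        edges.append((i, i + 1, 0))
    # One weighted edge per interval, from its left end to its right end.
    for k, (l, r, w) in enumerate(intervals):
        edges.append((vertex_of[(k, 0)], vertex_of[(k, 1)], w))
    return endpoints, edges


endpoints, edges = build_exon_graph([(1, 5, 5), (2, 3, 3), (4, 8, 6)])
```

For these three intervals the graph has 6 vertices, 5 zero-weight edges, and 3 weighted edges.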
Input: A set of weighted intervals (putative exons). Output: The total weight of a maximum-weight chain of nonoverlapping intervals from this set, which equals the length of the longest path in G.
Dynamic Programming Algorithm
ExonChaining(G, n) // graph, number of intervals
  for i ← 1 to 2n
    s_i ← 0
  for i ← 2 to 2n
    if vertex v_i in G corresponds to the right end of an interval I
      j ← index of the vertex for the left end of interval I
      w ← weight of interval I
      s_i ← max {s_j + w, s_(i-1)}
    else
      s_i ← s_(i-1)
  return s_2n
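A runnable Python sketch of the pseudocode above follows. It works directly on the list of `(l, r, w)` intervals and sorts the endpoints itself rather than taking a prebuilt graph, which is a convenience chosen for this example. As a further assumption, a left end is placed before a right end at the same position, so intervals that touch count as overlapping. The example intervals are hypothetical.

```python
def exon_chaining(intervals):
    """Return the total weight of a maximum-weight chain of (l, r, w) intervals."""
    # One entry per vertex: (position, is_right, interval_index), in sorted order.
    endpoints = []
    for k, (l, r, w) in enumerate(intervals):
        endpoints.append((l, 0, k))
        endpoints.append((r, 1, k))
    endpoints.sort()

    # s[i] is the best chain weight using only vertices 1..i; s[0] = 0 is a sentinel.
    s = [0] * (len(endpoints) + 1)
    left_vertex = {}  # interval index -> vertex index of its left end
    for i, (_, is_right, k) in enumerate(endpoints, start=1):
        if is_right:
            j = left_vertex[k]
            w = intervals[k][2]
            # Either end the chain with interval k, or skip it.
            s[i] = max(s[j] + w, s[i - 1])
        else:
            left_vertex[k] = i
            s[i] = s[i - 1]
    return s[-1]


example = [(1, 5, 5), (2, 3, 3), (4, 8, 6), (6, 12, 10), (9, 10, 1), (11, 15, 7)]
best = exon_chaining(example)  # 17
```

After sorting, the loop is linear in the number of endpoints, so the whole sketch runs in O(n log n) time. It returns only the weight; recovering the chain itself would need back-pointers recording which branch of the `max` was taken.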
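Shortcomings • A large number of short exons makes putative exons harder to find reliably. • Exons may be out of order: the chain ignores where each putative exon matches the target, so the first exon in the chain may match a later part of the target than the second.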
Any Questions? • Reference: Jones, Neil C., and Pavel A. Pevzner. An Introduction to Bioinformatics Algorithms. Cambridge: MIT Press, 2004, pp. 200-203.