Non-coding RNA. William Liu. CS374: Algorithms in Biology. November 23, 2004.
Non-Coding RNA • Background Basics • Biology Overview • Why ncRNA - Central Dogma? • Problem Space • HMM/sCFG Solution • Paper • Pair HMMs on Tree Structures • Alignment of Trees, Structural Alignment • Experimental Evaluation • Conclusion
Biology Overview • Traditional view (Central Dogma): RNA merely plays an accessory role between DNA and protein • Under this view, complexity is defined by the proteins encoded in the genome
Biology Overview • Non-coding RNA (ncRNA) is an RNA molecule that functions without being translated into a protein • Most prominent examples: transfer RNA (tRNA), ribosomal RNA (rRNA)
Why Non-coding RNA • Protein-coding genes can’t account for all complexity • ncRNA is important! • ncRNAs can act as gene regulators (Szymanski and Barciszewski, “Beyond The Proteome: Non-coding Regulatory RNAs”, Genome Biol., 2002)
Non-coding RNA Problems • Finding ncRNA genes in the genome: locate these genes • Finding Homologs of ncRNA: figure out what they do
Finding ncRNA Genes • Protein Approaches • Statistically biased sequence (codon triplets) • Open Reading Frames • ncRNA Approaches • High GC content (hyperthermophiles) • Promoter/Terminator identification (E. coli) • Comparative Genome Analysis (both protein and ncRNA approaches)
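The "High GC content" bullet can be illustrated with a sliding-window scan. This is only a minimal sketch: the function names, window size, step, and threshold are hypothetical choices made for illustration, not values from the talk or from any published gene finder.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_rich_windows(genome: str, window: int = 50, step: int = 10,
                    threshold: float = 0.6):
    """Yield (start, gc_fraction) for each window with GC fraction >= threshold.

    window, step, and threshold are illustrative assumptions.
    """
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        frac = gc_content(genome[start:start + window])
        if frac >= threshold:
            yield start, frac


# Toy usage: an AT-rich stretch followed by a GC-rich stretch.
hits = list(gc_rich_windows("AAAAGGGG", window=4, step=4, threshold=0.9))
```

In an AT-rich genome such as that of a hyperthermophile, windows that pass the threshold would only be candidates; they would still need confirmation by other evidence.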
Similarity Searching • Proteins • BLAST, Sequence Alignment (DP) • Genes that code for proteins are conserved across genomes (i.e., low rate of mutation) • ncRNA • Secondary structure is usually conserved, often more than the primary sequence • Alignment scoring based on structure is imperative
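The "Sequence Alignment (DP)" bullet refers to dynamic programming over two sequences. A minimal sketch of global alignment scoring follows, assuming a simple hypothetical scheme (match +1, mismatch -1, gap -1); real tools use substitution matrices and affine gap penalties.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of a and b.

    Uses the classic dynamic-programming recurrence, keeping only one
    previous row of the table. The scoring values are illustrative.
    """
    # Row 0: aligning the empty prefix of a against each prefix of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty string
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

A score of this kind sees only the primary sequence. Two ncRNAs whose paired bases have mutated together can keep the same structure while scoring poorly here, which is why the slide calls for structure-based scoring.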
Alignment Approaches • sCFGs: Modeling secondary structure, scoring sequences • HMM for scoring of sequence and secondary structure alignment
Pair HMMs on Tree Structures • Outline • Alignment on Trees • Structural Alignment • Secondary Structure Representation • Hidden Markov Model • Recurrence Relations • Experimental Evaluation • Future Work
Alignment on Trees • [Figure: two labeled trees over the nodes a, b, c, d, e, f, g, h, i]
Structural Alignment • Problem: Given an RNA sequence with known secondary structure and an RNA sequence of unknown structure, obtain the optimal alignment of the two • [Figure: example RNA sequences]
Structural Representation • Skeletal Tree • (, ): Branch structure • (X, , Y): Base pairs • (X, ) or (, Y): Unpaired bases • X, Y ∈ {A, U, G, C}
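To make the idea of a tree-shaped structure representation concrete, here is a minimal sketch that turns a sequence plus a dot-bracket structure string into a nested tree. It is not the exact skeletal-tree labeling from the paper: the `Node` class, the tuple labels, the artificial root, and the use of dot-bracket input are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Tree node; label is (X, Y) for a base pair or (X,) for an unpaired base."""
    label: tuple
    children: list = field(default_factory=list)


def structure_to_tree(seq: str, dot_bracket: str) -> Node:
    """Build a tree from a sequence and its dot-bracket secondary structure."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    root = Node(("root",))  # artificial root (an assumption of this sketch)
    stack = [root]
    for base, sym in zip(seq, dot_bracket):
        if sym == "(":
            node = Node((base, None))  # pairing partner is filled in at ")"
            stack[-1].children.append(node)
            stack.append(node)
        elif sym == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure: extra ')'")
            node = stack.pop()
            node.label = (node.label[0], base)
        elif sym == ".":
            stack[-1].children.append(Node((base,)))
        else:
            raise ValueError(f"unexpected structure symbol: {sym!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure: missing ')'")
    return root


# Toy hairpin: two stacked G-C pairs closing a three-base loop.
tree = structure_to_tree("GGAAACC", "((...))")
```

In this encoding a node with more than one base-pair child corresponds to a branch in the structure, base-pair nodes carry two bases, and unpaired bases are leaves.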
Hidden Markov Model • M: Match state, I: Insertion state, D: Deletion state • State transition probability from X to Y, for X, Y ∈ {M, I, D} • Initial probability of state X • Emission probability for the symbol pair (x, y)
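The three states and three kinds of probability listed above are those of a standard pair HMM. The sketch below runs Viterbi scoring for an ordinary pair HMM on two plain sequences; it is not the tree-structured extension discussed in this talk, and every numeric parameter (`INIT`, `TRANS`, the emission values) is a made-up illustrative assumption.

```python
import math

STATES = ("M", "I", "D")
NEG_INF = float("-inf")

# Illustrative parameters only; all values are assumptions.
INIT = {"M": 0.8, "I": 0.1, "D": 0.1}
TRANS = {
    "M": {"M": 0.8, "I": 0.1, "D": 0.1},
    "I": {"M": 0.6, "I": 0.3, "D": 0.1},
    "D": {"M": 0.6, "I": 0.1, "D": 0.3},
}


def emit_pair(a: str, b: str) -> float:
    """Emission probability of the aligned pair (a, b) in state M."""
    return 0.22 if a == b else 0.01  # 4 * 0.22 + 12 * 0.01 = 1.0


def emit_single(a: str) -> float:
    """Emission probability of one symbol against a gap (states I and D)."""
    return 0.25


def viterbi_pair_hmm(x: str, y: str) -> float:
    """Return the log-probability of the best state path aligning x and y.

    M consumes one symbol from each sequence, I consumes one from y,
    and D consumes one from x.
    """
    n, m = len(x), len(y)
    # v[s][i][j]: best log-prob of emitting x[:i] and y[:j], ending in state s.
    v = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in STATES}
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            for s in STATES:
                di, dj = {"M": (1, 1), "I": (0, 1), "D": (1, 0)}[s]
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue
                if s == "M":
                    e = emit_pair(x[i - 1], y[j - 1])
                elif s == "I":
                    e = emit_single(y[j - 1])
                else:
                    e = emit_single(x[i - 1])
                if pi == 0 and pj == 0:
                    best = math.log(INIT[s])  # first emission on the path
                else:
                    best = max(v[r][pi][pj] + math.log(TRANS[r][s])
                               for r in STATES)
                v[s][i][j] = best + math.log(e)
    return max(v[s][n][m] for s in STATES)
```

With these toy parameters, aligning a sequence to itself scores higher than aligning it to its reverse, since every step of the identical alignment can stay in the match state with a high-probability emission.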
Notation • Let $w = a_1 a_2 \ldots a_n$ be an unfolded RNA sequence of length $n$ • Let $w[i]$ denote the $i$-th symbol in $w$ • Let $w[i,j]$ denote the substring $a_i a_{i+1} \ldots a_j$ of $w$
Notation • Let $T$ be a skeletal tree representing a folded RNA sequence (known structure) • Let $v(j)$ denote the label of node $j$ in tree $T$ • Let $T[j]$ denote the subtree rooted at node $j$ in tree $T$ • Let $j_n$ denote the $n$-th child of node $j$ in tree $T$
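The notation on the last two slides is 1-based, which is easy to get wrong in 0-based languages. A small sketch of how it might map onto Python follows; the helper names and the toy tree stored in `CHILDREN` are hypothetical.

```python
def symbol(w: str, i: int) -> str:
    """w[i] in the slide notation: the i-th symbol of w, counting from 1."""
    return w[i - 1]


def substring(w: str, i: int, j: int) -> str:
    """w[i,j] in the slide notation: symbols i through j inclusive."""
    return w[i - 1:j]


# Toy tree (an assumption): node 1 has children 2 and 3; node 2 has 4 and 5.
CHILDREN = {1: [2, 3], 2: [4, 5], 3: [], 4: [], 5: []}


def child(j: int, n: int) -> int:
    """The n-th child of node j, counting from 1."""
    return CHILDREN[j][n - 1]


def subtree(j: int) -> list:
    """Nodes of T[j], the subtree rooted at node j, in preorder."""
    nodes = [j]
    for c in CHILDREN[j]:
        nodes.extend(subtree(c))
    return nodes
```

For example, `substring("GAUCA", 2, 4)` gives `"AUC"`, and `subtree(2)` gives `[2, 4, 5]`.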
Structural Alignment • Intuition: Given an ncRNA sequence b with unknown structure, generate a predicted folded structure for b and align the resulting tree with the tree of the ncRNA a, whose secondary structure is known • Complexity: $O(K M N^3)$ • K = number of states in the pair HMM, M = size of the skeletal tree, N = length of the unfolded sequence
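To get a feel for the $O(K M N^3)$ bound, here is a worked example with hypothetical sizes: $K = 3$ (assuming only the states M, I, D), a skeletal tree of size $M = 20$, and an unfolded sequence of length $N = 100$. Constant factors are ignored, so the numbers only indicate growth, not actual running time.

```latex
\begin{align*}
K M N^3 &= 3 \times 20 \times 100^3 = 6 \times 10^{7} \\
K M (2N)^3 &= 8 \, K M N^3 = 4.8 \times 10^{8}
\end{align*}
```

Doubling the length of the unfolded sequence multiplies the work by eight, while doubling the size of the skeletal tree only doubles it.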
Experimental Evaluation • The recurrence relations are computed by dynamic programming; a prototype system executes the algorithm • Experiments on two families of RNA: transfer RNAs and hammerhead ribozymes
Parameters Gorodkin et al. (1997)
Future Work • Since the method is based on dynamic programming (as in pairwise alignment), many DP techniques can apply • Refine the emission probabilities and the related score matrix (for reliable alignment of RNA families)
Conclusions • The ncRNA problem space is quite open: no really great techniques yet • How many ncRNA genes are there? • Absence of evidence ≠ evidence of absence • Eddy’s call to arms: “it is time for RNA computational biologists to step up”
References • Sakakibara, Y., “Pair Hidden Markov Models on Tree Structures”, Bioinformatics, 19(Suppl. 1):i232-i240, 2003 • Eddy, S., “Computational Genomics of Noncoding RNA Genes”, Cell, 109:137-140, 2002 • Szymanski, M., Barciszewski, J., “Beyond The Proteome: Non-coding Regulatory RNAs”, Genome Biol., 2002