
Non-coding RNA

Non-coding RNA. William Liu, CS374: Algorithms in Biology, November 23, 2004.


Presentation Transcript


  1. Non-coding RNA William Liu CS374: Algorithms in Biology November 23, 2004

  2. Non-Coding RNA • Background Basics • Biology Overview • Why ncRNA - Central Dogma? • Problem Space • HMM/sCFG Solution • Paper • Pair HMMs on Tree Structures • Alignment of Trees, Structural Alignment • Experimental Evaluation • Conclusion

  3. Central Dogma of Molecular Biology

  4. Biology Overview • Traditional view: RNA merely plays an accessory role • Complexity is defined by proteins encoded in the genome

  5. Biology Overview • Non-coding RNA (ncRNA) is an RNA molecule that functions without being translated into a protein • Most prominent examples: Transfer RNA (tRNA), Ribosomal RNA (rRNA)

  6. Why Non-coding RNA • Protein-coding genes can’t account for all complexity • ncRNA is important! • Gene regulators (Szymanski, M., Barciszewski, J., “Beyond The Proteome: Non-coding Regulatory RNAs”, Genome Biol., 2002)

  7. Non-coding RNA Problems • Finding ncRNA genes in the genome: locate these genes • Finding Homologs of ncRNA: figure out what they do

  8. Finding ncRNA Genes • Protein Approaches • Statistically biased (codon triplets) • Open Reading Frames • ncRNA Approaches • High GC content (hyperthermophiles) • Promoter/Terminator identification (E. coli) • Comparative Genome Analysis
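
As a rough illustration of the base-composition signal listed on this slide, the sketch below computes the fraction of G and C bases in sliding windows along a sequence. The function names, the window size, and the toy sequence are assumptions made for illustration; the presentation does not specify how the signal is measured.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_windows(seq, window, step=1):
    """Return (start, GC fraction) for every full window along seq."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "ATATATGCGCGCGCATAT"  # hypothetical sequence, GC-rich in the middle
    for start, frac in gc_windows(toy, window=6, step=3):
        print(start, round(frac, 2))
```

A real screen would compare each window against the genome-wide background, which is why this signal is most useful in AT-rich hyperthermophile genomes.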

  9. Genetic Code

  10. Similarity Searching • Proteins • BLAST, Sequence Alignment (DP) • Genes that code for proteins are conserved across genomes (i.e., low rate of mutation) • ncRNA • Secondary structure is usually conserved • Alignment scoring based on structure is imperative
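
The "Sequence Alignment (DP)" item can be made concrete with a minimal global-alignment score in the Needleman-Wunsch style. This is a generic textbook sketch, not code from the presentation; the scoring values (match +1, mismatch -1, gap -1) are assumptions.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of strings a and b under a linear gap cost."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # a[i-1] aligned to a gap
            left = score[i][j - 1] + gap    # b[j-1] aligned to a gap
            score[i][j] = max(diag, up, left)
    return score[-1][-1]
```

Because this score looks only at individual positions, two RNAs that fold into the same hairpin through compensating base changes can still score poorly, which is the motivation for structure-aware scoring.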

  11. ncRNA: Sequence vs Structure

  12. Alignment Approaches • sCFGs: Modeling secondary structure, scoring sequences • HMM for scoring of sequence and secondary structure alignment

  13. Pair HMMs on Tree Structures • Outline • Alignment on Trees • Structural Alignment • Secondary Structure Representation • Hidden Markov Model • Recurrence Relations • Experimental Evaluation • Future Work

  14. Alignment on Trees [Figure: two labeled trees with nodes a through i and their alignment]

  15. Structural Alignment • Problem: Given an RNA sequence with known secondary structure and an RNA sequence with unknown structure, obtain the optimal alignment of the two [Figure: a folded RNA with its secondary structure alongside an unfolded RNA sequence]

  16. Structural Representation • Skeletal Tree • $(\bullet, \bullet)$: Branch Structure • $(X, \bullet, Y)$: Base-pairs • $(X, \bullet)$ or $(\bullet, Y)$: Unpaired bases • $X, Y \in \{A, U, G, C\}$
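
One way to picture the branch, base-pair, and unpaired-base node types on this slide is to build such a tree from a sequence and its structure. The sketch below is a simplification under stated assumptions: the structure is given in dot-bracket notation (not used in the slides), nodes are plain tuples, and the decomposition order (leading unpaired base, then trailing unpaired base, then pair or branch) is one reasonable choice rather than the paper's definition.

```python
def match_table(structure):
    """Map each '(' position to its matching ')' position and back."""
    partner, stack = {}, []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            open_pos = stack.pop()
            partner[open_pos] = pos
            partner[pos] = open_pos
        elif ch != ".":
            raise ValueError("unexpected character: " + repr(ch))
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def skeletal_tree(seq, structure):
    """Build a nested-tuple tree; None marks an empty subtree.

    Node shapes (hypothetical encoding):
      ("pair", X, Y, child)   base pair X-Y enclosing child
      ("left", X, child)      unpaired base X before child
      ("right", Y, child)     unpaired base Y after child
      ("branch", left, right) two adjacent substructures
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    partner = match_table(structure)

    def build(i, j):  # half-open span [i, j), always balanced
        if i >= j:
            return None
        if structure[i] == ".":
            return ("left", seq[i], build(i + 1, j))
        if structure[j - 1] == ".":
            return ("right", seq[j - 1], build(i, j - 1))
        k = partner[i]
        if k == j - 1:
            return ("pair", seq[i], seq[k], build(i + 1, k))
        return ("branch", build(i, k + 1), build(k + 1, j))

    return build(0, len(seq))
```

For example, `skeletal_tree("GAAAC", "(...)")` yields a single pair node for G-C whose child is a chain of three unpaired A nodes.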

  17. Hidden Markov Model • M: Match state, I: Insertion state, D: Deletion state • State transition probability from X to Y, one for each pair XY • Initial probability of each state X • Emission probability for each symbol pair (x, y) • X, Y ∈ {M, I, D}
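
To show how the three states and the initial, transition, and emission probabilities fit together, here is a minimal Viterbi sketch for an ordinary pair HMM over two sequences. Note the substitution: the presentation's model runs on tree structures, while this sketch uses the simpler sequence-only case. All probabilities in `toy_parameters` are made up, and the convention that I consumes a symbol of the first sequence and D a symbol of the second is an assumption.

```python
import math

STATES = ("M", "I", "D")
# Symbols consumed from (x, y) by each state: M pairs x[i] with y[j],
# I emits a symbol of x against a gap, D emits a symbol of y against a gap.
STEP = {"M": (1, 1), "I": (1, 0), "D": (0, 1)}


def toy_parameters():
    """Made-up probabilities, chosen only so each distribution sums to 1."""
    init = {"M": 0.8, "I": 0.1, "D": 0.1}
    trans = {
        "M": {"M": 0.8, "I": 0.1, "D": 0.1},
        "I": {"M": 0.7, "I": 0.2, "D": 0.1},
        "D": {"M": 0.7, "I": 0.1, "D": 0.2},
    }
    emit_pair = {(a, b): (0.22 if a == b else 0.01) for a in "AUGC" for b in "AUGC"}
    emit_single = {a: 0.25 for a in "AUGC"}
    return init, trans, emit_pair, emit_single


def viterbi_log_prob(x, y, init, trans, emit_pair, emit_single):
    """Log probability of the most likely state path aligning x with y."""
    n, m = len(x), len(y)
    if n == 0 and m == 0:
        return 0.0  # nothing to emit
    # v[s][i][j]: best log probability for x[:i] and y[:j], ending in state s
    v = {s: [[float("-inf")] * (m + 1) for _ in range(n + 1)] for s in STATES}
    for i in range(n + 1):
        for j in range(m + 1):
            for s in STATES:
                di, dj = STEP[s]
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue  # state s cannot end at this cell
                if s == "M":
                    emission = emit_pair[(x[i - 1], y[j - 1])]
                elif s == "I":
                    emission = emit_single[x[i - 1]]
                else:
                    emission = emit_single[y[j - 1]]
                if pi == 0 and pj == 0:
                    best_prev = math.log(init[s])  # first state on the path
                else:
                    best_prev = max(
                        v[k][pi][pj] + math.log(trans[k][s]) for k in STATES
                    )
                v[s][i][j] = best_prev + math.log(emission)
    return max(v[s][n][m] for s in STATES)
```

Working in log space avoids underflow on longer sequences; the sketch assumes every probability is strictly positive, since `math.log(0)` raises an error.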

  18. Notation • Let $w = a_1 a_2 \dots a_n$ be an unfolded RNA sequence of length $n$ • Let $w[i]$ denote the $i$th symbol in $w$ • Let $w[i,j]$ denote the substring $a_i a_{i+1} \dots a_j$ of $w$

  19. Notation • Let $T$ be a skeletal tree representing a folded RNA sequence (known structure) • Let $v(j)$ denote the label of node $j$ in tree $T$ • Let $T[j]$ denote the subtree rooted at node $j$ in tree $T$ • Let $j_n$ denote the $n$th child of node $j$ in tree $T$
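
The notation on the two slides above is 1-based and inclusive, which is easy to get wrong in 0-based languages. The sketch below maps each piece of notation onto a small helper. The dictionary encoding of the tree, the node numbering, and the example labels are hypothetical.

```python
def substring(w, i, j):
    """w[i, j] in the slides' notation: symbols i through j, 1-based, inclusive."""
    if not 1 <= i <= j <= len(w):
        raise IndexError("need 1 <= i <= j <= len(w)")
    return w[i - 1:j]


# Hypothetical tree: node id -> (label, ordered list of child ids).
TREE = {
    1: (("G", "C"), [2]),   # a base-pair label
    2: (("A", None), [3]),  # an unpaired base on the left
    3: ((None, "U"), []),   # an unpaired base on the right
}


def label(tree, j):
    """v(j): the label of node j."""
    return tree[j][0]


def child(tree, j, n):
    """j_n: the n-th child of node j, counting from 1."""
    return tree[j][1][n - 1]


def subtree_nodes(tree, j):
    """Node ids of T[j], the subtree rooted at node j, in preorder."""
    nodes = [j]
    for c in tree[j][1]:
        nodes.extend(subtree_nodes(tree, c))
    return nodes
```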

  20. Recurrence Relation (Match)

  21. Recurrence Relation (Delete)

  22. Recurrence Relation (Insert)

  23. Structural Alignment • Intuition: Given an ncRNA sequence b with unknown structure, generate a predicted folded structure for b and align the resulting tree with the ncRNA a, whose secondary structure is known • Complexity: $O(K M N^3)$, where K = number of states in the pair HMM, M = size of the skeletal tree, N = length of the unfolded sequence
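
To get a feel for the stated bound, the sketch below plugs numbers into it. The breakdown is an assumption, not something the slide states: it supposes one table cell per combination of state, tree node, and substring w[i,j], with the extra factor of N coming from trying split points at branch nodes. Constant factors are ignored.

```python
def dp_cost_estimate(num_states, tree_size, seq_len):
    """Rough table size and work for a DP with an O(K M N^3) time bound."""
    substrings = seq_len * (seq_len + 1) // 2      # pairs (i, j) with i <= j
    cells = num_states * tree_size * substrings    # one cell per (state, node, i, j)
    work = num_states * tree_size * seq_len ** 3   # upper bound, constants dropped
    return cells, work


if __name__ == "__main__":
    # Hypothetical sizes: 3 states, a 30-node tree, a 76-base sequence.
    cells, work = dp_cost_estimate(3, 30, 76)
    print("table cells:", cells)
    print("work bound:", work)
```

Doubling the sequence length multiplies the work bound by eight, so the cubic term dominates for anything much longer than a tRNA.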

  24. Experimental Evaluation • Dynamic Programming to calculate recurrence relations, prototype system to execute algorithm • Experiments on 2 families of RNA: Transfer RNAs and Hammerhead Ribozyme

  25. Parameters Gorodkin et al. (1997)

  26. Results: tRNA

  27. Results: Hammerhead Ribozyme

  28. Future Work • Because the method is based on dynamic programming (as in pairwise alignment), many existing DP techniques can be applied • Refine the emission probabilities and the related score matrix (using reliable alignments of RNA families)

  29. Conclusions • ncRNA space is quite open: no really great techniques yet • How many ncRNA genes are there? • Absence of evidence ≠ evidence of absence • Eddy’s call to arms: “it is time for RNA computational biologists to step up”

  30. Thanks!

  31. References • Sakakibara, Y., “Pair Hidden Markov Models on Tree Structures”, Bioinformatics, 19:232-240, 2003 • Eddy, S., “Computational Genomics of Noncoding RNA Genes”, Cell, 109:137-140, 2002 • Szymanski, M., Barciszewski, J., “Beyond The Proteome: Non-coding Regulatory RNAs”, Genome Biol., 2002
