MicroRNA Detection Khan Shing CS374 May 8, 2008
Source: Science, 2 September 2005, Vol. 309, no. 5740, p. 1518
Outline
• Biological background
  • Gene regulation
  • microRNAs
• microRNA detection
  • Random forests
  • Comparative genomics
• microRNA target recognition
  • Site accessibility
Information Flow Source: http://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology
Gene Regulation
• Transcriptional regulation
  • Enhancers, promoters, transcription factors, epigenetic modifications
• Post-transcriptional regulation
  • mRNA processing, small RNAs
• Post-translational regulation
  • Protein activation, inhibition, degradation
microRNA
• Like proteins, RNA molecules fold and have primary, secondary, and tertiary structure
• The hairpin secondary structure is crucial to the processing of small RNAs
Source: Stark A. et al. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res. doi:10.1101/gr.6593807.
miRNA Processing Source: Zamore, P.D. and Haley, B. 2005. Ribo-gnome: The big world of small RNAs. Science 309: 1519–1524.
miRNAs Suppress Gene Expression Source: Zamore, P.D. and Haley, B. 2005. Ribo-gnome: The big world of small RNAs. Science 309: 1519–1524.
microRNA Detection Stark A. et al. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res. doi:10.1101/gr.6593807.
microRNA Detection
• Machine learning approach
  • Find characteristics that distinguish miRNAs
  • Use these features to train a model
• Random forests
  • Collection of many independently constructed classification trees
  • Each tree “votes” and the tallied votes yield a score
Source: Leo Breiman, Random Forests, Machine Learning, v.45 n.1, p.5-32, October 1 2001.
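The voting step can be illustrated with a minimal sketch. It assumes each trained tree is simply a function that returns a yes/no vote for one candidate hairpin; the feature names (`stem_length`, `conservation`, `loop_size`) and the thresholds are hypothetical placeholders, not the features or values used in the cited work.

```python
def forest_vote_score(trees, features):
    """Fraction of trees that vote 'miRNA' (True) for one candidate."""
    votes = [tree(features) for tree in trees]
    return sum(votes) / len(votes)


# Hypothetical stand-ins for trained trees: each maps a feature dict to a vote.
# The feature names and cutoffs are illustrative assumptions only.
toy_trees = [
    lambda f: f["stem_length"] >= 20,
    lambda f: f["conservation"] > 0.8,
    lambda f: f["stem_length"] >= 22 and f["conservation"] > 0.5,
    lambda f: f["loop_size"] < 15,
]

candidate = {"stem_length": 24, "conservation": 0.9, "loop_size": 30}
score = forest_vote_score(toy_trees, candidate)  # 3 of the 4 toy trees vote yes
```

Using the fraction of positive votes as the score gives a value between 0 and 1 that can be compared across candidates.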
How to Classify Objects? [Figure: tree diagram with nodes labeled Node B and Node C] Source: http://www.gmupolicy.net/its/incidentduration/image351.gif
Random Forest
N cases in the training set, M input variables
• Sample N cases at random, with replacement, from the original data. This sample will be the training set for growing the tree.
• At each node, m variables (m << M) are selected at random out of the M, and the best split on these m is used to split the node. The value of m is held constant while the forest is grown.
• Each tree is grown to the largest extent possible. There is no pruning.
Source: http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
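The three steps above can be sketched in plain Python. This is a simplified illustration, not the implementation used in the cited work: it assumes numeric features, 0/1 class labels, and Gini impurity as the split criterion (the slide does not name one). The toy data and all function names are made up for the example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Lowest-impurity (feature, threshold) among feature_ids, or None."""
    best, best_score = None, float("inf")
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # not a real split
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(rows, labels, m, rng):
    """Grow an unpruned tree; every node considers m randomly chosen features."""
    if len(set(labels)) == 1:
        return labels[0]  # pure node becomes a leaf
    feature_ids = rng.sample(range(len(rows[0])), m)
    split = best_split(rows, labels, feature_ids)
    if split is None:  # sampled features cannot separate these rows
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left], m, rng),
            grow_tree([r for r, _ in right], [y for _, y in right], m, rng))


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def grow_forest(rows, labels, n_trees, m, seed=0):
    """Grow n_trees trees, each on its own bootstrap sample of the N cases."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # N draws with replacement
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], m, rng))
    return forest


def forest_score(forest, row):
    """Fraction of trees voting for class 1."""
    return sum(predict_tree(t, row) for t in forest) / len(forest)


# Toy data (an assumption for illustration): N = 8 cases, M = 3 variables.
rows = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.7, 0.6, 0.9], [0.6, 0.7, 0.8],
        [0.1, 0.2, 0.3], [0.2, 0.1, 0.4], [0.3, 0.4, 0.1], [0.4, 0.3, 0.2]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
forest = grow_forest(rows, labels, n_trees=25, m=1)
high = forest_score(forest, [1.0, 1.0, 1.0])
low = forest_score(forest, [0.0, 0.0, 0.0])
```

One simplification to note: a node whose m sampled variables offer no usable split becomes a majority-vote leaf here, so a tree can occasionally stop slightly earlier than "the largest extent possible".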
Random Forest
• Trained on the Rfam data set of 60 cloned miRNAs and a random negative set (250 putative miRNA hairpins), using a variety of features
• 500 trees are constructed independently
Source: http://www.jfsowa.com/figs/bintree.gif
Comparative Genomics Source: CS262 Lecture 17, Win07, Batzoglou
Structural Features Compare the 60 cloned miRNAs in the Rfam database with ~760,000 random “miRNA-like” hairpins Source: Stark A. et al. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res. doi:10.1101/gr.6593807.
Conservation Features Source: Stark A. et al. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res. doi:10.1101/gr.6593807.
Discovery and validation of new miRNAs No single feature provides enough discriminatory power on its own, but combining the features in the trained model yields ~4,500-fold enrichment Source: Stark A. et al. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res. doi:10.1101/gr.6593807.
Discovery and validation of new miRNAs
• Rank all 760,355 putative miRNAs according to this combined score
• Identify 41 novel miRNA candidates
• Validate by sequencing and other methods
Source: Stark A. et al. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res. doi:10.1101/gr.6593807.
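The ranking step can be sketched as a sort by score. The candidate ids and scores below are invented for illustration, and the `top_k` cutoff is an assumption; the slides do not say how the 41 candidates were selected from the ranked list.

```python
def rank_candidates(scores, top_k):
    """Return the top_k candidate ids, highest combined score first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [candidate_id for candidate_id, _ in ranked[:top_k]]


# Hypothetical hairpin ids and forest scores.
toy_scores = {"hairpin_a": 0.12, "hairpin_b": 0.97,
              "hairpin_c": 0.55, "hairpin_d": 0.91}
top = rank_candidates(toy_scores, top_k=2)
```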
Results • Antisense strand miRNAs • miRNA* sequences Source: Stark A. et al. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res. doi:10.1101/gr.6593807.
Accurate Prediction of Mature miRNAs Source: Stark A. et al. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res. doi:10.1101/gr.6593807.
microRNA Target Recognition Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U. & Segal, E. The role of site accessibility in microRNA target recognition. Nat. Genet. 39, 1278–1284 (2007).
Motivation for looking at site accessibility
• Existing methods for finding miRNA targets rely mostly on sequence specificity
• But miRNAs act as part of a protein complex. The complex has physical size, so mRNA secondary structure can block its access to the target site
Source: Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U. & Segal, E. The role of site accessibility in microRNA target recognition. Nat. Genet. 39, 1278–1284 (2007).
Proof of Principle Source: Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U. & Segal, E. The role of site accessibility in microRNA target recognition. Nat. Genet. 39, 1278–1284 (2007).
How to use this fact?
• Develop an energy-based score to rate miRNA-target interactions
• ∆G: the free energy of a molecular interaction
• ∆∆G: the free energy gained when an miRNA binds to its target, minus the free energy lost in unpairing the secondary structure of the mRNA target sequence
Source: Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U. & Segal, E. The role of site accessibility in microRNA target recognition. Nat. Genet. 39, 1278–1284 (2007).
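Written as a formula, the score described above takes the following form. The subscripts "duplex" and "open" are labels for the two terms in the last bullet, and the sign convention (both terms negative, so a more negative ∆∆G means a more favorable interaction) is an assumption made here for the worked numbers, which are illustrative only.

```latex
\Delta\Delta G = \Delta G_{\text{duplex}} - \Delta G_{\text{open}}

% Illustrative values (not from the paper):
% \Delta G_{\text{duplex}} = -20 \text{ kcal/mol}, \quad \Delta G_{\text{open}} = -8 \text{ kcal/mol}
\Delta\Delta G = (-20) - (-8) = -12 \text{ kcal/mol}
```

In this example, a target site buried in stable secondary structure (a larger unpairing cost) would move ∆∆G toward zero even when the duplex itself is strong.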
Testing ∆∆G
• Correlates well with repression in luciferase assays
• Correlates even better when flanking regions are included
Source: Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U. & Segal, E. The role of site accessibility in microRNA target recognition. Nat. Genet. 39, 1278–1284 (2007).
Comparison to other target predictors Source: Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U. & Segal, E. The role of site accessibility in microRNA target recognition. Nat. Genet. 39, 1278–1284 (2007).
Other figures Source: Stark A. et al. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res. doi:10.1101/gr.6593807.