Where Will They Strike Next? microRNA targeting tactics in the war on gene expression Jeff Reid Miller “Lab” Baylor College of Medicine
Outline • Introduction to miRNAs • The “ask Bartel” model for targeting • Our proposed model • Discuss predictions made by our model • Not all positions on the miRNA are equal • A given miRNA’s targets share function • Have a quantitative model that does not suffer from the arbitrariness of ask Bartel
Plant microRNAs • This talk is about plant miRNAs • Animal miRNAs different, more complicated • If you want to know more about them ask Tuan Tran! • What is a microRNA (miRNA)? • ~21nt single-stranded non-coding RNAs • Processed from stem/loop precursors • Bind to mRNA in the cytoplasm • Regulate genes • Often relevant to development
microRNA biogenesis (conventional wisdom) • miRNA gene is transcribed, producing primary transcript • pri-miRNA processed by Dicer… • …producing miRNA duplex • duplex moves out of the nucleus • helicase activity unzips duplex • mature miRNA forms RNA-induced silencing complex (RISC) • RISC recognizes a target site • Targeted mRNA is regulated (mRNA cleavage or translational repression) Figure from Bartel, D.P. (2004). Cell 116, 281-297.
Target “Acquisition” • How does the RISC identify target sites? • Based solely on mature miRNA sequence • Consistent with all known examples • “Just” string manipulation • With that in mind, consider a simple model… • Targets have a small “mismatch score” M • Count non-Watson-Crick (non-WC) pairs in the miRNA/target duplex • Score is independent of position [Figure: RISC-bound miRNA paired with an mRNA target site, strands labeled 5’ and 3’; the example duplex has M = 2]
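The “just string manipulation” point can be made concrete. A minimal sketch, assuming both sequences are written 5'→3' in uppercase RNA letters, that the duplex is antiparallel, and that G:U wobble pairs count as mismatches; the function names and the 21-mer are made up for illustration and are not from the talk:

```python
# Strict Watson-Crick pairs; treating G:U wobble as a mismatch is an assumption.
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def mismatch_score(mirna: str, site: str) -> int:
    """Count non-WC pairs when `mirna` and `site` (both 5'->3') pair antiparallel."""
    if len(mirna) != len(site):
        raise ValueError("miRNA and target site must have the same length")
    # miRNA position i pairs with site position len-1-i.
    return sum((m, t) not in WC_PAIRS for m, t in zip(mirna, reversed(site)))

def reverse_complement(rna: str) -> str:
    """Return the perfectly complementary target site for an RNA sequence."""
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[base] for base in reversed(rna))

mirna = "UAGCAUCGGAUCCGAUUAGCA"        # hypothetical 21-mer, not a real miRNA
perfect = reverse_complement(mirna)    # site with M = 0
print(mismatch_score(mirna, perfect))  # 0
```

Because the score ignores where the mismatches fall, two sites with the same M are indistinguishable under this model.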
Complementarity Model* • Look for 21-mers (mRNA sequence) with M < 4 • Find targets… • mir172a1 [AP2: APETALA2 transcription factor]: At5g60120(1) At4g36920(2) At2g28550(3) At5g67180(3) At5g12900(3) • …turns out most targets of a given miRNA are in genes which share a common function • There are some ask Bartel elements to the model • M = 4 targets sharing function included case-by-case • Single bulges are sometimes allowed (mir162, mir163) • Model specificity is problematic… *Rhoades et al. (2002). Cell 110, 513-520.
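The search step amounts to a sliding window over each mRNA. A minimal sketch under the same assumptions as the mismatch score (antiparallel pairing, G:U counted as a mismatch, no bulges); `find_targets` and the toy sequences are hypothetical names and data, not the authors' code:

```python
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def mismatch_score(mirna: str, site: str) -> int:
    """Number of non-WC pairs in the antiparallel miRNA/site duplex."""
    return sum((m, t) not in WC_PAIRS for m, t in zip(mirna, reversed(site)))

def find_targets(mirna: str, mrna: str, max_m: int = 4):
    """Return (offset, site, M) for every window of len(mirna) with M < max_m."""
    n = len(mirna)
    hits = []
    for start in range(len(mrna) - n + 1):
        site = mrna[start:start + n]
        m = mismatch_score(mirna, site)
        if m < max_m:
            hits.append((start, site, m))
    return hits

comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
mirna = "UAGCAUCGGAUCCGAUUAGCA"                        # hypothetical 21-mer
perfect = "".join(comp[b] for b in reversed(mirna))    # perfectly paired site
mrna = "AAAA" + perfect + "CCCC"                       # toy transcript
hits = find_targets(mirna, mrna)
```

Raising `max_m` to 5 corresponds to the looser cut that finds more targets at the price of more false positives.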
Selectivity and Specificity • Selectivity (false negatives) • Bartel’s model finds “everything” for M < 5 • Putative targets from this model (most confirmed by experiment) define the target population • Specificity (false positives) • Bartel’s model is problematic • M < 5 includes many false positives • M < 4 and qualitative ask Bartel elements are necessary for model specificity • Our goal is to develop a quantitative model
Position Dependent Model • Ask Bartel has been spectacularly successful • Build on existing model & make it quantitative • No a priori justification of position-independence • assumed by the ask Bartel model • Extend to a position-dependent mismatch model • Assign mismatch at position i weight b_i • For ask Bartel model b_i = 1 • Quantify target “strength” with binding probability • p_t is the probability of finding the miRNA bound to target site t in the mRNA population
Boltzmann factors • m = miRNA* sequence, t = target site sequence, b = mismatch parameters • Now “mismatch score” is position-dependent • Boltzmann factor gives binding probability • Quantitative model built, but how to find b_i? [Figure: miRNA/target duplex in the RISC, strands labeled 5’ and 3’, with weights b_1 … b_5 marked at the mismatched positions]
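One way to read this slide, as a minimal sketch: assume the energy of a site is the sum of the weights b_i over mismatched positions (in units of kT), and that the binding probability is the Boltzmann factor exp(-E) normalized over the candidate sites. The normalization, the 0-based indexing from the miRNA 5' end, and all names below are assumptions, not taken from the talk:

```python
import math

WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def site_energy(mirna, site, b):
    """E = sum of b[i] over miRNA positions i whose pair is not Watson-Crick."""
    if not (len(mirna) == len(site) == len(b)):
        raise ValueError("mirna, site and b must have the same length")
    return sum(w for m, t, w in zip(mirna, reversed(site), b)
               if (m, t) not in WC_PAIRS)

def binding_probabilities(mirna, sites, b):
    """p_t = exp(-E_t) / Z, with Z summed over the candidate sites (assumed)."""
    weights = [math.exp(-site_energy(mirna, s, b)) for s in sites]
    z = sum(weights)
    return [w / z for w in weights]

comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
mirna = "UAGCAUCGGAUCCGAUUAGCA"                       # hypothetical 21-mer
perfect = "".join(comp[x] for x in reversed(mirna))   # starts with U
one_off = "A" + perfect[1:]                           # one mismatch, at miRNA position 21
b_uniform = [1.0] * len(mirna)                        # b_i = 1 recovers E = M
p = binding_probabilities(mirna, [perfect, one_off], b_uniform)
```

With every b_i set to 1 the energy reduces to the mismatch score M, so the position-independent model is a special case.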
Model Comparison • Follow DNA binding protein example* • Consider a thought experiment… • Mix many copies of the genome and N copies of the protein and count the number of examples of protein bound to site t • f_t = n_t / N • If the model works, f_t and p_t must agree! • Determine b_i by looking for this agreement • Maximize the probability that the data (f_t) could have come from the model (p_t)… *Brown, C.T., and Callan, C.G. (2004). Proc. Natl. Acad. Sci. 101, 2404.
Model Testing • Probability of data arising from our position-dependent mismatch model • Obtain best match of model to data by maximizing the log probability • Yields set of parameters b_i which maximizes the probability of getting the data from our model
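A hedged sketch of what “log probability” could look like here, assuming the N binding events are independent draws from the model distribution p_t(b), so the counts n_t = N f_t follow a multinomial law. This is the standard form for such a fit and is an assumption, not a formula quoted from the talk:

```latex
% Assumed multinomial likelihood; n_t = N f_t is the number of miRNAs bound at site t.
P(\{n_t\} \mid b) \;\propto\; \prod_t p_t(b)^{\,n_t}
\quad\Longrightarrow\quad
L(b) \;\equiv\; \frac{1}{N}\log P(\{n_t\} \mid b) \;=\; \sum_t f_t \,\log p_t(b) \;+\; \text{const},
\qquad
b^{*} \;=\; \arg\max_b L(b).
```

Under this form, L is largest when the model probabilities p_t track the measured fractions f_t as closely as the parametrization allows.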
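Optimization Cartoon • Maximize L to get b_i [Figure: cartoon of the optimization; inputs are the miRNA sequences (e.g. UAGCA…) and the data, the measured fractions bound f_1 … f_24; parameter controls b_1 … b_5 are adjusted until the model binding probabilities (p_24) match the measured fractions (f_24)]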
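The cartoon can be mimicked with plain gradient ascent. A minimal sketch, assuming p_t = exp(-E_t)/Z with E_t a weighted sum of 0/1 mismatch indicators and L = Σ f_t log p_t; the toy data, step size, and function names are made up, and the optimizer actually used in the talk is not stated:

```python
import math

def probabilities(b, rows):
    """p_t = exp(-E_t) / Z with E_t = sum_i b[i] * rows[t][i] (rows are 0/1 mismatch flags)."""
    weights = [math.exp(-sum(bi * x for bi, x in zip(b, row))) for row in rows]
    z = sum(weights)
    return [w / z for w in weights]

def log_likelihood(b, rows, f):
    """L(b) = sum_t f_t * log p_t(b); terms with f_t = 0 contribute nothing."""
    return sum(ft * math.log(pt)
               for ft, pt in zip(f, probabilities(b, rows)) if ft > 0)

def gradient(b, rows, f):
    """dL/db_i = sum_t (p_t - f_t) * rows[t][i], valid when sum(f) == 1."""
    p = probabilities(b, rows)
    return [sum((pt - ft) * row[i] for pt, ft, row in zip(p, f, rows))
            for i in range(len(b))]

def fit(rows, f, steps=2000, rate=0.5):
    b = [1.0] * len(rows[0])  # start from the position-independent model, b_i = 1
    for _ in range(steps):
        g = gradient(b, rows, f)
        b = [bi + rate * gi for bi, gi in zip(b, g)]
    return b

# Toy data: 5 sites, 3 positions; f is a made-up set of measured fractions.
rows = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
f = [0.50, 0.20, 0.10, 0.15, 0.05]
b_fit = fit(rows, f)
```

L is concave in b for this parametrization, so simple ascent is enough for a toy problem; a position never mismatched in any site with f_t > 0 would push its weight upward without bound, which is one reason a real fit needs care.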
Review • Application of this procedure to miRNAs • Optimize to get best agreement between • position-dependent mismatch model: p_g • Ask Bartel complementarity model: f_g • Equal binding probability for each training target • Minimal binding to everything else (background) • A contribution we made to the method • necessary to avoid overfitting
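The training data described here can be written down directly. A minimal sketch, assuming “equal binding probability” means a uniform share for each training target and “minimal binding” means zero for background sites; that reading, the function name, and the toy site labels are assumptions:

```python
def training_distribution(all_sites, training_sites):
    """Return {site: f} with equal weight on training targets and 0 on background."""
    training = set(training_sites)
    if not training:
        raise ValueError("need at least one training target")
    missing = training - set(all_sites)
    if missing:
        raise ValueError(f"training targets not among candidate sites: {missing}")
    share = 1.0 / len(training)
    return {site: (share if site in training else 0.0) for site in all_sites}

# Toy labels standing in for candidate 21-mer sites.
sites = ["site_a", "site_b", "site_c", "site_d", "site_e"]
f = training_distribution(sites, ["site_a", "site_c"])
```

The resulting f plays the role of the measured fractions in the fit, with the complementarity model supplying the list of training targets.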
Multi-miRNA Optimization • Given the amount of data we have • This method would fail on DNA binding proteins • All miRNAs share the same machinery for target recognition (all form the RISC) • DNA binding protein recognition depends on each specific protein • Solution to our problem • Simultaneously optimize for several miRNAs
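Sharing one parameter set across miRNAs can be expressed as a summed objective. A minimal sketch, assuming each miRNA contributes its own table of 0/1 mismatch flags and its own training fractions, and that the joint objective is the plain sum of per-miRNA log-likelihoods; the equal weighting and all names are assumptions:

```python
import math

def log_likelihood(b, rows, f):
    """L(b) = sum_t f_t * log p_t(b), with p_t = exp(-E_t) / Z over this miRNA's sites."""
    weights = [math.exp(-sum(bi * x for bi, x in zip(b, row))) for row in rows]
    z = sum(weights)
    return sum(ft * math.log(w / z) for ft, w in zip(f, weights) if ft > 0)

def joint_log_likelihood(b, datasets):
    """Sum the per-miRNA objectives; every miRNA is scored with the same b."""
    return sum(log_likelihood(b, rows, f) for rows, f in datasets)

# Two toy miRNAs, each with its own candidate sites (3 positions for brevity).
mirna_1 = ([[0, 0, 0], [1, 0, 0], [0, 1, 1]], [0.5, 0.5, 0.0])
mirna_2 = ([[0, 0, 0], [0, 0, 1], [1, 1, 0]], [1.0, 0.0, 0.0])
b = [1.0, 1.0, 1.0]
total = joint_log_likelihood(b, [mirna_1, mirna_2])
```

Each miRNA keeps its own normalization Z, while the mismatch weights are pooled, which is what lets a handful of targets per miRNA constrain 21 parameters.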
Results - Parameters • Multi-miRNA optimization of nine Arabidopsis miRNAs • 157b, 159b, 160b, 164a, 165b, 167b, 168a, 171, 172a1 • A set of functionally diverse (21-mer) miRNAs [Figure: fitted weights b_i plotted against position i, with the 3’ and 5’ ends marked]
Position 14 • Mismatch at position 14 • Has no effect on a target’s binding probability! • Surprising and exciting because… • …this position is known to be special • mir162a target • At1g01040 DEAD/DEAH box helicase • Has a bulge at position 14 • This analysis did not include mir162a! • A provocative result… [Figure: mir162a paired with its target, strands labeled 5’ and 3’, positions 1, 14, 15 and 21 marked]
Results - Targets • Training targets should have low energy • Found by ask Bartel model • Reside in genes which share majority function • Targets in the background have high energy • Background targets with low energy are interesting • We are particularly interested in all the majority function targets for a given miRNA • Especially those which are not training targets • Look at distributions of target energies • For each value of M
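The per-M energy distributions can be tabulated with a small histogram routine. A minimal sketch, assuming each candidate site is summarized by its 0/1 mismatch flags, that the energy is the weighted sum of those flags, and that fixed-width bins are acceptable; the weights and sites below are toy values, not fitted results:

```python
from collections import Counter, defaultdict

def energy_histograms(mismatch_rows, b, bin_width=0.5):
    """Return {M: Counter({bin_index: count})} of site energies, grouped by mismatch count M."""
    hist = defaultdict(Counter)
    for row in mismatch_rows:
        m = sum(row)                                    # position-independent score
        energy = sum(bi * x for bi, x in zip(b, row))   # position-dependent energy
        hist[m][int(energy // bin_width)] += 1
    return hist

# Toy weights: position 2 costs nothing, position 1 costs the most.
b = [2.0, 0.0, 1.0]
rows = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 0, 1]]
hist = energy_histograms(rows, b)
```

In the toy data the two M = 1 sites land in different bins, showing how sites with the same mismatch count separate once positions carry different weights.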
mir165b -- HD-Zip [Figure: distributions N(E) of target energies; labeled groups: training targets, majority function, majority function not training targets!]
mir159b -- MYB [Figure: distributions N(E) of target energies]
Conclusions • Refined the qualitative complementarity model • A quantitative model which is much less arbitrary • Whatever we get, we get – not “ask Miller” • Majority function targets group together at low energy • Bartel finds most targets, our model finds all targets • Appropriate experiments could falsify our model • How important is position 14? • Look at some specific ask Bartel targets • Advanced technology of optimization • Resolution of the overfitting problem • Simultaneous optimization
Encoding of Networks • Networks • miRNA families • A single target mRNA can be regulated by different miRNAs • And a single miRNA can regulate many different mRNAs • Apparently an overlapping and probably redundant regulatory network • Encoding • All this regulation encoded in mere text! • How is this encoded in the sequence? • Why is it encoded in this way?
Acknowledgements • Miller Lab Posse • Jon Miller • Tuan Tran • Will Salerno • Gerald Lim • Curtis Callan (Princeton) • Keck Center for Computational and Structural Biology • BCM Biochemistry Department