RNA Folding
RNA Folding Algorithms
• Intuitively: given a sequence, find the structure with the maximal number of base pairs
• For nested structures, there are four possibilities for S(i,...,j):
  • i and j are paired with each other; add the pair to S(i+1,...,j-1)
  • i is unpaired; add it to S(i+1,...,j)
  • j is unpaired; add it to S(i,...,j-1)
  • i and j are paired, but not to each other; combine S(i,...,k) and S(k+1,...,j)
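The four cases can be collected into a single recurrence. This is a sketch of the standard base-pair maximization form; the indicator δ(i,j) (1 if bases i and j can pair, 0 otherwise) and the base cases are notation assumed here, not taken from the slide.

```latex
S(i,j) = \max \begin{cases}
S(i+1,\,j-1) + \delta(i,j) & \text{$i$ and $j$ pair with each other} \\
S(i+1,\,j) & \text{$i$ is unpaired} \\
S(i,\,j-1) & \text{$j$ is unpaired} \\
\max_{i \le k < j} \bigl[\, S(i,k) + S(k+1,j) \,\bigr] & \text{$i$ and $j$ pair, but not with each other}
\end{cases}
\qquad S(i,i) = 0, \quad S(i,i-1) = 0
```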
RNA Folding by DP
• Fill in a matrix of S(i,...,j) for every subsequence of S(0,...,seq_length)
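The matrix fill can be sketched in a few lines of Python. This is a minimal base-pair maximization (Nussinov-style) example, not the energy-based method used by tools such as mfold. The pairing rule (Watson-Crick plus G-U wobble), the function name `max_base_pairs`, and the optional `min_loop` parameter are assumptions made for illustration.

```python
# Pairing rule is an assumption: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=0):
    """Return the maximal number of nested base pairs in `seq`.

    S[i][j] holds the best score for the subsequence seq[i..j] (inclusive).
    `min_loop` is the minimum number of unpaired bases between paired
    positions (hypothetical knob; 0 reproduces plain pair counting).
    """
    n = len(seq)
    if n < 2:
        return 0
    S = [[0] * n for _ in range(n)]
    # Fill by increasing subsequence length so smaller problems are ready.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Cases 2 and 3: i unpaired, or j unpaired.
            best = max(S[i + 1][j], S[i][j - 1])
            # Case 1: i and j pair with each other.
            if (seq[i], seq[j]) in PAIRS and j - i > min_loop:
                best = max(best, S[i + 1][j - 1] + 1)
            # Case 4: i and j both pair, but not with each other (bifurcation).
            for k in range(i + 1, j):
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S[0][n - 1]
```

The triple loop makes the fill O(n^3) in time and O(n^2) in space. Recovering the structure itself would need a traceback over the matrix, which is omitted here.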
RNA Folding Assumptions
• RNA folding algorithms typically detect only nested structures and do not recognize pseudoknots
• Some folding algorithms identify pseudoknots, but they are typically inefficient or limited (e.g., they do not support stacking-dependent pairing models)
• Current algorithms get about 50-70% of the base pairs correct, on average
MicroRNAs: Introduction
• miRNAs are genomically encoded small RNAs
• Processed into single-stranded 21-23-mers
• Incorporated into an RNP complex (miRISC)
• miRISC binds to 3’UTRs, leading to repression of translation and modest mRNA degradation
[Figure: miRISC with Ago1]
Bartel, Cell 116, 2004
MicroRNA Transcription
• miRNA genes can be in intergenic and intronic regions
• miRNA genes can be clustered and co-expressed
• Estimates: 60% singletons, 25% introns, 15% clusters
MicroRNA Gene Conservation
• Some miRNAs are highly conserved (e.g. let-7)
• Conservation must preserve a dsRNA hairpin from which the miRNA is processed by Dicer
MicroRNA Gene Identification
• MicroRNA cloning
  • Map cloned ~22nt small RNAs to the genome
  • Predict pre-miRNA secondary structures using mfold
  • Score pre-miRNAs based on known miRNA precursors
• Computational identification
  • Identify conserved genomic segments
  • Predict pre-miRNA secondary structures using mfold
  • Score pre-miRNAs based on known miRNA precursors
MicroRNA Gene Identification
• More complex methods: additional features (MiRscan, miRseeker, …)
miRBase
• ~4500 miRNAs in 41 eukaryotes
• Examples: 474 human, 78 fly
• Eight viruses express microRNAs
MicroRNAs: Open Questions
• Promoter
• Transcriptional start site
• Transcriptional termination
• Transcriptional complex
• Regulation of miRNA expression
The Target Prediction Problem
• Target sites show imperfect sequence complementarity:
  • Strong match in 5’ region (‘seed’)
  • Varying complementarity on 3’ end
• Computational target predictions:
  • Sensitive to exact pairing rules
  • ~100 targets per miRNA within fly transcriptome
  • ~25% of transcriptome under miRNA regulation
• Existing algorithms:
  • Focus on quality of the sequence match between miRNA and mRNA target
  • Introduce various filters, e.g. evolutionary conservation
[Figure: miRNA (3’ to 5’) paired with mRNA (5’ to 3’), with seed positions numbered from the miRNA 5’ end; Brennecke et al. 05]
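The seed rule can be illustrated with a short Python sketch that scans an mRNA for perfect seed-complementary sites. The seed boundaries (miRNA positions 2-8, counted from the 5’ end) are an assumed default, since prediction tools differ on the exact pairing rules. The function names and the sequences in the usage note are invented for illustration and are not real miRNAs or transcripts.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the Watson-Crick reverse complement of an RNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seed_sites(mirna, mrna, seed_start=2, seed_end=8):
    """Return 0-based start positions in `mrna` (5'->3') that pair perfectly
    with the seed of `mirna` (5'->3').

    seed_start/seed_end are 1-based, inclusive miRNA positions; 2-8 is an
    assumed default, not a rule stated by any particular tool.
    """
    seed = mirna[seed_start - 1:seed_end]
    # The site on the mRNA is the reverse complement of the miRNA seed.
    site = reverse_complement(seed)
    hits = []
    pos = mrna.find(site)
    while pos != -1:
        hits.append(pos)
        # Step by one so overlapping matches are also reported.
        pos = mrna.find(site, pos + 1)
    return hits
```

For example, with the made-up miRNA `ACGUACGUAGCUAGCUAGCUAG` the seed is `CGUACGU`, so the function looks for `ACGUACG` in the mRNA. This covers only the strong 5’ match; 3’ complementarity, free energy, and conservation filters would be layered on top.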
miRanda
• Target prediction: sequence-based rules
• miRNA-target complementarity (strong in 5’, weaker in 3’)
• Refinement with binding free energy scores
• Use conservation to increase signal to noise
PicTar: Combinatorial Targets
• Filter: over 33% of the binding energy of the mature miRNA to a perfectly complementary site
[Figure: miRNA-mRNA duplexes with a perfect nucleus and an imperfect nucleus]
PicTar: Combinatorial Targets
[Figure: anchor]
PicTar: Combinatorial Targets
• Independence of binding sites (no overlapping)
• Transition does not depend on current state (memoryless)
• Competition between background and miRNA
[Figure: hidden Markov model generating an mRNA. Hidden states: background (b) and miRNAs 1…m. Prior (transition) probabilities: p0, p1, p2, p3, …, pm. Emission probabilities: binding-site sequences (e.g. GGCAUUAC, ACUGUAC, ACUGCAC) for miRNA states and single nucleotides A, C, U, G for background.]
Accessibility: The Missing Component
• What about target accessibility?
[Figure labels: miRISC, vs.]
Experimental Method
• Drosophila tissue culture cells (S2)
• No miRNA overexpression
  • Establish miRNA expression profile
  • Use endogenous miRNA (50-500 copies per cell): bantam, miR-2 family, miR-184
• Dual luciferase reporter assay
  • Renilla as experiment, firefly as internal control
  • Mild overexpression of target sequence (<10-fold)
  • No target degradation (20h transfection)
  • Sensitive, quantitative, linear assay
• UTR engineering
  • Mutate target site sequence
  • Mutate sequence surrounding the target site to alter mRNA secondary structure
[Figure: Renilla reporter carrying the 3’UTR; firefly control]
The Role of Secondary Structure
• N: ~200 bp fragment, native structure
• C: ~200 bp fragment, closed structure
[Figure: bar chart of normalized luciferase ratio (0.0-0.4) for rpr (miR-2) constructs 3’UTR, N, C, C3, C3+, C5, C5+; schematic of the 3’UTR and the ~200 b fragment, marking the target site and the target site 5’ end]
[Duplex, rpr (miR-2), bulged bases not shown: target 5’ CUCAUCAAAGC UUGUGAUA 3’ over miRNA 3’ GAGUAGUUUCG GACACUAU 5’]
Target Accessibility Matters
[Figure: bar charts of normalized luciferase ratio (0.0-0.4) in three panels. rpr (miR-2): 3’UTR, N, C, C3, C3+, C5, C5+; hid (bantam): 3’UTR, N, C; grim (miR-2): 3’UTR, N, C. The target site 5’ end is marked.]
[Duplexes, target over miRNA, bulged bases not shown. rpr: 5’ CUCAUCAAAGC UUGUGAUA 3’ over 3’ GAGUAGUUUCG GACACUAU 5’; hid: AAUUAGUUUUCA AAUGAUCUCG over UUAGUCGAAAGU UUACUAGAGU; grim: GCUC AUCAAAGC UUGUGAU over CGAG UAGUUUCG GACACUA]
Accessibility as Important as Sequence
[Figure: bar chart of normalized luciferase ratio (0.0-0.7) for rpr (miR-2) constructs 3’UTR, N, C, C3, C3+, C5, C5+ and target site mutations M2, M3, M6, I5, D5, D5+3; mutations are marked on the target sequence CUCAUCAAAGC UUGUGAUA, with positions numbered 1-8 and the target site 5’ end indicated]
[Duplex, rpr (miR-2), bulged bases not shown: target 5’ CUCAUCAAAGC UUGUGAUA 3’ over miRNA 3’ GAGUAGUUUCG GACACUAU 5’]
Thermodynamic miRNA::RNA Model
• ∆G5 = -15.1
• ∆G3 = -10.2
• ∆G = -25.3
[Figure: miRNA bound to its target site in the UTR]
Thermodynamic miRNA::RNA Model
• Folding area = target + 70 bp
• ∆G0 = -28.3
• ∆G1 = -19.5
• ∆Gopen = ∆G0 - ∆G1
[Figure: mRNA schematic with CDS, UTR, and Poly(A)]
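The numbers on these two slides can be combined in a short worked example. The slides give ∆G5, ∆G3, their total ∆G, and ∆Gopen = ∆G0 - ∆G1. The final step, ∆∆G = ∆Gduplex - ∆Gopen, is an assumption about how the two terms are combined and is not spelled out on the slide, as is reading the total ∆G as the duplex energy. Function names are invented for illustration.

```python
import math


def duplex_energy(dg5, dg3):
    """Total miRNA-target binding energy as the sum of the 5' and 3' parts."""
    return dg5 + dg3


def opening_energy(dg0, dg1):
    """dG_open = dG0 - dG1, as given on the slide."""
    return dg0 - dg1


def ddg(dg_duplex, dg_open):
    """Assumed combination: energy gained by binding minus the opening term."""
    return dg_duplex - dg_open


# Values taken from the slides.
dg_duplex = duplex_energy(-15.1, -10.2)   # about -25.3
dg_open = opening_energy(-28.3, -19.5)    # about -8.8
net = ddg(dg_duplex, dg_open)             # about -16.5 under the assumed definition
```

Under this reading, a site buried in stable secondary structure has a large-magnitude ∆Gopen, which pulls ∆∆G toward zero even when the duplex itself is strong.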
∆∆G Predicts Measured Repression
• 22 constructs altering accessibility of target sites in rpr, hid, grim
[Figure: scatter plots of normalized luciferase ratio (0.0-0.4) for rpr, hid, and grim constructs against ∆∆G (r=0.7, p<4x10^-4) and against ∆Gduplex (r=0.36, p<0.11)]
[Figure: exploring flank size. Heatmap of r (0.68-0.76) over upstream and downstream flank lengths (0-25 bp each); ∆∆G with flank 17 up, 13 down gives r=0.77, p<3x10^-5]
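The r values on this slide compare a predicted energy with the measured luciferase ratio across constructs. Assuming r denotes the Pearson correlation coefficient (the slide does not say), it could be computed as in the sketch below. The function name is invented, and the numbers in the usage note are synthetic, not data from the experiment.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

As a sanity check, perfectly linear synthetic inputs such as `pearson_r([1, 2, 3, 4], [2, 4, 6, 8])` should give a value very close to 1. Repeating the calculation for each choice of upstream and downstream flank length would reproduce the kind of scan summarized in the flank-size panel.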
Native Target Analysis
• 12 miR-184 targets with weaker 3’ pairing, tested in different backgrounds to alter secondary structure
• Non-redundant set of 190 experimentally tested miRNA:mRNA target pairs in Drosophila
[Figure panels: miR-184 targets; 190 validated targets. Axes: measured repression differential vs. ∆∆G differential; r=0.87. Schematic of miRNA (3’ to 5’) paired with mRNA (5’ to 3’), seed positions numbered 1-9.]
Genome-Wide Target Analysis
• miRNA target seeds favor highly accessible regions of the genome
[Figure: overrepresentation vs. random as a function of accessibility (∆Gopen), shown for fly and human]
Assignment
• Download the set of human microRNAs
• Download the set of human UTRs
• Download the mfold software
• For each microRNA, identify the set of targets on each UTR, defined by a perfect match to the microRNA seed, bases 2-8
• Partition the targets of each microRNA into conserved and non-conserved targets (define a conservation cutoff)
• Compare the RNA accessibility of conserved and non-conserved targets for each microRNA
  • For each putative target, extract the 100 bases that surround it
  • Use mfold to compute the free energy of these 100 bases
  • Create a dot plot in which each point is a microRNA and the axes are the median (plot #1) or mean (plot #2) free energy of all conserved (x-axis) or non-conserved (y-axis) targets of that microRNA
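Two of the steps above, window extraction and the per-microRNA summary, can be sketched in Python. This is one possible reading of the assignment rather than a reference solution. Treating "the 100 bases that surround it" as a window centered on the site is an assumption, as are the function names and the input layout. The free energies themselves are expected to come from running mfold separately, and no mfold invocation is shown.

```python
from statistics import mean, median


def flanking_window(utr, site_start, site_len, window=100):
    """Return up to `window` bases of `utr` centered on a target site.

    The window is shifted inward near the UTR ends so that it keeps its
    full length whenever the UTR is long enough (assumed behavior).
    """
    center = site_start + site_len // 2
    start = max(0, center - window // 2)
    end = min(len(utr), start + window)
    start = max(0, end - window)
    return utr[start:end]


def summarize(energies_by_mirna):
    """Turn per-site free energies into plot coordinates.

    `energies_by_mirna` maps a microRNA name to
    {"conserved": [...], "nonconserved": [...]} lists of free energies.
    Returns name -> {"median": (x, y), "mean": (x, y)} with conserved on
    the x-axis and non-conserved on the y-axis.
    """
    points = {}
    for name, groups in energies_by_mirna.items():
        cons, non = groups["conserved"], groups["nonconserved"]
        if not cons or not non:
            continue  # a point needs both coordinates
        points[name] = {
            "median": (median(cons), median(non)),
            "mean": (mean(cons), mean(non)),
        }
    return points
```

The resulting coordinates can be handed to any plotting tool. Points falling off the diagonal would indicate a difference in accessibility between conserved and non-conserved targets of that microRNA.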