An extensive map of RNA-protein interactions in Drosophila melanogaster
Marcus Stoiber, Biostatistics, PhD student, UC Berkeley
Gemma May, Mike Duff, Robert Obar, Spyros Artavanis-Tsakonas, Ben Brown, Brenton Graveley, and Susan Celniker
Outline
• RIP-seq, Datasets & QC Overview
• Differential Binding (DB) Analysis Pipeline
• Network, Clustering Analysis
• RBPs as global post-transcriptional regulators
• Hotspot RNAs
• RBPs bind the RNA of other RBPs
• Clustering RNA Binding Proteins (RBP)
• Comparison to Related Studies
• RNAi of RBPs
• RIP-chip in Yeast
• Motif Enrichment
RIP-seq Overview
• RNA immunoprecipitation followed by sequencing (RIP-seq)
• Identifies all RNA binding partners for a single RBP
• Smaller studies have examined a few RBPs, but none has successfully surveyed many RBPs.
• RBP functions (post poly-A):
  • Export
  • Translational repression/activation
  • Localization
  • Signaling
[Figure labels: Spliceosome, Novel, EJC, hnRNP, SR Proteins; RNA, UTR, Intron, Exon]
RIP-seq Overview: RIP-seq Experiment
S2 cells → transfect with HA-tagged protein of interest → lyse cells → immunoprecipitation → confirm IP & elute RNA → Illumina sequencing → computational analysis
[Figure labels: RNA, Native Proteins; example read sequence]
Datasets & QC Overview: RIP-seq Data & QC
• 24 RBPs of interest (in biological duplicate)★:
  • Spliceosome Core: Cbp20, CG6227, CG6841, Rm62, Smn, snRNP-U1-70K, U2af50
  • Exon Junction Complex (EJC): Upf1
  • Heterogeneous Nuclear RNP: CG17838, elav, msi, mub, ps
  • Novel: Fmr1, qkr54B, qkr58E-1
  • Other: RpS3, eIF3-S4
  • SR Proteins: B52, Rbp1, SC35, SF2, Srp54, tra2
• 4 controls★✚ (empty vector with HA-tag) and 4 non-RBP★✚ negative control experiments
★ Confirmed via analysis of the sequence adjacent to the HA-tag
✚ Confirmed via leave-one-out DE analysis
Differential Binding (DB) Analysis Pipeline
Aligned reads → count RNA totals → locus-level read counts → DESeq on each replicate separately, with locus dispersion estimated across all samples and controls → p-values for each RNA-RBP replicate → Irreproducible Discovery Rate (IDR) → IDR value for each RNA-RBP pair
• Theory of IDR: Li, Brown, Huang, Bickel. Measuring Reproducibility of High-Throughput Experiments.
• Practice of using IDR: Landt, et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia.
Network, Clustering Analysis: RBP Network, Biological Findings
• Hotspot RNAs
  • RNAs of post-transcriptional regulators
  • RNAi, NMD, RBPs, splicing
• RBPs bind the RNA of other RBPs
  • Confirms a predicted phenomenon in metazoans [1]
• RBP-specific RNA partners
• Characterization of RBPs of unknown function
  • Guilt by association
• Differential Exon Usage Example (lncRNA)
[1] Kosti, I., Radivojac, P. & Mandel-Gutfreund, Y. An integrated regulatory network reveals pervasive cross-regulation among transcription and splicing factors. PLoS Comput Biol 8, e1002603 (2012).
Hotspot RNAs: RBP Network, 30,000-Foot View
"Hotspot" RNAs
• Bound by 17: Hsp26
• Bound by 16: Smg5
• Bound by 15: Cdc5, CG12065, CG3008, CG8801, Ranbp9, Rpn10
GO term enrichment for hotspot RNAs:
• Splicing
• NMD
• RNAi
• Neurogenesis
• Protein Folding
*** Indicates a global translational regulation mechanism for RBPs ***
* Poisson-binomial distribution, assuming none of the 5191 RNAs are actually differentially bound.
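The Poisson-binomial null in the footnote can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that, if no RNA is truly bound, each RBP calls a given RNA "bound" independently at its own rate, so the number of RBPs binding one RNA is a sum of independent Bernoulli variables. The `call_rates` values are hypothetical placeholders; only the 5191 RNA total comes from the slide.

```python
def poisson_binomial_pmf(probs):
    """Return [P(X=0), ..., P(X=n)] for X = sum of independent Bernoulli(p_i)."""
    pmf = [1.0]  # distribution of the empty sum: X = 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this RBP does not call the RNA
            nxt[k + 1] += mass * p       # this RBP calls the RNA
        pmf = nxt
    return pmf


def prob_bound_by_at_least(probs, k):
    """P(X >= k): chance that k or more RBPs call the same RNA under the null."""
    return sum(poisson_binomial_pmf(probs)[k:])


# Hypothetical per-RBP call rates (fraction of RNAs each RBP is called bound to).
call_rates = [0.02, 0.05, 0.10, 0.01]
n_rnas = 5191  # number of RNAs, from the slide footnote

# Expected number of RNAs bound by 3 or more RBPs purely by chance.
expected_hotspots = n_rnas * prob_bound_by_at_least(call_rates, 3)
print(expected_hotspots)
```

Comparing the observed number of RNAs bound by many RBPs against this expectation is one way to judge whether hotspots exceed chance.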
Hotspot RNAs
The hotspot RNA signal is driven by the strongest binders. For comparison, the slide shows a similar plot for a GO term that is significant in only four hnRNP factors.
[Figure not recovered]
RBPs bind the RNA of other RBPs: RBPs bind mRNAs of other RBPs & hotspot RNAs
RBPs bind the RNA of other RBPs: RBPs that bind many RNAs tend to have their mRNA bound by many RBPs
Correlation Statistics
• Raw correlation (Pearson): 0.7615207
  • P-value (permutation test) ≈ 0.002446714
• Raw correlation (Spearman): 0.718886
  • P-value (permutation test) ≈ 0.002336134
Possible Confounding Factors
• Driven by statistical power issues? (transparent red circles)
  • Partial correlation adjusted for normalized expression:
    • Pearson: 0.7857327 (p-value 5.849305e-09)
    • Spearman: 0.7243166 (p-value 1.477892e-06)
• Driven by native biological expression? (transparent blue circles)
  • Partial correlation adjusted for length-normalized expression:
    • Pearson: 0.7911933 (p-value 3.05612e-09)
    • Spearman: 0.750386 (p-value 1.968678e-07)
Length-normalized expression = (normalized expression / gene length) * mean(all gene lengths). Multiplying by the mean gene length keeps both normalizations on the same scale.
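The statistics on this slide can be illustrated with a small sketch. It is not the authors' code: it assumes the partial correlation is computed by regressing both variables on the confounder and correlating the residuals, and that the permutation test shuffles one variable; all function names are hypothetical. Only the Pearson version is shown.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, z):
    """Residuals of a simple least-squares regression of y on z."""
    n = len(y)
    my, mz = sum(y) / n, sum(z) / n
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return [b - my - slope * (a - mz) for a, b in zip(z, y)]


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    return pearson(residuals(x, z), residuals(y, z))


def length_normalized(expression, lengths):
    """(expression / gene length) * mean(all gene lengths), as on the slide."""
    mean_len = sum(lengths) / len(lengths)
    return [e / l * mean_len for e, l in zip(expression, lengths)]


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the Pearson correlation."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one keeps the p-value above zero
```

Here `x` would be the number of RNAs each RBP binds, `y` the number of RBPs binding that RBP's own mRNA, and `z` the (length-)normalized expression of the RBP's mRNA.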
RBP-specific RNA partners: RBPs Which Bind Unique RNAs
[Figure legend: hnRNPs, Core Proteins, SR Proteins, EJC]
• hnRNPs are more likely to have unique binding partners.
• Wilcoxon rank-sum test p-value: 0.0001907
Characterization of RBPs of unknown function: Functionally related RBPs bind functionally related mRNAs
• Define distance between two RBPs as:
  • Transcripts: raw overlap (Jaccard distance)
  • GO terms: weighted overlap (cosine distance)
• Dimension reduction by MDS
• Identified global coordination and potential classification of novel RBPs:
  • Fmr1: Spliceosome Core / SR
  • qkr54B and qkr58E-1: hnRNP
[Figure labels: protein phosphorylation, determination of adult lifespan, long-term memory, locomotor rhythm]
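The two distances named above can be sketched as follows. This is a minimal illustration with hypothetical names: each RBP's bound transcripts are a set of IDs, and its GO profile is a dictionary of term weights. The slide does not say how the GO weights were derived, so the weights are treated as given.

```python
import math


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two collections of transcript IDs."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def cosine_distance(u, v):
    """1 - cosine similarity of two {GO term: weight} dictionaries."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # no shared signal can be measured
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(profiles, dist):
    """Pairwise distances between RBPs; profiles maps RBP name -> profile."""
    names = sorted(profiles)
    return names, [[dist(profiles[a], profiles[b]) for b in names] for a in names]


# Hypothetical bound-transcript sets for three RBPs.
bound = {
    "rbpA": {"rna1", "rna2", "rna3"},
    "rbpB": {"rna2", "rna3", "rna4"},
    "rbpC": {"rna9"},
}
names, matrix = distance_matrix(bound, jaccard_distance)
```

The resulting matrix is the kind of input an MDS routine would take for the dimension-reduction step.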
Differential Exon Usage Example: lncRNA Locus
[Figure labels: CG33229, CR42862; Negative Controls, Srp54, B52]
Comparison to Yeast Study: Comparison to Yeast RIP-chip [1]
• "RBPs that bind many RNAs tend to have their mRNA bound by many RBPs" appears to be a metazoan-specific phenomenon.
• GO coordination appears to be stronger than transcript coordination between functionally related RBPs.
[1] Daniel J. Hogan, Daniel P. Riordan, André P. Gerber, Daniel Herschlag, Patrick O. Brown. Diverse RNA-Binding Proteins Interact with Functionally Related Sets of RNAs, Suggesting an Extensive Regulatory System.
Comparison to RNAi Study: Correspondence with RNAi
• 55 RBPs versus 2 control samples
• ~20 overlapping experiments with RIP
• In S2 cells
• One interpretation is that RIP hits can either:
  • Directly affect expression of an RNA (RNAi hit)
  • Localize / sequester an RNA (causing other RNAi hits)
[Figure labels: RNAi, RIP]
Motif Enrichment
• Current motif enrichment algorithms do not work in complex transcript space.
  • They target either DNA space or simple (yeast) transcript space.
• For each gene set of interest (red lines), random "matched" sets (grey lines) are chosen.
• Calculate hypergeometric p-values for each 7-mer in each random gene set.
• Plots show rank (x-axis) vs. raw p-value (y-axis).
• A correct null would follow a line with slope 1.
• The hypergeometric null is clearly not valid here, because of k-mer distributions within genes.
[Figure panels: ps, elav, B52]
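A per-k-mer hypergeometric p-value of the kind described above can be sketched like this. It is an illustration under assumptions, not the authors' pipeline: a gene counts as a "success" if its sequence contains the k-mer at least once, the background is the full gene universe, and the function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K contain the k-mer."""
    if k <= 0:
        return 1.0
    upper = min(K, n)
    num = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return num / comb(N, n)


def kmers(seq, k=7):
    """Set of all distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_pvalues(gene_set, background, k=7):
    """Enrichment p-value for every k-mer seen in the gene set.

    gene_set: iterable of gene names; background: dict gene name -> sequence
    (the background must include every gene in gene_set).
    """
    per_gene = {g: kmers(s, k) for g, s in background.items()}
    members = list(gene_set)
    N, n = len(background), len(members)
    pvalues = {}
    for kmer in set().union(*(per_gene[g] for g in members)):
        K = sum(1 for ks in per_gene.values() if kmer in ks)   # genes with k-mer
        x = sum(1 for g in members if kmer in per_gene[g])     # ... in the set
        pvalues[kmer] = hypergeom_upper_tail(x, N, K, n)
    return pvalues
```

Running the same function on each random matched set gives the grey curves against which the gene set of interest is compared.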
Motif Enrichment
• Clearly, some samples have more significant hits than others.
• Use the 95% quantile of the extreme p-value from each random gene set as the cutoff for enriched motifs in the gene set of interest.
• 2 of the 3 gene sets of interest yielded significant motifs with this method.
• These motifs closely match the in vitro motifs [1] for these factors.
• B52 cluster, top significantly enriched motifs: GAGGAGG, AGGAGGA, GGAGGAG, AGAAGGA
• elav cluster, significantly enriched motif: UUUUUUU
[1] Ray, Debashish, et al. "A compendium of RNA-binding motifs for decoding gene regulation." Nature 499.7457 (2013): 172-177.
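One reading of the cutoff rule is sketched below. The interpretation is an assumption: the "extreme p-value" of a random set is taken to be its smallest p-value, and the cutoff is the value that 95% of those minima exceed (the 5th percentile of the minima, by nearest rank). Function names and example p-values are hypothetical.

```python
def min_pvalue_cutoff(random_set_pvalues, percent=5):
    """Nearest-rank percentile of the smallest p-value in each random set.

    random_set_pvalues: list of {motif: p-value} dicts, one per random set.
    With percent=5, about 95% of random sets have no p-value below the cutoff.
    """
    minima = sorted(min(p.values()) for p in random_set_pvalues)
    rank = -(-percent * len(minima) // 100)  # integer ceil(percent * n / 100)
    return minima[max(rank, 1) - 1]


def enriched_motifs(pvalues, cutoff):
    """Motifs in the gene set of interest that beat the empirical cutoff."""
    return sorted(m for m, p in pvalues.items() if p < cutoff)


# Hypothetical p-values for 20 random matched sets and one set of interest.
random_sets = [{"AAAAAAA": i / 100, "CCCCCCC": 0.5} for i in range(1, 21)]
cutoff = min_pvalue_cutoff(random_sets)
hits = enriched_motifs({"GAGGAGG": 0.001, "UUUUUUU": 0.3}, cutoff)
```

Because the cutoff comes from matched random sets, it absorbs the k-mer composition bias that makes the raw hypergeometric p-values unreliable.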
Motif Enrichment: Plan
• Hypergeometric p-values do not give an accurate ranked list of enrichment.
• Instead, the number of random samples in which a motif is found less often than in the sample of interest gives a valid ranked list of enrichment.
  • Possibly add a filter for low-complexity regions prior to this step.
• Motifs found much more often within the sample of interest than within random samples are clustered.
  • Use edit distance between motifs, followed by k-means clustering on an n-dimensional MDS projection.
  • Align each set of motifs (ClustalW / Clustal Omega) and produce a PWM.
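The first two steps of this plan can be sketched as follows. This is an illustration with hypothetical names: motif counts are dictionaries of motif to occurrence count, and Levenshtein distance stands in for "edit distance" since the slide does not specify a variant. The k-means, MDS, and alignment steps are not shown.

```python
def exceedance_counts(observed, random_counts):
    """For each motif, count random samples where it occurs less often
    than in the sample of interest."""
    return {
        motif: sum(1 for rc in random_counts if rc.get(motif, 0) < count)
        for motif, count in observed.items()
    }


def ranked_motifs(observed, random_counts):
    """Motifs ordered from most to least consistently over-represented."""
    scores = exceedance_counts(observed, random_counts)
    return sorted(scores, key=lambda m: (-scores[m], m))


def edit_distance(a, b):
    """Levenshtein distance between two motifs (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]
```

Shifted versions of one motif, such as GAGGAGG and AGGAGGA from the B52 results, sit at a small edit distance and would tend to fall into the same cluster.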
Future Directions: Summary of Findings
• "RBPs are global regulators of post-transcriptional machinery"
  • Bind mRNAs of proteins involved in RNAi, NMD, splicing, protein folding
  • Bind mRNAs of other RBPs
• "RBPs which are master regulators must have their translation regulated by many RBPs"
  • Appears to be a metazoan-specific phenomenon
• hnRNPs tend to bind more specific RNA partners
• Characterization of 3 RBPs of previously unknown class
• Motif enrichment results match previous studies
Acknowledgments
• modENCODE Consortium
• LBNL: Ben Brown, Susan Celniker
• University of Connecticut Health Center: Brenton Graveley, Mike Duff, Gemma May
• Harvard Medical School: Robert Obar, Spyros Artavanis-Tsakonas