RIP – Transcript Expression Levels
Outline • RNA Immuno-Precipitation (RIP) • NGS on RIP & its alternatives • Alternate splicing • Transcription as a graph • Distribution of tags in exons • Pipeline on RIP-seq dataset
RNA Immuno-Precipitation (RIP) • Global identification of multiple RNA targets of RNA-Binding Proteins (RBPs) • Identify proteins associated with RNAs in RNP complexes • Identify subsets of RNAs that are functionally-related and potentially co-regulated
Sequencing on RIP • RIP-Chip • Noisy • May miss rare transcripts • RIP-RT-PCR • PCR introduces mutations • RIP tiling arrays • Very expensive • Too sensitive to ‘transcriptional noise’
NGS on RIP • RIP-Seq • A more complete and unbiased assessment of the global population of RNAs associated with an RNP complex • Minimizes the sequencing bias and high background known to affect the previously mentioned methods
Alternate Splicing • A simple example • Regions with the numbers of reads • Exon1: chr1:13113087-13113138(5,1); • Exon2: chr1:13113270-13113299(2,0); • Exon3: chr1:13113312-13113343(3,0); • Splice reads • chr1,13113107,13113138,chr1,13113312,13113343,3.0; • chr1,13113087,13113116,chr1,13113270,13113299,2.0; • [Figure: exons labelled as Exon_Num(Tags): Exon1(5), Exon2(2), Exon3(3)]
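The records on this slide can be read mechanically. The following is a minimal sketch, assuming the region format `chrom:start-end(tags,flag)` and the splice-read format `chrom,start,end,chrom,start,end,count` exactly as they appear above. The meaning of the second number in the parentheses is not stated in the source, so it is kept as an opaque `flag` field. The helper names `parse_region`, `parse_splice` and `containing_exon` are hypothetical.

```python
import re

# Pattern for a region record such as 'chr1:13113087-13113138(5,1);'
REGION_RE = re.compile(r"(\w+):(\d+)-(\d+)\((\d+),(\d+)\)")

def parse_region(text):
    """Parse one region record into a dict of chromosome, span, tags, flag."""
    match = REGION_RE.fullmatch(text.strip().rstrip(";"))
    if match is None:
        raise ValueError(f"unrecognised region record: {text!r}")
    chrom, start, end, tags, flag = match.groups()
    return {"chrom": chrom, "start": int(start), "end": int(end),
            "tags": int(tags), "flag": int(flag)}

def parse_splice(text):
    """Parse one splice-read record into its two segments and its count."""
    f = text.strip().rstrip(";").split(",")
    return {"left": (f[0], int(f[1]), int(f[2])),
            "right": (f[3], int(f[4]), int(f[5])),
            "count": float(f[6])}

def containing_exon(exons, segment):
    """Return the name of the exon that fully contains the segment, if any."""
    chrom, start, end = segment
    for name, exon in exons.items():
        if exon["chrom"] == chrom and exon["start"] <= start and end <= exon["end"]:
            return name
    return None

exons = {
    "Exon1": parse_region("chr1:13113087-13113138(5,1);"),
    "Exon2": parse_region("chr1:13113270-13113299(2,0);"),
    "Exon3": parse_region("chr1:13113312-13113343(3,0);"),
}
splices = [
    parse_splice("chr1,13113107,13113138,chr1,13113312,13113343,3.0;"),
    parse_splice("chr1,13113087,13113116,chr1,13113270,13113299,2.0;"),
]

# Each splice read becomes a weighted edge between the exons it touches.
edges = [(containing_exon(exons, s["left"]),
          containing_exon(exons, s["right"]),
          s["count"]) for s in splices]
print(edges)
```

In this example the splice reads leaving Exon1 (3 + 2) add up to its 5 tags, which may be why the slide treats it as the simple case.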
Alternate Splicing • A less ideal example • Regions with the numbers of reads • Exon1: chr4:145149018-145149181(29,0); • Exon2: chr4:145149265-145149402(8,0); • Exon3: chr4:146893298-146895275(116,1); • Splice reads • chr4,145149059,145149088,chr4,146894246,146894276,3.0; • chr4,145149374,145149402,chr4,146894470,146894498,2.0; • [Figure: exons labelled as Exon_Num(Tags): Exon1(29), Exon2(8), Exon3(116)]
Transcription as a Graph • From RNA-seq data, check the overlap of the tags • If a region has more than one tag, we call it an enriched region • Enriched regions are the nodes • Using the splice reads, we connect the enriched regions • These connections are the edges
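One way to turn the node definition above into code is sketched below. It assumes tags are given as inclusive (start, end) intervals on a single chromosome and that "overlap" means sharing at least one base; neither detail is specified in the slides. The tag coordinates are made up for illustration, and `enriched_regions` is a hypothetical helper name.

```python
def enriched_regions(tags, min_tags=2):
    """Merge overlapping tags and keep regions covered by >= min_tags tags.

    Returns a list of (start, end, tag_count) tuples sorted by position.
    """
    regions = []
    for start, end in sorted(tags):
        if regions and start <= regions[-1][1]:
            # Tag overlaps the current region: extend it and count the tag.
            regions[-1][1] = max(regions[-1][1], end)
            regions[-1][2] += 1
        else:
            # Tag starts a new candidate region.
            regions.append([start, end, 1])
    # "More than one tag" on the slide corresponds to min_tags=2.
    return [tuple(r) for r in regions if r[2] >= min_tags]

# Made-up tags: three overlapping, one isolated, two overlapping.
tags = [(100, 129), (110, 139), (125, 154), (300, 329), (500, 529), (510, 539)]
nodes = enriched_regions(tags)
print(nodes)
```

The isolated tag at 300 to 329 is dropped because a single tag does not make an enriched region under the slide's definition.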
Transcription as a Graph • Represent the transcriptome as a topologically sorted directed acyclic graph • Some errors observed in RME005 • Out-of-range edges in graphs • Self-looping nodes • Default action: ignore them
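The default action and the topological ordering can be sketched as follows. This assumes nodes are numbered 0 to n-1 and takes "out-of-range" to mean an edge whose endpoint is not a known node; the slides do not define the term, so that reading is an assumption. The ordering uses Kahn's algorithm, which is a standard choice and not necessarily what the original pipeline used. The edge list is made up.

```python
from collections import deque

def clean_edges(num_nodes, edges):
    """Split edges into kept ones and ignored ones (out-of-range or self-loop)."""
    kept, dropped = [], []
    for u, v in edges:
        in_range = 0 <= u < num_nodes and 0 <= v < num_nodes
        if in_range and u != v:
            kept.append((u, v))
        else:
            dropped.append((u, v))
    return kept, dropped

def topological_order(num_nodes, edges):
    """Return the nodes in topological order (Kahn's algorithm)."""
    successors = {n: [] for n in range(num_nodes)}
    indegree = [0] * num_nodes
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    ready = deque(n for n in range(num_nodes) if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != num_nodes:
        raise ValueError("graph contains a cycle")
    return order

# Made-up graph with one self-loop (2, 2) and one out-of-range edge (3, 7).
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (2, 2), (3, 7)]
kept, dropped = clean_edges(4, edges)
order = topological_order(4, kept)
print(kept, dropped, order)
```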
Distribution of Tags in Exons • rQuant – Courtesy of Regina Bohnert (FML, Tübingen)
RNA-seq RIP-seq • The previous results are from RNA-seq • Will we make similar observations on RIP-seq datasets? • And can we link those observations to transcript expression levels in the transcriptome?
Pipeline on RIP-seq dataset • Dataset RME005 is used • Use TopHat / ELAND to map the reads back to the genome • Generate a transcription graph for each transcript with alternate splicing • Express the paths of all transcripts in the graph as a set of linear equations • Use R to solve the linear equations
An example from RME005 • There are two transcripts • Path1: Exon1 -> Exon2 -> Exon4 • Path2: Exon1 -> Exon3 -> Exon4 • Exon1 to Exon4 have lengths L1 to L4 and read counts N1 to N4 • S1 to S4 are the numbers of splice reads • [Figure: splice graph of Exon1 to Exon4 with read counts N1 to N4 and splice-read counts S1 to S4]
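The two paths in this example can be recovered from the graph by enumerating every route from the first exon to the last. The sketch below assumes the graph is given as an adjacency dictionary and uses a simple depth-first search; `enumerate_paths` is a hypothetical helper name. It also derives, for each exon, which paths pass through it, which is the information needed to write one equation per exon.

```python
def enumerate_paths(successors, source, sink):
    """List every path from source to sink in a directed acyclic graph."""
    paths = []
    stack = [[source]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == sink:
            paths.append(path)
            continue
        # Push successors in reverse so paths come out in listed order.
        for nxt in reversed(successors.get(node, [])):
            stack.append(path + [nxt])
    return paths

# Adjacency for the two-transcript example on the slide.
successors = {
    "Exon1": ["Exon2", "Exon3"],
    "Exon2": ["Exon4"],
    "Exon3": ["Exon4"],
}
paths = enumerate_paths(successors, "Exon1", "Exon4")

# For each exon, a 0/1 coefficient per path: 1 if the path uses the exon.
exon_rows = {
    exon: [int(exon in path) for path in paths]
    for exon in ["Exon1", "Exon2", "Exon3", "Exon4"]
}
print(paths)
print(exon_rows)
```

Shared exons (Exon1 and Exon4) get a coefficient of 1 for both paths, while Exon2 and Exon3 each belong to a single path.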
Assumptions • The transcript expression levels are: • Path1: x1 • Path2: x2 • The read length is constant • The reads are uniformly sampled from the transcripts • Use density of reads instead of read_coverage • Differentiate reads on both long & short exons
Equations for linear programming • Objective function: minimize the sum of all d_i • Constraints • N1/L1 = x1 + x2 + d1 - d2 • S1/R = x1 + d3 - d4 • N2/L2 = x1 + d5 - d6 • S2/R = x1 + d7 - d8 • S3/R = x2 + d9 - d10 • N3/L3 = x2 + d11 - d12 • S4/R = x2 + d13 - d14 • N4/L4 = x1 + x2 + d15 - d16 • x1, x2 >= 0 • d_i >= 0 • The solution gives the values of x1, x2 and all d_i
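The linear program above is equivalent to a least-absolute-deviations fit: at the optimum, each pair of slack variables sums to the absolute residual of its equation. The slides solve it in R. The sketch below instead uses plain Python and, because there are only two unknowns, swaps the LP solver for an exhaustive check of candidate vertices (intersections of pairs of zero-residual lines and the bounds x1 = 0, x2 = 0). All counts and lengths are made-up illustration values, the symbol R is assumed to be the constant read length, and `solve_lad_2d` is a hypothetical helper name.

```python
from itertools import combinations

def solve_lad_2d(rows, rhs):
    """Minimise sum_i |a_i1*x1 + a_i2*x2 - b_i| subject to x1, x2 >= 0.

    Returns (cost, x1, x2). The minimum of this piecewise-linear convex
    objective is attained where two of the lines a_i . x = b_i, x1 = 0,
    x2 = 0 intersect, so every such intersection is tried.
    """
    lines = [(a1, a2, b) for (a1, a2), b in zip(rows, rhs)]
    lines += [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # the bounds x1 = 0, x2 = 0

    def cost(x1, x2):
        return sum(abs(a1 * x1 + a2 * x2 - b) for (a1, a2), b in zip(rows, rhs))

    best = None
    for (a1, a2, b), (c1, c2, d) in combinations(lines, 2):
        det = a1 * c2 - a2 * c1
        if abs(det) < 1e-12:
            continue  # parallel lines have no unique intersection
        x1 = (b * c2 - a2 * d) / det  # Cramer's rule
        x2 = (a1 * d - c1 * b) / det
        if x1 < -1e-9 or x2 < -1e-9:
            continue  # violates non-negativity
        x1, x2 = max(x1, 0.0), max(x2, 0.0)
        c = cost(x1, x2)
        if best is None or c < best[0]:
            best = (c, x1, x2)
    return best

# Made-up inputs: exon lengths L, read counts N, splice counts S, read length R.
L = [100, 50, 80, 120]
N = [30, 10, 8, 36]
S = [6, 6, 3, 3]
R = 30

# One row per constraint, in the order listed on the slide.
rows = [(1, 1), (1, 0), (1, 0), (1, 0), (0, 1), (0, 1), (0, 1), (1, 1)]
rhs = [N[0] / L[0], S[0] / R, N[1] / L[1], S[1] / R,
       S[2] / R, N[2] / L[2], S[3] / R, N[3] / L[3]]

total_deviation, x1, x2 = solve_lad_2d(rows, rhs)
print(total_deviation, x1, x2)
```

The vertex search is only practical for a handful of transcripts per gene; with more unknowns, the same objective and constraints would be handed to a general LP solver.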
Another problem • An implicit assumption on enriched regions in RME005 • RIP is known to be ~10% efficient • Noise will overwhelm true RNP targets • Should use total RNA as the control dataset • True-positive regions from RIP should be relatively more enriched with tags than the same regions in the total-RNA control
Handling the assumption • Obtain RNA-seq data from the same transcriptome source • Directly compare the RNA-seq and RIP-seq data • RIP-chip discriminates enriched regions by a >4-fold signal over RNA-chip data • Maybe 4-fold is the magic number? • Current tag distribution observed by Dr Li Guoliang • Non-uniform, as opposed to what rQuant has observed on RNA-seq
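The direct comparison could look like the sketch below. It assumes per-region tag counts are available for both libraries, normalises each count by its library's total tag count, and adds a pseudocount to avoid division by zero; none of these choices comes from the slides. The 4-fold cutoff is the tentative value mentioned above, the counts are made up, and the function names are hypothetical.

```python
def fold_enrichment(rip_count, rna_count, rip_total, rna_total, pseudocount=1.0):
    """Ratio of library-size-normalised RIP-seq tags to RNA-seq tags."""
    rip_rate = (rip_count + pseudocount) / rip_total
    rna_rate = (rna_count + pseudocount) / rna_total
    return rip_rate / rna_rate

def enriched_over_control(regions, rip_total, rna_total, min_fold=4.0):
    """Names of regions whose RIP signal exceeds the control by min_fold."""
    return [name for name, (rip, rna) in regions.items()
            if fold_enrichment(rip, rna, rip_total, rna_total) > min_fold]

# Made-up counts per region: (RIP-seq tags, RNA-seq tags).
regions = {
    "regionA": (99, 9),   # about 10-fold after the pseudocount
    "regionB": (19, 19),  # about 1-fold
    "regionC": (29, 9),   # about 3-fold
}
hits = enriched_over_control(regions, rip_total=1_000_000, rna_total=1_000_000)
print(hits)
```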