Biases in RNA-Seq data
Transcript length bias. Two transcripts of length 50 and 100 bases have the same abundance in a control sample. The expression of both transcripts is doubled in a treatment sample. The biological variance is the same for both transcripts. They have the same level of differential expression. The RNA-Seq experiment fragments the transcripts into short reads of 10 bases and reports those reads. There will be more hits to the 100-base transcript: its read count n will be larger, so it will be reported as more significantly changed, even though the fold change is identical.
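The arithmetic behind this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each molecule is cut into non-overlapping 10-base reads, read counts are treated as Poisson, and the two conditions are compared with a normal-approximation test. The function names and the choice of two copies in the control sample are hypothetical, not taken from the slides.

```python
import math

READ_LENGTH = 10  # bases per read, as in the slide's example


def expected_reads(transcript_length, copies, read_length=READ_LENGTH):
    """Reads obtained when `copies` molecules are cut into non-overlapping reads."""
    return copies * (transcript_length // read_length)


def poisson_z_test(control, treatment):
    """Two-sided normal-approximation test that two Poisson counts share one rate."""
    z = (treatment - control) / math.sqrt(control + treatment)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under N(0, 1)
    return z, p


def compare(copies_control=2):
    """Double the abundance of a 50-base and a 100-base transcript and test each."""
    results = {}
    for length in (50, 100):
        c = expected_reads(length, copies_control)
        t = expected_reads(length, 2 * copies_control)  # expression doubled
        z, p = poisson_z_test(c, t)
        results[length] = {"control": c, "treatment": t,
                           "fold_change": t / c, "z": z, "p": p}
    return results


if __name__ == "__main__":
    for length, r in compare().items():
        print(length, r)
```

Both transcripts show a fold change of 2, but the 100-base transcript yields twice as many reads, so its test statistic is larger by a factor of sqrt(2) and its p-value is smaller.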
Random priming aims to sample transcripts uniformly along their length, rather than from just one end (such as with the oligo-dT primer ……)
Counts of reads along the gene Apoe in different tissues of the Wold data: (a) brain, (b) liver, (c) skeletal muscle. Each vertical line stands for the count of reads starting at that position. The grey lines are counts in the UTR regions and a further 100 bp. Here introns are deleted and exons are connected into a single piece. Li et al. 2010, Genome Biology, 11, R50
Nucleotide frequencies versus position for stringently mapped reads. For each experiment, mapped reads were extended upstream of the 5′-start position, such that the first position of the actual read is 1 and positions 0 to −20 are obtained from the genome. The first hexamer of the read is shaded. Brief experimental protocols are indicated in the key. Biases are caused by hexamer priming that is not random. Hansen et al. 2010, Nucleic Acids Research, 38, e31
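A per-position nucleotide frequency table of the kind plotted here can be computed roughly as follows. This is a minimal sketch, not the procedure used by Hansen et al.: the function name is hypothetical, the toy reads are invented for illustration, and the upstream genomic extension to positions 0 to −20 is omitted.

```python
from collections import Counter


def position_frequencies(reads, n_positions):
    """Return one dict per read position mapping A/C/G/T to its frequency."""
    freqs = []
    for i in range(n_positions):
        # Count the base seen at position i across all reads long enough
        counts = Counter(read[i] for read in reads if len(read) > i)
        total = sum(counts.values())
        freqs.append({base: (counts.get(base, 0) / total if total else 0.0)
                      for base in "ACGT"})
    return freqs


# Toy reads (hypothetical, not taken from any of the cited experiments)
reads = ["ACGTAC", "ACGGTA", "TCGTAC", "ACCTAG"]
freqs = position_frequencies(reads, 6)

if __name__ == "__main__":
    for pos, f in enumerate(freqs, start=1):
        print(pos, f)
```

If priming were truly random, the frequencies would be roughly flat across positions; a position-specific pattern confined to the first six bases is the signature of hexamer priming bias that the slide describes.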
GC content biases some RNA-Seq experiments, but not at the same level in all experiments. Panels: human experiment (SRA012427) and yeast experiment (SRA020818_RH). Roberts et al. 2011, Genome Biology, 12, R22
Next-generation sequencing is rapidly evolving. There is no market leader, and only a relatively small number of RNA-Seq studies have been published for even the most popular NGS platforms. There are clearly biases in the data, and the protocols and chemistry used to generate the data leave signatures. This makes meta-analysis hard to perform. By contrast, Affymetrix GeneChips are the dominant platform for microarray observations, and have been so for almost a decade; there are more than one hundred thousand hybridizations in the public domain. Only a handful of standardised protocols have been used. This huge dataset allows sensitive meta-analysis.