First Thesis Advisory Committee Meeting
Alejandro Reyes
TAC members
DEXSeq, detecting differential usage of exons using RNA-seq
Exon usage and RNA-seq
Exon usage allows expansion of the eukaryotic proteome:
• Alternative first exons (promoters)
• Alternative last exons (termination signals)
• Alternative splicing
RNA-seq makes it possible to study transcriptomes.
RNA-seq: there is a lack of methods that properly estimate biological variation.
pasilla
Drosophila melanogaster S2 cell cultures:
- siRNA against the splicing factor pasilla (NOVA): 3 biological replicates (1 single-end, 2 paired-end)
- control (no treatment): 4 biological replicates (2 single-end, 2 paired-end)
RNA-seq
Wang, Gerstein, Snyder (2009).
Exon usage (Brooks et al. data)
Changes in exon usage should be reflected in the expression of the exon, independently of the expression of the gene.
Exon counting bins
Anders, Reyes and Huber. In press.
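A minimal sketch of how counting bins could be derived, assuming (this is not stated on the slide) that the exons of all transcripts of a gene are cut at every exon boundary into disjoint bins. The function name `make_counting_bins` and the half-open `(start, end)` interval convention are hypothetical choices for illustration.

```python
def make_counting_bins(exons):
    """Cut possibly overlapping exons into disjoint counting bins.

    exons: iterable of (start, end) half-open intervals from all
    transcripts of one gene (assumed layout, for illustration only).
    """
    exons = list(exons)
    # Every exon boundary is a potential bin boundary.
    boundaries = sorted({pos for start, end in exons for pos in (start, end)})
    bins = []
    for left, right in zip(boundaries, boundaries[1:]):
        # Keep a segment only if at least one exon covers it.
        if any(start <= left and right <= end for start, end in exons):
            bins.append((left, right))
    return bins


# Two overlapping exons from different transcripts, plus a separate exon.
bins = make_counting_bins([(100, 200), (150, 300), (400, 500)])
print(bins)
```

The overlapping exons are split where one of them starts or ends, so each read can later be assigned to bins without ambiguity about exon boundaries.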
Counting rules
• Alignment against the genome
• Uniquely aligned reads
Model
[Equation; label: counts of exon l of gene i in sample j]
Size factor
[Equation; labels: counts of exon l of gene i in sample j, size factor]
• Where i is a counting bin
• Normalization factor for sequencing depth (Anders and Huber, 2010).
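A minimal sketch of a sequencing-depth size factor, assuming the median-of-ratios estimator: each sample's factor is the median, over counting bins, of the ratio between its count and the geometric mean of that bin across samples. The function name and the toy counts are made up for illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows, one per counting bin, each row holding the
    counts of that bin in every sample.
    """
    n_samples = len(counts[0])
    # Use only bins with non-zero counts in all samples, so that the
    # geometric mean is well defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy example: sample 2 was sequenced twice as deeply as sample 1.
toy = [[10, 20], [100, 200], [5, 10], [0, 3]]
sf = size_factors(toy)
print(sf)
```

Dividing each sample's counts by its size factor puts the samples on a common scale; the median makes the estimate robust to a few strongly changing bins.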
Dispersion
[Equation; labels: counts of exon l of gene i in sample j, dispersion, size factor]
• Captures the biological variability within groups.
• Mean-variance dependency
• One can then infer changes due to the difference in condition
Dispersion
[Equation; labels: counts of exon l of gene i in sample j, dispersion, size factor]
• Poisson distribution with mean q and variance q (q is computed)
• Biological sample with mean u and variance v (v is estimated from the data)
• Negative binomial distribution with mean u and variance v + q
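A small sketch of the variance decomposition in the bullets above. It assumes the usual dispersion parameterization of the negative binomial, in which the biological part of the variance is the dispersion times the squared mean; the slide itself states the two parts only as q and v, and the function name is hypothetical.

```python
def nb_variance(mean, dispersion):
    """Variance of a negative binomial count with the given mean.

    Assumed parameterization: variance = mean + dispersion * mean**2.
    """
    shot_noise = mean                     # Poisson part
    biological = dispersion * mean ** 2   # extra variance between replicates
    return shot_noise + biological


# With zero dispersion the model reduces to a Poisson distribution.
poisson_like = nb_variance(100, 0.0)
overdispersed = nb_variance(100, 0.25)
print(poisson_like, overdispersed)
```

At high counts the biological term dominates, which is why ignoring it makes tests between conditions overconfident.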
Dispersion
[Equation; labels: counts of exon l of gene i in sample j, dispersion, size factor]
• Standard maximum-likelihood estimates are strongly biased when the number of samples is small.
• Cox-Reid conditional maximum-likelihood estimation (Robinson, McCarthy, Smyth, 2010).
Dispersion
[Equation; labels: counts of exon l of gene i in sample j, dispersion, size factor]
• Cox-Reid dispersion estimate
Anders, Reyes and Huber. In press.
Dispersion
[Equation; labels: counts of exon l of gene i in sample j, dispersion, size factor]
• Cox-Reid dispersion estimate
• Gamma mean-variance fit
• max(fitted value, CR estimate)
Anders, Reyes and Huber. In press.
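The last step above can be sketched as follows. The shape of the fitted curve (a constant plus a term proportional to 1/mean) and its coefficients are assumptions made for illustration; the slide only says that a gamma mean-variance fit is used and that the larger of the fitted value and the Cox-Reid (CR) estimate is kept.

```python
def fitted_dispersion(mean, asymptotic=0.01, extra=2.0):
    """Dispersion predicted from the mean by the fitted curve.

    The coefficients are hypothetical; in practice they would come from
    the gamma mean-variance fit across all counting bins.
    """
    return asymptotic + extra / mean


def final_dispersion(cr_estimate, mean, **fit_params):
    """Keep the larger of the fitted value and the per-exon CR estimate."""
    return max(fitted_dispersion(mean, **fit_params), cr_estimate)


# A noisy exon keeps its own high estimate; a suspiciously low estimate
# is raised to the fitted value.
high = final_dispersion(0.5, mean=100.0)
raised = final_dispersion(0.001, mean=100.0)
print(high, raised)
```

Taking the maximum is conservative: with few replicates a per-exon estimate can be too low by chance, and the fitted value then prevents false positives.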
Generalized linear model
[Equation; labels:]
• counts of exon l of gene i in sample j
• dispersion
• size factor
• expression strength of gene i in control
• fraction of reads in exon l in control
• change in expression of gene i caused by the treatment ρ_j
• change in the fraction of reads in exon l due to the treatment ρ_j
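One way to write a model that matches these labels is given below. The symbols are assumptions chosen for this sketch: K for the counts, s for the size factor, α for the dispersion, ρ_j for the condition of sample j, and β terms for the four effects on the log scale, in the order gene expression in control, exon fraction in control, treatment effect on the gene, treatment effect on the exon fraction.

```latex
K_{ijl} \sim \mathrm{NB}\left(\text{mean} = s_j \mu_{ijl},\ \text{dispersion} = \alpha_{il}\right)

\log \mu_{ijl} = \beta^{G}_{i} + \beta^{E}_{il} + \beta^{C}_{i\rho_j} + \beta^{EC}_{il\rho_j}
```

Under this reading, differential exon usage corresponds to a non-zero interaction term (the last one), tested by comparing the fit with and without it.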
Brooks et al. dataset
253 exons in 159 genes (FlyBase annotation) at 10% FDR
Anders, Reyes and Huber. In press.
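A minimal sketch of calling exons at a 10% FDR, assuming Benjamini-Hochberg adjustment of the per-exon p-values (the slide gives only the threshold). The p-values below are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        k = order[rank - 1]
        running_min = min(running_min, pvalues[k] * m / rank)
        adjusted[k] = running_min
    return adjusted


pvals = [0.001, 0.04, 0.03, 0.5]  # made-up p-values, one per exon
padj = benjamini_hochberg(pvals)
significant = [p <= 0.10 for p in padj]
print(padj, significant)
```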
Conclusions I
We have developed an R/Bioconductor package to test for differential exon usage:
• Correct estimation of biological variation
• Flexible statistical test, allowing covariates (negative binomial GLMs)
• Parallelization
• Visualization
• HTML reports
2 HTSeq Python preprocessing scripts:
• defining exon bins
• counting reads
600 monthly downloads, weekly mentions in blogs and mailing lists, cited in two high-impact publications, used as course material in bioinformatics courses, accepted for publication in Genome Research!
Perspectives
• Improve the HTML report generator (JavaScript)
• Use information from split junction reads
Spliceosome assembly
[Figure: spliceosome assembly on a pre-mRNA with GU, A and YAG signals, through the A, B and C complexes; labels: U1, U2, U4, U5, U6, U2AF, hnRNP, SR proteins, kinases and phosphatases, RNA helicases, cyclophilins, + ~200 non-snRNP proteins. Modified from EURASNET.]
Evolutionary consequences
• Alternative splicing could be a flexible feature for the adaptation of species, making it easy to generate and test new transcripts (Black, 2003).
• Alternatively spliced exons have been associated with fast evolutionary rates and are frequently gained or lost (Ermakova, 2006).
• Mutations easily create splice signals (exonization of introns) or delete them (loss of exons) (Alekseyenko, 2007).
• Differences are seen between closely related species (Blekhman et al., 2010) and between individuals of the same species (Lalonde et al., 2010).
• Most AS events compared between human and mouse are species-specific (Irimia, 2009).
Functional consequences
The transcriptome is dominated by “noisy” splicing, but tissue-specific isoforms are highly expressed (Pickrell et al., 2010).
AS functional importance (example)
Sex determination in Drosophila melanogaster:
AS functional importance (example II)
Splicing switch for differentiation:
Functional importance (example III)
Misregulation of alternative splicing is associated with many disease phenotypes, including cancer.
Evolution of exon usage regulation Irimia et al, 2009
Evolution of expression
“Rate of gene expression evolution varies among organs, lineages and chromosomes, owing to differences in selective pressures”
9 different organisms, 6 tissues, 2 individuals
~139 samples
~414,145,196 high-quality reads
General goal: explore the functional and evolutionary aspects of the regulation of exon usage
One-to-one orthologous exon graph
One-to-one orthologs of human exons:
- single hits
- 100% coverage
- 90% identity
- remove single-exon genes
128,040 conserved exons in 10,673 genes
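The filtering criteria above can be sketched as follows. The record layout (dictionary keys `exon`, `gene`, `n_hits`, `coverage`, `identity`) is hypothetical, 90% identity is read as a minimum, and the single-exon filter is applied after the other filters; none of these details are given on the slide.

```python
from collections import Counter


def filter_orthologous_exons(hits):
    """Keep exons that pass the one-to-one orthology criteria.

    hits: list of dicts with the hypothetical keys
    'exon', 'gene', 'n_hits', 'coverage' and 'identity'.
    """
    kept = [
        h for h in hits
        if h["n_hits"] == 1          # single hits
        and h["coverage"] >= 1.0     # 100% coverage
        and h["identity"] >= 0.90    # at least 90% identity
    ]
    exons_per_gene = Counter(h["gene"] for h in kept)
    # Remove genes left with a single exon.
    return [h for h in kept if exons_per_gene[h["gene"]] > 1]


hits = [
    {"exon": "g1:e1", "gene": "g1", "n_hits": 1, "coverage": 1.0, "identity": 0.97},
    {"exon": "g1:e2", "gene": "g1", "n_hits": 1, "coverage": 1.0, "identity": 0.92},
    {"exon": "g1:e3", "gene": "g1", "n_hits": 2, "coverage": 1.0, "identity": 0.99},
    {"exon": "g2:e1", "gene": "g2", "n_hits": 1, "coverage": 1.0, "identity": 0.95},
    {"exon": "g3:e1", "gene": "g3", "n_hits": 1, "coverage": 0.8, "identity": 0.95},
]
conserved = filter_orthologous_exons(hits)
print([h["exon"] for h in conserved])
```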
Exon usage and gene expression
[Figure; panel labels: Tissue A, Tissue B, Species 1, Species 2]
Exon usage primate phylogeny
Differences in exon usage are correlated with observed instances of speciation.
Gorilla and human: incomplete lineage sorting?
Conserved tissue-specific DEU
Known cases:
• Brain: splicing plays important roles in differentiation and in protein-protein interactions in synapses.
• Testis: association with developmental processes
92% of the genes tested show evidence of conserved differential exon usage (10% FDR).
Tissue-specific splicing regulation is conserved, and is thus likely to be functional.
SFmap
SFmap (200 bp upstream and downstream of the exons):
• Identifies binding motifs for 17 splicing factors
• Takes human-mouse conservation into account
• Takes binding motif clusters into account
Background generation (Stuart, 2010):
• For each tissue, a background with the same length and expression
• Wilcoxon test on the number of motifs
30 January 2012
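A rough sketch of the motif-count comparison, assuming the unpaired (rank-sum) form of the Wilcoxon test with a normal approximation. The motif counts are made up, and the variance uses no tie or continuity correction, so the p-value is only approximate.

```python
import math


def rank_sum_test(x, y):
    """Wilcoxon rank-sum (Mann-Whitney U) test, normal approximation.

    Returns (U statistic for x, two-sided p-value).
    """
    pooled = sorted((v, g) for g, values in enumerate((x, y)) for v in values)
    ranks_x = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values and give each the average rank.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2
        ranks_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = ranks_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Made-up motif counts: exons with tissue-specific usage vs. background.
deu_exons = [5, 7, 8, 9]
background = [1, 2, 2, 3]
u, p = rank_sum_test(deu_exons, background)
print(u, p)
```

A rank-based test suits motif counts because they are small integers with skewed distributions, where a t-test would be hard to justify.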
Explanations
• Combinatorial control of splicing decisions
• Chromatin state differences
DEU between tissues is tightly regulated.
New associations, different splicing factor doses = including / excluding exons
Evolution of exon usage regulation Irimia et al, 2009
Perspectives and future work (short term)
Which protein sequence features are differentially used? (ELMs, phosphorylation signals, intrinsically disordered regions)
TODAY'S RESULT: conserved DEU tends to avoid SMART domains? (structured protein regions?)
Exon usage within a single species
How do the exons with conserved DEU vary between individuals of the same species?
HapMap individuals, RNA-seq samples
Ideas (long term)
• Identification of important developmental splicing switches (Gabut et al., 2010), looking at conservation of events
• Identification of splicing aberrations in disease
• “Master equation” of splicing decisions: SF + DNA binding = exon inclusion/exclusion?
Acknowledgements
Huber Group
Simon Anders
Wolfgang Huber