Exploring the package TopHat-CuffDiff
Jean-François Taly, Bioinformatics Core Facilities
Group meeting, October 2nd 2012
RNAseq expression data analysis
• TopHat for mapping reads to the reference
• Reads directionality
• CuffDiff for the differential enrichment
• Statistics with version 2.0.0 or 2.0.1
• Enrichment threshold
• Which transcripts are present in mitochondria?
MitomiR project (MitomiR_EU0183)
[Diagram labels: miRNP, PNPASE, miRNAs, mito proteins, mRNAs; regulation of mitochondrial translation]
Question 1: Are nuclear DNA-encoded miRNAs imported into mitochondria?
MitomiR project (MitomiR_EU0183)
[Diagram labels: miRNP, miRNAs, proteins, mRNAs; regulation of mitochondrial translation]
Question 2: Do miRNAs exist in the mitochondrial genome?
One cell, two DNAs
Mitochondria
• Circular DNA
• Human mitochondrial genome (mtDNA) = 16.6 kb
  - 13 genes for subunits of respiratory complexes I, III, IV and V
  - 22 genes for mitochondrial tRNA
  - 2 genes for rRNA
• One mitochondrion can contain two to ten copies of its DNA
• Exceptions to the universal genetic code (UGC) in mitochondria
Nucleus
• 23 chromosome pairs
• Human DNA: 2.9 billion base pairs
• Between 20,000 and 25,000 human protein-coding genes
• "Junk" DNA, or non-coding DNA
• Non-coding functional RNA (tRNA, rRNA, miRNA…)
The human genome may encode over 1000 miRNAs, which may target about 60% of mammalian genes.
From Lung et al., 2006 (MitomiR_EU0183)
RNAseq libraries
• Short insert size: searching for miRNAs
  - No poly-A selection
  - No fragmentation
  - Size selected: 18-36 nt
  - Stranded
• Long insert size: searching for lncRNAs
  - No poly-A selection
  - Fragmented
  - Size selected: 200 nt
  - Stranded
2 conditions
• Total fraction (tot): full cell lysate
• Mitochondrial fraction (mit): RNA extracted from mitochondria
Stranded RNAseq: vocabulary
[Diagram: double-stranded DNA drawn 5’ to 3’ (top) and 3’ to 5’ (bottom), with one gene coding on the forward strand and one coding on the reverse strand]
• Forward = 5’ end closest to the centromere (in human)
• 50% of the genes are coding on the forward strand
• Forward / Reverse = Plus / Minus
• Coding / Template = Sense / Anti-sense
http://www.biostars.org/post/show/3423/forward-and-reverse-strand-conventions/
Orientation of reads?
[Diagram: coding DNA (5’ to 3’) and template DNA (3’ to 5’); transcription gives the RNA; reverse transcription gives the cDNA; duplication gives the second strand]
• First strand sequencing: dUTP, NSR, NNSR
• Second strand sequencing: Directional Illumina (Ligation), Standard SOLiD
Proper TopHat option?
--library-type:
• fr-unstranded: default, Standard Illumina reads
• fr-firststrand: dUTP, NSR, NNSR
• fr-secondstrand: Directional Illumina (Ligation), Standard SOLiD
We mapped the reads using the unstranded and the secondstrand options for comparison.
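The protocol-to-option table above can be written as a small lookup. This is only a sketch: the function name `tophat_command` and the values `genome_index`, `reads.fq` and the output directory names are hypothetical placeholders, not the commands actually run for this analysis. The command is built but never executed.

```python
# Protocol -> TopHat --library-type value, as listed on the slide.
LIBRARY_TYPE = {
    "standard illumina": "fr-unstranded",
    "dutp": "fr-firststrand",
    "nsr": "fr-firststrand",
    "nnsr": "fr-firststrand",
    "directional illumina (ligation)": "fr-secondstrand",
    "standard solid": "fr-secondstrand",
}


def tophat_command(protocol, index_prefix, reads, out_dir):
    """Build (but do not run) a TopHat command line for a library protocol.

    index_prefix, reads and out_dir are hypothetical placeholders.
    """
    library_type = LIBRARY_TYPE[protocol.lower()]
    return [
        "tophat",
        "--library-type", library_type,
        "-o", out_dir,
        index_prefix,
        reads,
    ]


if __name__ == "__main__":
    # The two mappings compared in the slides: unstranded and secondstrand.
    for protocol in ("Standard Illumina", "Directional Illumina (Ligation)"):
        out_dir = "tophat_" + LIBRARY_TYPE[protocol.lower()]
        print(" ".join(tophat_command(protocol, "genome_index", "reads.fq", out_dir)))
```

Keeping the mapping in one dictionary makes it easy to run the same reads under two library types and compare the results, as was done here.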
How can we evaluate directionality?
• Reads mapping to the F strand should be aligned with genes coding on F as well.
• Bitwise FLAG of the BAM file:
  - How many reads in forward? samtools view -c -F 16 accepted_hits.bam
  - How many reads in reverse? samtools view -c -f 16 accepted_hits.bam
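The two samtools counts rely on bit 16 (0x10) of the SAM FLAG, which is set when a read aligns to the reverse strand. A minimal sketch of the same test in Python, assuming the FLAG values have already been extracted as integers (the function name and the example values are made up for illustration). Unlike `-F 16` alone, it also skips unmapped reads (bit 4).

```python
REVERSE_FLAG = 0x10   # 16: read aligned to the reverse strand
UNMAPPED_FLAG = 0x4   # 4: read is unmapped


def count_strands(flags):
    """Count mapped reads on the forward and reverse strands.

    flags: iterable of integer SAM FLAG values (column 2 of a SAM record).
    Returns a (forward, reverse) tuple.
    """
    forward = reverse = 0
    for flag in flags:
        if flag & UNMAPPED_FLAG:
            continue  # unmapped reads have no meaningful strand
        if flag & REVERSE_FLAG:
            reverse += 1
        else:
            forward += 1
    return forward, reverse


if __name__ == "__main__":
    # Made-up FLAG values: 272 = 256 (secondary alignment) + 16 (reverse).
    example_flags = [0, 16, 16, 0, 256, 272, 4]
    print(count_strands(example_flags))
```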
How can we evaluate directionality? (2)
• Gene by gene
• Bitwise FLAG + gene strand annotation
A small number of genes received a huge amount of mis-mapped reads!
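The gene-by-gene check can be sketched as follows. It assumes each read has already been assigned to a gene, and that the read strand is expected to equal the gene strand (the secondstrand case; for a firststrand library the comparison would be inverted). The function name, input layout and gene names are hypothetical.

```python
from collections import defaultdict

REVERSE_FLAG = 0x10  # 16: read aligned to the reverse strand


def strand_concordance(assignments, gene_strand):
    """Per gene, count reads on the gene's own strand and total reads.

    assignments: iterable of (gene_id, flag) pairs, one per assigned read
                 (hypothetical input, e.g. from an overlap step).
    gene_strand: dict mapping gene_id to "+" or "-".
    Returns a dict gene_id -> (concordant_reads, total_reads).
    """
    counts = defaultdict(lambda: [0, 0])
    for gene_id, flag in assignments:
        read_strand = "-" if flag & REVERSE_FLAG else "+"
        counts[gene_id][1] += 1
        if read_strand == gene_strand[gene_id]:
            counts[gene_id][0] += 1
    return {gene: tuple(pair) for gene, pair in counts.items()}


if __name__ == "__main__":
    reads = [("geneA", 0), ("geneA", 0), ("geneA", 16), ("geneB", 16)]
    strands = {"geneA": "+", "geneB": "-"}
    for gene, (ok, total) in strand_concordance(reads, strands).items():
        print(gene, ok, total, round(ok / total, 2))
```

Genes with many reads but a low concordant fraction are the candidates for mis-mapping worth inspecting by hand.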
Example of mis-aligned reads
• AC097532.1: chr2:133038647-133038738
• miRNA automatically annotated in E67 but retired from E68
• The CIGAR string of some reads spans 26 kb
• 11,000,115 reads mapped (6% of total)
• 8,205,667 mapped to position 133,038,644
• NCBI BLAST of the major sequence: hit on the opposite strand, but with 100% coverage and 100% identity to the 28S ribosomal RNA
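A short read can cover tens of kilobases on the reference when its CIGAR string contains a long skipped region (operation N, used for introns). A minimal sketch of how the reference span is computed from a CIGAR string, following the SAM convention that M, D, N, = and X consume reference bases. The example CIGAR string is made up, not taken from the data.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def reference_span(cigar):
    """Return the number of reference bases covered by an alignment."""
    total = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op in REF_CONSUMING:
            total += int(length)
    return total


if __name__ == "__main__":
    # Hypothetical spliced alignment: 36 read bases spread over about 26 kb.
    print(reference_span("20M25960N16M"))
```

Filtering on this span (or on the length of the largest N operation) is one way to flag alignments like the ones described above.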
CuffDiff needs a special GTF
• CuffDiff needs a GTF with the 2 following tags:
  - tss_id: the ID of this transcript's inferred start site.
  - p_id: the ID of the coding sequence this transcript contains.
• You can produce a compatible GTF with CuffCompare:
  cuffcompare -s /path/to/genome_seqs.fa -CG -r annotation.gtf annotation.gtf
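Before running CuffDiff, it can be useful to check whether a GTF already carries the two tags. A minimal sketch, assuming standard tab-separated GTF lines with quoted attribute values in column 9; the function names and the example record are hypothetical. As far as I know, p_id is only attached to transcripts with a coding sequence, so its absence on non-coding transcripts is expected.

```python
import re

ATTRIBUTE = re.compile(r'(\S+)\s+"([^"]*)"')


def gtf_attributes(line):
    """Parse the attribute column (9th field) of one GTF line into a dict."""
    fields = line.rstrip("\n").split("\t")
    return dict(ATTRIBUTE.findall(fields[8]))


def missing_cuffdiff_tags(line, required=("tss_id", "p_id")):
    """Return the required tags that are absent from this GTF line."""
    attrs = gtf_attributes(line)
    return [tag for tag in required if tag not in attrs]


if __name__ == "__main__":
    # Made-up record with tss_id but without p_id.
    record = (
        "chr1\tsource\texon\t1\t100\t.\t+\t.\t"
        'gene_id "G1"; transcript_id "T1"; tss_id "TSS1";'
    )
    print(missing_cuffdiff_tags(record))
```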
Effect of CuffCompare
[Figure: panels comparing CuffCompare + CuffDiff V2.0.2 with CuffDiff V2.0.2 alone]
Effect of CuffDiff version
[Figure: panels comparing CuffDiff V2.0.2 with CuffDiff V2.0.1]
Highly sensitive statistics
• Reproducibility?
• Version effect?
• CuffCompare effect?
• Genome annotation effect?
From 902 differentially expressed genes with V2.0.1, we went to 15 with V2.0.2!
Expression data reflects expectations
The statistics may not be trustworthy, but the fold change is!
• Define an enrichment threshold based on log2(FPKMtot/FPKMmit), using genes from three compartments:
  - Cytosol
  - Vicinity of mitochondria
  - Mitochondrial genes
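The ratio above can be computed per gene as in the sketch below. The pseudocount (added to avoid dividing by zero when a gene has FPKM 0 in one fraction) and the threshold value are assumptions for illustration; neither is given in the slides. In practice the threshold would be chosen by looking at where the three reference gene sets fall on this scale.

```python
import math


def log2_tot_over_mit(fpkm_tot, fpkm_mit, pseudocount=1.0):
    """log2(FPKMtot / FPKMmit), with an assumed pseudocount on both values."""
    return math.log2((fpkm_tot + pseudocount) / (fpkm_mit + pseudocount))


def is_mito_enriched(fpkm_tot, fpkm_mit, threshold, pseudocount=1.0):
    """True when the ratio is at or below a (hypothetical) threshold.

    Low values mean the transcript is relatively more abundant in the
    mitochondrial fraction than in the total fraction.
    """
    return log2_tot_over_mit(fpkm_tot, fpkm_mit, pseudocount) <= threshold


if __name__ == "__main__":
    # Made-up FPKM pairs (tot, mit).
    for tot, mit in [(7.0, 1.0), (1.0, 7.0), (3.0, 3.0)]:
        ratio = log2_tot_over_mit(tot, mit)
        print(tot, mit, ratio, is_mito_enriched(tot, mit, threshold=-1.0))
```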
Compartmentalized genes
• Cytosolic genes:
  - UniProt: experimentally observed in cytosol
  - Ensembl: no automatic annotations
• Vicinity of mitochondria:
  - Paper from Kang et al. 2012
• Mitochondrial genes:
  - The 37 genes in the mitochondrial chromosome
Cellular metabolism regulation (E2C slide)
[Diagram labels: glucose, glycolysis (2 ATP), pyruvate, lactate; O2, OXPHOS (36 ATP), CO2; amino acids, nucleotides; mitochondrial dysfunction; differentiation]
• Warburg effect: proliferative, undifferentiated cells; biosynthesis efficiency
• Working, differentiated cells: energetic efficiency
MCF7 is a breast cancer cell line able to grow in OXPHOS conditions.
MCF7 cells grown in different metabolic conditions might represent a unique way to distinguish RNA subpopulations expressed in mitochondria (ncRNA and … miRNA?)
Experimental design
• Stable MCF-7 cell lines (AGB:CH3854, ATCC:HTB-22), min. 3 weeks: OXPHOS (0 mM glucose), low glucose, high glucose
• Shifts (J0, J1): MCF7 High Gluc shift to OXPHOS; MCF7 OXPHOS shift to High Gluc
• Total cells and mitochondria extraction, then TLDA and RNA-seq
• N = 3 to 4 independent batches
TLDA = microfluidic miRNA qPCR
[Figure: exon diagram with labels Exon 1 and Exon 2]