
Exploring the package TopHat-CuffDiff

Exploring the package TopHat-CuffDiff. Jean-François Taly, Bioinformatics Core Facilities, Group meeting, October 2nd 2012. RNAseq expression data analysis: TopHat for mapping reads to the reference; reads directionality; CuffDiff for the differential enrichment.


Presentation Transcript


  1. Exploring the package TopHat-CuffDiff Jean-François Taly Bioinformatics Core Facilities Group meeting October 2nd 2012

  2. RNAseq expression data analysis • TopHat for mapping reads to the reference • Reads directionality • CuffDiff for the differential enrichment • Statistics with version 2.0.0 or 2.0.1 • Enrichment threshold • Which transcripts are present in mitochondria?

  3. MitomiR project (MitomiR_ EU0183) • Question 1: Are nuclear DNA-encoded miRNAs imported into mitochondria? [Diagram labels: miRNP?, regulation of mitochondrial translation, miRNAs, PNPASE, mito proteins, mRNAs] Slide from

  4. MitomiR project (MitomiR_ EU0183) • Question 2: Do miRNAs exist in the mitochondrial genome? [Diagram labels: miRNP?, regulation of mitochondrial translation, miRNAs, proteins, mRNAs] Slide from

  5. One cell, two DNAs • Mitochondria: • Circular DNA • Human mitochondrial genome (mtDNA) = 16.6 kb • 13 genes for subunits of respiratory complexes I, III, IV and V • 22 genes for mitochondrial tRNAs • 2 genes for rRNAs • One mitochondrion can contain two to ten copies of its DNA • Exceptions to the universal genetic code (UGC) in mitochondria • Nucleus: • 23 chromosome pairs • Human DNA: 2.9 billion base pairs • Between 20,000 and 25,000 human protein-coding genes • "Junk" DNA or non-coding DNA • Non-coding functional RNA (tRNA, rRNA, miRNA…) • The human genome may encode over 1000 miRNAs, which may target about 60% of mammalian genes. From Lung et al., 2006. MitomiR_ EU0183

  6. RNAseq libraries • Short insert size: searching for miRNAs • No poly-A selection • No fragmentation • Size selected: 18-36 nt • stranded • Long insert size: searching for lncRNAs • No poly-A selection • Fragmented • Size selected: 200 nt • stranded

  7. 2 Conditions • Total fraction (tot) • Full cell lysate • Mitochondrial fraction (mit) • RNA extracted from mitochondria

  8. RNAseq expression data analysis • TopHat for mapping reads to the reference • Reads directionality • CuffDiff for the differential enrichment • Statistics with version 2.0.0 or 2.0.1 • Enrichment threshold • Which transcripts are present in mitochondria?

  9. Stranded RNAseq: Vocabulary • Forward = the 5’ end is the closest to the centromere • In human, 50% of the genes are coded on the forward strand • Forward / Reverse = Plus / Minus • Coding / Template = Sense / Anti-sense [Diagram labels: 5’, 3’, coding Forward, coding Reverse] http://www.biostars.org/post/show/3423/forward-and-reverse-strand-conventions/

  10. Orientation of reads? [Diagram labels: coding DNA (5’ to 3’), template DNA (3’ to 5’), Transcription, RNA, Reverse-transcription, cDNA, Duplication] • First strand sequencing: dUTP, NSR, NNSR • Second strand sequencing: Directional Illumina (Ligation), Standard SOLiD

  11. Proper TopHat option? --library-type: • fr-unstranded: default, standard Illumina reads • fr-firststrand: dUTP, NSR, NNSR • fr-secondstrand: directional Illumina (ligation), standard SOLiD • We mapped the reads using both the unstranded and the secondstrand settings for comparison
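
The protocol-to-option mapping on this slide can be written down as a small lookup. This is only a sketch: the dictionary content comes from the slide, while the function name `library_type_option` and the lower-cased protocol keys are hypothetical choices made for illustration.

```python
# Mapping taken from the slide: library protocol -> TopHat --library-type value.
LIBRARY_TYPE = {
    "standard illumina": "fr-unstranded",
    "dutp": "fr-firststrand",
    "nsr": "fr-firststrand",
    "nnsr": "fr-firststrand",
    "directional illumina (ligation)": "fr-secondstrand",
    "standard solid": "fr-secondstrand",
}


def library_type_option(protocol):
    """Return the '--library-type' argument pair for a protocol name.

    Hypothetical helper: the lookup is case-insensitive and raises
    ValueError for protocols that are not listed on the slide.
    """
    key = protocol.strip().lower()
    if key not in LIBRARY_TYPE:
        raise ValueError(f"unknown protocol: {protocol!r}")
    return ["--library-type", LIBRARY_TYPE[key]]


print(library_type_option("dUTP"))
```

The returned pair can be spliced into a TopHat command line; the rest of that command (index, read files) is deliberately left out here.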

  12. How can we evaluate directionality? • Reads mapping to the forward (F) strand should align with genes that are also coded on F. • Bitwise FLAG of the BAM file: • How many reads in forward? samtools view -c -F 16 accepted_hits.bam • How many reads in reverse? samtools view -c -f 16 accepted_hits.bam
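
What the two samtools filters test is bit 16 (0x10) of the SAM FLAG field, which is set when a read aligns to the reverse strand. A minimal sketch of the same count on SAM text lines follows; the function name `count_strands` and the example records are made up for illustration, and the input is assumed to contain only mapped alignments.

```python
REVERSE_STRAND = 0x10  # SAM FLAG bit 16: read aligned to the reverse strand


def count_strands(sam_lines):
    """Count (forward, reverse) alignment records in SAM text lines."""
    forward = reverse = 0
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & REVERSE_STRAND:
            reverse += 1   # same records as: samtools view -c -f 16
        else:
            forward += 1   # same records as: samtools view -c -F 16
    return forward, reverse


# Made-up records for illustration only (not the presentation's data).
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "\t".join(["read1", "0", "chr2", "100", "50", "26M", "*", "0", "0", "*", "*"]),
    "\t".join(["read2", "16", "chr2", "150", "50", "26M", "*", "0", "0", "*", "*"]),
    "\t".join(["read3", "0", "chr2", "200", "50", "26M", "*", "0", "0", "*", "*"]),
]
print(count_strands(example))  # (2, 1)
```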

  13. How can we evaluate directionality? (2) • Gene by gene • Bitwise FLAG + gene strand annotation • A small number of genes received a huge number of mis-mapped reads!
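
The gene-by-gene check can be sketched as the fraction of reads per gene whose strand (from the FLAG) matches the annotated gene strand. This sketch assumes single-end reads from a protocol in which reads share the strand of the transcript; for a protocol giving the opposite orientation the comparison would be inverted. The function name, gene identifiers and read list are hypothetical.

```python
from collections import defaultdict


def strand_concordance(read_hits, gene_strand):
    """Return {gene_id: fraction of reads on the same strand as the gene}.

    read_hits:   iterable of (gene_id, sam_flag) pairs
    gene_strand: dict mapping gene_id to '+' or '-'
    """
    same = defaultdict(int)
    total = defaultdict(int)
    for gene_id, flag in read_hits:
        read_strand = "-" if flag & 0x10 else "+"  # FLAG bit 16 = reverse
        total[gene_id] += 1
        if read_strand == gene_strand[gene_id]:
            same[gene_id] += 1
    return {gene: same[gene] / total[gene] for gene in total}


# Placeholder data for illustration only.
hits = [("geneA", 0), ("geneA", 0), ("geneA", 16), ("geneB", 16)]
strands = {"geneA": "+", "geneB": "-"}
for gene, fraction in sorted(strand_concordance(hits, strands).items()):
    print(gene, round(fraction, 2))
```

Genes with a low concordance and a very high read count are the candidates to inspect for mis-mapped reads.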

  14. Example of misaligned reads • AC097532.1: chr2:133038647-133038738 • miRNA automatically annotated in E67 but retired in E68; • the CIGAR string of some reads spans 26 kb; • 11,000,115 reads mapped (6% of total); • 8,205,667 mapped to the position 133,038,644; • NCBI BLAST of the major sequence: • hit on the opposite strand but with 100% coverage and 100% identity to the 28S ribosomal RNA.
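
One way to spot reads whose alignment stretches over tens of kilobases is to compute the reference span from the CIGAR string. In the SAM format, the operations M, D, N, = and X consume reference bases, so their lengths add up to the span. The sketch below uses a made-up CIGAR string; the function name `reference_span` is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def reference_span(cigar):
    """Return the number of reference bases covered by a CIGAR string."""
    return sum(
        int(length)
        for length, op in CIGAR_OP.findall(cigar)
        if op in REF_CONSUMING
    )


# Made-up example: a 26 nt read split by a 26,000 nt skipped region (N).
print(reference_span("13M26000N13M"))  # 26026
```

A short read with a span of this size is aligned across a very long gap, which is a reason to check it by hand.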

  15. RNAseq expression data analysis • TopHat for mapping reads to the reference • Reads directionality • CuffDiff for the differential enrichment • Statistics with version 2.0.0 or 2.0.1 • Enrichment threshold • Which transcripts are present in mitochondria?

  16. CuffDiff needs a special GTF • CuffDiff needs a GTF with the 2 following tags: • tss_id: The ID of this transcript's inferred start site. • p_id: The ID of the coding sequence this transcript contains. • You can produce a compatible GTF with CuffCompare: cuffcompare -s /path/to/genome_seqs.fa -CG -r annotation.gtf
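
Before running CuffDiff, it can help to check whether the records of a GTF carry the two tags. The following is a minimal sketch that parses the attribute column (the ninth column) of one GTF line; the function name `missing_tags` and the example record are hypothetical, and a transcript without a coding sequence may legitimately lack p_id.

```python
import re

# Matches attribute pairs of the form: key "value"
ATTRIBUTE = re.compile(r'(\S+)\s+"([^"]*)"')


def missing_tags(gtf_line, required=("tss_id", "p_id")):
    """Return the required tags that are absent from one GTF record."""
    fields = gtf_line.rstrip("\n").split("\t")
    attributes = dict(ATTRIBUTE.findall(fields[8]))  # 9th column holds the tags
    return [tag for tag in required if tag not in attributes]


# Placeholder record for illustration only.
record = "\t".join([
    "chr2", "example", "exon", "100", "200", ".", "+", ".",
    'gene_id "G1"; transcript_id "T1"; tss_id "TSS1";',
])
print(missing_tags(record))  # ['p_id']
```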

  17. Effect of CuffCompare [Figure panels: CuffCompare + CuffDiff V2.0.2 versus CuffDiff V2.0.2 alone]

  18. Effect of CuffDiff version [Figure panels: CuffDiff V2.0.2 versus CuffDiff V2.0.1]

  19. Highly sensitive statistics • Reproducibility? • Version effect? • CuffCompare effect? • Genome annotation effect? • From 902 differentially expressed genes with V2.0.1, we went to 15 with V2.0.2!!!
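
A simple way to quantify how much two runs agree is to compare their lists of significant genes as sets. The sketch below is an illustration only: the function name `compare_gene_sets` and the gene identifiers are placeholders, and the Jaccard index is one possible summary among others, not the measure used in the presentation.

```python
def compare_gene_sets(run_a, run_b):
    """Summarise the overlap between two lists of significant gene IDs."""
    a, b = set(run_a), set(run_b)
    shared = a & b
    union = a | b
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "shared": sorted(shared),
        # Jaccard index: shared genes divided by all genes seen in either run.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Placeholder gene IDs for illustration only.
summary = compare_gene_sets(["g1", "g2", "g3", "g4"], ["g2", "g5"])
print(summary["shared"], summary["jaccard"])
```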

  20. RNAseq expression data analysis • TopHat for mapping reads to the reference • Reads directionality • CuffDiff for the differential enrichment • Statistics with version 2.0.0 or 2.0.1 • Enrichment threshold • Which transcripts are present in mitochondria?

  21. Expression data reflects expectations • The statistics may not be trustworthy, but the fold change is! • Define an enrichment threshold based on log2(FPKMtot/FPKMmit) • Cytosol • Vicinity of mitochondria • Mitochondrial genes
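
The ratio log2(FPKMtot/FPKMmit) is negative when a gene is more abundant in the mitochondrial fraction than in the total fraction. A minimal sketch of the computation follows. The pseudocount (added to avoid dividing by zero) and the threshold value of -1.0 are assumptions made for illustration; they are not taken from the presentation.

```python
import math


def log2_ratio(fpkm_tot, fpkm_mit, pseudocount=1.0):
    """Return log2((FPKMtot + pseudocount) / (FPKMmit + pseudocount)).

    The pseudocount is an assumption of this sketch, not part of the slide.
    """
    return math.log2((fpkm_tot + pseudocount) / (fpkm_mit + pseudocount))


def is_mito_enriched(fpkm_tot, fpkm_mit, threshold=-1.0):
    """True when the log2 ratio falls at or below a (hypothetical) threshold."""
    return log2_ratio(fpkm_tot, fpkm_mit) <= threshold


print(log2_ratio(3.0, 15.0))        # -2.0
print(is_mito_enriched(3.0, 15.0))  # True
```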

  22. Compartmentalized genes • Cytosolic genes: • UniProt: experimentally observed in cytosol • Ensembl: no automatic annotations • Vicinity of mitochondria: • Paper from Kang et al. 2012 • Mitochondrial genes: • The 37 genes in the mitochondrial chromosome
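
With reference gene sets for each compartment, the log2 ratios can be summarised per compartment to see where a threshold might sit. The sketch below reports the median per compartment; the function name, gene identifiers and values are placeholders, and using the median is an assumption of this example rather than the method of the presentation.

```python
from statistics import median


def median_by_compartment(log2_ratios, compartments):
    """Return {compartment: median log2 ratio of its genes}.

    log2_ratios:  dict mapping gene ID to log2(FPKMtot/FPKMmit)
    compartments: dict mapping compartment name to a list of gene IDs
    Genes without a measured ratio are ignored.
    """
    summary = {}
    for name, genes in compartments.items():
        values = [log2_ratios[g] for g in genes if g in log2_ratios]
        if values:
            summary[name] = median(values)
    return summary


# Placeholder values for illustration only.
ratios = {"c1": 1.0, "c2": 2.0, "c3": 3.0, "m1": -5.0, "m2": -4.0, "m3": -3.0}
sets = {"cytosol": ["c1", "c2", "c3"], "mitochondrial": ["m1", "m2", "m3", "m4"]}
print(median_by_compartment(ratios, sets))
```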

  23. Log2(FoldChange) distributions for the long insert library

  24. Summary

  25. Significantly enriched genes

  26. Back Up slides

  27. Mitochondrial genome

  28. Mitochondrial genome – first 3 genes

  29. Cellular metabolism regulation (E2C slide) [Diagram labels: Glucose, Glycolysis (2 ATP), Pyruvate, Lactate, O2, CO2, amino acids, nucleotides, OXPHOS (36 ATP); Mitochondrial dysfunction, Differentiation, Warburg effect; Proliferative cells, Undifferentiated cells, Biosynthesis efficiency; Working cells, Differentiated cells, Energetic efficiency] • MCF7 is a breast cancer cell line able to grow in OXPHOS conditions • MCF7 cells grown in different metabolic conditions might represent a unique way to distinguish RNA subpopulations expressed in mitochondria (ncRNA and … miRNA?) Slide from

  30. Experimental design [Diagram labels: Stable MCF-7 cell lines (AGB:CH3854, ATCC:HTB-22); OXPHOS, 0 mM glucose, Low Glucose, High Glucose; OXPHOS medium, HIGH Glucose medium; Min 3 weeks; SHIFTS at J0 and J1: MCF7 High Gluc shift to OXPHOS, MCF7 Oxphos shift to High Gluc; Total cells and mito extraction; RNA-seq; TLDA] • N = 3 to 4 independent batches • TLDA = Microfluidic miRNA qPCR

  31. [Diagram labels: Exon, Exon 1, Exon 2]
