
Analysis of Next Generation Sequence Data


asha

Presentation Transcript


  1. Analysis of Next Generation Sequence Data • Illumina HiSeq2000: 600 Gbp (6 billion reads) in ~11 days

  2. Typical Next Gen Experiments • Genome sequencing • Novel genomes • Resequencing • Transcriptome sequencing (RNA-seq) • Characterize transcripts with or without reference genome • Typical length • Short (microRNAs, …) • Find differentially expressed transcripts • Other • Methyl-seq • ChIP-seq • RIP-seq • …

  3. Types of Sequencing Libraries • Single-End Reads: 5’ or 3’ (random) • Paired-End Reads: 5’ and 3’, 200-500 bp • Mate-Pair Reads: 5’ and 3’, 2-5 kbp

  4. What Does the Data Look Like? FASTQ File Format • Sequence • Quality (ASCII character for each base) • > 80 million reads in one lane
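A minimal sketch of reading FASTQ records and decoding the per-base quality characters. It assumes the common four-line record layout and a Phred+33 ASCII offset; older Illumina pipelines used an offset of 64, so the `offset` parameter may need changing. The read name and sequence below are made up for illustration.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break  # end of file (or a blank line)
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored here
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual  # drop the leading '@'

def phred_scores(qual, offset=33):
    """Convert one ASCII quality string to a list of integer Phred scores."""
    return [ord(c) - offset for c in qual]

# Hypothetical single-record file held in memory.
example = io.StringIO("@read1\nACGTN\n+\nIIII#\n")
records = list(read_fastq(example))
name, seq, qual = records[0]
scores = phred_scores(qual)
```

Each quality character maps to one base, so `len(seq) == len(qual)` is a cheap sanity check on any record.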

  5. Quality Control Analysis of Reads

  6. Trim Sequences Prior To Analysis • Make sure sequencing adapters are removed • Trim ends of sequence based on quality scores
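The two trimming steps can be sketched as follows. This is an illustration only: the adapter string is hypothetical, adapter matching is exact (real trimmers tolerate mismatches and partial adapters), and the quality cutoff of 20 with a Phred+33 offset is an assumption.

```python
def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]

def trim_low_quality_3prime(seq, qual, min_q=20, offset=33):
    """Remove bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]

# Hypothetical read: 8 genomic bases followed by a made-up adapter "GGGTTT".
raw_seq = "ACGTACGT" + "GGGTTT"
raw_qual = "IIIIII##" + "IIIIII"

seq, qual = trim_adapter(raw_seq, raw_qual, adapter="GGGTTT")
seq, qual = trim_low_quality_3prime(seq, qual)
```

Removing the adapter first matters here: the low-quality bases sit just before the adapter and would not be reached by 3' trimming otherwise.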

  7. Sequence Composition Diagnostics • Unbiased Reads • Biased Reads: First Position Nearly Always “T”
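A composition diagnostic of this kind can be approximated by tabulating base frequencies at each read position. The sketch below uses four made-up reads in which the first position is always "T"; it is not the tool used to produce the slide.

```python
from collections import Counter

def base_composition(reads):
    """Return, for each read position, a dict of base -> fraction of reads."""
    n_positions = max(len(r) for r in reads)
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for i, base in enumerate(read):
            counts[i][base] += 1
    return [
        {base: n / sum(pos.values()) for base, n in pos.items()}
        for pos in counts
    ]

# Hypothetical reads with a biased first position.
reads = ["TACG", "TCGA", "TGAT", "TTTC"]
composition = base_composition(reads)
```

In an unbiased library each position should show roughly the same base fractions; a position dominated by one base, as position 0 is here, points to a library or adapter artifact.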

  8. Genome Sequencing

  9. Workflows for Genome Sequencing • Novel Genome Sequencing • de novo assembly • Generate contigs and scaffolds using overlapping reads • If applicable, align reads from a sample back to consensus to examine variation • Resequencing • Align reads from a sample to a reference genome assembly to examine variation • BWA mapping software

  10. Sequence Alignment/Map (SAM) Format • Common file format to store reads and their alignment to a reference sequence • Generated by most next gen analysis software • samtools software package
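As an illustration of what a SAM alignment record holds, the sketch below splits one tab-separated line into the eleven mandatory fields and decodes two of the flag bits (0x4 for unmapped, 0x10 for reverse strand). The example line is made up; in practice the samtools package or a wrapper library would normally be used rather than hand parsing.

```python
SAM_FIELDS = (
    "qname", "flag", "rname", "pos", "mapq", "cigar",
    "rnext", "pnext", "tlen", "seq", "qual",
)

def parse_sam_line(line):
    """Parse one SAM alignment line (not a '@' header line) into a dict."""
    parts = line.rstrip("\n").split("\t")
    rec = dict(zip(SAM_FIELDS, parts[:11]))
    for key in ("flag", "pos", "mapq", "pnext", "tlen"):
        rec[key] = int(rec[key])
    rec["is_unmapped"] = bool(rec["flag"] & 0x4)
    rec["is_reverse"] = bool(rec["flag"] & 0x10)
    rec["tags"] = parts[11:]  # optional TAG:TYPE:VALUE fields
    return rec

# Hypothetical alignment of a 5-base read to the reverse strand of chr17.
line = "read1\t16\tchr17\t30700000\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0"
rec = parse_sam_line(line)
```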

  11. Binary Alignment/Map (BAM) Files • SAM (text file) → BAM (binary file) • Not human-readable • Smaller file sizes • BAM is widely used: • Often deposited to Gene Expression Omnibus (GEO) at NCBI • UCSC Genome Browser can display alignments as a track

  12. UCSC Genome Browser with 1,000 Genomes Project Data

  13. LookSeq at Sanger Mouse Genomes Project

  14. Glo1 CNV Present in Mouse Genomes Data for A/J • Proximal Flank (Chr17: 30.5Mb): Max ~50x coverage • Glo1 Locus (Chr17: 30.7Mb): Max >100x coverage • Distal Flank (Chr17: 31.2Mb): Max ~50x coverage • Scale: 50kb per panel

  15. Glo1 CNV Not Present in Mouse Genomes Data for NZO • Proximal Flank (Chr17: 30.5Mb): Max ~25x coverage • Glo1 Locus (Chr17: 30.7Mb): Max ~25x coverage • Distal Flank (Chr17: 31.2Mb): Max ~25x coverage • Scale: 50kb per panel
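The reasoning behind the two Glo1 slides can be made explicit with a coverage ratio: read depth at the locus divided by the mean depth of the flanking regions. The sketch below plugs in the approximate maximum coverages quoted on the slides (treating ">100x" as 100), so the numbers are rough illustrations rather than a copy-number estimate.

```python
def coverage_ratio(locus_coverage, flank_coverages):
    """Ratio of locus read depth to the mean read depth of the flanks."""
    mean_flank = sum(flank_coverages) / len(flank_coverages)
    return locus_coverage / mean_flank

# Approximate maxima read off the slides (locus, [proximal flank, distal flank]).
aj_ratio = coverage_ratio(100, [50, 50])   # A/J: locus >100x, flanks ~50x
nzo_ratio = coverage_ratio(25, [25, 25])   # NZO: locus and flanks ~25x
```

A ratio near 1 (NZO) is what a single-copy region would show, while a ratio of about 2 or more (A/J) is consistent with a duplication at the locus.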

  16. RNA-seq Data Analysis

  17. RNA-Seq • Reads are randomly sampled fragments from RNA sample • Proportion of reads for a transcript ∝ Expression level of transcript • Lots of reads needed to construct models for every alternatively spliced transcript • Garber et al, Nat Methods (2011)

  18. Experimental Design Auer & Doerge Genetics (2010) 185: 405-416

  19. Marioni et al, Genome Res (2008) 18(9):1509-17

  20. Comparison of Affy and RNA-seq Marioni et al, Genome Res (2008) 18(9):1509-17

  21. Comparison of Affy and RNA-seq Marioni et al, Genome Res (2008) 18(9):1509-17

  22. Marioni et al, Genome Res (2008) 18(9):1509-17

  23. Shendure Nat Methods (2008) 5(7): 585-7

  24. Workflows for RNA-seq • QC Reads • Novel Transcriptome Sequencing • de novo assembly • Align reads from each sample/group to assembly • Statistics for each transcript contig • Transcriptome Sequencing with Reference Genome • Align reads from each sample/group to genome • Statistics for each transcript model • Examine isoforms • Analyze Counts

  25. de novo Transcriptome Assembly • How much sequencing is enough? • Rarefaction Plot
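One way to build a rarefaction plot is to subsample the reads at increasing depths and count how many distinct transcripts have been seen at each depth; when the curve flattens, extra sequencing finds little that is new. The sketch below assumes each read has already been assigned to a transcript label, and the labels and abundances are invented for illustration.

```python
import random

def rarefaction_curve(read_assignments, depths, seed=0):
    """Return (depth, distinct transcripts seen) for nested random subsamples."""
    rng = random.Random(seed)
    shuffled = list(read_assignments)
    rng.shuffle(shuffled)
    # Each subsample is a prefix of one shuffled list, so the counts
    # can never decrease as depth grows.
    return [(d, len(set(shuffled[:d]))) for d in depths]

# Hypothetical library: 100 reads from four transcripts of unequal abundance.
assignments = ["A"] * 50 + ["B"] * 30 + ["C"] * 15 + ["D"] * 5
curve = rarefaction_curve(assignments, depths=[10, 25, 50, 100])
```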

  26. Mapping Reads • Align reads to a reference • Genome assembly • Transcriptome assembly • Commonly used aligners: • bwa • bowtie

  27. RNAseq Workflow With Reference Genome Langmead et al. Genome Biology (2010), 11:R83

  28. Map Reads & Obtain Counts (Reads Per Gene) • Both utilize a reference genome
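Counting reads per gene amounts to checking which annotated gene interval each aligned read falls in. The sketch below is a deliberately simple version: gene names and coordinates are hypothetical, each read is reduced to a single mapped position, and every gene is scanned for every read (real counting tools use indexed lookups and handle spliced and ambiguous reads).

```python
from collections import Counter

def count_reads_per_gene(alignments, genes):
    """Count reads whose mapped position lies inside each gene interval.

    alignments: iterable of (chromosome, position)
    genes: dict of gene name -> (chromosome, start, end), inclusive coordinates
    """
    counts = Counter({name: 0 for name in genes})
    for chrom, pos in alignments:
        for name, (g_chrom, start, end) in genes.items():
            if chrom == g_chrom and start <= pos <= end:
                counts[name] += 1
    return counts

# Hypothetical annotation and alignments.
genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 900, 1500)}
alignments = [("chr1", 150), ("chr1", 480), ("chr1", 1000), ("chr2", 200)]
counts = count_reads_per_gene(alignments, genes)
```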

  29. Bowtie/TopHat • Bowtie uses Burrows-Wheeler indexing for rapid mapping • TopHat uses Initially Un-Mapped (IUM) reads to find novel splice sites • Trapnell, Pachter, Salzberg. Bioinformatics (2009) 25(9):1105-1111
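To show why a Burrows-Wheeler index makes mapping fast, here is a toy exact-match search over a Burrows-Wheeler transformed reference. It is a teaching sketch, not Bowtie's implementation: the index is built by naive suffix sorting, the occurrence table is stored in full, mismatches are not handled, and the reference string is invented.

```python
def build_index(text):
    """Build a suffix array, first-column offsets, and occurrence table."""
    text += "$"  # sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    alphabet = sorted(set(text))
    first, total = {}, 0
    for ch in alphabet:
        first[ch] = total  # row where suffixes starting with ch begin
        total += text.count(ch)
    occ = {ch: [0] for ch in alphabet}  # occ[ch][i] = count of ch in bwt[:i]
    for b in bwt:
        for ch in alphabet:
            occ[ch].append(occ[ch][-1] + (b == ch))
    return sa, first, occ

def find_exact(pattern, sa, first, occ):
    """Backward search: return sorted 0-based positions of exact matches."""
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))

# Hypothetical 12-base reference.
sa, first, occ = build_index("ACGTACGTTACG")
hits = find_exact("ACG", sa, first, occ)
```

The search takes one step per base of the read regardless of reference length, which is the property that makes this family of indexes attractive for short-read mapping.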

  30. Cufflinks • FPKM = Fragments Per Kilobase of transcript per Million fragments mapped • Trapnell et al. Nature Biotech (2010) 28(5):511-515
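The FPKM definition translates directly into arithmetic: divide the fragment count by the transcript length in kilobases and by the number of mapped fragments in millions. The sketch below applies only that definition with made-up numbers; Cufflinks itself estimates fragment assignments with a statistical model, which is not reproduced here.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million fragments mapped."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)

# Hypothetical transcript: 500 fragments on a 2,000 bp transcript,
# in a library with 10 million mapped fragments.
value = fpkm(500, 2_000, 10_000_000)
```

Normalizing by length and by library size is what allows expression values to be compared between transcripts of different lengths and between samples sequenced to different depths.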

  31. Galaxy • Can be used to upload FASTQ files and then run a number of QC tools and many other tools: bwa, bowtie, tophat, cufflinks, …

  32. Third Generation Sequencing
