High-Throughput Sequencing

High-Throughput Sequencing. Advanced Microarray Analysis BIOS 691-803, 2008 Dr. Mark Reimers, VCU. Quantitative HTS - Outline. Technology Preprocessing Quantitative analysis Applications ChIP-Seq RNA-Seq Methyl-Seq. The Technology.

Presentation Transcript


  1. High-Throughput Sequencing Advanced Microarray Analysis BIOS 691-803, 2008 Dr. Mark Reimers, VCU

  2. Quantitative HTS - Outline • Technology • Preprocessing • Quantitative analysis • Applications • ChIP-Seq • RNA-Seq • Methyl-Seq

  3. The Technology • Most sequencing proceeds by addition of fluor-labeled bases • Do this in parallel on a flat surface • Capture each stage with good camera • Align images

  4. Roche - 454 • Parallel Pyrosequencing on beads

  5. Mardis, Trends in Genetics

  6. 454 Sequencing Operation

  7. Illumina - Solexa

  8. ABI SOLiD • Resequencing each fragment with different primers • Reconstruct each fragment separately

  9. Paired-End Reads

  10. Issues • Pre-processing • Base calling • Mapping reads • QA • Quantitative analysis • Variation and noise • Biases • Models • Accuracy and validation

  11. Pre-processing – Base Calling • Not all steps completed properly • Sequence can lag behind or skip ahead • Hence most light spots a mixture of different colors • Simple rule: use brightest signal
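
The "use brightest signal" rule on the slide above can be sketched in a few lines. This is only an illustration, assuming each cycle yields one intensity per base channel; the function names and the example intensities are hypothetical and are not taken from any vendor pipeline.

```python
def call_base(intensities):
    """Return the base whose channel has the brightest signal.

    `intensities` maps each base (A, C, G, T) to the measured intensity
    for one cluster in one cycle (hypothetical input layout).
    """
    return max(intensities, key=intensities.get)


def call_read(cycles):
    """Call one base per cycle and join the calls into a read."""
    return "".join(call_base(cycle) for cycle in cycles)


# Made-up intensities for three cycles; the last cycle is a near tie,
# which is exactly where the simple rule is least reliable.
cycles = [
    {"A": 0.90, "C": 0.10, "G": 0.20, "T": 0.10},
    {"A": 0.20, "C": 0.30, "G": 0.80, "T": 0.40},
    {"A": 0.30, "C": 0.40, "G": 0.35, "T": 0.38},
]
print(call_read(cycles))
```

Because lagging or skipping strands mix colors within a spot, the margin between the brightest and second-brightest channel would be a natural quality indicator, though the slides do not specify one.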

  12. Types of mismatches in uniquely mapped tags with a single mismatch are profoundly asymmetric and biased Courtesy Thierry-Mieg

  13. Typical Errors in Base-Calling

  14. Position of single mismatch in uniquely mapped tags Courtesy Thierry-Mieg

  15. Improving Base-Calling with SVM

  16. Pre-processing – Mapping Reads • Huge numbers (10M – 70M) • BLAT (2002 high-speed method) • Eland (proprietary Illumina) • Other new methods: MAQ, SOAP

  17. Quality Assessment • Fraction of reads mapping to targets • Typically 5-10M reads per lane and 60-80% map to targets • Some repetitive sequence

  18. Comparing Samples - A Simple Normalization • Different numbers of counts per lane • Divide counts in a region of interest (a genomic region, a gene, or an exon) by the total count for the lane, expressed per million reads (TPM, total per million reads) • To compare genomic regions of different lengths, also divide by the length of the region: TPKM (total per kilobase per million)
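
The normalization on the slide above amounts to two divisions. The following is a minimal sketch, assuming raw read counts per region and a known lane total; the function names are illustrative only.

```python
def tpm(count, total_reads):
    """Counts in a region per million reads in the lane."""
    return count * 1e6 / total_reads


def tpkm(count, total_reads, region_length_bp):
    """Counts per kilobase of region per million reads in the lane."""
    return tpm(count, total_reads) * 1000.0 / region_length_bp


# Hypothetical example: 500 reads in a 2 kb region, 5 million reads in the lane.
print(tpm(500, 5_000_000))
print(tpkm(500, 5_000_000, 2000))
```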

  19. Quant. Analysis - Variation • Poisson model often used for random variation • Most HTS data ‘over-dispersed’ relative to Poisson • Negative Binomial often used • Parameter fitted
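
One way to see over-dispersion is to compare the sample variance of replicate counts with their mean: under a Poisson model the two are equal, while the negative binomial allows variance = mean + dispersion × mean². The sketch below uses a method-of-moments estimate of the dispersion. That estimator is an assumption made here for illustration; the slides only say the parameter is "fitted" and do not say how.

```python
from statistics import mean, variance


def dispersion_mom(counts):
    """Method-of-moments dispersion for replicate counts of one region.

    Solves var = mu + phi * mu**2 for phi and clips at zero, so data that
    are no more variable than Poisson give phi = 0.
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # sample variance (n - 1 denominator)
    return max(0.0, (var - mu) / mu ** 2)


# Made-up replicate counts: the first set is far more variable than Poisson.
print(dispersion_mom([10, 20, 30, 40]))
print(dispersion_mom([4, 5, 6]))
```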

  20. Quantitative Analysis - Biases • Not all regions represented equally • GC rich regions represented more • Independent of GC some chromosome regions represented more • Euchromatin bias • Sequence initiation site biases • ‘Mapability’ biases – some regions won’t have any uniquely mapped tags

  21. GC Bias • Density of reads depends strongly on GC content of regions (plot of read density against GC content, %)

  22. Genomic Position Biases • Count tags from randomly sheared DNA in red with GC content in blue

  23. Start Position Bias

  24. Consistent Start Position Bias Counts per start site in lane 1 vs lane 2

  25. RNA-Seq

  26. RNA-Seq Data • Figure tracks: Gene Model, Kidney Reads, Liver Reads • From Marioni et al 2008

  27. Accuracy of Illumina RNA-Seq

  28. Comparing RNA-Seq & Affy • Issues • How replicable is RNA-Seq? • How consistent are the two technologies? • Which is better? • Marioni et al, Genome Research, 2008

  29. Comparing Fold-Changes • D.E. by ILM • Red >250 • Green <250 • Black Not DE by ILM

  30. Model for Variation • Poisson counts hypergeometric comparison • Make uniform p-values by adding random term • Use lower tails only
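
The three bullets above can be read as a randomized lower-tail hypergeometric test: given the two lane totals and the gene's combined count, ask how surprising the lane-1 count is, and add a uniform random fraction of the probability of the observed value so that the discrete p-values become uniform under the null. The sketch below follows that reading, which is an interpretation rather than the authors' code; all function and argument names are hypothetical.

```python
import math
import random


def log_choose(n, k):
    """Log of the binomial coefficient, stable for lane-sized totals."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_pmf(k, total, gene_total, lane_total):
    """P(k of the gene's reads fall in the lane holding lane_total of total reads)."""
    if k < max(0, lane_total - (total - gene_total)) or k > min(gene_total, lane_total):
        return 0.0
    return math.exp(
        log_choose(gene_total, k)
        + log_choose(total - gene_total, lane_total - k)
        - log_choose(total, lane_total)
    )


def randomized_lower_p(x1, x2, n1, n2, rng=random):
    """Lower-tail p-value: P(X < x1) + U * P(X = x1), with U uniform on [0, 1)."""
    total, gene_total = n1 + n2, x1 + x2
    below = sum(hypergeom_pmf(k, total, gene_total, n1) for k in range(x1))
    at = hypergeom_pmf(x1, total, gene_total, n1)
    return below + rng.random() * at


# Made-up counts: 40 vs 60 reads for one gene in two lanes of a million reads each.
print(randomized_lower_p(40, 60, 1_000_000, 1_000_000, rng=random.Random(0)))
```

Applied to technical replicates, p-values computed this way should be close to uniform, which is what a QQ-plot against the uniform distribution checks.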

  31. False Positive Rates • QQ-plots of p-values between tech. reps

  32. Different Concentrations are NOT Comparable! • QQ-plots of p-values between 3pM and 1.5 pM

  33. Normalization of RNA-Seq • Robinson et al noticed that most genes appeared less expressed in liver • Fig 1 from Robinson & Oshlack, Genome Biology 2010

  34. A Better Normalization for RNA-Seq - TMM • Drop extremes of ratios • Drop very high count genes • Compute trimmed means of samples • Center log-ratios between samples
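
The TMM steps above can be sketched as follows. This is a deliberately simplified version: it takes an unweighted mean of the retained log-ratios, trims both tails of the average log-abundance, and uses assumed trim fractions (30% for ratios, 5% for abundance). It is not the published implementation, and the function name and defaults are hypothetical.

```python
import math


def tmm_log2_factor(sample, reference, trim_m=0.3, trim_a=0.05):
    """Trimmed mean of log2 ratios (sample vs reference) after library-size scaling."""
    total_s, total_r = sum(sample), sum(reference)
    genes = []
    for s, r in zip(sample, reference):
        if s > 0 and r > 0:  # log-ratios are undefined for zero counts
            ps, pr = s / total_s, r / total_r
            genes.append((math.log2(ps / pr), 0.5 * math.log2(ps * pr)))
    if not genes:
        return 0.0

    def bounds(values, trim):
        """Lowest and highest values kept after trimming a fraction from each tail."""
        ordered = sorted(values)
        cut = int(len(ordered) * trim)
        return ordered[cut], ordered[len(ordered) - 1 - cut]

    m_lo, m_hi = bounds([m for m, _ in genes], trim_m)
    a_lo, a_hi = bounds([a for _, a in genes], trim_a)
    kept = [m for m, a in genes if m_lo <= m <= m_hi and a_lo <= a <= a_hi]
    return sum(kept) / len(kept)


# Made-up counts: one gene dominates the sample and drags the other ratios down.
reference = [100] * 10
sample = [100] * 9 + [10000]
print(tmm_log2_factor(sample, reference))
```

Subtracting the returned value from each gene's log-ratio centers the log-ratios between the two samples, as in the last bullet of the slide.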

  35. New Things to do with RNA-Seq • Allele-specific expression • Splice variation • Between tissues • In disease • Alternate initiation sites • Select 5’ capped RNA fragments • Alternate termination

  36. Allelic Comparison • It is possible to compare allele-specific expression counts • Sample from VCU • Replicate samples • P-values for binomial tests of equality • About half show differential expression!
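
A binomial test of equality between the two alleles can be sketched as below, assuming each heterozygous site supplies a read count for each allele and the null hypothesis is a 50:50 split. The exact two-sided form used here (summing the probabilities of all outcomes no more likely than the observed one) is one common convention, not necessarily the one behind the slide; the function name is hypothetical.

```python
import math


def binomial_two_sided_p(ref_count, alt_count, p=0.5):
    """Exact two-sided binomial p-value for allele counts under P(ref) = p."""
    n = ref_count + alt_count

    def pmf(k):
        # Work in logs so that large read counts do not underflow.
        log_p = (
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p)
        )
        return math.exp(log_p)

    observed = pmf(ref_count)
    # Small relative tolerance so outcomes tied with the observed one are counted.
    tail = sum(q for q in map(pmf, range(n + 1)) if q <= observed * (1 + 1e-7))
    return min(1.0, tail)


# Made-up allele counts at two sites.
print(binomial_two_sided_p(5, 5))
print(binomial_two_sided_p(10, 0))
```

Given the over-dispersion and mapping biases discussed earlier in the deck, p-values from a plain binomial model are likely to be optimistic.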

  37. Detecting Splice Variation • Deep sequencing shows up clear variation in exon usage • Wang et al Nature 2008

  38. Tissue Map of Splice Variation From Wang et al • Brain is most distinctive • Individuals seem to differ • Cell lines seem to have distinct splice patterns

  39. Splicing is Complex • Many different splice operations exist • Only some of these characterized by counting exon reads

  40. Issues in Detecting Splice Variants • Counts in exons reflect biases (as yet uncharacterized) as well as actual abundance • Reads that bridge splice junctions would be definitive, but mapping is very dubious with short (<40 base) reads • Not all possible splice junctions are known • Hard to even search through the known ones

  41. Methodology for Splice Variants • Count reads mapped to exons and compare ratios across samples • Wang et al, and most others • Count reads that cross splice junctions

  42. Methodology for Finding Junctions

  43. ChIP-Seq

  44. Chromatin Immuno-precipitation

  45. ChIP-Seq Workflow • Cross-link proteins to DNA • Fragment DNA • Extract with antibody • Reverse cross links • Sequence fragments • DO CONTROLS!

  46. ChIP-Seq Data • From Rozowsky et al, Nature Biotech 2009

  47. ChIP-Seq vs ChIP-chip

  48. Peak-Finding - Simple • Extend tags and count overlap • How much to extend?
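
"Extend tags and count overlap" can be sketched as a coverage calculation: each tag is extended from its 5' start in the direction of its strand by an assumed fragment length, and the number of extended tags covering each base is counted. The input layout (0-based start position plus strand) and the function name are assumptions for illustration; the extension length is exactly the open question the slide raises.

```python
def coverage_from_tags(tags, extend, chrom_length):
    """Per-base count of overlapping tags after extending each by `extend` bases.

    `tags` is a list of (five_prime_position, strand) pairs, 0-based.
    """
    diff = [0] * (chrom_length + 1)
    for pos, strand in tags:
        if strand == "+":
            lo, hi = pos, pos + extend          # extend to the right
        else:
            lo, hi = pos - extend + 1, pos + 1  # extend to the left
        lo, hi = max(lo, 0), min(hi, chrom_length)
        if lo < hi:
            diff[lo] += 1
            diff[hi] -= 1
    coverage, running = [], 0
    for delta in diff[:chrom_length]:
        running += delta
        coverage.append(running)
    return coverage


# Two made-up tags on opposite strands whose extensions overlap at positions 4-5.
print(coverage_from_tags([(2, "+"), (7, "-")], extend=4, chrom_length=10))
```

Candidate peaks are then the positions where the coverage is highest relative to a control lane.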

  49. Peak Finding – Better • Tags starting on opposite strands are likely to start at opposite ends • Identifying the cross-over point leads to improved accuracy
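
One way to locate the cross-over point is to look for the position with the most plus-strand tag starts just upstream and the most minus-strand tag starts just downstream. The scoring rule, the window size, and the function name below are all assumptions made for illustration; the slides do not specify a method. Both input lists are assumed to be non-empty.

```python
def crossover_point(plus_starts, minus_starts, window):
    """Position maximizing plus-strand starts upstream plus minus-strand starts downstream."""
    positions = plus_starts + minus_starts
    candidates = range(min(positions), max(positions) + 1)

    def score(x):
        upstream = sum(1 for p in plus_starts if x - window <= p <= x)
        downstream = sum(1 for m in minus_starts if x <= m <= x + window)
        return upstream + downstream

    # max() returns the first best candidate, so ties resolve to the leftmost position.
    return max(candidates, key=score)


# Made-up tag starts around a binding site lying between positions 100 and 110.
plus_starts = [90, 95, 100]
minus_starts = [110, 115, 120]
print(crossover_point(plus_starts, minus_starts, window=50))
```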
