BEADS: Bias Elimination Algorithm for Deep Sequencing

BEADS is a method developed at the Gurdon Institute, University of Cambridge, to correct biases in deep sequencing data, particularly for ChIP-seq analysis. It addresses tag count biases associated with GC content, mappability differences, and local effects, so that peak calling and plots of signal across features are more accurate.


Presentation Transcript


  1. BEADS: Bias Elimination Algorithm for Deep Sequencing. Nicole Cheung, Thomas Down, Julie Ahringer. The Gurdon Institute, University of Cambridge, Cambridge, UK

  2. Bias in deep sequencing data: Tag counts for C. elegans genomic DNA and ChIP input sequence are not uniform (tag count biases are also observed in human ChIP input sequence; Rozowsky et al. 2009).
     • Analysis of ChIP-seq data needs to take bias into account:
       • Peak calling
         • Input sequence is used for significance scoring of peaks
         • But only regions enriched in ChIP are processed (e.g., MACS, PeakSeq, SPP, …)
       • Plotting signal across features (e.g., TSS, exons, …)
         • Need a method to correct signal at all positions

  3. Patterns correlate with GC content and mappability differences. Read counts are low in regions of low mappability. Genomic and input DNA patterns are similar to the GC content track.

  4. GC-rich sequences are over-represented. Low % GC is under-represented; high % GC is over-represented. [Figure: % GC of input sequence compared with % GC of the genome; x-axis: % GC, 0 to 80.]
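The comparison on slide 4 can be illustrated with a short sketch. It is not taken from the presentation: the function names, the 10% bin width, and the toy sequences are all assumptions chosen for illustration. It bins sequences by % GC and divides the share of reads in each bin by the share of genome windows in that bin, so a ratio above 1 means over-representation.

```python
from collections import Counter


def gc_percent(seq):
    """Return the G+C percentage of a non-empty DNA string, rounded to an integer."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    return round(100 * gc / len(seq))


def gc_distribution(seqs, bin_width=10):
    """Fraction of sequences falling in each % GC bin (keyed by the bin's lower edge)."""
    counts = Counter((gc_percent(s) // bin_width) * bin_width for s in seqs)
    total = sum(counts.values())
    return {b: c / total for b, c in sorted(counts.items())}


def representation_ratio(read_seqs, genome_windows, bin_width=10):
    """Read share divided by genome share per % GC bin (>1 means over-represented)."""
    reads = gc_distribution(read_seqs, bin_width)
    genome = gc_distribution(genome_windows, bin_width)
    return {b: reads.get(b, 0.0) / g for b, g in genome.items()}


# Toy data (hypothetical): the genome is spread evenly over four GC bins,
# while the reads are skewed towards GC-rich sequence.
genome_windows = ["ATATATATAT", "ATGCATATAT", "GCGCATATAT", "GCGCGCATAT"]
reads = ["ATGCATATAT", "GCGCATATAT", "GCGCGCATAT", "GCGCGCATAT"]

ratio = representation_ratio(reads, genome_windows)
for gc_bin, value in ratio.items():
    print(f"{gc_bin:>3}% GC: {value:.2f}")
```

In this toy example the 0% GC bin has a ratio of 0 (under-represented) and the 60% GC bin has a ratio of 2 (over-represented), mirroring the direction of the bias described on the slide.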

  5. Patterns of % GC, mappability, and ChIP input sequence across C. elegans exons and promoters. (Peaks at TSSs of human genes were observed by Rozowsky et al. 2009, the PeakSeq paper.) [Figure: panels EXONS and TSSs; axis labels: 500 bp, intron, >100 bp; tracks: % GC, mappability, raw input sequence.]

  6. Tag counts of raw human genomic sequence across exons

  7. Normalization strategy
     • 1) G+C count normalization
       • Apply on every read
     • 2) Mappability correction
       • Apply on each genomic location
     • 3) Residual local effect, e.g. DNA accessibility - correct using information from input
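The three steps on slide 7 could be sketched roughly as follows. The slides give only the outline, so every detail here is an assumption rather than the authors' implementation: the inverse-frequency read weight, the `min_map` mappability cutoff, the mean-scaled division by input, and all function names are hypothetical.

```python
def gc_weight(gc_bin, read_gc_dist, genome_gc_dist):
    """Step 1 (assumed form): weight a read by expected/observed frequency of its GC bin."""
    observed = read_gc_dist.get(gc_bin, 0.0)
    expected = genome_gc_dist.get(gc_bin, 0.0)
    return expected / observed if observed > 0 else 0.0


def weighted_coverage(reads, length, read_gc_dist, genome_gc_dist):
    """Sum GC weights per position; reads is a list of (position, gc_bin) pairs."""
    coverage = [0.0] * length
    for pos, gc_bin in reads:
        coverage[pos] += gc_weight(gc_bin, read_gc_dist, genome_gc_dist)
    return coverage


def mappability_correct(coverage, mappability, min_map=0.25):
    """Step 2 (assumed form): divide by mappability; mask poorly mappable positions."""
    return [c / m if m >= min_map else None
            for c, m in zip(coverage, mappability)]


def local_correct(chip, control):
    """Step 3 (assumed form): divide ChIP signal by mean-scaled corrected input."""
    valid = [c for c in control if c]  # drops masked (None) and zero positions
    mean_control = sum(valid) / len(valid)
    corrected = []
    for x, c in zip(chip, control):
        if x is None or not c:
            corrected.append(None)  # no usable input information here
        else:
            corrected.append(x / (c / mean_control))
    return corrected


# Toy example (hypothetical numbers): GC-rich reads are over-represented in input.
genome_gc = {20: 0.5, 60: 0.5}
input_gc = {20: 0.25, 60: 0.75}
input_reads = [(0, 20), (1, 60), (1, 60), (1, 60)]

input_cov = weighted_coverage(input_reads, 3, input_gc, genome_gc)
input_map = mappability_correct(input_cov, [1.0, 1.0, 0.1])
chip_signal = local_correct([4.0, 2.0, 1.0], input_map)
print(input_cov, input_map, chip_signal)
```

In the toy data, one GC-poor read and three GC-rich reads end up with roughly the same weighted coverage, which is the intended effect of step 1. Masking rather than rescaling positions of very low mappability is a design choice of this sketch, made to avoid dividing by values close to zero.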

  8. Three-step normalization removes bias in ChIP input sequence. Signal across exons and TSSs is flat in ChIP input sequence after correction. [Figure: panels EXONS and TSSs; axis labels: 500 bp, intron, >100 bp; correction applied: raw, GC, GC + map, GC + map + local.]
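The exon and TSS plots on slides 8 to 10 show signal averaged across many features. A minimal sketch of that kind of aggregate profile is given below; the function name, the `flank` parameter, and the toy signal are assumptions, and strand orientation is ignored for brevity.

```python
def aggregate_profile(signal, anchors, flank):
    """Mean signal at each offset in [-flank, +flank] around the anchor positions.

    Positions that fall outside the signal or are masked (None) are skipped.
    """
    width = 2 * flank + 1
    sums = [0.0] * width
    counts = [0] * width
    for anchor in anchors:
        for offset in range(-flank, flank + 1):
            pos = anchor + offset
            if 0 <= pos < len(signal) and signal[pos] is not None:
                sums[offset + flank] += signal[pos]
                counts[offset + flank] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


# Toy signal (hypothetical) with a peak at each of two anchor positions.
signal = [1.0, 1.0, 5.0, 1.0, 1.0, 1.0, 5.0, 1.0, 1.0]
profile = aggregate_profile(signal, anchors=[2, 6], flank=1)
print(profile)

# A bias-free input track would give the same value at every offset.
flat = aggregate_profile([2.0] * 9, anchors=[2, 6], flank=1)
print(flat)
```

A profile that is constant across offsets corresponds to the flat input signal described on slide 8, while a peak at offset 0 corresponds to feature-centred enrichment of the kind shown for H3K4me3 at promoters.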

  9. H3K4me3 previous knowledge: peaks in promoter regions. Expected promoter peaks remain after normalization. Exon peaks are removed after normalization. [Figure: panels TSSs and EXONS; axis labels: 500 bp, intron, >100 bp; correction applied: raw, GC, GC + map, GC + map + local.]

  10. H3K36me3 previous knowledge: on transcribed regions, enriched in exons. [Figure: panels TSSs and EXONS; axis labels: 500 bp, intron, >100 bp; correction applied: raw, GC, GC + map, GC + map + local; annotations: H3K36me3 signal on gene body, H3K36me3 enrichment on exons.]
