Learn about detecting structural variations via NGS technology, software methods, and exercises for practical implementation.
Structural Variation Detection Using NGS technology Ke Lin 23rd Feb, 2012
Content • Introduction • Methods and software used for SV detection • Exercises
What is Structural Variation? • variation in the structure of chromosomes within one species • can be detected with FISH, which localizes the presence or absence of specific DNA sequences Introduction
What is Structural Variation? • affects a region of DNA; includes inversions, balanced translocations and genomic imbalances (CNV) • approximately 1 kb or larger in size • many SVs are associated with genetic diseases Introduction
What can NGS do to detect SV? • assumption: a reference genome of the species is available • re-sequencing of other individuals of the species at shallow genome coverage (< 30X) • paired-end sequencing Introduction
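The coverage figure above follows from simple arithmetic: average depth is the total number of sequenced bases divided by the genome size. A minimal sketch, assuming a hypothetical 120 Mb genome and 100 bp reads (neither value comes from the slides):

```python
import math


def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average depth C = N * L / G (N counts individual reads, not pairs)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target average depth."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical example values: 120 Mb genome, 100 bp reads, 30X target.
GENOME_SIZE = 120_000_000
READ_LENGTH = 100

n = reads_needed(30, READ_LENGTH, GENOME_SIZE)
print(n, expected_coverage(n, READ_LENGTH, GENOME_SIZE))
```

For a paired-end run, the number of read pairs is half the number of reads computed here.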
1. local (de novo) assembly, then alignment of the assembled sequences to the reference genome • accurate but costly • the genomes of individuals within one species should be quite similar at the sequence level Methods used for SV detections
2. map reads to the reference genome and deduce SVs from the expected insert size of the read pairs • less accurate but much cheaper • many methods have been developed • downstream analysis can help to increase the accuracy Methods used for SV detections
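The insert-size reasoning can be sketched as follows: a pair that spans a deletion in the sample maps farther apart on the reference than the library insert size predicts, and a pair that spans an insertion maps closer together. The function names and the cutoff of three standard deviations are assumptions made for illustration, not parameters of any particular tool.

```python
from statistics import mean, stdev


def insert_size_cutoffs(spans, k=3.0):
    """Return (lower, upper) bounds as mean -/+ k standard deviations."""
    m = mean(spans)
    s = stdev(spans)
    return m - k * s, m + k * s


def classify_span(span, lower, upper):
    """Label one mapped pair by the distance between its two reads."""
    if span > upper:
        return "deletion candidate"   # mapped too far apart
    if span < lower:
        return "insertion candidate"  # mapped too close together
    return "concordant"


# Hypothetical mapped distances from a library with ~300 bp inserts.
library_spans = [290, 300, 310, 295, 305]
lower, upper = insert_size_cutoffs(library_spans)

for span in (300, 1500, 150):
    print(span, classify_span(span, lower, upper))
```

On real data the library statistics would be estimated from many pairs, ideally with a robust estimator, because pairs that span true SVs inflate the standard deviation.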
Signatures used for SV discovery • PEM (Paired End Mapping) • both reads of a pair have to map to the reference • reads need to align without gaps Methods used for SV detections
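Besides the distance between the two reads, the chromosome and strand of each read also carry a signature. The sketch below assumes a standard forward/reverse paired-end library, where the leftmost read is expected on the "+" strand and the rightmost on the "-" strand; the `ReadPair` type and the labels are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str  # "+" or "-"


def classify_orientation(pair: ReadPair) -> str:
    """Label a mapped pair by chromosome and strand pattern only."""
    if pair.chrom1 != pair.chrom2:
        return "translocation candidate"
    if pair.strand1 == pair.strand2:
        return "inversion candidate"
    # Order the two reads by position, then look at the strand pattern.
    left, right = sorted([(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if left[1] == "+" and right[1] == "-":
        return "expected orientation"
    return "tandem duplication candidate"  # reads point away from each other


print(classify_orientation(ReadPair("chr1", 100, "+", "chr1", 400, "-")))
print(classify_orientation(ReadPair("chr1", 100, "+", "chr1", 400, "+")))
print(classify_orientation(ReadPair("chr1", 100, "-", "chr1", 400, "+")))
print(classify_orientation(ReadPair("chr1", 100, "+", "chr2", 400, "-")))
```

A pair with the expected orientation can still indicate a deletion or insertion if its span is unusual, so orientation and span are normally evaluated together.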
Signatures used for SV discovery • DOC (Depth Of Coverage) • cannot tell where the additional copies are located • cannot detect insertions of novel sequence Methods used for SV detections
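The depth-of-coverage idea can be illustrated by averaging per-base depth in fixed windows and comparing each window with the typical depth. This sketch uses fixed ratio thresholds, which is a deliberate simplification and not how any specific caller models the data; the window size, thresholds and depth values are hypothetical.

```python
from statistics import median


def window_depths(per_base_depth, window):
    """Mean depth of consecutive, non-overlapping windows."""
    return [
        sum(per_base_depth[i:i + window]) / len(per_base_depth[i:i + window])
        for i in range(0, len(per_base_depth), window)
    ]


def call_gain_loss(depths, low=0.5, high=1.5):
    """Label each window by its depth relative to the median window depth."""
    baseline = median(depths)
    if baseline == 0:
        raise ValueError("median depth is zero; cannot compute ratios")
    calls = []
    for depth in depths:
        ratio = depth / baseline
        if ratio <= low:
            calls.append("Loss")
        elif ratio >= high:
            calls.append("Gain")
        else:
            calls.append("Normal")
    return calls


# Hypothetical per-base depths: a low-depth stretch and a high-depth stretch.
depth = [20] * 10 + [2] * 5 + [20] * 10 + [45] * 5 + [20] * 10
windows = window_depths(depth, window=5)
print(windows)
print(call_gain_loss(windows))
```

The result shows which reference windows gained or lost copies, but not where an extra copy was inserted, which matches the limitation listed above.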
Signatures used for SV discovery • Split reads • the size of the gaps that can be introduced is limited (only a few base pairs are allowed) • novel sequence insertions will be incomplete if the sequence to be assembled locally from hanging reads is substantially larger than the insert size Methods used for SV detections
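A toy version of the split-read signature: a read that crosses a deletion breakpoint does not align in one piece, but its prefix and suffix each match the reference, separated by the deleted bases. The sketch uses exact string matching on a short made-up sequence, so it ignores mismatches, repeats and reverse-strand reads; the function name and `min_anchor` parameter are hypothetical.

```python
def find_deletion_by_split_read(reference, read, min_anchor=8):
    """Return (deletion_start, deletion_length) on the reference, or None.

    The read is cut at every position that leaves at least `min_anchor`
    bases on both sides; the first cut whose two halves both match the
    reference exactly, in order and with a gap between them, is reported.
    """
    if reference.find(read) != -1:
        return None  # the read aligns in one piece, no split needed
    for split in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:split], read[split:]
        prefix_start = reference.find(prefix)
        if prefix_start == -1:
            continue
        prefix_end = prefix_start + len(prefix)
        suffix_start = reference.find(suffix, prefix_end + 1)
        if suffix_start != -1:
            return prefix_end, suffix_start - prefix_end
    return None


# Made-up sequences: the sample lacks the middle 12 bases of the reference.
left, deleted, right = "ACGTTGCAAGGC", "TTAACCGGTATG", "GATCGGATTACA"
reference = left + deleted + right
read = left[-10:] + right[:10]  # a read spanning the breakpoint in the sample

print(find_deletion_by_split_read(reference, read))
```

Real split-read callers anchor one read of a pair first and search for the split mate only within a limited distance, which keeps the search tractable on a whole genome.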
PEM • BreakDancer
Input: BWA mapping output, BAM format
Command: bam2cfg.pl -g -h bamfile1 bamfile2 .. > configure_file
Output: configuration file for the next step
Software of each Methods used for SV detections
PEM • BreakDancer
Input: configuration file
Command: breakdancer_max -h -g int.bed -o chromosome cfg_file > output
Output: tab-delimited file
Software of each Methods used for SV detections
1. Chromosome 1
2. Position 1
3. Orientation 1
4. Chromosome 2
5. Position 2
6. Orientation 2
7. Type of a SV
8. Size of a SV
9. Confidence Score
10. Total number of supporting read pairs
11. Total number of supporting read pairs from each bam/library
12. Estimated allele frequency (if -h)
13 - end. Copy number for each bam/library
Software of each Methods used for SV detections
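The column layout above maps directly onto a small parser. This is a sketch under stated assumptions: deletions are labeled `DEL` in the type column, header lines start with `#`, and the sample lines are invented for illustration rather than real program output.

```python
from typing import NamedTuple


class BreakDancerCall(NamedTuple):
    chrom1: str
    pos1: int
    orient1: str
    chrom2: str
    pos2: int
    orient2: str
    sv_type: str
    size: int
    score: float
    num_pairs: int
    extra: tuple  # columns 11 onward, kept as raw strings


def parse_line(line: str) -> BreakDancerCall:
    """Split one tab-delimited record into the first ten named columns."""
    f = line.rstrip("\n").split("\t")
    return BreakDancerCall(
        f[0], int(f[1]), f[2], f[3], int(f[4]), f[5],
        f[6], int(f[7]), float(f[8]), int(f[9]), tuple(f[10:]),
    )


def deletions_on(lines, chrom):
    """Yield deletion calls whose two breakpoints lie on `chrom`."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # assumed: blank lines and '#' header lines carry no calls
        call = parse_line(line)
        if call.sv_type == "DEL" and call.chrom1 == chrom and call.chrom2 == chrom:
            yield call


# Invented records for illustration only.
sample = [
    "#Chr1\tPos1\tOrientation1\tChr2\tPos2\tOrientation2\tType\tSize\tScore\tnum_Reads",
    "chr1\t10000\t5+0-\tchr1\t12000\t0+5-\tDEL\t1900\t99\t5\tlib1.bam|5",
    "chr1\t50000\t4+4-\tchr1\t50300\t4+4-\tINS\t-120\t60\t4\tlib1.bam|4",
    "chr2\t7000\t6+0-\tchr2\t9500\t0+6-\tDEL\t2400\t99\t6\tlib1.bam|6",
]

for call in deletions_on(sample, "chr1"):
    print(call.chrom1, call.pos1, call.pos2, call.size, call.num_pairs)
```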
DOC • cnD
Input: BWA mapping output, BAM format
Command: samtools pileup -c bamfile | pileup2win.pl > output_file
Output: windows file for the next step
Software of each Methods used for SV detections
DOC • cnD
Input: windows file
Command:
  cnD.x86-64 --prefix=lib_name --nohet windows_file1
  cat lib*_viterbi.txt > viterbi.txt
  metaCaller.pl --threshold=value viterbi.txt > metacalls.txt
  extractCNChanges.pl metacalls.txt > output
Output: tab-delimited file with the columns: chr, start pos, end pos, Gain/Loss
Software of each Methods used for SV detections
Split reads • Pindel
Input: configuration file
Command: pindel_x86_64 -f ref.fasta -i cfg_file -c ALL -o name
Output: files with indicative names: D = deletion, SI = short insertion, INV = inversion, TD = tandem duplication, LI = large insertion, BP = unassigned
Software of each Methods used for SV detections
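When collecting the result files in a script, the indicative names can be turned into a lookup table. The sketch assumes the files are named `<prefix>_<code>`, for example `name_D` for the prefix given with `-o name`; that naming pattern is an assumption here and should be checked against the Pindel documentation.

```python
# Codes and meanings as listed on the slide.
PINDEL_CODES = {
    "D": "deletion",
    "SI": "short insertion",
    "INV": "inversion",
    "TD": "tandem duplication",
    "LI": "large insertion",
    "BP": "unassigned",
}


def pindel_file_type(filename: str):
    """Return the SV type for a file named <prefix>_<code>, or None."""
    code = filename.rsplit("_", 1)[-1]  # text after the last underscore
    return PINDEL_CODES.get(code)


for name in ("name_D", "name_SI", "name_TD", "name_log"):
    print(name, "->", pindel_file_type(name))
```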
Local assembly of SV regions • Annotation of novel insertions • Fine-tuning of potentially changed gene models Downstream Analysis after SV detections
Exercises: Find all deletions on chromosome 1 using BreakDancer. Then try to do the same using cnD (gene loss) and Pindel. The input files can be found in: /mnt/geninf15/work/bif_course_2012/SV/exercises/ The documentation for each program can be found in: /mnt/geninf15/work/bif_course_2012/SV/DOC/
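Once deletions have been called with more than one program, a natural follow-up is to check which calls agree. One common way is reciprocal overlap between intervals; the sketch below is not part of the exercise itself, and the 50% threshold and the coordinates are hypothetical.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions between intervals (start, end)."""
    (a_start, a_end), (b_start, b_end) = a, b
    shared = min(a_end, b_end) - max(a_start, b_start)
    if shared <= 0:
        return 0.0  # the intervals do not overlap
    return min(shared / (a_end - a_start), shared / (b_end - b_start))


def shared_calls(calls_a, calls_b, min_overlap=0.5):
    """Calls from the first set that are matched by any call in the second."""
    return [
        a for a in calls_a
        if any(reciprocal_overlap(a, b) >= min_overlap for b in calls_b)
    ]


# Hypothetical deletion intervals on one chromosome from two callers.
caller_one = [(10000, 12000), (50000, 50400), (90000, 95000)]
caller_two = [(10100, 11900), (70000, 70500), (94000, 99000)]

print(shared_calls(caller_one, caller_two))
```

Requiring the overlap to be large relative to both intervals prevents a very long call from "confirming" every short call that happens to fall inside it.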