Canadian Bioinformatics Workshops. www.bioinformatics.ca. Module 5: Visual Analysis of HT-seq data. Marc Fiume. Informatics for High Throughput Sequencing Data, June 9-10, 2014. Part II (continued): Visualizing Structural Variants.
Review: structural variation detection • covered in Module 4 • two complementary approaches: • depth of coverage (DOC) • paired-end mapping (PEM)
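The depth-of-coverage (DOC) idea can be sketched in a few lines of Python. This is a minimal illustration, not the method covered in Module 4. It assumes per-base read depths for one region are already available as a list of integers. It uses the region's median depth as the expected (copy-neutral) depth, and it flags windows whose mean depth falls below 0.6× or rises above 1.4× that baseline. The window size and both cutoffs are illustrative assumptions, and `call_doc_windows` is a hypothetical name.

```python
from statistics import mean, median


def call_doc_windows(depths, window=100, expected=None, loss=0.6, gain=1.4):
    """Label fixed-size windows as 'loss', 'gain', or 'normal' by read depth.

    depths    -- per-base read depth over a region (list of ints)
    expected  -- copy-neutral depth; defaults to the median depth (assumption)
    loss/gain -- illustrative depth-ratio cutoffs, not values from the module
    """
    if expected is None:
        expected = median(depths)
    calls = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        ratio = mean(chunk) / expected if expected > 0 else 0.0
        if ratio < loss:
            label = "loss"    # fewer reads than expected: candidate deletion
        elif ratio > gain:
            label = "gain"    # more reads than expected: candidate duplication
        else:
            label = "normal"
        calls.append((start, ratio, label))
    return calls
```

As a usage example, take the synthetic depths `[30] * 300 + [15] * 100 + [0] * 100 + [60] * 100 + [30] * 100`. The windows starting at 300 and 400 are labeled `loss` and the window at 500 is labeled `gain`. Using the median rather than the mean as the baseline keeps the deleted and duplicated windows from dragging the expected depth away from the copy-neutral level.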
PEM: small insertions [figure: donor vs. reference]
PEM: large insertions [figure: donor vs. reference]
PEM: deletions [figure: donor vs. reference]
PEM: inversions [figure: donor vs. reference; one read inverted when mapped]
PEM: tandem duplications [figure: donor vs. reference; order of read mappings reversed]
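The five PEM signatures can be written as a small classifier over a single read pair. This is a simplified sketch under several assumptions:

- the library is a forward/reverse (FR) paired-end library;
- the expected insert size is 500 bp with a standard deviation of 50 bp (both hypothetical);
- the distance between read start positions stands in for the insert size;
- a large insertion is represented as a pair whose mate does not map.

`ReadPair` and `classify_pair` are hypothetical names, not part of Savant or any BAM library. Real callers cluster many discordant pairs before calling an event rather than trusting one pair.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadPair:
    """Simplified paired-end alignment (hypothetical; not a BAM record)."""
    pos1: int                    # mapped start of read 1 on the reference
    strand1: str                 # '+' or '-'
    pos2: Optional[int] = None   # mapped start of read 2; None if unmapped
    strand2: Optional[str] = None


def classify_pair(pair, mean_insert=500, sd_insert=50, k=3):
    """Return the PEM signature suggested by one read pair (FR library assumed)."""
    if pair.pos2 is None:
        # one mate lies in inserted sequence absent from the reference
        return "large insertion"
    if pair.strand1 == pair.strand2:
        # one read inverted when mapped: both mates on the same strand
        return "inversion"
    # order the mates by reference position: (position, strand) tuples
    left, right = sorted([(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)])
    if left[1] == "-" and right[1] == "+":
        # order of read mappings reversed relative to the FR expectation
        return "tandem duplication"
    distance = right[0] - left[0]
    if distance > mean_insert + k * sd_insert:
        return "deletion"         # mates map farther apart than expected
    if distance < mean_insert - k * sd_insert:
        return "small insertion"  # mates map closer together than expected
    return "concordant"
```

The checks run in a fixed order: orientation anomalies are tested before distance, because an inverted or reversed pair's mapped distance is not meaningful as an insert-size estimate.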
Structural Variants in Savant • Savant has a visualization mode for BAM files called “Matepair (Arc)” that is specialized for identifying structural variants using the PEM methodology • it connects the locations of paired mappings with an arc • arc height represents the mapped distance between the mates • arc color represents the relative orientation of the reads (useful for complex rearrangements such as inversions)
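To make the arc encoding concrete, here is a hypothetical sketch of how one read pair could be turned into arc attributes in the spirit of the Matepair (Arc) mode. Savant's actual scaling and color palette are not described in this module. The `bp_per_pixel` scale, the height cap, and the three color names are therefore assumptions chosen for illustration, as are the names `Arc` and `pair_to_arc`.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    start: int     # leftmost mate position (bp)
    end: int       # rightmost mate position (bp)
    height: float  # drawn height (pixels); grows with mapped distance
    color: str     # encodes the relative orientation of the mates


def pair_to_arc(pos1, strand1, pos2, strand2, bp_per_pixel=100, max_height=200.0):
    """Map one read pair to arc attributes (hypothetical scale and colors)."""
    start, end = sorted((pos1, pos2))
    # height proportional to mapped distance, capped so long-range pairs stay on screen
    height = min((end - start) / bp_per_pixel, max_height)
    if strand1 == strand2:
        color = "red"  # same strand, e.g. an inversion
    else:
        left_strand = strand1 if pos1 <= pos2 else strand2
        # '+' on the left is the expected FR layout; '-' on the left means reversed order
        color = "blue" if left_strand == "+" else "green"
    return Arc(start, end, height, color)
```

Because height grows with mapped distance, pairs spanning a deletion stand out as tall arcs above the low, uniform arcs of concordant pairs.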
Question 1: Which visualization mode in Savant is best for finding SNPs? Why?
Question 2: Which visualization mode in Savant is best for finding structural variants? Why?
Question 3: What kind of event does this image depict? (region: chr1:5,195,017-5,199,144)
A: Insertion [figure: donor vs. reference]
Question 4: What kind of event does this image depict? (region: chr1:26,489,321-26,490,661)
A: Deletion [figure: donor vs. reference]
Question 5: What would a heterozygous deletion look like? (region: chr1:31,574,172-31,578,242)
Question 6: What kind of event does this image depict? (region: chr1:81,659,802-81,661,916)
A: Inversion [figure: donor vs. reference; one read inverted when mapped]
Question 7: What kind of event does this image depict? (region: chr1:11,050,416-11,055,457)
A: Tandem duplication [figure: donor vs. reference; order of read mappings reversed]
Part III: Interactive Variant Analysis (bonus material, covered if time permits; contact mfiume@cs.toronto.edu with questions)
Genetic Variant Analysis • finding a disease-causing genetic mutation is “like trying to find a needle in a needlestack” (rather than a haystack) • lots of variants • many distractors • many false positives • errors in sequencing • errors in variant prediction • most true positives are not causal • not related to the phenotype of interest, or not damaging
Genetic Variant Analysis • filter variants based on quality, effect, and relevance to disease [figure: workflow variant calling → annotation → filtration → visualization; labels: Modules 1-3, Module 4.1]
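The filtration step can be illustrated with a minimal Python sketch. It keeps only variants that pass all three criteria named on the slide: quality, effect, and relevance to disease. It assumes each variant has already been annotated with a gene and a consequence term. The quality cutoff of 30, the chosen subset of Sequence Ontology consequence terms, and the candidate gene list are illustrative assumptions. `Variant` and `filter_variants` are hypothetical names, not MedSavant's API.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int
    gene: str    # gene assigned during annotation
    qual: float  # variant-calling quality score
    effect: str  # annotated consequence term


# Illustrative subset of consequence terms treated as potentially damaging.
DAMAGING_EFFECTS = {
    "missense_variant",
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}


def filter_variants(variants, candidate_genes, min_qual=30.0, damaging=DAMAGING_EFFECTS):
    """Keep variants that pass quality, effect, and disease-relevance filters."""
    return [
        v for v in variants
        if v.qual >= min_qual          # quality: drop likely sequencing/calling errors
        and v.effect in damaging       # effect: keep potentially damaging changes
        and v.gene in candidate_genes  # relevance: restrict to disease-linked genes
    ]
```

Each condition targets one kind of distractor:

- the quality cutoff removes likely sequencing and calling errors;
- the effect set removes non-damaging variants;
- the candidate gene set removes variants unrelated to the phenotype of interest.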
Existing Tools • command-line tools are powerful but not interactive • Excel and genome browsers are interactive but not powerful
MedSavant • visual analytics from variant calling to disease mutation discovery [figure: workflow variant calling → annotation → filtration → visualization, labeled MedSavant]
You might also want to try • VarSifter: works in memory, good for small projects • Golden Helix SVS (commercial) • this space is evolving, so a comprehensive comparison is difficult • there is much more commercial activity here than for genome browsers