Variant Analysis Introduction Deanna M. Church Staff Scientist, NCBI Short Course in Medical Genetics 2013 @deannachurch
Steve Sherry, NCBI [Figure: diagram with FASTQ, BAM, and VCF files]
Variation Databases • Collection of small nucleotide variation (SNVs) • Typically <50 bp • Some are polymorphic • Some are rare • Some are errors • Submissions clustered to make reference variants (rsIDs) http://www.ncbi.nlm.nih.gov/snp
Variation Databases • Blue variants are all T insertions • Submitters report them at different positions in the poly-T tract • Additional analysis is needed to cluster these
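One common way to make such submissions comparable is to shift each insertion as far left as it will go before clustering. The following is a minimal sketch of that idea, not a description of how dbSNP clusters submissions; the function name, the 0-based coordinates, and the toy reference sequence are assumptions made for illustration.

```python
def left_align_insertion(ref: str, pos: int, inserted: str) -> tuple[int, str]:
    """Shift an insertion to its leftmost equivalent position.

    `pos` is the 0-based index in `ref` before which `inserted` is placed.
    Returns the shifted position and the (possibly rotated) inserted bases.
    """
    while pos > 0 and ref[pos - 1] == inserted[-1]:
        # Moving one base left produces the same final sequence if the
        # inserted string is rotated by one base.
        inserted = inserted[-1] + inserted[:-1]
        pos -= 1
    return pos, inserted


# Toy reference (hypothetical): a poly-T tract at indices 3..8.
ref = "GACTTTTTTGA"
# Three submitters report the same T insertion at different places in the tract.
submitted_positions = [4, 6, 9]
normalized = {left_align_insertion(ref, p, "T") for p in submitted_positions}
print(normalized)
```

After normalization the three reports collapse to a single position and allele, so they can be grouped together.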
Variation Databases • Collection of large-scale variation • Breakpoint ambiguity • Complex variants (chromothripsis) • Challenging to compare variants from different methods • No reference variants (yet) http://www.ncbi.nlm.nih.gov/dbvar
Variant Call Ambiguity [Figure: array probes with decreased signal intensity flanked by probes with expected signal intensity. Each breakpoint lies between an outer and an inner coordinate: outer start, inner start, inner stop, outer stop.]
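A rough way to work with this ambiguity is to store each call as four coordinates and treat each breakpoint as an interval rather than a point. The sketch below assumes that representation; the class name, the comparison rule, and the example coordinates are hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmbiguousCall:
    """A call whose breakpoints are only known to lie within intervals."""
    outer_start: int  # the start breakpoint is not left of this
    inner_start: int  # the start breakpoint is not right of this
    inner_stop: int   # the stop breakpoint is not left of this
    outer_stop: int   # the stop breakpoint is not right of this


def intervals_overlap(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> bool:
    return a_lo <= b_hi and b_lo <= a_hi


def may_be_same_variant(a: AmbiguousCall, b: AmbiguousCall) -> bool:
    """True if both breakpoint intervals of `a` overlap those of `b`."""
    starts = intervals_overlap(a.outer_start, a.inner_start,
                               b.outer_start, b.inner_start)
    stops = intervals_overlap(a.inner_stop, a.outer_stop,
                              b.inner_stop, b.outer_stop)
    return starts and stops


# Made-up coordinates for two calls from different methods.
array_call = AmbiguousCall(1000, 1500, 9000, 9600)
clone_call = AmbiguousCall(1400, 2000, 9500, 10200)
print(may_be_same_variant(array_call, clone_call))
```

This only says two calls are compatible with being the same variant; it cannot show that they are.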
Variant Call Ambiguity Fosmid clone (40 Kb +/- 1 Kb) Clone has an insertion relative to the genome 20 Kb Clone has a deletion relative to the genome 60 Kb Outer start Outer stop
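The clone-end logic can be sketched as a comparison between the expected clone length and the distance between the mapped clone ends. This assumes the 20 Kb and 60 Kb figures on the slide are mapped end-to-end spans on the reference; the function name and the use of the slide's +/- 1 Kb as a hard threshold are illustrative assumptions, and a real analysis would likely use a statistically derived cutoff.

```python
def classify_clone(mapped_span_bp: int,
                   expected_bp: int = 40_000,
                   tolerance_bp: int = 1_000) -> str:
    """Classify a clone by how far apart its ends map on the reference."""
    if mapped_span_bp < expected_bp - tolerance_bp:
        # Ends map too close together: the clone carries extra sequence.
        return "insertion in clone"
    if mapped_span_bp > expected_bp + tolerance_bp:
        # Ends map too far apart: the clone lacks sequence the reference has.
        return "deletion in clone"
    return "concordant"


for span in (20_000, 40_500, 60_000):
    print(span, classify_clone(span))
```

Note that the clone ends only bound the event (outer start and outer stop); the exact breakpoints inside the clone remain unknown.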
Variation Databases http://www.ncbi.nlm.nih.gov/clinvar
How confident am I that my variant call is correct?
Available NGS Aligners (list is already out of date) Fonseca et al., 2012
Alignment Test • Simulated reads: align back to the source • Good: know where the reads go • Not so good: hard to simulate real data http://www.bioplanet.com/gcat
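Because simulated reads come with known origins, an aligner can be scored by checking how many reads land where they were drawn from. The sketch below assumes both the truth and the alignments are available as dictionaries keyed by read name; the function name, the position tolerance, and the toy data are hypothetical and unrelated to how GCAT itself scores submissions.

```python
def placement_accuracy(truth: dict, aligned: dict, tolerance: int = 5) -> float:
    """Fraction of simulated reads placed at their true origin.

    Both dicts map read name -> (chromosome, position). Reads missing
    from `aligned` count as incorrect.
    """
    if not truth:
        return 0.0
    correct = 0
    for read_id, (chrom, pos) in truth.items():
        hit = aligned.get(read_id)
        if hit is not None and hit[0] == chrom and abs(hit[1] - pos) <= tolerance:
            correct += 1
    return correct / len(truth)


# Toy data: four simulated reads and one aligner's placements.
truth = {"r1": ("chr1", 100), "r2": ("chr1", 500),
         "r3": ("chr2", 900), "r4": ("chr3", 50)}
aligned = {"r1": ("chr1", 100),   # exact
           "r2": ("chr1", 503),   # within tolerance
           "r3": ("chr5", 900)}   # wrong chromosome; r4 is unaligned
print(placement_accuracy(truth, aligned))
```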
Variant Calling Test Transition/Transversion ratio (Ti/Tv) • Random: 0.5 • Whole genome: 2.0-2.1 • Exome: 3.0-3.5 [Figure: bases A, C, G, T with transitions and transversions marked]
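The ratio is simple to compute from a list of single-base substitutions. The sketch below assumes calls are given as (ref, alt) pairs; the function name is illustrative. It also shows where the random expectation of 0.5 comes from: each base has one possible transition and two possible transversions.

```python
from itertools import permutations

# Transitions stay within purines (A<->G) or within pyrimidines (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snvs) -> float:
    """Ti/Tv for an iterable of (ref, alt) single-base substitutions."""
    snvs = list(snvs)
    ti = sum(1 for change in snvs if change in TRANSITIONS)
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")


# All 12 possible substitutions, each taken once: 4 transitions, 8 transversions.
all_substitutions = list(permutations("ACGT", 2))
print(ti_tv_ratio(all_substitutions))
```

A call set whose ratio sits well below the expected range for its data type may contain many false positives, since errors tend to pull the ratio toward the random value.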
Variant Calling Test Note: Difficult to test variant calling independently from the aligner as they are often coupled.
Variant Calling Test Benchmarking on known samples NA12878 NA19240
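Benchmarking against a well-characterized sample amounts to comparing your calls with a trusted call set for the same sample. A minimal sketch, assuming each call is reduced to a (chromosome, position, ref, alt) tuple and that both sets use the same reference assembly and the same representation of each variant; the function name and the toy calls are made up for illustration.

```python
def compare_calls(reference_calls, test_calls) -> dict:
    """Split calls into concordant, missed, and extra relative to a reference set."""
    reference_calls = set(reference_calls)
    test_calls = set(test_calls)
    return {
        "concordant": reference_calls & test_calls,
        "missed": reference_calls - test_calls,  # in the reference set only
        "extra": test_calls - reference_calls,   # in the test set only
    }


# Toy call sets (not real data for any sample).
reference_set = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"),
                 ("chr2", 300, "G", "T")}
pipeline_calls = {("chr1", 1000, "A", "G"), ("chr2", 300, "G", "T"),
                  ("chr2", 750, "T", "C")}
result = compare_calls(reference_set, pipeline_calls)
print({name: len(calls) for name, calls in result.items()})
```

Exact tuple matching undercounts concordance when the same variant is written differently in the two sets, which is one reason consistent representation matters before comparing.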
Target audience: Clinical testing labs Submissions from: Clinical and Research labs [Figure: calls from tests classified as Concordant, Discordant, or NA; cSRA] http://www.ncbi.nlm.nih.gov/variation/tools/get-rm
Variant Analysis Pipelines: Galaxy https://main.g2.bx.psu.edu/
Variant Analysis Pipelines: Galaxy • Workflows • Save them • Share them • Can run on Amazon Cloud • Large community • Reproducibility
Annotating Variants NC_000001.10:g.170508561T>A NC_000001.10:g.170508573T>C NC_000001.10:g.170508656G>T NC_000001.10:g.170508724T>C
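Expressions like these can be split into their parts with a small parser. The sketch below handles only simple genomic substitutions of the form shown on the slide (accession.version:g.positionREF>ALT); it is not a full HGVS parser, and the function name and regular expression are assumptions made for illustration.

```python
import re

# Matches only simple genomic substitutions, e.g. NC_000001.10:g.170508561T>A
HGVS_G_SUBSTITUTION = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+\.\d+):g\.(?P<pos>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_g_substitution(expression: str) -> tuple[str, int, str, str]:
    """Return (accession, 1-based position, ref, alt) or raise ValueError."""
    match = HGVS_G_SUBSTITUTION.match(expression)
    if match is None:
        raise ValueError(f"not a simple g. substitution: {expression!r}")
    return (match["accession"], int(match["pos"]), match["ref"], match["alt"])


expressions = [
    "NC_000001.10:g.170508561T>A",
    "NC_000001.10:g.170508573T>C",
    "NC_000001.10:g.170508656G>T",
    "NC_000001.10:g.170508724T>C",
]
for expression in expressions:
    print(parse_g_substitution(expression))
```

The accession version (the ".10") matters: it pins the coordinates to a specific sequence version, so it is kept as part of the parsed accession rather than discarded.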
Annotating Variants Molecular Consequences (often predicted) • Damaging amino acid change • Affects a splice site • Changes a regulatory feature Functional Consequences (typically asserted) • Experiments show the change affects expression • Allele associated with a disorder • Allele shown to affect some function
Annotating Variants [Figure: genes MAPKAPK2 and DYRK3]
Annotating Variants Upload your list of variants, get back: • Is the variant known? • Is the variant predicted to be deleterious to a protein (SIFT, PolyPhen)? • Overlap with predicted regulatory regions • HGVS expressions http://www.ensembl.org/info/docs/variation/vep/index.html
Annotating Variants Upload your list of variants, get back: • Is the variant known? • Does the allele have a molecular consequence (amino acid change, nonsynonymous)? • HGVS expressions • ClinVar information • Available genetic tests • Publications http://www.ncbi.nlm.nih.gov/variation/tools/reporter
Take home messages • Lots of methods for sequence alignment • Lots of methods for variant calling • Typically developed to use a particular aligner • Different data sources can affect your annotation