NGS Assignment part 2
Background
• Approximately a million Illumina reads from a single individual
• All are from Chromosome 22
• Data was mapped to the reference using Novoalign
• Duplicate reads were removed using Picard
• Variants were called by GATK
This is exactly what the shell script was intended to do…
NOTE: Write up due
The task
• Download the variants (VCF file) from the course page
• Submit to WEB-ANNOVAR using the standard settings: http://wannovar.usc.edu/
• Download the results as a CSV or other delimiter-separated file and answer the following questions by filtering with Excel, a script, or UNIX command pipelines…
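For the script route, a minimal Python sketch of a first pass over the downloaded file is shown here: it reads the CSV and tallies how many rows carry each value in one column. The column name `Chr` and the sample rows are assumptions made up for illustration; check the header line of your own export and substitute the real names.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_text, column):
    """Count how many rows carry each distinct value in `column`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


# Tiny made-up sample; a real export has many more columns,
# and its header names may differ from the ones assumed here.
SAMPLE = """Chr,Start,Ref,Alt
22,17000000,A,G
22,17000500,C,T
21,9000000,G,A
"""

counts = count_by_column(SAMPLE, "Chr")
for value, n in sorted(counts.items()):
    print(value, n)
```

For a real file, pass the text from `open(path, newline="").read()` in place of `SAMPLE`. The same tally works for any categorical column, which makes it a quick way to see what values a column actually contains before filtering on it.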
Answer the following:
1. Are there any strange ‘variants’? (HINT: remember this is for chromosome 22 only)
2. How many rare nonsynonymous variants (frequency < 0.05) can you find, based on:
   a. dbSNP
   b. ESP6500 (BONUS: write a bit about WHAT this is)
   c. both
3. Find a novel variant in list (c) above
   a. (HINT: if it’s not in dbSNP, it is probably novel)
   b. Do any of the functional prediction tools (LJB_***_Pred) call it ‘deleterious’ (D) or as being at a conserved site (C)?
   c. What is the gene? Find as much evidence as you can about possible roles in disease.
4. How many frameshift insertions/deletions can you find? How many would you trust? (HINT: find mentions of quality issues)
5. How many splicing variants? How many are rare?
6. BONUS: How many novel, nonsynonymous, homozygous variants are there that are predicted as being functional by at least one of the tools (same as in 3b)?
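The filters asked for above can be sketched in Python roughly as follows. Everything specific here is an assumption for illustration: the column names (`Gene`, `ExonicFunc`, `ESP6500_ALL`, `dbSNP`), the placeholder values treated as "no annotation", and the made-up sample rows. The sketch also treats a variant with no recorded frequency as rare, which is a judgment call you should state in your write-up.

```python
import csv
import io

# Assumed placeholders for "no annotation"; check what your export uses.
MISSING = {"", ".", "NA"}


def is_rare(freq_text, threshold=0.05):
    """True if the frequency is below `threshold`.

    A missing frequency is counted as rare (an assumption, not a rule).
    """
    text = freq_text.strip()
    if text in MISSING:
        return True
    return float(text) < threshold


def is_novel(rsid_text):
    """True if the variant has no dbSNP identifier."""
    return rsid_text.strip() in MISSING


# Made-up rows with hypothetical column names and placeholder IDs.
SAMPLE = """Gene,ExonicFunc,ESP6500_ALL,dbSNP
GENE_A,nonsynonymous SNV,0.01,rsPLACEHOLDER1
GENE_B,nonsynonymous SNV,,
GENE_C,nonsynonymous SNV,0.30,rsPLACEHOLDER2
GENE_D,synonymous SNV,0.001,rsPLACEHOLDER3
GENE_E,frameshift deletion,,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Rare nonsynonymous variants according to the ESP6500 frequency column.
rare = [r for r in rows
        if r["ExonicFunc"].startswith("nonsynonymous")
        and is_rare(r["ESP6500_ALL"])]

# Of those, the ones without a dbSNP identifier.
novel = [r for r in rare if is_novel(r["dbSNP"])]

# Frameshift insertions and deletions.
frameshift = [r for r in rows if r["ExonicFunc"].startswith("frameshift")]

print(len(rare), len(novel), len(frameshift))
```

The same pattern extends to the other questions by adding one condition per column, for example a check on a prediction column for `D` or `C`, once you know the real header names in your file.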