Samples tested
Experimental setup
• Aligner: BWA 0.5.9
• Samtools: 0.1.12a
• GATK: 1.0.5083
• SVA version: 1.1
• 1000 Genomes variant release: CEU.trio.2010_03.genotypes.vcf (build 36)
• Human genome reference (builds 36 and 37)
• dbSNP databases (builds 36 and 37) for overlap check
Ratio (%) of variant calls (SNVs and indels): GATK-recalibrated against raw
Increased concordance (SVA) dukscz0106, build 36
Increased concordance (SVA) als9c2, build 37
Increased concordance (SVA) dodrp751952, build 37
But, decreased concordance (SVA) mchd002A2, build 36
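The concordance figures on these slides compare sequencing genotypes with genotype array calls at shared sites. A minimal sketch of that kind of comparison follows; it is not SVA's actual implementation, and the function names, the dictionary layout, and the example genotypes are all hypothetical.

```python
def normalize(genotype):
    """Sort alleles so that ('G', 'A') and ('A', 'G') compare equal."""
    return tuple(sorted(allele.upper() for allele in genotype))


def genotype_concordance(seq_calls, array_calls):
    """Percent of shared sites where both platforms report the same genotype.

    Both arguments map (chrom, pos) -> pair of alleles (assumed layout).
    Returns None when the two call sets share no sites.
    """
    shared = seq_calls.keys() & array_calls.keys()
    if not shared:
        return None
    matches = sum(
        1
        for site in shared
        if normalize(seq_calls[site]) == normalize(array_calls[site])
    )
    return 100.0 * matches / len(shared)


# Made-up example data, for illustration only.
array_calls = {("1", 100): ("A", "G"), ("1", 200): ("C", "C"), ("2", 50): ("T", "T")}
seq_calls = {("1", 100): ("G", "A"), ("1", 200): ("C", "T"), ("3", 5): ("A", "A")}

concordance_pct = genotype_concordance(seq_calls, array_calls)
print(concordance_pct)
```

Running the same function on the raw and the recalibrated call sets for one sample gives the two numbers being compared per sample.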
Computational complexity: recalibration (extra cost ~31 hours if incorporated in pipeline)
• Original BAM file → covariates table (.csv): 8.8 hours
• Covariates table (.csv) → pre-recalibration analysis plots: ~seconds*
• Original BAM file → recalibrated BAM file: 22.5 hours
• Recalibrated BAM file → recalibrated covariates table (.csv): 7.7 hours*
• Recalibrated covariates table (.csv) → post-recalibration analysis plots: ~seconds*
Note *: needed ONLY for periodic evaluation of recalibration efficiency
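The ~31 hour figure can be checked with a short calculation. This sketch assumes that the 8.8 hour and 22.5 hour steps are the two that always run in the pipeline and that the starred 7.7 hour step runs only during periodic evaluation; the step labels are descriptive names chosen here, not tool names from the slides.

```python
# Timings in hours, copied from the slide. The assignment of each timing
# to a step is an assumption based on the slide layout.
PIPELINE_STEPS = {
    "covariates table from original BAM": 8.8,
    "recalibrated BAM": 22.5,
}
EVALUATION_ONLY_STEPS = {
    "covariates table from recalibrated BAM": 7.7,
}


def total_hours(steps):
    """Sum the per-step wall-clock times, rounded to one decimal place."""
    return round(sum(steps.values()), 1)


pipeline_cost = total_hours(PIPELINE_STEPS)
evaluation_cost = total_hours(EVALUATION_ONLY_STEPS)

print(f"Added to every run: {pipeline_cost} h")
print(f"Added only when evaluating recalibration: {evaluation_cost} h")
```

The plotting steps take seconds and are left out of the totals.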
Computational complexity: variant calling (already in pipeline)
• Processed BAM files → SNV and indel files (var_flt_vcf.snp, var_flt_vcf.indel): ~4 days
• SNV and indel files → summary metrics: ~6 hours
  • SNV count
  • Indel count
  • Hom/het ratio
  • Ti/Tv ratio
  • Overlap with dbSNP
  • Concordance
  • Het/tot ratio on X
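Several of the summary metrics listed above are simple counts and ratios over the SNV calls. The sketch below shows one way to compute them; it is an illustration rather than the pipeline's own code, and the tuple layout, the function names, and the example calls are assumptions.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ratio(numerator, denominator):
    """Plain ratio, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def summarize_snvs(snvs, known_sites):
    """Summarize SNVs given as (chrom, pos, ref, alt, zygosity) tuples.

    zygosity is 'het' or 'hom'; known_sites is a set of (chrom, pos)
    standing in for dbSNP. Input is assumed to contain valid SNVs only.
    """
    ti = tv = het = hom = known = x_het = x_total = 0
    for chrom, pos, ref, alt, zygosity in snvs:
        if is_transition(ref, alt):
            ti += 1
        else:
            tv += 1
        if zygosity == "het":
            het += 1
        else:
            hom += 1
        if (chrom, pos) in known_sites:
            known += 1
        if chrom == "X":
            x_total += 1
            x_het += zygosity == "het"
    total = ti + tv
    return {
        "snv_count": total,
        "ti_tv": ratio(ti, tv),
        "hom_het": ratio(hom, het),
        "dbsnp_overlap_pct": 100.0 * known / total if total else None,
        "x_het_tot": ratio(x_het, x_total),
    }


# Made-up calls, for illustration only.
example_snvs = [
    ("1", 100, "A", "G", "het"),
    ("1", 200, "C", "T", "hom"),
    ("1", 300, "A", "C", "het"),
    ("X", 400, "G", "A", "het"),
]
summary = summarize_snvs(example_snvs, known_sites={("1", 100), ("1", 200)})
print(summary)
```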
Conclusion
• For four samples (als9c2, dukscz0106, na12878 and dodrp751952)
  • Increased number of variant calls
  • Increased concordance to genotype array calls
  • Ti/Tv ratio closer to 2.0
  • Increased sensitivity in 1KG sample
• For sample mchd002A2
  • Decreased number of variant calls
  • Decreased concordance to genotype calls
  • Ti/Tv ratio farther from 2.0
• For two male samples (dukscz0106 and dodrp751952)
  • No apparent difference in het SNV ratio on chromosome X
• Overall conclusion
  • GATK recalibration helps with sensitivity and concordance to chip data
  • GATK recalibration might NOT help with specificity
  • Additional attention is needed for proper FDR control
  • Worse results with mchd002A2 could be sample specific
Going forward
• Decision making
  • Computational cost (~30 hours) vs. increased sensitivity
  • Sensitivity gain of up to 0.51% (on average ~10k SNPs gained)
  • Recommend incorporating this step in the new pipeline
• Extra/future steps for individual projects
  • Reduce the “potential” false positive rate by excluding “highly variations region” (Zhu, Q et al 2011)
  • Implement more stringent variant call parameters
  • More precise variant call strategy (long-term goal)
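Excluding variants that fall inside a list of flagged regions could be sketched as below. This is only an illustration of the filtering idea: the region coordinates are invented and do not come from Zhu et al., the function names are hypothetical, and intervals are assumed to be 1-based, inclusive, and non-overlapping within a chromosome.

```python
from bisect import bisect_right


def build_region_index(regions):
    """Group (chrom, start, end) regions by chromosome, sorted by start."""
    index = {}
    for chrom, start, end in regions:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def in_excluded_region(index, chrom, pos):
    """True if pos lies inside one of the chromosome's excluded intervals."""
    intervals = index.get(chrom, [])
    # Locate the last interval whose start is <= pos, then check its end.
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos <= intervals[i][1]


def filter_variants(variants, index):
    """Keep only (chrom, pos) variants outside every excluded region."""
    return [v for v in variants if not in_excluded_region(index, v[0], v[1])]


# Invented regions and variant positions, for illustration only.
excluded = build_region_index([("1", 100, 200), ("1", 500, 600)])
variants = [("1", 150), ("1", 300), ("1", 500), ("2", 150)]
kept = filter_variants(variants, excluded)
print(kept)
```

Sorting the intervals once and using a binary search keeps each lookup fast even with many regions, which matters when filtering a whole-genome call set.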