
Samples tested


ayita

Presentation Transcript


  1. Samples tested

  2. Experimental setup
     • Aligner: BWA 0.5.9
     • Samtools: 0.1.12a
     • GATK: 1.0.5083
     • SVA version: 1.1
     • 1000 Genomes variant release: CEU.trio.2010_03.genotypes.vcf (build 36)
     • Human genome reference (build 36 & 37)
     • dbSNP databases (build 36 & 37) for overlap check

  3. Ratio (%) of variant calls (SNVs and indels): GATK-recalibrated against raw

  4. Ti/Tv ratio closer to 2.0
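The Ti/Tv ratio on this slide is the number of transition SNVs (A<->G, C<->T) divided by the number of transversion SNVs (all other single-base changes). A minimal sketch of the calculation follows; the function name `ti_tv_ratio` and the `(ref, alt)` input format are assumptions made for illustration, not the tooling used in the presentation.

```python
# Hypothetical helper: compute Ti/Tv from (ref, alt) single-base pairs.
BASES = {"A", "C", "G", "T"}
# Transitions: purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snvs):
    """Return transitions / transversions for an iterable of (ref, alt) pairs."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        # Skip indels, ambiguous bases, and non-variant records.
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; Ti/Tv is undefined")
    return ti / tv


# Toy input: four transitions and two transversions.
example_calls = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
print(ti_tv_ratio(example_calls))
```

Comparing this value before and after recalibration, as the slide does, shows whether the call set moved toward or away from 2.0.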

  5. Het SNVs on chrX compared to total number of SNVs in the X
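This metric is the fraction of chrX SNV calls that are heterozygous. In male samples most of chrX is hemizygous, so heterozygous calls there are presumably being used as a rough proxy for false positives. The sketch below assumes calls arrive as `(chrom, genotype)` pairs with VCF-style genotype strings; the function name and input format are hypothetical, and pseudoautosomal regions are not excluded.

```python
# Hypothetical helper: het / total SNV fraction on chromosome X.
def het_fraction_on_x(calls):
    """calls: iterable of (chrom, genotype), genotype like '0/1', '1/1', '1|0'."""
    het = total = 0
    for chrom, gt in calls:
        if chrom not in ("X", "chrX"):
            continue
        alleles = gt.replace("|", "/").split("/")
        # Skip no-calls and homozygous-reference genotypes (not variant calls).
        if "." in alleles or all(a == "0" for a in alleles):
            continue
        total += 1
        if len(set(alleles)) > 1:
            het += 1
    if total == 0:
        raise ValueError("no variant calls on chrX")
    return het / total


# Toy input: one het and three hom-alt calls on X, plus records that are ignored.
example_calls = [
    ("X", "0/1"), ("X", "1/1"), ("X", "1/1"), ("chrX", "1/1"),
    ("1", "0/1"),   # autosomal, ignored
    ("X", "./."),   # no-call, ignored
]
print(het_fraction_on_x(example_calls))
```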

  6. Increased concordance (SVA) dukscz0106, build 36

  7. Increased concordance (SVA) als9c2, build 37

  8. Increased concordance (SVA) dodrp751952, build 37

  9. Increased sensitivity (na12878): overlap with 1KG release

  10. But, decreased concordance (SVA) mchd002A2, build 36
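The concordance slides compare sequencing calls against genotype array calls. The exact definition used by SVA is not given in the transcript; the sketch below assumes the simplest one: among sites genotyped by both platforms, the fraction where the unordered allele pair matches. The function name and the dictionary input format are hypothetical.

```python
# Hypothetical helper: genotype concordance between sequencing and array calls.
def genotype_concordance(seq_calls, array_calls):
    """Both inputs map (chrom, pos) -> two-letter genotype such as 'AG'."""
    shared = seq_calls.keys() & array_calls.keys()
    if not shared:
        raise ValueError("no sites genotyped on both platforms")
    # Allele order is ignored, so 'AG' and 'GA' count as the same genotype.
    matches = sum(
        1 for site in shared
        if sorted(seq_calls[site]) == sorted(array_calls[site])
    )
    return matches / len(shared)


# Toy data: three shared sites, two of which agree.
seq = {("1", 100): "AG", ("1", 200): "CC", ("2", 50): "TT", ("X", 10): "GA"}
array = {("1", 100): "GA", ("1", 200): "CT", ("2", 50): "TT", ("3", 5): "AA"}
print(genotype_concordance(seq, array))
```

Running this once on the raw call set and once on the recalibrated call set gives the kind of before/after comparison the slides report per sample.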

  11. Computational complexity: recalibration (extra cost ~31 hours if incorporated in pipeline)
      [Flow diagram. Nodes: Original BAM file; Covariates table (.csv); Pre-recalibration analysis plots; Recalibrated BAM file; Recalibrated covariates table (.csv); Post-recalibration analysis plots. Step timings as labeled: 8.8 hours, ~seconds*, 22.5 hours, 7.7 hours*, ~seconds*]
      Note *: ONLY needed for periodic evaluation of recalibration efficiency
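The ~31 hour figure appears to be the sum of the two unstarred timings on this slide (8.8 and 22.5 hours), with the starred steps left out because they are only needed for periodic evaluation. That reading is an assumption based on the numbers, checked here with a short calculation; the variable names are illustrative only.

```python
# Timings (hours) as labeled on the slide. Which step each belongs to is not
# recoverable from the transcript; only the starred/unstarred split is used.
required_hours = [8.8, 22.5]    # unstarred: would run in the pipeline
evaluation_only_hours = [7.7]   # starred: periodic evaluation only
# The two "~seconds*" steps are treated as negligible.

extra_cost = sum(required_hours)
with_evaluation = extra_cost + sum(evaluation_only_hours)

print(f"extra pipeline cost: {extra_cost:.1f} hours")
print(f"including evaluation steps: {with_evaluation:.1f} hours")
```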

  12. Computational complexity: variant calling (already in pipeline)
      [Flow diagram. Processed BAM files, ~4 days, SNVs & Indel files (var_flt_vcf.snp, var_flt_vcf.indel), ~6 hours, summary metrics]
      Summary metrics:
      • SNV count
      • Indel count
      • Hom/het ratio
      • Ti/Tv ratio
      • Overlap with dbSNP
      • Concordance
      • Het/tot ratio on X

  13. Conclusion
      • For four samples (als9c2, dukscz0106, na12878 and dodrp751952)
        • Increased number of variant calls
        • Increased concordance to genotype array calls
        • Ti/Tv ratio closer to 2.0
        • Increased sensitivity in 1KG sample
      • For sample mchd002A2
        • Decreased number of variant calls
        • Decreased concordance to genotype calls
        • Ti/Tv ratio farther from 2.0
      • For two male samples (dukscz0106 and dodrp751952)
        • No apparent difference in Het SNV ratio on chromosome X
      • Overall conclusion
        • GATK recalibration helps with sensitivity and concordance to chip data
        • GATK recalibration might NOT help with specificity
        • Additional attention is needed for proper FDR control
        • Worse results with mchd002A2 could be sample specific

  14. Going forward
      • Decision making
        • Computational cost (~30 hours) vs. increased sensitivity
          • Up to 0.51% (on average ~10k SNP gain)
        • Recommend incorporating this step in new pipeline
      • Extra/future steps for individual projects
        • Reduce the “potential” false positive rate by excluding “highly variations region” (Zhu, Q et al 2011)
        • Implement more stringent variant call parameters
        • More precise variant call strategy (long term goal)
