1000G Phase 1 Release chr20 call sets Ryan Poplin Genome Sequencing and Analysis Medical and Population Genetics January 25, 2011
Data and Definitions -- Pipeline
• Full indel cleaning process, including known indels
• BAQ (base alignment quality) calculation using the GATK implementation of H. Li's method
• Variants called by main continental AP and by admixed+ AP
• Variant quality score recalibration (VQSR)
• Quality cut chosen using HapMap3.3 + Omni 2.5M chip sensitivity
• Cut at 99.2% of accessible sites
• Genotype refinement not yet done
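Choosing a quality cut by chip sensitivity amounts to picking the score threshold that keeps a target fraction of known truth-set sites (here, HapMap3.3 and Omni 2.5M). The Python sketch below assumes that each call carries a single scalar quality score, such as a VQSR log-odds, and it reads the 99.2% figure as the target truth sensitivity. Both are assumptions. The function names are hypothetical, and this is not the GATK implementation.

```python
import math
from typing import Sequence


def sensitivity(truth_scores: Sequence[float], cutoff: float) -> float:
    """Fraction of truth-set sites whose score passes (>=) the cutoff."""
    return sum(1 for s in truth_scores if s >= cutoff) / len(truth_scores)


def threshold_for_sensitivity(truth_scores: Sequence[float], target: float) -> float:
    """Highest score cutoff that still retains at least `target` of truth sites.

    Hypothetical helper: assumes one scalar score per truth-set site.
    """
    if not truth_scores:
        raise ValueError("need at least one truth-site score")
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    ranked = sorted(truth_scores, reverse=True)  # best-scoring sites first
    # Number of truth sites that must pass; the small epsilon guards against
    # floating-point products like 0.992 * n landing just above an integer.
    needed = math.ceil(target * len(ranked) - 1e-9)
    return ranked[needed - 1]
```

For example, `threshold_for_sensitivity([9.1, 7.4, 6.0, 3.2, -1.5], 0.8)` returns `3.2`, because four of the five truth sites score at or above it. Taking the cutoff from the sorted truth scores means ties can only raise the achieved sensitivity above the target, never drop it below.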
Data and Definitions – 1004 Samples
• ASN = CHB + CHS + JPT
• ASN+ = CHB + CHS + JPT + MXL + CLM + PUR
• EUR = CEU + FIN + GBR + TSI + IBS
• EUR+ = CEU + FIN + GBR + TSI + IBS + MXL + CLM + PUR + ASW
• AFR = LWK + YRI + ASW
• AFR+ = LWK + YRI + ASW + CLM + PUR
• AMR = MXL + CLM + PUR
• AMR+ = MXL + CLM + PUR + ASW
• Note: these definitions differ from those used by the other groups
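The group definitions are overlapping sets of population codes. Each "+" group adds admixed populations (MXL, CLM, PUR, ASW) to a continental base group. The Python sketch below encodes the table as sets, which makes it easy to check which analysis groups a given population's samples contribute to. The names `GROUPS`, `groups_containing`, and `plus_extends_base` are illustrative and do not come from the 1000G pipeline.

```python
from typing import Dict, FrozenSet, List

# Group definitions transcribed from the table above.
GROUPS: Dict[str, FrozenSet[str]] = {
    "ASN": frozenset({"CHB", "CHS", "JPT"}),
    "ASN+": frozenset({"CHB", "CHS", "JPT", "MXL", "CLM", "PUR"}),
    "EUR": frozenset({"CEU", "FIN", "GBR", "TSI", "IBS"}),
    "EUR+": frozenset({"CEU", "FIN", "GBR", "TSI", "IBS", "MXL", "CLM", "PUR", "ASW"}),
    "AFR": frozenset({"LWK", "YRI", "ASW"}),
    "AFR+": frozenset({"LWK", "YRI", "ASW", "CLM", "PUR"}),
    "AMR": frozenset({"MXL", "CLM", "PUR"}),
    "AMR+": frozenset({"MXL", "CLM", "PUR", "ASW"}),
}


def groups_containing(population: str) -> List[str]:
    """Sorted names of every group that includes the given population code."""
    return sorted(name for name, members in GROUPS.items() if population in members)


def plus_extends_base(base: str) -> bool:
    """True if the '+' group is a strict superset of its base group."""
    return GROUPS[base] < GROUPS[base + "+"]
```

For example, `groups_containing("ASW")` returns `['AFR', 'AFR+', 'AMR+', 'EUR+']`, so ASW samples feed into four of the eight groups. Frozen sets keep the definitions immutable and make subset checks a single comparison.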
Final chr20 call sets, including fragment-based calling and contrastive VQSR clustering