Update on an investigation of somatic mutations in eMERGE datasets Ken Kaufman CCHMC 6-20-19
Somatic Mutations • "RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues." Keren Yizhak, François Aguet, Jaegil Kim, et al. Science, 07 Jun 2019: Vol. 364, Issue 6444, eaaw0726
Mutations • Germline • Somatic [Figure: cell diagrams labeled with +/+ and +/- genotypes for the germline and somatic cases; alt allele fractions labeled 50% and 25%]
Somatic Mutation Pipeline • Samtools pileup • Ref and alt alleles detected • Each base sequenced • Calculate ratio of alt allele count to depth • Filter • Ratio 1% to 30% • Amino-acid altering • Depth of 10 • Not in duplicated regions
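The filter step above can be sketched as a small predicate. This is only an illustration under stated assumptions: the `Site` record and its field names are hypothetical, "Depth of 10" is read as a minimum depth, and the 1% and 30% bounds are treated as inclusive. None of these details are confirmed by the slides.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """One sequenced position (hypothetical record, not the pipeline's own format)."""
    depth: int                  # total reads covering the position
    alt_count: int              # reads supporting the alternate allele
    amino_acid_altering: bool   # annotation: does the variant change the protein?
    in_duplicated_region: bool  # annotation: does it fall in a duplicated region?


def alt_ratio(site: Site) -> float:
    """Alt allele count divided by depth (0.0 when nothing was sequenced)."""
    return site.alt_count / site.depth if site.depth else 0.0


def is_candidate(site: Site,
                 min_ratio: float = 0.01,
                 max_ratio: float = 0.30,
                 min_depth: int = 10) -> bool:
    """Apply the four slide filters; thresholds are assumptions, see lead-in."""
    if site.depth < min_depth:          # assumed: "Depth of 10" is a minimum
        return False
    if site.in_duplicated_region:
        return False
    if not site.amino_acid_altering:
        return False
    return min_ratio <= alt_ratio(site) <= max_ratio


if __name__ == "__main__":
    low_fraction = Site(depth=100, alt_count=12, amino_acid_altering=True,
                        in_duplicated_region=False)
    germline_like = Site(depth=100, alt_count=50, amino_acid_altering=True,
                         in_duplicated_region=False)
    print(is_candidate(low_fraction), is_candidate(germline_like))
```

A heterozygous germline variant is expected near a 50% alt ratio, so the 30% upper bound keeps it out of the candidate list, while the 1% lower bound limits sequencing noise.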
Samples processed • eMERGE 3 set A • 16,170 samples screened • 801 candidate somatic mutations in 773 samples • PGX • ~10,000 samples processed • Initially 4403 variations in 2798 samples • After filtering • 555 candidate somatic mutations in 541 samples • 66 samples with 2 or more candidates (5 highest) • 58 candidates found in 2 or more samples (4 highest) • 419 candidates found in 1 sample • 252 of the 555 candidate somatic mutations are found in the ExAC database • MAF 0.009 to 8.2×10⁻⁶
Characteristics of Somatic Mutations • 178 genes • 488 type-A variants • 483 Nonsynonymous • 2 Insertions • 3 Deletions • 61 type-B variants (loss of function) • 24 Stopgain • 1 Stoploss • 21 Frameshift • 3 Initiation codon • 12 Splicing
[Figure: comparison with GATK; values shown: 150, 555, 75]
[Figure: Alt allele ratio; labels: Number of Variants, Average Ratio]
Validation • Obtained DNA for 11 samples from Vanderbilt and Northwestern (thank you very much!) • Sanger sequencing of PCR-amplified product (with controls) • Real-time PCR • Droplet digital PCR
Validation • PGX • 9142 samples screened (6.1%) • 555 candidate somatic mutations in 541 samples • eMERGE 3 set A • 16,170 samples screened (5.0%) • 801 candidate somatic mutations in 773 samples • eMERGE 3 set B • BAM file download from DNAnexus in progress
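The percentages on this slide can be reproduced with simple arithmetic. The sketch below is a check, not part of the original analysis: it shows that 6.1% and 5.0% match candidate mutations per sample screened, while the fraction of samples carrying at least one candidate is slightly lower. The function name is illustrative.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Counts taken from the slide.
datasets = {
    "PGX": {"screened": 9142, "candidates": 555, "samples_with_candidate": 541},
    "eMERGE 3 set A": {"screened": 16170, "candidates": 801,
                       "samples_with_candidate": 773},
}

if __name__ == "__main__":
    for name, d in datasets.items():
        per_candidate = percent(d["candidates"], d["screened"])
        per_sample = percent(d["samples_with_candidate"], d["screened"])
        print(f"{name}: {per_candidate}% candidates per sample screened, "
              f"{per_sample}% of samples with a candidate")
```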
Validation Strategy (iGenomX Riptide) [Figure: three-step diagram. Step 1 labels: SNP, adapter sequence, biotinylated ddNTP. Step 2, strand-displacing extension; labels: SNP, adapter, barcode, random sequence. Step 3, low-cycle PCR; label: SNP.]
Sequencing • Samples from 96 to 960 • Sequencers • MiSeq 25M reads • 96 samples • 96 targets • ~270× coverage • HiSeq 900M reads (3 lanes) • 960 samples • 960 targets • ~100× coverage
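The coverage figures above can be related to the read counts with a back-of-the-envelope calculation. Dividing total reads evenly across every sample and target pair gives values about ten times higher than the slide's numbers; the slide's ~270× and ~100× are reproduced if roughly one tenth of the reads count toward coverage. That 0.1 factor (`usable_fraction` below) is an assumption introduced here to match the slide, not something the source states.

```python
def reads_per_sample_target(total_reads: int, samples: int, targets: int,
                            usable_fraction: float = 1.0) -> float:
    """Average reads per (sample, target) pair, assuming an even split."""
    return total_reads * usable_fraction / (samples * targets)


# Read counts and plex sizes from the slide.
MISEQ = {"total_reads": 25_000_000, "samples": 96, "targets": 96}
HISEQ = {"total_reads": 900_000_000, "samples": 960, "targets": 960}

if __name__ == "__main__":
    for name, run in (("MiSeq", MISEQ), ("HiSeq", HISEQ)):
        raw = reads_per_sample_target(**run)
        # usable_fraction=0.1 is an assumed value, chosen to match the slide.
        adjusted = reads_per_sample_target(**run, usable_fraction=0.1)
        print(f"{name}: {raw:.0f} raw reads per pair, "
              f"{adjusted:.0f} with usable_fraction=0.1")
```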
Current Status • Process remaining eMERGE 3 data set • Obtain samples for validation • Finalize validation strategy • Contact Ken Kaufman (Kenneth.Kaufman@cchmc.org) or Paul Gecaine (Paul.Gecaine@cchmc.org) to participate
Acknowledgements • DNAnexus • Andrew Carroll • John Didion • eMERGE consortium • Vanderbilt • Northwestern • University of Washington • Baylor (Richard Gibbs group) • CCHMC • John Harley • Scott Richards • Paul Gecaine • Beth Cobb • Cindy Prows • Bahram Namjou-Khales
Strategy • VCF files only have data where a variant was called • BAM files have data at every position sequenced • Somatic mutations show a skewed ratio of ref to alt allele
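Because a pileup built from a BAM file reports every read at every position, the ref and alt counts can be tallied even where no variant was called. The sketch below counts bases in the read-bases column of one `samtools mpileup` line. It is a minimal illustration, not the presenter's code: it ignores base qualities and does not count indels or deleted-base placeholders.

```python
import re
from collections import Counter


def count_pileup_bases(ref_base: str, read_bases: str) -> Counter:
    """Count base calls in an mpileup read-bases string.

    '.' and ',' are reference matches, letters are mismatches,
    '^' plus one character marks a read start, '$' a read end,
    and '+N'/'-N' followed by N letters describe an indel.
    """
    counts: Counter = Counter()
    i = 0
    while i < len(read_bases):
        ch = read_bases[i]
        if ch == "^":
            i += 2                      # skip '^' and the mapping-quality char
        elif ch in "+-":
            digits = re.match(r"\d+", read_bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)  # skip the whole indel
        elif ch in ".,":
            counts[ref_base.upper()] += 1
            i += 1
        elif ch.upper() in ("A", "C", "G", "T"):
            counts[ch.upper()] += 1     # mismatch on either strand
            i += 1
        else:
            i += 1                      # '$', '*', 'N', and other symbols
    return counts


if __name__ == "__main__":
    counts = count_pileup_bases("A", "..,,^F.t$T+2AG.-1c,*")
    depth = sum(counts.values())
    alt = depth - counts["A"]
    print(dict(counts), f"alt ratio = {alt / depth:.3f}")
```

An alt ratio well below the roughly 50% expected for a heterozygous germline variant is the skew the slide refers to.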
Comparison with Other Programs • MosaicHunter • Tested against candidate samples • 56 somatic candidates • 3 overlapped • SomaticSniper • Designed for normal vs. tumor pairs • Failed in our application • Most somatic mutation detection approaches require optimization for each data set