MESA FS SC Meeting
Candidate Wide Association Study (CWAS) aka ‘Data Mining’
Joe Mychaleckyj
CG Panel 1+2: Summary

SNPs                  CG1     CG2     TOTAL
Picked                1536    1535*   3071
Typed                 1440    1467    2907
Typed, Not Dropped    1440    1442    2882   (AVAILABLE FOR CWAS)
* One duplicate SNP

AIMs                  CG1     CG2     TOTAL
Picked                97      112     209
Typed                 96      106     202
Typed, Not Dropped    96      103     199

Annotated, Unique Genes per Panel
                           CG1     CG2     TOTAL
Typed, Not Dropped SNPs    119     123     230
CWAS - Why?
• CG1 + CG2 = 2882 SNPs
• Analyze all SNPs irrespective of genome location or putative gene assignment
• Inference of gene function based on genome location is crude
• SNPs lie in regions with co-located genes, where single gene assignment is imprecise and misleading (coordinated gene regulation)
• GWAS and multi-candidate gene studies are fast becoming the standard for disease and trait gene mapping publications
• Results in press faster
MESA CWAS Process
[Figure: a Phenotype Class File (e.g., ECG; N phenotypes, one row per subject ID) and the Master Genetics File (genotypes, one row per subject ID) feed one CWAS per phenotype (per-SNP results for Pheno 1, Pheno 2, Pheno 3, etc.). Each CWAS is then split into Candidate Gene Files for that phenotype (Gene 1 to Gene 4, each listing its SNPs).]
Phenotype Class Status
TOTALS 886 98 *
Analytical Pipeline
• Use the same curated phenotype and genetic data sets that are available (split into CGs) to investigators
• Baseline models, within 4 ethnic group strata
• Y ~ age + sex + site
• Additive (1 df) + Genotype (2 df) tests
• Filter on MAF > 0.05 to remove rare alleles
• Full (common + rare) SNP data is available if a CWAS group requests it, but test statistics may be misleading
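The MAF filter and the two SNP codings behind the 1 df and 2 df tests can be sketched as follows. This is a minimal illustration, not the MESA pipeline itself: the function names, the two-letter genotype strings, and the toy data are all assumptions, and SNPs are assumed biallelic. Fitting Y ~ age + sex + site + SNP within each ethnic stratum is left to whatever regression package is in use.

```python
from collections import Counter

MAF_THRESHOLD = 0.05  # from the slide: keep SNPs with MAF > 0.05


def minor_allele_frequency(genotypes):
    """Return (minor_allele, maf) for one SNP.

    genotypes: list of two-letter strings such as "AA" or "AG";
    None marks a missing call. Assumes a biallelic SNP.
    """
    counts = Counter()
    for g in genotypes:
        if g is not None:
            counts.update(g)  # adds both alleles of the genotype
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return None, 0.0  # all calls missing, or monomorphic
    minor, n_minor = min(counts.items(), key=lambda kv: kv[1])
    return minor, n_minor / total


def additive_code(genotype, minor):
    """1 df coding: number of copies of the minor allele (0, 1 or 2)."""
    return genotype.count(minor)


def genotype_code(genotype, minor):
    """2 df coding: (heterozygote indicator, minor-homozygote indicator)."""
    n = genotype.count(minor)
    return (int(n == 1), int(n == 2))


# Hypothetical toy data: one common SNP and one rare SNP.
toy = {
    "snp_common": ["AA", "AG", "AA", "GG"],
    "snp_rare": ["CC"] * 19 + ["CT"],
}

# Apply the MAF > 0.05 filter before any association testing.
kept = {
    name: calls
    for name, calls in toy.items()
    if minor_allele_frequency(calls)[1] > MAF_THRESHOLD
}
```

In this toy set only `snp_common` survives the filter. The additive coding enters the model as one covariate (1 df), while the genotype coding enters as two indicator covariates (2 df).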
What’s in a CWAS Package of Analyses for a Phenotype?
2 classes of tests:
• Additive (1 df) + Genotype (2 df)
Summary table of top N=50 SNPs with rankings and summary statistics
• Stratified by ethnic group
• Sorted by additive model p-value
• Includes ranking for each ethnic group (e.g., the SNP ranked #1 for AFA may be #240 for CHN)
4 Quantile-Quantile (QQ) plots, i.e., one per ethnic stratum, for each test
4 Genome Association Plots (GAPs), one per ethnic stratum, for each test
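One way to picture the summary table and the QQ plots is the sketch below. It assumes the additive-model p-values are already available as a nested dictionary keyed by ethnic group and SNP. The function names, the SNP labels, and the p-values are hypothetical; the (i + 0.5)/n plotting position for the expected quantiles is one common convention, not necessarily the one used for the MESA plots.

```python
import math


def rank_by_pvalue(pvalues):
    """Map SNP name -> rank within one stratum (1 = smallest p-value)."""
    ordered = sorted(pvalues, key=pvalues.get)
    return {snp: i + 1 for i, snp in enumerate(ordered)}


def top_n_table(results, sort_group, n=50):
    """Build the top-n rows for one ethnic group.

    results: {group: {snp: additive-model p-value}}.
    Rows are sorted by p-value in sort_group; each row also carries
    the SNP's rank in every group, so ranks can be compared across strata.
    """
    ranks = {group: rank_by_pvalue(p) for group, p in results.items()}
    top = sorted(results[sort_group], key=results[sort_group].get)[:n]
    return [
        {
            "snp": snp,
            "p": results[sort_group][snp],
            "ranks": {group: ranks[group].get(snp) for group in results},
        }
        for snp in top
    ]


def qq_points(pvalues):
    """Return (expected, observed) -log10(p) pairs under a uniform null.

    Assumes every p-value is strictly greater than 0.
    """
    observed = sorted(pvalues)
    n = len(observed)
    return [
        (-math.log10((i + 0.5) / n), -math.log10(p))
        for i, p in enumerate(observed)
    ]


# Hypothetical p-values for two of the four strata.
results = {
    "AFA": {"rs_a": 0.001, "rs_b": 0.2, "rs_c": 0.04},
    "CHN": {"rs_a": 0.9, "rs_b": 0.01, "rs_c": 0.5},
}
table = top_n_table(results, sort_group="AFA", n=2)
```

Here `rs_a` ranks first for AFA but third for CHN, which is the kind of cross-stratum discrepancy the per-group ranking column is meant to expose.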
CWAS, Like GWAS, Is Fraught with Risk
• Interpretation: Caveat lector
• Do the test statistics appear reasonable?
• 1 df vs 2 df tests, CIs, Std. Errors, etc.
• Is there evidence of genotyping bias/errors?
• Are allele effects consistent (even if not significant)?
• Are the results confounded (comorbidities or correlated traits)?
• Is the gene (or genes) reasonable: is there independent evidence of association, or gene expression data, to support a putative physiological role?
• Is there sufficient power?
CWAS To Do
• Rerun with genetically determined ancestry adjustment (currently self-report)
• Rerun models with missing non-baseline covariates
• Lipids: Complete CWAS incorporating multiple imputation of lipid levels adjusted for lipid meds
• Run analyses for new phenotype classes and classes with primary outcomes still TBD
• Distribute results as per Genetics Committee directives
• NB: Ancillary study groups (e.g., Lung, Eye) may have separate analysis/writing group policies