Genome-wide Association Studies. John S. Witte. Candidate Gene or GWAS. Association Studies. Hirschhorn & Daly, Nat Rev Genet 2005. Affymetrix Array. Genome-wide Association Studies. Altshuler & Clark, Science 2005. Genome-wide Association Studies (GWAS). # Markers. # Samples.
Genome-wide Association Studies John S. Witte
Candidate Gene or GWAS Association Studies Hirschhorn & Daly, Nat Rev Genet 2005
Affymetrix Array Genome-wide Association Studies Altshuler & Clark, Science 2005
GWAS+ Strategy • Discovery: Multi-stage GWAS+ • Clarification: Sequencing+ • Confirmation / Characterization: Follow-up Genotyping+ (Figure axes: # Markers, # Samples, Time)
One- and Two-Stage GWA Designs (Figure: One-Stage Design and Two-Stage Design panels; axes: SNPs 1, 2, 3, …, M by Samples 1, 2, 3, …, N; the two-stage panel is divided into Stage 1 and Stage 2 by the proportion of samples and the proportion of markers.)
One-Stage Design vs. Two-Stage Design (Figure: SNPs by Samples; the Two-Stage Design is shown twice, once with joint analysis of Stage 1 and Stage 2 and once with replication-based analysis.)
Multistage Designs • Joint analysis has more power than replication-based analysis • The p-value threshold in Stage 1 must be liberal • Multistage designs lower cost; they do not gain power over a one-stage design • http://www.sph.umich.edu/csg/abecasis/CaTS/index.html
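A minimal sketch of how a joint analysis might combine the two stages. The slide does not give a formula; the weighting below (square roots of the sample fractions) is one commonly used form and is an assumption here, as are the names `joint_z`, `two_sided_p`, and `pi_samples`.

```python
import math


def joint_z(z1: float, z2: float, pi_samples: float) -> float:
    """Combine Stage 1 and Stage 2 z-statistics for one SNP.

    pi_samples is the fraction of all samples genotyped in Stage 1
    (hypothetical parameter name). Weights are the square roots of the
    sample fractions, so the result is standard normal under the null.
    """
    if not 0.0 < pi_samples < 1.0:
        raise ValueError("pi_samples must be strictly between 0 and 1")
    return math.sqrt(pi_samples) * z1 + math.sqrt(1.0 - pi_samples) * z2


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal z-statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Illustrative numbers only: a modest signal in each stage.
    z = joint_z(z1=2.5, z2=2.5, pi_samples=0.3)
    print(f"joint z = {z:.3f}, p = {two_sided_p(z):.2e}")
```

In a replication-based analysis, by contrast, Stage 2 would be tested on its own and the Stage 1 evidence would be used only to select SNPs.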
QC Steps • Filter SNPs and individuals • Low minor allele frequency (MAF), low call rates • Test for Hardy-Weinberg equilibrium (HWE) among controls & within ethnic groups. Use a conservative alpha-level • Check for relatedness using identity-by-state calculations.
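The per-SNP filters above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed pipeline: genotypes are coded as allele counts 0/1/2 with `None` for a failed call, HWE is checked with a 1-df chi-square test, and the thresholds in `passes_qc` are hypothetical defaults chosen only for the example.

```python
from math import erfc, sqrt


def snp_qc(genotypes):
    """Return call rate, MAF, and HWE p-value for one SNP.

    genotypes: list of allele counts (0, 1, 2) or None for a missing call.
    For the HWE check, pass genotypes from controls only.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    n = len(called)
    call_rate = n / len(genotypes)

    n_hom_ref, n_het, n_hom_alt = (called.count(k) for k in (0, 1, 2))
    p = (2 * n_hom_alt + n_het) / (2 * n)  # frequency of the counted allele
    maf = min(p, 1 - p)

    # Chi-square goodness of fit to Hardy-Weinberg proportions (1 df).
    observed = [n_hom_ref, n_het, n_hom_alt]
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    hwe_p = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df

    return {"call_rate": call_rate, "maf": maf, "hwe_p": hwe_p}


def passes_qc(stats, min_call_rate=0.95, min_maf=0.01, hwe_alpha=1e-6):
    """Apply illustrative (hypothetical) thresholds to the QC statistics."""
    return (stats["call_rate"] >= min_call_rate
            and stats["maf"] >= min_maf
            and stats["hwe_p"] >= hwe_alpha)


if __name__ == "__main__":
    controls = [0] * 60 + [1] * 32 + [2] * 6 + [None] * 2
    stats = snp_qc(controls)
    print(stats, passes_qc(stats))
```

The small `hwe_alpha` reflects the slide's advice to use a conservative alpha-level, since many SNPs are tested.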
Analysis of GWAS • Most common approach: look at each SNP one at a time. • Possibly add in multi-marker information. • Further investigate / report top SNPs only. • Or backwards replication… (Figure: P-values.)
GWAS Analysis • Most commonly trend test. • Log additive model, logistic regression. • Adjust for potential population stratification.
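As an illustration of the single-SNP trend test, here is a minimal sketch of the Cochran-Armitage test with additive weights (0, 1, 2) on a 2x3 table of genotype counts. The function name and the example counts are hypothetical, and the sketch does not adjust for population stratification; a logistic regression with covariates would be needed for that.

```python
from math import erfc, sqrt


def trend_test(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend test.

    case_counts / control_counts: genotype counts for 0, 1, 2 copies
    of the tested allele. Returns (z, two-sided p-value).
    """
    R = sum(case_counts)
    S = sum(control_counts)
    N = R + S
    totals = [r + s for r, s in zip(case_counts, control_counts)]

    # Trend statistic: weighted difference between cases and controls.
    T = sum(t * (r * S - s * R)
            for t, r, s in zip(weights, case_counts, control_counts))

    # Null variance of T, conditional on the table margins.
    sum_tn = sum(t * n for t, n in zip(weights, totals))
    sum_t2n = sum(t * t * n for t, n in zip(weights, totals))
    var_T = (R * S / N) * (N * sum_t2n - sum_tn ** 2)
    if var_T <= 0:
        raise ValueError("SNP is monomorphic or a group is empty")

    z = T / sqrt(var_T)
    return z, erfc(abs(z) / sqrt(2))


if __name__ == "__main__":
    # Illustrative counts only.
    z, p = trend_test(case_counts=(10, 20, 30), control_counts=(30, 20, 10))
    print(f"z = {z:.3f}, p = {p:.2e}")
```

The squared statistic equals N times the squared correlation between allele count and case status, which is why it behaves like the score test from a log-additive logistic model.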
Example: GWAS of Prostate Cancer (http://cgems.cancer.gov) (Figure: association results by chromosome.) Multiple prostate cancer loci on 8q24. Witte, Nat Genet 2007
Prostate Cancer Replications Witte, Nat Rev Genet 2009 Modest ORs
SNPs Missed in Replication? 24,223rd smallest P-value! Witte, Nat Rev Genet 2009
Prostate Cancer www.genome.gov/gwastudies Manolio et al. J Clin Invest 2008
Population Attributable Risks for GWAS Smoking & lung cancer BRCA1 & Breast cancer Jorgenson & Witte, 2009
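For orientation, a population attributable risk can be computed with Levin's formula, PAR = p(RR - 1) / (1 + p(RR - 1)), where p is the prevalence of the risk factor. This is a sketch under assumptions: it is not necessarily the calculation behind the slide's figure, the function name is hypothetical, and the input values are illustrative rather than estimates from any study.

```python
def population_attributable_risk(prevalence: float, relative_risk: float) -> float:
    """Levin's formula for the population attributable risk.

    prevalence: proportion of the population carrying the risk factor.
    relative_risk: risk in carriers divided by risk in non-carriers
    (an odds ratio is often substituted when the disease is rare).
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be in [0, 1]")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


if __name__ == "__main__":
    # Illustrative contrast: a common variant with a modest effect
    # versus a rare variant with a large effect.
    print(population_attributable_risk(prevalence=0.50, relative_risk=1.2))
    print(population_attributable_risk(prevalence=0.002, relative_risk=10.0))
```

The formula shows why a common variant with a modest OR can carry a larger attributable risk than a rare variant with a strong effect.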
Limitations of GWAS • Not very predictive. Example: AUC for breast cancer risk: Gail model = 58%, SNPs = 58.9%, Gail + SNPs = 61.8% (Wacholder et al. NEJM 2010). Witte, Nat Rev Genet 2009
Limitations of GWAS • Not very predictive • Explain little heritability • Focus on common variation • Many associated variants are not causal
Where’s the Heritability? Common disease rare variant (CDRV) hypothesis: diseases due to multiple rare variants with intermediate penetrances (allelic heterogeneity) Many more of these? See: NEJM, April 30, 2009 McCarthy et al., 2008
Will GWAS results explain more heritability? • Possibly, if… • Causal SNPs not yet detected due to power / practical issues (e.g., not yet included in replication studies). • Stronger effects for causal SNPs: Associated SNP may only serve as a marker for multiple different causal SNPs.
Imputation of SNP Genotypes • Estimate unmeasured or missing genotypes. • Based on measured SNPs and external info (e.g., haplotype structure of HapMap). • Increase GWAS power. • Allow for combining data across different platforms (e.g., Affy & Illumina) (for replication / meta-analysis).
Imputation Example (Figure: study sample alongside HapMap / 1K genomes reference.) Gonçalo Abecasis
Identify Match with Reference Gonçalo Abecasis
Phase chromosomes, impute missing genotypes Gonçalo Abecasis http://www.sph.umich.edu/csg/abecasis/MACH
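The idea in the three imputation slides (match the study sample to a reference, then fill in the untyped sites) can be illustrated with a deliberately simplified toy. This is a swapped-in nearest-match heuristic, not the algorithm used by real imputation software, which models each chromosome probabilistically as a mosaic of reference haplotypes. It assumes the study haplotype is already phased, alleles are coded 0/1, and `None` marks an untyped SNP; the function name is hypothetical.

```python
def impute_from_reference(study_hap, reference_haps):
    """Fill missing alleles from the best-matching reference haplotype.

    study_hap: list of alleles (0/1) with None at untyped SNPs.
    reference_haps: list of complete haplotypes of the same length.
    """
    if not reference_haps:
        raise ValueError("reference panel is empty")
    if any(len(ref) != len(study_hap) for ref in reference_haps):
        raise ValueError("haplotypes must all have the same length")

    typed = [i for i, allele in enumerate(study_hap) if allele is not None]

    def mismatches(ref):
        # Count disagreements at the SNPs that were actually genotyped.
        return sum(ref[i] != study_hap[i] for i in typed)

    best = min(reference_haps, key=mismatches)
    return [best[i] if allele is None else allele
            for i, allele in enumerate(study_hap)]


if __name__ == "__main__":
    reference = [
        [0, 1, 1, 0, 1],
        [1, 0, 0, 1, 0],
        [0, 1, 0, 0, 1],
    ]
    study = [0, None, 1, None, 1]
    print(impute_from_reference(study, reference))
```

Observed alleles are never overwritten; only the `None` positions are copied from the closest reference haplotype.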
Imputation Application: TCF7L2 gene region & T2D from the WTCCC data. Observed genotypes in black, imputed genotypes in red (x-axis: chromosomal position). Marchini, Nature Genetics 2007 http://www.stats.ox.ac.uk/~marchini/#software
Genome-wide Sequence Studies • Trade-off between number of samples, depth, and genomic coverage. Gonçalo Abecasis
Near-term Design Choices • For example, between: • Sequencing a few subjects with extreme phenotypes: • e.g., 200 cases, 200 controls, 4x coverage. Then follow up in a larger population. • 10M SNP chip based on 1000 Genomes. • 5K cases, 5K controls. • Which design will work best…?
Polygenic Models • Many weak associations combine to risk? • Large number of SNPs (m) associated with disease? • Score model: $x_j = \sum_{i=1}^{m} \ln(\mathrm{OR}_i)\,\mathrm{SNP}_{ij}$, where $\ln(\mathrm{OR}_i)$ = ‘score’ for SNP $i$ from the ‘discovery’ sample, and $\mathrm{SNP}_{ij}$ = # of alleles (0, 1, 2) for SNP $i$, person $j$ in the ‘validation’ sample. ISC / Purcell et al. Nature 2009
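A minimal sketch of the score calculation: each person's score is the sum over SNPs of ln(OR) from the discovery sample times that person's allele count in the validation sample. The function name and the odds ratios and genotypes in the example are hypothetical.

```python
import math


def polygenic_scores(odds_ratios, genotypes):
    """Compute one score per person in the validation sample.

    odds_ratios: per-SNP odds ratios estimated in the discovery sample.
    genotypes: one list per person of allele counts (0, 1, 2), in the
               same SNP order as odds_ratios.
    """
    weights = [math.log(odds_ratio) for odds_ratio in odds_ratios]
    scores = []
    for person in genotypes:
        if len(person) != len(weights):
            raise ValueError("each person needs one allele count per SNP")
        scores.append(sum(w * count for w, count in zip(weights, person)))
    return scores


if __name__ == "__main__":
    # Illustrative values only: three SNPs, three people.
    discovery_ors = [1.2, 0.9, 1.5]
    validation_genotypes = [
        [0, 0, 0],
        [2, 1, 0],
        [1, 1, 1],
    ]
    print(polygenic_scores(discovery_ors, validation_genotypes))
```

The scores would then be tested for association with disease status in the validation sample, for example by logistic regression of case status on the score.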
Application of Model Purcell / ISC et al. Nature 2009
Application to CGEMS PCa GWAS (Witte & Hoffman 2010) • 1,172 cases, 1,157 controls from PLCO Trial • Oversampled more aggressive cases. • Illumina 550K array. • Analyzed PCa overall & stratified by disease aggressiveness. • Split into halves, with resampling: • one as ‘discovery’ sample; • other as ‘validation’. • LD filter: r² = 0.5.
Common Polygenic Model for Prostate and Breast Cancer? • CGEMS GWAS data on prostate and breast cancer. • Use one cancer as ‘discovery’ sample, the other as ‘validation’. Nat Rev Cancer 2010;10:205-212
Complex diseases: Many causes = many causal pathways! (Figure: causal diagram relating physical activity, diet, genetic susceptibility, obesity, hyperlipidemia, diabetes, hypertension, atherosclerosis, and vulnerable plaques to MI.)
Pathways • Many websites / companies provide ‘dynamic’ graphic models of molecular and biochemical pathways. • Example: BioCarta: http://www.biocarta.com/ • May be interested in potential joint and/or interaction effects of multiple genes in one pathway.
Systems Biology: Moving Beyond Genome • Transcriptome: all messenger RNA molecules (‘transcripts’) • Proteome: all proteins in a cell or organism • Metabolome: all metabolites in a biological organism (end products of its gene expression).