730 likes | 1.16k Views
Genome-wide association studies (GWAS). Thomas Hoffmann. Outline for GWAS. Review / Overview Design Analysis QC Prostate cancer example Imputation Replication & Meta-analysis Advanced analysis intro (more next lecture) Limitations & “missing heritability” Gene/pathway tests
E N D
Genome-wide association studies (GWAS) Thomas Hoffmann
Outline for GWAS • Review / Overview • Design • Analysis • QC • Prostate cancer example • Imputation • Replication & Meta-analysis • Advanced analysis intro (more next lecture) • Limitations & “missing heritability” • Gene/pathway tests • Polygenic models
Outline for GWAS • Review / Overview • Design • Analysis • QC • Prostate cancer example • Imputation • Replication & Meta-analysis • Advanced analysis intro (more next lecture) • Limitations & “missing heritability” • Gene/pathway tests • Polygenic models
Candidate Gene or GWAS Recap: Association studies (guilt by association) Hirschhorn & Daly, Nat Rev Genet 2005
GWAS Microarray Assay ~ 0.7 - 5M SNPs (keeps increasing) Affymetrix, http://www.affymetrix.com
Genotype calls Bad calls! Good calls!
Outline for GWAS • Review / Overview • Design • Analysis • QC • Prostate cancer example • Imputation • Replication & Meta-analysis • Advanced analysis intro (more next lecture) • Limitations & “missing heritability” • Gene/pathway tests • Polygenic models
One- and two-stage GWA designs Two-Stage Design One-Stage Design SNPs SNPs nsamples Stage 1 Samples Samples Stage 2 nmarkers
One-Stage Design SNPs Samples Two-Stage Design Joint analysis Replication-based analysis SNPs SNPs 1 1 Stage 1 Stage 1 Samples Samples Stage 2 Stage 2 2 2
Multistage Designs • Joint analysis has more power than replication • p-value in Stage 1 must be liberal • Lower cost—do not gain power • CaTs power calculator: http://www.sph.umich.edu/csg/abecasis/CaTS/index.html
Genome-wide Sequence Studies • Trade off between number of samples, depth, and genomic coverage. More later in “Next generation sequencing” (NGS) lecture Goncalo Abecasis
Near-term sequencing design choices • For example, between: • Sequencing few subjects with extreme phenotypes: • e.g., 200 cases, 200 controls, 4x coverage. Then follow-up in larger population. • 10M SNP chip based on 1,000 genomes. • 5K cases, 5K controls. • Which design will work best…? More later in “Next generation sequencing” (NGS) lecture
GWAS Microarray Only assay SNPs designed into array (0.7-5 million) Much cheaper (so many more subjects) Genotypes currently more reliable GWAS Sequencing “De novo” discovery (particularly good for rare variants) More expensive (but costs are falling) (many less subjects) Need much more expansive IT support Lots of interesting interpretation problems (field rapidly evolving) Design choices
Exome Microarray Only assay SNPs designed into array (~300K+custom); in exons only and that could affect protein coding function Cheapest (so many more subjects) Genotypes currently more reliable (some question about rarest, but preliminary results good) Exome Sequencing “De novo” discovery (particularly good for rare variants); %age of exons only More expensive than microarrays, less expensive than gwas sequencing Need more expansive IT support Lots of interesting interpretation problems Design choices
Size of study Visscher, AJHG 2012,
Size of study Visscher, AJHG 2012,
Biggest studies... • GWAS Microarray: 100,000 People in the Kaiser RPGEH, still to be analyzed (Hoffmann et al., Genomics, 2011a&b) • Sequencing: 1000 Genomes Project (though not disease focused, low coverage issues) • Exome Sequencing: GO ESP (12,031 subjects, for exome microarray design)
Outline for GWAS • Review / Overview • Design • Analysis • QC • Prostate cancer example • Imputation • Replication & Meta-analysis • Advanced analysis intro (more next lecture) • Limitations & “missing heritability” • Gene/pathway tests • Polygenic models
QC Steps • Filter SNPs and Individuals • MAF, Low call rates • Test for HWE among controls & within ethnic groups. Use conservative alpha-level. • Check for relatedness. Identity-by-state calculations. • Check genotype gender • Filter Mendelian inhertance (family-based, or potentially cryptics, if large enough sample)
Check for relatedness, e.g., HapMap Pemberton et al., AJHG 2010
GWAS analysis • Most common approach: look at each SNP one-at-a-time. • Possibly add in multi-marker information. • Further investigate / report top SNPs only. • Or backwards replication… P-values
GWAS analysis • Additive coding of SNP most common, just a covariate in a regression framework • Dichotomous phenotype: logistic regression • Continuous phenotype: linear regression • Correct for multiple comparisons • e.g., Bonferroni, 1 million gives =5x10-8 • more next time • Adjust for potential population stratification • principal components (PC’s), on best performing SNPs • software usually does LD filter (e.g. Eigensoft)
Adjusting for PC’s (recap) Balding, Nature Reviews Genetics 2010
Adjusting for PC's • Li et al., Science 2008
Adjusting for PC's • Razib, Current Biology 2008
Adjusting for PC's • Wang, BMC Proc 2009
QQ-plots and PC adjustment Wang, BMC Proc 2009
Example: GWAS of Prostate Cancer chromosome http://cgems.cancer.gov Multiple prostate cancer loci on 8q24 Witte, Nat Genet 2007
Prostate Cancer Replications Witte, Nat Rev Genet 2009 Modest ORs
Prostate Cancer Replications Witte, Nat Rev Genet 2009 Modest ORs
SNPs Missed in Replication? 24,223 smallest P-value! Witte, Nat Rev Genet, 2009
Population Attributable Risks for GWAS Smoking & lung cancer BRCA1 & Breast cancer Jorgenson & Witte, 2009
Imputation of SNP Genotypes • Combine data from different platforms (e.g., Affy & Illumina) (for replication / meta-analysis). • Estimate unmeasured or missing genotypes. • Based on measured SNPs and external info (e.g., haplotype structure of HapMap). • Increase GWAS power (impute and analyze all), e.g. Sick sinus syndrome, most significant was 1000 Genomes imputed SNP (Holm et al., Nature Genetics, 2011) • HapMap as reference, now 1000 Genomes Project?
Imputation Example Study Sample HapMap/ 1K genomes • http://www.shapeit.fr/, http://mathgen.stats.ox.ac.uk/impute/impute_v2.html • http://faculty.washington.edu/browning/beagle/beagle.html • http://www.sph.umich.edu/csg/abecasis/MACH/download/ Gonçalo Abecasis
Identify Match with Reference • http://www.shapeit.fr/, http://mathgen.stats.ox.ac.uk/impute/impute_v2.html • http://faculty.washington.edu/browning/beagle/beagle.html • http://www.sph.umich.edu/csg/abecasis/MACH/download/ Gonçalo Abecasis
Phase chromosomes, impute missing genotypes • http://www.shapeit.fr/, http://mathgen.stats.ox.ac.uk/impute/impute_v2.html • http://faculty.washington.edu/browning/beagle/beagle.html • http://www.sph.umich.edu/csg/abecasis/MACH/download/ Gonçalo Abecasis
Imputation Application TCF7L2 gene region & T2D from the WTCCC data Observed genotypes black Imputed genotypes red. Chromosomal Position Marchini Nature Genetics2007 http://www.stats.ox.ac.uk/~marchini/#software
Replication • To replicate: • Association test for replication sample significant at 0.05 alpha level • Same mode of inheritance • Same direction • Sufficient sample size for replication • Non-replications not necessarily a false positive • LD structures, different populations (e.g., flip-flop) • covariates, phenotype definition, underpowered
Meta-analysis • Combine multiple studies to increase power • Either combine p-values (Fisher’s test), • or z-scores (better)
(Meta-analysis)Example: GWAS of Prostate Cancer chromosome http://cgems.cancer.gov Multiple prostate cancer loci on 8q24 Witte, Nat Genet 2007
Outline for GWAS • Review / Overview • Design • Analysis • QC • Prostate cancer example • Imputation • Replication & Meta-analysis • Advanced analysis intro (more next lecture) • Limitations & “missing heritability” • Gene/pathway tests • Polygenic models
Limitations of GWAS Example: AUC for Breast Cancer Risk 58%: Gail model (# first degree relatives w bc, age menarche, age first live birth, number of previous biopsies) + age, study, entry year 58.9%: SNPs 61.8%: Combined Wacholder et al., NEJM 2010 • Not very predictive Witte, Nat Rev Genet 2009
Limitations of GWAS • Not very predictive • Explain little heritability • Focus on common variation • Many associated variants are not causal
Where's the heritability? Visccher, AJHG 2011