1 / 55

Genome-wide association studies (GWAS)

Genome-wide association studies (GWAS). Thomas Hoffmann. Outline. GWAS Overview Design Microarray Sequencing Which to use and other censiderations QC Analysis Population stratification adjustment Imputation Replication & Meta-analysis Limitations, missing heritability, and beyond.

olive
Download Presentation

Genome-wide association studies (GWAS)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Genome-wide association studies (GWAS) Thomas Hoffmann

  2. Outline GWAS Overview Design Microarray Sequencing Which to use and other censiderations QC Analysis Population stratification adjustment Imputation Replication & Meta-analysis Limitations, missing heritability, and beyond

  3. Manolio et al., Clin Invest 2008

  4. Candidate Gene or GWAS Genetic association studies (guilt by association) Hirschhorn & Daly, Nat Rev Genet 2005

  5. GWAS Microarray Assay ~ 0.7 - 5M SNPs (keeps increasing) Affymetrix, http://www.affymetrix.com

  6. Genotype calls Bad calls! Good calls!

  7. Outline GWAS Overview Design Microarray Sequencing Which to use and other considerations QC Analysis Population stratification adjustment Imputation Replication & Meta-analysis Limitations, missing heritability, and beyond

  8. Genome-wide assocation studies (GWAS)

  9. One- and two-stage GWA designs Two-Stage Design One-Stage Design SNPs SNPs nsamples Stage 1 Samples Samples Stage 2 nmarkers

  10. One-Stage Design SNPs Samples Two-Stage Design Joint analysis Replication-based analysis SNPs SNPs 1 1 Stage 1 Stage 1 Samples Samples Stage 2 Stage 2 2 2

  11. Multistage Designs • Joint analysis has more power than replication • p-value in Stage 1 must be liberal • Lower cost—do not gain power • CaTs power calculator: http://www.sph.umich.edu/csg/abecasis/CaTS/index.html

  12. Outline GWAS Overview Design Microarray Sequencing Which to use and other considerations QC Analysis Population stratification adjustment Imputation Replication & Meta-analysis Limitations, missing heritability, and beyond

  13. Genome-wide Sequence Studies • Trade off between number of samples, depth, and genomic coverage. Goncalo Abecasis

  14. Near-term sequencing design choices • For example, between: • Sequencing few subjects with extreme phenotypes: • e.g., 200 cases, 200 controls, 4x coverage. Then follow-up in larger population. • 10M SNP chip based on 1,000 genomes. • 5K cases, 5K controls. • Which design will work best…?

  15. Outline GWAS Overview Design Microarray Sequencing Which to use and other considerations QC Analysis Population stratification adjustment Imputation Replication & Meta-analysis Limitations, missing heritability, and beyond

  16. Design choices GWAS Microarray Only assay SNPs designed into array (0.7-5 million) Much cheaper (so many more subjects) GWAS Sequencing “De novo” discovery (particularly good for rare variants) More expensive (but costs are falling) (many less subjects) Need much more expansive IT support Lots of interesting interpretation problems (field rapidly evolving)

  17. Design choices Exome Microarray Only assay SNPs designed into array (~300K+custom); in exons only and that could affect protein coding function Cheapest (so many more subjects) Exome Sequencing “De novo” discovery (particularly good for rare variants); %age of exons only More expensive than microarrays, less expensive than gwas sequencing Need more expansive IT support Lots of interesting interpretation problems

  18. Size of study Visscher, AJHG 2012,

  19. Size of study Visscher, AJHG 2012,

  20. Outline GWAS Overview Design Microarray Sequencing Which to use and other considerations QC Analysis Population stratification adjustment Imputation Replication & Meta-analysis Limitations, missing heritability, and beyond

  21. QC Steps Remove SNPs with low call rate (e.g., <97%) Proportion of SNPs actually called by software If it's low, the clusters aren't well defined, artifacts Remove those with low minor allele frequency? Rarer variants more likely artifacts / underpowered Exome arrays – rare variants are the whole point! Remove SNPs / Individuals who have too much missing data

  22. QC Steps (2) SNPs that fail Hardy-Weinberg Suppose a SNP with alleles A and B has allele frequency of p. If random matting, then AA has frequency p*p AB has frequency 2*p*(1-p) BB has frequency (1-p)*(1-p) Test for this (e.g., chi-squared test) In practice do for homogeneous populations (more later)

  23. QC Steps • Check genotype gender • Filter Mendelian inhertance (family-based, or potentially cryptics, if large enough sample) • Check for relatedness...

  24. Check for relatedness, e.g., HapMap Pemberton et al., AJHG 2010

  25. Outline GWAS Overview Design Microarray Sequencing Which to use and other considerations QC Analysis Population stratification adjustment Imputation Replication & Meta-analysis Limitations, missing heritability, and beyond

  26. GWAS analysis • Most common approach: look at each SNP one-at-a-time • Additive coding of SNP most common, e.g., # of A alleles • Just a covariate in a regression framework • Dichotomous phenotype: logistic regression • Continuous phenotype: linear regression • {BMI}=B1{SNP}+ B2{Age}+... • Further investigate / report top SNPs only • Adjust for population stratification... P-values

  27. What is population stratification? Balding, Nature Reviews Genetics 2010

  28. Adjusting for PC's • Li et al., Science 2008

  29. Adjusting for PC's • Razib, Current Biology 2008

  30. Adjusting for PC's • Wang, BMC Proc 2009

  31. Aside: “random” mating? Sebro, Gen Epi, 2010

  32. Multiple comparison correction • If you conduct 20 tests at =0.05, one true by chance http://xkcd.com/882/. If you conduct 1 million tests... • Correct for multiple comparisons • e.g., Bonferroni, 1 million gives =5x10-8

  33. QQ-plots and PC adjustment Wang, BMC Proc 2009

  34. Example: GWAS of Prostate Cancer chromosome http://cgems.cancer.gov Multiple prostate cancer loci on 8q24 Witte, Nat Genet 2007

  35. Outline GWAS Overview Design Microarray Sequencing Which to use and other considerations QC Analysis Population stratification adjustment Imputation Replication & Meta-analysis Limitations, missing heritability, and beyond

  36. Imputation of SNP Genotypes • Combine data from different platforms (e.g., Affy & Illumina) (for replication / meta-analysis). • Estimate unmeasured or missing genotypes. • Based on measured SNPs and external info (e.g., haplotype structure of HapMap). • Increase GWAS power (impute and analyze all), e.g. Sick sinus syndrome, most significant was 1000 Genomes imputed SNP (Holm et al., Nature Genetics, 2011) • HapMap as reference, now 1000 Genomes Project?

  37. Imputation Example Li et al., Ann Rev Genom Human Genet, 2009

  38. Imputation Example Li et al., Ann Rev Genom Human Genet, 2009

  39. Imputation Application TCF7L2 gene region & T2D from the WTCCC data Observed genotypes black Imputed genotypes red. Chromosomal Position Marchini Nature Genetics2007 http://www.stats.ox.ac.uk/~marchini/#software

  40. Outline GWAS Overview Design Microarray Sequencing Which to use and other considerations QC Analysis Population stratification adjustment Imputation Replication & Meta-analysis Limitations, missing heritability, and beyond

  41. Replication To replicate: Association test for replication sample significant at 0.05 alpha level Same mode of inheritance Same direction Sufficient sample size for replication Non-replications not necessarily a false positive LD structures, different populations (e.g., flip-flop) covariates, phenotype definition, underpowered

  42. Prostate Cancer Replications Witte, Nat Rev Genet 2009 Modest ORs

  43. Prostate Cancer Replications Witte, Nat Rev Genet 2009 Modest ORs

  44. SNPs Missed in Replication? 24,223 smallest P-value! Witte, Nat Rev Genet, 2009

  45. Meta-analysis Combine multiple studies to increase power Either combine p-values (Fisher’s test), or z-scores (better)

  46. (Meta-analysis)Example: GWAS of Prostate Cancer chromosome http://cgems.cancer.gov Multiple prostate cancer loci on 8q24 Witte, Nat Genet 2007

  47. Replication & Meta-analysis

  48. Meta-analysis

  49. Outline GWAS Overview Design Microarray Sequencing Which to use and other censiderations QC Analysis Population stratification adjustment Imputation Replication & Meta-analysis Limitations, missing heritability, and beyond

More Related