1 / 50

Genome-wide Association Studies: Experimental Design & Analysis

This BSc course provides an overview of experimental design and analysis methods used in genome-wide association studies (GWAS), including topics such as population stratification, whole genome associations, genotype imputation, and new methods. The course also covers phenotypic variation, association analysis using regression, and the challenges of whole genome association studies. Students will gain a comprehensive understanding of GWAS and its applications.

geraldhall
Download Presentation

Genome-wide Association Studies: Experimental Design & Analysis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. BSc Course: "Experimental design“Genome-wide Association StudiesSven BergmannDepartment of Medical GeneticsUniversity of LausanneRue de Bugnon 27 - DGM 328 CH-1005 Lausanne Switzerlandwork: ++41-21-692-5452cell: ++41-78-663-4980 http://serverdgm.unil.ch/bergmann

  2. Overview • Population stratification • Associations: Basics • Whole genome associations • Genotype imputation • Uncertain genotypes • New Methods

  3. Overview • Population stratification • Associations: Basics • Whole genome associations • Genotype imputation • Uncertain genotypes • New Methods

  4. Phenotypes Genotypes 159 measurement 144 questions 500.000 SNPs CoLaus = Cohort Lausanne 6’189 individuals Collaboration with:Vincent Mooser (GSK), Peter Vollenweider & Gerard Waeber (CHUV)

  5. ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG… ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG… ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG… ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG… ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG… Genetic variation in SNPs (Single Nucleotide Polymorphisms)

  6. Analysis of Genotypes only Principle Component Analysis reveals SNP-vectors explaining largest variation in the data

  7. Example: 2PCs for 3d-data Raw data points: {a, …, z} http://ordination.okstate.edu/PCA.htm

  8. Example: 2PCs for 3d-data Normalized data points: zero mean (& unit std)! http://ordination.okstate.edu/PCA.htm

  9. Example: 2PCs for 3d-data The direction of most variance perpendicular to PCA1 defines PCA2 Most variance is along PCA1 Identification of axes with the most variance http://ordination.okstate.edu/PCA.htm

  10. Ethnic groups cluster according to geographic distances PC2 PC2 PC1 PC1

  11. PCA of POPRES cohort

  12. Overview • Population stratification • Associations: Basics • Whole genome associations • Genotype imputation • Uncertain genotypes • New Methods

  13. Phenotypic variation:

  14. What is association? SNPs trait variant chromosome Genetic variation yields phenotypic variation Population with ‘ ’ allele Population with ‘ ’ allele Distributions of “trait”

  15. Quantifying Significance

  16. T-test t-value (significance) can be translated into p-value (probability)

  17. Association using regression phenotype genotype Coded genotype

  18. Regression analysis “residuals” “intercept” “coefficients” Y “response” X “feature(s)”

  19. effect size (regression coefficient) (monotonic) transformation error (residual) p(β=0) phenotype (response variable) of individual i coded genotype(feature) of individual i Regression formalism Goal: Find effect size that explains best all (potentially transformed) phenotypesas a linear function of the genotypes and estimate the probability (p-value) for the data being consistent with the null hypothesis (i.e. no effect)

  20. Overview • Population stratification • Associations: Basics • Whole genome associations • Genotype imputation • Uncertain genotypes • New Methods

  21. Whole Genome Association

  22. Whole Genome Association Current microarrays probe ~1M SNPs! Standard approach: Evaluate significance for association of each SNP independently: significance

  23. Whole Genome Association Manhattan plot Quantile-quantile plot observedsignificance significance Chromosome & position Expected significance • GWA screens include large number of statistical tests! • Huge burden of correcting for multiple testing! • Can detect only highly significant associations (p < α / #(tests) ~ 10-7)

  24. GWAS: >20 publications in 2006/2007 Massive!

  25. Genome-wide meta-analysis for serum calcium identifies significantly associated SNPs near the calcium-sensing receptor (CASR) gene Karen Kapur, Toby Johnson, Noam D. Beckmann, Joban Sehmi, Toshiko Tanaka, Zoltán Kutalik, Unnur Styrkarsdottir, Weihua Zhang, Diana Marek, Daniel F. Gudbjartsson, Yuri Milaneschi, Hilma Holm, Angelo DiIorio, Dawn Waterworth, Andrew Singleton, Unnur Steina Bjornsdottir, Gunnar Sigurdsson, Dena Hernandez, Ranil DeSilva, Paul Elliott, Gudmundur Eyjolfsson, Jack M Guralnik, James Scott, Unnur Thorsteinsdotti, Stefania Bandinelli, John Chambers, Kari Stefansson, Gérard Waeber, Luigi Ferrucci, Jaspal S Kooner, Vincent Mooser, Peter Vollenweider, Jacques S. Beckmann, Murielle Bochud, Sven Bergmann

  26. Current insights from GWAS: • Well-powered (meta-)studies with (ten-)thousands of samples have identified a few (dozen) candidate loci with highly significant associations • Many of these associations have been replicated in independent studies

  27. Current insights from GWAS: • Each locus explains but a tiny (<1%) fraction of the phenotypic variance • All significant loci together explain only a small (<10%) of the variance David Goldstein: “~93,000 SNPs would be required to explain 80% of the population variation in height.” Common Genetic Variation and Human Traits, NEJM 360;17

  28. The “Missing variance” (Non-)Problem Why should a simplistic (additive) model using incomplete or approximate features possibly explain anything close to the genetic variance of a complex trait? … and it doesn’t have to as long as Genome-wide Association Studies are meant to as an undirected approach to elucidate new candidate loci that impact the trait!

  29. So what do we miss? • Other variants like Copy Number Variations or epigenetics may play an important role • Interactions between genetic variants (GxG) or with the environment (GxE) • Many causal variants may be rare and/or poorly tagged by the measured SNPs • Many causal variants may have very small effect sizes • Overestimation of heritabilities from twin-studies?

  30. Overview • Population stratification • Associations: Basics • Whole genome associations • Genotype imputation • Uncertain genotypes • New Methods

  31. Genotypes are called with varying uncertainty Intensity of Allele A Intensity of Allele G

  32. Some Genotypes are missing at all …

  33. … but are imputed with different uncertainties

  34. … using Linkage Disequilibrium! D 1 2 3 n Marker LD Markers close together on chromosomes are often transmitted together, yielding a non-zero correlation between the alleles.

  35. Conclusion • Genotypic markers are always measured or inferred with some degree of uncertainty • Association methods should take into account this uncertainty

  36. Two easy ways dealing with uncertain genotypes • Genotype Calling:Choose the most likely genotype and continue as if it is true(p11=10%, p12=20% p22=70% => G=2) • Mean genotype: Use the weighted average genotype(p11=10%, p12=20% p22=70% => G=1.6)

  37. Overview • Associations: Basics • Whole genome associations • Population stratification • Genotype imputation • Uncertain genotypes • New Methods

  38. How could our models become more predictive? • Improve measurements:- measure more variants (e.g. by UHS)- measure other variants (e.g. CNVs)- measure “molecular phenotypes” • Improve models:- proper integration of uncertainties- include interactions- multi-layer models

  39. Towards a layered Systems Model We need intermediate (molecular) phenotypes to better understand organismal phenotypes

  40. Organisms Developmental Physiological Environmental Data types Experimental Clinical Conditions The challenge of many datasets: How to integrate all the information? • Genotypic data • Gene expression data • Metabolomics data • Interaction data • Pathways

  41. Network Approaches for Integrative Association Analysis Using knowledge on physical gene-interactions or pathways to prioritize the search for functional interactions

  42. Using expression data from Hapmap panel we computed all 1012 pairwise interactions for several expression values: The strongest interactions do not occur for SNPs with high (marginal) significance! Bioinformatics Advance Access published online on April 7, 2010 Bioinformatics, doi:10.1093/bioinformatics/btq147

  43. Transcription Modules reduce Complexity http://maya.unil.ch: 7575/ExpressionView SB, J Ihmels & N Barkai Physical Review E (2003)

  44. Association of (average) module expression is often stronger than for any of its constituent genes

  45. Overview • Associations: Basics • Whole genome associations • Population stratification • Genotype imputation • Uncertain genotypes • New Methods

  46. Take-home Messages: • Analysis of genome-wide SNP data reveals that population structure mirrors geography • Genome-wide association studies elucidate candidate loci for a multitude of traits, but have little predictive power so far • Future improvement will require • better genotyping (CGH, UHS, …) • New analysis approaches (interactions, networks, data integration)

More Related