Give me your DNA and I tell you where you come from - and maybe more!

Sven Bergmann, University of Lausanne & Swiss Institute of Bioinformatics, http://serverdgm.unil.ch/bergmann


Presentation Transcript


  1. Give me your DNA and I tell you where you come from - and maybe more! Sven Bergmann University of Lausanne & Swiss Institute of Bioinformatics http://serverdgm.unil.ch/bergmann Lausanne, Genopode 21 April 2010

  2. Overview • Population stratification • Associations: Basics • Whole genome associations • Genotype imputation • Future directions

  3. Overview • Population stratification • Associations: Basics • Whole genome associations • Genotype imputation • Future directions

  4. CoLaus = Cohort Lausanne: 6,189 individuals • Phenotypes: 159 measurements, 144 questions • Genotypes: 500,000 SNPs • Collaboration with: Vincent Mooser (GSK), Peter Vollenweider & Gerard Waeber (CHUV)

  5. ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG… ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG… ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG… ATTGCAATCCGTGG...ATCGAGCCA…TACGATTGCACGCCG… ATTGCAAGCCGTGG...ATCTAGCCA…TACGATTGCAAGCCG… Genetic variation in SNPs (Single Nucleotide Polymorphisms)

  6. Analysis of genotypes only: Principal Component Analysis reveals the SNP-vectors explaining the largest variation in the data
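
The idea on this slide can be illustrated with a minimal sketch. It assumes genotypes are coded as allele counts (0/1/2) in an individuals × SNPs matrix and uses entirely synthetic data for two hypothetical populations; the function name `genotype_pca` and all parameters are illustrative and not taken from the talk.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """Project individuals onto the top principal components.

    genotypes: (n_individuals, n_snps) array of allele counts (0/1/2).
    Returns (scores, explained_variance_ratio) for the top components.
    """
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)  # center each SNP column
    # Rows of vt are the SNP-vectors (loadings); s**2 is proportional
    # to the variance captured by each component.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]

# Synthetic data (assumption): two populations whose allele
# frequencies differ slightly at every SNP.
rng = np.random.default_rng(0)
n_per_pop, n_snps = 50, 200
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_snps), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))
pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))

scores, explained = genotype_pca(np.vstack([pop_a, pop_b]))
# The first component should separate the two simulated populations.
print(scores[:n_per_pop, 0].mean(), scores[n_per_pop:, 0].mean())
```

Plotting `scores[:, 0]` against `scores[:, 1]` would give a PC1/PC2 scatter of the kind shown on the following slides, here for simulated rather than real cohorts.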

  7. Ethnic groups cluster according to geographic distances [Figure: two scatter plots with axes PC1 and PC2]

  8. PCA of POPRES cohort

  9. Predicting location according to SNP-profile ...

  10. … is pretty accurate!

  11. The Swiss segregate according to language

  12. Overview • Population stratification • Associations: Basics • Whole genome associations • Genotype imputation • Future directions

  13. Phenotypic variation:

  14. What is association? Genetic variation yields phenotypic variation [Figure: chromosome with SNPs and a trait variant; distributions of the “trait” in the population with one allele and in the population with the other allele]

  15. Association using regression [Figure: phenotype plotted against the coded genotype]

  16. Regression formalism [Equation not preserved; its labeled terms:] • phenotype (response variable) of individual i • (monotonic) transformation • coded genotype (feature) of individual i • effect size (regression coefficient) β • error (residual) • p(β=0) Goal: Find the effect size that best explains all (potentially transformed) phenotypes as a linear function of the genotypes, and estimate the probability (p-value) of the data being consistent with the null hypothesis (i.e. no effect)
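
A minimal sketch of such a single-SNP regression follows. It assumes a model of the form phenotype = intercept + β · coded genotype + error (the exact equation on the slide is not preserved), genotypes coded 0/1/2, and a large-sample normal approximation for the p-value in place of the exact t-distribution. The function name `snp_association` and the simulated data are illustrative only.

```python
import math
import numpy as np

def snp_association(genotype, phenotype):
    """Least-squares fit of phenotype on coded genotype.

    Returns (beta, standard_error, p_value), where p_value tests the
    null hypothesis beta = 0 using a normal approximation (assumption;
    reasonable only for large samples).
    """
    x = np.asarray(genotype, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    n = len(x)
    x_c = x - x.mean()  # centering absorbs the intercept
    y_c = y - y.mean()
    sxx = np.sum(x_c**2)
    beta = np.sum(x_c * y_c) / sxx           # effect size
    residuals = y_c - beta * x_c             # error terms
    sigma2 = np.sum(residuals**2) / (n - 2)  # residual variance
    se = math.sqrt(sigma2 / sxx)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return beta, se, p_value

# Simulated data (assumption): true effect size 0.5 per allele copy.
rng = np.random.default_rng(1)
genotype = rng.binomial(2, 0.3, size=2000)
phenotype = 0.5 * genotype + rng.normal(0.0, 1.0, size=2000)

beta, se, p_value = snp_association(genotype, phenotype)
print(beta, se, p_value)
```

A monotonic transformation of the phenotype, as mentioned on the slide, could be applied to `phenotype` before calling the function; which transformation is appropriate depends on the trait.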

  17. Overview • Population stratification • Associations: Basics • Whole genome associations • Genotype imputation • Future directions

  18. Whole Genome Association

  19. Whole Genome Association • Current microarrays probe ~1M SNPs! • Standard approach: Evaluate the significance of association for each SNP independently [Figure: significance per SNP]

  20. Whole Genome Association [Figures: Manhattan plot (significance vs. chromosome & position); quantile-quantile plot (observed vs. expected significance)] • GWA screens include a large number of statistical tests! • Huge burden of correcting for multiple testing! • Can detect only highly significant associations (p < α / #(tests) ~ 10^-7)
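
The threshold p < α / #(tests) is a Bonferroni-style correction. The sketch below works through the arithmetic, assuming α = 0.05 (the slide does not state α) and using the 500,000 SNPs mentioned for CoLaus as the number of tests; the function names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

def significant_snps(p_values, alpha=0.05):
    """Indices of SNPs whose p-value passes the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < threshold]

# With alpha = 0.05 (assumption) and 500,000 tests, the threshold is
# about 1e-7, matching the order of magnitude quoted on the slide.
threshold_500k = bonferroni_threshold(0.05, 500_000)
print(threshold_500k)

# Toy example with four tests: the threshold is 0.05 / 4 = 0.0125.
toy_hits = significant_snps([0.5, 1e-9, 0.03, 0.004])
print(toy_hits)
```

With ~1M SNPs the same calculation gives a threshold roughly half as large, which is why only very strong associations survive the correction.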

  21. Current insights from GWAS: • Well-powered (meta-)studies with (ten-)thousands of samples have identified a few (dozen) candidate loci with highly significant associations • Many of these associations have been replicated in independent studies

  22. Current insights from GWAS: • Each locus explains only a tiny fraction (<1%) of the phenotypic variance • All significant loci together explain only a small fraction (<10%) of the variance • David Goldstein: “~93,000 SNPs would be required to explain 80% of the population variation in height.” Common Genetic Variation and Human Traits, NEJM 360;17

  23. So what do we miss? • Other variants like Copy Number Variations or epigenetics may play an important role • Interactions between genetic variants (GxG) or with the environment (GxE) • Many causal variants may be rare and/or poorly tagged by the measured SNPs • Many causal variants may have very small effect sizes • Overestimation of heritabilities from twin-studies?

  24. Overview • Population stratification • Associations: Basics • Whole genome associations • Genotype imputation • Future directions

  25. Genotypes are called with varying uncertainty [Figure: intensity of allele A vs. intensity of allele G]

  26. Some genotypes are missing altogether …

  27. … but are imputed with different uncertainties

  28. … using Linkage Disequilibrium! Markers close together on chromosomes are often transmitted together, yielding a non-zero correlation between the alleles. [Figure: LD (D) between markers 1, 2, 3, …, n]
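
The correlation between alleles at two markers can be made concrete with a small sketch. It assumes phased haplotypes coded 0/1 at each marker and computes the standard LD coefficient D together with the squared correlation r²; whether the D on the slide refers to exactly this quantity is an assumption, and the haplotype data are made up.

```python
import numpy as np

def linkage_disequilibrium(alleles_a, alleles_b):
    """LD between two biallelic markers from phased haplotypes.

    alleles_a, alleles_b: 0/1 allele indicators, one entry per haplotype.
    Returns (D, r2), where D = P(AB) - P(A) * P(B) and r2 is the squared
    correlation between the two allele indicators.
    """
    a = np.asarray(alleles_a, dtype=float)
    b = np.asarray(alleles_b, dtype=float)
    p_a, p_b = a.mean(), b.mean()
    p_ab = (a * b).mean()  # frequency of haplotypes carrying both alleles
    d = p_ab - p_a * p_b
    r2 = d**2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2

# Eight hypothetical haplotypes; the markers agree on six of them.
marker_1 = [1, 1, 1, 0, 0, 0, 1, 0]
marker_2 = [1, 1, 1, 0, 0, 0, 0, 1]
d, r2 = linkage_disequilibrium(marker_1, marker_2)
print(d, r2)
```

Two markers that are always transmitted together give r² = 1, while independent markers give values near 0; imputation methods exploit the former case to infer untyped genotypes from typed neighbours.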

  29. Two easy ways of dealing with uncertain genotypes • Genotype calling: Choose the most likely genotype and continue as if it were true (p11=10%, p12=20%, p22=70% => G=2) • Mean genotype: Use the weighted average genotype (p11=10%, p12=20%, p22=70% => G=1.6)
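
Both options can be reproduced with a few lines. The sketch assumes the three genotype classes (11, 12, 22) are coded 0, 1 and 2, which is the coding implied by the slide's results G=2 and G=1.6; the function names are illustrative.

```python
def call_genotype(probs):
    """Genotype calling: return the most probable coded genotype (0, 1 or 2)."""
    return max(range(len(probs)), key=lambda g: probs[g])

def mean_genotype(probs):
    """Mean genotype: probability-weighted average of the coded genotypes."""
    return sum(g * p for g, p in enumerate(probs))

# Probabilities from the slide: p11 = 10%, p12 = 20%, p22 = 70%.
probs = (0.10, 0.20, 0.70)
called = call_genotype(probs)   # most likely class is coded 2
dosage = mean_genotype(probs)   # 0*0.1 + 1*0.2 + 2*0.7, about 1.6
print(called, dosage)
```

The mean genotype keeps the uncertainty in the value passed on to the regression, whereas calling discards it; either result can be used as the coded genotype in a regression of the kind described earlier.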

  30. Overview • Associations: Basics • Whole genome associations • Population stratification • Genotype imputation • Uncertain genotypes • Future directions

  31. The challenge of many datasets: How to integrate all the information? • Protein expression • Tissue specific expression • Interaction data • Genotypic data • Epigenetic data … [Figure labels: Organisms; Data types; Conditions (Developmental, Physiological, Environmental, Experimental, Clinical); Biological Insight; ?]

  32. Network Approaches for Integrative Association Analysis Using knowledge on physical gene-interactions or pathways to prioritize the search for functional interactions

  33. Modular Approach for Integrative Analysis of Genotypes and Phenotypes [Figure labels: Individuals, Modular links, Phenotypes, Measurements, SNPs/Haplotypes, Genotypes]

  34. Take-home Messages: • Analysis of genome-wide SNP data reveals that population structure mirrors geography • Genome-wide association studies reveal candidate loci for a multitude of traits, but have little predictive power so far • Future improvement will require better genotyping (CGH, UHS, …) and new analysis approaches (interactions, networks, data integration)
