Topic #3 Linkage Disequilibrium, Haplotypes & Tagging

Topic #3: Linkage Disequilibrium, Haplotypes & Tagging. University of Wisconsin Genetic Analysis Workshop, June 2011. Overview: fate of a new mutation; linkage disequilibrium (LD); measurement; indirect association; SNP selection based on LD; haplotypes; SNP selection by tagging.

Presentation Transcript


  1. Topic #3: Linkage Disequilibrium, Haplotypes & Tagging. University of Wisconsin Genetic Analysis Workshop, June 2011

  2. Overview • Fate of a new mutation • Linkage Disequilibrium (LD) • Measurement • Indirect association • SNP selection based on LD • Haplotypes • SNP selection by tagging • Practical – SNP selection using Haploview

  3. Introduction of a Mutation into a Population [figure; axis: TIME]

  4. Introduction of a Mutation into a Population [figure: copies of a chromosome region drawn as strings of alleles coded 1 and 2, followed over TIME]

  5. Haplotype Concept • The sequence 111212 at this location becomes a signature for the chromosome carrying the mutation • Haplotype – alleles inherited together at linked loci on the same chromosome • The 111212 haplotype will not be a perfect marker of disease, because: • At the time the mutation arose, other chromosomes may also have carried 111212 • New mutations • Recombination

  6. Indirect Association • Each of the alleles in the 111212 haplotype is also expected to be indirectly associated with carrying the mutation. • Indirect association is an association of a marker with phenotype that is non-causal, being based on linkage disequilibrium (LD)

  7. Linkage Disequilibrium (LD) • Mendel’s Second Law: alleles at different loci assort independently • Linkage Disequilibrium (LD): population-level association of alleles at linked loci

  8. How LD is Measured • A locus: alleles $A_1$ or $A_2$; B locus: alleles $B_1$ or $B_2$ • LD – population-level association between linked loci • Let $P(A_1) = p_{A_1}$, $P(B_1) = p_{B_1}$ and $P(A_1B_1) = p_{A_1B_1}$ • $D = p_{A_1B_1} - p_{A_1}p_{B_1}$, which equals 0 if the loci are independent

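  9. Common LD Measures • $D = |d|$, where $d = p_{A_1B_1} - p_{A_1}p_{B_1}$ • Preferred measure for population geneticists • Its maximum possible value is bounded by the marginal allele frequencies • $D' = |d|/d_{\max}$ • D' varies between 0 and 1 • Does not have an easy interpretation, and reaches 1.0 whenever one cell of the 2×2 haplotype table is zero (one of the four haplotypes is absent) • $r^2$ (also written $\Delta^2$) $= D^2 / [p(1-p)\,q(1-q)]$, with $p = p_{A_1}$ and $q = p_{B_1}$ • Has several interpretations: • the squared (phi) correlation between alleles at the two loci, so it lies in [0, 1] • $\chi^2/N$ for the 2×2 table of N haplotype counts • Directly related to power for indirect association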
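The three measures can be computed from a 2×2 table of haplotype counts. The sketch below is a minimal illustration that assumes phased haplotype counts are already available. The function names and argument order are hypothetical. It uses the standard bound $d_{\max} = \min(p(1-q), (1-p)q)$ when $d \ge 0$ and $d_{\max} = \min(pq, (1-p)(1-q))$ when $d < 0$. It also checks that $r^2$ matches $\chi^2/N$.

```python
def ld_measures(n11, n12, n21, n22):
    """Return (d, D', r2) from haplotype counts.

    Hypothetical layout: n11 = A1B1, n12 = A1B2, n21 = A2B1, n22 = A2B2.
    Assumes both loci are polymorphic in the sample.
    """
    n = n11 + n12 + n21 + n22
    p_a1b1 = n11 / n
    p = (n11 + n12) / n  # frequency of allele A1
    q = (n11 + n21) / n  # frequency of allele B1
    d = p_a1b1 - p * q
    # The largest |d| allowed by the allele frequencies depends on the sign of d
    if d >= 0:
        d_max = min(p * (1 - q), (1 - p) * q)
    else:
        d_max = min(p * q, (1 - p) * (1 - q))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d ** 2 / (p * (1 - p) * q * (1 - q))
    return d, d_prime, r2


def chi_square_2x2(n11, n12, n21, n22):
    """Pearson chi-square (no continuity correction) for the 2x2 haplotype table."""
    n = n11 + n12 + n21 + n22
    numerator = n * (n11 * n22 - n12 * n21) ** 2
    denominator = (n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22)
    return numerator / denominator


# Toy counts in which one haplotype (A2B1) is never observed
counts = (30, 20, 0, 50)
d, d_prime, r2 = ld_measures(*counts)
print(f"d = {d:.3f}, D' = {d_prime:.3f}, r2 = {r2:.3f}")
print(f"chi2/N = {chi_square_2x2(*counts) / sum(counts):.3f}")
```

With these toy counts D' equals 1 while r2 is only about 0.43. This shows why a high D' alone does not guarantee good power for indirect association.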

  10. Allelic Association • Direct association • Initially it was thought that we could pick the relevant genes and, within each gene, the (single) genetic variant relevant to disease • Indirect association • The existence of LD opens up the possibility of tests by indirect association – we don't need to test the causal variant itself, but need only genotype a marker that is in high LD with it

  11. Indirect and Direct Allelic Association [figure: locus D alone (direct) and locus D with nearby markers M1, M2, M3 (indirect)] • Direct association: assess the relationship of the D locus to phenotype directly; D is expected to be a functional polymorphism in a candidate gene • Indirect association: assess the relationship of the D locus indirectly by determining whether markers (Mi) are associated with disease; the Mi don't need to be functional

  12. Martin, E.R. et al. (2000). SNPing away at complex disease … AJHG 67: 383-394

  13. Dawson, E. et al. (2002). A first-generation linkage disequilibrium map of human chromosome 22. Nature 418: 544-547

  14. Population Differences • Weiss, K.M. & Clark, A.G. (2002). Trends in Genetics, 18(1):19-24.

  15. Recombination Hotspots • Hotspots typically span 1-2 kb • Kauppi, L., Jeffreys, A. J., & Keeney, S. (2004). Where the crossovers are: Recombination distributions in mammals. Nature Reviews Genetics, 5, 413-424
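A standard two-locus result explains why hotspots break LD into blocks. Assume random mating in a large population with no selection, mutation or migration. Then the disequilibrium shrinks by a factor $(1 - c)$ each generation, where $c$ is the recombination fraction between the loci, so $d_t = (1 - c)^t d_0$. The sketch below evaluates this formula. The two recombination fractions in the example are illustrative assumptions, not estimates for any real region.

```python
import math


def ld_after(d0, c, generations):
    """Disequilibrium after `generations` of random mating: d_t = (1 - c)**t * d_0.

    Assumes an idealized population (no drift, selection, mutation or migration).
    """
    return d0 * (1 - c) ** generations


def generations_to_halve(c):
    """Generations needed for d to fall to half its starting value."""
    return math.log(0.5) / math.log(1 - c)


# Illustrative (assumed) recombination fractions: tightly linked loci vs. loci
# separated by much more recombination, such as across a hotspot
for c in (0.0001, 0.01):
    remaining = ld_after(1.0, c, 1000)
    print(f"c = {c}: fraction of initial d left after 1000 generations = {remaining:.3g}; "
          f"half-life = {generations_to_halve(c):.0f} generations")
```

Because the decay is geometric, a hundredfold increase in c turns LD that persists for thousands of generations into LD that is gone within a few hundred.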

  16. Haplotype Blocks

  17. Two- and Three-locus Haplotypes • APOE locus and haplotypes containing APOE • Martin, E.R. et al. (2000). SNPing away at complex disease … AJHG 67: 383-394

  18. Two- and Three-locus Haplotypes • The 3-locus haplotype gives a stronger signal than the individual markers • Martin, E.R. et al. (2000). SNPing away at complex disease … AJHG 67: 383-394
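Multi-locus haplotype frequencies are relative counts of allele combinations carried on the same chromosome. The sketch below tallies them from phased chromosomes written as allele strings, using the 1/2 coding of the mutation slides. The function name and the toy data are hypothetical. Real studies usually have to estimate phase from unphased genotypes (for example with an EM algorithm) rather than observe it directly.

```python
from collections import Counter


def haplotype_frequencies(chromosomes, loci):
    """Relative frequency of each haplotype formed by the alleles at `loci`.

    chromosomes: phased allele strings, one per chromosome (e.g. "111212")
    loci: 0-based positions combined into the haplotype
    """
    counts = Counter("".join(chrom[i] for i in loci) for chrom in chromosomes)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


# Hypothetical toy chromosomes for cases and controls
cases = ["111212", "111212", "111222", "122121"]
controls = ["122121", "211212", "122112", "221121"]

for label, sample in (("cases", cases), ("controls", controls)):
    print(label, "locus 1 alone:", haplotype_frequencies(sample, [0]))
    print(label, "loci 1-3:", haplotype_frequencies(sample, [0, 1, 2]))
```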

  19. SNP Selection by Tagging • Basic rationale: • The power to detect a causal SNP in a sample of size N is approximately equal to the power of a tagging SNP (in LD r2 with the causal SNP) in a sample of size N/r2 • Tagging SNP selection: • Based on a reference sample (e.g., HapMap) • Two overarching strategies • Pairwise tagging • Multimarker tagging • de Bakker, P. I. W., et al. (2005). Efficiency and power in genetic association studies. Nature Genetics, 37(11), 1217-1223.
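The N/r2 rule rests on two facts about a 1-df allelic test. First, the non-centrality parameter at a tag SNP is approximately r2 times the non-centrality at the causal SNP. Second, the non-centrality grows linearly with sample size. The sketch below assumes that approximation and uses a normal approximation to power. `ncp_per_subject` is a hypothetical effect-size input, and the default alpha is an assumed genome-wide threshold; neither value comes from the slides.

```python
import math
from statistics import NormalDist


def required_tag_sample_size(n_causal, r2):
    """Sample size at a tag SNP that matches the power of testing the causal SNP with n_causal."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must lie in (0, 1]")
    return math.ceil(n_causal / r2)


def power_1df(ncp, alpha=5e-8):  # alpha: assumed genome-wide significance level
    """Approximate power of a 1-df chi-square test with non-centrality `ncp`,
    computed via the equivalent two-sided z-test."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    shift = math.sqrt(ncp)
    return z.cdf(shift - critical) + z.cdf(-shift - critical)


ncp_per_subject = 0.02  # hypothetical effect size at the causal SNP
n = 2000
r2 = 0.8

direct = power_1df(ncp_per_subject * n)
tag_same_n = power_1df(r2 * ncp_per_subject * n)
n_tag = required_tag_sample_size(n, r2)
tag_larger_n = power_1df(r2 * ncp_per_subject * n_tag)
print(f"causal SNP, N={n}: {direct:.3f}; tag, N={n}: {tag_same_n:.3f}; "
      f"tag, N={n_tag}: {tag_larger_n:.3f}")
```

Under this approximation, genotyping a tag with r2 = 0.8 costs power equivalent to losing 20% of the sample. Increasing the sample to N/r2 recovers the power of the causal SNP.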

  20. Reference Sample: HapMap (www.hapmap.org) • HapMap Phase 1: • SNP selection strategy (yield ~1 million): • >1 common SNP every 5 kb, total of 1.3 million before QC • MAF > .05 • Some priority for non-synonymous cSNPs • Sample: N=270 (269) individuals from 4 populations • 30 trios of Europeans from Utah (CEU) • 45 unrelated Han Chinese (CHB) • 45 unrelated Japanese (JPT) • 30 Yoruba trios from Nigeria (YRI)

  21. Reference Sample: HapMap (www.hapmap.org) • Phase 2: • 2.1 million additional SNPs • Total now averages ~1 SNP per kb; >98% of common variants within 5 kb • Focus still on MAF > .05 • Average max r2 of untyped common SNPs to a typed SNP

  22. Reference Sample: HapMap (www.hapmap.org) • Phase 3: • Expanded to N=1115 in 11 ancestral groups • 2.1 million additional SNPs • *Sample consists of family triples

  23. HapMap3, Release 2: COMT Region in NCBI B36 [screenshot with Phase, Release and Build labeled]

  24. HapMap Genotyped SNPs in COMT

  25. Using Haploview to Identify Tagging SNPs for COMT • Download data from HapMap • Choose HapMap Download, Phase 3, and Release 2 • Choose population • Choose chromosome (22) and region (NCBI B36/hg18) • Transcription starts at 18309 kb; I will start at 18304 kb • Transcription ends at 18337 kb; I will end at 18340 kb • Haploview analysis • Get LD plot • Run Tagger (pairwise) • Force-include/exclude SNPs as needed
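Tagger's pairwise mode picks tags so that every SNP in the region is captured by some tag at an r2 threshold; the conclusions use r2 > .8. The sketch below is a simplified greedy version written for illustration, not Haploview's actual implementation. It assumes pairwise r2 values have already been computed, for example from the downloaded HapMap genotypes. The SNP names, the dictionary layout and the force-include/exclude handling are hypothetical.

```python
def greedy_pairwise_tags(snps, r2, threshold=0.8, force_include=(), force_exclude=()):
    """Greedily choose tag SNPs so each SNP has r2 >= threshold with some tag.

    snps: SNP names in positional order
    r2: dict mapping frozenset({a, b}) -> pairwise r2 (missing pairs treated as 0)
    Returns (tags, uncaptured); uncaptured SNPs could not be reached by any allowed tag.
    """
    def pair_r2(a, b):
        return 1.0 if a == b else r2.get(frozenset((a, b)), 0.0)

    def captured_by(tag, pool):
        return {s for s in pool if pair_r2(tag, s) >= threshold}

    uncaptured = set(snps)
    tags = []
    # Forced tags are taken first; everything they capture is removed
    for snp in force_include:
        tags.append(snp)
        uncaptured -= captured_by(snp, uncaptured)

    # Excluded SNPs can still be captured, but are never chosen as tags
    candidates = [s for s in snps if s not in force_exclude and s not in tags]
    while uncaptured and candidates:
        # Candidate capturing the most still-uncaptured SNPs; max() keeps the first on ties
        best = max(candidates, key=lambda s: len(captured_by(s, uncaptured)))
        gained = captured_by(best, uncaptured)
        if not gained:
            break
        tags.append(best)
        candidates.remove(best)
        uncaptured -= gained
    return tags, uncaptured


# Hypothetical pairwise r2 values for four SNPs
snps = ["snp1", "snp2", "snp3", "snp4"]
r2 = {
    frozenset(("snp1", "snp2")): 0.90,
    frozenset(("snp2", "snp3")): 0.85,
    frozenset(("snp1", "snp3")): 0.60,
    frozenset(("snp3", "snp4")): 0.30,
}
print(greedy_pairwise_tags(snps, r2))
print(greedy_pairwise_tags(snps, r2, force_exclude=("snp4",)))
```

In the second call, excluding snp4 as a tag leaves it uncaptured, because no other SNP reaches r2 ≥ 0.8 with it. This is the kind of trade-off the force include/exclude options expose.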

  26. COMT LD Plot (D’)

  27. COMT LD Plot (r2)

  28. COMT Tagging SNPs (15 tag SNPs capture 24 SNPs at mean r2 = .996)

  29. LD Plot Available from SNPInfo (http://manticore.niehs.nih.gov/)

  30. Conclusions • Alleles at linked loci tend to be inherited together, producing a population-level association between them known as linkage disequilibrium (LD) • Because recombination is not uniform, the genome has a "block-like" haplotype structure • You do not need to have the "causal variant" in your genotyped set if it is adequately tagged • A major strategy for SNP selection is to ensure adequate coverage (r2 > .8) of the common genetic variants in a gene, which can be done with Haploview
