Genomewide Association Studies

paul-tyler
Presentation Transcript


  1. Genomewide Association Studies

  2. Genomewide Association Studies • 1. History • Linkage vs. Association • Power/Sample Size • 2. Human Genetic Variation: SNPs • 3. Direct vs. Indirect Association • Linkage Disequilibrium • 4. SNP selection, Coverage, Study Designs • 5. Genotyping Platforms • 6. Early (recent) GWA Studies

  3. Risch and Merikangas 1996 • Sample size for association < sample size for linkage

  4. Risch and Merikangas 1996

  5. Sample Size Required • Linkage Analysis with affected sib pairs • Transmission Disequilibrium Test (TDT) • TDT with affected sib pairs

  6. Affected Sib Pair Linkage Analysis • 2 siblings/family • Both sibs affected • IBD at the marker locus • Expect 50% on average

  7. Identity By Descent [Figure: alleles (A, a) of Sibling 1 and the possible genotypes of the second sibling, sharing 2, 1, 1, or 0 alleles IBD]

  8. Identity By Descent • Expected number of alleles IBD = 2 × 25% + 1 × 50% + 0 × 25% = 1 allele = 50% sharing
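
The expectation on this slide can be checked with a few lines of Python. This is only a sketch: the function names `expected_ibd` and `sharing_fraction` are made up for illustration, and the probabilities are the 25% / 50% / 25% values stated on the slide.

```python
# Probability that a sib pair shares 2, 1, or 0 alleles identical by descent
# at a locus, using the values from the slide (no linkage to disease assumed).
ibd_probs = {2: 0.25, 1: 0.50, 0: 0.25}


def expected_ibd(probs):
    """Expected number of alleles shared IBD: sum of k * P(k)."""
    return sum(k * p for k, p in probs.items())


def sharing_fraction(probs):
    """Expected proportion of the two alleles at the locus that are shared."""
    return expected_ibd(probs) / 2


print(expected_ibd(ibd_probs))      # expected alleles shared
print(sharing_fraction(ibd_probs))  # expected sharing proportion
```

Affected sib pair linkage analysis looks for marker loci where observed sharing exceeds this baseline.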

  9. Risch and Merikangas 1996

  10. Sample Size Calculation • Exposure Frequency • Effect Size • Identity By Descent (IBDM) • Sample Size Required

  11. Sample Size Calculation • Exposure Frequency • Effect Size • Identity By Descent (IBDM) • Sample Size Required • High IBD sharing • Low IBD sharing

  12. TDT • Transmitted alleles vs. non-transmitted alleles [Figure: trio with parents M1M2 and M2M2 and child M1M2]

  13. TDT • Transmitted alleles vs. non-transmitted alleles • $\mathrm{TDT} = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}$ • Asymptotically $\chi^2$ with 1 degree of freedom

  14. TDT • Transmitted alleles vs. non-transmitted alleles [Figure: trio with parents M1M2 and M2M2 and child M1M2]

  15. TDT • For this one trio: $\mathrm{TDT} = \frac{(1 - 0)^2}{1 + 0} = 1$, p-value = 0.32

  16. TDT • For one hundred trios: $\mathrm{TDT} = \frac{(60 - 35)^2}{60 + 35} = 6.58$, p-value = 0.01
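
The TDT statistic and its p-value can be sketched in Python using only the standard library. The function names `tdt_statistic` and `chi2_1df_pvalue` are hypothetical. The counts 1 and 0 come from the single-trio slide; the counts 60 and 35 are illustrative values chosen because they give a statistic of about 6.58. The p-value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal.

```python
import math


def tdt_statistic(n12, n21):
    """TDT = (n12 - n21)^2 / (n12 + n21).

    n12 and n21 are the two transmission counts from heterozygous parents
    (one allele transmitted vs. the other allele transmitted).
    """
    if n12 + n21 == 0:
        raise ValueError("no informative transmissions")
    return (n12 - n21) ** 2 / (n12 + n21)


def chi2_1df_pvalue(x):
    """Upper-tail probability of a chi-square with 1 degree of freedom.

    P(Z^2 > x) for standard normal Z equals erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


# Single trio from the slides, then illustrative counts for a larger sample.
for n12, n21 in [(1, 0), (60, 35)]:
    stat = tdt_statistic(n12, n21)
    print(n12, n21, round(stat, 2), round(chi2_1df_pvalue(stat), 2))
```

Only transmissions from heterozygous parents contribute, which is why the two counts can sum to fewer than the number of trios.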

  17. Risch and Merikangas 1996 TDT

  18. Linkage • Good for Large Effect Sizes • Genomewide Association • Good for Modest Effect Sizes • Not good for rare disease alleles

  19. Two Hypotheses • Common Disease-Common Variant • Common variants • Small to modest effects • Rare Variant • Rare variants • Larger effects

  20. Allele Frequency and Sample Size

  21. GWA Issues • Cost • Sample Size • Effect Size • Disease Allele Frequency • Multiple Testing • SNP selection • How many? • Which SNPs? • Available Genotyping Platforms

  22. Types of Variants • Single Nucleotide Polymorphism (SNP) • Insertion/Deletion (indel) • Microsatellite or Short Tandem Repeat (STR)

  23. What is a SNP? • Chromosome 1: TTCAGTCAGATCCTAGCCC / AAGTCAGTCTAGGATCGGG • Chromosome 2: TTCAGTCAGATCCCAGCCC / AAGTCAGTCTAGGGTCGGG • SNP: T vs. C at position 14 of the top strand
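
As a small illustration, the two top-strand sequences from this slide can be compared position by position to locate the SNP. The helper name `find_mismatches` is made up for this sketch, and it assumes the two sequences are already aligned and of equal length.

```python
# Top-strand sequences copied from the slide.
chrom1 = "TTCAGTCAGATCCTAGCCC"
chrom2 = "TTCAGTCAGATCCCAGCCC"


def find_mismatches(a, b):
    """Return (1-based position, base in a, base in b) for each difference."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned and of equal length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(find_mismatches(chrom1, chrom2))
```

A simple comparison like this does not handle insertions or deletions, which shift every downstream position.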

  24. What is an insertion/deletion? • Chromosome 1: TTCAGTCAGATCCTAGCCC / AAGTCAGTCTAGGATCGGG • Chromosome 2: TTCAGTCAGATCCCTAGCCC / AAGTCAGTCTAGGGATCGGG • Insertion/Deletion: one extra base pair on Chromosome 2

  25. What is a microsatellite? • Chromosome 1: TTCACAGCAGCAGCAGAGCCC / AAGTGTCGTCGTCGTCTCGGG • Chromosome 2: TTCACAGCAGCAGAGCCC / AAGTGTCGTCGTCTCGGG • 3 vs. 4 trinucleotide repeats
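
The repeat counts on this slide can be reproduced by counting consecutive copies of the repeat unit. This sketch assumes the unit is CAG, read off the top-strand sequences, and the function name `max_tandem_repeats` is hypothetical.

```python
import re

# Top-strand sequences copied from the slide.
chrom1 = "TTCACAGCAGCAGCAGAGCCC"
chrom2 = "TTCACAGCAGCAGAGCCC"


def max_tandem_repeats(seq, unit):
    """Length, in repeat units, of the longest uninterrupted run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


print(max_tandem_repeats(chrom1, "CAG"))  # repeats on chromosome 1
print(max_tandem_repeats(chrom2, "CAG"))  # repeats on chromosome 2
```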

  26. Relative frequency of each type of variant

  27. The Number of SNPs in the Human Genome

  28. How many SNPs? • 6 billion humans • 12 billion chromosomes • 1% frequency SNP • 120 million copies of the minor allele

  29. Ethnic/Racial Variation in SNP frequency

  30. Rare SNPs across populations

  31. How many of these SNPs have we found? • dbSNP: http://www.ncbi.nlm.nih.gov/projects/SNP/ • 10,430,753 SNPs • 4,868,126 are “validated”

  32. What Risch and Merikangas proposed: • 5 genetic polymorphisms per gene • 100,000 genes (1996) • = 500,000 genotypes per subject • Candidate Gene Study Design • All genes are candidates • Direct or Sequence-based approach • Causal variant is one of the variants tested

  33. Direct vs. Indirect • Sequence-based vs. Map-based

  34. Indirect Association relies on LD Decay • Variants that are close will have high LD • Variants that are far apart will have low LD • Indirect Association is a form of Positional Cloning

  35. LD Decay • $E(D_t) = D_1 (1 - \theta)^t$, where $D_t$ is the current amount of LD, $t$ is the number of generations, and $\theta$ is the recombination fraction • If $\theta = 0.5$, LD decays at a rate of 50% per generation • If $\theta < 0.5$, LD decay is slower
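
The decay formula on this slide is easy to tabulate. In this sketch, `expected_ld` is a made-up function name, `theta` stands for the recombination fraction in the slide's formula, and the starting LD of 0.25 and the two recombination fractions are arbitrary illustrative values.

```python
def expected_ld(d_start, theta, t):
    """Expected LD after t generations: d_start * (1 - theta) ** t."""
    if not 0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return d_start * (1 - theta) ** t


# Compare unlinked loci (theta = 0.5) with tightly linked loci (theta = 0.01).
for theta in (0.5, 0.01):
    decay = [round(expected_ld(0.25, theta, t), 4) for t in range(0, 6)]
    print(theta, decay)
```

With a recombination fraction of 0.5 the value halves every generation, while with 0.01 it falls by only 1% per generation, which is why LD persists between closely spaced variants.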

  36. LD Decay over time

  37. Observed LD Decay

  38. Linkage Disequilibrium • $r^2 = \frac{(p_{AB}\,p_{ab} - p_{Ab}\,p_{aB})^2}{p_A\,p_a\,p_B\,p_b}$ • Haplotypes: A B, a b, A b, a B
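
A minimal sketch of the r² calculation from the four haplotype frequencies follows. The function name `ld_r2` and the example frequencies are illustrative assumptions; allele frequencies are derived from the haplotype frequencies, which are assumed to sum to 1.

```python
def ld_r2(p_AB, p_Ab, p_aB, p_ab):
    """r^2 = (p_AB*p_ab - p_Ab*p_aB)^2 / (p_A * p_a * p_B * p_b)."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    # Allele frequencies are the margins of the haplotype table.
    p_A = p_AB + p_Ab
    p_a = 1 - p_A
    p_B = p_AB + p_aB
    p_b = 1 - p_B
    denom = p_A * p_a * p_B * p_b
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    d = p_AB * p_ab - p_Ab * p_aB
    return d ** 2 / denom


print(ld_r2(0.5, 0.0, 0.0, 0.5))      # only A B and a b haplotypes exist
print(ld_r2(0.25, 0.25, 0.25, 0.25))  # alleles at the two loci are independent
print(ld_r2(0.4, 0.1, 0.1, 0.4))      # intermediate LD
```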

  39. Indirect Association and LD • Sample size required for Direct Association: $n$ • Sample size for Indirect Association = $n / r^2$ • For $r^2 = 0.8$, increase is 25% • For $r^2 = 0.5$, increase is 100%
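
The percentage increases on this slide follow directly from dividing by r². This sketch uses hypothetical function names, and the direct-association sample size of 1,000 is an arbitrary illustrative value.

```python
def indirect_sample_size(n_direct, r2):
    """Sample size needed when testing a marker in LD (r2) with the causal SNP."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return n_direct / r2


def percent_increase(r2):
    """Percent increase in sample size relative to direct association."""
    return (1 / r2 - 1) * 100


for r2 in (1.0, 0.8, 0.5):
    print(r2, round(indirect_sample_size(1000, r2)), round(percent_increase(r2)))
```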

  40. Coverage • Percent of all SNPs captured by genotyped SNPs • More genotyped SNPs = better coverage

  41. Diminishing Marginal Returns (Wang and Todd 2003) [Figure labels: $r^2 = 0.5$, $r^2 = 0.8$, 600,000 SNPs, 1,500,000 SNPs]

  42. Number of SNPs needed to capture all SNPs • Depends on: • Population studied • Minor allele frequency of causal SNP • Level of LD ($r^2$) used as a cutoff • 1.4 million selected SNPs for: • Caucasians/Asians • Minor allele frequency of 5% and above • $r^2 = 0.8$

  43. The HapMap Project • Initial Goal: • 600,000 SNPs for indirect association • LD information between SNPs • Phase 1: 1 million SNPs • Phase 2: additional 2.9 million SNPs

  44. HapMap • 270 subjects • 45 Chinese • 45 Japanese • 90 Yoruban and 90 European-American • 30 Trios • 2 parents, 1 child

  45. HapMap • SNPs from dbSNP were genotyped • Looked for 1 every 5kb • SNP Validation • Polymorphic • Frequency • Haplotype Estimation • Haplotype tagging SNPs

  46. Haplotype Tagging

  47. Two approaches • Positional cloning • expand LD mapping to entire genome • Tool: HapMap SNPs • Candidate gene or Gene-based • Expand the number of genes to all genes • 25,000 genes • Tools: jSNPs, SeattleSNPs, NIEHSSNPs

  48. Genome-wide Association • LD Based • Gene Based

  49. Potentially Functional Regions of a Gene [Figure labels: cis regulator, ?, promoter, amino acid coding, RNA processing, transcription regulation]

  50. Comparison of Gene-based and Positional Cloning Designs • Positional Cloning • Agnostic (no biological knowledge needed) • Regulatory regions • SNP sets currently incomplete • Expensive • Gene-based • Efficient: Fewer SNPs need to be genotyped • May miss regulatory regions • Not all SNPs are known
