Genome-wide Studies: Association. Genome-wide Association Studies. 1. History Linkage vs. Association Power/Sample Size 2. SNPs and The International HapMap Project 3. Direct vs. Indirect Association Using Linkage Disequilibrium to reduce genotyping
Genome-wide Association Studies • 1. History • Linkage vs. Association • Power/Sample Size • 2. SNPs and The International HapMap Project • 3. Direct vs. Indirect Association • Using Linkage Disequilibrium to reduce genotyping • 4. SNP selection, Coverage, Study Designs • 5. Genotyping Platforms • 6. Early (recent) GWA Studies
Gene Mapping Study Designs • Positional Cloning • Linkage Analysis • Linkage Disequilibrium based Fine Mapping • Candidate Gene Association • Need a biological hypothesis
Risch and Merikangas 1996 • Sample size for association < sample size for linkage
What Risch and Merikangas proposed: • 5 genetic polymorphisms per gene • 100,000 genes (1996) • = 500,000 genotypes per subject • Candidate Gene Study Design • All genes are candidates • Direct or Sequence-based approach • Causal variant is one of the variants tested
Sample Size Required • Linkage Analysis with affected sib pairs • Transmission Disequilibrium Test (TDT) • TDT with affected sib pairs
Affected Sib Pair Linkage Analysis • 2 siblings/family • Both sibs affected • IBD at the marker locus • Expect 50% marker sharing on average
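Identity By Descent • [Figure: genotypes of Sibling 1 and Sibling 2 (alleles A and a), with the number of alleles shared IBD for each outcome: 2, 1, 1, 0]
Identity By Descent • Expected number of alleles IBD = 2 × 25% + 1 × 50% + 0 × 25% = 1 allele = 50% sharing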
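The expectation on this slide can be checked with a few lines of Python. This is only a sketch: the names `ibd_probs` and `expected_ibd` are illustrative, and the 25% / 50% / 25% probabilities are the null (no linkage) values used on the slide.

```python
# Probability that a sib pair shares 0, 1, or 2 alleles IBD
# at a marker that is not linked to the disease locus.
ibd_probs = {0: 0.25, 1: 0.50, 2: 0.25}


def expected_ibd(probs):
    """Return the expected number of alleles shared IBD."""
    return sum(k * p for k, p in probs.items())


expected_alleles = expected_ibd(ibd_probs)
# Each sib carries two alleles at the locus, so divide by 2
# to express sharing as a fraction.
sharing_fraction = expected_alleles / 2

print(expected_alleles, sharing_fraction)  # 1.0 0.5
```

Linkage shows up as sharing above this 50% baseline among affected sib pairs.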
Sample Size Calculation • [Table: Exposure Frequency, Effect Size, Identity By Descent (IBDM), Sample Size Required; annotated with "High IBD sharing" and "Low IBD sharing"]
TDT • Transmitted alleles vs. non-transmitted alleles • [Figure: trio with parental genotypes M1M2 and M2M2 and child genotype M1M2]
TDT • Transmitted alleles vs. non-transmitted alleles • $\mathrm{TDT} = \dfrac{(n_{12} - n_{21})^2}{n_{12} + n_{21}}$ • Asymptotically $\chi^2$ with 1 degree of freedom
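A minimal sketch of the statistic in Python. It assumes that n12 counts heterozygous parents who transmit M1 (and not M2) to the affected child and that n21 counts the reverse; the slide does not define the two counts explicitly. The function name `tdt` is illustrative. The p-value uses the fact that a chi-square variable with 1 degree of freedom is a squared standard normal, so no statistics library is needed.

```python
import math


def tdt(n12, n21):
    """Return (statistic, p_value) for the TDT.

    n12, n21: counts of the two kinds of transmissions from
    heterozygous parents (assumed definitions, see the text above).
    """
    informative = n12 + n21
    if informative == 0:
        raise ValueError("no informative transmissions")
    stat = (n12 - n21) ** 2 / informative
    # Upper tail of chi-square with 1 df: P(Z^2 > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# A single trio with one informative transmission
stat, p = tdt(1, 0)
print(f"TDT = {stat:.2f}, p = {p:.2f}")  # TDT = 1.00, p = 0.32
```

Only heterozygous parents contribute, so the effective sample size is the number of informative transmissions, not the number of trios.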
TDT • For this one trio: $\mathrm{TDT} = \dfrac{(1 - 0)^2}{1 + 0} = 1$ • p-value = 0.32
TDT • For one hundred trios: $\mathrm{TDT} = \dfrac{(60 - 35)^2}{60 + 35} = 6.58$ • p-value = 0.01
Conclusions • Linkage • Good for Large Effect Sizes • Genome-wide Association • Good for Modest Effect Sizes
Two Hypotheses • Common Disease-Common Variant • Common variants • Small to modest effects • Rare Variant • Rare variants • Larger effects
How many common variants are there? • Millions of SNPs • Microsatellites, Insertions/Deletions, Copy Number Polymorphisms, Inversions • Numbers depend on: • Population under study • Minimum allele frequency • 5% is “common” • Less than 5% requires very large samples
The Number of SNPs in the Human Genome (HapMap) • Note 1: This is an underestimate.
Multiple testing and genotyping “The number of tests we have used as the basis for our calculations (1,000,000) is likely to be far larger than necessary if one allows for linkage disequilibrium, which could substantially reduce the required number of markers and families needed for initial screening.”
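To see what 1,000,000 tests imply for the per-test significance level, here is a small sketch using a Bonferroni correction. The choice of Bonferroni and of a genome-wide alpha of 0.05 are assumptions made for illustration; the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 1,000,000 tests, as in the quoted calculation
threshold = bonferroni_threshold(0.05, 1_000_000)
print(f"{threshold:.1e}")  # 5.0e-08
```

If linkage disequilibrium lets fewer markers stand in for the rest, `n_tests` shrinks and the per-test threshold becomes less stringent.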
Coverage • Percent of all SNPs captured by genotyped SNPs • More genotyped SNPs = better coverage
Linkage Disequilibrium • [Figure: the four possible haplotypes at two biallelic loci: AB, ab, Ab, aB]
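The four haplotypes above are enough to compute the usual LD measures. The sketch below uses the standard definitions D = p(AB) − p(A)p(B) and r² = D² / [p(A)(1 − p(A))p(B)(1 − p(B))]; the function name and the example frequencies are hypothetical.

```python
def ld_stats(p_AB, p_Ab, p_aB, p_ab):
    """Return (D, r2) from the four haplotype frequencies at two loci."""
    total = p_AB + p_Ab + p_aB + p_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = p_AB + p_Ab  # allele frequency of A at the first locus
    p_B = p_AB + p_aB  # allele frequency of B at the second locus
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


# Only AB and ab haplotypes exist: the two SNPs are perfect proxies
d, r2 = ld_stats(0.5, 0.0, 0.0, 0.5)
print(d, r2)  # 0.25 1.0
```

When all four haplotypes occur at the frequencies expected from the allele frequencies alone, D and r² are both 0.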
Measuring Coverage • [Figure: maximum $r^2$ between each SNP in the complete set and the SNPs in the genotyping set; example values 0.8, 0.6, 0.2]
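Coverage can be computed directly from the maximum r² values. In this sketch, each SNP in the complete set is represented by its maximum r² with any genotyped SNP, and a SNP counts as captured when that value reaches a chosen threshold. The function name and the default threshold of 0.8 are assumptions; the three values are the ones shown on the slide.

```python
def coverage(max_r2_values, threshold=0.8):
    """Fraction of SNPs whose best-tagging genotyped SNP has r2 >= threshold."""
    if not max_r2_values:
        raise ValueError("need at least one SNP")
    captured = sum(1 for r2 in max_r2_values if r2 >= threshold)
    return captured / len(max_r2_values)


max_r2 = [0.8, 0.6, 0.2]  # maximum r2 for three SNPs in the complete set

strict = coverage(max_r2, threshold=0.8)   # one of three SNPs captured
relaxed = coverage(max_r2, threshold=0.5)  # two of three SNPs captured
print(round(strict, 2), round(relaxed, 2))  # 0.33 0.67
```

The same genotyping set therefore has different coverage depending on the r² cutoff, which is why reported coverage figures should always state the threshold.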
Indirect Association and LD • Sample size required for direct association: $n$ • Sample size required for indirect association: $n / r^2$ • For $r^2 = 0.8$, the increase is 25% • For $r^2 = 0.5$, the increase is 100%
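The two percentages follow from the n / r² rule, as this short sketch shows. The function names and the baseline of 1,000 subjects are hypothetical.

```python
def indirect_sample_size(n_direct, r2):
    """Sample size needed when testing a proxy SNP with the given r2."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return n_direct / r2


def percent_increase(r2):
    """Percentage increase over the direct-association sample size."""
    return (1 / r2 - 1) * 100


n = 1000  # hypothetical sample size for direct association
for r2 in (0.8, 0.5):
    needed = indirect_sample_size(n, r2)
    print(f"r2 = {r2}: n = {needed:.0f} (+{percent_increase(r2):.0f}%)")
```

With these inputs the loop reports 1250 subjects for r² = 0.8 and 2000 subjects for r² = 0.5.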
The HapMap Project • Initial Goal: • 600,000 SNPs for indirect association studies • LD information between SNPs • Phase 1: 1 million SNPs • Phase 2: additional 2.9 million SNPs
HapMap • SNPs from dbSNP were genotyped • Looked for 1 every 5kb • SNP Validation • Polymorphic • Frequency • Linkage Disequilibrium Estimation • LD tagging SNPs
HapMap • 270 subjects • 45 Chinese • 45 Japanese • 90 Yoruba and 90 European-American • 30 trios in each of these two groups • 2 parents, 1 child
Number of SNPs needed to capture all SNPs • Depends on: • Population studied • Minor allele frequency of causal SNP • Level of LD ($r^2$) used as a cutoff • For Example: • Caucasians, Asians, Africans • Minor Allele Frequency ≥ 5% • $r^2$ ≥ 0.8
The Number of SNPs in the Human Genome (extrapolating from the HapMap) This is an underestimate.
Genotyping Platforms • Affymetrix 500K • Pseudo-random SNPs • Illumina 550K, 650K • HapMap-based LD-tagging SNPs • $r^2$ ≥ 0.8 for some SNPs, ≥ 0.7 for others • ParAllele 20K • Nonsynonymous SNPs
One- and Two-Stage GWA Designs • [Figure: the one-stage design genotypes all SNPs (1, 2, 3, ..., M) on all samples (1, 2, 3, ..., N); the two-stage design genotypes all SNPs on a subset of the samples in Stage 1 and a subset of the markers on the remaining samples in Stage 2]
One-Stage Design vs. Two-Stage Design • [Figure: SNPs by samples for the one-stage design, and for the two-stage design under joint analysis (Stage 1 and Stage 2) and under replication-based analysis (Stage 1 and Stage 2)]
Multistage Designs • Joint analysis has more power than replication-based analysis • The Stage 1 p-value threshold must be liberal • Multistage designs lower cost; they do not gain power • http://www.sph.umich.edu/csg/abecasis/CaTS/index.html
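The cost saving can be illustrated by counting genotypes. This sketch assumes a two-stage design in which a fraction of the samples is genotyped on every SNP in Stage 1 and the remaining samples are genotyped only on the fraction of SNPs carried forward to Stage 2. All function names, parameter names, and numbers are hypothetical, and the sketch ignores per-genotype price differences between platforms.

```python
def genotypes_one_stage(n_samples, n_snps):
    """Every sample is genotyped on every SNP."""
    return n_samples * n_snps


def genotypes_two_stage(n_samples, n_snps, frac_samples_stage1, frac_snps_stage2):
    """Stage 1: some samples, all SNPs. Stage 2: remaining samples, selected SNPs."""
    stage1 = frac_samples_stage1 * n_samples * n_snps
    stage2 = (1 - frac_samples_stage1) * n_samples * frac_snps_stage2 * n_snps
    return stage1 + stage2


# Hypothetical study: 2,000 samples, 500,000 SNPs,
# 30% of samples in Stage 1, 1% of SNPs followed up in Stage 2.
one = genotypes_one_stage(2000, 500_000)
two = genotypes_two_stage(2000, 500_000, 0.30, 0.01)
print(f"two-stage / one-stage genotypes: {two / one:.3f}")
```

A more liberal Stage 1 threshold corresponds to a larger `frac_snps_stage2`, trading some of the cost saving for a better chance of carrying true signals forward.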
GWA studies have been published • Myocardial Infarction • Gene-based SNPs • Age-related Macular Degeneration • Affymetrix 100K • Parkinson’s Disease • Perlegen 198K chip • 1,793 SNPs in second stage
Macular Degeneration • Small Sample—96 cases, 50 controls • Sparse SNP set • Under a previous linkage peak • Missed other loci