SNP Discovery and Analysis
Application to Association Studies
Mark J. Rieder, PhD
Dana Crawford, PhD
SeattleSNPs PGA
Morehouse University
May 2, 2005
Practical Aspects of SNP Association Studies
• SNP Discovery: Where do I find SNPs to use in my association studies? (e.g. databases, direct resequencing)
• SNP Selection: How do I choose SNPs that are informative? (i.e. assessing SNP correlation - linkage disequilibrium)
• SNP Associations: What analyses can I perform after genotyping these SNPs? (e.g. single SNP data, haplotype data)
• SNP Replication/Function: How is function predicted or assessed? (e.g. nonsynonymous SNPs, conserved non-coding sequences (CNS), transcription factor binding sites, gene expression)
SeattleSNPs Program for Genomic Applications: Overview
Aim 1: To establish a variation discovery resource capable of comprehensive resequencing of candidate genes related to HLBS.
Biological Focus: Inflammation
Genes and Pathways: Coagulation, Complement, Cytokines, Interacting Partners
SeattleSNPs: SNPs in Candidate Genes
• Average gene size: 26.5 kb
• Comparing 2 haploid genomes: 1 SNP in 1,200 bp
• ~130 SNPs per gene (1 per 200 bp) - 15,000,000 SNPs
• ~44 SNPs per gene with MAF > 0.05 (1 per 600 bp) - 6,000,000 SNPs
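The per-gene counts on this slide follow from dividing the average gene size by the SNP spacing. A minimal sketch of that arithmetic, assuming the 200 bp and 600 bp figures are average spacings (`snps_per_gene` is a hypothetical helper name, not part of any SeattleSNPs tool):

```python
GENE_SIZE_BP = 26_500  # average gene size quoted on the slide (26.5 kb)


def snps_per_gene(bp_per_snp: int, gene_size_bp: int = GENE_SIZE_BP) -> float:
    """Expected number of SNPs in a gene, given one SNP every `bp_per_snp` bases."""
    return gene_size_bp / bp_per_snp


all_snps = snps_per_gene(200)     # all SNPs: one per 200 bp
common_snps = snps_per_gene(600)  # SNPs with MAF > 0.05: one per 600 bp

print(f"all SNPs per gene:    {all_snps:.1f}")
print(f"common SNPs per gene: {common_snps:.1f}")
```

Both results land close to the rounded ~130 and ~44 shown on the slide.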
SeattleSNPs PGA: Candidate Gene SNP Resource
• 4.6 Mb in 47 individuals = 216 Mb total sequence
• Define sequence diversity - catalogue all SNPs
• Select “optimal” tagSNP sets
• Determine haplotype structure
• Provide necessary baseline data for association studies
Warfarin Pharmacogenetics
• Background
• Warfarin characteristics
• Pharmacokinetics/Pharmacodynamics
• Discovery of VKORC1
• VKORC1 - SNP Discovery
• VKORC1 - SNP Selection (tagSNPs)
• VKORC1 - SNP Testing
• SNP/Haplotype Inference
• Haplotype Inference, Testing
• VKORC1 - SNP Replication/Function
Pharmacogenomics as a Model for Association Studies
• Clear genotype-phenotype link: intervention, variable response
• Pharmacokinetics - 5x variation
• Quantitative intervention and response: drug dose, response time, metabolism rate, etc.
• Target/metabolism of drug generally known: gene target that can be tested directly with response
• Reduce variability and identify outliers
• Prospective testing
• Personalized Medicine
Warfarin Background
Very effective rat poison!
• Commonly prescribed oral anticoagulant
• In 2003, 21.2 million prescriptions were written for warfarin (Coumadin)
• Prescribed following MI, atrial fibrillation, stroke, venous thrombosis, prosthetic heart valve replacement, and major surgery
• Difficult to determine effective dosage
• Narrow therapeutic range
• Monitoring of prothrombin time (INR): 2.0 - 3.0
• Large inter-individual variation
[Figure: warfarin dose distribution, European-American patients, n = 186; Ave: 5.2 mg/d; 30x dose variability]
Patient/Clinical/Environmental Factors • Pharmacokinetic/Pharmacodynamic - Genetic
Warfarin inhibits the vitamin K cycle
[Diagram labels: Warfarin; Epoxide Reductase; γ-Carboxylase (GGCX); Vitamin K-dependent clotting factors (FII, FVII, FIX, FX, Protein C/S/Z); Pharmacokinetic: CYP2C9 inactivation]
Warfarin Metabolism (Pharmacokinetics)
• Major pathway for termination of pharmacologic effect is through metabolism of S-warfarin in the liver by CYP2C9
• CYP2C9 SNPs alter warfarin metabolism:
  - CYP2C9*1 (WT) - normal
  - CYP2C9*2 (Arg144Cys) - low/intermediate
  - CYP2C9*3 (Ile359Leu) - low
• CYP2C9 alleles occur at a significant minor allele frequency
  - European: *2 - 10.7%, *3 - 8.5%
  - Asian: *2 - 0%, *3 - 1-2%
  - African-American: *2 - 2.9%, *3 - 0.8%
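To get a feel for what these allele frequencies mean at the patient level, the sketch below estimates the fraction of people carrying at least one *2 or *3 allele. It assumes Hardy-Weinberg proportions and treats *2 and *3 as mutually exclusive alleles; neither assumption is stated on the slide, and `carrier_fraction` is a hypothetical helper name.

```python
def carrier_fraction(variant_allele_freqs):
    """Fraction of individuals with >= 1 variant allele.

    Assumes Hardy-Weinberg proportions: the chance that both of a person's
    chromosomes carry the wild-type allele is (wild-type frequency) squared.
    """
    wild_type_freq = 1.0 - sum(variant_allele_freqs)
    return 1.0 - wild_type_freq ** 2


# Allele frequencies taken from the slide (*2, *3)
european = carrier_fraction([0.107, 0.085])
african_american = carrier_fraction([0.029, 0.008])

print(f"European carriers:         {european:.1%}")
print(f"African-American carriers: {african_american:.1%}")
```

Under these assumptions roughly a third of European patients would carry a reduced-function allele, which is why the slide calls the frequencies significant.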
Effect of CYP2C9 Genotype on Anticoagulation-Related Outcomes (Higashi et al., JAMA 2002)
TIME TO STABLE ANTICOAGULATION: CYP2C9-WT ~90 days; CYP2C9-Variant ~180 days
• *2 or *3 carriers take longer to reach stable anticoagulation
WARFARIN MAINTENANCE DOSE
[Figure: mg warfarin/day by CYP2C9 genotype group; N = 127, 28, 4, 18, 3, 5]
• Variant alleles have significant clinical impact
• Still large variability in warfarin dose (15-fold) in *1/*1 “controls”?
Analysis of Independent Predictors of Warfarin Dose
Adapted from Gage et al., Thromb Haemost, 2004

Variable                               Change in Warfarin Dose   P value
Target INR, per 0.5 increase           21%                       <0.0005
BMI, per SD                            14%                       <0.0001
Ethnicity (African-American, [Asian])  13%, [10-15%]             0.003
Age, per decade                        13%                       <0.0001
Gender, Female                         12%                       <0.0001
Drugs (Amiodarone)                     24%                       0.007
CYP2C9*2, per allele                   19%                       <0.0001
CYP2C9*3, per allele                   30%                       <0.0001

~30% of the variability in warfarin dose is explained by these factors
What other candidate genes are influencing warfarin dosing?
Warfarin acts as a vitamin K antagonist
[Diagram labels: Warfarin; Epoxide Reductase (Pharmacodynamic); γ-Carboxylase (GGCX); CYP2C9 inactivation; Vitamin K-dependent clotting factors (FII, FVII, FIX, FX, Protein C/S/Z)]
New Target Protein for Warfarin: Epoxide Reductase (VKORC1)
Rost et al. & Li et al., Nature (2004)
[Diagram labels: Epoxide Reductase (VKORC1), 5 kb - chr 16; γ-Carboxylase (GGCX); Clotting Factors (FII, FVII, FIX, FX, Protein C/S/Z)]
Warfarin Resistance: VKORC1 Polymorphisms
Rost et al., Nature (2004)
• Rare non-synonymous mutations in VKORC1 are causative for warfarin resistance (15-35 mg/d)
• No non-synonymous mutations found in ‘control’ chromosomes (n = ~400)
Inter-Individual Variability in Warfarin Dose: Genetic Liabilities
[Figure: Frequency vs. Warfarin maintenance dose (mg/day); axis marks at 0.5, 5, 15]
• SENSITIVITY: CYP2C9 coding SNPs - *3/*3
• RESISTANCE: VKORC1 nonsynonymous coding SNPs
• Common VKORC1 non-coding SNPs?
SNP Discovery: Resequencing VKORC1
• PCR amplicons --> Resequencing of the complete genomic region
• 5 kb upstream and each of the 3 exons and intronic segments; ~11 kb
• SeattleSNPs PGA - pga.gs.washington.edu (24 African-Am./23 Europeans)
• Warfarin treated clinical patients (UWMC): 186 European
• Other populations: 96 European, 96 African-Am., 120 Asian
SNP Discovery: Resequencing Results
Summary of PGA samples (European, n = 23)
• Total: 13 SNPs identified
• 10 common/3 rare (<5% MAF)
Clinical Samples (European patients, n = 186)
• Total: 28 SNPs identified
• 10 common/18 rare (<5% MAF)
• 15 - intronic/regulatory
• 7 - promoter SNPs
• 2 - 3’ UTR SNPs
• 3 - synonymous SNPs
• 1 - nonsynonymous - single heterozygous indiv. - highest warfarin dose = 15.5 mg/d
How does the comprehensive SNP discovery compare to what was known for this gene?
SNP Discovery: dbSNP database
dbSNP - NCBI SNP database
SNP Discovery: dbSNP database
• SeattleSNPs Resequencing
• 28 SNPs --> 15 SNPs in gene region
• 10 dbSNPs
• 8/10 confirmations
• 3 with frequency/genotype data
• 7 new dbSNP entries generated by SeattleSNPs resequencing
• 8 dbSNPs/15 SNPs (~50%)
SNP Discovery: dbSNP database
Nickerson and Kruglyak, Nature Genetics, 2001
Mar 2005 - 5.0 million (validated - 1/600 bp)
5.0/10.0 = 50% of all common SNPs (validated)!
SNP discovery is dependent on your sample population size
[Figure: Fraction of SNPs Discovered (0.0 - 1.0) vs. Minor Allele Frequency (MAF, 0.0 - 0.5); one curve each for 2, 8, 16, 24, 48, and 96 chromosomes]
Example with 2 chromosomes:
GTTACGCCAATACAGGATCCAGGAGATTACC
GTTACGCCAATACAGCATCCAGGAGATTACC
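The dependence on sample size can be illustrated with a simple sampling argument: a SNP is seen only if both alleles turn up among the sampled chromosomes. The sketch below uses that binomial reasoning; it is an assumption about how such curves can be derived, not necessarily how the figure on the slide was produced, and `discovery_probability` is a hypothetical name.

```python
def discovery_probability(maf: float, n_chromosomes: int) -> float:
    """Chance that a SNP with minor allele frequency `maf` is observed
    (both alleles present) in a random sample of `n_chromosomes`."""
    p_only_major = (1.0 - maf) ** n_chromosomes  # every chromosome carries the major allele
    p_only_minor = maf ** n_chromosomes          # every chromosome carries the minor allele
    return 1.0 - p_only_major - p_only_minor


# Sample sizes match the curve labels in the figure
for n in (2, 8, 16, 24, 48, 96):
    print(n, round(discovery_probability(0.05, n), 3))
```

With only 2 chromosomes, even a SNP at 50% MAF is found half the time at best, while 96 chromosomes capture nearly all SNPs at 5% MAF.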
SNP Discovery: dbSNP database
[Figure: Minor Allele Freq. (MAF) distributions for SeattleSNPs and dbSNP (Perlegen/HapMap); percentage labels: 25%, 75%, 50%]
Rarer and population-specific SNPs are found by resequencing
dbSNP: Increasing numbers of SNPs now have genotype data
[Figure labels: HapMap Phase II; Perlegen Data]
Current State of dbSNP Many SNPs left to validate and characterize.
Development of a genome-wide SNP map: How many SNPs?
Nickerson and Kruglyak, Nature Genetics, 2001
• ~10 million common SNPs (>1-5% MAF) - 1/300 bp
• Mar 2005 - 5.0 million (validated - 1/600 bp)
• 5.0/10.0 = 50% of all common SNPs validated!
Coming Soon! 5.0 million validated SNPs with genotypes!
SNP Discovery: dbSNP database
dbSNP Issues:
• Not a comprehensive catalog (50% of SNPs)
• Is the data confirmed? (50% are validated)
• Information about allele frequency/population (50%)
• No information about SNP correlations (linkage disequilibrium) or genotyping efficiency
SNP Selection: Using Linkage Disequilibrium
[Figure: Frequency vs. Warfarin Dose (mg/d)]
• Common SNPs
• VKORC1 - 28 total - 10 SNPs > 10% MAF
• Evaluate linkage disequilibrium (non-random association of genotype data)
Does common variation in VKORC1 have a role in determining warfarin dose?
SNP Selection: Using Linkage Disequilibrium

            Site 1   Site 2
Maternal    C        A
Paternal    T        G

Allele frequencies: Site 1 - C: 50%, T: 50%; Site 2 - A: 50%, G: 50%

Possible 2-site comb.   Expected Freq.      Observed Freq.
C A                     0.5 x 0.5 = 0.25    0.50 *
C G                     0.5 x 0.5 = 0.25    0.01
T A                     0.5 x 0.5 = 0.25    0.01
T G                     0.5 x 0.5 = 0.25    0.48 *

* Sites correlated
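The gap between expected and observed two-site frequencies is usually summarized as D and r². A minimal sketch using the observed frequencies from this slide (the function name `ld_stats` is hypothetical; allele frequencies are recomputed from the observed haplotypes, so they come out as 0.51/0.49 rather than exactly 0.5):

```python
def ld_stats(f_ca: float, f_cg: float, f_ta: float, f_tg: float):
    """Return (D, r^2) for two biallelic sites from the four haplotype frequencies."""
    p_c = f_ca + f_cg  # frequency of allele C at site 1
    p_a = f_ca + f_ta  # frequency of allele A at site 2
    # D: observed C-A haplotype frequency minus the frequency expected under independence
    d = f_ca - p_c * p_a
    # r^2: D^2 scaled by the product of all four allele frequencies
    r2 = d * d / (p_c * (1.0 - p_c) * p_a * (1.0 - p_a))
    return d, r2


# Observed frequencies from the slide: C A, C G, T A, T G
d, r2 = ld_stats(0.50, 0.01, 0.01, 0.48)
print(f"D = {d:.4f}, r^2 = {r2:.3f}")
```

An r² near 1 means that genotyping one of the two sites recovers almost all of the information at the other, which is the basis for picking tagSNPs.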
SNP Selection: Using Linkage Disequilibrium
• SNP discovery data (i.e. population of samples with genotypes)
• Find all correlated SNPs to minimize the total number of SNPs
• Maintains genetic information (correlations) for that locus
LD_Select - SNP tagging/binning algorithm - based on LD (r²), not haplotypes
Carlson et al., AJHG (2004)
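The binning idea can be sketched as a greedy loop: repeatedly pick the SNP that is correlated with the most remaining SNPs, and put it and its partners into one bin. This is a simplified illustration, not the LD_Select implementation; the r² threshold of 0.8, the SNP names, and the pairwise r² values are all made up for the example.

```python
def greedy_ld_bins(snps, r2, threshold=0.8):
    """Group SNPs into bins of mutually informative markers (simplified sketch).

    snps: list of SNP identifiers
    r2:   dict mapping frozenset({a, b}) -> pairwise r^2 (missing pairs count as 0)
    Returns a list of (tag_snp, members) tuples.
    """
    def linked(a, b):
        return r2.get(frozenset((a, b)), 0.0) >= threshold

    remaining = list(snps)
    bins = []
    while remaining:
        # Pick the SNP that exceeds the threshold with the most unbinned SNPs
        tag = max(remaining, key=lambda s: sum(linked(s, t) for t in remaining if t != s))
        members = [tag] + [t for t in remaining if t != tag and linked(tag, t)]
        bins.append((tag, members))
        remaining = [s for s in remaining if s not in members]
    return bins


# Hypothetical SNPs and r^2 values, for illustration only
toy_r2 = {
    frozenset(("s1", "s2")): 0.90,
    frozenset(("s1", "s3")): 0.85,
    frozenset(("s2", "s3")): 0.95,
    frozenset(("s4", "s5")): 0.30,
}
bins = greedy_ld_bins(["s1", "s2", "s3", "s4", "s5"], toy_r2)
for tag, members in bins:
    print(tag, members)
```

Here five SNPs collapse to three bins, so three genotyping assays would cover the locus. Ties are broken by input order in this sketch.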
SNP Selection: VG/LD_Select on the Web
pga.gs.washington.edu/VG2
SNP Testing: VKORC1 tagSNPs
Five Bins to Test

Bin   SNPs                          P value
1     381, 3673, 6484, 6853, 7566   p < 0.001
2     2653, 6009                    p < 0.02
3     861                           p < 0.01
4     5808                          p < 0.001
5     9041                          p < 0.001

[Figure: e.g. Bin 1 - SNP 381, by genotype (C/C, C/T, T/T)]
SNP x SNP interactions - haplotype analysis?
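The slide does not say which test produced these p-values. One common way to test a single tagSNP against a quantitative trait such as dose is an additive (per-allele) regression; the sketch below fits only the slope and intercept by least squares. The genotype codes and doses are made-up toy numbers, not study data, and a real analysis would add a significance test and covariates.

```python
def per_allele_effect(allele_counts, doses):
    """Least-squares fit of dose on allele count (0, 1, or 2 copies of one allele).

    Returns (slope, intercept); the slope is the estimated dose change per allele copy.
    """
    n = len(allele_counts)
    mean_g = sum(allele_counts) / n
    mean_d = sum(doses) / n
    sxy = sum((g - mean_g) * (d - mean_d) for g, d in zip(allele_counts, doses))
    sxx = sum((g - mean_g) ** 2 for g in allele_counts)
    slope = sxy / sxx
    intercept = mean_d - slope * mean_g
    return slope, intercept


# Toy data: two hypothetical patients per genotype class (0, 1, 2 allele copies)
toy_counts = [0, 0, 1, 1, 2, 2]
toy_doses = [6.0, 6.5, 4.0, 4.5, 2.0, 2.5]  # mg/day, invented for illustration
slope, intercept = per_allele_effect(toy_counts, toy_doses)
print(f"dose change per allele: {slope:.2f} mg/day (intercept {intercept:.2f})")
```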
VKORC1 Summary: SNP Discovery/SNP Selection
• VKORC1 candidate gene for warfarin dose response
• SNP discovery performed using PCR/resequencing to catalog common SNPs
  - 28 SNPs found
  - 10 common SNPs
• SNP discovery using dbSNP
  - 8/10 dbSNPs confirmed
  - 7 new SNPs added
• SNP Selection using linkage disequilibrium
  - 10 common SNPs (> 10% MAF)
  - 5 informative SNPs for genotyping
Haplotypes in Genetic Association Studies
Two main approaches with haplotypes:
[Diagram labels: Haplotypes; Pick tagSNPs; Genotype samples; Pick tagSNPs; Infer haplotypes; Test for association]
Haplotypes in Genetic Association Studies
• How can you get haplotypes?
• What information do you get from haplotypes?
• How do you use haplotypes to find tagSNPs?
• How do you use haplotypes to test for associations?
Haplotypes – The Definition “…a unique combination of genetic markers present in a chromosome.” pg 57 in Hartl & Clark, 1997
Constructing Haplotypes
• Collect pedigrees
• Somatic cell hybrids (Rodent, Human, Hybrid)
• Allele-specific PCR
[Diagram labels: SNP 1, SNP 2; example genotypes C/C, A/G; C/T, A/A; T/T, G/G; C/T, A/G]
Constructing Haplotypes
Examples of Haplotype Inference Software:
• EM Algorithm
• Haploview - http://www.broad.mit.edu/mpg/haploview/index.php
• Arlequin - http://lgb.unige.ch/arlequin/
• PHASE v2.1 - http://www.stat.washington.edu/stephens/software.html
• HAPLOTYPER - http://www.people.fas.harvard.edu/~junliu/Haplo/docMain.htm
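For two SNPs, the only genotype whose phase is ambiguous is the double heterozygote (e.g. C/T, A/G). The EM approach named on this slide resolves it by splitting each such individual between the two possible phasings in proportion to the current haplotype frequency estimates. The sketch below is a textbook two-SNP version written for illustration; it is not the code used by any of the listed programs, and the genotype data are invented.

```python
def em_haplotype_freqs(genotypes, n_iter=50):
    """Estimate two-SNP haplotype frequencies from unphased genotypes via EM.

    genotypes: list of (g1, g2), each the count (0, 1, 2) of allele "1" at a SNP.
    Returns a dict mapping haplotype (a1, a2) -> estimated frequency.
    """
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freqs = {h: 0.25 for h in haps}  # start from equal frequencies
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}  # genotype -> the two alleles carried

    for _ in range(n_iter):
        counts = {h: 0.0 for h in haps}
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: weigh the two phasings 00/11 versus 01/10
                w_cis = freqs[(0, 0)] * freqs[(1, 1)]
                w_trans = freqs[(0, 1)] * freqs[(1, 0)]
                total = w_cis + w_trans
                p_cis = w_cis / total if total > 0 else 0.5
                counts[(0, 0)] += p_cis
                counts[(1, 1)] += p_cis
                counts[(0, 1)] += 1.0 - p_cis
                counts[(1, 0)] += 1.0 - p_cis
            else:
                # At most one site is heterozygous, so the phase is known
                a1, a2 = alleles[g1], alleles[g2]
                counts[(a1[0], a2[0])] += 1.0
                counts[(a1[1], a2[1])] += 1.0
        # M-step: renormalize expected haplotype counts to frequencies
        n_haps = 2 * len(genotypes)
        freqs = {h: c / n_haps for h, c in counts.items()}
    return freqs


# Invented data: 4 homozygotes of each kind plus 2 double heterozygotes
toy_genotypes = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
freqs = em_haplotype_freqs(toy_genotypes)
for hap, f in sorted(freqs.items()):
    print(hap, round(f, 4))
```

Because the unambiguous individuals carry only the 0-0 and 1-1 haplotypes, the EM assigns the double heterozygotes to that phasing as well.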
Haplotypes in SeattleSNPs
• >200 genes re-sequenced in inflammation response
• 2 populations: European- and African-Americans
• PHASE v2.0 results posted on website
• Interactive tool (VH1) to visualize and sort haplotypes
http://pga.gs.washington.edu