Exploring complex diseases using genome-wide association: challenges and strategies. Li Jin, Ph.D., Fudan University; CAS-MPG Partner Institute for Computational Biology. HGM2006, Helsinki
Positional Cloning [Figure: codons AGC (Ser) and GGC (Gly), differing at a single base.]
Linkage Disequilibrium [Figure: linkage and linkage disequilibrium.]
[Figure from Daly et al., Nature Genetics, 2001.]
Genome-wide Association Study / Candidate Gene/Region Association Study: select tagSNPs → genotype tagSNPs → association analysis
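The "select tagSNPs" step can be sketched with a greedy r²-based procedure, one common approach; the talk does not say which selection algorithm was used. The r² threshold of 0.8, the function name `greedy_tag_snps` and the SNP names are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple


def greedy_tag_snps(snps: List[str],
                    r2: Dict[Tuple[str, str], float],
                    threshold: float = 0.8) -> List[str]:
    """Greedy tagSNP selection (a sketch, not the talk's method).

    A SNP is 'captured' by a tag if their pairwise r^2 >= threshold;
    every SNP captures itself. Missing pairs are treated as r^2 = 0.
    """
    def pair_r2(a: str, b: str) -> float:
        if a == b:
            return 1.0
        return r2.get((a, b), r2.get((b, a), 0.0))

    # Set of SNPs each candidate tag would capture.
    covers: Dict[str, Set[str]] = {
        s: {t for t in snps if pair_r2(s, t) >= threshold} for s in snps
    }
    untagged = set(snps)
    tags: List[str] = []
    while untagged:
        # Candidate capturing the most still-untagged SNPs (first one wins ties).
        best = max(snps, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Hypothetical pairwise r^2 values for four SNPs.
example_r2 = {("rs1", "rs2"): 0.90, ("rs2", "rs3"): 0.85, ("rs1", "rs3"): 0.50}
print(greedy_tag_snps(["rs1", "rs2", "rs3", "rs4"], example_r2))  # ['rs2', 'rs4']
```

Here rs2 captures rs1, rs2 and rs3, and rs4 (in LD with nothing) must be typed as its own tag.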
Challenges • Adjustment for multiple testing and power • Portability of tagging SNPs between populations • Population stratification • Mapping the mutation • Exploring gene-gene interaction
Challenge 1: Adjustment for multiple testing and power
Multiple Testing • Large number of SNPs • The number of tagging SNPs remains large (on the order of 10^6) • Multiple-testing problem: • Stringent p-value threshold (10^-6 to 10^-7) • Freimer and Sabatti (2004) • Sample size and power • Association statistic T: • Linear transformation: T is invariant • Nonlinear transformation: [formula not recovered]
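Why a linear transformation leaves T unchanged can be seen with a delta-method statistic. This is a sketch under an assumed form, not the talk's exact statistic: T = (f(p1) − f(p2))² / [f'(p1)² p1(1 − p1)/(2n1) + f'(p2)² p2(1 − p2)/(2n2)], with p1, p2 the allele frequencies in cases and controls and n1, n2 the numbers of individuals. The function name `transformed_stat` and the frequencies are hypothetical.

```python
from typing import Callable


def transformed_stat(p1: float, n1: int, p2: float, n2: int,
                     f: Callable[[float], float],
                     df: Callable[[float], float]) -> float:
    """Wald-type statistic on transformed allele frequencies (assumed form).

    Delta method: Var f(p) ~ f'(p)^2 * p(1 - p) / (2n) for 2n sampled alleles.
    """
    var = (df(p1) ** 2 * p1 * (1 - p1) / (2 * n1)
           + df(p2) ** 2 * p2 * (1 - p2) / (2 * n2))
    return (f(p1) - f(p2)) ** 2 / var


# Identity transformation: the usual allele-frequency comparison.
t_id = transformed_stat(0.30, 100, 0.20, 100, lambda p: p, lambda p: 1.0)
# Linear transformation f(p) = 3p + 1: the factor 3^2 appears in both the
# numerator and the variance, so it cancels and T is unchanged.
t_lin = transformed_stat(0.30, 100, 0.20, 100, lambda p: 3 * p + 1, lambda p: 3.0)
print(round(t_id, 2), round(t_lin, 2))  # 5.41 5.41
```

Only a nonlinear f changes the relative weight of the numerator and the variance, which is why the talk turns to nonlinear transformations.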
Motivation: statistics based on [formula not recovered] have low power; do statistics based on [formula not recovered] have higher power?
Nonlinear Transformations [Table: function and derivative for the entropy, exponential, polynomial, sigmoid, Gaussian and reciprocal transformations; formulas not recovered.]
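Each transformation needs both the function and its derivative, because the derivative enters the variance of the test statistic. Because the slide's formulas were not recovered, the forms below (exponential, reciprocal and an entropy-like −p ln p) are illustrative assumptions, paired with a delta-method statistic that is also an assumed form. The sketch checks each derivative numerically and shows that, unlike a linear map, these transformations change T.

```python
from math import exp, isclose, log
from typing import Callable, Dict, Tuple

Fn = Callable[[float], float]


def transformed_stat(p1: float, n1: int, p2: float, n2: int,
                     f: Fn, df: Fn) -> float:
    """Delta-method statistic on f(p) (assumed form, see lead-in)."""
    var = (df(p1) ** 2 * p1 * (1 - p1) / (2 * n1)
           + df(p2) ** 2 * p2 * (1 - p2) / (2 * n2))
    return (f(p1) - f(p2)) ** 2 / var


# Illustrative (f, f') pairs; NOT the slide's exact definitions.
TRANSFORMS: Dict[str, Tuple[Fn, Fn]] = {
    "identity": (lambda p: p, lambda p: 1.0),
    "exponential": (exp, exp),
    "reciprocal": (lambda p: 1 / p, lambda p: -1 / p ** 2),
    "entropy-like": (lambda p: -p * log(p), lambda p: -(log(p) + 1)),
}


def derivative_ok(f: Fn, df: Fn, p: float, h: float = 1e-6) -> bool:
    """Compare f' with a central finite difference at p."""
    return isclose((f(p + h) - f(p - h)) / (2 * h), df(p), rel_tol=1e-5)


# Hypothetical case/control allele frequencies, 100 individuals each.
results = {name: transformed_stat(0.30, 100, 0.20, 100, f, df)
           for name, (f, df) in TRANSFORMS.items()}
for name, t in results.items():
    print(f"{name:>12}: T = {t:.2f}")
```

Which transformation gains power depends on the allele frequencies; the talk's own power comparison is summarized on the next slide.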
Power (Case-Control): expected noncentrality parameters of the nonlinear test statistics, NA = NG = 100, PD = 0.5 [figure not recovered]
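A noncentrality parameter translates into power once a significance threshold is fixed. Assuming the statistics are approximately 1-df chi-square (an assumption; the slide gives only the noncentrality parameters), the sketch below shows how much power a stringent genome-wide threshold costs for the same noncentrality. The value 30.0 and the function name `power_1df` are hypothetical.

```python
from statistics import NormalDist


def power_1df(ncp: float, alpha: float) -> float:
    """Power of a 1-df chi-square test with noncentrality parameter ncp.

    A 1-df noncentral chi-square equals (Z + sqrt(ncp))^2 with Z ~ N(0, 1), so
    power = P(|Z + sqrt(ncp)| > z), where z is the two-sided critical value.
    """
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)
    mu = ncp ** 0.5
    return std.cdf(mu - z) + std.cdf(-mu - z)


# Same hypothetical noncentrality, nominal vs. genome-wide thresholds.
for alpha in (0.05, 1e-6, 1e-7):
    print(alpha, round(power_1df(30.0, alpha), 3))
```

With ncp = 0 the function returns alpha itself, a quick sanity check that the critical value is two-sided.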
Association Studies: association test of the MMP-2 gene with esophageal carcinoma. P values: entropy 3.2 × 10^-8; exponential 2.3 × 10^-7; polynomial 1.9 × 10^-7; sigmoid 2.0 × 10^-7; reciprocal 5.1 × 10^-6; χ² 7.0 × 10^-6. Yu C, et al. Cancer Res 2004, 64: 7622-7628
Challenge 2: Portability of tagging SNPs between populations
How are LD patterns compared between populations? • Step 1: Infer haplotype blocks for each population • Step 2: Compare the boundaries of LD blocks between populations [Figure: LD blocks around a target SNP in Pop A and Pop B.]
Factors Influencing Block Inferences • Sample size • Criterion and thresholds • Genotyping error • Gene flow • Search algorithm
[Figure: African (Af), Asian (As) and European (Eu) populations, with Daic (Thai) marked "?".]
Samples: European 40; Uighur 45; Hmong 46; Han 50; African American 48; Wa 45; Zhuang 44; Samoan 50
SNP Selection and Genotyping • Selected from dbSNP (build 117) • Most are double-hit SNPs • 26,112 SNPs on chromosome 21 • 1 SNP every 1.3 kb (Golden Path build 34) • Illumina BeadLab platform • 17 oligonucleotide primer sets • Three QA criteria: • Samples • SNPs: trios and duplicates • SNPs: Hardy-Weinberg expectation
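The Hardy-Weinberg QA criterion amounts to a goodness-of-fit test on genotype counts. The slide does not state which test or cut-off was used; the chi-square version below, the function name `hwe_chisq` and the counts are assumptions for illustration (for small counts an exact test is usually preferred).

```python
from statistics import NormalDist
from typing import Tuple


def hwe_chisq(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test for Hardy-Weinberg proportions (1 df).

    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip empty expected classes (monomorphic SNP) to avoid division by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 df, P(X > chi2) = 2 * (1 - Phi(sqrt(chi2))).
    p_value = 2 * (1 - NormalDist().cdf(chi2 ** 0.5))
    return chi2, p_value


print(hwe_chisq(25, 50, 25))  # (0.0, 1.0): exactly at HWE proportions
print(hwe_chisq(40, 20, 40))  # heterozygote deficit, chi2 = 36.0
```

A SNP whose p-value falls below the chosen cut-off would be flagged, since a strong departure from HWE in controls often signals genotyping error.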
[Figure: Zhuang, Han, Wa, Hmong, African American, Samoan, Uighur and European samples.]
Phylogeny of Human Populations [Figure: tree based on genetic distance (F_ST) among Hmong, Zhuang, Han, Wa, Uyghur, European and African samples.]
Measurement of LD Sharing • SNPs present in both Pop A and Pop B • SNPs with MAF ≥ 0.1 were included • A SNP is in LD with the target SNP if r² ≥ 0.1, 0.5 or 0.8 (three thresholds) • a = number of SNPs in LD in Pop A; b = number in LD in Pop B; c = number in LD in both • S_AB = c/a, S_BA = c/b [Figure: 200 kb region around a target SNP in Pop A and Pop B.]
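For one target SNP, the sharing measures follow directly from the three counts. In this sketch the inputs are assumed to map each SNP in the 200 kb window (already MAF-filtered) to its r² with the target SNP in each population; the function names `ld_partners` and `ld_sharing` and the r² values are hypothetical.

```python
from typing import Dict, Set, Tuple


def ld_partners(r2_with_target: Dict[str, float], threshold: float) -> Set[str]:
    """SNPs whose r^2 with the target SNP is at least `threshold`."""
    return {snp for snp, r2 in r2_with_target.items() if r2 >= threshold}


def ld_sharing(r2_a: Dict[str, float], r2_b: Dict[str, float],
               threshold: float) -> Tuple[int, int, int, float, float]:
    """Return (a, b, c, S_AB, S_BA) for one target SNP.

    r2_a / r2_b map SNP ids in the window to their r^2 with the target SNP
    in Pop A / Pop B.
    """
    shared = r2_a.keys() & r2_b.keys()  # only SNPs present in both populations
    in_a = ld_partners({s: r2_a[s] for s in shared}, threshold)
    in_b = ld_partners({s: r2_b[s] for s in shared}, threshold)
    a, b, c = len(in_a), len(in_b), len(in_a & in_b)
    s_ab = c / a if a else float("nan")  # undefined when nothing is in LD in A
    s_ba = c / b if b else float("nan")
    return a, b, c, s_ab, s_ba


r2_a = {"s1": 0.90, "s2": 0.60, "s3": 0.20, "s4": 0.85}
r2_b = {"s1": 0.95, "s2": 0.30, "s3": 0.10, "s5": 0.90}
print(ld_sharing(r2_a, r2_b, 0.5))  # (2, 1, 1, 0.5, 1.0)
```

Because a and b generally differ, S_AB and S_BA are asymmetric: here every LD pair in B is shared with A, but only half of A's are shared with B.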
S_AB ~ F_ST [figure not recovered] • In non-Africans, F_ST increases with time since divergence (t)
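The observation that F_ST grows with divergence time matches the textbook pure-drift expectation. As a sketch under assumptions the slide does not state (a population of constant effective size N_e drifting for t generations since the split, with no migration, mutation or selection):

```latex
F_{ST}(t) = 1 - \left(1 - \frac{1}{2N_e}\right)^{t}
          \approx 1 - e^{-t/(2N_e)}
          \approx \frac{t}{2N_e} \quad (t \ll 2N_e)
```

Under this model F_ST rises monotonically with t, and nearly linearly while t is small relative to 2N_e.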
Correlation of LD between populations = corr(a, b) [Figure: 200 kb region around a target SNP in Pop A and Pop B; a = # LD in A, b = # LD in B, c = # LD in both; S_AB = c/a, S_BA = c/b.]
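Here corr(a, b) is read as a correlation taken across target SNPs, each contributing its pair of counts (a, b); that reading and the use of Pearson's coefficient are assumptions. The counts below are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical counts for five target SNPs: SNPs in LD with each target.
a_counts = [3, 5, 2, 8, 6]  # Pop A
b_counts = [2, 5, 1, 7, 6]  # Pop B
print(round(pearson(a_counts, b_counts), 3))  # 0.979
```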
Correlation of LD Between Populations and Genetic Distance (FST)
Portability of tagging SNPs (R_AB): R_AB = (number of SNPs captured by tagSNPs) / (total number of SNPs) • Portability from Pop A to Pop B = R_AB, i.e. the fraction of SNPs in Pop B captured by tagSNPs selected in Pop A
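The portability ratio can be computed once "captured" is defined. This sketch assumes a SNP in Pop B is captured if it is itself a tag or has r² ≥ 0.8 (measured in Pop B) with at least one tagSNP chosen in Pop A; the threshold, the function name `portability` and the data are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def portability(tags_from_a: Iterable[str], snps_b: List[str],
                r2_b: Dict[Tuple[str, str], float],
                threshold: float = 0.8) -> float:
    """R_AB: fraction of Pop B SNPs captured by tagSNPs selected in Pop A."""
    tags = set(tags_from_a)

    def r2(x: str, y: str) -> float:
        # Pairwise r^2 in Pop B; missing pairs are treated as 0.
        return r2_b.get((x, y), r2_b.get((y, x), 0.0))

    captured = [s for s in snps_b
                if s in tags or any(r2(s, t) >= threshold for t in tags)]
    return len(captured) / len(snps_b)


snps_b = ["s1", "s2", "s3", "s4"]
r2_b = {("s1", "s2"): 0.90, ("s2", "s3"): 0.50, ("s3", "s4"): 0.95}
print(portability(["s2"], snps_b, r2_b))  # 0.5: s2 itself and s1 are captured
```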
R can be estimated using F_ST: 1 − R_AB ~ F_ST • F_ST can be estimated using a small number of SNPs • Conclusion: R can be approximately estimated by typing a small number of SNPs
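Estimating F_ST from a modest panel of SNPs needs only allele frequencies and sample sizes. The sketch uses Hudson's estimator combined across SNPs as a ratio of averages; this is a swapped-in standard estimator, not necessarily the one used in the talk, and the function name `hudson_fst` and the frequencies are hypothetical.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float],
               n1: int, n2: int) -> float:
    """Hudson's F_ST, combined across SNPs as a ratio of averages.

    p1, p2: allele frequency of each SNP in the two populations.
    n1, n2: numbers of sampled chromosomes (2 x individuals for autosomes).
    """
    num = 0.0
    den = 0.0
    for a, b in zip(p1, p2):
        # Squared frequency difference, corrected for sampling noise.
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den


# Hypothetical frequencies at a handful of SNPs; 50 individuals per population.
freq_a = [0.10, 0.35, 0.60, 0.45, 0.80]
freq_b = [0.15, 0.30, 0.50, 0.55, 0.70]
print(round(hudson_fst(freq_a, freq_b, 100, 100), 4))
```

The estimate could then be plugged into an empirically fitted relation between 1 − R_AB and F_ST, as the slide suggests; the slide gives no fitted coefficients.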
[Figure: R_AB and F_ST versus divergence time t.]
Conclusions • Substantial LD sharing between populations: ancestral LD • tagSNPs are generally portable between populations, at least within Asia • The portability of tagSNPs from one population to another can be estimated empirically using a small set of SNPs
Challenge 3: Population stratification
Population Stratification • 209 languages belonging to 6 linguistic families • Consistently observed south-north differentiation • Stratification reduces the power of association studies and causes false positives • Different loci show different levels of differentiation: is there an adequate adjustment?
Individual Tree [Figure: tree of individuals based on 20,288 SNPs on chromosome 21.]
Cluster Decomposition of Chinese Populations
Geographic Genetic Clines Based on Principal Components • Y chromosomes: 143 populations • mtDNA: 91 populations • CODIS STRs: 79 populations • HLA-A: 107 populations
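Clines like these come from projecting each population onto the leading principal component of a population-by-marker frequency matrix. The sketch computes PC1 scores with plain power iteration on the covariance matrix; the function name `first_pc_scores`, the fixed starting vector and the toy frequencies are assumptions, and a real analysis would use a linear-algebra library.

```python
from math import sqrt
from typing import List


def first_pc_scores(data: List[List[float]], iters: int = 500) -> List[float]:
    """Scores of each row (population) on the first principal component.

    data: rows = populations, columns = marker or haplogroup frequencies.
    """
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    x = [[row[j] - means[j] for j in range(k)] for row in data]  # centered
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1)
            for b in range(k)] for a in range(k)]
    # Power iteration from a fixed, asymmetric start (assumed not to be
    # orthogonal to PC1); converges to the top eigenvector of cov.
    v = [float(j + 1) for j in range(k)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = sqrt(sum(c * c for c in w))
        if norm == 0:
            return [0.0] * n  # no variation at all
        v = [c / norm for c in w]
    return [sum(x[i][j] * v[j] for j in range(k)) for i in range(n)]


# Hypothetical two-haplogroup frequencies for four populations ordered by latitude.
freqs = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
print([round(s, 3) for s in first_pc_scores(freqs)])
```

The sign of a principal component is arbitrary; a cline shows up as PC1 scores that change monotonically with geography, which can then be mapped or correlated with latitude.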
Distributions of mtDNA Haplogroups
Distributions of Y Haplogroups
[Figure panels: all haplogroups; major haplogroups; all haplogroups.]
Uyghurs
Population Stratification • Different loci show different levels of differentiation • Admixture does exist, at least in some of the populations • Adjusting for population stratification using the average differentiation is not adequate
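A familiar example of adjustment by average differentiation is genomic control, which estimates one inflation factor λ from the median test statistic and divides every statistic by it. It is named here only to illustrate the slide's point, not as a method the talk used; the function name `genomic_control` and the statistics are hypothetical.

```python
from statistics import NormalDist, median
from typing import List, Tuple


def genomic_control(chi2_stats: List[float]) -> Tuple[float, List[float]]:
    """Return (lambda, adjusted statistics) for 1-df chi-square statistics."""
    # Median of a 1-df chi-square is (Phi^{-1}(0.75))^2, about 0.455.
    expected_median = NormalDist().inv_cdf(0.75) ** 2
    # Common convention: never deflate (lambda is at least 1).
    lam = max(1.0, median(chi2_stats) / expected_median)
    # The SAME factor is applied to every locus.
    return lam, [s / lam for s in chi2_stats]


lam, adjusted = genomic_control([0.5, 1.0, 2.0, 0.2, 0.9])
print(round(lam, 3), [round(s, 3) for s in adjusted])
```

Because one λ rescales all loci equally, a locus that is more differentiated than average stays inflated after adjustment, while a less differentiated one is over-corrected.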
Challenge 4: Mapping the mutation
Perfect Phylogeny Approach • Assumes no recombination and no recurrent mutation • No loops in the network • PP sets are not necessarily contiguous • Objective: group SNPs into perfect phylogeny (PP) sets [Figure: PP(A), PP(B), PP(C).]
Inference of Phylogeny: five sequences (1 to 5) typed at four sites, each site splitting them into two groups • Site 1: (1, 2, 3) | (4, 5) • Site 2: (2, 3) | (1, 4, 5) • Site 3: (1, 2, 3, 5) | (4) • Site 4: (2) | (1, 3, 4, 5) [Figure: the inferred tree.]
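Under no recombination and no recurrent mutation, two biallelic sites fit one tree only if they pass the four-gamete test: at most three of the four allele combinations appear. The sketch encodes the four-site example (allele 1 = the first group of each split) and confirms it forms a perfect phylogeny. The greedy grouping into PP sets is a hypothetical illustration, not one of the algorithms compared in the talk.

```python
from itertools import combinations
from typing import List

# Sequences 1-5 (rows) at sites 1-4 (columns); 1 = first group of each split.
HAPLOTYPES = [
    [1, 0, 1, 0],  # sequence 1
    [1, 1, 1, 1],  # sequence 2
    [1, 1, 1, 0],  # sequence 3
    [0, 0, 0, 0],  # sequence 4
    [0, 0, 1, 0],  # sequence 5
]


def compatible(haps: List[List[int]], i: int, j: int) -> bool:
    """Four-gamete test: sites i and j fit one tree iff < 4 combinations occur."""
    return len({(h[i], h[j]) for h in haps}) < 4


def is_perfect_phylogeny(haps: List[List[int]]) -> bool:
    """True if every pair of sites is pairwise compatible."""
    return all(compatible(haps, i, j)
               for i, j in combinations(range(len(haps[0])), 2))


def greedy_pp_sets(haps: List[List[int]]) -> List[List[int]]:
    """Hypothetical greedy grouping: put each site (0-based) into the first
    set whose members are all compatible with it, else start a new set."""
    sets: List[List[int]] = []
    for s in range(len(haps[0])):
        for group in sets:
            if all(compatible(haps, s, t) for t in group):
                group.append(s)
                break
        else:
            sets.append([s])
    return sets


print(is_perfect_phylogeny(HAPLOTYPES))  # True
print(greedy_pp_sets(HAPLOTYPES))        # [[0, 1, 2, 3]]
# A recombinant-looking extra sequence breaks compatibility of sites 1 and 2.
print(greedy_pp_sets(HAPLOTYPES + [[0, 1, 0, 0]]))  # [[0, 2, 3], [1]]
```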
Comparison of Different Algorithms
Identification of Disease Mutation • For each PP set, a stepwise search localizes the branch (edge) most likely to carry the mutation • The best PP set is chosen by likelihood, with adjustment for degrees of freedom [Figure: PP(A), PP(B), PP(C).]
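Locating the most likely edge can be illustrated by scoring every edge of a tree: an edge splits the sequences into those below it (putative carriers) and the rest, and a 2×2 likelihood-ratio (G) test compares cases and controls across that split. This is a simplified stand-in: it scores all edges exhaustively rather than stepwise and omits the degrees-of-freedom adjustment. The edges come from the four-site "Inference of Phylogeny" example, while the case/control counts and the names `g_statistic` and `best_edge` are hypothetical.

```python
from math import log
from typing import Dict, FrozenSet, Tuple

# Edge (labelled by its site) -> sequences below that edge.
EDGES: Dict[int, FrozenSet[int]] = {
    1: frozenset({1, 2, 3}),
    2: frozenset({2, 3}),
    3: frozenset({1, 2, 3, 5}),
    4: frozenset({2}),
}


def g_statistic(table: Tuple[Tuple[int, int], Tuple[int, int]]) -> float:
    """Likelihood-ratio (G) statistic for a 2x2 table, 1 df."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    g = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            if obs > 0:  # 0 * log(0) is taken as 0
                g += obs * log(obs / (rows[i] * cols[j] / n))
    return 2 * g


def best_edge(cases: Dict[int, int], controls: Dict[int, int],
              edges: Dict[int, FrozenSet[int]]) -> Tuple[int, Dict[int, float]]:
    """Score each edge by G; return the best edge and all scores."""
    total_cases, total_controls = sum(cases.values()), sum(controls.values())
    scores: Dict[int, float] = {}
    for site, below in edges.items():
        in_cases = sum(cases[h] for h in below)
        in_controls = sum(controls[h] for h in below)
        scores[site] = g_statistic(((in_cases, total_cases - in_cases),
                                    (in_controls, total_controls - in_controls)))
    return max(scores, key=scores.get), scores


# Hypothetical sequence counts among cases and controls.
cases = {1: 5, 2: 30, 3: 5, 4: 5, 5: 5}
controls = {1: 10, 2: 5, 3: 10, 4: 10, 5: 15}
edge, scores = best_edge(cases, controls, EDGES)
print(edge)  # 4: sequence 2 is enriched in cases
```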
Challenge 5: Exploring gene-gene interaction
A Study of CAD • Coronary atherosclerosis in Chinese populations • 123 candidate genes belonging to several pathways, including antioxidant, inflammation and coagulation • 1,518 tagSNPs typed • 916 samples (492 cases and 424 controls)
[Figure: gene-gene interaction network. Genes: PON2, GPX3, CD36, MMP8, PON1, SOD2, ACE, PON3, DSCR1, ITGA2, PDGFC, TXN, TGFB3, MSR1, ITGB1, PDGFB, SELL, CCR2, ITGA6, NFKB1, VEGF, NPR3, LAMA4, IL1B, MMP9, EDN1, SELE, GCLM, GSR, HMOX1, GSS, NOS3. Labelled groups: anti-oxidation pathway and inflammatory pathway. Edge types: within-pathway interaction and between-pathway interaction.]