Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. The Wellcome Trust Case Control Consortium, Nature , 2007 Presented by Group 4: Jessica Larson, Irene Shui, and Lucia Sobrin. Outline. Introduction Methods Case-control structure
Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls The Wellcome Trust Case Control Consortium, Nature, 2007 Presented by Group 4: Jessica Larson, Irene Shui, and Lucia Sobrin
Outline • Introduction • Methods • Case-control structure • Population stratification • Data analysis • Results • Diabetes (Type II) • Crohn’s disease • Rheumatoid arthritis • Coronary artery disease • Discussion and Conclusion
Introduction • Several common (complex) diseases with evidence for heritability but incomplete knowledge of causal genes • Genome-wide association studies (GWAS) would help ‘unlock’ the genetic basis for common diseases • Requires large sample sizes (for sufficient power) • HapMap resource • This study validates the GWA method
Introduction, continued • WTCCC combines 50 research groups throughout the UK • Large selection of cases and controls • Seven common diseases: • Type II diabetes (T2D) • Crohn’s disease (CD) • Coronary artery disease (CAD) • Rheumatoid arthritis (RA) • Type I diabetes (T1D) • Hypertension (HT) • Bipolar disorder (BD) • Multiple diseases studied so that WTCCC could look at differences between the diseases themselves (not just between cases and controls for each disease)
Cases and Controls • 2,000 cases for each disease • 3,000 shared controls • 1,500 from 1958 British Birth Cohort • 1,500 from UK Blood Services
Two Control Groups • Purpose: to assess possible bias in ascertaining control samples
Shared Controls • Potential issues • Misclassification bias • Inflation of the type I error rate from failure to match on socio-demographic variables • This study makes a compelling case for the suitability and efficiency of this design in Britain
Population Stratification • Only included self-identified white Europeans • Further excluded 153 individuals with evidence of recent non-European ancestry • Heterogeneity still possible (e.g., from waves of immigration) • Analyzed allele frequency differences across 12 geographic regions • 13 genomic regions showed strong geographic variation (NW/SE axis; London set apart) • Geographic correlation not apparent in the 7 diseases studied • Principal components analysis • Conclusion: population stratification is not much of a problem once individuals with non-European ancestry are excluded • Adjusting for principal components and stratifying by geographic region made little difference to overdispersion; p-values with and without structure correction were similar
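One common way to put a number on the overdispersion mentioned above is the genomic-control inflation factor λ: the median of the observed 1 df test statistics divided by the median expected under the null. The sketch below is only an illustration, not the consortium's code; the function name and the simulated inputs are assumptions made for the example.

```python
import random
from statistics import NormalDist, median

# Median of a chi-square (1 df) variable: the squared 75th percentile of N(0, 1).
NULL_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # about 0.455

def inflation_factor(chi2_stats):
    """Genomic-control lambda: observed median statistic / null median."""
    return median(chi2_stats) / NULL_MEDIAN

# Hypothetical data: 20,000 null SNPs, simulated as squared standard normals.
rng = random.Random(0)
null_stats = [rng.gauss(0, 1) ** 2 for _ in range(20_000)]
lam = inflation_factor(null_stats)

# Hypothetical stratified data: every statistic inflated by 10%.
lam_inflated = inflation_factor([1.1 * s for s in null_stats])

print(f"lambda (null) = {lam:.3f}, lambda (inflated) = {lam_inflated:.3f}")
```

A λ close to 1 indicates little overdispersion; values clearly above 1 suggest stratification or other systematic inflation.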
Figure 2, Wellcome Trust Case Control Consortium, 2007: 11 df test for differences in allele frequency between geographic regions • Labeled loci: NADSYN1 (11q13; possible role in prevention of pellagra) • TLR1 (4p14; toll-like receptor 1, possible role in biology of TB and leprosy) • LCT (lactase; lactose digestion) • HLA (major histocompatibility complex) • Previously implicated in Europeans: LCT, 4p14, HLA
SNP Genotyping and Data Analysis • Affymetrix GeneChip 500K arrays • Genotype-calling algorithm: CHIAMO • For polymorphic SNPs • Trend tests (1 df) • General genotype tests (2 df) between cases and controls • Sex-differentiation test • For loci that may affect more than one disease, combine the cases of related diseases and compare them with the controls • CAD+HT+T2D (metabolic overlap) • RA+T1D (known to share common loci) • CD+RA+T1D (autoimmune diseases)
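The two single-SNP tests listed above can be sketched from a 2 x 3 table of genotype counts. This is a minimal illustration using the textbook Cochran-Armitage trend statistic with additive weights (0, 1, 2) and a Pearson chi-square genotype test; the function names and the example counts are hypothetical, and the code assumes a polymorphic SNP (no empty genotype column).

```python
import math

def trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test (1 df) on genotype counts (AA, AB, BB)."""
    R, S = sum(cases), sum(controls)
    N = R + S
    col = [r + s for r, s in zip(cases, controls)]  # genotype totals
    T = sum(w * (r * S - s * R) for w, r, s in zip(weights, cases, controls))
    var = sum(w * w * n * (N - n) for w, n in zip(weights, col))
    var -= 2 * sum(weights[i] * weights[j] * col[i] * col[j]
                   for i in range(3) for j in range(i + 1, 3))
    var *= R * S / N
    chi2 = T * T / var
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return chi2, p

def genotype_test(cases, controls):
    """General genotype test (Pearson chi-square, 2 df) on the same table."""
    R, S = sum(cases), sum(controls)
    N = R + S
    chi2 = 0.0
    for r, s in zip(cases, controls):
        n = r + s
        for obs, row_total in ((r, R), (s, S)):
            expected = row_total * n / N
            chi2 += (obs - expected) ** 2 / expected
    p = math.exp(-chi2 / 2)  # chi-square survival function, 2 df
    return chi2, p

# Hypothetical genotype counts (AA, AB, BB), not taken from the study.
cases = (200, 500, 300)
controls = (300, 500, 200)
trend_chi2, trend_p = trend_test(cases, controls)
geno_chi2, geno_p = genotype_test(cases, controls)
print(f"trend: chi2={trend_chi2:.1f}, p={trend_p:.2e}")
print(f"genotype: chi2={geno_chi2:.1f}, p={geno_p:.2e}")
```

When the effect is roughly additive, as in these made-up counts, the 1 df trend test gives the smaller p-value; the 2 df test is more flexible for dominant or recessive patterns.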
Data Analysis • Significance thresholds were not chosen as a direct correction for multiple testing (a ‘genome-wide significance level’), but still aim for a low FDR • Strong: regions with at least one SNP with P-val < 5 x 10^-7 (Table 3) • Single disease: 21 signals • Sex-differentiation: RA • Combined cases: RA+T1D • 25 total • 12 of which were previously described • The rest have been confirmed, except one • Moderate: 5 x 10^-7 < P-val < 1 x 10^-5 (Table 4) • Nominal: 1 x 10^-5 < P-val < 1 x 10^-4 (Supplementary Table 7)
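The three evidence tiers above amount to a simple threshold rule on a region's smallest p-value. A minimal sketch follows; the function name and the label for p-values above the nominal range are assumptions made for illustration.

```python
def evidence_tier(p_value):
    """Map a region's smallest p-value to the tiers used on this slide."""
    if p_value < 5e-7:
        return "strong"      # Table 3
    if p_value < 1e-5:
        return "moderate"    # Table 4
    if p_value < 1e-4:
        return "nominal"     # Supplementary Table 7
    return "not reported"    # hypothetical label for everything else

# p-values quoted elsewhere in this presentation
for label, p in [("CAD 9p21.3", 1.8e-14), ("RA sex effect", 3.9e-7), ("CAD APOE", 1.7e-1)]:
    print(label, evidence_tier(p))
```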
Notes on interpretation of this data • Replication needed • Failure to detect an association does not mean that a given gene is unassociated with disease • Help define regions of interest, cannot clearly identify causal genes
Overall Results Figure 4, Wellcome Trust Case Control Consortium, 2007
Type 2 Diabetes • Detected all three previously widely replicated associations • TCF7L2 • The SNP with the strongest etiological claims is not on the Affy chip, but imputation analysis confirms it is the SNP with the strongest association • PPARG and KCNJ11 (p ~ 0.001 for both) • Genuine disease susceptibility genes can generate signals in a GWAS that would not attract immediate attention
Type 2 Diabetes • Compared to French GWAS Findings • Confirms finding on Chromosome 10 • Three other findings cannot be replicated • One SNP is poorly covered by Affy chip and extensive recombination in region limits data imputation • Two other SNPs cannot be confirmed by either genotyped or imputed SNPs from the WTCCC
Crohn’s Disease • Common form of chronic inflammatory bowel disease • Pathogenesis poorly understood • Dysregulated immune response to intestinal bacteria, and possibly defects in mucosal barrier function or bacterial clearance • Genetic predisposition is strong (sibling relative risk λs of 17 to 35; twin studies: 50% concordance in monozygotic vs 10% in dizygotic twins)
Crohn’s Disease GWAS Results • All 6 previously defined susceptibility loci replicated • Four new strong association signals (p-value < 5 x 10^-7) • Successfully replicated in other studies • Eight markers with moderate evidence of association (5 x 10^-7 < p-value < 1 x 10^-5) • Several with biological candidacy • Majority of associations are modest (RR < 2) • Functional mechanism: autophagy • Newly identified susceptibility gene (IRGM) is proposed to control the spread of intracellular pathogens by autophagy (ATG16L1 is also involved in autophagy) • A functional link between autophagy and Crohn’s disease is supported by molecular genetic studies
Crohn’s Disease Strong Associations (Figure 4, Wellcome Trust Case Control Consortium, 2007) • Loci labeled on the plot: 5q31, CARD15, ATG16L1, IL23R, 5p13.1, IRGM, 10q21, BSN/MST1, NKX2-3, PTPN2 • ATG16L1: WTCCC SNP in LD with SNP T300A • Color key: red = replicated, previously defined markers/possible genes; green = novel markers/possible genes
Coronary artery disease (CAD) • Plaque buildup in arteries • Environmental (diet) and genetic factors • Previously associated genes not replicated here (APOE, p-val: 1.7 x 10^-1) • Found a new region of interest, 9p21.3 (p-val: 1.8 x 10^-14), and several moderate associations
Rheumatoid arthritis (RA) • Chronic inflammatory disease, destruction of joints, severe disability • Again, environmental and genetic factors • Previously associated genes replicated here (HLA-DRB1, p-vals: 10^-27; PTPN22, p-vals: 10^-25) • Found two new regions of interest and several moderate associations • Most interesting is the sex effect (p-val: 3.9 x 10^-7), additive effect in females only
Common Loci for Autoimmune Diseases • CD25 region • Encodes IL-2 receptor • Association with both RA and T1D (p ~ 10^-8 and p ~ 10^-6, respectively) • PTPN2 • Encodes a key negative regulator of inflammatory responses • Strong association with CD and T1D (p ~ 10^-8) and weaker but consistent association with RA (p ~ 10^-2)
Discussion/Conclusions • GWAS yielded multiple association findings for multiple diseases, many of them novel • Large study, but power is still limited for OR < 1.2 • Extensive quality control • Used both the linear trend (1 df) test and the 2 df genotypic test • Replication is key • “Winner’s curse”: ORs tend to be overestimated for newly discovered loci • Several studies have replicated the findings; more work needs to be done • Incomplete coverage of Affy chip for some SNPs (T1D INS) • Functional studies needed to make inferences about the molecular and physiological mechanisms involved and the causal variants • No real gene-gene/gene-environment interactions tested • Findings to date explain only a small proportion of the genetic variation in these diseases • Information is publicly available! • http://www.wtccc.org.uk/info/access_to_data_samples.shtml
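The winner's curse noted above can be illustrated with a small simulation: when power is limited, the studies that happen to cross the significance threshold are the ones whose estimates overshoot the true effect. The sketch below is a toy model rather than an analysis of the WTCCC data; the true OR, the standard error of the log OR, and the number of simulated studies are all assumed values.

```python
import math
import random
from statistics import NormalDist, mean

TRUE_OR = 1.2   # assumed true odds ratio
SE = 0.045      # assumed standard error of the estimated log OR
ALPHA = 5e-7    # "strong" threshold used in this presentation

def simulate_winners_curse(n_studies=20_000, seed=1):
    """Return (power, geometric-mean OR among significant studies)."""
    rng = random.Random(seed)
    beta = math.log(TRUE_OR)
    z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided critical value
    estimates = [rng.gauss(beta, SE) for _ in range(n_studies)]
    # Keep only estimates significant in the risk-increasing direction.
    winners = [b for b in estimates if b / SE > z_crit]
    power = len(winners) / n_studies
    return power, math.exp(mean(winners))

power, or_among_winners = simulate_winners_curse()
print(f"power = {power:.2f}, mean OR among significant results = {or_among_winners:.2f}")
```

Every significant estimate must exceed the critical value times the standard error, which under these assumptions is already larger than the true log OR, so the average among "winners" is necessarily inflated.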