HapMap data for genome-wide disease association studies
~ cases from SNP Research Center, RIKEN ~
Toshihiro Tanaka
SNP Research Center, RIKEN
Millennium SNP projects in Japan (April 2000 – March 2005)
I. Infrastructure
 a) Collection of gene-based SNPs: 190,000 variations identified in two years
 b) High-throughput genotyping system: low cost, semi-automated, using the Invader assay
II. Application
 Identification of genes of medical importance:
 disease-associated genes
 genes defining drug sensitivity
Two-step genotyping strategy for the genome-wide approach
1. Genotype a small number of samples (100–200) for a large set of SNPs (100,000–250,000).
2. Set a p-value threshold for taking further steps (0.01).
3. Examine the loci that pass the threshold further by expanding the sample size.
Also: a candidate gene approach
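The first-stage filter in the strategy above can be sketched as an allelic chi-square test per SNP followed by the 0.01 cut-off. This is a minimal illustration, assuming a 2×2 table of allele counts (cases vs. controls) and a Pearson test without continuity correction; the function names and the counts in `stage1` are hypothetical, not data or code from the study.

```python
import math


def allelic_chi2_p(case_minor, case_major, ctrl_minor, ctrl_major):
    """P value of a 1-df chi-square test on a 2x2 table of allele counts.

    No continuity correction is applied.
    """
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table (an empty row or column): no evidence
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def first_stage_screen(snp_counts, threshold=0.01):
    """Return (snp_id, p) for SNPs below the threshold, smallest p first."""
    passed = []
    for snp_id, counts in snp_counts.items():
        p = allelic_chi2_p(*counts)
        if p < threshold:
            passed.append((snp_id, p))
    return sorted(passed, key=lambda item: item[1])


# Hypothetical allele counts for 200 cases and 200 controls (400 alleles each):
# (case minor, case major, control minor, control major).
stage1 = {
    "snpA": (140, 260, 100, 300),
    "snpB": (120, 280, 118, 282),
}
for snp_id, p in first_stage_screen(stage1):
    print(snp_id, f"{p:.4f}")  # these loci go on to the expanded second stage
```

Only the SNPs returned by `first_stage_screen` would be genotyped in the larger second-stage sample; the rest are dropped, which is what keeps the genome-wide scan affordable.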
SNP Research Center, RIKEN
Laboratory for Cardiovascular Diseases: lymphotoxin-α (Nature Genetics, 2002); galectin-2 (Nature, 2004)
Laboratory for Rheumatic Diseases: PADI4 (Nature Genetics, 2002); SLC22A4 (Nature Genetics, 2002); FCRL3 (Nature Genetics, 2005)
Laboratory for Bone & Joint Diseases: asporin (Nature Genetics, 2005); CILP (Nature Genetics, 2005); CALM1 (Hum Mol Genet, 2005)
Laboratory for Diabetic Nephropathy: SLC12A3 (Diabetes, 2003); WNT5B (Am J Hum Genet, 2004)
Laboratory for Allergic Diseases: CLCA1 (Genes and Immunity, 2004); DAP3 (J Hum Genet, 2004); IFNA (Hum Genet, 2004); ADAM33 (Clin Exp Allergy, 2004)
Purpose
To assess the practical usefulness of HapMap data for disease association studies.
Question: Could we have identified the disease-associated loci/SNPs if we had used SNP data and software from the HapMap home page to select the SNPs to be genotyped in the first-stage screening?
Question, in other words…
Imagine a researcher who wishes to identify disease-associated loci by a GWA study, without knowing of any previous association reports. He/she decides to select the SNPs to be genotyped using HapMap data and the Haploview software, examines 500 patients and 500 controls, and sets the p-value threshold at 0.01. Could he/she detect the loci that we previously reported (even when the associated SNPs were hidden from the HapMap data)?
Study protocol
1. Obtain genotype data around the disease-associated loci from the HapMap home page.
2. Select tag SNPs using the Haploview software (block-by-block basis, and Tagger).
 * All the disease-associated SNPs were in the database, but they were treated as untyped (hidden SNPs).
 * Default Haploview settings were used in most conditions.
3. Genotype the selected tag SNPs and perform association analysis in ~500 case and ~500 control samples.
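Step 2 of the protocol (tag SNP selection at an r² threshold) can be illustrated with a simplified greedy pairwise selection in the spirit of Tagger's pairwise mode. This is a sketch, not Haploview's implementation: the function `greedy_tags`, the tie-breaking rule, and the r² values are hypothetical. Leaving the disease-associated SNP out of `snps` mimics treating it as hidden.

```python
def greedy_tags(snps, r2, threshold=0.8):
    """Greedy pairwise tag-SNP selection (a simplified, Tagger-like sketch).

    snps: list of candidate SNP ids (the hidden disease SNP is simply left out).
    r2: dict mapping frozenset({x, y}) -> pairwise r^2; missing pairs count as 0.
    threshold: r^2 needed for a tag to capture a SNP, in (0, 1].
    Returns tags such that every SNP in snps has r^2 >= threshold with a tag.
    """

    def ld(x, y):
        return 1.0 if x == y else r2.get(frozenset((x, y)), 0.0)

    uncaptured = set(snps)
    tags = []
    while uncaptured:
        # Pick the SNP capturing the most still-uncaptured SNPs
        # (max() keeps the first one in input order on ties).
        best = max(snps, key=lambda s: sum(ld(s, t) >= threshold for t in uncaptured))
        tags.append(best)
        uncaptured -= {t for t in uncaptured if ld(best, t) >= threshold}
    return tags


# Hypothetical pairwise r^2 values for four SNPs.
r2 = {
    frozenset(("s1", "s2")): 0.90,
    frozenset(("s2", "s3")): 0.85,
    frozenset(("s3", "s4")): 0.50,
}
snps = ["s1", "s2", "s3", "s4"]
print(greedy_tags(snps, r2, threshold=0.8))
print(greedy_tags(snps, r2, threshold=1.0))
```

Raising the threshold from 0.8 to 1 turns two tags into four in this toy case, which is the same direction as the LGALS2 example, where Tagger at r² = 1 picked more SNPs than at r² > 0.8.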
LGALS2 locus (candidate gene approach)
Association result: p = 4.5×10⁻⁶, OR = 1.23 (n = ~2,000)
Tagged SNPs:
 block-by-block basis: SNPs 8, 9, 10
 Tagger (r² > 0.8): SNPs 9, 10
 Tagger (r² = 1): SNPs 9, 10, 11, 12
Association analyses (comparison of allele frequency)
SNP     P        OR     r²      D'
SNP8    0.0023   1.35   0.832   0.956
SNP9    0.015    1.25   0.587   0.978
SNP10   0.0038   1.32   0.867   0.978
SNP11   0.0092   1.29   0.863   0.931
SNP12   0.0020   1.32   0.616   0.973
SNP14 (disease-associated SNP, MAF = 35.0%): P = 0.00036, OR = 1.41
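To see whether each Haploview selection mode would have carried the LGALS2 locus past the 0.01 first-stage cut-off, the reported tag SNP p values can simply be replayed against the tag sets listed for this locus. This only restates the numbers above in code; `passing_tags` is a hypothetical helper, and no re-analysis is implied.

```python
# Reported p values of the genotyped tag SNPs at the LGALS2 locus.
TAG_P = {8: 0.0023, 9: 0.015, 10: 0.0038, 11: 0.0092, 12: 0.0020}

# Tag SNPs chosen by each Haploview selection mode at this locus.
SELECTIONS = {
    "block-by-block": [8, 9, 10],
    "Tagger (r2>0.8)": [9, 10],
    "Tagger (r2=1)": [9, 10, 11, 12],
}

THRESHOLD = 0.01  # first-stage cut-off used in the study


def passing_tags(snps, threshold=THRESHOLD):
    """Tag SNPs in this selection whose p value clears the first-stage cut-off."""
    return [s for s in snps if TAG_P[s] < threshold]


for method, snps in SELECTIONS.items():
    hits = passing_tags(snps)
    print(f"{method}: SNPs {hits} pass; locus carried forward: {bool(hits)}")
```

Every mode includes at least one passing tag (SNP10 is in all three), whereas SNP9 on its own (p = 0.015) would not have cleared the threshold.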
LTA locus (HLA region, genome-wide approach)
Association result: p = 1.3×10⁻⁴ (n = ~1,000)
Association result: p = 3.3×10⁻⁶ (n = ~1,000)
r² = 0.866, D' = 1
Association analysis
SNP18 (disease-associated SNP, MAF = 34.1%) and SNP9 (MAF = 32.5%)
P = 0.0015, OR = 1.35
P = 0.00033, OR = 1.40
r² = 0.90, D' = 0.99
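The r² and D' values quoted for tag/disease SNP pairs follow the standard definitions D = p_AB − p_A·p_B, D' = |D| / D_max, and r² = D² / (p_A(1 − p_A)p_B(1 − p_B)). The sketch below computes both from a haplotype frequency. The example haplotype frequency is hypothetical: it assumes, for illustration only, that every minor allele of the second SNP lies on a haplotype carrying the first SNP's minor allele, using the two MAFs above.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic SNPs.

    p_ab: frequency of haplotypes carrying allele a at SNP 1 and allele b at SNP 2.
    p_a, p_b: frequencies of alleles a and b (each strictly between 0 and 1).
    """
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Hypothetical case: minor-allele frequencies 0.341 and 0.325, with every
# minor allele of the second SNP on a haplotype carrying the first SNP's minor allele.
d_prime, r2 = ld_stats(p_ab=0.325, p_a=0.341, p_b=0.325)
print(f"D' = {d_prime:.2f}, r^2 = {r2:.3f}")
```

Even with D' = 1 in this toy case, r² stays near 0.93 because the two allele frequencies differ; r² rather than D' governs how well a tag stands in for a hidden SNP.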
Newly identified locus for one common disease (candidate gene approach)
Association result: p = 3.3×10⁻⁷ (n = ~3,000)
[Figure: 100 kb region] No haplotype block; no related SNP
Sample scale and cut-off p value
[Figure: p value plotted against number of samples, for minor allele frequency = 0.35 with OR = 1.41 and OR = 1.35]
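The slide relates sample size to the p value reached for MAF = 0.35 with OR = 1.41 or 1.35. One way to approximate such curves is to evaluate the allelic chi-square test at exactly the expected allele counts. This is a sketch under stated assumptions, not necessarily how the slide's curves were computed: it assumes equal numbers of cases and controls, takes 0.35 as the control risk-allele frequency, and uses the common approximation that a tag SNP's chi-square is about r² times that of the causal SNP. The function name `expected_allelic_p` is hypothetical.

```python
import math


def expected_allelic_p(n_per_group, p_ctrl, odds_ratio, r2_with_causal=1.0):
    """Chi-square and p of the allelic test at exactly the expected allele counts.

    n_per_group: individuals in each of the case and control groups (2 alleles each).
    p_ctrl: risk-allele frequency in controls (assumed here to be the slide's MAF).
    odds_ratio: allelic odds ratio of the risk allele.
    r2_with_causal: optional r^2 between a tag SNP and the causal SNP; the tag's
        chi-square is approximated as r^2 times the causal SNP's chi-square.
    """
    odds_case = odds_ratio * p_ctrl / (1 - p_ctrl)
    p_case = odds_case / (1 + odds_case)
    alleles = 2 * n_per_group  # alleles per group
    p_bar = (p_case + p_ctrl) / 2
    # Pearson chi-square of a 2x2 allele table with equal group sizes.
    chi2 = alleles * (p_case - p_ctrl) ** 2 / (2 * p_bar * (1 - p_bar))
    chi2 *= r2_with_causal
    return chi2, math.erfc(math.sqrt(chi2 / 2))


for n in (100, 200, 500, 1000):
    row = [expected_allelic_p(n, 0.35, or_)[1] for or_ in (1.41, 1.35)]
    print(n, " ".join(f"{p:.2e}" for p in row))
```

Under these assumptions, 500 cases and 500 controls put the OR = 1.41 case well below 0.01, while smaller ORs, smaller samples, or a tag with lower r² push the expected p value back toward the cut-off; observed p values will scatter around these figures.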
Summary
All the disease-associated SNPs were in the database, owing partly to good luck and partly to the good quality of the database.
If they are treated as untyped (hidden SNPs), we lose some of the disease-associated loci, depending on their haplotype structure.
To detect them, a sufficient number of samples must be examined and an appropriate p-value threshold set, which naturally has to take the cost of the study into account.