Genome-wide association studies

C2BAT: Using the same data set for screening and testing. A testing strategy for genome-wide association studies in case/control design. Matt McQueen, Jessica Su, Nan Laird and Christoph Lange, Harvard School of Public Health.

Presentation Transcript


  1. C2BAT: Using the same data set for screening and testing. A testing strategy for genome-wide association studies in case/control design. Matt McQueen, Jessica Su, Nan Laird and Christoph Lange, Harvard School of Public Health

  2. Genome-wide association studies
     • Limitation of linkage analysis and the potential of association analysis => genome-wide association studies (Risch & Merikangas 1997)
     • More than 100,000 SNPs and phenotypes are tested for association.
     • Statistical roadblock: a severe multiple-testing problem!
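To put the multiple-testing problem in numbers: a Bonferroni correction, the standard adjustment this talk compares against, divides the significance level by the number of tests. The sketch assumes a family-wise level of 0.05; N = 200 is the upper end of the 10-200 markers that the strategy on the following slides carries forward.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests

genome_wide = bonferroni_threshold(0.05, 100_000)  # every SNP on a 100K scan
after_screening = bonferroni_threshold(0.05, 200)  # only N = 200 selected SNPs

print(f"{genome_wide:.1e}")      # 5.0e-07
print(f"{after_screening:.1e}")  # 2.5e-04
```

Testing only the N screened SNPs relaxes the per-test threshold by a factor of 100,000 / N, which is where the power gain of a screening step comes from.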

  3. “Using the same data set for screening and testing”
     Screening technique S, testing statistic T
     • Testing strategy:
       • Assess the evidence for association for all SNPs based on S (screening step).
       • Select a small subset of N markers (10-200).
       • Compute the association test conditional on S and adjust for N comparisons (testing step).
     • If the screening step and the testing step are statistically independent, we can look at the data in the screening step without paying a “statistical price” for it.
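The selection logic of the testing strategy can be sketched generically. This is an illustration, not the C2BAT software: it assumes larger screening scores mean stronger evidence and that the testing p-values are independent of the screening scores under the null hypothesis. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def screen_then_test(screen_scores, test_pvalues, n_select, alpha=0.05):
    """Rank SNPs by screening score, keep the top n_select, and apply a
    Bonferroni correction over those n_select tests only."""
    screen_scores = np.asarray(screen_scores, dtype=float)
    test_pvalues = np.asarray(test_pvalues, dtype=float)
    # Indices of the n_select SNPs with the largest screening scores.
    selected = np.argsort(screen_scores)[::-1][:n_select]
    # Significant = selected SNPs whose testing p-value beats alpha / n_select.
    significant = selected[test_pvalues[selected] < alpha / n_select]
    return selected, significant

# Toy example with five SNPs (hypothetical numbers).
scores = [0.1, 5.0, 3.2, 0.4, 7.7]
pvals = [0.001, 0.004, 0.2, 0.0001, 0.03]
selected, hits = screen_then_test(scores, pvals, n_select=2)
print(selected, hits)  # [4 1] [1]
```

SNP 3 has the smallest p-value but is never tested because it failed the screening step; that is the price of a small N.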

  4. “Using the same data set for screening and testing”
     General concept proposed by Laird and Lange (2006, Nat Rev Genet)
     Decomposition of the joint likelihood, with Y = phenotype and X = genotype:
     $P(Y, X) = \underbrace{P\big(Y, X \mid S(Y, X)\big)}_{\text{testing step}} \cdot \underbrace{P\big(S(Y, X)\big)}_{\text{screening step}}$
     • S = summary test statistic to assess the evidence for association
     • Requirements for S:
       • The association test has to condition on S.
       • S also has to contain information about the potential association.
     • Testing strategy:
       • Assess the evidence for association for all SNPs based on S (screening step).
       • Select a small subset of N markers (10-200).
       • Compute the association test conditional on S and adjust for N comparisons (testing step).
     • The screening step and the testing step are statistically independent!
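The independence claim can be made precise with a short argument. Write S for the screening statistic and p_T for the p-value of the testing step, computed from the conditional distribution P(data | S(data)). The derivation assumes a continuous test statistic, so that p_T is exactly uniform under the null hypothesis H_0 for every value of S; for discrete tables the first equality becomes "≤ α" and the procedure is conservative rather than exact.

```latex
\begin{align*}
&\text{Under } H_0:\quad P(p_T \le \alpha \mid S = s) = \alpha
  \quad \text{for all } s \text{ and } \alpha \in [0, 1] \\
&\Rightarrow\quad P(p_T \le \alpha,\ S \in A)
  = E\big[\mathbf{1}\{S \in A\}\, P(p_T \le \alpha \mid S)\big]
  = \alpha\, P(S \in A)
  = P(p_T \le \alpha)\, P(S \in A)
\end{align*}
```

So p_T is independent of S under H_0. Provided each SNP's p_T is independent of the screening statistics of all SNPs, selecting N SNPs on the basis of S leaves the null distribution of each selected p_T unchanged, and a Bonferroni correction over the N selected SNPs controls the family-wise error rate.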

  5. “Using the same data set for screening and testing”
     Application to family-based association tests (Van Steen et al. (2005))
     Decomposition of the joint likelihood, with Y = phenotype, X = offspring genotype and X_P = parental genotypes:
     $P(Y, X, X_P) = \underbrace{P(X \mid Y, X_P)}_{\text{testing step}} \cdot \underbrace{P(Y, X_P)}_{\text{screening step}}$
     • S = phenotype and parental genotypes (the sufficient statistic)
     • Screening step: based on the conditional mean model (Lange et al. (2003)), which uses the between-family component (Fulker et al. (1999))
     • Testing step: based on FBAT (Laird et al. (2000)), which uses the within-family component (Fulker et al. (1999))
     • Alternative approach:
       • Instead of using the between-family component (screening step) and the within-family component (testing step) in a two-stage testing strategy, one could include both components in the test statistic, e.g. the QTDT (Abecasis et al. (2000)).
       • Disadvantages:
         • Only marginal power gains (5%) over the FBAT statistic when a single SNP is tested (Abecasis et al. (2001))
         • Lack of robustness against population admixture (Yu et al. (2006))
     • Properties of the testing strategy:
       • Outperforms standard adjustments for multiple comparisons by factors of up to 40
       • Additional power boost from complex phenotypes such as longitudinal data: discovery of INSIG2 in a 100K scan in the Framingham Heart Study, the first replicable association for BMI/obesity (Herbert et al. (2006, Science))
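The split into a between-family and a within-family component can be sketched for parent-offspring trios. This assumes additive genotype coding (0, 1 or 2 copies of the tested allele), one offspring per trio and complete parental genotypes; the function names are hypothetical. The within-family statistic follows the familiar FBAT form U = Σ (Y − μ)(X − E[X | parents]) with its conditional variance, but it is an illustration, not the FBAT software.

```python
import numpy as np

def between_within(child, mother, father):
    """Split offspring genotypes into a between-family part (depends only on the
    parents) and a within-family part (deviation from Mendelian expectation).
    Each parent transmits the allele with probability genotype / 2."""
    expected = (np.asarray(mother, float) + np.asarray(father, float)) / 2.0
    within = np.asarray(child, float) - expected
    return expected, within

def fbat_style_z(trait, child, mother, father, offset=0.0):
    """FBAT-style z statistic built from the within-family component only."""
    _, within = between_within(child, mother, father)
    t = np.asarray(trait, float) - offset
    pm = np.asarray(mother, float) / 2.0
    pf = np.asarray(father, float) / 2.0
    var_child = pm * (1 - pm) + pf * (1 - pf)  # Var(child genotype | parents)
    u = np.sum(t * within)
    v = np.sum(t**2 * var_child)
    return u / np.sqrt(v)

# Four hypothetical trios with affected offspring (trait = 1).
child, mother, father = [2, 1, 1, 2], [1, 1, 0, 2], [2, 0, 1, 1]
b, w = between_within(child, mother, father)
z = fbat_style_z([1, 1, 1, 1], child, mother, father)
print(b, w, z)  # [1.5 0.5 0.5 1.5] [0.5 0.5 0.5 0.5] 2.0
```

Because the within-family component is centred on its expectation given the parents, population structure that shifts parental genotypes cancels out of U. The between-family component, which a screening step can use, does not have that protection.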

  6. “Using the same data set for screening and testing”
     Can we translate this concept to association studies with unrelated cases and controls?
     χ² tests and Armitage trend tests are conditional tests that condition on the margins of the contingency table
     => The data-partitioning statistic S is given by the margins of the table.
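A minimal Cochran-Armitage trend test for a 2 × 3 genotype table shows how the margins act as the conditioning statistic: the expected case counts and the null variance come from the column totals and the overall case fraction alone. This sketch assumes additive scores 0, 1, 2 and uses the common large-sample variance (some implementations multiply it by N/(N−1)); the function name and example counts are hypothetical.

```python
import math
import numpy as np

def armitage_trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    cases[j], controls[j]: counts of individuals with j copies of the tested allele.
    Returns (z, two-sided p-value); z**2 is the 1-df chi-square statistic.
    """
    r = np.asarray(cases, dtype=float)
    n = r + np.asarray(controls, dtype=float)  # column margins (genotype totals)
    t = np.asarray(weights, dtype=float)
    N = n.sum()
    p = r.sum() / N                            # row margin: fraction of cases
    u = np.sum(t * (r - n * p))                # observed minus expected under H0
    var_u = p * (1 - p) * (np.sum(t**2 * n) - np.sum(t * n) ** 2 / N)
    z = u / math.sqrt(var_u)
    return z, math.erfc(abs(z) / math.sqrt(2))

z, pval = armitage_trend_test(cases=[10, 20, 20], controls=[20, 20, 10])
print(z**2, pval)  # z**2 = 100/15 ≈ 6.67, two-sided p ≈ 0.0098
```

Everything except the case counts r is a function of the margins, which is why conditioning on the margins leaves only the cell counts as the "testing" part of the data.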

  7. Testing strategy:
     • 1.) Divide the table into a “screening table” and a “testing table”.
     • 2.) For each SNP, use the screening table and the margins of the testing table to assess the evidence for association (screening step).
     • 3.) Select the most promising N SNPs and test them for association based on the data of the testing table (testing step).
     • How can we obtain information about an association from the margins?
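The three steps can be sketched end to end for a single random split, with the Armitage trend test in the testing step. The screening score is a hypothetical choice for illustration only, since the slide leaves open how C2BAT extracts information from the margins: it plugs the genotype-specific case rates of the screening table into the margins of the testing table and reports the trend z statistic the testing table would then be expected to show. Because it uses only the screening table and the testing-table margins, and the trend test conditions on those margins, selection should not distort the testing step's null distribution (up to the large-sample approximation). The simulated data, effect size and all function names are assumptions.

```python
import math
import numpy as np

T = np.array([0.0, 1.0, 2.0])  # additive genotype scores

def table(geno, status):
    """Case and control counts for 0/1/2 copies of the tested allele."""
    cases = np.bincount(geno[status == 1], minlength=3).astype(float)
    controls = np.bincount(geno[status == 0], minlength=3).astype(float)
    return cases, controls

def trend_var(n, p):
    """Null variance of the trend score given column margins n and case fraction p."""
    return p * (1 - p) * (np.sum(T**2 * n) - np.sum(T * n) ** 2 / n.sum())

def trend_z(cases, controls):
    n = cases + controls
    p = cases.sum() / n.sum()
    return np.sum(T * (cases - n * p)) / math.sqrt(trend_var(n, p))

def screen_score(scr_cases, scr_controls, test_n, test_p):
    """HYPOTHETICAL screening score: predicted trend z for the testing table,
    using the screening table plus only the testing-table margins."""
    q = (scr_cases + 0.5) / (scr_cases + scr_controls + 1.0)  # smoothed case rates
    t_bar = np.sum(T * test_n) / test_n.sum()
    predicted_u = np.sum(test_n * (T - t_bar) * q)
    return abs(predicted_u) / math.sqrt(trend_var(test_n, test_p))

def two_stage(genotypes, status, n_select, screen_fraction=0.5, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    in_screen = rng.random(len(status)) < screen_fraction  # step 1: random split
    scores, test_tables = [], []
    for snp in genotypes.T:
        sc, sn = table(snp[in_screen], status[in_screen])
        tc, tn = table(snp[~in_screen], status[~in_screen])
        test_n = tc + tn                                   # testing column margins
        test_p = tc.sum() / test_n.sum()                   # testing row margin
        scores.append(screen_score(sc, sn, test_n, test_p))  # step 2: screening
        test_tables.append((tc, tn))
    selected = np.argsort(scores)[::-1][:n_select]
    hits = []
    for j in selected:                                     # step 3: test N SNPs only
        p_value = math.erfc(abs(trend_z(*test_tables[j])) / math.sqrt(2))
        if p_value < alpha / n_select:                     # Bonferroni over N
            hits.append(int(j))
    return selected, hits

# Simulated example: 2,000 individuals, 500 SNPs, only SNP 0 associated.
rng = np.random.default_rng(1)
g = rng.binomial(2, 0.3, size=(2000, 500))
logit = -1.0 + 0.8 * g[:, 0]
status = (rng.random(2000) < 1 / (1 + np.exp(-logit))).astype(int)
selected, hits = two_stage(g, status, n_select=10)
print(selected, hits)  # SNP 0 is expected to rank first and to be among the hits
```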

  8. Results will depend on the actual random split of the tables! Solution:
     1.) Re-sampling of the tables
     2.) p-value for the testing set based on $p(\text{data}) = p(\text{data} \mid S(\text{data})) \cdot p(S(\text{data}))$ and Monte Carlo simulations
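Point 2 can be illustrated with a Monte Carlo p-value for a single testing table. Randomly permuting the case/control labels keeps the number of cases and the genotype counts fixed, so under the null hypothesis each permuted table is a draw from p(data | S(data)). This sketch assumes the trend score as the test statistic (its null variance is fixed by the margins, so |U| orders tables the same way as |z|) and does not cover point 1, re-sampling of the split; the function names and the example table are hypothetical.

```python
import numpy as np

def trend_u(geno, status):
    """Trend score U = sum_i g_i (y_i - p) = sum_j j * (cases_j - n_j * p)."""
    return np.sum(geno * (status - status.mean()))

def monte_carlo_pvalue(geno, status, n_perm=10_000, seed=0):
    """Monte Carlo p-value from the conditional distribution given the margins."""
    rng = np.random.default_rng(seed)
    geno = np.asarray(geno, dtype=float)
    status = np.asarray(status, dtype=float)
    observed = abs(trend_u(geno, status))
    exceed = 0
    for _ in range(n_perm):
        permuted = rng.permutation(status)  # both margins stay unchanged
        if abs(trend_u(geno, permuted)) >= observed - 1e-9:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Example table: cases (10, 20, 20), controls (20, 20, 10) for 0/1/2 allele copies.
counts = [10, 20, 20, 20, 20, 10]
geno = np.repeat([0, 1, 2, 0, 1, 2], counts)
status = np.repeat([1, 1, 1, 0, 0, 0], counts)
pval = monte_carlo_pvalue(geno, status)
print(pval)  # roughly 0.01, close to the large-sample trend-test p-value
```

The (exceed + 1) / (n_perm + 1) estimator never returns zero, which keeps the Monte Carlo p-value valid when it is later compared against a Bonferroni threshold.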

  9. Simulation Study

  10. Can C2BAT find INSIG2 in the 100K scan of the Framingham Heart Study again?
     • 1,400 probands in about 300 families:
       • Randomly select 150 unrelated cases/controls (BMI > 28 = “affected”).
       • => Apply the standard analysis (p-value adjusted by Bonferroni correction) and C2BAT to see whether INSIG2 reaches genome-wide significance.
     • For 1,000 replicates:
       • Power of the standard analysis to detect INSIG2: 5%
       • Power of C2BAT to detect INSIG2: 17%

  11. Future work:
     1.) Extension to quantitative traits => expression analysis
     2.) Gene-gene interactions
     Software: www.c2bat.com
