Adjusting Relatedness for Family Data in Collapsing Test of Rare Variants

Qunyuan Zhang, Doyoung Chung, Ingrid Borecki, Michael A. Province
Division of Statistical Genomics, Washington University School of Medicine, St. Louis, Missouri, USA
IGES, Sept. 2011, Heidelberg
Contact: Qunyuan Zhang, qunyuan@wustl.edu

Introduction
Advances in sequencing technologies have made it easier to identify rare variants (RVs). Family data, in which RVs may be enriched within pedigrees, are a potentially rich source for detecting associations between RVs and complex human traits. Most RV tests developed in recent years, however, are data-driven, permutation-based collapsing methods. They cannot be applied directly to family data, because a direct permutation test ignores and destroys the family structure.

Purpose
To deal with relatedness in family data, we propose a mixed-model-based procedure that combines family information with collapsing analysis in a permutation test. We denote it MMPT (Mixed Model-based Permutation Test).

Statistical Model
To deal with family structure, we generalize the collapsing test as a weighted sum score test based on a linear mixed model:

$$Y = \alpha + \beta \sum_{i=1}^{m} w_i g_i + Z\gamma + \varepsilon \tag{1}$$

Here $Y$ is the observed trait, $\alpha$ the intercept, $\beta$ the collective effect coefficient, $m$ the number of RVs in the genetic unit of interest (usually a gene), $w_i$ the weight of variant $i$, $g_i$ the number (0, 1 or 2) of minor alleles of variant $i$, and $\varepsilon$ the residual. The term $\sum w_i g_i$ is the weighted sum score of the variants. $Z$ is the design matrix corresponding to $\gamma$, and $\gamma$ follows a multivariate normal distribution $N(0, G)$. $G$ is the variance-covariance matrix of $\gamma$, which can be decomposed as $G = 2\sigma^2 K$, where $K$ is the kinship matrix and $\sigma^2$ is the additive polygenic variance.
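As a worked step, integrating the random effect γ out of model (1) gives the covariance that relatedness induces among the trait values. The slide does not state a distribution for ε, so the residual variance symbol $\sigma_e^2$ and the assumption that ε is i.i.d. normal and independent of γ are assumptions here; $s$ is shorthand (not in the original) for the vector of weighted sum scores.

```latex
\begin{aligned}
s &= \sum_{i=1}^{m} w_i g_i, \\
\operatorname{Var}(Y) &= Z G Z^{\top} + \sigma_e^{2} I
                       = 2\sigma^{2}\, Z K Z^{\top} + \sigma_e^{2} I \equiv V, \\
Y &\sim N\!\left(\alpha \mathbf{1} + \beta s,\; V\right).
\end{aligned}
```

Under these assumptions, when each subject contributes one observation, Z is the identity and V reduces to $2\sigma^2 K + \sigma_e^2 I$, so the off-diagonal entries of K are what carry the family structure into the test of β.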

Weighted Sum Scores
In terms of weighting, most existing collapsing methods can be viewed as special instances of model (1). For example, Morgenthaler and Thilly's CAST is equivalent to setting wi = 1 for all RVs. Li and Leal's CMC sets wi = 1 for all RVs but caps the sum at 1. Madsen and Browning's WSS calculates wi from the allele frequency in controls. Han and Pan's aSum test recodes genotypes (equivalent to choosing wi = 1 or -1) according to a pre-defined p-value cutoff. The PWST and SPWST of Zhang et al. define wi as a rescaled left-tailed p-value.
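The weighting schemes above can be sketched on a toy genotype matrix. This is a minimal illustration, not the authors' code: the function names and the toy data are hypothetical, and the WSS weight uses the smoothed control frequency and inverse standard deviation form as commonly attributed to Madsen and Browning (2009), which should be checked against that paper before use.

```python
import math

def cast_score(genotypes):
    """CAST-style score: w_i = 1 for every rare variant."""
    return [sum(row) for row in genotypes]

def cmc_score(genotypes):
    """CMC-style score: w_i = 1, with the sum capped at 1 (carrier indicator)."""
    return [min(1, sum(row)) for row in genotypes]

def wss_weights(genotypes, is_control):
    """WSS-style weights from allele frequencies in controls (assumed form)."""
    n_total = len(genotypes)
    controls = [row for row, ctrl in zip(genotypes, is_control) if ctrl]
    n_ctrl = len(controls)
    weights = []
    for i in range(len(genotypes[0])):
        minor_in_controls = sum(row[i] for row in controls)
        q = (minor_in_controls + 1) / (2 * n_ctrl + 2)  # smoothed frequency
        weights.append(1.0 / math.sqrt(n_total * q * (1 - q)))
    return weights

def weighted_sum_score(genotypes, weights):
    """Per-subject sum of w_i * g_i, the score term of model (1)."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]

# Toy data: 4 subjects (rows) x 3 rare variants (columns), minor-allele counts.
genotypes = [[0, 1, 0], [0, 0, 0], [1, 1, 0], [0, 0, 2]]
is_control = [True, True, False, False]

cast = cast_score(genotypes)
cmc = cmc_score(genotypes)
weights = wss_weights(genotypes, is_control)
wss = weighted_sum_score(genotypes, weights)
```

Variants that are rarer in controls receive larger weights, so in this toy example the first and third variants are up-weighted relative to the second.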

MMPT: Mixed Model-based Permutation Test
Since WSS, aSum, PWST and SPWST are data-driven, permutation-based tests, we apply model (1) to them by permuting the weighted sum score part while keeping the subject IDs of the remaining components fixed.
[Figure: model (1), with the weighted sum score term labeled "Permuted" and the remaining components labeled "Non-permuted, subject IDs fixed"]
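The permutation scheme can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the variance components `sigma2_g` and `sigma2_e` are treated as known (in practice they would be estimated, for example by REML), Z is taken to be the identity, the test statistic is a generalized least squares Wald statistic for β, and data-driven weights are not re-estimated inside each permutation. All function and variable names are hypothetical.

```python
import numpy as np

def gls_beta_stat(y, score, v_inv):
    """Wald-type statistic for beta in y = alpha + beta*score + e, Var(e) = V."""
    score = np.asarray(score, dtype=float)
    x = np.column_stack([np.ones_like(score), score])
    xtvi = x.T @ v_inv
    cov = np.linalg.inv(xtvi @ x)        # covariance of (alpha, beta) estimates
    coef = cov @ (xtvi @ y)
    return coef[1] / np.sqrt(cov[1, 1])

def mmpt_pvalue(y, score, kinship, sigma2_g, sigma2_e, n_perm=999, seed=0):
    """Permute only the weighted sum score; trait and kinship stay tied to IDs."""
    rng = np.random.default_rng(seed)
    n = len(y)
    v = 2.0 * sigma2_g * kinship + sigma2_e * np.eye(n)   # assumed V = 2*s2*K + s2e*I
    v_inv = np.linalg.inv(v)
    observed = abs(gls_beta_stat(y, score, v_inv))
    hits = 0
    for _ in range(n_perm):
        permuted = rng.permutation(score)                 # shuffle the score only
        if abs(gls_beta_stat(y, permuted, v_inv)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data: three sib pairs (self-kinship 0.5, sib-sib kinship 0.25).
kinship = np.kron(np.eye(3), np.array([[0.5, 0.25], [0.25, 0.5]]))
trait = np.array([1.2, 0.7, -0.3, 0.1, 2.0, 1.5])
score = np.array([1.0, 0.0, 0.0, 0.0, 2.0, 1.0])

p_value = mmpt_pvalue(trait, score, kinship, 1.0, 1.0, n_perm=199, seed=1)
```

Because the trait vector and the kinship-based covariance are never shuffled, the correlation among relatives is preserved in every permutation replicate, which is the point of fixing the subject IDs of the non-score components.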

Data
We used the 200 replicates of data on 697 subjects from 8 extended families simulated for Genetic Analysis Workshop (GAW) 17 [Almasy et al., 2011], with the quantitative trait Q2 as the target trait. For each gene, variants with minor allele frequency (MAF) less than 0.01 were collapsed into a single variable using different weighting methods (CMC, WSS, aSum, PWST and SPWST). The kinship matrix K was calculated from the pedigree data.
GAW 17 is supported by NIH grant R01 GM031575. Preparation of the GAW 17 simulated data set was supported in part by NIH R01 MH059490 and used sequencing data from the 1000 Genomes Project (www.1000genomes.org).
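The slide does not say how K was computed from the pedigrees. One standard approach is the recursive kinship definition, sketched below on a hypothetical nuclear family; the function name and the pedigree encoding (a dict mapping each ID to its father and mother, with `None` for unknown parents) are assumptions made for illustration.

```python
from functools import lru_cache

def kinship_matrix(pedigree):
    """pedigree: dict id -> (father_id, mother_id); founders use (None, None)."""
    ids = list(pedigree)

    @lru_cache(maxsize=None)
    def depth(i):
        # Generation depth: founders are 0; unknown parents count as -1.
        if i is None:
            return -1
        father, mother = pedigree[i]
        return 1 + max(depth(father), depth(mother))

    @lru_cache(maxsize=None)
    def phi(i, j):
        if i is None or j is None:
            return 0.0                      # unknown parent: treated as unrelated
        if i == j:
            father, mother = pedigree[i]
            return 0.5 * (1.0 + phi(father, mother))
        if depth(i) < depth(j):
            i, j = j, i                     # expand the one who cannot be an ancestor
        father, mother = pedigree[i]
        return 0.5 * (phi(father, j) + phi(mother, j))

    return ids, [[phi(a, b) for b in ids] for a in ids]

# Hypothetical nuclear family: two unrelated founders and two children.
pedigree = {
    "dad": (None, None),
    "mom": (None, None),
    "kid1": ("dad", "mom"),
    "kid2": ("dad", "mom"),
}
ids, K = kinship_matrix(pedigree)
```

With this definition a non-inbred subject has self-kinship 0.5, so the diagonal of G = 2σ²K equals σ², and full siblings or parent and child have kinship 0.25.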

Results (1): Q-Q plots of -log10(P) under the null
[Figure, panel 1: CMC non-permutation test, ignoring family structure. Type-1 error is inflated.]
[Figure, panel 2: CMC non-permutation test, modeling family structure via the mixed model. The inflation is corrected.]
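For readers reproducing such plots, the points of a Q-Q plot of -log10(P) can be computed as below. This is a minimal sketch: the plotting position (rank - 0.5)/n is an assumption (a common convention, not stated on the slide), the function name and the p-values are made up for illustration, and p-values must be strictly positive.

```python
import math

def qq_points(p_values):
    """Return (expected, observed) -log10(P) pairs, sorted in ascending order."""
    n = len(p_values)
    observed = sorted(-math.log10(p) for p in p_values)
    # Expected quantiles of -log10(P) when P is uniform on (0, 1) under the null.
    expected = sorted(-math.log10((rank - 0.5) / n) for rank in range(1, n + 1))
    return list(zip(expected, observed))

# Illustrative p-values only; not taken from the poster.
points = qq_points([0.5, 0.05, 0.9, 0.2])
```

Points that rise above the diagonal in the upper tail indicate more small p-values than expected under the null, which is how inflation of type-1 error shows up in these plots.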

Results (2): Q-Q plots under the null
Permutation test, ignoring family structure: type-1 error is inflated.
[Figure panels: WSS, aSum, PWST, SPWST]

Results (3): Q-Q plots under the null
Mixed model-based permutation test (MMPT), modeling family structure: the inflation is corrected.
[Figure panels: WSS, aSum, PWST, SPWST]

Conclusions
Ignoring relatedness between subjects in family data may substantially inflate type-1 error in collapsing tests of rare variants. For non-data-driven collapsing tests (e.g. CMC), directly modeling kinship with a mixed model corrects the inflation. Directly applying data-driven, permutation-based methods (e.g. WSS, aSum, PWST and SPWST) to family data may also substantially inflate type-1 error. That inflation can be corrected by the proposed MMPT method, which incorporates kinship information into the permutation test.

Main References
Almasy LA, Dyer TD, Peralta JM, Kent JW Jr, Charlesworth JC, Curran JE, Blangero J. 2011. Genetic Analysis Workshop 17 mini-exome simulation. BMC Proc 5(suppl 8):
Han F, Pan W. 2010. A data-adaptive sum test for disease association with multiple common or rare variants. Hum Hered 70(1):42-54.
Li B, Leal SM. 2008. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet 83(3):311-21.
Madsen BE, Browning SR. 2009. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet 5(2):e1000384.
Morgenthaler S, Thilly WG. 2007. A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: a cohort allelic sums test (CAST). Mutat Res 615(1-2):28-56.
Zhang Q, Irvin MR, Arnett DK, Province MA, Borecki I. 2011. Genet Epidemiol. doi: 10.1002/gepi.20618
