Eleazar Eskin UCLA

Increasing Power in Association Studies by using Linkage Disequilibrium Structure and Molecular Function as Prior Information Eleazar Eskin UCLA

Motivation • Whole genome association studies • How to perform multiple hypothesis correction • How to increase statistical power • Incorporate prior information on the molecular function of associated loci • Incorporate information on linkage disequilibrium (LD) structure

Main idea • Traditional method • Use a single significance threshold for every marker • In practice, markers are not interchangeable • Set a different threshold at each marker, reflecting both intrinsic (e.g., LD, allele frequency) and extrinsic (e.g., molecular function) information about the marker

Standard Association Study • M markers typed in N cases and N controls • f_i = minor allele frequency at marker i • True case and control allele frequencies at each marker • Marker d: the causal variant, which carries a relative risk
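A minimal sketch of how the true case/control allele frequencies and the resulting non-centrality parameter might be computed. It assumes a multiplicative allelic model and a rare disease, so controls carry approximately the population frequency f_d and cases carry γf_d / ((γ − 1)f_d + 1) for relative risk γ; non-causal markers would simply have equal case and control frequencies. The helper names and the choice of an allelic test on 2N alleles per group are assumptions, not the slides' exact formulas.

```python
import math


def case_control_freqs(f, gamma):
    """Hypothetical helper: (case, control) frequencies of the risk allele.

    Assumes a multiplicative allelic model and a rare disease, so the
    control frequency is approximately the population frequency f.
    """
    p_case = gamma * f / ((gamma - 1) * f + 1)
    p_control = f
    return p_case, p_control


def noncentrality(p_case, p_control, n):
    """Hypothetical helper: mean of the allelic test statistic with n cases
    and n controls (2n alleles per group); the statistic is ~ N(lambda, 1)."""
    var = (p_case * (1 - p_case) + p_control * (1 - p_control)) / (2 * n)
    return (p_case - p_control) / math.sqrt(var)
```

For example, `noncentrality(*case_control_freqs(0.2, 2.0), 1000)` gives a non-centrality parameter of roughly 9.6 for 1000 cases and 1000 controls.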

Standard Association Study • Test statistic ~ N(λ_i, 1), where λ_i is the non-centrality parameter at marker i • Power at a single marker: the probability of detecting an association with N individuals at p-value (significance) threshold t
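Assuming a two-sided test whose statistic is N(0, 1) under the null and N(λ, 1) under the alternative, the single-marker power at threshold t can be sketched as follows. `power` is a hypothetical helper name, not taken from the slides.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal N(0, 1)


def power(t, lam):
    """Two-sided power at significance threshold t when the statistic ~ N(lam, 1)."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    z = _STD.inv_cdf(t / 2)  # lower critical value, z < 0
    # Reject when |S| > -z: P(S < z) + P(S > -z) for S ~ N(lam, 1)
    return _STD.cdf(z - lam) + _STD.cdf(z + lam)
```

With λ = 0 the power reduces to t itself (the false positive rate), and at t = 0.05 a non-centrality of about 2.8 gives roughly 80% power.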

Multiple Hypothesis Correction • Fix the false positive rate t_i at each marker (the probability of rejecting a true null hypothesis) so that the total false positive rate is α • Bonferroni correction • t_i = α/M • Expected power: the sum over markers of c_i times the power at marker i, where c_i is the probability that marker i is causal
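A minimal sketch of the Bonferroni baseline: every marker receives t_i = α/M, and the expected power weights each marker's power by its prior c_i. It assumes the two-sided N(λ_i, 1) power model; the helper names are hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal N(0, 1)


def power(t, lam):
    """Two-sided power at threshold t when the statistic ~ N(lam, 1)."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    z = _STD.inv_cdf(t / 2)
    return _STD.cdf(z - lam) + _STD.cdf(z + lam)


def bonferroni_expected_power(alpha, c, lam):
    """Expected power sum_i c_i * power(alpha / M, lam_i) with one shared threshold."""
    t = alpha / len(c)  # Bonferroni: the same threshold at every marker
    return sum(ci * power(t, li) for ci, li in zip(c, lam))
```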

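Multi-Threshold Association • Allow a different threshold t_i for each marker • Power: the same expected power, with each marker evaluated at its own adjusted false positive rate t_i • Constraint: the thresholds sum to α, so the total false positive rate is unchanged • Goal: choose the t_i to maximize power subject to this constraint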
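Written out, the multi-threshold problem is a constrained maximization. A sketch, assuming P(t, λ) is the two-sided single-marker power with statistic N(λ, 1), i.e. P(t, λ) = Φ(z + λ) + Φ(z − λ) with z = Φ⁻¹(t/2), where Φ and φ are the standard normal CDF and density:

```latex
\max_{t_1,\dots,t_M} \sum_{i=1}^{M} c_i \, P(t_i, \lambda_i)
\quad \text{s.t.} \quad \sum_{i=1}^{M} t_i = \alpha, \qquad 0 \le t_i \le 1

% With z_i = \Phi^{-1}(t_i/2) and dz_i/dt_i = 1 / (2\phi(z_i)):
\frac{\partial P}{\partial t_i}
  = \frac{\phi(z_i + \lambda_i) + \phi(z_i - \lambda_i)}{2\,\phi(z_i)}
  = e^{-\lambda_i^2/2} \cosh(\lambda_i z_i)

% Stationarity with Lagrange multiplier \mu, for every marker with 0 < t_i < 1:
c_i \, e^{-\lambda_i^2/2} \cosh(\lambda_i z_i) = \mu
```

Because |z_i| shrinks as t_i grows, the derivative decreases in t_i, so each P(·, λ_i) is concave under this model and the equal-gradient condition identifies the optimum.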

Maximizing the Power • At the optimal point, the gradient of the power with respect to t_i is equal at every marker • Given a value of the gradient, solve for the threshold at each marker that achieves it • Binary search over the gradient until the thresholds sum to α
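The binary search can be sketched as follows, assuming the two-sided N(λ_i, 1) power model. Under that model the weighted gradient has the closed form c_i e^(−λ_i²/2) cosh(λ_i z_i) with z_i = Φ⁻¹(t_i/2), so the threshold for a given gradient is solved directly; the bisection runs over the log of the gradient so that very large gradients do not overflow. All function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal N(0, 1)


def power(t, lam):
    """Two-sided power at threshold t when the statistic ~ N(lam, 1)."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    z = _STD.inv_cdf(t / 2)
    return _STD.cdf(z - lam) + _STD.cdf(z + lam)


def threshold_for_gradient(log_g, c, lam):
    """Threshold t at which c * dP/dt equals g = exp(log_g).

    Uses dP/dt = exp(-lam**2 / 2) * cosh(lam * z), z = Phi^{-1}(t / 2),
    which holds for the assumed two-sided N(lam, 1) model.
    """
    if c <= 0.0:
        return 0.0  # a marker that cannot be causal gets no false positive budget
    if lam == 0.0:
        # Power equals t here, so the weighted gradient is the constant c.
        return 1.0 if math.log(c) > log_g else 0.0
    # Solve cosh(lam * z) = x, working with log(x) to avoid overflow.
    log_x = log_g - math.log(c) + lam * lam / 2
    if log_x <= 0.0:
        return 1.0  # the gradient exceeds g even at t = 1
    if log_x > 30.0:
        acosh_x = log_x + math.log(2.0)  # acosh(x) ~ ln(2x) for large x
    else:
        acosh_x = math.acosh(math.exp(log_x))
    z_abs = acosh_x / abs(lam)
    return math.erfc(z_abs / math.sqrt(2.0))  # 2 * Phi(-z_abs), accurate in the tail


def multi_threshold(alpha, c, lam, iters=200):
    """Binary search over log(gradient) until the thresholds sum to alpha."""
    lo, hi = -700.0, 700.0  # total threshold shrinks as the gradient grows
    for _ in range(iters):
        mid = (lo + hi) / 2
        total = sum(threshold_for_gradient(mid, ci, li) for ci, li in zip(c, lam))
        if total > alpha:
            lo = mid  # over budget: require a larger gradient
        else:
            hi = mid
    return [threshold_for_gradient(hi, ci, li) for ci, li in zip(c, lam)]
```

Since the Bonferroni thresholds are one feasible point of the same constraint, the optimized thresholds should never give lower expected power under this model; markers with a higher prior c_i end up with a more lenient threshold.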

Maximizing Power for Proxies • In practice, markers are tags (proxies) for causal variation • Given K variants, assign each potential causal variant v_k to its best marker i • The effective non-centrality parameter is reduced by a factor of |r_ki|, where r_ki is the correlation coefficient between variant k and marker i • If v_k is causal, the power when observing proxy marker i is the single-marker power at threshold t_i with the non-centrality parameter scaled by |r_ki|
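A minimal sketch of the proxy step, assuming the "best" marker for a variant is the one with the largest |r_ki| (equivalently the largest r²) and that correlations are given as a K × M nested list with one row per variant. Both the input layout and the helper names are hypothetical.

```python
def assign_to_best_marker(r):
    """Map each variant k to the marker i with the largest |r[k][i]|.

    r[k][i] is the correlation between candidate causal variant k and
    marker i (hypothetical layout: one row per variant).
    """
    return [max(range(len(row)), key=lambda i: abs(row[i])) for row in r]


def proxy_noncentrality(lam_k, r_ki):
    """Effective non-centrality when causal variant k is observed through marker i."""
    return abs(r_ki) * lam_k
```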

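Maximizing Power for Proxies • Each variant k has probability c_k of being causal • Total power captured by marker i: the sum, over the variants assigned to marker i, of c_k times the proxy power • Total power of the association study: the sum of the power captured by all markers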
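A hedged sketch of the total power: each variant contributes c_k times the power of its assigned marker at that marker's threshold, with the non-centrality scaled by |r_ki|. It assumes a two-sided test with statistic N(λ, 1) and a precomputed assignment of variants to markers; the function names and argument layout are hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal N(0, 1)


def power(t, lam):
    """Two-sided power at threshold t when the statistic ~ N(lam, 1)."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    z = _STD.inv_cdf(t / 2)
    return _STD.cdf(z - lam) + _STD.cdf(z + lam)


def study_power(t, c, lam, r, assign):
    """Return (total power, power captured by each marker).

    t[i]: threshold of marker i; c[k], lam[k]: prior and non-centrality of
    variant k; r[k][i]: correlation; assign[k]: marker chosen for variant k.
    """
    captured = [0.0] * len(t)
    for k, i in enumerate(assign):
        captured[i] += c[k] * power(t[i], abs(r[k][i]) * lam[k])
    return sum(captured), captured
```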

Candidate Gene Study • 1000 cases and controls over the ENCODE regions, using markers on the Affymetrix 500k GeneChip

Robustness over relative risks

Whole Genome Association • Assumption: each SNP is equally likely to be causal, with a relative risk of 2 • Power for the traditional study / multi-threshold association over 2,614,057 SNPs • Average power: 0.593 / 0.610 • Average over SNPs with power in [0.1, 0.9]: 0.568 / 0.615

Impact of extrinsic information • cSNPs (coding SNPs) are more likely to be involved in disease • Add information on sets of genes that are more likely to be involved in a specific disease • The 30,700 cSNPs in HapMap contribute 20% of the disease-causing variation • Cancer Gene Census: 363 genes in which mutations have been implicated in cancer; 20% of the causal variation is assumed to lie in these genes
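Extrinsic information enters through the priors c_i. A minimal sketch, assuming a fixed share of the causal probability is spread uniformly over a special set of SNPs (e.g., cSNPs or SNPs in Cancer Gene Census genes) and the remainder uniformly over all other SNPs. The helper is hypothetical, and pairing the 30,700 cSNPs with the 2,614,057-SNP whole-genome set is an illustrative assumption.

```python
def prior_per_snp(n_total, n_special, special_share):
    """Return (prior for each special SNP, prior for each remaining SNP).

    special_share of the causal probability goes uniformly to the n_special
    SNPs; the rest goes uniformly to the other n_total - n_special SNPs.
    """
    special = special_share / n_special
    other = (1.0 - special_share) / (n_total - n_special)
    return special, other


c_csnp, c_other = prior_per_snp(2_614_057, 30_700, 0.2)
```

Under these assumptions each cSNP gets roughly 21 times the prior of any other SNP, which the multi-threshold procedure then tends to turn into a more lenient threshold for cSNPs.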
