Associations to Quantitative Trait Network and Analysis of Asthma Data. Seyoung Kim and Eric P. Xing, {sssykim, epxing}@cs.cmu.edu, Machine Learning Dept., Carnegie Mellon University. 10/30/2009. Genome Informatics 2009 @ Cold Spring Harbor Lab.
Association Analysis of Single Trait • Figure: a causal SNP associated with a univariate phenotype, e.g., disease/control status or gene expression level
Association Analysis of Quantitative Trait Network • Figure labels: causal SNP; a univariate phenotype, e.g., disease/control status or gene expression level
Genetic Association for Asthma Clinical Traits • Figure: SNPs in a DNA sequence (TCGACGTTTTACTGTACAATT) linked to trait subnetworks for lung physiology and quality of life
Expression QTL Mapping • Figure: SNPs in a DNA sequence (TCGACGTTTTACTGTACAATT) linked to a gene correlation network with gene modules, derived from microarray experiments
Motivation: Multiple-Trait Association • Traditional approach: analyze one phenotype at a time • Our approach: consider multiple related phenotypes jointly and incorporate correlation structure in the phenotypes • Graph-guided fused lasso (Kim & Xing, PLoS Genetics, 2009)
Multivariate Regression for Single-Trait Association Analysis • Model: $y = X\beta$, with trait $y$ (allergy symptom, e.g., 2.1), genotype $X$ (e.g., T G A A C C A T G A A G T A), and association strength $\beta$
Multivariate Regression for Single-Trait Association Analysis • Estimate: $\hat{\beta} = \arg\min_{\beta} (y - X\beta)^T (y - X\beta)$ • Many non-zero associations: how to pick the threshold?
Lasso for Reducing False Positives (Tibshirani, 1996) • $\hat{\beta} = \arg\min_{\beta} (y - X\beta)^T (y - X\beta) + \lambda \sum_j |\beta_j|$, where the second term is the lasso penalty for sparsity • Many zero associations (sparse results), but what if there are multiple related traits?
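To make the lasso objective on this slide concrete, here is a minimal dependency-free sketch of cyclic coordinate descent for $(y - X\beta)^T (y - X\beta) + \lambda \sum_j |\beta_j|$. The function names, the toy centered genotype coding (-1/0/1), and the choice of coordinate descent are illustrative assumptions, not the solver used in the talk.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (y - X b)^T (y - X b) + lam * sum_j |b_j| by cyclic updates."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of SNP j with the partial residual
            z = 0.0    # squared norm of SNP j's column
            for i in range(n):
                pred_without_j = sum(X[i][l] * beta[l] for l in range(p) if l != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            # The squared-error term is not halved, so the threshold is lam / 2.
            beta[j] = soft_threshold(rho, lam / 2.0) / z if z > 0 else 0.0
    return beta


# Toy data (hypothetical): 6 individuals, 3 SNPs, only SNP 0 affects the trait.
X = [[-1, 1, 0], [0, -1, 1], [1, 0, -1], [-1, 0, 1], [0, 1, -1], [1, -1, 0]]
y = [2.0 * row[0] for row in X]
beta = lasso_coordinate_descent(X, y, lam=2.0)
print(beta)
```

On this toy data the penalty shrinks the estimate for SNP 0 below its true value of 2 and sets the other two coefficients exactly to zero, which is the sparsity behaviour the slide describes.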
Multivariate Regression for Multiple-Trait Association Analysis • Traits: lung physiology (FEV, FEF) and allergy (allergy for roaches, allergy for cats, allergy in spring), e.g., trait values (3.4, 1.5, 2.1, 0.9, 1.8) • Lasso for each trait $k$: $\hat{\beta}_k = \arg\min_{\beta_k} (y_k - X\beta_k)^T (y_k - X\beta_k) + \lambda \sum_j |\beta_{jk}|$ • How to combine information across multiple traits to increase the power?
Multivariate Regression for Multiple-Trait Association Analysis • We introduce a graph-guided fusion penalty: $\hat{B} = \arg\min_{B} \sum_k (y_k - X\beta_k)^T (y_k - X\beta_k) + \lambda \sum_k \sum_j |\beta_{jk}| + \gamma \sum_{(k,m) \in E} \sum_j |\beta_{jk} - \beta_{jm}|$, where $E$ is the edge set of the trait network
Fusion Penalty • Fusion penalty: $|\beta_{jk} - \beta_{jm}|$ • $\beta_{jk}$: association strength between SNP $j$ and trait $k$ • $\beta_{jm}$: association strength between SNP $j$ and trait $m$ • If two traits are correlated (connected in the trait network), they are likely to share a similar association strength
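The fusion penalty can be evaluated directly from a coefficient table. The sketch below assumes a hypothetical layout in which `B[j][k]` holds the association strength between SNP j and trait k, and `edges` lists the connected trait pairs; both names are illustrative.

```python
def fusion_penalty(B, edges):
    """Sum |B[j][k] - B[j][m]| over all SNPs j and trait-network edges (k, m)."""
    return sum(abs(row[k] - row[m]) for row in B for (k, m) in edges)


edges = [(0, 1)]                  # traits 0 and 1 are connected in the network
similar = [[1.0, 0.9, 0.0]]       # one SNP with similar effects on traits 0 and 1
dissimilar = [[1.0, -0.5, 0.0]]   # same SNP with very different effects

print(fusion_penalty(similar, edges))
print(fusion_penalty(dissimilar, edges))
```

Coefficients that agree across connected traits incur a smaller penalty, so minimising the penalised objective pulls the estimates for correlated traits toward each other.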
Graph-Constrained Fused Lasso • Overall effect: • Fusion effect propagates to the entire network • Association between SNPs and subnetworks of traits
Graph-Weighted Fused Lasso • Overall effect: • Subnetwork structure is embedded as densely connected nodes with large edge weights • Edges with small weights are effectively ignored
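A small sketch of how edge weights change the fusion penalty. It assumes each edge carries a non-negative weight (for example the absolute trait correlation, which is an assumption here, not something stated on the slide) and that `B[j][k]` is the association strength between SNP j and trait k.

```python
def weighted_fusion_penalty(B, weighted_edges):
    """Sum weight * |B[j][k] - B[j][m]| over SNPs j and weighted edges (k, m, weight)."""
    total = 0.0
    for k, m, weight in weighted_edges:
        for row in B:
            total += weight * abs(row[k] - row[m])
    return total


# Hypothetical network: traits 0 and 1 strongly linked, traits 1 and 2 weakly linked.
weighted_edges = [(0, 1, 0.9), (1, 2, 0.1)]
B = [[1.0, 1.0, 0.0]]  # one SNP associated with traits 0 and 1 only
penalty = weighted_fusion_penalty(B, weighted_edges)
print(penalty)
```

The coefficient gap across the weak edge costs only a tenth of its size, which illustrates why edges with small weights are effectively ignored.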
Asthma Dataset • 543 severe asthma patients from the Severe Asthma Research Program (SARP) • Genotypes: 34 SNPs in the IL-4R gene • 40 kb region of chromosome 16 • Missing genotypes imputed with PHASE (Li and Stephens, 2003) • Traits: 53 asthma-related clinical traits • Quality of life: emotion, environment, activity, symptom • Family history: number of siblings with allergy, whether the father has asthma • Asthma symptoms: chest tightness, wheeziness
Asthma Trait Network • Figure: trait correlation structure and the trait network obtained by thresholding correlations at 0.7 • Traits are reordered according to hierarchical clustering results
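A minimal sketch of building a trait network by thresholding pairwise correlations at 0.7. Using the Pearson correlation and thresholding its absolute value are assumptions, and the trait names and values are made up for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (v - mean_b) for x, v in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def trait_network(traits, threshold=0.7):
    """Return edges (name1, name2, r) for trait pairs with |r| >= threshold."""
    names = list(traits)
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(traits[names[i]], traits[names[j]])
            if abs(r) >= threshold:
                edges.append((names[i], names[j], r))
    return edges


# Hypothetical measurements for five patients.
traits = {
    "fev1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "fef": [1.1, 1.9, 3.2, 3.9, 5.1],
    "quality_of_life": [3.0, 1.0, 4.0, 1.0, 5.0],
}
network = trait_network(traits)
print(network)
```

Only the two lung-function traits end up connected in this toy example; the retained correlations can later serve as edge weights.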
Asthma Trait Network • Figure: phenotype correlation structure with subnetworks for asthma symptoms, lung physiology, and quality of life
Results from Single-SNP/Trait Test • Lung physiology-related traits I • Baseline FEV1 predicted value: MPVLung • Pre FEF 25-75 predicted value • Average nitric oxide value: online • Body Mass Index • Postbronchodilation FEV1, liters: Spirometry • Baseline FEV1 % predicted: Spirometry • Baseline predrug FEV1, % predicted • Baseline predrug FEV1, % predicted • Q551R SNP • Codes for amino-acid changes in the intracellular signaling portion of the receptor • Exon 12 • Figure panels (phenotypes × SNPs): trait correlation matrix; trait network; single-marker single-trait test with permutation test at α = 0.05 and at α = 0.01
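The single-marker single-trait permutation test can be sketched as follows. The test statistic (absolute Pearson correlation between genotype and trait), the number of permutations, and the toy data are all assumptions for illustration; the slides do not specify them.

```python
import math
import random


def abs_correlation(a, b):
    """Absolute Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (v - mean_b) for x, v in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return abs(cov) / math.sqrt(var_a * var_b)


def permutation_p_value(genotype, trait, n_perm=999, seed=0):
    """Estimate a p-value by shuffling trait values across individuals."""
    rng = random.Random(seed)
    observed = abs_correlation(genotype, trait)
    shuffled = list(trait)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any genotype-trait link
        if abs_correlation(genotype, shuffled) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical SNP (minor-allele counts) and trait values for ten individuals.
genotype = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
trait = [1.0, 1.2, 0.9, 2.1, 1.9, 2.0, 2.2, 3.1, 2.9, 3.0]
p_value = permutation_p_value(genotype, trait)
print(p_value)
```

An association is declared when the p-value falls below the chosen level, such as the α = 0.05 or α = 0.01 thresholds shown in the figure panels.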
Comparison of GFlasso with Others • Lung physiology-related traits I • Baseline FEV1 predicted value: MPVLung • Pre FEF 25-75 predicted value • Average nitric oxide value: online • Body Mass Index • Postbronchodilation FEV1, liters: Spirometry • Baseline FEV1 % predicted: Spirometry • Baseline predrug FEV1, % predicted • Baseline predrug FEV1, % predicted • Q551R SNP • Codes for amino-acid changes in the intracellular signaling portion of the receptor • Exon 12 • Figure panels (phenotypes × SNPs): trait correlation matrix; trait network; single-marker single-trait test; lasso; graph-constrained fused lasso; graph-weighted fused lasso
Software for Genome-Phenome Association • Screenshot: SNP file and gene expression file as inputs; gene module 1; chromosome 12, positions 10600000 to 10700000
Future Work: Correlated Genome-Transcriptome-Phenome Association Analysis • Three-way association! • Genome structure: linkage disequilibrium, population structure • Transcriptome structure: gene modules • Phenome structure: clinical traits • Methods: bi-clustering, GFlasso, tree lasso, population lasso
Thanks! • Software is available at http://sailing.cs.cmu.edu/gflasso • Acknowledgements: Ross Curtis, Kyung-Ah Sohn, Sally Wenzel Funding:
References • Tibshirani R (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58:267–288. • Weller J, Wiggans G, Vanraden P, Ron M (1996) Application of a canonical transformation to detection of quantitative trait loci with the aid of genetic markers in a multi-trait experiment. Theoretical and Applied Genetics 92:998–1002. • Mangin B, Thoquet B, Grimsley N (1998) Pleiotropic QTL analysis. Biometrics 54:89–99. • Chen Y, Zhu J, Lum P, Yang X, Pinto S, et al. (2008) Variations in DNA elucidate molecular networks that cause disease. Nature 452:429–435. • Lee SI, Dudley A, Drubin D, Silver P, Krogan N, et al. (2009) Learning a prior on regulatory potential from eQTL data. PLoS Genetics 5:e1000358. • Emilsson V, Thorleifsson G, Zhang B, Leonardson A, Zink F, et al. (2008) Genetics of gene expression and its effect on disease. Nature 452:423–428.
Multiple-Trait Association: Dependencies in Phenome • Traditional approach: a causal SNP (ACGTTTTACTGTACAATT) associated with a univariate phenotype, e.g., disease/control status or gene expression level • Association with phenome: a causal SNP associated with a multivariate phenotype, e.g., a complex syndrome such as asthma (age at onset, history of eczema) or a genome-wide expression profile
Multiple-Trait Association: Graph-Constrained Fused Lasso • Step 1: Thresholded correlation graph of phenotypes • Step 2: Graph-constrained fused lasso, which combines a lasso penalty with a graph-constrained fusion penalty
Multiple-Trait Association: Graph-Weighted Fused Lasso • Step 1: Thresholded correlation graph of phenotypes with weights • Step 2: Graph-weighted fused lasso, which combines a lasso penalty with a graph-weighted fusion penalty
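The two steps can be tied together by an objective that scores any candidate coefficient table. This sketch assumes hypothetical regularization parameters `lam` and `gamma`, a table `B[j][k]` of SNP-by-trait association strengths, and edges given as `(k, m, weight)`; setting every weight to 1 gives the graph-constrained variant.

```python
def gflasso_objective(X, Y, B, weighted_edges, lam, gamma):
    """Squared-error loss + lasso penalty + (weighted) fusion penalty."""
    n, p, q = len(X), len(B), len(B[0])
    loss = 0.0
    for i in range(n):
        for k in range(q):
            prediction = sum(X[i][j] * B[j][k] for j in range(p))
            loss += (Y[i][k] - prediction) ** 2
    lasso = sum(abs(value) for row in B for value in row)
    fusion = sum(w * abs(row[k] - row[m]) for (k, m, w) in weighted_edges for row in B)
    return loss + lam * lasso + gamma * fusion


# Toy data (hypothetical): one SNP that affects two connected traits equally.
X = [[1.0], [0.0], [-1.0]]
Y = [[1.0, 1.0], [0.0, 0.0], [-1.0, -1.0]]
edges = [(0, 1, 1.0)]

shared = [[1.0, 1.0]]  # SNP associated with both traits
split = [[1.0, 0.0]]   # SNP associated with trait 0 only

print(gflasso_objective(X, Y, shared, edges, lam=0.5, gamma=1.0))
print(gflasso_objective(X, Y, split, edges, lam=0.5, gamma=1.0))
```

The shared-effect table scores lower because it fits both traits and pays no fusion cost, which is the kind of solution the penalty is meant to favour.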
Estimating Parameters (Association Strength) • Quadratic programming formulation • Graph-constrained fused lasso • Graph-weighted fused lasso • Many publicly available software packages for solving convex optimization problems can be used
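One standard way to cast the graph-constrained problem as a quadratic program is to replace each absolute value with an auxiliary variable bounded by linear constraints. The slides do not show the exact formulation, so the following is a sketch under assumptions: $\xi_{jk}$ and $\eta_{jkm}$ are auxiliary variables, $\lambda$ and $\gamma$ are regularization parameters, and $E$ is the edge set of the trait network.

```latex
\begin{aligned}
\min_{B,\,\xi,\,\eta}\quad
  & \sum_{k} (y_k - X\beta_k)^T (y_k - X\beta_k)
    + \lambda \sum_{j,k} \xi_{jk}
    + \gamma \sum_{(k,m) \in E} \sum_{j} \eta_{jkm} \\
\text{subject to}\quad
  & -\xi_{jk} \le \beta_{jk} \le \xi_{jk}, \\
  & -\eta_{jkm} \le \beta_{jk} - \beta_{jm} \le \eta_{jkm}.
\end{aligned}
```

The objective is quadratic in $B$ and linear in the auxiliary variables, and all constraints are linear, so a general-purpose convex solver applies. For the graph-weighted variant, each $\eta_{jkm}$ term would be multiplied by the weight of edge $(k, m)$.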
Simulation Results • 50 SNPs taken from HapMap chromosome 7, CEU population • 10 traits • Figure panels (phenotypes × SNPs): trait correlation matrix; thresholded trait correlation network; true regression coefficients; single SNP-single trait test; ridge regression; lasso; graph-constrained fused lasso; graph-weighted fused lasso
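A simulation of this shape can be sketched as below. The genotypes here are random minor-allele counts rather than real HapMap SNPs, and the sample size, causal SNP, affected traits, effect size, and noise level are all made-up values for illustration.

```python
import random


def simulate(n_samples=100, n_snps=50, n_traits=10, causal_snp=3,
             affected_traits=(0, 1, 2), effect=1.0, noise_sd=0.5, seed=0):
    """Return genotypes X, traits Y, and the true coefficient table B."""
    rng = random.Random(seed)
    # Independent random genotypes; real SNPs would also show linkage disequilibrium.
    X = [[rng.choice((0, 1, 2)) for _ in range(n_snps)] for _ in range(n_samples)]
    # One causal SNP influences a whole group of traits (a pleiotropic effect).
    B = [[0.0] * n_traits for _ in range(n_snps)]
    for k in affected_traits:
        B[causal_snp][k] = effect
    # Each trait is the genetic signal plus independent Gaussian noise.
    Y = [[sum(X[i][j] * B[j][k] for j in range(n_snps)) + rng.gauss(0.0, noise_sd)
          for k in range(n_traits)]
         for i in range(n_samples)]
    return X, Y, B


X, Y, B = simulate()
print(len(X), len(X[0]), len(Y[0]))
```

Because the true coefficients are known, estimates from each method can be compared against `B` directly, as in the figure panels.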
Results from Association • Lung physiology-related traits II • Percent difference in FEV1: Spirometry • Post FEF 25-75 value • Postbronchodilation FEV1, % pred: Spirometry • Baseline FEV1, liters: Spirometry • Baseline predrug FEV1, liters • Maximum FEV1, liters: MPVLung • Baseline predrug FEV1, liters • Figure panels (phenotypes × SNPs): trait correlation matrix; trait network; single-marker single-trait test; lasso; graph-constrained fused lasso; graph-weighted fused lasso
Linkage Disequilibrium Structure in IL-4R Gene • Figure: LD plot highlighting SNPs rs3024622, rs3024660, and Q551R, with labeled $r^2$ values of 0.07 and 0.64
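For readers unfamiliar with the $r^2$ values in this figure, the sketch below computes the standard LD measure $r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B))$ from phased haplotypes. The 0/1 allele coding and the toy haplotypes are assumptions; this is not the pipeline used to produce the figure.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes [(a, b), ...], alleles 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


perfect_ld = [(1, 1), (1, 1), (0, 0), (0, 0)]  # alleles always travel together
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)]       # alleles are independent

print(ld_r_squared(perfect_ld))
print(ld_r_squared(no_ld))
```

Values near 1 mean two SNPs carry almost the same information, so an association signal at one is hard to separate from the other.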
Conclusions • Summary • Dependencies in phenome: the graph-guided fused lasso framework incorporates correlation information among traits to detect pleiotropic effects of genotypic variations • Analysis of the asthma dataset suggests the effectiveness of the method • Future Work • Dependencies in genome: Poster Q06 (this evening) • Dependencies in both genome and phenome • Learn the trait correlation network and association strengths jointly • Availability: http://www.sailing.cs.cmu.edu/