Graph Regularized Dual Lasso for Robust eQTL Mapping. Wei Cheng 1 Xiang Zhang 2 Zhishan Guo 1 Yu Shi 3 Wei Wang 4 1 University of North Carolina at Chapel Hill, 2 Case Western Reserve University, 3 University of Science and Technology of China, 4 University of California, Los Angeles.
Graph Regularized Dual Lasso for Robust eQTL Mapping. Wei Cheng 1, Xiang Zhang 2, Zhishan Guo 1, Yu Shi 3, Wei Wang 4. 1 University of North Carolina at Chapel Hill, 2 Case Western Reserve University, 3 University of Science and Technology of China, 4 University of California, Los Angeles. Speaker: Wei Cheng. The 22nd Annual International Conference on Intelligent Systems for Molecular Biology (ISMB'14)
eQTL (Expression QTL) • Goal: Identify genomic locations where genotype significantly affects gene expression.
Statistical Test • Partition individuals into groups according to their genotype at a SNP • Perform a statistical test (t-test, ANOVA) on the groups' gene expression levels • Repeat for each SNP [Figure: binary SNP matrix X (SNPs by individuals), gene expression levels Z, and gene expression level plotted against genotype (0 or 1) at SNP1]
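The per-SNP test above can be sketched in a few lines. This is a minimal illustration, assuming binary genotypes and Welch's t statistic as the test; the function names `welch_t` and `scan_snps` and the toy data are hypothetical and do not come from the slides.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two samples; None if the standard error is zero."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0.0:
        return None
    return (mean(group_a) - mean(group_b)) / se


def scan_snps(snps, expression):
    """Return one t statistic per SNP for a single gene's expression vector.

    snps: list of rows, each a list of 0/1 genotypes (one entry per individual).
    expression: expression levels of one gene, in the same individual order.
    """
    stats = []
    for genotypes in snps:
        # Partition individuals by genotype at this SNP.
        g0 = [z for g, z in zip(genotypes, expression) if g == 0]
        g1 = [z for g, z in zip(genotypes, expression) if g == 1]
        # A sample variance needs at least two individuals per group.
        if len(g0) < 2 or len(g1) < 2:
            stats.append(None)
        else:
            stats.append(welch_t(g1, g0))
    return stats


# Toy data, made up for illustration: SNP 0 separates low from high expression.
snps = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
expression = [4.0, 5.0, 4.5, 5.5, 9.0, 10.0, 9.5, 10.5]
t_stats = scan_snps(snps, expression)
```

Turning each statistic into a p-value and correcting for the number of SNPs tested is left out of the sketch.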
Lasso-based feature selection • X: the SNP matrix (each row is one SNP) • Z: the gene expression matrix (each row is one gene expression level) • Objective:
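The objective on this slide was an image and is not preserved in the text. For orientation, the standard multi-response Lasso objective has the form below. The notation is an assumption: W is the gene-by-SNP coefficient matrix (the symbol W appears on a later slide), and η is an assumed name for the sparsity weight.

```latex
\min_{W} \; \frac{1}{2} \lVert Z - W X \rVert_F^2 \; + \; \eta \lVert W \rVert_1
```

With X of size (SNPs by individuals) and Z of size (genes by individuals), W has one row per gene and one column per SNP, and a nonzero entry W_ij marks SNP j as associated with gene i.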
Incorporating prior knowledge • SNPs (and genes) are usually not independent • The interplay among SNPs and the interplay among genes can be represented as networks and used as prior knowledge • Prior knowledge: genetic interaction network, PPI network, gene co-expression network, etc. • Existing methods that use such knowledge: group lasso, multi-task lasso, SIOL, MTLasso2G.
Limitations of current methods • A clustering step is usually needed to obtain the grouping information. • They do not account for the incompleteness of the prior knowledge or the noise in it • E.g., PPI networks may contain many false interactions and miss true interactions • Other prior knowledge, such as location and gene pathway information, is not considered.
Motivation • Examples of prior knowledge: SNP-SNP interactions represented by a genetic interaction network S, and gene-gene interactions represented by a PPI network (or gene co-expression network) G. W is the matrix of regression coefficients to be learned.
GD-Lasso: Graph-regularized Dual Lasso • Objective, with three annotated parts: • The Lasso objective considering confounding factors (L), where ||L||* is the nuclear norm that keeps L low-rank • The graph regularizer • The fitting constraint for prior knowledge
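The equation itself did not survive on this slide. A form consistent with the three annotated parts is sketched below. It is a reconstruction rather than a quotation: the weights η, λ, γ, β, α, ρ and the degree matrices D_S and D_G are assumed notation, and S_0 and G_0 denote the prior networks.

```latex
\min_{W,\, L,\, S \ge 0,\, G \ge 0} \;
  \underbrace{\tfrac{1}{2} \lVert Z - W X - L \rVert_F^2
    + \eta \lVert W \rVert_1 + \lambda \lVert L \rVert_*}_{\text{Lasso with confounders}}
  \; + \;
  \underbrace{\gamma \, \mathrm{tr}\!\left[ W (D_S - S) W^{\top} \right]
    + \beta \, \mathrm{tr}\!\left[ W^{\top} (D_G - G) W \right]}_{\text{graph regularizer}}
  \; + \;
  \underbrace{\alpha \lVert S - S_0 \rVert_F^2
    + \rho \lVert G - G_0 \rVert_F^2}_{\text{fit to prior knowledge}}
```

Here D_S - S and D_G - G are the graph Laplacians of the SNP and gene networks. The trace terms penalize connected SNPs (columns of W) and connected genes (rows of W) for having dissimilar coefficients, while the last two terms keep the refined networks close to the priors.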
GGD-Lasso: Generalized Graph-regularized Dual Lasso • Further incorporating location and pathway information. • Objective: D(·, ·) is a nonnegative distance measure.
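The objective on this slide is also missing from the text. One way to write a generalized form, assuming the loss, sparsity, low-rank and prior-fit terms are those of GD-Lasso and only the graph regularizer is replaced by the distance measure D, is sketched below. All weight symbols, and the notation w_{*i} for column i and w_{i*} for row i of W, are assumptions.

```latex
\min_{W,\, L,\, S \ge 0,\, G \ge 0} \;
  \tfrac{1}{2} \lVert Z - W X - L \rVert_F^2
  + \eta \lVert W \rVert_1 + \lambda \lVert L \rVert_*
  + \gamma \sum_{i,j} D(w_{*i}, w_{*j}) \, S_{ij}
  + \beta \sum_{i,j} D(w_{i*}, w_{j*}) \, G_{ij}
  + \alpha \lVert S - S_0 \rVert_F^2
  + \rho \lVert G - G_0 \rVert_F^2
```

When D is the squared Euclidean distance and S is symmetric, the sum over S_ij D(w_{*i}, w_{*j}) equals 2 tr[W (D_S - S) W^T], so a Laplacian trace regularizer is recovered as a special case up to a constant factor.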
GGD-Lasso: Optimization • Executes the following two steps iteratively until the termination condition is met: • 1) update W while fixing S and G; • 2) update S and G according to W, while decreasing: • and • We can maintain a fixed number of edges in S and G. E.g., to update G, we can swap edge (i’, j’) and edge (i,j) when • Further integrate location and pathway information
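The edge-swap idea in step 2 can be sketched as follows. The swap condition is missing from the slide text, so the criterion used here is an assumption: replace the existing edge whose endpoints have the most dissimilar coefficient vectors with the non-edge whose endpoints are most similar, but only if that lowers the regularizer. The names `sq_dist` and `swap_edges_once` are hypothetical, and the sketch ignores the term that keeps G close to its prior.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two coefficient vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def swap_edges_once(W, G, dist=sq_dist):
    """Try one greedy edge swap on a symmetric 0/1 adjacency matrix G.

    W: list of rows, one coefficient vector per gene.
    G: list of lists, modified in place; the number of edges never changes.
    Returns True if a swap was made, False otherwise.
    """
    n = len(G)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    edges = [p for p in pairs if G[p[0]][p[1]] == 1]
    non_edges = [p for p in pairs if G[p[0]][p[1]] == 0]
    if not edges or not non_edges:
        return False
    # Existing edge that contributes most to the regularizer.
    worst = max(edges, key=lambda p: dist(W[p[0]], W[p[1]]))
    # Candidate edge that would contribute least.
    best = min(non_edges, key=lambda p: dist(W[p[0]], W[p[1]]))
    # Assumed criterion: swap only if the regularizer strictly decreases.
    if dist(W[best[0]], W[best[1]]) < dist(W[worst[0]], W[worst[1]]):
        G[worst[0]][worst[1]] = G[worst[1]][worst[0]] = 0
        G[best[0]][best[1]] = G[best[1]][best[0]] = 1
        return True
    return False


# Toy example, made up: genes 0 and 1 have similar coefficients, gene 2 differs.
W = [[1.0, 0.0], [1.0, 0.1], [0.0, 5.0]]
G = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]  # prior network links genes 0 and 2
first_swap = swap_edges_once(W, G)
second_swap = swap_edges_once(W, G)
```

The same routine applies to the SNP network S if the columns of W are passed in place of the rows.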
Experimental Study: simulation • 10 gene expression profiles are generated by ~ ~ ~
Experimental Study: simulation The ROC curve. The black solid line denotes what random guessing would have achieved.
Experimental Study: simulation AUCs of Lasso, LORS, G-Lasso and GD-Lasso. In each panel, we vary the percentage of noise in the prior networks S0 and G0.
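For readers unfamiliar with the metric, a minimal sketch of how an AUC can be computed is given below. It assumes each candidate SNP-gene pair receives a score (for example, the absolute value of a learned coefficient) and a 0/1 label saying whether the association is truly present in the simulation; the function name `auc` and the toy values are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive is scored above a
    randomly chosen negative, with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores and labels, made up for illustration.
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
# A scorer that cannot distinguish anything matches random guessing.
chance_auc = auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to the random-guessing diagonal of the ROC plot, and 1.0 to a perfect ranking.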
Experimental Study: Yeast • Yeast eQTL dataset • 112 yeast segregants generated from a cross of two inbred strains, BY and RM • Preprocessing: remove SNP markers whose fraction of NAs exceeds 0.1 (the remaining incomplete SNPs are imputed), merge markers with identical genotypes, and drop genes with missing values • Result: 1017 SNP markers and 4474 expression profiles • Prior networks: genetic interaction network (S) and PPI network (G)
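The marker preprocessing described above can be sketched as follows. The slides do not say how missing genotypes are imputed, so filling with the marker's most common observed genotype is an assumption, as are the function name `preprocess_markers` and the toy markers.

```python
from collections import Counter


def preprocess_markers(markers, max_na_frac=0.1):
    """Filter, impute and merge SNP markers.

    markers: dict mapping marker name to a list of genotypes (0/1, None for NA).
    Returns a dict mapping each distinct genotype tuple to the marker names
    that share it.
    """
    merged = {}
    for name, genotypes in markers.items():
        na_frac = sum(g is None for g in genotypes) / len(genotypes)
        # Drop markers whose fraction of missing calls exceeds the threshold.
        if na_frac > max_na_frac:
            continue
        observed = [g for g in genotypes if g is not None]
        # Assumed imputation rule: most common observed genotype of the marker.
        fill = Counter(observed).most_common(1)[0][0]
        completed = tuple(fill if g is None else g for g in genotypes)
        # Markers with identical genotypes collapse into one entry.
        merged.setdefault(completed, []).append(name)
    return merged


# Toy markers over 10 individuals, made up for illustration.
markers = {
    "m1": [0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "m2": [0, 0, 0, 1, 1, 1, 1, 1, 1, None],         # 10% NA: kept and imputed
    "m3": [None, None, 0, 1, 0, 1, 0, 1, 0, 1],      # 20% NA: removed
    "m4": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
}
merged = preprocess_markers(markers)
```

In this toy run m2 is imputed to the same genotype vector as m1, so the two are merged and two distinct markers remain.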
Experimental Study: Yeast • cis-enrichment analysis: • (1) a one-tailed Mann-Whitney test on each SNP for cis hypotheses; • (2) a paired Wilcoxon signed-rank test on the p-values obtained from (1). • trans-enrichment analysis: • A similar strategy, in which genes regulated by transcription factors (TFs) are used as trans-acting signals.
Experimental Study: Yeast Pairwise comparison of different models using cis-enrichment and trans-enrichment analysis
Experimental Study: Yeast Summary of the top-15 hotspots detected by GGD-Lasso. Hotspot (12) in bold cannot be detected by G-Lasso. Hotspot (6) in italic cannot be detected by SIOL. Hotspot (3) in teletype cannot be detected by LORS.
Experimental Study: Yeast Hotspots detected by different methods
Conclusion • In this paper… • We propose novel and robust graph-regularized regression models that take into account the prior networks of SNPs and genes simultaneously. • Exploiting the duality between the learned coefficients and the incomplete prior networks enables a more robust model. • We also generalize our model to integrate other types of information, such as location and gene pathway information.
Thank You ! Questions? Travel funding to ISMB 2014 was generously provided by DOE