
An Efficient Algorithm for a Class of Fused Lasso Problems

This paper presents an efficient algorithm for solving a class of Fused Lasso problems, which arise in sparse learning for feature selection. The proposed algorithm pairs accelerated gradient descent for the smooth loss with a fast solver for the non-smooth, non-separable Fused Lasso penalty, making it particularly well suited to genomic data analysis. The presentation outlines the Fused Lasso penalty and its applications to arrayCGH data and to unordered data such as leukaemia samples. By employing a novel subgradient finding algorithm with a restart technique, the approach achieves significant speedups over existing methods. It concludes with directions for future work, including extending the algorithm to multi-dimensional Fused Lasso problems and to time-varying network learning.


Presentation Transcript


  1. An Efficient Algorithm for a Class of Fused Lasso Problems. Jun Liu, Lei Yuan, and Jieping Ye. Computer Science and Engineering, The Biodesign Institute, Arizona State University.

  2. Sparse Learning for Feature Selection: min_x loss(x) + penalty(x), where the data form a matrix of data points by features, loss(x) is a convex function defined on a set of training samples, and penalty(x) encodes our prior assumption on the parameter x.

  3. Outline Fused Lasso and Applications Proposed Algorithm Experiments Conclusion

  4. The Fused Lasso Penalty (Tibshirani et al., 2005; Tibshirani and Wang, 2008; Friedman et al., 2007). The Lasso penalty is λ1 Σ_i |x_i|; the Fused Lasso penalty adds a fusion term, λ1 Σ_i |x_i| + λ2 Σ_i |x_{i+1} − x_i|, giving a solution that is sparse in both the parameters and their successive differences.
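
In NumPy terms, the penalty is straightforward to write down; this is a minimal sketch (the function name and signature are illustrative, not from the paper):

```python
import numpy as np

def fused_lasso_penalty(x, lam1, lam2):
    """Fused Lasso penalty: lam1 * ||x||_1 + lam2 * sum_i |x[i+1] - x[i]|.

    The first term (the Lasso) encourages sparsity in the parameters;
    the second (the fusion term) encourages sparsity in successive
    differences, i.e., a piecewise-constant solution.
    """
    return lam1 * np.abs(x).sum() + lam2 * np.abs(np.diff(x)).sum()
```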

  5. ArrayCGH Data. arrayCGH: array-based Comparative Genomic Hybridization (Tibshirani and Wang, 2008). ratio = (# DNA copies of the gene in the tumor cells) / (# DNA copies in the reference cells). Copy number changes have a piecewise-constant shape; copy number alterations include: 1. large chromosome segment gain/loss; 2. abrupt local amplification/deletion.

  6. Unordered Data. Leukaemia data (Tibshirani et al., 2005): 7,129 genes and 38 samples, 27 in class 1 (acute lymphocytic leukaemia) and 11 in class 2 (acute myelogenous leukaemia). Hierarchical clustering can be used to estimate an ordering of the genes, putting correlated genes near one another in the list (Tibshirani et al., 2005), as sketched below.
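
A sketch of that ordering step, assuming a gene-expression matrix X of shape (genes, samples); the average-linkage and correlation-distance choices are assumptions here, since the slide does not specify them:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

def estimate_gene_ordering(X):
    """Return an ordering of genes that places correlated genes nearby.

    X: (n_genes, n_samples) expression matrix. The ordering is the leaf
    order of a hierarchical-clustering dendrogram built on correlation
    distance between gene profiles.
    """
    Z = linkage(X, method="average", metric="correlation")
    return leaves_list(Z)

# Usage: reorder the rows before applying the fusion penalty along genes.
# X_ordered = X[estimate_gene_ordering(X)]
```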

  7. Outline Fused Lasso and Applications Proposed Algorithm Experiments Conclusion

  8. The Fused Lasso Penalized Problem. Smooth convex loss functions include the least squares loss (Tibshirani et al., 2005) and the logistic loss (Ahmed and Xing, 2009); the penalty itself is non-smooth and non-separable. The standard approach is a smooth reformulation (auxiliary variables and constraints) handed to a general solver (e.g., CVX), as sketched below. "One difficulty in using the fused lasso is computational speed, …, speed could become a practical limitation. This is especially true if five or tenfold cross-validation is carried out." (Tibshirani et al., 2005)
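
A CVXPY analogue of that baseline (CVXPY standing in here for CVX; the function name and the least-squares loss are illustrative):

```python
import cvxpy as cp

def fused_lasso_baseline(A, b, lam1, lam2):
    """Baseline: hand the non-smooth problem directly to a general solver.

    Solves min_x 0.5*||Ax - b||^2 + lam1*||x||_1
                 + lam2*sum_i |x[i+1] - x[i]|.
    """
    x = cp.Variable(A.shape[1])
    objective = cp.Minimize(
        0.5 * cp.sum_squares(A @ x - b)
        + lam1 * cp.norm1(x)
        + lam2 * cp.norm1(cp.diff(x))
    )
    cp.Problem(objective).solve()
    return x.value
```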

  9. Efficient Fused Lasso Algorithm (EFLA). The objective is split into a smooth part (the loss) and a non-smooth part (the penalty). The smooth part is handled by accelerated gradient descent (Nesterov, 2007; Beck and Teboulle, 2009), which has a convergence rate of O(1/k^2) for k iterations; the non-smooth part is handled by the Fused Lasso Signal Approximator (FLSA, Friedman et al., 2007), which is called in each iteration of EFLA.
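
A FISTA-style sketch of this outer loop; the paper's EFLA may differ in details such as step-size selection, and flsa_prox is assumed to solve the FLSA subproblem described on the next slides:

```python
import numpy as np

def efla_sketch(grad, flsa_prox, x0, L, lam1, lam2, n_iter=100):
    """Accelerated proximal gradient where FLSA is the proximal operator.

    grad(x): gradient of the smooth loss; L: its Lipschitz constant;
    flsa_prox(v, l1, l2) solves
        argmin_x 0.5*||x - v||^2 + l1*||x||_1 + l2*sum_i |x[i+1] - x[i]|.
    """
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(n_iter):
        x_new = flsa_prox(y - grad(y) / L, lam1 / L, lam2 / L)  # FLSA step
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2                # momentum
        y = x_new + ((t - 1) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x
```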

  10. Subgradient Finding Algorithm (1). The Fused Lasso Signal Approximator (FLSA, Friedman et al., 2007) solves min_x 0.5 ||x − v||^2 + λ1 Σ_i |x_i| + λ2 Σ_i |x_{i+1} − x_i|. It suffices to solve the simplified problem with λ1 = 0: soft-thresholding its solution by λ1 yields the solution of the full problem (Friedman et al., 2007).
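
In code (a minimal sketch; fusion_solver, a solver for the λ1 = 0 case, is assumed given):

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft-thresholding: sign(v) * max(|v| - tau, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def flsa(v, lam1, lam2, fusion_solver):
    """Solve the full FLSA given a solver for the simplified problem.

    fusion_solver(v, lam2) returns
        argmin_x 0.5*||x - v||^2 + lam2*sum_i |x[i+1] - x[i]|;
    soft-thresholding that solution by lam1 gives the solution of the
    full problem (Friedman et al., 2007).
    """
    return soft_threshold(fusion_solver(v, lam2), lam1)
```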

  11. Subgradient Finding Algorithm (2). Primal (P): min_x 0.5 ||x − v||^2 + λ2 ||Rx||_1, where R is the (n−1) × n difference operator, (Rx)_i = x_{i+1} − x_i. Dual (D): min_z 0.5 ||v − R^T z||^2 subject to ||z||_∞ ≤ λ2. Primal-dual relationship: x = v − R^T z. Duality gap: λ2 ||Rx||_1 − z^T Rx ≥ 0 for any feasible z.
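
Given a dual feasible z, the primal point and the duality gap follow directly from these relationships (a sketch; variable names are illustrative):

```python
import numpy as np

def dual_to_primal(v, z):
    """Primal point x = v - R^T z, where (Rx)[i] = x[i+1] - x[i]."""
    x = v.copy()
    x[1:] -= z
    x[:-1] += z
    return x

def duality_gap(v, z, lam2):
    """Gap between (P) at x = v - R^T z and (D) at z.

    Equals lam2*||Rx||_1 - z^T Rx, which is nonnegative whenever
    ||z||_inf <= lam2 and zero exactly at the optimum.
    """
    rx = np.diff(dual_to_primal(v, z))
    return lam2 * np.abs(rx).sum() - z @ rx
```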

  12. Subgradient Finding Algorithm (3), on the dual (D). v is of dimensionality n = 10^5; its entries are drawn from the standard normal distribution. SFAG: SFA via gradient descent; SFAN: SFA via Nesterov's method; SFAR: SFA via the restart technique. SFAG and SFAN achieve duality gaps of 1e-2 and 1e-3, respectively, in 200 iterations; with the restart technique, the exact solution is reached in 8 iterations.
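
A sketch of accelerated projected gradient on (D) with a generic adaptive restart in the style of O'Donoghue and Candès; this illustrates the restart idea but is a stand-in, not the authors' specific technique:

```python
import numpy as np

def _x_from_z(v, z):
    """Primal point x = v - R^T z for the difference operator R."""
    x = v.copy()
    x[1:] -= z
    x[:-1] += z
    return x

def sfa_restart_sketch(v, lam2, n_iter=200):
    """Projected Nesterov iterations on the dual with adaptive restart.

    The dual objective f(z) = 0.5*||v - R^T z||^2 has gradient
    -R(v - R^T z), and ||R R^T||_2 <= 4, so L = 4 is a valid step
    constant. Momentum is reset whenever the gradient and the momentum
    direction align (i.e., momentum is no longer helping).
    """
    z = np.zeros(v.shape[0] - 1)
    y, t, L = z.copy(), 1.0, 4.0
    for _ in range(n_iter):
        g = -np.diff(_x_from_z(v, y))            # gradient of f at y
        z_new = np.clip(y - g / L, -lam2, lam2)  # projected gradient step
        if g @ (z_new - z) > 0:                  # restart test
            y, t = z_new.copy(), 1.0
        else:                                    # Nesterov momentum update
            t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
            y = z_new + ((t - 1) / t_new) * (z_new - z)
            t = t_new
        z = z_new
    return _x_from_z(v, z)                       # recover the primal point
```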

  13. Outline Fused Lasso and Applications Proposed Algorithm Experiments Conclusion

  14. Illustration of SFA on the primal (P) and dual (D) problems. v is of dimensionality n = 10^5, with entries drawn from the standard normal distribution; the achieved duality gap is less than 1e-12.

  15. Efficiency of SFA. We compare with the recently proposed pathFLSA (Höfling, 2009); our SFA is much more efficient (over 10 times faster in most cases).

  16. Efficiency of EFLA (Comparison with the CVX solver). EFLA: our proposed Efficient Fused Lasso Algorithm, with the efficient SFA. CVX: smooth reformulation via CVX. Data sets: • arrayCGH: 57 samples of dimensionality 2,385 • Prostate cancer: 132 samples of dimensionality 15,154 • Leukaemia (unordered): 72 samples of dimensionality 7,129 • Leukaemia (reordered via hierarchical clustering): 72 samples of dimensionality 7,129

  17. Classification Performance. • Significant performance improvement on the arrayCGH data • Comparable results on the prostate cancer data set • Hierarchical clustering can help improve performance

  18. SLEP: Sparse Learning with Efficient Projections. Liu, Ji, and Ye (2009), SLEP: A Sparse Learning Package. http://www.public.asu.edu/~jye02/Software/SLEP/

  19. Conclusion and Future Work. Contributions: an efficient algorithm for a class of fused Lasso problems; a subgradient finding algorithm with a novel restart technique. Future work: extend the algorithm to the multi-dimensional fused Lasso; apply the proposed algorithm to learning time-varying networks.

  20. Thank you!
