Mining Discrete Patterns via Binary Matrix Factorization

Jieping Ye, Arizona State University. Joint work with Baohong Shen and Shuiwang Ji. Rank-one binary matrix factorization: compression, clustering, pattern discovery.

Presentation Transcript


  1. Mining Discrete Patterns via Binary Matrix Factorization. Jieping Ye, Arizona State University. Joint work with Baohong Shen and Shuiwang Ji.

  2. Rank-One Binary Matrix Factorization: compression, clustering, pattern discovery. [Figure: a binary matrix of samples by features, shown with its dominant pattern and indicator vector.]

  3. Application I: Image Compression. [Figure: binary strings assembled into a binary matrix.]

  4. Application II: Hierarchy Construction. An example tree for 45 images from stage range 4-6, built by our algorithm.

  5. Application III: Pattern Discovery. An example tree for 45 images from stage range 4-6, built by our algorithm. Reference: M. Koyuturk, A. Grama, and N. Ramakrishnan, Compression, clustering and pattern discovery in very high dimensional discrete-attribute datasets, IEEE TKDE, 2005.

  6. Binary Rank-One Approximation: Problem Formulation
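
The formula on this slide did not survive in the transcript. As a minimal sketch, assume the usual objective for binary rank-one approximation: given a binary matrix A, find binary vectors x and y that minimize the number of entries where A differs from the outer product x y^T. The function names and the example matrix below are illustrative, not taken from the talk. The exhaustive search is only feasible for tiny inputs, which is the point of the challenges on the next slide.

```python
from itertools import product


def rank_one_error(A, x, y):
    """Count entries where A differs from the outer product x y^T.

    For binary inputs this equals the squared Frobenius norm of A - x y^T.
    """
    return sum(
        (A[i][j] - x[i] * y[j]) ** 2
        for i in range(len(A))
        for j in range(len(A[0]))
    )


def brute_force_rank_one(A):
    """Try every binary pair (x, y); cost grows as 2^(m + n)."""
    m, n = len(A), len(A[0])
    best = None
    for x in product((0, 1), repeat=m):
        for y in product((0, 1), repeat=n):
            err = rank_one_error(A, x, y)
            if best is None or err < best[0]:
                best = (err, x, y)
    return best


# Hypothetical 4 x 4 example: three rows share a pattern, one row is an outlier.
A = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
]
err, x, y = brute_force_rank_one(A)
print(err, x, y)
```

Here x plays the role of an indicator over rows and y the role of the pattern over columns, which is an interpretation assumed for this sketch.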

  7. Binary Rank-One Approximation: Challenges • Can we compute an approximate solution with a guaranteed error bound? • Can we compute it efficiently? • The problem is conjectured to be NP-hard. • The existing approach is based on iterative updating: Koyutürk, M. & Grama, A. PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets. KDD'03. • It is a heuristic, without known guarantees on approximation error. • It very often results in undesirable rank-one approximations.
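
To make the iterative-updating idea concrete, here is a rough sketch of a generic alternating heuristic. It is not PROXIMUS's actual procedure, and it assumes the unregularized mismatch-count objective. With y fixed, setting x_i = 1 lowers the error exactly when row i matches more than half of the ones in y, and symmetrically for y. The example matrix and both starting vectors are hypothetical. The second start shows how the heuristic can settle on a poor approximation.

```python
def mismatches(A, x, y):
    """Number of entries where A differs from the outer product x y^T."""
    return sum(
        A[i][j] != x[i] * y[j]
        for i in range(len(A))
        for j in range(len(A[0]))
    )


def alternating_rank_one(A, y0, max_iters=100):
    """Alternately re-optimize x for fixed y and y for fixed x."""
    m, n = len(A), len(A[0])
    x, y = [0] * m, list(y0)
    for _ in range(max_iters):
        ones_y = sum(y)
        # x_i = 1 helps iff row i covers more than half of the ones in y.
        new_x = [
            1 if 2 * sum(A[i][j] * y[j] for j in range(n)) > ones_y else 0
            for i in range(m)
        ]
        ones_x = sum(new_x)
        # y_j = 1 helps iff column j covers more than half of the ones in x.
        new_y = [
            1 if 2 * sum(A[i][j] * new_x[i] for i in range(m)) > ones_x else 0
            for j in range(n)
        ]
        if new_x == x and new_y == y:
            break  # fixed point reached; it may be only a local optimum
        x, y = new_x, new_y
    return x, y


A = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
]
good = alternating_rank_one(A, y0=A[0])  # start from a typical row
bad = alternating_rank_one(A, y0=A[2])   # start from the outlier row
print(mismatches(A, *good), mismatches(A, *bad))
```

The result depends entirely on the starting vector, which is one way to read the slide's remark about undesirable approximations.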

  8. Regularized Binary Rank-One Approximation

  9. Equivalent Reformulation. Maximum Weight Problem (MWP):
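
The MWP formula is missing from the transcript, so the following is only a sketch under an assumption: in the unregularized case, take the weight matrix to be U = 2A - 1 (entries +1 and -1). For binary x and y the approximation error then equals the number of ones in A minus x^T U y, so minimizing the error is the same as maximizing the weight x^T U y. The regularized weights used in the talk may differ. The code checks the identity exhaustively on a small hypothetical matrix.

```python
from itertools import product


def frobenius_error(A, x, y):
    """Squared Frobenius norm of A - x y^T for binary A, x, y."""
    return sum(
        (A[i][j] - x[i] * y[j]) ** 2
        for i in range(len(A))
        for j in range(len(A[0]))
    )


def weight(U, x, y):
    """The bilinear form x^T U y."""
    return sum(
        U[i][j] * x[i] * y[j]
        for i in range(len(U))
        for j in range(len(U[0]))
    )


A = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
]
# Assumed (unregularized) weights: +1 where A has a one, -1 where it has a zero.
U = [[2 * a - 1 for a in row] for row in A]
total_ones = sum(map(sum, A))

# error(x, y) == (#ones in A) - x^T U y for every binary pair (x, y).
identity_holds = all(
    frobenius_error(A, x, y) == total_ones - weight(U, x, y)
    for x in product((0, 1), repeat=len(A))
    for y in product((0, 1), repeat=len(A[0]))
)
print(identity_holds)
```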

  10. Our Main Contributions • An exact formulation for MWP, using integer linear programming. • A formulation for error-bounded approximation, using integer linear programming. • The proof of an error bound. • Efficient algorithms to solve the error-bounded approximation.

  11. Overview • This is the first polynomial time algorithm that computes an approximate solution with a guaranteed error bound. • This is the first work that explicitly connects binary matrix factorization and minimum s-t cut. [Diagram: Binary Rank-One Matrix Approximation → (reformulation) Maximum Weight Problem (MWP) → (reformulation) Integer Linear Programming (ILP1) → (error-bounded approximation) Integer Linear Programming (ILP2) → (LP relaxation) Linear Programming Relaxation of ILP2 → (reformulation) minimum s-t cut problem.]

  12. Formulation for Exact Solutions. The integer linear programming formulation is equivalent to the original formulation: • If $x_{1i} = x_{2j} = 1$, then $z_{i,j} \le 1$; with $U_{i,j} > 0$ this gives $z_{i,j} = 1$. • If one of $x_{1i}$ and $x_{2j}$ is 0, then $z_{i,j} \le 0.5$; since $z_{i,j}$ is an integer, $z_{i,j} = 0$.
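
The bounds quoted on this slide (1 when both variables are 1, 0.5 when one of them is 0) are consistent with a linear constraint of the form 2 z <= x + y. That form is an assumption here, since the slide's formulas were not preserved. Under it, and for a positive weight, the largest feasible integer z is exactly the product of the two binary variables, which is what lets a linear program represent the bilinear term. A tiny enumeration confirms this:

```python
def max_integer_z(x_i, y_j):
    """Largest z in {0, 1} satisfying the assumed constraint 2 * z <= x_i + y_j."""
    return max(z for z in (0, 1) if 2 * z <= x_i + y_j)


# With a positive weight, maximization pushes z to this upper bound,
# so z ends up equal to the product x_i * y_j in all four cases.
table = {(a, b): max_integer_z(a, b) for a in (0, 1) for b in (0, 1)}
print(table)
```

This sketch covers only the positive-weight case discussed on the slide.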

  13. Formulation for Approximate Solutions I

  14. Formulation for Approximate Solutions II. Proposition: The objective value of ILP2 is no less than that of ILP1 for the same problem instance.

  15. Approximation Error. ILP2 achieves an error-bounded approximation. [Formula labels: approximate objective, approximate bound, optimal objective.]

  16. Linear Programming Relaxation of ILP2 • Proposition: The coefficient matrix of the constraints in ILP2 is totally unimodular. Reference: I. Heller and C. B. Tompkins. An extension of a theorem of Dantzig's. Ann. of Math. Stud., no. 38, pages 247-254, 1956. • We can obtain an exact solution of ILP2 by solving its LP relaxation. • LP is still computationally expensive for a large matrix A.

  17. Overview. [Diagram: Binary Rank-One Matrix Approximation → (reformulation) Maximum Weight Problem (MWP) → (reformulation) Integer Linear Programming (ILP1) → (error-bounded approximation) Integer Linear Programming (ILP2) → (LP relaxation) Linear Programming Relaxation of ILP2 → (reformulation) minimum s-t cut problem.]

  18. Generalized Independent Set Problem (GIS). Given: • An undirected graph G = (V, E). • A nonnegative weight w(v) for each vertex v in V. • A nonnegative penalty p(e) for each edge e in E. GIS problem: find a vertex subset S in V

  19. Transform ILP2 into a GIS Problem • ILP2 defines an instance of GIS, and the corresponding graph is bipartite.

  20. Efficient Approximation • GIS is NP-hard for general graphs. However, it can be solved in polynomial time for bipartite graphs. • GIS for bipartite graphs can be solved by solving minimum s-t cuts / maximum flows. Reference: Hochbaum, D. S. & Pathria, A. Forest harvesting and minimum cuts: a new approach to handling spatial constraints. Forest Science, 1997, 43, 544-554.
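
As an illustration of the bipartite case, the sketch below solves a small GIS instance with a minimum s-t cut and checks the answer by brute force. It assumes the standard GIS objective (total weight of the chosen vertices minus the penalties of edges with both endpoints chosen) and a textbook reduction: source-to-left arcs and right-to-sink arcs carry vertex weights, left-to-right arcs carry penalties, and the optimum equals the total weight minus the cut value. The instance, the function names, and the plain Edmonds-Karp routine are illustrative choices, not the implementation from the talk, and the mapping from ILP2 to GIS is not reproduced here.

```python
from collections import deque
from itertools import product


def max_flow(capacity, source, sink):
    """Edmonds-Karp on a dict-of-dicts graph; returns the max flow value."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse arcs
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def gis_bipartite(w_left, w_right, penalties):
    """Optimal GIS value on a bipartite graph = total weight - min s-t cut."""
    capacity = {"s": {}}
    for u, w in w_left.items():
        capacity["s"][("L", u)] = w
        capacity[("L", u)] = {}
    for v, w in w_right.items():
        capacity[("R", v)] = {"t": w}
    for (u, v), p in penalties.items():
        capacity[("L", u)][("R", v)] = p
    total = sum(w_left.values()) + sum(w_right.values())
    return total - max_flow(capacity, "s", "t")  # max flow = min cut


def gis_brute_force(w_left, w_right, penalties):
    """Enumerate all vertex subsets; only for checking tiny instances."""
    left, right = list(w_left), list(w_right)
    best = 0
    for pl in product((0, 1), repeat=len(left)):
        for pr in product((0, 1), repeat=len(right)):
            sl = {u for u, k in zip(left, pl) if k}
            sr = {v for v, k in zip(right, pr) if k}
            value = sum(w_left[u] for u in sl) + sum(w_right[v] for v in sr)
            value -= sum(p for (u, v), p in penalties.items() if u in sl and v in sr)
            best = max(best, value)
    return best


# Hypothetical instance: left vertices a, b; right vertices c, d.
w_left, w_right = {"a": 3, "b": 2}, {"c": 4, "d": 1}
penalties = {("a", "c"): 2, ("a", "d"): 5, ("b", "c"): 1}
print(gis_bipartite(w_left, w_right, penalties), gis_brute_force(w_left, w_right, penalties))
```

In the cut, a left vertex is selected when it stays on the source side and a right vertex is selected when it stays on the sink side, so every cut arc corresponds either to an unselected vertex or to a penalized edge.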

  21. Experimental Evaluation: Error Bound. We present the results obtained by the minimum s-t cut (P1), the improvement obtained by iterative updating (P2), and the theoretical upper bounds.

  22. Experimental Evaluation: Error Bound. We present the results obtained by the minimum s-t cut (P1), the improvement obtained by iterative updating (P2), and the theoretical upper bounds.

  23. Experimental Evaluation: Running Time. One dimension is fixed at 1000.

  24. Conclusion. [Diagram: Binary Rank-One Matrix Approximation → (reformulation) Maximum Weight Problem (MWP) → (reformulation) Integer Linear Programming (ILP1) → (error-bounded approximation) Integer Linear Programming (ILP2) → (LP relaxation) Linear Programming Relaxation of ILP2 → (reformulation) minimum s-t cut problem.]

  25. Thank you!
