Overlapping Matrix Pattern Visualization: a Hypergraph Approach
Ruoming Jin, Kent State University
Joint work with Yang Xiang, David Fuhry, and Feodor F. Dragan (KSU)
The Problem • Given a set of discovered submatrices, how can we reorder the rows and columns of the data matrix to best display these submatrices and their relationship?
Motivation: Overlapping Bicluster Visualization
• Gene expression profiles (rows: genes, columns: conditions, matrix entries: expression levels)
• Biclustering: finding homogeneous submatrices (genes × conditions)
• Biclustering visualization problem [GMM06, KG07]
Motivation: Transactional Data Visualization
• Shopping-basket data (rows: transactions, columns: items, binary matrix)
• Transactional data summarization using a set of dense submatrices [CK07, WK06, XJFD08]
[Figure: a transaction matrix (rows t1–t8, columns i1–i9) summarized by three submatrices, {t1,t2,t7,t8}×{i1,i2,i8,i9}, {t4,t5}×{i4,i5,i6}, and {t2,t3,t6,t7}×{i2,i3,i7,i8}; each submatrix costs its number of rows plus its number of columns, so Summarization Cost = 8+8+5 = 21]
Roadmap
• Problem Definition
  • Visualization cost
  • Hardness of the visualization problem
    • Hypergraph ordering problem
    • Minimum linear arrangement (MLA)
• Algorithm
  • Leveraging MLA and local convergence
• Experimental Results
Submatrix Visualization Cost
• Given a display of the matrix (a fixed row order and column order), how can we measure how well a submatrix is “visualized”?
[Figure: two displays of the submatrix {t1,t2,t7,t8}×{i1,i2,i8,i9}: the original order (rows t1–t8, columns i1–i9) and a reordered display (rows t1 t8 t2 t7 t3 t6 t4 t5, columns i1 i2 i8 i9 i3 i7 i4 i5 i6)]
• Why is the second display intuitively better than the first?
Submatrix Visualization Cost
[Figure: the same two displays of {t1,t2,t7,t8}×{i1,i2,i8,i9}, in the original order and in the reordered order (rows t1 t8 t2 t7 t3 t6 t4 t5, columns i1 i2 i8 i9 i3 i7 i4 i5 i6)]
• Area: 8x8, 6x6, 4x4, 4x4
• Perimeter: 8+8, 6+6, 4+4, 4+4
• Given a row order and a column order, the visualization cost of a submatrix is the sum of
  • the difference between the positions of its first and last rows in the row order, and
  • the difference between the positions of its first and last columns in the column order
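The submatrix cost definition can be checked on the two displays above. This is a minimal sketch, assuming orders are lists of row or column labels, positions are 0-based indices into those lists, and "difference" means last position minus first position; the helper names `span` and `submatrix_cost` are hypothetical.

```python
def span(order, members):
    """Position of the last member minus position of the first member in `order`."""
    pos = {label: k for k, label in enumerate(order)}
    p = [pos[m] for m in members]
    return max(p) - min(p)


def submatrix_cost(row_order, col_order, rows, cols):
    """Visualization cost of the submatrix rows x cols: row span plus column span."""
    return span(row_order, rows) + span(col_order, cols)


rows = ["t1", "t2", "t7", "t8"]
cols = ["i1", "i2", "i8", "i9"]

# Display 1: the original order.
original_rows = [f"t{k}" for k in range(1, 9)]
original_cols = [f"i{k}" for k in range(1, 10)]
# Display 2: the reordered display from the slide.
reordered_rows = ["t1", "t8", "t2", "t7", "t3", "t6", "t4", "t5"]
reordered_cols = ["i1", "i2", "i8", "i9", "i3", "i7", "i4", "i5", "i6"]

print(submatrix_cost(original_rows, original_cols, rows, cols))    # 7 + 8 = 15
print(submatrix_cost(reordered_rows, reordered_cols, rows, cols))  # 3 + 3 = 6
```

The lower cost of the second display matches the intuition that its submatrix is packed into one contiguous block.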
Matrix Visualization Cost
• Given a row order, a column order, and a set of submatrices, the matrix visualization cost is the sum of the submatrices’ visualization costs.
• Matrix Optimal Visualization Problem:
  • Find the row order and column order that minimize the matrix visualization cost.
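Summing the per-submatrix costs gives the matrix cost. A minimal sketch, assuming the three submatrices of the transactional-data example and the two displays from the submatrix-cost slide; the function `matrix_cost` is a hypothetical name, and the reordered display is only an example, not a claimed optimum.

```python
def span(order, members):
    """Position of the last member minus position of the first member in `order`."""
    pos = {label: k for k, label in enumerate(order)}
    p = [pos[m] for m in members]
    return max(p) - min(p)


def matrix_cost(row_order, col_order, submatrices):
    """Sum of (row span + column span) over all (rows, cols) submatrices."""
    return sum(span(row_order, r) + span(col_order, c) for r, c in submatrices)


# The three submatrices of the transactional-data example.
submatrices = [
    (["t1", "t2", "t7", "t8"], ["i1", "i2", "i8", "i9"]),
    (["t4", "t5"], ["i4", "i5", "i6"]),
    (["t2", "t3", "t6", "t7"], ["i2", "i3", "i7", "i8"]),
]

original = ([f"t{k}" for k in range(1, 9)], [f"i{k}" for k in range(1, 10)])
reordered = (["t1", "t8", "t2", "t7", "t3", "t6", "t4", "t5"],
             ["i1", "i2", "i8", "i9", "i3", "i7", "i4", "i5", "i6"])

print(matrix_cost(*original, submatrices))   # 15 + 3 + 11 = 29
print(matrix_cost(*reordered, submatrices))  # 6 + 3 + 7 = 16
```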
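Hypergraph Ordering
• Hypergraph HG=(V,X)
  • V is the set of vertices
  • X={x1,x2,…} is the set of hyperedges, where each hyperedge is a subset of V
• Hyperedge cost and hypergraph cost
  • Given a vertex order π, the cost of a hyperedge is the difference between the positions of its first and last vertices in π
  • The hypergraph cost is the sum of all hyperedge costs
• Hypergraph Ordering Problem: find a vertex order π that minimizes the hypergraph cost
[Figure: vertices in the order 0 1 2 3 4 5 6; hyperedge {0,2,3,4} has cost 4, hyperedge {1,3,5} has cost 4, and the whole hypergraph has cost 16]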
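The hyperedge and hypergraph costs can be sketched directly from these definitions. A minimal sketch, assuming an order is a list of vertices; only the two hyperedges named on the slide are used, so the total here is not the figure's 16 (its other hyperedges are not recoverable).

```python
def hyperedge_cost(order, hyperedge):
    """Position of the hyperedge's last vertex minus that of its first vertex."""
    pos = {v: k for k, v in enumerate(order)}
    p = [pos[v] for v in hyperedge]
    return max(p) - min(p)


def hypergraph_cost(order, hyperedges):
    """Sum of all hyperedge costs under `order`."""
    return sum(hyperedge_cost(order, x) for x in hyperedges)


order = list(range(7))  # vertices 0..6 in their natural order
print(hyperedge_cost(order, {0, 2, 3, 4}))                # 4
print(hyperedge_cost(order, {1, 3, 5}))                   # 4
print(hypergraph_cost(order, [{0, 2, 3, 4}, {1, 3, 5}]))  # 8
```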
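The Link between Matrix Visualization and Hypergraph Ordering
[Figure: column hypergraph HG1 on vertices i1–i9 and row hypergraph HG2 on vertices t1–t8, built from the example submatrices]
• Relationship between matrix visualization cost and hypergraph cost: each submatrix contributes its row set as a hyperedge of the row hypergraph and its column set as a hyperedge of the column hypergraph, so the matrix visualization cost equals the row-hypergraph cost w.r.t. the row order plus the column-hypergraph cost w.r.t. the column order
• Finding the minimum visualization (or hypergraph) cost is NP-hard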
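The decomposition can be checked numerically. A minimal sketch, assuming the three submatrices of the transactional-data example and the reordered display from the submatrix-cost slide; `hypergraph_cost` is a hypothetical helper implementing the hyperedge-span definition.

```python
def hypergraph_cost(order, hyperedges):
    """Sum over hyperedges of (last position - first position) in `order`."""
    pos = {v: k for k, v in enumerate(order)}
    return sum(max(pos[v] for v in x) - min(pos[v] for v in x) for x in hyperedges)


submatrices = [
    (["t1", "t2", "t7", "t8"], ["i1", "i2", "i8", "i9"]),
    (["t4", "t5"], ["i4", "i5", "i6"]),
    (["t2", "t3", "t6", "t7"], ["i2", "i3", "i7", "i8"]),
]

# One hyperedge per submatrix in each hypergraph.
row_hypergraph = [set(rows) for rows, _ in submatrices]  # HG2, vertices t1..t8
col_hypergraph = [set(cols) for _, cols in submatrices]  # HG1, vertices i1..i9

row_order = ["t1", "t8", "t2", "t7", "t3", "t6", "t4", "t5"]
col_order = ["i1", "i2", "i8", "i9", "i3", "i7", "i4", "i5", "i6"]

row_cost = hypergraph_cost(row_order, row_hypergraph)  # 3 + 1 + 3
col_cost = hypergraph_cost(col_order, col_hypergraph)  # 3 + 2 + 4
print(row_cost, col_cost, row_cost + col_cost)         # 7 9 16
```

Since the row terms depend only on the row order and the column terms only on the column order, the two orders can be optimized independently, each as one hypergraph ordering problem.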
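Hypergraph Ordering Problem is the Generalization of MLA
• Graph cost w.r.t. a vertex order: the sum, over all edges, of the distance between the two endpoints in the order
• MLA (Minimum Linear Arrangement): find a vertex order that minimizes the graph cost
• A graph is a hypergraph in which every hyperedge has exactly two vertices; its hyperedge costs are then edge lengths, so MLA is the special case of hypergraph ordering
[Figure: the same graph on vertices 0–6 under two orders. Order 0 1 2 3 4 5 6: graph cost = 2+2+2*1+1+4+3+2 = 16. Order 0 1 2 5 4 3 6: graph cost = 2+4+2*3+4+2+1+1 = 18]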
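Graph cost and MLA can be sketched as follows. A minimal illustration, assuming a made-up path graph 0-2-1-3; `brute_force_mla` is a hypothetical exact solver that tries every order, so it is exponential and only useful for tiny graphs (practical MLA solvers are heuristics or approximation algorithms).

```python
from itertools import permutations


def graph_cost(order, edges):
    """Sum over edges of the distance between the two endpoints in `order`."""
    pos = {v: k for k, v in enumerate(order)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


def brute_force_mla(vertices, edges):
    """Exact MLA by exhaustive search over all n! orders."""
    return min(permutations(vertices), key=lambda order: graph_cost(order, edges))


# Hypothetical example: the path 0-2-1-3, drawn in the natural order.
edges = [(0, 2), (2, 1), (1, 3)]
print(graph_cost([0, 1, 2, 3], edges))  # 2 + 1 + 2 = 5
best = brute_force_mla(range(4), edges)
print(best, graph_cost(best, edges))    # (0, 2, 1, 3) 3
```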
Basic Idea for Hypergraph Ordering
• There is much existing work on the MLA problem (heuristics and bounded-ratio approximation algorithms)
• Instead of solving the hypergraph ordering problem from scratch, can we leverage existing MLA algorithms?
• The answer is YES!
Basic Procedure
Given the hypergraph HG=(V,X), start with a random vertex order π:
• Step 1: Transform the hypergraph HG into a graph G=(V,E) based on the vertex order π, such that
  • cost(HG, π) = cost(G, π)
• Step 2: Run an MLA algorithm on G to produce a new vertex order π′, so that
  • cost(G, π) ≥ cost(G, π′)
• Step 3: If the new order improves the hypergraph cost, i.e., cost(HG, π) > cost(HG, π′), adopt π′ as the new order (π = π′) and repeat from Step 1.
  • For every order, cost(G, π′) ≥ cost(HG, π′)
• Hence cost(HG, π) = cost(G, π) ≥ cost(G, π′) ≥ cost(HG, π′)
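The three steps can be sketched as a loop. This is a minimal sketch, not the authors' implementation: `exhaustive_mla` is a hypothetical stand-in for whatever MLA algorithm is plugged into Step 2 (it tries every order, so it only works for a handful of vertices), the hypergraph is made up, and the start order is fixed rather than random so the run is reproducible.

```python
from itertools import permutations


def positions(order):
    """Map each vertex to its index in the order."""
    return {v: k for k, v in enumerate(order)}


def hypergraph_cost(order, hyperedges):
    """Sum over hyperedges of (last position - first position)."""
    pos = positions(order)
    return sum(max(pos[v] for v in x) - min(pos[v] for v in x) for x in hyperedges)


def graph_cost(order, edges):
    """Sum over edges of the distance between their endpoints."""
    pos = positions(order)
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


def to_path_graph(order, hyperedges):
    """Step 1: replace every hyperedge by a path through its vertices sorted by `order`."""
    pos = positions(order)
    edges = []
    for x in hyperedges:
        chain = sorted(x, key=pos.__getitem__)
        edges.extend(zip(chain, chain[1:]))
    return edges


def exhaustive_mla(vertices, edges):
    """Hypothetical MLA solver: exact but O(n!), so only for toy inputs."""
    return list(min(permutations(vertices), key=lambda o: graph_cost(o, edges)))


def order_hypergraph(vertices, hyperedges, order, mla=exhaustive_mla):
    """Repeat Steps 1-3 until the hypergraph cost stops decreasing."""
    cost = hypergraph_cost(order, hyperedges)
    while True:
        edges = to_path_graph(order, hyperedges)           # Step 1: cost(G, pi) == cost(HG, pi)
        new_order = mla(vertices, edges)                   # Step 2: cost(G, pi') <= cost(G, pi)
        new_cost = hypergraph_cost(new_order, hyperedges)  # <= cost(G, pi')
        if new_cost >= cost:                               # Step 3: accept pi' only if it improves
            return order, cost
        order, cost = new_order, new_cost


# Hypothetical hypergraph on vertices 0..6.
vertices = list(range(7))
hyperedges = [{0, 2, 3, 4}, {1, 3, 5}, {2, 6}, {0, 5, 6}]
start = [6, 0, 3, 1, 5, 2, 4]
best_order, best_cost = order_hypergraph(vertices, hyperedges, start)
print(hypergraph_cost(start, hyperedges), "->", best_cost, best_order)
```

Because Step 2 never raises the graph cost and a path is never shorter than its hyperedge's span, `new_cost <= cost` always holds; the loop therefore stops exactly when an iteration brings no strict improvement, and since costs are non-negative integers it must terminate.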
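(Step 1) Transformation: Hyperedge->Path
• Sort the vertices of each hyperedge by the current order π and connect consecutive vertices with an edge
• The edge lengths along the path add up to the distance between the hyperedge's first and last vertices, so hyperedge cost = path cost!
[Figure: vertices 0 1 2 3 4 5 6 in order, with each hyperedge replaced by a path]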
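The hyperedge-to-path step and its cost identity can be sketched as follows. A minimal sketch, assuming orders are lists of vertices; `hyperedge_to_path` and the other helpers are hypothetical names, and the randomized loop simply re-checks the identity on arbitrary orders and hyperedges.

```python
import random


def positions(order):
    return {v: k for k, v in enumerate(order)}


def hyperedge_to_path(order, hyperedge):
    """Sort the hyperedge's vertices by `order` and link consecutive ones."""
    pos = positions(order)
    chain = sorted(hyperedge, key=pos.__getitem__)
    return list(zip(chain, chain[1:]))


def path_cost(order, edges):
    pos = positions(order)
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


def hyperedge_cost(order, hyperedge):
    pos = positions(order)
    return max(pos[v] for v in hyperedge) - min(pos[v] for v in hyperedge)


order = list(range(7))
path = hyperedge_to_path(order, {0, 2, 3, 4})
print(path)                    # [(0, 2), (2, 3), (3, 4)]
print(path_cost(order, path))  # 2 + 1 + 1 = 4, the hyperedge cost

# Randomized check: under the order used to build it, the path cost equals the hyperedge cost.
rng = random.Random(1)
for _ in range(1000):
    perm = list(range(7))
    rng.shuffle(perm)
    x = set(rng.sample(range(7), rng.randint(2, 7)))
    assert path_cost(perm, hyperedge_to_path(perm, x)) == hyperedge_cost(perm, x)
```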
Step 1->Step 2
• Step 1 (Hypergraph->Graph), order π = 0 1 2 3 4 5 6: cost(G, π) = 2+2+2*1+1+4+3+2 = 16 = cost(HG, π)
• Step 2 (MLA), new order π′ = 0 2 3 5 6 4 1: cost(G, π′) = 1+2+2*1+2+1+2+3 = 13 < cost(G, π)
Step 1->Step 2->Step 3
• Step 1 (Hypergraph->Graph), order π = 0 1 2 3 4 5 6: cost(G, π) = cost(HG, π) = 16
• Step 2 (MLA), new order π′ = 0 2 3 5 6 4 1: cost(G, π′) = 13 < cost(G, π)
• With the new ordering, hyperedge cost ≤ path cost!
• Step 3: cost(HG, π′) = 10 < cost(G, π′) = 13
• Hence cost(HG, π) = cost(G, π) > cost(G, π′) > cost(HG, π′)
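Why re-measuring the hyperedges in Step 3 can only lower the cost can be seen on a single hyperedge. A minimal sketch, assuming the hypothetical hyperedge {1,3,5} (borrowed from the hypergraph-ordering example; whether it belongs to this slide's hypergraph is not recoverable) and the two orders π and π′ shown on this slide. The path is built under π and then evaluated under π′.

```python
def positions(order):
    return {v: k for k, v in enumerate(order)}


def hyperedge_to_path(order, hyperedge):
    """Path through the hyperedge's vertices, sorted by `order`."""
    pos = positions(order)
    chain = sorted(hyperedge, key=pos.__getitem__)
    return list(zip(chain, chain[1:]))


def path_cost(order, edges):
    pos = positions(order)
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


def hyperedge_cost(order, hyperedge):
    pos = positions(order)
    return max(pos[v] for v in hyperedge) - min(pos[v] for v in hyperedge)


pi = list(range(7))              # order pi
pi_new = [0, 2, 3, 5, 6, 4, 1]   # order pi' from the MLA step
x = {1, 3, 5}
path = hyperedge_to_path(pi, x)  # [(1, 3), (3, 5)], built under pi

print(path_cost(pi, path), hyperedge_cost(pi, x))          # 4 4  (equal under pi)
print(path_cost(pi_new, path), hyperedge_cost(pi_new, x))  # 5 4  (path now longer)
```

The path visits every vertex of the hyperedge, so under any order its length is at least the distance between the hyperedge's first and last vertices; replacing path costs by hyperedge costs can never increase the total.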
Other Conversions of a Hyperedge
• Converting a hyperedge into a cycle
• Converting a hyperedge into multicycles
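The slide only names these conversions, so the following is one assumed construction of the cycle variant (the multicycle variant is not sketched): take the path through the hyperedge sorted by the current order and close it back to its first vertex. `hyperedge_to_cycle` is a hypothetical name.

```python
import random


def positions(order):
    return {v: k for k, v in enumerate(order)}


def hyperedge_to_cycle(order, hyperedge):
    """Path through the hyperedge sorted by `order`, closed back to its first vertex."""
    pos = positions(order)
    chain = sorted(hyperedge, key=pos.__getitem__)
    return list(zip(chain, chain[1:])) + [(chain[-1], chain[0])]


def graph_cost(order, edges):
    pos = positions(order)
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


def hyperedge_cost(order, hyperedge):
    pos = positions(order)
    return max(pos[v] for v in hyperedge) - min(pos[v] for v in hyperedge)


order = list(range(7))
cycle = hyperedge_to_cycle(order, {0, 2, 3, 4})
print(cycle)                     # [(0, 2), (2, 3), (3, 4), (4, 0)]
print(graph_cost(order, cycle))  # 2 + 1 + 1 + 4 = 8 = 2 * hyperedge cost

rng = random.Random(2)
for _ in range(1000):
    built, other = list(range(7)), list(range(7))
    rng.shuffle(built)
    rng.shuffle(other)
    x = set(rng.sample(range(7), rng.randint(2, 7)))
    c = hyperedge_to_cycle(built, x)
    # Exactly twice the hyperedge cost under the order used to build the cycle...
    assert graph_cost(built, c) == 2 * hyperedge_cost(built, x)
    # ...and at least twice the hyperedge cost under any other order.
    assert graph_cost(other, c) >= 2 * hyperedge_cost(other, x)
```

With this construction the graph cost is exactly twice the hypergraph cost under the current order, so the inequality chain of the basic procedure still holds after dividing the graph costs by 2.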
Conclusion
• We found an interesting link from the matrix visualization problem to a well-known graph-theoretic problem: the minimum linear arrangement (MLA) problem.
• Theoretically, we introduce a generalization of the MLA problem to hypergraphs and develop a novel local-convergence algorithm for it.
• Our method can be incorporated into an interactive visualization environment, allowing users to focus on different parts of the data and patterns.