Decomposition-by-Normalization (DBN): Leveraging Approximate Functional Dependencies for Efficient Tensor Decomposition
Mijung Kim (Arizona State University), K. Selçuk Candan (Arizona State University)
This work is supported by NSF Grant #1043583, 'MiNC: NSDL Middleware for Network- and Context-aware Recommendations', and NSF Grant #1116394, 'RanKloud: Data Partitioning and Resource Allocation Strategies for Scalable Multimedia and Social Media Analysis'.
Tensor Decomposition
• A tensor is a high-dimensional array.
• Tensor decomposition is widely used for multi-aspect analysis of multi-dimensional data.
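As a concrete illustration, here is a minimal NumPy sketch (an illustration for this writeup, not the toolbox code used in the experiments) of what a rank-R CP decomposition represents: the tensor is approximated as a sum of R outer products of factor-matrix columns.

    import numpy as np

    def cp_reconstruct(factors):
        # factors: list of (dim_k x R) matrices, one per tensor mode
        R = factors[0].shape[1]
        T = np.zeros(tuple(f.shape[0] for f in factors))
        for r in range(R):
            comp = factors[0][:, r]
            for f in factors[1:]:
                comp = np.multiply.outer(comp, f[:, r])  # grow one mode per factor
            T += comp
        return T

    # rank-2 factors for a 4 x 5 x 3 tensor
    A, B, C = np.random.rand(4, 2), np.random.rand(5, 2), np.random.rand(3, 2)
    T = cp_reconstruct([A, B, C])  # 4 x 5 x 3 tensor of CP rank <= 2

Decomposition algorithms such as ALS search for factor matrices that minimize the reconstruction error against the input tensor.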
High cost of tensor decomposition
• Data is commonly high-dimensional and large-scale.
• Dense tensor decomposition: the cost increases exponentially with the number of modes of the tensor.
• Sparse tensor decomposition: the cost increases more slowly (linearly with the number of nonzero entries in the tensor), but can still be very expensive for large data sets.
• Parallelization of the ALS method faces difficulties such as communication cost.
• How do we tackle this high computational cost of tensor decomposition?
Normalization
• Reduce the dimensionality and the size of the input tensor based on functional dependencies (FDs) of the relation (tensor).
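A minimal sketch of this normalization step in Python, assuming a toy relation with schema (user, movie, rating, genre) and the FD movie -> genre (the attribute names are illustrative):

    # Toy relation with schema (user, movie, rating, genre), where movie -> genre
    R = [("u1", "m1", 5, "comedy"),
         ("u2", "m1", 3, "comedy"),
         ("u1", "m2", 4, "drama")]

    # Normalize into R1(movie, genre) and R2(user, movie, rating);
    # R1 keeps one tuple per movie, so the data to decompose shrinks.
    R1 = sorted({(m, g) for (_, m, _, g) in R})
    R2 = [(u, m, r) for (u, m, r, _) in R]
    print(R1)  # [('m1', 'comedy'), ('m2', 'drama')]
    print(R2)  # [('u1', 'm1', 5), ('u2', 'm1', 3), ('u1', 'm2', 4)]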
Join-by-Decomposition [Kim and Candan 2011]
• Step 1a: Decompose the (user, movie, rating) relation.
• Step 1b: Decompose the (movie, genre) relation.
• Step 2: Combine the two decompositions into a final decomposition.
• Find all rank-R1 and rank-R2 decompositions of the two input tensors, where R1 × R2 = R, and choose the pair whose two decompositions are as independent from each other as possible.
M. Kim and K. S. Candan. Approximate tensor decomposition within a tensor-relational algebraic framework. In CIKM, 2011.
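The combination step admits a simple closed form: if the joined tensor satisfies T(u, m, r, g) = T1(u, m, r) · T2(m, g), and T1 ≈ Σ_i a_i ∘ b_i ∘ c_i (rank R1) while T2 ≈ Σ_j d_j ∘ e_j (rank R2), then T ≈ Σ_{i,j} a_i ∘ (b_i · d_j) ∘ c_i ∘ e_j, a rank R1 × R2 decomposition of the join. A minimal NumPy sketch of this algebra (my reconstruction from the setup above, not the authors' code):

    import numpy as np

    def combine(A, B, C, D, E):
        # (A, B, C): rank-r1 CP factors of T1(u, m, r)
        # (D, E):    rank-r2 CP factors of T2(m, g); B and D share the movie mode
        r1, r2 = A.shape[1], D.shape[1]
        A2 = np.repeat(A, r2, axis=1)                        # column (i,j) -> a_i
        B2 = np.repeat(B, r2, axis=1) * np.tile(D, (1, r1))  # column (i,j) -> b_i * d_j
        C2 = np.repeat(C, r2, axis=1)                        # column (i,j) -> c_i
        E2 = np.tile(E, (1, r1))                             # column (i,j) -> e_j
        return [A2, B2, C2, E2]  # 4-mode CP factors of the join, rank r1 * r2

    A, B, C = np.random.rand(4, 2), np.random.rand(6, 2), np.random.rand(3, 2)
    D, E = np.random.rand(6, 3), np.random.rand(5, 3)
    factors = combine(A, B, C, D, E)  # rank-6 factors of the 4 x 6 x 3 x 5 join

The shared-mode factor is a column-wise (Khatri-Rao-style) product, which is why the combined rank is the product R1 × R2; cp_reconstruct from the earlier sketch can rebuild the full 4-mode tensor from these factors.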
Decomposition-by-Normalization (DBN)
• Start with a high-dimensional data set (5-mode tensor).
• Normalization based on functional dependencies (vertical partitioning).
• This yields lower-dimensional data sets (two 3-mode tensors).
• Tensor decomposition is run on each vertical partition (sub-tensor).
• The results are combined into the decomposition of the original data set (tensor).
Task 1: Normalization Process
• Select an attribute X that functionally determines the other attributes (X → A)
• to prevent introducing spurious data.
• An efficient method is needed to determine the functional dependencies in the data, because the total number of functional dependencies can be exponential.
• We employ TANE [Huhtala et al. 1999], which finds a set of (approximate) pair-wise FDs in time linear in the size of the input; this cost is trivial compared to the decomposition cost.
Y. Huhtala et al. TANE: An efficient algorithm for discovering functional and approximate dependencies. Comput. J., 42(2):100-111, 1999.
Task 2: Find Approximate FDs
• Many data sets may not have perfect FDs to leverage for normalization.
• Thus we rely on approximate FDs with sufficiently high support, where an FD's support is measured through the minimum fraction of tuples that must be removed for the FD to hold exactly.
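A minimal sketch of measuring pairwise approximate-FD support over a relation stored as tuples (an illustration of the g3-style measure above, not TANE itself); it returns the fraction of tuples kept, i.e., one minus the minimum fraction that must be removed for X → A to hold:

    from collections import Counter, defaultdict

    def fd_support(tuples, x, a):
        # Group by the X-value and keep, per group, the majority A-value;
        # every non-majority tuple would have to be removed for X -> A to hold.
        groups = defaultdict(Counter)
        for t in tuples:
            groups[t[x]][t[a]] += 1
        kept = sum(c.most_common(1)[0][1] for c in groups.values())
        return kept / len(tuples)

    R = [("m1", "comedy"), ("m1", "comedy"), ("m1", "drama"), ("m2", "drama")]
    print(fd_support(R, 0, 1))  # 0.75: one of four tuples violates movie -> genre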
Task 3: Partitioning
• Partition the data into two partitions that will lead to the least amount of error.
• Find partitions that are as independent from each other as possible:
• minimize inter-partition (between-partition) pair-wise FDs
• maximize intra-partition (within-partition) pair-wise FDs
Parallelized DBN
• We parallelize the entire DBN operation by assigning each pair of rank decompositions (rank-1 × rank-12, rank-2 × rank-6, rank-3 × rank-4, rank-4 × rank-3, rank-6 × rank-2, rank-12 × rank-1) to an individual processor core.
• Each pair can run on a separate core in a parallel manner.
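A minimal Python skeleton of the pair-per-core scheme (the per-pair decomposition body is a placeholder; in the experiments this role is played by the MATLAB toolboxes):

    from multiprocessing import Pool

    def decompose_pair(ranks):
        # Placeholder for one DBN candidate: decompose partition R1 at rank r1
        # and partition R2 at rank r2, then combine them (details elided).
        r1, r2 = ranks
        return f"candidate from rank-{r1} x rank-{r2}"

    rank_pairs = [(r, 12 // r) for r in (1, 2, 3, 4, 6, 12)]  # r1 * r2 = 12
    if __name__ == "__main__":
        with Pool(processes=6) as pool:  # one rank pair per core
            candidates = pool.map(decompose_pair, rank_pairs)
        print(candidates)
        # keep the candidate with the lowest reconstruction error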
Desiderata
• The vertical partitioning should be such that approximate FDs have high support, to prevent over-thinning of the relation R.
• Case 1: the join attribute X determines only a subset of the attributes of the relation R (|R| = |R2|, |R1| <= |R2|):
• For dense tensors, the number of attributes in each partition should be balanced.
• For sparse tensors, the total number of tuples of R1 and R2 should be minimized.
• Case 2: the join attribute X determines all attributes of the relation R (|R| = |R1| = |R2|):
• The support for the inter-partition FDs is minimized. (For dense tensors, the partitions should also be balanced.)
Vertical Partitioning Strategies
• Put into one partition all the attributes that are determined by the join attribute with a support higher than the support threshold.
Vertical Partitioning Strategies (Case 1: the join attribute X determines only a subset of the attributes of the relation R (|R| = |R2|, |R1| <= |R2|))
• Sparse tensors: the size of R1 (X and all determined attributes) can be minimized down to the number of unique values of X by eliminating all duplicate tuples.
• Dense tensors: promote balanced partitioning by relaxing or tightening the support threshold:
• if # attr. of R2 > # attr. of R1, move the attributes with the highest support from R2 to R1 (relaxing); if # attr. of R1 > # attr. of R2, move the attributes with the lowest support from R1 to R2 (tightening). A small sketch of this split follows.
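A minimal sketch of the threshold-based split from the last two slides (the supports map from attribute to its FD support under the join attribute is hypothetical, illustrative data):

    def split_by_support(attrs, supports, threshold):
        # Attributes determined by the join attribute with support above the
        # threshold go to R1; the rest go to R2.
        R1 = [a for a in attrs if supports[a] >= threshold]
        R2 = [a for a in attrs if supports[a] < threshold]
        return R1, R2

    supports = {"genre": 0.98, "year": 0.95, "rating": 0.40}  # illustrative
    R1, R2 = split_by_support(list(supports), supports, 0.9)
    print(R1, R2)  # ['genre', 'year'] ['rating']
    # Sparse case: deduplicate R1 down to one tuple per join-attribute value
    # (as in the normalization sketch). Dense case: move attributes across
    # the split, by relaxing or tightening the threshold, to balance |R1| and |R2|.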
Vertical Partitioning Strategies (Case 2: the join attribute X determines all attributes of the relation R)
• We formulate the inter-FD-based partitioning as a graph partitioning problem.
• Pairwise FD graph Gpfd(V, E): each vertex represents an attribute, and the weight of each edge is the average support of the approximate FDs between the two attributes.
• The problem is then to locate a cut on Gpfd with the minimum average weight. (For dense tensors, a balance criterion is imposed.)
• We use a modified version of a minimum cut algorithm [Stoer and Wagner 1997] to seek a minimum average cut.
M. Stoer and F. Wagner. A simple min-cut algorithm. J. ACM, 44(4):585-591, 1997.
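A minimal sketch using NetworkX's standard Stoer-Wagner minimum cut as a stand-in for the paper's modified minimum-average-cut variant (the attribute names and edge weights are illustrative):

    import networkx as nx

    # Pairwise FD graph: vertices are attributes, edge weights are the average
    # support of the approximate FDs between the two attributes.
    G = nx.Graph()
    G.add_weighted_edges_from([
        ("user", "movie", 0.2), ("user", "rating", 0.3),
        ("movie", "genre", 0.9), ("movie", "year", 0.8),
        ("rating", "genre", 0.1),
    ])

    # A low cut weight means weak inter-partition FDs, i.e., nearly
    # independent partitions.
    cut_value, (part1, part2) = nx.stoer_wagner(G)
    print(cut_value, part1, part2)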
Rank Pruning based on Intra-Partition Dependencies
• The higher the overall dependency between the attributes in a partition, the smaller the decomposition rank of that partition should be.
• Thus, we only consider rank pairs (r1, r2) such that r1 < r2 if the intra-partition FD support for R1 is larger than the support for R2, and vice versa.
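A minimal sketch of this pruning rule over the candidate rank pairs (the support values are illustrative):

    def prune_rank_pairs(pairs, support_R1, support_R2):
        # Keep only pairs (r1, r2) whose rank ordering matches the
        # intra-partition FD supports: higher dependency -> smaller rank.
        if support_R1 > support_R2:
            return [(r1, r2) for (r1, r2) in pairs if r1 < r2]
        if support_R2 > support_R1:
            return [(r1, r2) for (r1, r2) in pairs if r2 < r1]
        return pairs

    pairs = [(1, 12), (2, 6), (3, 4), (4, 3), (6, 2), (12, 1)]
    print(prune_rank_pairs(pairs, 0.9, 0.5))  # [(1, 12), (2, 6), (3, 4)]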
Experimental Setup (Data Sets)
• UCI Machine Learning Repository [Frank and Asuncion 2010]
A. Frank and A. Asuncion. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science, 2010.
Experimental Setup (Algorithms)
• NNCP (Non-Negative CP) vs. DBN
• Dense tensors [N-way Toolbox 2000]: NNCP-NWAY vs. DBN-NWAY
• Sparse tensors [MATLAB Tensor Toolbox 2007]: NNCP-CP vs. DBN-CP
• With parallelization: NNCP-NWAY/CP-GRID2,6 [Phan and Cichocki 2011] vs. pp-DBN-NWAY/CP
• DBN with intra-FD-based rank pruning: DBN2 and DBN3 (selecting 2 or 3 rank pairs)
C. A. Andersson and R. Bro. The N-way Toolbox for MATLAB. Chemometr. Intell. Lab., 52(1):1-4, 2000.
B. W. Bader and T. G. Kolda. MATLAB Tensor Toolbox Ver. 2.2, 2007.
A. H. Phan and A. Cichocki. PARAFAC algorithms for large-scale problems. Neurocomputing, 74(11):1970-1984, 2011.
Experimental Setup (Rank)
• Rank-12 decomposition.
• DBN uses 6 rank-pair combinations (1×12, 2×6, 3×4, 4×3, 6×2, and 12×1).
Experimental Setup (H/W and S/W)
• H/W: 6-core Intel Xeon X5355 CPU @ 2.66 GHz with 24 GB of RAM
• S/W: MATLAB Version 7.11.0.584 (R2010b) 64-bit (glnxa64) for the general implementation; MATLAB Parallel Computing Toolbox for the parallel implementations of DBN and NNCP
Key Results: Running Time (Dense Tensor)
(Case 1: the join attribute X determines only a subset of the attributes of the relation R (|R| = |R2|, |R1| <= |R2|))
[Scatter plots: NNCP vs. DBN, and NNCP vs. DBN with parallelization]
Key Results: Running Time (Sparse Tensor)
(Case 1: the join attribute X determines only a subset of the attributes of the relation R (|R| = |R2|, |R1| <= |R2|))
[Scatter plots: NNCP vs. DBN and DBN2,3 (DBN with intra-FD-based rank pruning), and NNCP vs. DBN with parallelization]
Key Results: Running Time (Case 2: the join attribute X determines all attributes of the relation R (|R| = |R1| = |R2|))
[Scatter plots: NNCP vs. DBN3 with parallelization, for both sparse and dense tensors]
NOTE: In both cases, most data points lie below the diagonal, which indicates that DBN outperforms NNCP.
Key Results: Accuracy
[Accuracy plots] Note: the higher, the better.
Inter-FD-based Vertical Partitioning
[Plots] Note: the higher, the closer to the optimal partitioning strategy.
Intra-FD-based Rank Pruning Strategy
[Plots] Note: the higher, the better the intra-FD-based rank pruning works.
Conclusions
• The lifecycle of data requires capture, integration, projection, decomposition, and data analysis.
• Tensor decomposition is a costly operation.
• We proposed:
• a highly efficient, effective, and easily parallelizable decomposition-by-normalization strategy for approximately evaluating decompositions
• inter-FD-based partitioning
• intra-FD-based rank pruning strategies