Non-Negative Residual Matrix Factorization w/ Application to Graph Anomaly Detection

Hanghang Tong and Ching-Yung Lin
SIAM-DM 2011, Mesa, AZ, USA, April 28-30, 2011

Large Graphs are Everywhere!
• Q: How to find patterns?
• e.g., community, anomaly, etc.
• Figure panels: Terrorist Network [Krebs2002], Food Web [2007], Internet Map [Koren 2009], Social Network [Newman 2005], Protein Network [Salthe2004], Web Graph


Matrix Tool for Finding Graph Patterns
• A Typical Procedure: Graph → Adj. Matrix A → A = F x G + R
• F, G: low-rank matrices (community)
• R: residual matrix (anomalies)
• An Illustrative Example
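The decomposition A = F x G + R on this slide can be sketched in a few lines. This is a minimal illustration only: it uses a plain truncated SVD as the low-rank step (an assumption, not the method of this talk), and the function name `low_rank_residual` and the toy graph are hypothetical. Note that a truncated SVD places no sign constraint on R.

```python
import numpy as np

def low_rank_residual(A, rank):
    """Split A into a rank-`rank` part F @ G and a residual R = A - F @ G."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    F = U[:, :rank] * s[:rank]   # n x rank, singular values folded into F
    G = Vt[:rank, :]             # rank x m
    R = A - F @ G                # whatever the low-rank part does not explain
    return F, G, R

# Toy graph (hypothetical): two 4-node communities plus one cross edge (0, 7).
A = np.zeros((8, 8))
A[:4, :4] = 1.0
A[4:, 4:] = 1.0
np.fill_diagonal(A, 0.0)
A[0, 7] = A[7, 0] = 1.0

F, G, R = low_rank_residual(A, rank=2)

# Rank the existing edges by residual magnitude; large values are
# candidates for anomalies, small values are explained by F @ G.
edges = np.argwhere(np.triu(A) > 0)
scores = np.abs(R[edges[:, 0], edges[:, 1]])
ranking = edges[np.argsort(-scores)]
```

The low-rank part F @ G is read as community structure, and the edges at the top of `ranking` are the ones one would inspect first as potential anomalies.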

Improve Interpretation by Non-negativity
• A Typical Procedure: Graph → Adjacency Matrix A → A = F x G + R
• An Example: Interpretation by Non-negativity
• Community: Non-negative Matrix Factorization, F >= 0; G >= 0 (for community detection)
• Anomalies: Non-negative Residual Matrix Factorization, R(i,j) >= 0 for A(i,j) > 0 (for anomaly detection) [This Paper]
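To make the residual constraint concrete, the following small check lists the edges on which a given factorization breaks R(i,j) >= 0 for A(i,j) > 0. The helper name `negative_residual_edges`, the tolerance, and the 2 x 2 example are hypothetical and only serve as an illustration.

```python
import numpy as np

def negative_residual_edges(A, F, G, tol=1e-12):
    """Return (i, j) pairs with A(i,j) > 0 whose residual R(i,j) is negative."""
    A = np.asarray(A, dtype=float)
    R = A - np.asarray(F, dtype=float) @ np.asarray(G, dtype=float)
    bad = (A > 0) & (R < -tol)   # only existing edges are constrained
    return [tuple(int(v) for v in ij) for ij in np.argwhere(bad)]

A = np.array([[1.0, 0.0],
              [0.0, 2.0]])
F = np.array([[1.5],
              [0.0]])
G = np.array([[1.0, 0.0]])

# F @ G over-estimates edge (0, 0): 1.5 > 1.0, so R(0, 0) = -0.5.
violations = negative_residual_edges(A, F, G)
```

A factorization satisfies the constraint on this slide exactly when the returned list is empty.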

Anomaly Detection on Graphs
• Social Networks
  • 'Popularity contest'
• Computer Networks
  • Spammer, Port Scanner, Vulnerable Machines, etc.
• Financial Transaction Networks
  • Fraudulent transactions (e.g., money-laundering ring), scammer
• Criminal Networks
  • New criminal trend
• Tele-communication Networks
  • Tele-marketer
Key Observation: Abnormal Behavior → Actual Activities


Optimization Formulation
• General Case
• Common in Any Matrix Factorization: Weighted Frobenius Form (with Weight)
• Unique in This Paper: Non-negative residual
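The equation on this slide did not survive extraction. A plausible reconstruction, assuming the weighted Frobenius objective named on the slide and the residual constraint R(i,j) >= 0 for A(i,j) > 0 stated on the non-negativity slide, is given here; the symbol W for the weight matrix is an assumption.

```latex
\begin{aligned}
\min_{F,\,G}\quad & \sum_{i,j} W(i,j)\,\bigl(A(i,j) - F(i,:)\,G(:,j)\bigr)^{2} \\
\text{s.t.}\quad & R(i,j) = A(i,j) - F(i,:)\,G(:,j) \;\ge\; 0
  \quad \text{for all } (i,j) \text{ with } A(i,j) > 0
\end{aligned}
```

The first line is the part common to any matrix factorization; the second line is the non-negative residual constraint.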

Optimization Formulation
• 0/1 Weight Matrix (Major Focus of the Paper)
• Common in Any Matrix Factorization: 0/1 weight
• Unique in This Paper: Non-negative residual

Optimization Formulation with 0/1 Weight Matrix
• NrMF with 0/1 Weight Matrix
• Q: How to find ‘optimal’ F and G?
• D1: Quality → C1: non-convexity of opt. objective
• D2: Scalability → C2: large size of the graph

Optimization Method: Batch Mode
• Basic Idea 1: Alternating
  • Not convex wrt F and G, jointly
  • But convex if fixing either F or G
• Basic Idea 2: Separation
  • With F fixed, the argmin over G (s.t. the constraints) separates: for each j, a Standard Quadratic Programming Prob. (s.t. the constraints for all i)
• Overall Complexity: Polynomial → Can we do better?
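The separated subproblem is not legible in the extracted slide. Assuming the weighted Frobenius objective and the constraint R(i,j) >= 0 for A(i,j) > 0, and writing W for the weight matrix (an assumed symbol), the problem for one column of G with F held fixed would read:

```latex
\begin{aligned}
\min_{G(:,j)}\quad & \sum_{i} W(i,j)\,\bigl(A(i,j) - F(i,:)\,G(:,j)\bigr)^{2} \\
\text{s.t.}\quad & F(i,:)\,G(:,j) \;\le\; A(i,j)
  \quad \text{for all } i \text{ with } A(i,j) > 0
\end{aligned}
```

The objective is a convex quadratic in G(:,j) and the constraints are linear in G(:,j), which is why each column is a standard quadratic program. Different columns share neither objective terms nor constraints, so they can be solved independently; the update of F with G fixed is symmetric, row by row.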

Optimization Method: Incremental Mode
• Basic Idea 1: Recursive
• Basic Idea 2: Alternating
• Basic Idea 3: Separation
• Procedure: Adjacency Matrix A → Initialize: R = A → Do r times: Rank-1 Approximation, then Update Residual Matrix R → Output Final Residual Matrix
• Rank-1 Approximation: QP for a single variable w/ boundary constraints; can be solved in constant time
• Overall Complexity: Linear wrt # of edges
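The incremental procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the 0/1 weight is taken to be 1 exactly on existing edges (A(i,j) > 0), the rank-1 factors are refined with a fixed number of alternating sweeps from a random start, and the names `clipped_scalar` and `nrmf_incremental` are hypothetical. Dense arrays are used for readability, so this sketch does not itself achieve the linear complexity stated on the slide.

```python
import numpy as np

def clipped_scalar(r, g):
    """Solve min_q sum_j (r_j - q*g_j)^2  s.t.  q*g_j <= r_j for all j.

    A single-variable QP: take the unconstrained minimizer and clip it
    to the interval [low, up] implied by the constraints.
    """
    denom = float(np.dot(g, g))
    if denom == 0.0:
        return 0.0
    q = float(np.dot(r, g)) / denom
    pos, neg = g > 0, g < 0
    up = np.min(r[pos] / g[pos]) if pos.any() else np.inf
    low = np.max(r[neg] / g[neg]) if neg.any() else -np.inf
    return float(min(max(q, low), up))

def nrmf_incremental(A, rank, n_sweeps=20, seed=0):
    """Peel off `rank` rank-1 terms while keeping the edge residual >= 0."""
    A = np.asarray(A, dtype=float)
    mask = A > 0                       # assumed 0/1 weight: 1 on edges
    R = A.copy()                       # Initialize: R = A
    n, m = A.shape
    rng = np.random.default_rng(seed)
    F, G = np.zeros((n, rank)), np.zeros((rank, m))
    for k in range(rank):              # Do r times
        f, g = np.zeros(n), rng.random(m)
        for _ in range(n_sweeps):      # alternate between f and g
            for i in range(n):         # separation: one scalar per row
                f[i] = clipped_scalar(R[i, mask[i]], g[mask[i]])
            for j in range(m):         # separation: one scalar per column
                g[j] = clipped_scalar(R[mask[:, j], j], f[mask[:, j]])
        F[:, k], G[k] = f, g
        # Update the residual on edges; clamp tiny negative rounding errors.
        R = np.where(mask, np.maximum(R - np.outer(f, g), 0.0), R)
    return F, G, R

# Toy usage (hypothetical data): two 3-node blocks plus one heavy cross edge.
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
A[0, 5] = A[5, 0] = 3.0
F, G, R = nrmf_incremental(A, rank=2)
```

Because the residual on every edge stays non-negative at each step, zero is always a feasible value for each scalar, so the clipping interval is never empty.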

Experimental Evaluation
• Effectiveness: [Figure: Accuracy vs. Anomaly Type]
• Efficiency: [Figure: Wall-clock Time vs. # of edges]

Batch Method vs. Incremental Method
[Figure: Log Wall-clock time (sec.) per Data Set, Batch Method vs. Incremental Method]

Conclusion
• Problem Formulation: Non-negative Residual Matrix Factorization
  • a new matrix factorization for interpretable graph anomaly detection
• Optimization Methods
  • Batch: straightforward, polynomial time complexity
  • Incremental: linear time complexity
• Future Work
  • Other interpretable properties (sparseness) for anomaly detection
  • Matrix Factorization w/ Total Non-negativity

Thank you!
htong@us.ibm.com (We are hiring at IBM Research!)

Visual Comparison

[Figure: position of q relative to the bounds low and up]
