Analyzing System Logs: A New View of What's Important Sivan Sabato Elad Yom-Tov Aviad Tsherniak

Analyzing System Logs: A New View of What's Important Sivan Sabato Elad Yom-Tov Aviad Tsherniak Saharon Rosset IBM Research SysML07 (Second Workshop on Tackling Computer Systems Problems with Machine Learning Techniques) Presented By Hassan Wassel

Introduction • System logs are a critical tool for system administrators. • They are massive in volume. • We need to rank log messages according to importance. • Previous work: • Ranking using expert rules • Visualization • Analysis of a single machine's log

What is Important? • The paper proposes that an important message is one that appears with a higher probability than expected. • Represent all messages of the same type by a single message type. • Calculate the empirical distribution of message-type probabilities and rank messages against it. • Systems are not homogeneous.
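The ranking idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's exact scoring rule: a log is assumed to be a list of message-type strings, and the hypothetical helper `rank_messages` scores each message type by the fraction of reference logs in which that type is strictly less frequent (an empirical CDF value).

```python
from collections import Counter

def message_frequencies(log):
    """Map each message type in a log to its relative frequency."""
    counts = Counter(log)
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()}

def rank_messages(log, reference_logs):
    """Rank message types in `log`, most unusual first.

    Assumed score: fraction of reference logs in which the message
    type is strictly less frequent than it is in `log`.
    """
    ref_freqs = [message_frequencies(r) for r in reference_logs]
    scores = {}
    for m, f in message_frequencies(log).items():
        below = sum(1 for rf in ref_freqs if rf.get(m, 0.0) < f)
        scores[m] = below / len(ref_freqs)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

reference = [["a", "a", "b"], ["a", "b", "b"], ["a", "a", "a", "b"]]
ranked = rank_messages(["a", "b", "c", "c"], reference)
```

Here message type "c" never appears in the reference logs, so it receives the top score, while "a" and "b" are no more frequent than usual and land at the bottom.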

Algorithm • Use K-means clustering to divide system logs into classes. • Estimate the empirical distribution of each class. • Given a system log, identify its class and rank messages according to its P
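The first and last steps of this slide involve simple bookkeeping that may be easier to see in code. The sketch below assumes that each log already has a numeric feature vector and that a new log is assigned to the class with the nearest centroid (squared Euclidean distance); both are assumptions for illustration, and `group_by_class` and `nearest_class` are hypothetical helper names.

```python
def group_by_class(logs, labels):
    """Collect the logs of each class so that a per-class empirical
    distribution can be estimated from them."""
    groups = {}
    for log, label in zip(logs, labels):
        groups.setdefault(label, []).append(log)
    return groups

def nearest_class(features, centroids):
    """Return the index of the centroid closest to `features`
    (assumed assignment rule: squared Euclidean distance)."""
    def sq_dist(z):
        return sum((a - b) ** 2 for a, b in zip(features, z))
    return min(range(len(centroids)), key=lambda j: sq_dist(centroids[j]))

groups = group_by_class(["log1", "log2", "log3"], [0, 1, 0])
cls = nearest_class([9.0, 8.0], [[0.0, 0.0], [10.0, 10.0]])
```

Messages in the new log would then be ranked against the reference logs in `groups[cls]` only, which reflects the point that systems are not homogeneous.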

Clustering • K-means tries to minimize the objective function $J = \sum_j \sum_i d^2(X_i, Z_j)$ • Inputs: • Number of clusters • Distance matrix • Outputs: • Membership matrix • Objective function value [Figure: matrices labeled Features/Patterns and Clusters/Patterns]
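A minimal sketch of K-means (Lloyd's algorithm) that reports the objective J from this slide. It assumes the patterns are given as plain feature vectors rather than a precomputed distance matrix, uses squared Euclidean distance for d², and picks initial centroids at random with a fixed seed; the function name and parameters are illustrative.

```python
import random

def sq_dist(x, z):
    """Squared Euclidean distance d^2(x, z)."""
    return sum((a - b) ** 2 for a, b in zip(x, z))

def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Initial centroids: k distinct points drawn at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # memberships stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Objective J: sum of squared distances to the assigned centroids.
    objective = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, objective

points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids, J = kmeans(points, k=2)
```

The returned `labels` list plays the role of the membership matrix, and `J` is the objective function value listed among the outputs.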

Dimensionality Problem • The data consisted of 3,000 system logs with 15,000 message types; however, it is sparse. • Measuring distance over these 15,000 features is computationally intensive. • Solution: dimensionality reduction

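Feature Construction • Use the Spearman correlation between every pair of system logs • $\mathrm{Corr}(x, y) = 1 - \frac{6\,\lVert r_x - r_y \rVert^2}{N(N^2 - 1)}$, where $r_x$ and $r_y$ are the rank vectors of the two logs and $N$ is their length • From a k logs × n message types matrix to a k × k similarity matrix. • Question: How to calculate rank vectors?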
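One possible answer to the question about rank vectors, sketched under assumptions: each log is a vector of per-message-type counts, ranks are 1-based, and tied values share the average of their ranks (a common convention; the slide does not say which one the paper uses). The correlation uses the standard Spearman shortcut formula, which is exact only when there are no ties, so with sparse counts it is an approximation.

```python
def rank_vector(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation via 1 - 6*||rx - ry||^2 / (N*(N^2 - 1))."""
    n = len(x)
    rx, ry = rank_vector(x), rank_vector(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def similarity_matrix(logs):
    """Turn k count vectors of length n into a k x k similarity matrix."""
    return [[spearman(a, b) for b in logs] for a in logs]

logs = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 2, 4]]
S = similarity_matrix(logs)
```

Each row of `S` can then serve as a k-dimensional feature vector for a log, replacing the original n message-type counts.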

Evaluation • Compare Spearman correlation to other feature construction schemes. • Histogram of pairwise distances • Maximal mutual information • Improvement in score

Comment • Future Work • Correlation based clustering • Feature extraction + choice of distance measure • Bi-clustering • Fuzzy Clustering • Evaluation • Use of human expertise to evaluate the ranking. • Clustering index

Thank you! Pros and Cons!
