
One Graph is Worth a Thousand Logs: Uncovering Hidden Structures in Massive System Event Logs

Gilad Barash. Co-authors: Ira Cohen, Michal Aharon, Eli Mordechai. Event logs in the IT environment: many logs, millions of events per day.


Presentation Transcript


  1. One Graph is Worth a Thousand Logs: Uncovering Hidden Structures in Massive System Event Logs. Gilad Barash. Co-authors: Ira Cohen, Michal Aharon, Eli Mordechai

  2. Event Logs in the IT Environment
     Many logs: millions of events per day.
     • Each system component writes to its own log (events, errors)
     • Logs are used to detect and troubleshoot problems in systems
     20 August 2014

  3. Event Logs: The Problem
     Many logs: millions of events.
     • Semi-structured text data not meant for automated consumption
     • Huge volume of data is not amenable to human consumption

  4. Roadmap: Raw Logs → Template Discovery → Machine-Ready Data → PARIS → Human-readable Data
     Raw log example: failed to get licenses for project session the session auth has failed.

  5. Logs in raw form

  6. Let’s Rearrange the Messages…
     Figure label: Variable Word
     11 messages, 9 distinct messages, 4 templates

  7. Requirements for Template Discovery
     • Online: produce immediate value
     • Consistent: template assignment of a message should remain consistent over time
     • Efficient: keep up with incoming message rates

  8. Template Discovery Algorithm: Incremental Text Clustering
     • Step 1: “Rough” clustering: creating root clusters and assigning events to them
     • Step 2: Cluster refinement: splitting root clusters
     Output: forest of clusters

  9. Step 1: “Rough” Clustering
     Log message → sequentially compare to existing clusters → similar to an existing cluster?
     • No: create a new cluster
     • Yes: assign to the existing cluster, then check for split

  10. Step 2: Cluster Refinement
      Compute the entropy of each word position, then test whether MIN(entropy) < threshold, taken over all positions with entropy > ε.
      • Yes: split the cluster based on word-position entropy
      • No: do nothing
      ε: no word occurs too frequently (e.g., Pkj = 0.99)
      threshold: at least one word occurs frequently enough (e.g., Pkj = 0.10)

  11. Template discovery algorithm: clustering example (SimilarityThreshold: 0.8)
      m1: B C D F A B
      m2: B C D F A B J
      m3: A C D F E K
      m4: B C D F E B E
      Similarity values shown: 0.83, 0.91, 0.5
      1000 appearances of m4; 800 appearances of BCDFAB_
      Clusters: m3 | m1,m2,m4
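
The rough-clustering step can be sketched in a few lines of Python. The transcript does not define the similarity measure, so the one used here is an assumption: the number of words that match at the same position, divided by the geometric mean of the two message lengths. The names `similarity` and `assign` are hypothetical, and m4 is taken in the six-word form shown on slide 12.

```python
from math import sqrt

def similarity(a, b):
    """Assumed measure: same-position word matches over the geometric mean of lengths."""
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / sqrt(len(a) * len(b))

def assign(message, clusters, threshold=0.8):
    """Compare a tokenized message to existing clusters in order.

    The message joins the first cluster whose representative is similar
    enough; otherwise it opens a new cluster. Returns the cluster index.
    """
    for idx, cluster in enumerate(clusters):
        if similarity(message, cluster["representative"]) >= threshold:
            cluster["members"].append(message)
            return idx
    clusters.append({"representative": message, "members": [message]})
    return len(clusters) - 1

# Messages from the clustering example (m4 as written on slide 12).
clusters = []
for text in ["B C D F A B", "B C D F A B J", "A C D F E K", "B C D F E B"]:
    assign(text.split(), clusters)
```

Under this assumed measure, m1, m2 and m4 land in one cluster and m3 in another, which is the grouping the slide reports. Using the first message as a fixed cluster representative is also a simplification chosen for brevity.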

  12. Entropy Calculation for Split
      m4: B C D F E B 1000 *
      mx: B C D F A B * 800 *
      Entropy: 0.45 0 0 0 0.15 0 0
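
A minimal sketch of the split test from the refinement step, assuming Shannon entropy in bits over the weighted word counts at each position. The values chosen for `eps` and `threshold` are placeholders, not figures from the talk, and the function names are hypothetical. Because the slide does not state how its entropies are normalized, the numbers computed here need not match the ones shown above.

```python
from collections import Counter
from math import log2

def position_entropies(messages):
    """Entropy (bits) of the word distribution at each word position.

    `messages` is a list of (tokens, count) pairs whose token lists
    all have the same length.
    """
    length = len(messages[0][0])
    total = sum(count for _, count in messages)
    entropies = []
    for pos in range(length):
        freq = Counter()
        for tokens, count in messages:
            freq[tokens[pos]] += count
        entropies.append(-sum((c / total) * log2(c / total) for c in freq.values()))
    return entropies

def split_position(entropies, eps=0.05, threshold=1.0):
    """Pick the lowest-entropy position among those above eps.

    Returns its index if that entropy is below threshold, else None.
    """
    candidates = [(h, i) for i, h in enumerate(entropies) if h > eps]
    if not candidates:
        return None
    h, i = min(candidates)
    return i if h < threshold else None

# 1000 appearances of one message and 800 of another that differs at index 4.
cluster = [("B C D F E B".split(), 1000), ("B C D F A B".split(), 800)]
entropies = position_entropies(cluster)
position = split_position(entropies)
```

Here only index 4 (E versus A) has non-zero entropy, so it is the position chosen for the split.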

  13. Template discovery algorithm: clustering example (SimilarityThreshold: 0.8)
      m1: B C D F A B
      m2: B C D F A B J
      m3: A C D F E K
      m4: B C D F E B E
      m5: B C D F E D
      Similarity values shown: 0.83, 0.91, 0.5
      Clusters: m3 | m1,m2,m4
      Clusters: m3 | m4 | m4 | m1,m2

  14. Discovering System Process Patterns
      Figure annotations: Same message in different process; Optional messages; Main Flow
      • JDBC3 getGeneratedKeys(): disabled
      • (3) Connection release mode: auto
      • (5) getConnectionURLs=tcp://websiteURL:2507
      • (6) Query translator:hql.ast.ASTQueryTranslatorFactory
      • (9) create connection. connectId
      • (10) mercury_db_loader_DB_Loader user=;pwd=;
      (2) SH remote was null. Exported object monitor.
      (4) Add task
      (7) Register provider class dataentry.loader.LoaderMain
      (8) Service manager started
      (3) Connection release mode: auto
      (11) mercury_db_loader is up and running
      • PARIS (Principal Atoms Recognition In Sets)
      • Identifies sets of events that tend to occur together, with no innate knowledge of messages
      • Provides enhanced view of system behavior dynamics

  15. PARIS
      • Gets as input a large number of sets that are assumed to have some mutual characterization.
      • Detects principal sets of elements that tend to appear together in the data.
      • Overcomes non-exact repetitions
      • Ignores additional noise
      • Uses: analysis, compression, anomaly detection
      Figure labels: Representation, Elements, Sets, Atoms

  16. PARIS
      • Representation error must be small, but not necessarily zero.
      • Representation should serve some sense of compression of the data (sparsity).
      • Minimal number of atoms (K).
      Figure labels: Representation, Elements, Sets, Atoms

  17. PARIS Cost Function
      • Representation error must be small, but not necessarily zero. → Minimize the representation error of the data.
      • Representation should serve some sense of compression of the data (sparsity). → Minimize the size of the representation (compression).
      • Minimal number of atoms (K). → Minimize the number of principal atoms.
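
One way to make the three goals on this slide concrete is a single weighted sum. The sketch below is an illustration under stated assumptions, not the authors' formulation: error is measured as the symmetric difference between each input set and the union of the atoms chosen for it, representation size is the number of atom references, and the weights `sparsity_weight` and `atom_weight` are hypothetical parameters.

```python
def paris_cost(sets, atoms, representation, sparsity_weight=1.0, atom_weight=1.0):
    """Assumed cost: representation error + weighted size + weighted atom count.

    sets:           list of observed sets of element IDs
    atoms:          list of candidate atoms (sets of element IDs)
    representation: for each observed set, the indices of the atoms used for it
    """
    error = 0
    for observed, atom_ids in zip(sets, representation):
        reconstructed = set().union(*(atoms[j] for j in atom_ids))
        # Elements missing from, or added by, the reconstruction both count.
        error += len(observed ^ reconstructed)
    size = sum(len(atom_ids) for atom_ids in representation)
    return error + sparsity_weight * size + atom_weight * len(atoms)

# Four observed sets described by two atoms; elements 4 and 7 are treated as noise.
sets = [{1, 2, 3, 4}, {1, 2, 3}, {5, 6}, {1, 2, 3, 5, 6, 7}]
atoms = [{1, 2, 3}, {5, 6}]
representation = [[0], [0], [1], [0, 1]]
cost = paris_cost(sets, atoms, representation)
```

With unit weights the example has an error of 2 (elements 4 and 7), a representation size of 5, and 2 atoms. Adding a dedicated atom for each noisy set would remove the error but raise the atom count, which is the trade-off the slide describes.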

  18. Results • Datasets

  19. Results • Template identification Representation Accuracy: 95%

  20. Results • Compression

  21. Visualizing the logs: Business App 2 Event Timeline
      Figure axes: X axis: timeline; Y axis: msg ID
      • 70,000 distinct messages appear in one graph view
      • Behavioral patterns become evident in this view
      One graph is worth a thousand logs!!!

  22. PARIS Result: Correct Process Identification (Business App 2 Logs)
      • Atom ID: 27
      • 734 User operation - stopnanny
      • 748 message_broker STOPPED
      • 753 Input main(String[] args:
      • 754 Going to call WrapperManager.start(new Main(), args)
      • 755 Initializing Spring files
      • 757 Path for spring files is E:\HPBAC\conf\supervisor\spring
      • 759 Loading spring file
      • 764 NannyConfigurationRepository initialization completed
      • 765 Will load JMX security info from E:\HPBAC\conf\jmxsecurity.txt
      • 766 No security file: E:\HPBAC\conf\jmxsecurity.txt - JMX will not be secured
      • 767 Registering beans for JMX exposure on startup
      • 768 Autodetecting user-defined JMX MBeans
      • 769 Bean with name 'nannyManager' has been autodetected for JMX exposure
      • 770 HTTP adapter port is 11021
      • 771 Succeeded adding html adapter
      • 772 manager thread loop started.
      • 773 Verifying time diff between cpp (local machine) and Java.
      • 774 Log file of time diff is: E:\HPBAC\tools\TimeDiff\time_diff.log
      • 776 Run java time diff
      • 780 Trying to initialize Properties Manager
      • 792 Config server check passed
      • 793 Prerequisites have been met
      • 794 start() Nanny Manager
      • 795 Nanny Manager need to start all services?:true
      • 796 Going to start all services.
      • …
      Service Restart
      • Atom ID: 12
      • 890 Failed creating SiS sample
      • 924 Failed processing http request: report_ss_samples, from remoteHost: Failed to acquire lock for publishing sample
      • 1183 Failed processing http request: report_transaction, from remoteHost : Failed to acquire lock for publishing sample

  23. Summary
      • Summarize event logs: template creation
        • Lossless reduction in size of data
        • Machine-readable
      • Process identification: PARIS
        • Strategic importance in managing IT environment
        • Human-readable

  24. Q&A
