
Tavolo 2 - Big Data Adaptive Monitoring

Presentation Transcript


  1. Tavolo 2 - Big Data Adaptive Monitoring. CIS, UNINA, UNICAL, UNIFI

  2. First results
  • The first results of the working group were published in the article "Big Data for Security Monitoring: Challenges and Opportunities in Critical Infrastructures Protection", by L. Aniello (2), A. Bondavalli (3), A. Ceccarelli (3), C. Ciccotelli (2), M. Cinque (1), F. Frattini (1), A. Guzzo (4), A. Pecchia (1), A. Pugliese (4), L. Querzoni (2), S. Russo (1). Affiliations: (1) UNINA, (2) CIS, (3) UNIFI, (4) UNICAL
  • Presented at the BIG4CIP workshop @ EDCC (12 May 2014)

  3. Data-Driven Security Framework
  [Architecture diagram] Raw data collection gathers application/system logs, environmental data, node resource data, IDS alerts and network audit data from the critical infrastructure; a monitoring adapter feeds the data processing layer; data analysis (attack modeling, invariant-based mining, conformance checking, fuzzy logic, Bayesian inference, ...) updates a knowledge base and drives protection actions.

  4. Adaptive Monitoring

  5. Scenario
  • Problem: need to analyze more data, coming from distinct sources, in order to improve the capability to detect faults/cyber attacks
    • Excessively large volumes of information to transfer and analyze
    • Negative impact on the performance of monitored systems
  • Proposed solution: dynamically adapt the granularity of monitoring
    • Normal case: coarse-grained monitoring (low overhead)
    • Upon anomaly detection: fine-grained monitoring (higher overhead)
  • Two distinct scenarios
    • Fault detection → current CIS research direction
    • Cyber attack detection

  6. Anomaly Detection
  • Metrics Selection
    • Find correlated metrics (invariants) to be used as anomaly signals
    • Learn which invariants hold when the system is healthy
    • Profile the healthy behavior of the monitored system
  • Anomaly Detection
    • Monitor the health of the system by looking at a few metrics
    • How to choose these metrics?
    • When an invariant no longer holds, adapt the monitoring (see the sketch below)
    • The aim is detecting the root cause of the problem
    • Possibility of false positives
  [1] J., M., R., W., "Information-Theoretic Modeling for Tracking the Health of Complex Software Systems", 2008
  [2] J., M., R., W., "Detection and Diagnosis of Recurrent Faults in Software Systems by Invariant Analysis", 2008
  [3] M., J., R., W., "Filtering System Metrics for Minimal Correlation-Based Self-Monitoring", 2009
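A minimal sketch of how such invariant candidates could be mined from healthy-period data, assuming simple pairwise Pearson correlation as the invariant criterion; the metric names and the 0.95 threshold are illustrative, not taken from the slides:

```python
# Hypothetical sketch: mine "invariant" metric pairs from healthy-period samples
# by looking for strongly correlated metrics.
from itertools import combinations
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / ((len(xs) - 1) * stdev(xs) * stdev(ys))


def mine_invariants(healthy_samples, threshold=0.95):
    """healthy_samples: dict metric_name -> values observed while the system
    is known to be healthy. Returns the metric pairs whose correlation is
    strong enough to be treated as an invariant."""
    invariants = []
    for (m1, s1), (m2, s2) in combinations(healthy_samples.items(), 2):
        r = pearson(s1, s2)
        if abs(r) >= threshold:
            invariants.append((m1, m2, r))
    return invariants


if __name__ == "__main__":
    healthy = {
        "requests_per_s": [10, 20, 30, 40, 50],
        "cpu_user_pct":   [11, 19, 31, 39, 52],   # tracks the request rate
        "disk_free_gb":   [80, 79, 81, 80, 80],   # unrelated metric
    }
    for m1, m2, r in mine_invariants(healthy):
        print(f"invariant candidate: {m1} ~ {m2} (r={r:.2f})")
```

At runtime only a few of these metrics need to be watched; when a mined correlation stops holding, the monitoring is adapted as described in the next slide.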

  7. Adapt the Monitoring
  • Two dimensions in adapting the monitoring
    • Change the set of monitored metrics
    • Change the frequency of metrics retrieval
  • How to choose the way of adapting the monitoring on the basis of the detected anomaly?
  • Additional issue
    • The goal of the adaptation is discovering the root cause of the problem
    • Need to zoom in on specific portions of the system
    • Very likely to increase the amount of data to transfer/analyze
    • Risk of a negative impact on system performance
  • Possible solution: keep the volume of monitored data limited by zooming out of other portions of the system (sketched below)
  [4] M., R., J., A., W., "Adaptive Monitoring with Dynamic Differential Tracing-Based Diagnosis", 2008
  [5] M., W., "Leveraging Many Simple Statistical Models to Adaptively Monitor Software Systems", 2014
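A hedged sketch of that zoom-in/zoom-out idea: when an anomaly points at one component, its sampling rate is raised while the others are scaled down so that the total volume stays within a fixed budget. Component names, rates and the budget are invented for illustration:

```python
# Sketch of budget-preserving monitoring adaptation (assumed policy, not the
# project's actual algorithm).


def rebalance(rates, suspect, budget, zoom_factor=4, floor=0.1):
    """rates: dict component -> samples/s. The suspect component is sampled
    zoom_factor times faster; the other components are scaled down so that
    the total stays within `budget` (but never below `floor`)."""
    new_rates = dict(rates)
    new_rates[suspect] = rates[suspect] * zoom_factor
    remaining = budget - new_rates[suspect]
    others = [c for c in rates if c != suspect]
    current_other = sum(rates[c] for c in others)
    scale = max(remaining, 0) / current_other if current_other else 0
    for c in others:
        new_rates[c] = max(rates[c] * scale, floor)
    return new_rates


if __name__ == "__main__":
    baseline = {"host1": 1.0, "host2": 1.0, "host3": 1.0, "host4": 1.0}
    # host3 looks anomalous: zoom in on it, zoom out of the other hosts
    print(rebalance(baseline, suspect="host3", budget=6.0))
```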

  8. Fault Localization
  • Goal: given a set of alerts, determine which fault occurred and which component originated it
  • Problems
    • The same alert may be due to different faults (ambiguity)
    • A single fault may cause several alerts (domino effect)
    • Concurrent alerts may be generated by concurrent, unrelated faults
  • Tradeoff: monitoring granularity vs. precision of fault identification
  • Approaches (a model-based example is sketched below):
    • Probabilistic models (e.g., HMMs, Bayesian networks)
    • Machine learning techniques (e.g., neural networks, decision trees)
    • Model-based techniques (e.g., dependency graphs, causality graphs)
  [6] S., S., "A survey of fault localization techniques in computer networks", 2004
  [7] D., G., B., C., "Hidden Markov Models as a Support for Diagnosis: ...", 2006
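As a simplified instance of the model-based family, the sketch below ranks candidate root causes on a dependency graph by how many alerting components each candidate could explain; the graph and component names are made up:

```python
# Toy dependency-graph localization: a fault in a component can explain an
# alert on any component that (transitively) depends on it.

DEPENDS_ON = {               # edge A -> B means "A depends on B"
    "web":     ["app"],
    "app":     ["db", "cache"],
    "cache":   [],
    "db":      ["storage"],
    "storage": [],
}


def reaches(src, dst, graph):
    """True if src (transitively) depends on dst."""
    if src == dst:
        return True
    return any(reaches(dep, dst, graph) for dep in graph.get(src, []))


def rank_root_causes(alerting, graph):
    """Rank every component by how many alerting components it could explain
    (illustrates the domino effect: one fault, many alerts)."""
    scores = {}
    for candidate in graph:
        explained = [a for a in alerting if reaches(a, candidate, graph)]
        if explained:
            scores[candidate] = explained
    return sorted(scores.items(), key=lambda kv: len(kv[1]), reverse=True)


if __name__ == "__main__":
    alerts = {"web", "app", "db"}          # components currently raising alerts
    for comp, explained in rank_root_causes(alerts, DEPENDS_ON):
        print(f"{comp}: would explain {sorted(explained)}")
```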

  9. Prototype - Work in Progress
  [Deployment diagram] Monitoring of a JBoss cluster using Ganglia: Hosts #1-#4 each run a JBoss AS instance and a gmond daemon; a dedicated monitoring host runs gmetad together with the Adaptive Monitoring component, which receives the monitored metrics from the hosts and sends back monitoring adaptations.

  10. Prototype - Goals
  • Identify a small set of metrics to monitor on a JBoss cluster to detect possible faults
    • Find existing correlations
    • Profile healthy behavior
  • Inject faults on JBoss with Byteman (http://byteman.jboss.org/)
  • For each fault, identify the set of additional metrics to monitor
  • Implement the prototype in order to evaluate
    • The effectiveness of the approach
    • The reactivity of the adaptation
    • The overhead of the adaptation

  11. Operating systems and Application servers monitoring

  12. Data collection and processing
  • Collects a selection of attributes from the OS and the AS, through probes installed on the machines
  • The current implementation observes Tomcat 7 and CentOS 6
  • Executes the Statistical Prediction and Safety Margin algorithm on the collected data (a simplified sketch follows)
  • The CEP engine Esper is used to apply rules on events (performs the detection of anomalies)
  • Work partially done within the context of the Secure! project (see later today)
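The exact Statistical Prediction and Safety Margin algorithm is not detailed on the slide; the sketch below only illustrates the general predict-plus-margin idea with a sliding-window mean and a 3-sigma margin, both of which are assumptions:

```python
# Simplified predict + safety-margin detector (illustrative, not the actual
# SPS implementation used in the project).
from collections import deque
from statistics import mean, stdev


class SafetyMarginDetector:
    def __init__(self, window=30, k=3.0):
        self.samples = deque(maxlen=window)
        self.k = k                      # margin width in standard deviations

    def observe(self, value):
        """Return True if `value` falls outside the predicted safety margin."""
        anomalous = False
        if len(self.samples) >= 5:
            prediction = mean(self.samples)
            margin = self.k * stdev(self.samples)
            anomalous = abs(value - prediction) > margin
        self.samples.append(value)
        return anomalous


if __name__ == "__main__":
    det = SafetyMarginDetector()
    series = [100, 102, 98, 101, 99, 100, 103, 250]   # last sample is a spike
    for t, v in enumerate(series):
        if det.observe(v):
            print(f"t={t}: value {v} outside safety margin -> raise event")
```

In the actual pipeline such events would be fed to the Esper rules rather than printed.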

  13. High level view

  14. Invariants Mining

  15. Why invariants?
  • Invariants are properties of a program that are guaranteed to hold for all executions of the program
  • If those properties are broken at runtime, it is possible to raise an alarm for immediate action
  • Invariants can be useful to
    • detect transient faults, silent errors and failures
    • report performance issues
    • avoid SLA violations
    • help operators understand the runtime behavior of the application
  • They are pretty natural properties for applications performing batch work

  16. An example of flow intensity invariant
  • A platform for the batch processing of files: the processing time is proportional to the file size
  • Measuring the file size and the time spent in a stage, I(x) and I(y) (the flow intensities), the linear relationship I(y) ≈ a·I(x) + b is an invariant characterising the expected behaviour of the batch system
  • If there is an execution problem (e.g., file processing hangs), the equation no longer holds (broken invariant), as illustrated in the sketch below
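A toy version of this invariant: fit the linear model on healthy runs, then flag runs whose processing time deviates too much from the model. The training numbers and the 20% tolerance are invented:

```python
# Illustrative flow-intensity invariant: time ≈ a * size + b on healthy runs.
from statistics import mean


def fit_linear(xs, ys):
    """Least-squares fit of y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


def invariant_broken(a, b, x, y, tolerance=0.20):
    """True if the observed processing time deviates more than `tolerance`
    (relative error) from the time the invariant predicts."""
    predicted = a * x + b
    return abs(y - predicted) > tolerance * predicted


if __name__ == "__main__":
    sizes_mb = [10, 20, 40, 80, 160]        # healthy training runs
    times_s  = [ 5, 11, 21, 41,  79]
    a, b = fit_linear(sizes_mb, times_s)
    print(f"invariant: time ≈ {a:.2f} * size + {b:.2f}")
    print(invariant_broken(a, b, x=50, y=26))    # False: behaves as expected
    print(invariant_broken(a, b, x=50, y=300))   # True: processing hangs
```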

  17. Research questions
  • RQ1: How to discover invariants out of the hundreds of properties observable from an application log?
  • RQ2: How to detect broken invariants at runtime?

  18. Our contribution
  • AUTOMATED MINING
    • A framework and a tool for mining invariants automatically from application logs
    • Tested on 9 months of logs collected from a real-world Infosys CPG SaaS application
    • Able to automatically select 12 invariants out of 528 possible relationships
  • IMPROVED DETECTION
    • An adaptive threshold scheme defined to significantly shrink the number of broken invariants (see the sketch below)
    • From thousands to tens of broken invariants w.r.t. static thresholds on our dataset
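One possible reading of the adaptive-threshold idea (not the scheme from the paper): each invariant gets a threshold that follows its own recent residuals, so noisy-but-healthy invariants stop being reported as broken under a fixed global threshold. The window size and the 4x multiplier are illustrative:

```python
# Per-invariant adaptive threshold vs. a static one (assumed scheme).
from collections import deque
from statistics import mean


class AdaptiveThreshold:
    def __init__(self, window=100, factor=4.0, static_threshold=1.0):
        self.residuals = deque(maxlen=window)
        self.factor = factor
        self.static_threshold = static_threshold   # fixed baseline for comparison

    def check(self, residual):
        """Return (broken_static, broken_adaptive) for one observed residual."""
        broken_static = abs(residual) > self.static_threshold
        if self.residuals:
            adaptive = self.factor * mean(abs(r) for r in self.residuals)
            broken_adaptive = abs(residual) > adaptive
        else:
            broken_adaptive = broken_static
        self.residuals.append(residual)
        return broken_static, broken_adaptive


if __name__ == "__main__":
    det = AdaptiveThreshold(static_threshold=1.0)
    # invariant is noisy but healthy until the last sample
    for r in [1.4, 1.6, 1.5, 1.3, 1.7, 1.5, 9.0]:
        print(det.check(r))   # static flags every sample, adaptive only the spike
```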

  19. Bayesian Inference

  20. Data-driven Bayesian Analysis
  • Security monitors may produce a large number of false alerts
  • A Bayesian network can be used to correlate alerts coming from different sources and to filter out false notifications
  • This approach has been successfully used to detect credential stealing attacks
    • Raw alerts generated during the progression of an attack (e.g., user-profile violations and IDS notifications) are correlated
    • The approach was able to remove around 80% of false positives (i.e., non-compromised users being declared compromised) without missing any compromised user

  21. Data-driven Bayesian Analysis
  • Vector extraction starting from raw data:
    • each vector represents a security event, e.g., attack, compromised user, etc.
    • suitable for post-mortem forensics and runtime analysis
    • sources: event logs, network audit, environmental sensors
  [Diagram] Vector extraction: each event is mapped to a vector of binary features (0/1).

  22. Bayesian network
  • Allows estimating the probability of the hypothesis variable (attack event), given the evidence in the raw data
  [Network diagram] The hypothesis variable C (the user is compromised) is connected to the information variables (alerts) A1 ... A14, e.g., unknown address, suspicious download, multiple logins.
  • Network parameters:
    • a-priori probability P(C)
    • conditional probability table (CPT) for each alert Ai

  23. Incident analysis
  • Estimate the probability that the vector represents an attack, given the feature vector (a toy computation is sketched below)
  [Diagram] Example: for a vector with some features set, the network yields P(C) = 0.31.
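With the network structure of the previous slide (C as parent of the alert nodes), the posterior for a binary feature vector follows from Bayes' rule as below; the prior and CPT entries are invented for illustration, the real values being learned from the data:

```python
# Toy inference on a network with hypothesis C and independent alert children.

PRIOR_C = 0.05                      # assumed a-priori probability of compromise

# (P(alert fires | C), P(alert fires | not C)) for each alert node
CPT = {
    "unknown_address":     (0.80, 0.10),
    "suspicious_download": (0.60, 0.05),
    "multiple_logins":     (0.70, 0.20),
}


def posterior_compromised(features):
    """features: dict alert_name -> 0/1. Returns P(C | features)."""
    p_c, p_not = PRIOR_C, 1.0 - PRIOR_C
    for alert, value in features.items():
        p1_c, p1_not = CPT[alert]
        p_c   *= p1_c   if value else (1.0 - p1_c)
        p_not *= p1_not if value else (1.0 - p1_not)
    return p_c / (p_c + p_not)


if __name__ == "__main__":
    vector = {"unknown_address": 1, "suspicious_download": 0, "multiple_logins": 1}
    print(f"P(C | evidence) = {posterior_compromised(vector):.2f}")
```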

  24. Preliminary testbed
  • A preliminary implementation with Apache Storm
  • Tested with synthetic logs emulating the activity of 2.5 million users, generating 5 million log entries per day (IDS logs and user access logs)
  [Topology diagram] LogStreamer (spout) → FactorCompute (bolt) → AlertProcessor (bolt)
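The sketch below mirrors the data flow of that topology in plain Python (it is not Storm code); the feature names and the scoring rule in AlertProcessor are placeholders:

```python
# Spout -> bolt -> bolt data flow of the testbed, modeled with plain generators.


def log_streamer(log_lines):
    """Spout role: emit raw log entries as a stream."""
    for line in log_lines:
        yield line


def factor_compute(entries):
    """Bolt role: turn each raw entry into a binary feature vector."""
    for entry in entries:
        yield {
            "unknown_address": int("new-ip" in entry),
            "multiple_logins": int("login-burst" in entry),
        }


def alert_processor(vectors, threshold=0.5):
    """Bolt role: score each vector and emit alerts above a threshold."""
    for vector in vectors:
        score = sum(vector.values()) / len(vector)   # stand-in for the posterior
        if score >= threshold:
            yield ("ALERT", vector, score)


if __name__ == "__main__":
    logs = ["user=a new-ip", "user=b ok", "user=c new-ip login-burst"]
    for alert in alert_processor(factor_compute(log_streamer(logs))):
        print(alert)
```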
