Network Mapping and Anomaly Detection Athina Markopoulou ( Irvine)

Network Mapping and Anomaly Detection Athina Markopoulou (Irvine) Robert Calderbank (Princeton) Rob Nowak (Madison) MURI Kickoff Meeting September 19, 2009

Outline Challenges - Applications - Mathematics Preliminary Results - Detecting Malicious Traffic Sources (Athina Markopoulou) - Network Topology Id - Network-wide Anomaly Detection Research Directions

Application Challenges Network Mapping: Infer network topology/connectivity from minimal measurements Detecting Topology Changes: Quickly sense changes in routing or connectivity Network-wide Anomalies: Detect weak and distributed patters of anomalous network activity Predicting Malicious Traffic: Identify network sources that are likely to launch future attacks

Mathematical Challenges Vastly Incomplete Data: Impossible to monitor a network everywhere and all the time. Where and when should we measure? Large-scale Inference: Inference of high-dimensional signals/graphs from noisy and incomplete data. Robust statistical data analysis and scalable algorithms are crucial. Network Representations: Statistical analysis matched to network structures. Can network data be ‘sparsified’ using new representations and transformations? Network Prediction Models: New ‘network-centric’ statistical methods are needed to cluster network nodes for robust prediction from limited datasets.

Predicting Malicious Traffic Sources

Predictive Blacklistingas an Implicit Recommendation System • Problem: predict sources of malicious traffic on the Internet • Blacklists: • list of worst offenders (source IP addresses or prefixes) • used to block (or to further scrub) traffic originating from those sources • Goal: • Predict malicious sources that are likely to attack a victim in the future based on past logs • Prediction vs. Detection • strictly speaking, this is not “detection” • but it does require finding patterns in the data

Traditional Blacklist Generation • Local Worst Offender List (LWOL) • Most prolific local offenders • Reactive but not proactive • Global Worst Offender List (GWOL) • Most prolific global offenders • Might contain irrelevant offenders • Non prolific attackers are elusive to GWOL • State of the art: Collaborative Blacklisting • J. Zhang, P. Porras, and J. Ullrich, “Highly Predictive Blacklisting”, USENIX Security 2008 (best paper award) • Key idea: A victim is likely to be attacked not only by sources that attacked this particular, but also by sources that attacked “similar victims” • Methodology: Use link-analysis (pagerank) on the victims similarity graph to predict future attacks • First methodological development in this problem a long time!

Formulating Predictive Blacklistingas an Implicit Recommendation System Recommendation system (e.g. Netflix, Amazon) Predictive Blacklisting Attackers Users Items ( movies) Victims R = Rating Matrix Time R(t) = Attack Matrix

Collaborative Filtering (CF) different techniques capture different patterns in the data • Multi-level Prediction • Individual level: (attacker, victim) • use time series to project past trends • Local level: neighborhood-based CF • group “similar” victim networks,(knn) • notion of similarity accounts for common attackers and time • groups of attackers attacking the same victims • find them using the cross-association (CA) method • [Global level: factorization-based CF (in progress)] • find latent factors in the data using, e.g. SVD • Combine ratings from different predictors

Tested our approach on Dshield data6-month of logs Princeton Dshield.org UCI • Dshield.org is a central repository of shared logs • Several victim organizations submit their IDS logs (flow data) • The repository analyzes the logs and provides a predictive blacklist, tailored to each victim

Several different patterns co-exist in the data and should be detected and used for prediction

Preliminary results • A combination of methods significantly improves the hit count of the blacklist • up to 70% (57% on avg) compared to the state-of-the art (HPB) Combined method State-of-the-art (HPB) Older method (GWOL) • and there is much room for improvement

Challenges & Future Directions • Get closer to the upper bound • Latent factor techniques • Dealing with missing data • Adversarial model • Scalability • Hopefully interactions with other people in this group • F. Soldo, A. Le, A. Markopoulou, http://arxiv.org/abs/0908.2007

Network Mapping

Network Mapping Existing Methods: Active probing (e.g., traceroute) Lumeta Corporation New Approach: Passive monitoring

10 11 5 13 The Data Hop-counts from end-hosts to monitors; extracted from TTL fields of traffic at monitors monitors 1 ? end-hosts 8 ?

Clustering End-Hosts Problem: Use hop-count data to automatically cluster end-hosts into topologically relevant groups (e.g., subnets) Intuition: End-hosts with similar hop-counts are probably close together Challenge: Clustering with missing data honeypots end-hosts 2-d histogram of hop-counts; ellipses indicate end-hosts from different subnets

Matrix Completion r observed hop-counts are random samples of entries of complete hop-count matrix SVD of hop-count matrix is low-rank

Results mixture model clusters from complete data 0 fraction of complete data 1 clusters from 25% data

Network-wide Anomaly Detection

Distributed Network Anomaly Detection Observation model: Binary pattern 0/1 Signal strength unknown Weak in strength: signal Invisible in per node signature Weak in extent: # affected nodes Invisible in network wide aggregate

Detecting weak and sparse patterns Prior work:Can detect weak and unstructured patterns by exploiting multiplicity. (Ingster, Jin-Donoho’03, Abramovich et al ’01) Subtle adaptive testing procedures: Higher criticism, False discovery control # active nodes Localizable signal strength sparsity Now you see it, now you don’t

Network anomaly patterns Interactions Nodes In addition to multiplicity, can we exploit the (possibly non-local) dependencies between node measurements to boost performance? Method must be adaptive to network interaction structure. How do node interactions effect thresholds of detectability/localizability?

Modeling network anomaly patterns Latent multi-scale dependencies Observed network node measurements Latent multi-scale Ising model : # edge agreements strengthof interaction

Hierarchical clustering and basis learning Hierarchical correlation clustering Unbalanced Haar Basis • Theorem: Consider a latent multi-scale Ising model with uniform node interaction strength . With probability , • the correct dependency structure (tree) can be learnt using i.i.d network observations x by hierarchical correlation clustering. • the number of non-zero basis coefficients for an x drawn at random is is .

Detection of anomalies in transform domain Weak patterns are amplified by the sparsifying transform adapted to network topology, while noise characteristics remain the same. Detection vs. Signal Strength Detection using Wavelet coefficients Detection using Original data Signal is focused and strength is amplified Coherent activation patterns result in few non-zero basis coefficients and can be detected with much smaller signal strength Wavelet coefficients Network data

Internet anomaly detection unknown network Monitor Sample delay covariance matrix Example basis vectors learnt from O(log p) network measurements using hierarchical clustering Compression achieved for real Internet RTT data

Research Directions Active Sensing: Sequential algorithms that automatically decide where, what and when to measure? Online Large-scale Inference: on-line and near real-time network monitoring to detect topology changes and traffic anomalies. Wireless Network Sensing: Exploitation of sparsity and diversity in wireless networks for fast and robust identification of network-wide characteristics. New Network Representations: Relationships between wavelet representations and persistent homology.

Extra slides

10 6 5 13 11 Network Discovery network structure is unknown; infer network routing/topology by ‘triangulation’

10 6 5 13 11 Network Discovery ? Unfortunately, many hop-counts are not observed ?

Network Mapping and Anomaly Detection Athina Markopoulou ( Irvine)