Machine Learning for Network Anomaly Detection

Matt Mahoney

Network Anomaly Detection • Network – Monitors traffic to protect connected hosts • Anomaly – Models normal behavior to detect novel attacks (some false alarms) • Detection – Was there an attack?

Host Based Methods • Virus Scanners • File System Integrity Checkers (Tripwire, DERBI) • Audit Logs • System Call Monitoring – Self/Nonself (Forrest)

Network Based Methods • Firewalls • Signature Detection (SNORT, Bro) • Anomaly Detection (eBayes, NIDES, ADAM, SPADE)

User Modeling • Source address – unauthorized users of authenticated services (telnet, ssh, pop3, imap) • Destination address – IP scans • Destination port – port scans

Frequency Based Models • Used by SPADE, ADAM, NIDES, eBayes, etc. • Anomaly score = 1/P(event) • Event probabilities estimated by counting
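The counting approach can be sketched in a few lines. This is a minimal illustration, not the scoring used by any of the systems named above: the function name `frequency_scores` is hypothetical, and returning infinity for an event never seen in training is an assumption made here to avoid dividing by zero.

```python
from collections import Counter

def frequency_scores(training_events, test_events):
    """Score each test event as 1/P(event), with P estimated by counting."""
    counts = Counter(training_events)
    total = sum(counts.values())
    scores = []
    for event in test_events:
        seen = counts[event]
        # 1/P = total/seen; unseen events get an infinite score (assumption).
        scores.append(total / seen if seen else float("inf"))
    return scores

# Hypothetical destination ports observed in training and in testing.
training_ports = [80] * 8 + [25] * 2
scores = frequency_scores(training_ports, [80, 25, 6667])
print(scores)
```

Common events score low and rare events score high, so a threshold on the score trades detections against false alarms.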

Attacks on Public Services • PHF – exploits a CGI script bug on older Apache web servers • GET /cgi-bin/phf?Qalias=x%0a/usr/bin/ypcat%20passwd
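Decoding the request makes the attack easier to read. The sketch below uses only Python's standard `urllib.parse` module; the variable names are illustrative. It shows that `%0a` decodes to a newline and `%20` to a space, so the `Qalias` parameter carries a second line containing a command.

```python
from urllib.parse import urlsplit, parse_qs

request_target = "/cgi-bin/phf?Qalias=x%0a/usr/bin/ypcat%20passwd"

parts = urlsplit(request_target)
params = parse_qs(parts.query)   # percent-decodes the parameter values
qalias = params["Qalias"][0]

print(parts.path)    # the CGI script being called
print(repr(qalias))  # the decoded parameter, newline included

# A newline inside a parameter value is unusual in normal web traffic.
has_embedded_newline = "\n" in qalias
```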

Buffer Overflows • 1988 Morris Worm – fingerd • 2003 SQL Sapphire Worm • Vulnerable code: char buf[100]; gets(buf); • [Figure: stack diagram showing buf at offsets 0 to 100, the exploit code, and the return address]

TCP/IP Denial of Service Attacks • Teardrop – overlapping IP fragments • Ping of Death – IP fragments reassemble to > 64K • Dosnuke – urgent data in NetBIOS packet • Land – identical source and destination addresses
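Three of these attacks can be described as simple checks on packet fields. The sketch below is a simplified illustration under stated assumptions: fragments are represented as hypothetical `(offset_in_bytes, length_in_bytes)` tuples, IP header length is ignored, and the function names are invented for this example.

```python
MAX_IP_DATAGRAM = 65535  # largest size a 16-bit IP total-length field can describe

def is_land(src_addr, dst_addr):
    """Land: source and destination addresses are identical."""
    return src_addr == dst_addr

def fragments_overlap(fragments):
    """Teardrop: some fragment starts before the previous one ends."""
    ordered = sorted(fragments)
    return any(nxt[0] < prev[0] + prev[1]
               for prev, nxt in zip(ordered, ordered[1:]))

def exceeds_max_datagram(fragments):
    """Ping of Death: reassembled data would extend past 65535 bytes."""
    return any(offset + length > MAX_IP_DATAGRAM
               for offset, length in fragments)

print(is_land("10.0.0.1", "10.0.0.1"))
print(fragments_overlap([(0, 36), (24, 4)]))
print(exceeds_max_datagram([(65528, 100)]))
```

Checks like these are signatures for known attacks. The anomaly models in the following slides aim to flag such packets without being told what to look for.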

Protocol Modeling • Attacks exploit bugs • Bugs are most common in the least tested code • Most testing occurs after delivery • Therefore unusual data is more likely to be hostile

Protocol Models • PHAD, NETAD – Packet Headers (Ethernet, IP, TCP, UDP, ICMP) • ALAD, LERAD – Client TCP application payloads (HTTP, SMTP, FTP, …)

Time Based Models • Training and test phases • Values never seen in training are suspicious • Score = t/p = tn/r where • t = time since last anomaly • n = number of training examples • r = number of allowed values • p = r/n = fraction of values that are novel

Example tn/r • Training: 0000111000 n/r = 10/2 • Testing: 01223 • 0: no score • 1: no score • 2: tn/r = 6 x 10/2 = 30 • 2: tn/r = 1 x 10/2 = 5 • 3: tn/r = 1 x 10/2 = 5
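The example can be reproduced with a short sketch. The function name and the integer time stamps are assumptions: training values are taken to arrive at times 1 to 10 and test values at times 11 to 15. The time of the last anomaly before testing is passed in explicitly and set to 7 so that the first novel value has t = 6, as on the slide. Novel values seen during testing are not added to the allowed set, which is why the second 2 still receives a score.

```python
def tn_over_r_scores(allowed, n, test_values, start_time, last_anomaly_time):
    """Return t*n/r for each novel test value, or None for allowed values."""
    r = len(allowed)
    scores = []
    for offset, value in enumerate(test_values):
        now = start_time + offset
        if value in allowed:
            scores.append(None)        # seen in training: no score
        else:
            t = now - last_anomaly_time
            scores.append(t * n / r)
            last_anomaly_time = now    # this anomaly resets the clock
    return scores

training = "0000111000"
allowed = set(training)                # {"0", "1"}, so r = 2
scores = tn_over_r_scores(allowed, len(training), "01223",
                          start_time=11, last_anomaly_time=7)
print(scores)  # [None, None, 30.0, 5.0, 5.0]
```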

PHAD – Fixed Rules • 34 packet header fields • Ethernet (address, protocol) • IP (TOS, TTL, fragmentation, addresses) • TCP (options, flags, port numbers) • UDP (port numbers, checksum) • ICMP (type, code, checksum) • Global model
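A fixed-rule model of this kind can be sketched as one allowed-value set per header field, each scored with tn/r. This is an illustration, not the PHAD implementation: the field names, the packet dictionaries, treating the first appearance of a value in training as an anomaly, and summing the per-field scores into a packet score are all assumptions of this sketch.

```python
class FieldModel:
    """Allowed values for one header field, scored with t*n/r."""

    def __init__(self):
        self.allowed = set()     # values seen in training; r = len(allowed)
        self.n = 0               # number of training observations
        self.last_anomaly = 0.0  # time of the most recent novel value

    def train(self, value, now):
        self.n += 1
        if value not in self.allowed:
            self.allowed.add(value)
            self.last_anomaly = now

    def score(self, value, now):
        if not self.allowed or value in self.allowed:
            return 0.0
        t = now - self.last_anomaly
        self.last_anomaly = now
        return t * self.n / len(self.allowed)


def train_models(packets, fields):
    models = {f: FieldModel() for f in fields}
    for p in packets:
        for f in fields:
            models[f].train(p[f], p["time"])
    return models


def packet_score(models, packet):
    # Sum of per-field scores (assumption of this sketch).
    return sum(m.score(packet[f], packet["time"]) for f, m in models.items())


# Hypothetical training packets with two of the header fields.
training = [
    {"time": 0, "ttl": 64, "tcp_flags": "S"},
    {"time": 1, "ttl": 64, "tcp_flags": "A"},
    {"time": 2, "ttl": 128, "tcp_flags": "A"},
    {"time": 3, "ttl": 64, "tcp_flags": "A"},
]
models = train_models(training, ["ttl", "tcp_flags"])
total = packet_score(models, {"time": 10, "ttl": 255, "tcp_flags": "A"})
print(total)  # TTL 255 is novel: (10 - 2) * 4 / 2 = 16.0
```

The model is global in the sense that one set of allowed values per field is shared by all traffic, with no conditions on other fields.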

LERAD – Learns Conditional Rules • Models inbound client TCP (addresses, ports, flags, 8 words in payload) • Learns conditional rules, for example: if port = 80 then word1 = GET, POST (n/r = 10000/2)

LERAD Rule Learning • If word1 = GET then port = 80 (n/r = 2/1) • word1 = GET, HELO (n/r = 3/2) • If address = Marx then port = 80, 25 (n/r = 2/2)
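The n/r values above can be reproduced by counting. The three training samples below are hypothetical, chosen only to be consistent with the rules on the slide (the host name Hume in particular is invented), and `fit_rule` is an illustrative helper rather than LERAD's code. For each rule, n is the number of samples matching the "if" part and r is the number of distinct values seen in the "then" attribute.

```python
def fit_rule(samples, antecedent, target):
    """Return (n, allowed) for the rule: if antecedent then target in allowed."""
    matching = [s for s in samples
                if all(s[k] == v for k, v in antecedent.items())]
    allowed = {s[target] for s in matching}
    return len(matching), allowed

# Hypothetical training set consistent with the n/r values on the slide.
samples = [
    {"address": "Marx", "port": 80, "word1": "GET"},
    {"address": "Hume", "port": 80, "word1": "GET"},
    {"address": "Marx", "port": 25, "word1": "HELO"},
]

rules = [
    ({"word1": "GET"}, "port"),     # if word1 = GET then port = 80
    ({}, "word1"),                  # word1 = GET, HELO (no condition)
    ({"address": "Marx"}, "port"),  # if address = Marx then port = 80, 25
]
for antecedent, target in rules:
    n, allowed = fit_rule(samples, antecedent, target)
    print(antecedent, target, f"n/r = {n}/{len(allowed)}")
```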

LERAD Rule Learning • Randomly pick rules based on matching attributes • Select nonoverlapping rules with high n/r on a sample • Train on full training set (new n/r) • Discard rules that discover novel values in last 10% of training (known false alarms)
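Parts of this procedure can be sketched as follows. This is a simplified illustration, not the LERAD implementation: it covers proposing a rule from attributes on which two randomly chosen samples agree, training a rule, and the final validation step. The selection of nonoverlapping rules on a sample is omitted. All function names and the dictionary layout of a rule are assumptions.

```python
import random

def matches(rule, sample):
    return all(sample[k] == v for k, v in rule["if"].items())

def propose_rule(samples, rng):
    """Pick two samples and build a rule from the attributes they share."""
    a, b = rng.sample(samples, 2)
    shared = [k for k in a if a[k] == b[k]]
    if not shared:
        return None
    rng.shuffle(shared)
    target, conditions = shared[0], shared[1:]
    return {"if": {k: a[k] for k in conditions}, "target": target}

def train_rule(rule, samples):
    """Return (n, allowed values) for the rule over the given samples."""
    covered = [s for s in samples if matches(rule, s)]
    return len(covered), {s[rule["target"]] for s in covered}

def survives_validation(rule, samples, holdout_fraction=0.1):
    """Discard a rule if the last part of training shows it a novel value."""
    split = len(samples) - max(1, int(len(samples) * holdout_fraction))
    _, allowed = train_rule(rule, samples[:split])
    return all(s[rule["target"]] in allowed
               for s in samples[split:] if matches(rule, s))

# Hypothetical training data: the last sample introduces a new word1 value.
samples = [{"port": 80, "word1": "GET"} for _ in range(9)]
samples.append({"port": 80, "word1": "POST"})

candidate = propose_rule(samples, random.Random(0))
kept = survives_validation({"if": {}, "target": "port"}, samples)
dropped = survives_validation({"if": {"port": 80}, "target": "word1"}, samples)
print(candidate, kept, dropped)
```

A rule that meets a novel value in the held-out tail of attack-free training data would also raise false alarms in testing, which is the reason for discarding it.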

DARPA/Lincoln Labs Evaluation • 1 week of attack-free training data • 2 weeks with 201 attacks • [Figure: network diagram labeled Internet, Router, Sniffer, and Attacks, with hosts running SunOS, Solaris, Linux, and NT]

Attacks out of 201 Detected at 10 False Alarms per Day

Problems with Synthetic Traffic • Attributes are too predictable: TTL, TOS, TCP options, TCP window size, HTTP, SMTP command formatting • Too few sources: Client addresses, HTTP user agents, ssh versions • Too “clean”: no checksum errors, fragmentation, garbage data in reserved fields, malformed commands

Real Traffic is Less Predictable • [Figure: r (number of values) versus time, for real and synthetic traffic]

Mixed Traffic: Fewer Detections, but More are Legitimate

Project Status • Philip K. Chan – Project Leader • Gaurav Tandon – Applying LERAD to system call arguments • Rachna Vargiya – Application payload tokenization • Mohammad Arshad – Network traffic outlier analysis by clustering

Further Reading • Learning Nonstationary Models of Normal Network Traffic for Detecting Novel Attacks by Matthew V. Mahoney and Philip K. Chan, Proc. KDD. • Network Traffic Anomaly Detection Based on Packet Bytes by Matthew V. Mahoney, Proc. ACM-SAC. • http://cs.fit.edu/~mmahoney/dist/
