Final Project

Final Project John Rodgers

Pre-Processing • Very large, unstandardized data files. • Transformed into a common format: .csv • flows.txt & Connections1.csv • Dropped everything except the following attributes: • protocol, flags, tos, src, srcport, dst, dstport, bytes, packets (and malicious for Connections1.csv). • tcpdump file • Unable to use meaningfully.
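The attribute trimming described on this slide could be sketched roughly as follows. This is a minimal illustration, not the tooling actually used for the project: the function name `project_columns`, the extra `duration` column, and the sample row are all hypothetical.

```python
import csv
import io

# Attributes the slide says were kept (plus "malicious" for Connections1.csv).
KEEP = ["protocol", "flags", "tos", "src", "srcport",
        "dst", "dstport", "bytes", "packets"]

def project_columns(in_file, out_file, keep=KEEP, label=None):
    """Copy a CSV stream, keeping only the listed columns and an optional label."""
    columns = list(keep) + ([label] if label else [])
    reader = csv.DictReader(in_file)
    # extrasaction="ignore" silently drops any column not listed in `columns`.
    writer = csv.DictWriter(out_file, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)

# Hypothetical one-row input with an extra column ("duration") to drop.
sample = io.StringIO(
    "protocol,flags,tos,src,srcport,dst,dstport,bytes,packets,duration,malicious\n"
    "6,S,0,10.20.30.40,51234,10.0.0.5,80,120,1,0.2,1\n"
)
out = io.StringIO()
project_columns(sample, out, label="malicious")
result = out.getvalue()
```

Streaming row by row keeps memory use flat, which matters for the very large input files mentioned above.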

Connections • Able to use this file fully • No changes besides formatting. • Selected attributes: • protocol, flags, tos, src, srcport, dst, dstport, bytes, packets, malicious

Connections • Rules via apriori: • The most significant source of malicious packets was IP 10.20.30.40 • Packets of 120 bytes were also highly supported by the data.
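For readers unfamiliar with how apriori ranks rules, the two measures it relies on (support and confidence) can be computed as below. This is only a sketch of the metrics, not the full apriori search, and the four rows are made-up toy records rather than values from Connections1.csv.

```python
def support(rows, items):
    """Fraction of rows that contain every (attribute, value) pair in `items`."""
    hits = sum(1 for r in rows if all(r.get(k) == v for k, v in items.items()))
    return hits / len(rows)

def confidence(rows, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    both = {**antecedent, **consequent}
    return support(rows, both) / support(rows, antecedent)

# Toy records (hypothetical), shaped like the selected attributes.
rows = [
    {"src": "10.20.30.40", "bytes": 120, "malicious": 1},
    {"src": "10.20.30.40", "bytes": 120, "malicious": 1},
    {"src": "10.20.30.40", "bytes": 60, "malicious": 0},
    {"src": "10.0.0.7", "bytes": 120, "malicious": 0},
]

# Rule: src=10.20.30.40 -> malicious=1
rule_support = support(rows, {"src": "10.20.30.40", "malicious": 1})
rule_conf = confidence(rows, {"src": "10.20.30.40"}, {"malicious": 1})
```

A rule is "highly supported" when its item set appears in a large share of the rows; confidence then says how often the consequent follows when the antecedent holds.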

Connections • Decision tree created. • Identifies ports as the most divisive feature • Focus of future analysis

Connections • SOM of the data: • Srcport is highlighted. • Dense regions of malicious packets had high port numbers, but no other features were easily gleaned.

Connections • Accuracy of SOM’s Classifier

Connections • Accuracy of KNN • 3 NN • testing for being malicious
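A 3-nearest-neighbour test for the malicious label can be sketched as follows. This is an assumed, minimal version using plain Euclidean distance on two features (srcport, bytes); the training points are hypothetical, and the slides do not say which tool or distance measure was actually used.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to `query`.

    `train` is a list of (feature_tuple, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (srcport, bytes) points; label 1 = malicious, 0 = benign.
train = [
    ((50000, 120), 1), ((51000, 120), 1), ((52000, 118), 1),
    ((80, 60), 0), ((443, 64), 0), ((22, 70), 0),
]

high_port_guess = knn_predict(train, (50500, 119))
low_port_guess = knn_predict(train, (100, 62))
```

With k = 3 and two classes a vote can never tie. In practice the features would likely need scaling first, since a raw port number dominates the distance over a byte count.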

Flows • Trimmed off some attributes. • Split source into src and srcport. • Following attributes used: • protocol, flags, tos, src, srcport, dst, dstport, bytes, packets • Randomly selected 50k instances for testing.
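The two Flows steps (splitting the source field and drawing a random subset) might look like the sketch below. It assumes the source field has the form `address:port` with an IPv4 address, which the slides do not confirm; `split_endpoint` and `sample_rows` are illustrative names.

```python
import random

def split_endpoint(endpoint):
    """Split 'address:port' into (address, port).

    Assumes an IPv4 address followed by a single ':port' suffix.
    """
    address, _, port = endpoint.rpartition(":")
    return address, int(port)

def sample_rows(rows, n, seed=0):
    """Return up to n rows chosen uniformly without replacement."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(rows, min(n, len(rows)))

src, srcport = split_endpoint("10.20.30.40:51234")
subset = sample_rows(list(range(100)), 10)
```

For the deck's case the call would be `sample_rows(rows, 50_000)`; fixing the seed lets the same 50k instances be reused across the SOM and K-Means runs.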

Flows • SOM of the data: • Shaded by srcport • Based on the SOM of the supervised data, the left cluster may be hazardous.

Flows • K-Means graph: • Clusters 1/2/3 nicely separated. • Cluster 4 is not easily noticed (if at all present). • Cluster 5 is anomalous.
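As background for the cluster plot, a bare-bones K-Means (Lloyd's algorithm) loop is sketched here. It is a teaching sketch with a naive initialisation (the first k points) and made-up 2-D points, not the clustering tool used for the slide.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns (centroids, clusters)."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical points forming two obvious groups.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, k=2)
```

A cluster that ends up very small or far from the others, like cluster 5 on the slide, is the kind of result worth inspecting as a possible anomaly.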

TCP Dump • Able to open file in Wireshark successfully.

TCP Dump • Every action thereafter crashed the program.

TCP Dump • Able to get file statistics:

TCP Dump • Used TShark, Wireshark’s command-line counterpart. • Command “tshark.exe -r <file> -V” opened the file successfully • Unable to parse into a usable CSV or other common delimited file type
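One option that may get closer to delimited output is TShark's field mode (`-T fields` with `-e` per field and `-E` formatting options). The sketch below only builds such a command; it is not something the slides report trying. The field list and the `capture.pcap` path are assumptions, and UDP traffic would need `udp.srcport` / `udp.dstport` as well.

```python
import subprocess

# Assumed field selection: addresses, TCP ports, and IP protocol number.
FIELDS = ["ip.src", "tcp.srcport", "ip.dst", "tcp.dstport", "ip.proto"]

def build_tshark_command(capture_path, fields=FIELDS):
    """Build a tshark invocation that prints one comma-separated row per packet."""
    cmd = ["tshark", "-r", capture_path, "-T", "fields"]
    for field in fields:
        cmd += ["-e", field]
    cmd += ["-E", "header=y", "-E", "separator=,"]
    return cmd

def export_csv(capture_path, csv_path):
    """Run tshark and write its output to csv_path (requires tshark on PATH)."""
    with open(csv_path, "w", newline="") as out:
        subprocess.run(build_tshark_command(capture_path), stdout=out, check=True)

cmd = build_tshark_command("capture.pcap")  # hypothetical file name
```

Unlike `-V`, which prints a verbose human-readable decode of every packet, field mode emits only the requested columns.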

TCP Dump • Output of TShark: • Readable • Mostly unusable for data processing purposes. • Acquired source and destination port/address and protocol.

TCP Dump • SOM of TCPdump: • srcport highlighted • Bottom right area is likely to be malicious

TCP Dump • K-Means graph: • Did not like this data set... • Absurdly high share of dstports = 21 (97.5%) • Could have been a result of faulty parsing rather than the data set itself.
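A quick way to check for this kind of skew before clustering is to tabulate the share of each dstport value. The list below is a toy example built to mirror the 97.5% proportion on the slide; it is not the parsed capture.

```python
from collections import Counter

def value_shares(values):
    """Map each distinct value to its fraction of the list, most common first."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.most_common()}

# Toy data: 39 of 40 rows use dstport 21.
dstports = [21] * 39 + [80]
shares = value_shares(dstports)
top_port = next(iter(shares))
```

If one value dominates like this, comparing a few parsed rows against the same packets in Wireshark would help tell a parsing fault from a genuine property of the capture.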
