Final Project John Rodgers
Pre-Processing • Very large, unstandardized data files. • Transformed into a common format: .csv • flows.txt & Connections1.csv • Dropped every attribute except the following: • protocol, flags, tos, src, srcport, dst, dstport, bytes, packets (and malicious for Connections1.csv). • tcpdump file • Unable to use meaningfully.
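The attribute trimming described above could be sketched as follows. This is a minimal illustration using Python's standard `csv` module, not the tooling actually used for the project; the helper name `keep_columns`, the extra `duration` column, and the sample row are made up for the example.

```python
import csv
import io

# Attributes the slides say were kept (Connections1.csv also keeps "malicious").
KEEP = ["protocol", "flags", "tos", "src", "srcport",
        "dst", "dstport", "bytes", "packets"]

def keep_columns(in_file, out_file, columns):
    """Copy a CSV stream, keeping only the named columns in the given order."""
    reader = csv.DictReader(in_file)
    # extrasaction="ignore" silently drops any column not listed in `columns`.
    writer = csv.DictWriter(out_file, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)

# Made-up input with one extra column ("duration") that should be dropped.
sample = io.StringIO(
    "protocol,flags,tos,src,srcport,dst,dstport,bytes,packets,duration\n"
    "6,S,0,10.0.0.1,51000,10.0.0.2,80,120,1,0.5\n"
)
out = io.StringIO()
keep_columns(sample, out, KEEP)
trimmed = out.getvalue()
```

Streaming row by row keeps memory use flat, which matters for very large input files.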
Connections • Able to use this file fully • No changes besides formatting. • Selected attributes: • protocol, flags, tos, src, srcport, dst, dstport, bytes, packets, malicious
Connections • Rules via Apriori: • The most significant source of malicious packets was IP 10.20.30.40. • Packets of 120 bytes were also highly supported by the data.
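Apriori ranks rules by support and confidence. The sketch below computes only those two measures for a single rule; it is not a full Apriori implementation, and the rows are made-up stand-ins rather than project data. The helper name `rule_stats` is hypothetical.

```python
def rule_stats(rows, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    rows: list of dicts; antecedent and consequent: (key, value) pairs.
    support    = share of all rows matching both sides
    confidence = share of antecedent rows that also match the consequent
    """
    a_key, a_val = antecedent
    c_key, c_val = consequent
    n_antecedent = sum(1 for r in rows if r[a_key] == a_val)
    n_both = sum(1 for r in rows if r[a_key] == a_val and r[c_key] == c_val)
    support = n_both / len(rows)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Made-up rows, only to show the arithmetic.
rows = [
    {"src": "10.20.30.40", "malicious": 1},
    {"src": "10.20.30.40", "malicious": 1},
    {"src": "10.20.30.40", "malicious": 0},
    {"src": "10.0.0.7", "malicious": 0},
]
support, confidence = rule_stats(rows, ("src", "10.20.30.40"), ("malicious", 1))
```

With these four rows the rule matches two of four rows (support 0.5) and two of the three rows from that source (confidence 2/3).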
Connections • Decision tree created. • Identifies ports as the most divisive feature. • Focus of future analysis.
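One common way a decision tree decides which feature is "most divisive" is information gain. The slides do not say which split criterion the tree used, so the entropy-based measure below is an assumption, and the port values and labels are made up.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        result -= p * math.log2(p)
    return result

def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on values <= threshold vs. > threshold."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# Made-up example: low ports benign, high ports malicious.
srcports = [80, 443, 50000, 60000]
malicious = [0, 0, 1, 1]
gain = information_gain(srcports, malicious, threshold=1024)
```

A split that separates the classes perfectly, as in this toy data, recovers the full one bit of entropy.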
Connections • SOM of the data: • Srcport is highlighted. • Regions dense with malicious packets had high port numbers, but no other features were easily gleaned.
Connections • Accuracy of SOM’s Classifier
Connections • Accuracy of KNN • 3-NN • Classifying whether a packet is malicious
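A 3-nearest-neighbor vote can be sketched in a few lines. The slides do not state which features or distance metric were used; Euclidean distance over (srcport, dstport) is an assumption here, and the training points are made up.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Made-up (srcport, dstport) points; label 1 = malicious, 0 = benign.
points = [(80, 443), (443, 80), (50000, 21), (60000, 21), (55000, 21)]
labels = [0, 0, 1, 1, 1]

high_port_guess = knn_predict(points, labels, (58000, 21))
low_port_guess = knn_predict(points, labels, (100, 400))
```

In practice features on very different scales (ports vs. byte counts, for instance) would usually be normalized first so that one attribute does not dominate the distance.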
Flows • Trimmed off some attributes. • Split source into src and srcport. • Following attributes used: • protocol, flags, tos, src, srcport, dst, dstport, bytes, packets • Randomly selected 50k instances for testing.
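The split and the random selection might look like the sketch below. The layout of the original source field is not given in the slides, so the "address:port" form is an assumption, and both helper names are hypothetical.

```python
import random

def split_endpoint(endpoint):
    """Split 'address:port' into (address, port).

    Assumes the port follows the last colon, e.g. '10.0.0.1:51000'.
    """
    address, _, port = endpoint.rpartition(":")
    return address, int(port)

def sample_rows(rows, k, seed=0):
    """Return k rows chosen at random without replacement (all rows if fewer)."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    return rng.sample(rows, min(k, len(rows)))

src, srcport = split_endpoint("10.0.0.1:51000")
subset = sample_rows(list(range(100)), 10)
```

For the flows file the call would be `sample_rows(rows, 50_000)`; fixing the seed makes the same 50k instances come back on every run.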
Flows • SOM of the data: • Shaded by srcport. • Based on the SOM of the labeled Connections data, the left cluster may be hazardous.
Flows • K-Means graph: • Clusters 1/2/3 nicely separated. • Cluster 4 is not easily noticed (if present at all). • Cluster 5 is anomalous.
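For reference, the clustering step can be sketched as plain Lloyd's-algorithm k-means. The slides do not say which tool or settings produced the graph, so this is only an illustration on made-up two-dimensional points with hand-picked starting centroids.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm starting from the given centroids.

    Returns the final centroids and the cluster membership lists from the
    last assignment step.
    """
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Made-up points forming two obvious groups.
points = [(1, 1), (1, 2), (10, 10), (11, 10)]
centroids, clusters = kmeans(points, [(0, 0), (12, 12)])
```

A cluster that attracts few or no points shows up here as a very short or empty membership list, which is one way a cluster can end up hard to see on a plot.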
TCP Dump • Able to open file in Wireshark successfully.
TCP Dump • Every action thereafter crashed the program.
TCP Dump • Able to get file statistics:
TCP Dump • Used TShark, Wireshark’s command-line counterpart. • Command "tshark.exe -r <file> -V" opened the file successfully. • Unable to parse the output into a usable CSV or other common delimited file type.
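TShark also has a field-export mode (`-T fields` with `-e` and `-E`) that prints selected fields as delimited text instead of the verbose `-V` dump. Whether it would have coped with a capture of this size is untested here; the sketch below only builds the command, and `capture.pcap` and the helper name are placeholders, not names from the project.

```python
def tshark_csv_command(capture_path, tshark="tshark"):
    """Build (but do not run) a TShark command that prints fields as CSV."""
    # Field names are standard Wireshark display-filter fields; TCP ports only.
    fields = ["ip.src", "tcp.srcport", "ip.dst", "tcp.dstport", "ip.proto"]
    command = [tshark, "-r", capture_path, "-T", "fields"]
    for field in fields:
        command += ["-e", field]
    # Print a header row and use a comma as the field separator.
    command += ["-E", "header=y", "-E", "separator=,"]
    return command

# "capture.pcap" is a placeholder path, not the file from the slides.
command = tshark_csv_command("capture.pcap")
# To run it, something like:
#   subprocess.run(command, stdout=open("out.csv", "w"), check=True)
```

UDP traffic would need `udp.srcport` and `udp.dstport` added to the field list, since the TCP fields are empty for non-TCP packets.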
TCP Dump • Output of TShark: • readable • Mostly unusable for data processing purposes. • Acquired source and destination port/address and protocol.
TCP Dump • SOM of TCPdump: • srcport highlighted • Bottom right area is likely to be malicious
TCP Dump • K-Means graph: • K-means did not handle this data set well. • Extremely high share of records with dstport = 21 (97.5%). • Could be an artifact of faulty parsing rather than a property of the data set itself.
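A quick frequency check like the one below is a cheap way to spot this kind of skew before clustering. The port list is made up and sized only to reproduce a 97.5% share; it is not the parsed capture.

```python
from collections import Counter

def value_shares(values):
    """Map each distinct value to its share of the list, most common first."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.most_common()}

# Made-up data: 39 of 40 records have dstport 21 (the standard FTP control port).
dstports = [21] * 39 + [80]
shares = value_shares(dstports)
```

When a single value dominates a column to this degree, that attribute carries almost no information for distance-based methods such as k-means, so checking the parser against a handful of packets viewed by hand would be a reasonable next step.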