Final Project

Presentation Transcript


  1. Final Project John Rodgers

  2. Pre-Processing • Very large, unstandardized data files. • Transformed into a common format: .csv • flows.txt & Connections1.csv • Removed everything except the following attributes: • protocol, flags, tos, src, srcport, dst, dstport, bytes, packets (and malicious for Connections1.csv). • tcpdump file • Unable to use meaningfully.
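A minimal sketch of the attribute-selection step, assuming the converted file already has a header row using the attribute names listed on the slide. The helper `keep_columns`, the extra `duration` column, and the sample row are hypothetical and only illustrate the idea; this is not the author's script.

```python
import csv
import io

# Attributes the slide says were kept (Connections1.csv would add "malicious").
KEEP = ["protocol", "flags", "tos", "src", "srcport",
        "dst", "dstport", "bytes", "packets"]

def keep_columns(in_file, out_file, columns):
    """Copy only the named columns from one CSV stream to another."""
    reader = csv.DictReader(in_file)
    # extrasaction="ignore" silently drops any column not listed in `columns`.
    writer = csv.DictWriter(out_file, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)

# Made-up input with one extra column ("duration") that should be dropped.
raw = io.StringIO(
    "duration,protocol,flags,tos,src,srcport,dst,dstport,bytes,packets\n"
    "0.5,6,2,0,10.0.0.1,1234,10.0.0.2,80,120,1\n"
)
out = io.StringIO()
keep_columns(raw, out, KEEP)
trimmed = out.getvalue()
```

For real files, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`.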

  3. Connections • Able to use this file fully • No changes besides formatting. • Selected attributes: • protocol, flags, tos, src, srcport, dst, dstport, bytes, packets, malicious

  4. Connections • Rules via apriori: • The most significant source of malicious packets was IP 10.20.30.40 • Packets of 120 bytes were also highly supported by the data.
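To illustrate what "highly supported" means here, the following is a rough sketch of the support-counting first pass of Apriori only, with no candidate generation for larger itemsets. The function `frequent_items`, the 0.75 threshold, and the four rows are invented for illustration and are not the project's data or tooling.

```python
from collections import Counter

def frequent_items(rows, min_support):
    """Return {(attribute, value): support} for single items meeting min_support."""
    counts = Counter()
    for row in rows:
        # Each (attribute, value) pair in a row counts once toward its support.
        counts.update(set(row.items()))
    total = len(rows)
    return {item: n / total for item, n in counts.items()
            if n / total >= min_support}

# Made-up rows; only the IP and the 120-byte size echo the slide.
rows = [
    {"src": "10.20.30.40", "bytes": 120, "malicious": 1},
    {"src": "10.20.30.40", "bytes": 120, "malicious": 1},
    {"src": "10.20.30.40", "bytes": 60, "malicious": 1},
    {"src": "192.168.0.5", "bytes": 120, "malicious": 0},
]
frequent = frequent_items(rows, min_support=0.75)
```

A full Apriori run would then combine the surviving items into pairs, triples, and so on, pruning any candidate that contains an infrequent subset.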

  5. Connections • Decision Tree created. • Identifies ports as the most divisive feature • Focus of future analysis

  6. Connections • SOM of the data: • Srcport is highlighted. • Malicious packets were densest at high port numbers; no other features were easily gleaned.

  7. Connections • Accuracy of SOM’s Classifier

  8. Connections • Accuracy of KNN • 3 NN • testing for being malicious
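A small sketch of a 3-nearest-neighbour classifier for the malicious label, assuming numeric features and plain Euclidean distance. The feature choice (srcport, bytes), the function `knn_predict`, and the training rows are assumptions made for illustration; the slides do not say which tool or features were used.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to `query`.

    train: list of (feature_tuple, label) pairs.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up (srcport, bytes) rows; label 1 = malicious, 0 = benign.
train = [
    ((50000, 120), 1), ((51000, 120), 1), ((49500, 118), 1),
    ((80, 1500), 0), ((443, 900), 0), ((22, 300), 0),
]
pred_high = knn_predict(train, (50500, 121))
pred_low = knn_predict(train, (100, 1000))
```

On real data the features would normally be scaled first, since an unscaled port number dominates the distance.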

  9. Flows • Trimmed off some attributes. • Split source into src and srcport. • Following attributes used: • protocol, flags, tos, src, srcport, dst, dstport, bytes, packets • Randomly selected 50k instances for testing.
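The split and the random selection could look roughly like this. The `address:port` layout of the source field is an assumption (the slides do not show the raw format), and `split_endpoint`, `sample_rows`, and the fixed seed are hypothetical names and choices.

```python
import random

def split_endpoint(endpoint):
    """Split 'address:port' into (address, port); assumes the port follows the last colon."""
    host, _, port = endpoint.rpartition(":")
    return host, int(port)

def sample_rows(rows, n, seed=0):
    """Pick up to n rows without replacement; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(rows, min(n, len(rows)))

src, srcport = split_endpoint("10.0.0.1:51234")

# Stand-in for the flow records: 200,000 row indices, sampled down to 50k.
subset = sample_rows(list(range(200000)), 50000)
```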

  10. Flows • SOM of the data: • Shaded by srcport • Based on the SOM of the supervised data, the left cluster may be hazardous.

  11. Flows • K-Means graph: • Clusters 1/2/3 nicely separated. • Cluster 4 is not easily noticed (if present at all). • Cluster 5 is anomalous.
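For reference, a bare-bones sketch of the K-Means loop (Lloyd's algorithm) behind a graph like this one. It assumes numeric feature tuples and uses a naive "first k points" initialisation; the function `kmeans` and the six sample points are illustrative only and say nothing about the tool or settings actually used.

```python
import math

def kmeans(points, k, iterations=20):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = list(points[:k])  # naive init: the first k points
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels

# Made-up 2-D points forming two obvious groups.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (9, 8), (8, 9)]
centroids, labels = kmeans(points, k=2)
```

A cluster that ends up with very few members, far from the rest, is the kind of result the slide describes as anomalous.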

  12. TCP Dump • Able to open file in Wireshark successfully.

  13. TCP Dump • Every action thereafter crashed the program.

  14. TCP Dump • Able to get file statistics:

  15. TCP Dump • Used TShark, Wireshark’s command-line counterpart. • Command "tshark.exe -r <file> -V" opened the file successfully • Unable to parse the output into a usable CSV or other common delimited file type
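One option that may be worth trying for delimited output is TShark's field mode (`-T fields` with `-e` and `-E`), which prints only the named fields instead of the verbose `-V` dump. The sketch below only builds and runs such a command; it has not been tried against this capture, and the field list, the helper names, and the path `capture.pcap` are assumptions.

```python
import subprocess

# Hypothetical field selection: addresses, TCP ports, and IP protocol number.
FIELDS = ["ip.src", "tcp.srcport", "ip.dst", "tcp.dstport", "ip.proto"]

def tshark_csv_command(pcap_path, fields=FIELDS):
    """Build a tshark command that prints one comma-separated line per packet."""
    cmd = ["tshark", "-r", pcap_path, "-T", "fields"]
    for name in fields:
        cmd += ["-e", name]
    # header=y writes the field names as the first line; separator=, uses commas.
    cmd += ["-E", "header=y", "-E", "separator=,"]
    return cmd

def export_csv(pcap_path, csv_path):
    """Run tshark and write its output to csv_path (requires tshark on PATH)."""
    with open(csv_path, "w", newline="") as out:
        subprocess.run(tshark_csv_command(pcap_path), stdout=out, check=True)

cmd = tshark_csv_command("capture.pcap")
```

Non-TCP packets would leave the two port columns empty with this field list, so UDP traffic would need `udp.srcport` and `udp.dstport` added.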

  16. TCP Dump • Output of TShark: • readable • Mostly unusable for data processing purposes. • Acquired source and destination port/address and protocol.

  17. TCP Dump • SOM of TCPdump: • srcport highlighted • Bottom right area is likely to be malicious

  18. TCP Dump • K-Means graph: • Did not like this data set... • Absurdly high share of rows with dstport = 21 (97.5%) • This could be a result of faulty parsing rather than the data set itself.
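A quick sanity check of this kind can be sketched as follows: count how often each dstport appears and report the share of the most common one. The helper `dominant_value` and the 40-row list are made up to show the calculation and are not the parsed capture.

```python
from collections import Counter

def dominant_value(values):
    """Return the most common value and the fraction of entries it accounts for."""
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)

# Made-up column: 39 rows with port 21 and one row with port 80.
dstports = [21] * 39 + [80]
port, share = dominant_value(dstports)
```

Running the same check on the other parsed columns would help show whether the skew is confined to dstport, which would point toward a parsing problem.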
