Network Security Monitoring and Analysis based on Big Data Technologies. Bingdong Li. August 26, 2013. Outline. Motivation Objectives System Design Monitoring and Visualization Network Measurement Classification and Identification of Network Objects Conclusion Future Work. Motivation.
Network Security Monitoring and Analysis based on Big Data Technologies Bingdong Li August 26, 2013
Outline • Motivation • Objectives • System Design • Monitoring and Visualization • Network Measurement • Classification and Identification of Network Objects • Conclusion • Future Work
Motivation • Traditional security systems assume a static system • Network attacks • sophisticated • organized • targeted • persistent • dynamic • external • internal
Motivation • Problem: Network Security is becoming more challenging • Resource: A Large Amount of Security Data • Network flow • Firewall log • Application log • Server log • SNMP • Opportunity: Big Data Technologies, Machine Learning
Objectives • A network security monitoring and analysis system based on Big Data technologies that provides: • Network measurement • Real-time continuous monitoring and interactive visualization • Intelligent network object classification and identification based on role behavior as context
Objectives • Network Security • Big Data • Machine Learning
System Design • Data Collection
System Design • Online Real Time Process
System Design • NoSQL Storage
System Design • User Interfaces
System Design • The Design supports features: • Real Time Continuous Monitoring and Interactive Visualization • Network Measurement • Classification and Identification of Network Objects
Monitoring and Visualization • Real Time: response within a time constraint • Interactive: involves user interaction • Continuous: “continue to be effective over time in light of the inevitable changes that occur” (NIST)
Monitoring and Visualization • Retrieve Data • Web User Interfaces • Video Demo
Monitoring and Visualization • Data Retrieval • Data are stored in columns, with the IP address as the primary key and the time slice as the secondary key • Accessing these data takes Θ(1) time
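The two-level keying described on this slide can be sketched as a nested hash map. This is only an illustration, not the system's actual NoSQL schema: the names `FlowStore`, `time_slice`, and `SLICE_SECONDS`, and the 300-second slice width, are assumptions that do not appear in the slides.

```python
from collections import defaultdict

# Assumed slice width in seconds; the slides do not state one.
SLICE_SECONDS = 300


def time_slice(timestamp: int) -> int:
    """Round a Unix timestamp down to the start of its time slice."""
    return timestamp - timestamp % SLICE_SECONDS


class FlowStore:
    """Hypothetical in-memory stand-in for the keyed storage layout."""

    def __init__(self):
        # Primary key: IP address. Secondary key: time slice.
        self._rows = defaultdict(dict)

    def put(self, ip: str, timestamp: int, record: dict) -> None:
        """Append a record under (ip, time slice)."""
        self._rows[ip].setdefault(time_slice(timestamp), []).append(record)

    def get(self, ip: str, timestamp: int) -> list:
        """Two hash lookups, so the cost does not grow with the data size."""
        return self._rows.get(ip, {}).get(time_slice(timestamp), [])


store = FlowStore()
store.put("10.0.0.1", 1000, {"bytes": 10})
store.put("10.0.0.1", 1100, {"bytes": 20})
print(store.get("10.0.0.1", 1199))  # both records share the slice starting at 900
```

Because both keys are resolved by hashing, a lookup for one host in one time slice does not scan other hosts or other slices, which is the property the Θ(1) claim relies on.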
Demo of Interactivity and Continuity Video Demo
Network Measurement • A case study: anonymity technology usage on a campus network, using sFlow • Geo-Location • Usage of Anonymity Systems
Geo-location of Anonymity Usage on Campus • One Instance: Bahamas, Belarus, Belgium, Bulgaria, Cambodia, Chile, Colombia, Estonia, Ghana, Greece, Hungary, Ireland, Israel, Jamaica, Jordan, Korea, Mongolia, Namibia, Nigeria, Pakistan, Panama, Philippines, Slovakia, Turkey, Ukraine, Vietnam, Zimbabwe • Two Instances: Chad, Czech Republic, Denmark, Hong Kong, Iran, Japan, Kazakhstan, Poland, Romania, Spain, Switzerland • Three Instances: Austria, France, Singapore • Four Instances: Australia, Indonesia, Taiwan, Thailand
Classification of Host Roles • Data: Three months of sFlow data from a large campus
Classification of Host Roles • Algorithms • Decision Tree • On-line SVM
Classification of Host Roles • Features • Ad hoc based on domain knowledge • Aggregating features for on-line classification • 24 features normalized between 0 and 1, inclusive
Classification of Host Roles • Features • 24 features derived from: • src/dest IP address • src/dest port number • TTL • Packet size • Transport protocol
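The slides state that the 24 features are normalized to the range 0 to 1, inclusive, but not how. One common choice is min-max scaling, sketched below as an assumption rather than as the author's method. The function name `min_max_normalize` and the sample TTL values are hypothetical.

```python
def min_max_normalize(values):
    """Scale a list of numbers linearly into [0, 1], inclusive."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical TTL observations for one host.
ttl_values = [64, 128, 255]
print(min_max_normalize(ttl_values))
```

In an on-line setting the minimum and maximum would have to be fixed in advance (for example from the field's valid range) or tracked incrementally. The slides do not say which approach was used.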
Classification of Host Roles • Ground Truth • Host Information in Active Directory • Crawler to validate its status
Classification of Host Roles • Classifying Client vs. Server • Classifying Web Server vs. Web Email Server • Classifying Hosts at Personal Office vs. Public Place • Classifying Hosts at Two Different Colleges • Feature Contributions
Accuracy • High accuracies of Host Role Classification
Identification of a User Data: NetFlow data from a large campus
Identification of a User • Algorithms • Decision Tree • On-line SVM • Ground Truth • Host Information in Active Directory • Crawler to validate its status
Identification of a User • Features • Discrete probability distribution function (pdf) • An Example: System Port Number [6, 8, 9, 11, 14, 30, 80, 1020] • Outlier fraction (P) is 1% • 80 is the port of interest (S) • Number of bins (R) is 4
Identification of a User • An Example • Cutoff index: (1 - 0.01) × 8 = 7.92, rounded down to 7; the 7th value is 80 • Bin slice size = 80 / (4 - 1) ≈ 26.67 • Data: [6, 8, 9, 11, 14, 30, 80, 1020] • pdf = [0.625, 0.125, 0.125, 0.125] for the bins {6, 8, 9, 11, 14}, {30}, {80}, {1020}
Identification of a User • An Example without P and S • Bin slice size is 1024 / 4 = 256 • Data: [6, 8, 9, 11, 14, 30, 80, 1020] • pdf = [0.875, 0, 0, 0.125] for the bins {6, 8, 9, 11, 14, 30, 80}, {}, {}, {1020}
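The two binning examples can be reproduced with the sketch below. It is one reading of the slides, not the author's implementation, and the function names are hypothetical. With an outlier fraction P, the largest values beyond the cutoff go into a dedicated last bin and the remaining R - 1 bins split the range up to the cutoff evenly. The slides do not spell out the role of the port of interest S. In their example it coincides with the cutoff value 80, so this sketch uses only P and R.

```python
def pdf_fixed_bins(ports, bins, max_port=1024):
    """Histogram over [0, max_port) with equal-width bins, as a pdf."""
    width = max_port / bins
    counts = [0] * bins
    for port in ports:
        counts[min(int(port // width), bins - 1)] += 1
    return [c / len(ports) for c in counts]


def pdf_with_outlier_bin(ports, bins, outlier_fraction):
    """Histogram whose last bin collects values above the cutoff."""
    if bins < 2:
        raise ValueError("need at least one regular bin plus the outlier bin")
    values = sorted(ports)
    # (1 - P) * n, rounded down, picks the cutoff position (1-based).
    keep = max(1, int((1 - outlier_fraction) * len(values)))
    cutoff = values[keep - 1]
    width = cutoff / (bins - 1)
    counts = [0] * bins
    for port in values:
        if port > cutoff:
            counts[-1] += 1  # outlier bin
        elif width == 0:
            counts[0] += 1  # degenerate case: cutoff is 0
        else:
            counts[min(int(port // width), bins - 2)] += 1
    return [c / len(values) for c in counts]


ports = [6, 8, 9, 11, 14, 30, 80, 1020]
print(pdf_with_outlier_bin(ports, bins=4, outlier_fraction=0.01))
# [0.625, 0.125, 0.125, 0.125]
print(pdf_fixed_bins(ports, bins=4))
# [0.875, 0.0, 0.0, 0.125]
```

The comparison shows why the cutoff matters: with fixed 256-wide bins, seven of the eight ports collapse into the first bin, while the cutoff-based bins keep 30 and 80 distinguishable from the low-numbered ports.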
Accuracy • Identifying a particular user among other users Decision Tree 93.3% On-line Support Vector Machine 78.5%
Conclusion • Major Contributions • A Big Data analysis system • a conference paper • Monitoring and interactive visualization • Usage of anonymity technologies • a conference and a journal paper • Models for classification of host roles and identification of users • a conference paper
Conclusion • The Big Data analysis system is high-performance and scalable • Real-time continuous network monitoring and interactive visualization are implemented and supported by the high-performance system
Conclusion • Proxies and Tor are the main anonymity technologies used on campus • US, Germany, and China are the top 3 countries • Models and Features for Classification of Host Roles: • client vs. server, non-web server vs. web server, personal office vs. public office, from two different colleges • Models and Features for Identification of a particular user among other users
Future Work • Improvements to the Current Work • More interactive features and better user interfaces • Further analysis of user identification: features, algorithms (such as deep learning)
Future Work • Extensions to the Current Work • Define and filter out background traffic • Detection of operating system fingerprinting • Identity anonymity • Fusion with other network security data sources