Netflow Data-Mining Techniques
Chris Poetzel, Argonne National Laboratory, cpoetzel@anl.gov
Scott Pinkerton

Netflow Data Mining
• Argonne Background Information
• Sliding Window Analysis
• Using Contextual Knowledge to Adjust Data Mining
• Incident Investigation
• Integration, Integration, Integration
• Future
• Conclusions
ESCC Meeting

ANL Background
• Utilize OSU’s Flow-Tools, written by Mark Fullmer
• Collecting from 14 different routers/switches at ANL-East
• ~600 GB currently stored and growing
• 1-year retention period desired; backing off as we add devices
• Current collection/analysis station: IBM 360, Red Hat Linux, 8 GB RAM, four 1.6 GHz CPUs

Sliding Window Analysis
• The raw volume of Netflow data can make data mining long and cumbersome
• Implemented a 5-minute sliding window for analysis
• Every minute, check the previous 5 minutes of data (via cron jobs)
• Reduces processing time (~20 seconds)
• Catches the vast majority of scans/probes in near real time
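
The sliding-window selection above can be sketched as follows. This is a minimal illustration, assuming flow records have already been exported (for example from flow-tools) into Python tuples; the `Flow` record, its field names, and the `flows_in_window` helper are hypothetical and not part of flow-tools, and the timestamp is arbitrary.

```python
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical flow record; real records would come from a flow-tools export.
Flow = namedtuple("Flow", "end_time src dst dport")

WINDOW = timedelta(minutes=5)  # window length taken from the slide


def flows_in_window(flows, now, window=WINDOW):
    """Return the flows that ended within `window` before `now`."""
    start = now - window
    return [f for f in flows if start < f.end_time <= now]


# Arbitrary demo data using documentation address ranges.
now = datetime(2000, 1, 1, 12, 0)
flows = [
    Flow(now - timedelta(minutes=7), "198.51.100.9", "192.0.2.10", 22),
    Flow(now - timedelta(minutes=3), "198.51.100.9", "192.0.2.11", 22),
    Flow(now - timedelta(seconds=30), "198.51.100.9", "192.0.2.12", 22),
]
recent = flows_in_window(flows, now)  # the 7-minute-old flow is dropped
```

Run once a minute, consecutive windows overlap by four minutes, so a script built this way would probably need to de-duplicate its reports.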

Contextual Knowledge
• Which way is the data flowing?
• Contextual knowledge affects what we search for and what we do with the results
[Diagram: flow direction, labeled by Source and Destination]
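
One way to label a flow's direction is to test both endpoints against the site's own prefixes. The sketch below assumes a single hypothetical internal prefix from the documentation address range; `INTERNAL_NETS`, `side`, and `direction` are illustrative names, not part of the authors' tooling.

```python
import ipaddress

# Hypothetical site prefix (documentation range), not Argonne's real networks.
INTERNAL_NETS = [ipaddress.ip_network("192.0.2.0/24")]


def side(addr):
    """Return "IN" if addr falls inside a site prefix, else "OUT"."""
    ip = ipaddress.ip_address(addr)
    return "IN" if any(ip in net for net in INTERNAL_NETS) else "OUT"


def direction(src, dst):
    """Label a flow by the sides of its source and destination."""
    return f"{side(src)} -> {side(dst)}"


label = direction("198.51.100.7", "192.0.2.5")  # external source, internal destination
```

The resulting label can then select which detection script and which response apply to the flow.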

OUT -> IN
• Receive many class B/C scans a day
• Only watch for scans on open FW ports
• Dynamically read the FW config every half hour to determine which ports are open
• Use Netflow data to look for scans on the open FW ports
• Fast scans: a script runs every minute over the past 5 minutes of data to catch fast scanners
• Slow scans: a script runs every hour over the previous 24 hours of data to catch slow scanners
• Once a scanner is detected, send its IP to the FW to be shunned
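
A minimal sketch of the scan check described above, assuming a scanner is any outside source that touches several distinct internal hosts on the same open port within the analysis window. The port set, the threshold, and the `find_scanners` helper are assumptions for illustration; the slides do not state the actual detection rule or thresholds.

```python
from collections import defaultdict

OPEN_PORTS = {22, 80, 443}  # assumed; the slide derives these from the FW config
FAST_THRESHOLD = 3          # assumed distinct-host count, kept tiny for the demo


def find_scanners(flows, open_ports, threshold):
    """Return sources that reached >= threshold distinct hosts on one open port.

    `flows` is an iterable of (src, dst, dport) tuples already limited to
    OUT -> IN traffic inside the analysis window.
    """
    targets = defaultdict(set)  # (src, dport) -> destination hosts seen
    for src, dst, dport in flows:
        if dport in open_ports:
            targets[(src, dport)].add(dst)
    return sorted({src for (src, _), hosts in targets.items()
                   if len(hosts) >= threshold})


flows = (
    [("198.51.100.9", f"192.0.2.{i}", 22) for i in range(1, 5)]    # 4 hosts, open port
    + [("203.0.113.4", "192.0.2.1", 80)]                             # one ordinary connection
    + [("203.0.113.5", f"192.0.2.{i}", 23) for i in range(1, 9)]   # closed port, ignored
)
scanners = find_scanners(flows, OPEN_PORTS, FAST_THRESHOLD)
```

The same function could serve the slow-scan job by feeding it 24 hours of flows and a different threshold. Handing the result to the firewall for shunning is site-specific and not shown.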

IN -> OUT
• Looking for problem machines at the Lab; a first approximation is to look at machines that have contacted a large number of Internet hosts in a short period of time
• Can indicate a compromised/infected machine
• Exclude a number of internal machines based on a priori knowledge
  • Email servers, domain controllers, network scanning machines (ignore)
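
The first approximation above, with its exclusion list, might look like the sketch below. The excluded addresses, the `MAX_PEERS` threshold, and the `noisy_hosts` helper are hypothetical values chosen for the demo, not figures from the talk.

```python
from collections import defaultdict

# Assumed exclusions: hosts expected to contact many Internet addresses
# (for example mail servers or sanctioned scanners).
EXCLUDED = {"192.0.2.25", "192.0.2.53"}
MAX_PEERS = 3  # assumed threshold, kept tiny for the demo


def noisy_hosts(flows, excluded=EXCLUDED, max_peers=MAX_PEERS):
    """Map each non-excluded internal source to its distinct peer count
    when that count exceeds `max_peers`.

    `flows` is an iterable of (src, dst) pairs limited to IN -> OUT traffic
    inside the analysis window.
    """
    peers = defaultdict(set)
    for src, dst in flows:
        if src not in excluded:
            peers[src].add(dst)
    return {src: len(dsts) for src, dsts in peers.items() if len(dsts) > max_peers}


flows = (
    [("192.0.2.25", f"198.51.100.{i}") for i in range(10)]   # excluded mail server
    + [("192.0.2.77", f"203.0.113.{i}") for i in range(5)]   # workstation, 5 peers
    + [("192.0.2.78", "203.0.113.1"), ("192.0.2.78", "203.0.113.2")]
)
suspects = noisy_hosts(flows)
```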

IN -> IN
• Requires collection on multiple internal switches/routers
• Detect internal scanning
  • Cron job runs every hour
  • Infected host scanning local subnet/supernet
  • Detect unauthorized internal network scans
• Post-mortem forensic value
  • What did an internally compromised machine do once it was compromised?
  • Track down cross-contamination

OUT -> OUT
• May not apply to every site
• Co-location personnel or transport traffic constitutes OUT -> OUT traffic on a network
• Scans in the OUT <-> OUT direction are detected, and the appropriate network admin/security personnel are notified

Incident Investigation 1/2
• What to do when an incident happens? (Besides pull your hair out)
• Netflow data is invaluable in cyber security investigations
• Start by classifying IP addresses into a taxonomy
  • Possible bad guy
  • Possible victims
  • Possible intermediary (stepping stone, rootkit resource site, etc.)
• This process can be aided by host syslog, etc.
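
As a rough illustration of the taxonomy above, the sketch below starts from one suspected attacker address and sorts the other addresses seen in the flow data. The heuristic (victims are internal hosts the attacker contacted; intermediaries are outside hosts a victim contacted afterwards) is an assumption made for this example, not the authors' documented procedure, and `classify` and `is_internal` are hypothetical names.

```python
def classify(flows, bad_guy, is_internal):
    """First-pass triage of addresses around one suspected attacker.

    `flows` is a list of (timestamp, src, dst) tuples; `is_internal` is a
    caller-supplied predicate on an address string.
    """
    first_contact = {}  # victim -> earliest time the attacker reached it
    for ts, src, dst in flows:
        if src == bad_guy and is_internal(dst):
            first_contact[dst] = min(ts, first_contact.get(dst, ts))

    intermediaries = set()
    for ts, src, dst in flows:
        # Outside hosts a victim contacted after first contact may be
        # stepping stones or download sites and deserve a closer look.
        after_contact = src in first_contact and ts > first_contact[src]
        if after_contact and not is_internal(dst) and dst != bad_guy:
            intermediaries.add(dst)

    return {
        "possible_bad_guy": {bad_guy},
        "possible_victims": set(first_contact),
        "possible_intermediaries": intermediaries,
    }


flows = [
    (50, "192.0.2.10", "203.0.113.5"),    # before the attack, ignored
    (100, "198.51.100.9", "192.0.2.10"),  # attacker reaches an internal host
    (150, "192.0.2.10", "203.0.113.80"),  # victim then contacts an outside host
    (160, "192.0.2.10", "192.0.2.11"),    # internal traffic, not an intermediary
]
result = classify(flows, "198.51.100.9", lambda ip: ip.startswith("192.0.2."))
```

Every label stays "possible": the output is a list of leads to confirm against host syslog and other sources, not a verdict.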

Incident Investigation 2/2
• Identifying the possible victims makes containment and clean-up much easier
• Netflow has become an invaluable tool for our cyber security team

Integration³
• Correlating Netflow data with other data sources has been very helpful in improving the signal-to-noise ratio of cyber security events
  • IDS logs
  • ARP/CAM tables (MAC “persistence”)
  • Firewall logs
  • DHCP/VPN logs
  • Host-based syslog

IDS & Netflow Logs
• Used to cross-validate an IDS alarm against a Netflow alarm, and vice versa
• IDS alarms usually give specific points of attack
• Netflow can provide the background or framework of an attack
• Netflow + IDS can provide a better perspective on cyber security events
• Store IDS and Netflow logs in the same directory structure to make searching easier
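
A minimal sketch of pairing an IDS alarm with Netflow context: for each alarm, collect every flow from the same source address within a time slack of the alarm, which shows what else that source touched. The tuple layouts, the 60-second slack, and the `context_for` helper are assumptions for illustration; real IDS and flow-tools records would need parsing first.

```python
def context_for(alarms, flows, slack=60):
    """Map each IDS alarm to the flows from its source within `slack` seconds.

    `alarms` holds (timestamp, src, dst, signature) tuples and `flows` holds
    (timestamp, src, dst, dport) tuples; both layouts are hypothetical.
    """
    context = {}
    for alarm in alarms:
        a_ts, a_src = alarm[0], alarm[1]
        context[alarm] = [
            f for f in flows
            if f[1] == a_src and abs(f[0] - a_ts) <= slack
        ]
    return context


alarm = (1000, "198.51.100.9", "192.0.2.10", "ssh-bruteforce")
flows = [
    (990, "198.51.100.9", "192.0.2.10", 22),   # the flow behind the alarm
    (1005, "198.51.100.9", "192.0.2.11", 22),  # same source, another target
    (2000, "198.51.100.9", "192.0.2.12", 22),  # outside the time slack
    (1000, "203.0.113.4", "192.0.2.10", 80),   # unrelated source
]
context = context_for([alarm], flows)
```

Here the IDS alarm names one point of attack, while the matched flows show the same source probing a second host.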

VPN/Dial-up Scan/Virus Detection
• Marriage of many data sources
• Each dial-up/VPN login initiates a virus scan of the connected host
• The dial-up/VPN-connected host is monitored via Netflow for outbound scanning activity
• If a remotely connected host is determined to be infected with a virus or behaving maliciously, the connection is terminated and the user account is locked
• All actions are performed by automated scripts, with no human intervention

Future
• Host profiling via Netflow
  • Determine what “normal” behavior for a host is, then alert when it varies from the norm
  • Some IDS products are attempting this approach (Network Flight Recorder, Lancope)
• Visualization of Netflow data
  • Charts, graphs, animations of network conversations
  • Work being done by NCSA
• Better integration with other data sources
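
One simple way to express "varies from the norm" is a standard-deviation rule over a host's own history, sketched below. This is an illustrative baseline only, assuming the profiled metric is something like distinct peers per hour; it is not the method used by the products named above, and `is_anomalous` and the factor `k` are hypothetical.

```python
from statistics import mean, pstdev


def is_anomalous(history, current, k=3.0):
    """Flag `current` when it differs from the historical mean by more
    than k population standard deviations (in either direction)."""
    mu = mean(history)
    sigma = pstdev(history)
    return abs(current - mu) > k * sigma


# Assumed metric: distinct peers contacted per hour by one host.
history = [10, 12, 11, 9, 13]
quiet = is_anomalous(history, 14)  # within three deviations of the mean
burst = is_anomalous(history, 40)  # far above the host's usual level
```

A history with no variation has a deviation of zero, so any change at all would be flagged; a practical profile would likely need a minimum tolerance.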

Conclusions
• Collecting Netflow data to support cyber security activities is tremendously helpful
• It is an invaluable data source for post-mortem forensic analysis, and an extremely helpful tool for real-time detection, notification, and active response (blocking an IP address)

Thanks
• Chris Poetzel
  • cpoetzel@anl.gov
  • 630-252-7431
• Scott Pinkerton
  • pinkerton@anl.gov
  • 630-252-9770