Using Data Mining to Develop Profiles to Anticipate Attacks. Systems and Software Technology Conference (SSTC 2008), May 1, 2008. Dr. Michael L. Martin, Uma Marques, MITRE.
MITRE Standard Disclaimer • The author's affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended to convey or imply MITRE's concurrence with, or support for, the positions, opinions or viewpoints expressed by the author.
Where is the threat? • Most computer security money is spent on prevention -- a bastion mentality • Most of the loss is from insider activity (82%) • Intrusion Detection is the art of detecting and responding to computer misuse
Intrusion Detection (ID) • Deterrence (we will find out what you did and catch you) • Detection • Misuse detection based on known patterns of attack (signatures) • Anomaly detection (profile of expected behavior) • patterns of acceptable behavior • patterns of known misbehavior
Intrusion Detection (continued) • Response • Damage Assessment • need to assess in dollar terms • Attack Anticipation • when (time of year/day; significant dates) • type • Prosecution Support (forensics)
Comparison of Network and Host Based Intrusion Detection • Host Based • Patterns of File Access • Patterns of Application Execution • Network Based • Analysis of Packets and other network activity
Sensor Placement & Firewalls • Sensors placed outside the Firewall (sometimes called the DMZ or demilitarized zone) are useful for detecting the source addresses attempting to attack and for attack anticipation • Sensors placed inside the Firewall are useful for detecting attacks that get through the firewall and for unauthorized traffic going out
Threats to Normal Network Traffic (A: sender, B: receiver, C: third party) • Normal Transmission: A sends to B • Monitoring: C learns about A and B by analyzing traffic • Masquerade: C masquerades as B so A thinks B received the message • Interception: C obtains copy of data intended for B only • Spoofing: C sends message to B which B assumes is from A • Modification: C intercepts and changes data intended for B
Chain of Attack (Attacker, Dial-In, Source, Proxy, Attack Target) • Attacker: No attack code running on original workstation • Dial-In: Dial-up to stolen ISP account. Might use hacked phone switch to confuse trail. • Source: Originate attack from stolen user account. No root access necessary. User directory may contain executables and especially data files related to system attacks. • Proxy: One or more intermediary hosts used as cutouts to confuse trail. Can telnet in and telnet back out of stolen account without root access. May use netcat or other introduced executable to set up a convenient proxy. • Automated attack software normally requires root privileges to access network. Manual attack may not need root access. • Hostile code can include: • Sniffer • IRC bot • Game bot • Denial-of-service zombie • Attack goal can include: • Gaining logon access • Manipulating a chat room • Special powers on game server • Disabling host
Proactive Intrusion Detection • Security violations evolve in multiple stages • Preliminary stages often not destructive • merely preparatory steps in the Attack Scenario • Goal is to Detect attack precursors, and take immediate action • Preventing the resulting attack (Temporal Data Mining)
Phases of Attack • Target Identification • Potential victim(s) • Experienced Crackers keep Long Lists of Potential Victims; Sometimes willing to Share
Phases of Attack • Intelligence Gathering • Probe Systems to garner: Operating System and Version, List of Network Services Provided • Use Password Sniffing & Guessing; and well-known compromises for Buggy Network Services • Most Vulnerability Scans are Heavy-Handed (immediately visible to virtually any network intrusion detection system, or IDS) • Patient & Skillful Attackers can circumvent IDS
Phases of Attack • Initial Compromise • Often Very Messy • Easy to find Evidence at this time • Unusual Number of Failed Logons • Log Records of Buffer Overflows and Undocumented System Features • Core Dumps • Daemons Restarting
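An unusual number of failed logons is one of the easier signs above to count from event records. A minimal sketch, assuming a hypothetical record layout of (user, succeeded) pairs and an illustrative threshold; neither comes from the slides:

```python
from collections import Counter

def users_with_excess_failures(events, threshold=5):
    """Return users whose failed-logon count exceeds `threshold`.

    `events` is an iterable of (user, succeeded) pairs. The record layout
    and the default threshold are assumptions made for illustration.
    """
    failures = Counter(user for user, succeeded in events if not succeeded)
    return sorted(user for user, count in failures.items() if count > threshold)

# Hypothetical sample: root fails 8 times, bob fails twice, alice succeeds.
events = [("alice", True)] + [("root", False)] * 8 + [("bob", False)] * 2
print(users_with_excess_failures(events))
```

In practice the threshold would be tuned against what is normal for the site, since a fixed value will either miss slow guessing or flag forgetful users.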
Phases of Attack • Privilege Escalation • Exploit Code to Compromise System • Exploit well-known vulnerabilities to Gain Root • Last Flurry of Incriminating Error Messages
Phases of Attack • Reconnaissance • First: Is Logging On? Second: Where are Logs Stored? • Logs in Permanent form Best (printer, CD-R) • Forwarding Address of Root's Email • Check Administrators' home directory to see what they have been up to
Phases of Attack • Reconnaissance • Looks for Security Programs • Looks for Open Files: what are currently running programs doing? • Looks for File Integrity Programs (Tripwire, etc.) • Systems Administrators (Name, System, etc.)
Phases of Attack • Covering Tracks • Deleting Log! (Red Flag: "I've been attacked") • Editing Log! (Remove their tracks only) • Log Editors for Binary Data (necessary for editing binary log data; equivalent to burglar tools) -- utmp and wtmp are binary log files
Phases of Attack • Covering Tracks • Back Door • hidden copy of the command shell that is SUID (set user ID) root (the file is owned by root, the SUID bit is set, and the intruder has execute permission) • Hacked Binaries (modification or replacement of standard system executables) also called altered binaries, Trojan horses, hostile changelings, and trojanizing
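The back-door description above (file owned by root, SUID bit set) maps directly onto fields of a POSIX file status record. A minimal sketch using Python's standard `os` and `stat` modules; the function names are hypothetical, and a real audit would compare results against a known-good baseline because many legitimate system binaries are SUID root:

```python
import os
import stat

def looks_like_suid_root(mode, uid):
    """True for a regular file owned by root (uid 0) with the SUID bit set."""
    return stat.S_ISREG(mode) and uid == 0 and bool(mode & stat.S_ISUID)

def find_suid_root(directory):
    """Walk `directory` and list SUID-root files; unreadable entries are skipped."""
    hits = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue
            if looks_like_suid_root(info.st_mode, info.st_uid):
                hits.append(path)
    return sorted(hits)

# Mode bits only, no filesystem access needed for this check.
print(looks_like_suid_root(stat.S_IFREG | 0o4755, 0))
```

A hidden SUID shell would show up as an entry in the scan output that is absent from the baseline list.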
Conceptual Views of Misuse • An Unauthorized Individual Accesses Data • An Unauthorized Individual Modifies Data • Denial of Service
Acceptable versus Unacceptable • If you had a Perfect Model of Acceptable Behavior OR a Perfect Model of Unacceptable Behavior it would be Easy • That is, if you have defined all Acceptable Behavior, anything else is Unacceptable or Misuse • Or if you have defined all Unacceptable Behavior, anything else is Acceptable
Acceptable Behavior Models • Usually based on historic data on 'acceptable behavior' • System is 'trained' on historical data • But if training data has unacceptable behavior in it (that was missed) then unacceptable behavior is allowed (false negative) • But if training data is missing data on acceptable behavior then that acceptable behavior is flagged as misuse (false positive)
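Both failure modes on this slide can be reproduced with a very small profile. A minimal sketch, assuming the profile is just the set of (user, action) pairs seen in training; real systems use richer statistical profiles, and all names and sample records here are hypothetical:

```python
def train_profile(history):
    """Profile of acceptable behavior: every (user, action) pair seen in training."""
    return set(history)

def is_anomalous(profile, event):
    """An event is flagged when it was never observed during training."""
    return event not in profile

history = [
    ("alice", "read_report"),
    ("bob", "run_backup"),
    ("mallory", "copy_price_list"),  # misuse that went unnoticed in the training data
]
profile = train_profile(history)

# False negative: misuse present in training is now treated as acceptable.
print(is_anomalous(profile, ("mallory", "copy_price_list")))  # False

# False positive: legitimate behavior absent from training is flagged.
print(is_anomalous(profile, ("alice", "run_backup")))  # True
```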
Unacceptable Behavior Models • Define ‘all’ unacceptable behavior • A Priori rules based on ‘experts’ • Catches most Significant Misuse • Misses much unacceptable behavior (hard to define all unacceptable behavior with certainty)
Detecting Hackers (outsider misuse) • Attempts to gain ACCESS • Reading an Object (or file) • Writing an Object (or file) • Planting a TROJAN HORSE • Altering Systems Configuration • Achieving a FULLY Interactive Login
Detecting Hackers (outsider misuse) • Denial of Service (DOS) • Deleting an Object (necessary part of system) • Slowing Down a Network (flooding) • Stopping a Program (necessary part of system) • Filling Storage Space (no work/file space) • Shutting Down a Critical Server
Weapons of Choice • Network Intrusion Detection • Attack Patterns Differ SIGNIFICANTLY from Normal Access • Attack Patterns Pronounced • Readily Identifiable • Because they EXPLOIT Known Vulnerabilities • Known Vulnerabilities HAVE Known Signatures
Weapons of Choice • Information Assurance Vulnerability Alerts (IAVA) • Know Vulnerabilities & Harden System Against • Web Sites for Information on Vulnerabilities (see SANS top 20) • http://www.sans.org/top20/
Misuse Examples • Anomalous Outbound Traffic • Outbound Information Not Requested • Imbalance between Requested and Provided • a sign that someone has gotten into your system and is: • Attacking from it! (Distributed DOS) • Stealing Information!
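The imbalance between requested and provided traffic can be expressed as a simple ratio per host. A minimal sketch, assuming per-host byte counters are already available; the function name and the ratio limit of 10 are illustrative assumptions, not values from the slides:

```python
def outbound_imbalance(bytes_requested, bytes_sent, ratio_limit=10.0):
    """Flag a host whose outbound volume far exceeds what was requested of it."""
    if bytes_sent == 0:
        return False  # nothing went out, nothing to flag
    if bytes_requested == 0:
        return True   # outbound information that nobody requested
    return bytes_sent / bytes_requested > ratio_limit

print(outbound_imbalance(bytes_requested=2_000, bytes_sent=5_000))    # False
print(outbound_imbalance(bytes_requested=2_000, bytes_sent=900_000))  # True
print(outbound_imbalance(bytes_requested=0, bytes_sent=4_096))        # True
```

A server that legitimately serves large files will need a higher limit than a workstation, so the limit is best set per class of host.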
Misuse Examples • Site being Swept • Range of Attacks AND • Range of IP Addresses • Done to MAP your Site • Done to Probe for Vulnerabilities • Solution: Proper Patches & Configuration
Misuse Examples • Site being Swept (continued) • Information Flood • Unusually Large Traffic Volume • From a “Single” class of Service • From a “Single” IP Address • From Many IP Addresses • Solution - Block IP Address/Class of Service • Problem- Might Block Legitimate Connections
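The sweep and flood patterns on the two slides above differ in shape: a sweep touches many distinct targets, while a flood sends a large volume at few targets. A minimal sketch that separates them, assuming connection records are (source, destination, port) tuples; the thresholds and addresses are hypothetical:

```python
from collections import defaultdict

def classify_sources(connections, sweep_targets=20, flood_count=1000):
    """Label each source as 'sweep' or 'flood' when it crosses a threshold.

    A sweep is many distinct (destination, port) targets from one source.
    A flood is a large number of connections to few targets.
    """
    targets = defaultdict(set)
    counts = defaultdict(int)
    for src, dst, port in connections:
        targets[src].add((dst, port))
        counts[src] += 1

    labels = {}
    for src, count in counts.items():
        if len(targets[src]) >= sweep_targets:
            labels[src] = "sweep"
        elif count >= flood_count:
            labels[src] = "flood"
    return labels

connections = (
    [("203.0.113.9", f"10.0.0.{i}", 22) for i in range(1, 31)]  # maps the site
    + [("198.51.100.4", "10.0.0.5", 80)] * 1500                 # floods one service
    + [("192.0.2.7", "10.0.0.5", 80)] * 3                       # ordinary client
)
print(classify_sources(connections))
```

Blocking on these labels carries the risk the slide names: a busy legitimate proxy can look like a flood source.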
Misuse Examples • Unauthorized Access: Mission-Critical Data • Unauthorized Release (Privacy Violation) • Sensitive Medical/Employee/Customer Information • Unauthorized Alteration • Theft • Appraisals/Safety Reports/Work Reports/Customer Records • Solution: Identify Mission-Critical Data & Define Authorized Use
Behavioral Data Forensics In Intrusion Detection • Data Mining to Identify Trends AND • Specific Activities that Indicate Misuse • Decision Support Capabilities of Intrusion Detection • Find Out What Happened in a Network of Live Computers • Error Detection and Eradication
Behavioral Data Forensics: Benefits • Detect Insiders • Identify Trends: Misuse & Suspicious Activity • Detect Outsiders (Hackers) • Identify Attack Trends to Harden Networks • Improve Policy • Fit Observed Versus Predicted Behavior • Identify Bad or Missing Policy
Data Mining • Means to Extract Unknown, Actionable Data From, Among Other Things, Data Warehouses • Nontrivial Extraction of Implicit, Previously Unknown, & Potentially Useful Information from Data • Process of discovering new correlations, patterns, anomalies and trends by sifting through large amounts of data
Data Mining • Pattern recognition technologies and statistical and mathematical techniques • Tools often based on artificial intelligence techniques • Processing Large Quantities of Data at a Central Location Looking for “Patterns of Interest”
Purpose of Data Mining • Complements predefined and ad hoc access by enabling users to discover new relationships • Improvement over a user's "gut feeling" • Bottom-up discovery data analysis, also known as "knowledge discovery"
Data Mining & Intrusion Detection • MADAM ID • Constructs Intrusion Detection Signatures in systematic and automated manner • Learns classifiers that distinguish between intrusions and normal activities • ADAM • Learns normal network behavior from attack-free training data • Connection records of the last delta-seconds continuously mined for new association rules
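The ADAM idea above (a profile learned from attack-free data, then a sliding window of recent connections mined for associations not in the profile) can be illustrated in a few lines. This is a simplified sketch, not ADAM's actual implementation: it only counts pairs of attribute values, and the record fields, support threshold, and function names are assumptions:

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(records, min_support):
    """Pairs of (attribute, value) items present in at least `min_support` of records."""
    counts = Counter()
    for record in records:
        for pair in combinations(sorted(record.items()), 2):
            counts[pair] += 1
    n = len(records)
    return {pair for pair, c in counts.items() if n and c / n >= min_support}

def new_associations(profile, stream, now, delta, min_support=0.5):
    """Frequent pairs in the last `delta` seconds that the profile has never seen."""
    window = [rec for ts, rec in stream if now - delta < ts <= now]
    return frequent_itemsets(window, min_support) - profile

# Attack-free training data (hypothetical).
training = [{"dst": "web", "service": "http"}] * 10
profile = frequent_itemsets(training, min_support=0.5)

# Timestamped connection records; the last few are unlike the training data.
stream = [(t, {"dst": "web", "service": "http"}) for t in range(0, 5)]
stream += [(t, {"dst": "db", "service": "telnet"}) for t in range(95, 100)]

print(new_associations(profile, stream, now=100, delta=10))
```

Calling `new_associations` repeatedly as `now` advances gives the continuous mining the slide describes.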
Data Mining & Intrusion Detection • Clustering Unlabeled ID Data • normal elements will cluster together and intrusive elements will cluster together • Biggest clusters are normal; smallest are intrusive • Mining the Alarm Stream • Modeling normal and abnormal alarm streams
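The "biggest clusters are normal; smallest are intrusive" heuristic can be sketched with a single-pass, fixed-width clustering. This is one simple technique chosen for illustration, not necessarily the one the cited work uses; the feature vectors, cluster width, and size fraction are hypothetical:

```python
import math

def leader_clusters(points, width):
    """Single-pass clustering: join the first cluster whose leader is within `width`."""
    clusters = []  # list of (leader_point, member_list)
    for p in points:
        for leader, members in clusters:
            if math.dist(leader, p) <= width:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))  # no cluster was close enough
    return clusters

def label_clusters(clusters, max_fraction=0.2):
    """Label a cluster 'intrusive' when it holds at most `max_fraction` of all points."""
    total = sum(len(members) for _leader, members in clusters)
    return [
        (len(members), "intrusive" if len(members) / total <= max_fraction else "normal")
        for _leader, members in clusters
    ]

# 18 similar connection feature vectors plus 2 outliers.
points = [(1.0, 1.0 + 0.01 * i) for i in range(18)] + [(9.0, 9.0), (9.1, 9.0)]
print(label_clusters(leader_clusters(points, width=1.0)))
```

The heuristic breaks down when attacks make up a large share of the data, as in a flood, because the intrusive cluster is then no longer small.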
Forms and Formats (Data Types) • Raw TCP/IP Data (network event capture) • Raw Binary Data (operating system data) • ASCII Application Data (e.g., Syslog) • Detected Signatures (stored in RDBMS) • Behavioral Statistics (stored in RDBMS)
User-Centric versus Target-Centric • Target-centric • Database Optimized to Provide Target Data • Example: All Logins on a set of Target Machines • User-centric • Database Optimized to provide User Data • Example: All Logins by User X on any Target
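The two database orientations above amount to indexing the same login records by different keys. A minimal in-memory sketch, assuming login records are (user, target, time) tuples; a real deployment would use RDBMS indexes rather than dictionaries, and the sample data is hypothetical:

```python
from collections import defaultdict

def build_indexes(logins):
    """Index the same login events two ways: by target machine and by user."""
    by_target = defaultdict(list)  # target-centric view
    by_user = defaultdict(list)    # user-centric view
    for user, target, when in logins:
        by_target[target].append((user, when))
        by_user[user].append((target, when))
    return by_target, by_user

logins = [
    ("x", "payroll", "09:00"),
    ("y", "payroll", "09:05"),
    ("x", "mail", "09:10"),
]
by_target, by_user = build_indexes(logins)

print(by_target["payroll"])  # all logins on one target machine
print(by_user["x"])          # all logins by user X on any target
```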
Examples of Behavioral Data Forensics • Security • Unauthorized Changes to Data (Price Lists) • Track Consultant Activities (Trust) • Administrators Browsing Personal Folders (Abuse of Privilege) • Unauthorized User Logging into Backup Account (if they encrypt your backup you're toast)
Examples of Behavioral Data Forensics • Security Policy (monitoring for compliance) • Policy Ignored (Locking Screensavers--time too short) • Users Applying the Wrong Profile • Administrators Not Using Backup Accounts for Backups (used admin account instead)
Data Mining Techniques • Data Presentation Refinement (change view and tune parameters) • Tune Parameters until Interesting Features Stand Out • Eliminate Common Occurrences to Zone in on Rarer Interesting occurrences (needle in a haystack)
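Eliminating common occurrences to reach the rare ones is, at its simplest, a frequency filter. A minimal sketch, assuming events are already reduced to comparable labels; the `max_share` parameter plays the role of the tunable parameter mentioned above, and its default is an arbitrary illustration:

```python
from collections import Counter

def rare_events(events, max_share=0.01):
    """Return event types that make up at most `max_share` of all events."""
    counts = Counter(events)
    total = sum(counts.values())
    return sorted(e for e, c in counts.items() if c / total <= max_share)

# Hypothetical event stream dominated by routine activity.
events = ["login"] * 500 + ["read"] * 495 + ["chmod_suid"] * 5
print(rare_events(events))
```

Raising or lowering `max_share` until interesting features stand out is the tuning loop the slide describes.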
Data Mining Techniques • Contextual Interpretations (visualization, clustering, pattern match) • Have a Detection Requirement in Mind (predetermined interesting events) • Assign Context to Observed Trends (knowledge discovery)
Data Mining Techniques • Drill Down (get to the root cause--underlying data causing the anomaly) • Focus on: Individual Time Frames (odd hours, surge times), Specific Users (most active, unusual hours, many privileges), Specific Actions (Logons, Updates, Large Transactions, Long Transactions), or Targets (Data Servers, Main Servers, Critical Mission Servers)
Data Mining Techniques • Combining Heterogeneous Data Sources • UNIX, Windows NT/2000, Mainframe • Incorporating Out-of-Band Data Sources • Interviews, Physical Logs, Coworkers
Data Mining Examples • Target Browsing • User Accesses Multiple Objects in Short Time Frame • Critical File Browsing • Users Directory Hopping • High Activity
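Target browsing, a user touching many objects in a short time frame, can be detected with a per-user sliding window. A minimal sketch, assuming access records are (timestamp_seconds, user, object) tuples sorted by time; the window length and object count are illustrative assumptions:

```python
from collections import defaultdict, deque

def browsing_users(accesses, window_seconds=60, min_objects=10):
    """Return users who touched `min_objects` distinct objects within the window.

    `accesses` must be sorted by timestamp.
    """
    recent = defaultdict(deque)  # user -> recent (timestamp, object) pairs
    flagged = set()
    for ts, user, obj in accesses:
        window = recent[user]
        window.append((ts, obj))
        # Drop accesses that have fallen out of the time window.
        while ts - window[0][0] > window_seconds:
            window.popleft()
        if len({o for _t, o in window}) >= min_objects:
            flagged.add(user)
    return sorted(flagged)

accesses = sorted(
    [(i, "eve", f"/payroll/file{i}") for i in range(12)]      # 12 files in 12 seconds
    + [(i * 100, "alice", f"/docs/{i}") for i in range(12)]   # 12 files over 20 minutes
)
print(browsing_users(accesses))
```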
Data Mining Examples • Attack Anticipation (Tip-Off) • User Accessing Critical Files at Odd Times (teller when bank is closed) • Target Overload (e.g., Server Overload) • Load Balancing Problem Causes Crash • Damage Assessment -- find loss and document • Surveillance -- employee makes threats • Policy Compliance -- night logout
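The tip-off of a user accessing critical files at odd times (the teller when the bank is closed) reduces to a time-of-day and day-of-week check on access records. A minimal sketch, assuming business hours of 09:00 to 17:00 on weekdays; the hours, file names, and record layout are hypothetical:

```python
from datetime import datetime, time

def outside_business_hours(when, opens=time(9, 0), closes=time(17, 0)):
    """True on weekends or outside the opening hours on a weekday."""
    return when.weekday() >= 5 or not (opens <= when.time() < closes)

def odd_hour_accesses(accesses, critical_files):
    """Keep (user, path, when) records that hit a critical file at an odd time."""
    return [
        (user, path, when)
        for user, path, when in accesses
        if path in critical_files and outside_business_hours(when)
    ]

critical = {"/vault/ledger.db"}
accesses = [
    ("teller1", "/vault/ledger.db", datetime(2008, 5, 1, 10, 0)),   # weekday, open
    ("teller1", "/vault/ledger.db", datetime(2008, 5, 1, 23, 30)),  # weekday, closed
    ("teller2", "/tmp/notes.txt", datetime(2008, 5, 3, 2, 0)),      # not a critical file
]
for user, path, when in odd_hour_accesses(accesses, critical):
    print(user, path, when.isoformat())
```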
Summary • Behavioral Data Forensics • Studies Past Behavior in Event Records • Provides Decision Support Capabilities • Detects Hackers and Insider Misuse • Supports Damage Assessment AND • Attack Anticipation • Behavioral Data Forensics Facilitates • Business Process Reengineering