Digital Forensics Dr. Bhavani Thuraisingham The University of Texas at Dallas Application Forensics November 5, 2008
Outline • Email Forensics • UTD work on Email worm detection - revisited • Mobile System Forensics • Note: Other Application/systems related forensics • Database forensics, Network forensics (already discussed) • Papers to discuss November 10, 2008 and November 17, 2008 • Reference: Chapters 12 and 13 of the textbook • Optional paper to read: • http://www.mindswap.org/papers/Trust.pdf
Email Forensics • Email Investigations • Client/Server roles • Email crimes and violations • Email servers • Email forensics tools
Email Investigations • Types of email investigations • Emails containing worms and viruses: suspicious emails • Checking emails related to a crime, e.g., a homicide • Types of suspicious emails • Phishing emails: typically in HTML format, redirecting the reader to suspicious web sites • Nigerian scam • Spoofing emails
Client/Server Roles • Client-server architecture • Email servers run the email server program, e.g., Microsoft Exchange Server • Email clients run the client program, e.g., Outlook • Identification/authentication is required for the client to access the server • Intranet/Internet email servers • Intranet: local environment • Internet: public, e.g., Yahoo, Hotmail, etc.
Email Crimes and Violations • The goal is to determine who is behind the crime, e.g., who sent the email • Steps in email forensics • Examine the email message • Copy the email message; also forward the email • View and examine the email header: tools are available for Outlook and other email clients • Examine additional files such as address books • Trace the message using various Internet tools • Examine network logs (NetFlow analysis) • Note: UTD NetFlow tools (SCRUB) are available on SourceForge
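A short script can start the header-examination and tracing steps above. This sketch uses only Python's standard `email` package. The function name `summarize_headers` and the sample addresses are hypothetical. Each relay adds its `Received` header at the top of the message, so the sketch lists the hops oldest first. Note that the `From` field and any `Received` lines added before the message reached a trusted server can be forged. Treat them as leads to verify against server and network logs, not as proof.

```python
from email import message_from_string
from email.utils import parseaddr, parsedate_to_datetime


def summarize_headers(raw_message: str) -> dict:
    """Extract header fields an examiner usually reviews first (hypothetical helper)."""
    msg = message_from_string(raw_message)
    # Each relay prepends its own Received header, so the last one in the
    # list is the hop closest to the original sender.
    received = msg.get_all("Received", [])
    date = msg.get("Date")
    return {
        "from": parseaddr(msg.get("From", ""))[1],          # claimed sender, may be spoofed
        "return_path": parseaddr(msg.get("Return-Path", ""))[1],
        "message_id": msg.get("Message-ID"),
        "date": parsedate_to_datetime(date) if date else None,
        "hops": list(reversed(received)),                   # oldest hop first
    }
```

A mismatch between `from` and `return_path`, or a first hop from an unexpected network, is the kind of clue the "trace the message" step then follows up.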
Email Servers • Work with the network administrator to learn how to retrieve messages from the server • Understand how the server records and handles messages • How are the email logs created and stored? • How does the server handle deleted email messages? Are copies of the messages still kept? • Chapter 12 discusses UNIX, Microsoft, and Novell email servers
Email Forensics Tools • Several tools exist for Outlook Express, Eudora, Exchange, and Lotus Notes • Tools for log analysis and for recovering deleted emails • Examples: • AccessData FTK • FINALeMAIL • EDBXtract • MailRecovery
Worm Detection: Introduction • What are worms? • Self-replicating program; exploits a software vulnerability on a victim; remotely infects other victims • Evil worms • Severe effect; the Code Red epidemic cost $2.6 billion • Goals of worm detection • Real-time detection • Issues • Substantial volume of identical traffic, random probing • Methods for worm detection • Count the number of sources/destinations; count the number of failed connection attempts • Worm types • Email worms, instant messaging worms, Internet worms, IRC worms, file-sharing network worms • Automatic signature generation is possible • EarlyBird System (S. Singh, UCSD); Autograph (H.-A. Kim, CMU)
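Here is a minimal sketch of the two counting methods listed above. It assumes that connection records for a single time window have already been collected, for example from NetFlow data, as hypothetical `Conn(src, dst, succeeded)` tuples. The thresholds are illustrative placeholders and do not come from the source.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Conn(NamedTuple):
    """One connection attempt seen in the current time window (hypothetical record)."""
    src: str
    dst: str
    succeeded: bool


def flag_scanners(conns: Iterable[Conn], max_dsts: int = 20, max_failed: int = 10) -> set[str]:
    """Flag sources that contact many distinct destinations or fail many connections."""
    dsts = defaultdict(set)   # source -> distinct destinations contacted
    failed = defaultdict(int) # source -> number of failed attempts
    for c in conns:
        dsts[c.src].add(c.dst)
        if not c.succeeded:
            failed[c.src] += 1
    # Random probing by a worm shows up as a high fan-out and many failures.
    return {src for src in dsts if len(dsts[src]) > max_dsts or failed[src] > max_failed}
```

A real detector would slide the window over time and tune the thresholds per network. Some legitimate hosts, such as mail servers and crawlers, also contact many destinations.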
Email Worm Detection using Data Mining • Task: • given some training instances of both “normal” and “viral” emails, • induce a hypothesis to detect “viral” emails. • We used: • Naïve Bayes • SVM • [Figure: training data and test data (outgoing emails) pass through feature extraction; machine learning on the training data produces the model (classifier), which labels each test email as clean or infected]
Assumptions • Features are based on outgoing emails. • Different users have different “normal” behaviour. • Analysis should be done on a per-user basis. • Two groups of features • Per email (number of attachments, HTML in body, text/binary attachments) • Per window (mean number of words in body, variance in number of words in subject) • Total of 24 features identified • Goal: Identify “normal” and “viral” emails based on these features
Feature sets • Per email features • Binary-valued features • Presence of HTML; script tags/attributes; embedded images; hyperlinks • Presence of binary, text attachments; MIME types of file attachments • Continuous-valued features • Number of attachments; number of words/characters in the subject and body • Per window features • Number of emails sent; number of unique email recipients; number of unique sender addresses; average number of words/characters per subject, body; average word length; variance in number of words/characters per subject, body; variance in word length • Ratio of emails with attachments
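The code below computes a few of these features. It is a sketch that rests on several assumptions:

* `OutgoingEmail` is a hypothetical, already-parsed representation of one outgoing message.
* HTML and script presence are detected with a crude substring check.
* Only a handful of the 24 features are shown, under illustrative names.

```python
from dataclasses import dataclass, field
from statistics import mean, pvariance


@dataclass
class OutgoingEmail:
    """Hypothetical pre-parsed outgoing message."""
    subject: str
    body: str
    recipients: list[str]
    attachment_types: list[str] = field(default_factory=list)  # MIME types


def per_email_features(e: OutgoingEmail) -> dict:
    body_lower = e.body.lower()
    return {
        # Binary-valued features (crude substring checks, an assumption)
        "has_html": "<html" in body_lower,
        "has_script": "<script" in body_lower,
        "has_binary_attachment": any(not t.startswith("text/") for t in e.attachment_types),
        # Continuous-valued features
        "n_attachments": len(e.attachment_types),
        "subject_words": len(e.subject.split()),
        "body_words": len(e.body.split()),
    }


def per_window_features(window: list[OutgoingEmail]) -> dict:
    """Features over a non-empty window of consecutive emails from one user."""
    body_words = [len(e.body.split()) for e in window]
    return {
        "n_emails": len(window),
        "unique_recipients": len({r for e in window for r in e.recipients}),
        "mean_body_words": mean(body_words),
        "var_body_words": pvariance(body_words),
        "attachment_ratio": sum(1 for e in window if e.attachment_types) / len(window),
    }
```

Because behaviour differs per user, the windows would be built from each user's own outgoing mail. The per-email and per-window values are then concatenated into one feature vector for the classifier.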
Data Mining Approach • [Figure: a test instance is passed to the classifiers (SVM, Naïve Bayes), which label it clean or infected]
Data set • Collected from UC Berkeley • Contains instances of both normal and viral emails • Six worm types: • bagle.f, bubbleboy, mydoom.m, mydoom.u, netsky.d, sobig.f • Originally six sets of data: • training instances: normal (400) + five worms (5x200) • testing instances: normal (1200) + the sixth worm (200) • Problem: not balanced, no cross-validation reported • Solution: rearrange the data and apply cross-validation
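One way to rearrange the data for cross-validation is a stratified k-fold split. This keeps the normal/viral ratio roughly equal in every fold. The sketch below is an assumption about how this could be done, not the procedure used in the work described. `stratified_k_folds` and `train_test_splits` are hypothetical helper names.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels: list[str], k: int = 5, seed: int = 0) -> list[list[int]]:
    """Split instance indices into k folds, keeping each label's share roughly equal."""
    by_label = defaultdict(list)
    for i, lab in enumerate(labels):
        by_label[lab].append(i)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    folds: list[list[int]] = [[] for _ in range(k)]
    for lab in sorted(by_label):
        idx = by_label[lab]
        rng.shuffle(idx)
        # Deal the shuffled indices of this label round-robin into the folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def train_test_splits(folds: list[list[int]]):
    """Yield (train, test) index lists, using each fold once as the test set."""
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test
```

Labelling instances by worm type (e.g., `bagle.f`, `sobig.f`) instead of just `viral` would stratify per worm. Holding out one worm type per fold instead would mimic the original setup of testing on a previously unseen worm.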
Our Implementation and Analysis • Implementation • Naïve Bayes: assume a “normal” (Gaussian) distribution for numeric and real-valued data; smoothing applied • SVM: one-class SVM with the radial basis function kernel, using “gamma” = 0.015 and “nu” = 0.1 • Analysis • NB alone performs better than the other techniques • SVM alone also performs better if its parameters are set correctly • The mydoom.m and VBS.Bubbleboy data sets are not sufficient (very low detection accuracy with all classifiers) • The feature-based approach seems to be useful only when we have • identified the relevant features • gathered enough training data • implemented the classifiers with the best parameter settings
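The Naïve Bayes setup above can be sketched in plain Python, with each numeric feature modelled per class as a normal distribution. The source does not say which smoothing it used. Here the smoothing is an assumed scheme that adds a small constant `var_smoothing` to every variance, so a feature that never varies within a class does not produce a zero variance. The class name and default value are hypothetical.

```python
import math
from collections import defaultdict


class GaussianNB:
    """Naive Bayes with a per-class normal density for each numeric feature (sketch)."""

    def __init__(self, var_smoothing: float = 1e-2):
        # Assumed smoothing: constant added to every per-class variance.
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.params = {}
        for label, rows in groups.items():
            cols = list(zip(*rows))  # one tuple per feature
            means = [sum(c) / len(c) for c in cols]
            variances = [sum((v - m) ** 2 for v in c) / len(c) + self.var_smoothing
                         for c, m in zip(cols, means)]
            self.params[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def predict(self, row):
        def log_posterior(label):
            log_prior, means, variances = self.params[label]
            # Sum of log normal densities, treating features as independent.
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                for x, m, var in zip(row, means, variances)
            )
        return max(self.params, key=log_posterior)
```

Usage: `GaussianNB().fit(X, y).predict(features)` returns the label, e.g. `"clean"` or `"infected"`, with the highest posterior. For the SVM side, scikit-learn's `sklearn.svm.OneClassSVM(kernel="rbf", gamma=0.015, nu=0.1)` accepts the parameter settings listed above. It is omitted here to keep the sketch dependency-free.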
Mobile Device/System Forensics • Mobile device forensics overview • Acquisition procedures • Summary
Mobile Device Forensics Overview • What is stored in cell phones • Incoming/outgoing/missed calls • Text messages • Short messages • Instant messaging logs • Web pages • Pictures • Calendars • Address books • Music files • Voice recordings
Mobile Phones • Multiple generations • Analog, digital personal communications, third generation (increased bandwidth and other features) • Digital networks • CDMA, GSM, TDMA, etc. • Proprietary OSs • SIM cards (Subscriber Identity Module) • Identifies the subscriber to the network • Stores personal information, address books, etc. • PDAs (personal digital assistants) • Combine mobile phone and laptop technologies
Acquisition procedures • Mobile devices have volatile memory, so RAM contents must be retrieved before the device loses power • Isolate the device from incoming signals • Store the device in a special bag that blocks radio signals • Carry out the forensics in a special lab (e.g., SAIAL) • Examine the following • Internal memory, SIM card, other external memory cards, and the system server; information from the service provider may also be needed to determine the location of the person who made the call
Mobile Forensics Tools • Reads SIM Card files • Analyze file content (text messages etc.) • Recovers deleted messages • Manages PIN codes • Generates reports • Archives files with MD5, SHA-1 hash values • Exports data to files • Supports international character sets
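Files can be archived with MD5 and SHA-1 hash values using Python's standard `hashlib`. This minimal sketch, with the hypothetical helper name `hash_file`, reads the file once in chunks and feeds each chunk to both digests. That way a large memory dump does not have to fit in RAM.

```python
import hashlib


def hash_file(path: str, chunk_size: int = 1 << 20) -> dict:
    """Compute MD5 and SHA-1 of a file in a single pass (hypothetical helper)."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return {"md5": md5.hexdigest(), "sha1": sha1.hexdigest()}
```

Recording these digests at acquisition time lets an examiner show later that an exported file is unchanged. MD5 and SHA-1 are no longer collision-resistant, so many current workflows also record a SHA-256 digest (`hashlib.sha256`) alongside them.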
Papers to discuss: November 10, 2008 • FORZA – Digital forensics investigation framework that incorporate legal issues • http://dfrws.org/2006/proceedings/4-Ieong.pdf • A cyber forensics ontology: Creating a new approach to studying cyber forensics • http://dfrws.org/2006/proceedings/5-Brinson.pdf • Arriving at an anti-forensics consensus: Examining how to define and control the anti-forensics problem • http://dfrws.org/2006/proceedings/6-Harris.pdf
Papers to discuss November 17, 2008 • Forensic feature extraction and cross-drive analysis • http://dfrws.org/2006/proceedings/10-Garfinkel.pdf • md5bloom: Forensic file system hashing revisited (OPTIONAL) • http://dfrws.org/2006/proceedings/11-Roussev.pdf • Identifying almost identical files using context triggered piecewise hashing (OPTIONAL) • http://dfrws.org/2006/proceedings/12-Kornblum.pdf • A correlation method for establishing provenance of timestamps in digital evidence • http://dfrws.org/2006/proceedings/13-%20Schatz.pdf