Digital Forensics

Digital Forensics Dr. Bhavani Thuraisingham The University of Texas at Dallas Lecture 23 Intelligent Digital Forensics October 22, 2007

Reading for Monday October 22 • XIRAF – XML-based indexing and querying for digital forensics: http://dfrws.org/2006/proceedings/7-Alink.pdf • Selective and intelligent imaging using digital evidence bags: http://dfrws.org/2006/proceedings/8-Turner.pdf • Detecting false captioning using common-sense reasoning: http://dfrws.org/2006/proceedings/9-Lee.pdf

Outline • Review of Lectures 19-22 • Discussion of the papers on Intelligent Digital Forensics • Appendix: feature extraction

Review of Lectures 19-21 • Auditing and Forensic Analysis • Richard T. Snodgrass, Stanley Yao and Christian Collberg, "Tamper Detection in Audit Logs," in Proceedings of the International Conference on Very Large Databases, Toronto, Canada, August–September 2004, pp. 504–515. • Kyri Pavlou and Richard T. Snodgrass, "Forensic Analysis of Database Tampering," in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 109–120, Chicago, June 2006. • Additional paper for reading: Kyri Pavlou and Richard T. Snodgrass, "The Pre-images of Bitwise AND Functions in Forensic Analysis," TimeCenter TR 87, October 2006. • http://www.cs.arizona.edu/~rts/publications.html

Abstract of Paper 1 • This paper describes a novel, XML-based approach towards managing and querying forensic traces extracted from digital evidence. This approach has been implemented in XIRAF, a prototype system for forensic analysis. XIRAF systematically applies forensic analysis tools to evidence files (e.g., hard disk images). Each tool produces structured XML annotations that can refer to regions (byte ranges) in an evidence file. XIRAF stores such annotations in an XML database, which allows us to query the annotations using a single, powerful query language (XQuery). XIRAF provides the forensic investigator with a rich query environment in which browsing, searching, and predefined query templates are all expressed in terms of XML database queries.

Introduction • Framework for forensic analysis called XIRAF • A clean separation between feature extraction and analysis • Features extracted are stored in XML format • A single, XML-based output format for forensic analysis tools • The use of XML database technology for storing and querying the XML output of analysis tools.

XIRAF Framework • Consists of three components • Feature extraction manager • Features are extracted from BLOBs (binary large objects) using feature extraction tools • Output of the tools is encoded in XML for the forensics analyzer • Tool repository • Tools are wrapped (e.g., object wrappers) • Storage subsystem • Stores BLOBs and XML annotations • XQuery is used to query the XML data
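The wrapping step above can be sketched roughly as follows. This is a minimal illustration, not XIRAF's actual schema: the element and attribute names, the `wrap_tool_output` helper, and the "jpeg-carver" tool are all hypothetical, and ElementTree's XPath subset stands in for XQuery.

```python
import xml.etree.ElementTree as ET

def wrap_tool_output(tool_name, blob_id, findings):
    """Encode one tool's findings as XML annotations over byte ranges of a BLOB.

    The element and attribute names are assumptions made for this sketch.
    """
    root = ET.Element("annotations", {"tool": tool_name, "blob": blob_id})
    for f in findings:
        node = ET.SubElement(root, "annotation", {
            "type": f["type"],
            "offset": str(f["offset"]),  # start of the byte range in the BLOB
            "length": str(f["length"]),  # size of the byte range in bytes
        })
        node.text = f["value"]
    return root

# Hypothetical findings from a made-up file-carving tool.
xml_root = wrap_tool_output("jpeg-carver", "disk01.img", [
    {"type": "file", "offset": 4096, "length": 20480, "value": "IMG_0001.jpg"},
    {"type": "file", "offset": 32768, "length": 1024, "value": "notes.txt"},
])

# Query the annotations: files whose byte range is larger than 10000 bytes.
large = [a.text for a in xml_root.findall("./annotation[@type='file']")
         if int(a.get("length")) > 10000]

print(ET.tostring(xml_root, encoding="unicode"))
print(large)
```

Because every tool emits the same annotation shape, the query does not need to know which tool produced a given annotation.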

Forensic Applications • Authors have implemented the following applications • Timeline browser: through a web browser, the examiner can look at a date/time of interest • Photo search • Search for images satisfying certain conditions • Child pornography detection • Matching is carried out using hashing
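Hash-based matching of the kind mentioned above can be sketched as follows. The slides do not say which hash function or hash set the authors use, so SHA-256 and the in-memory `known` set here are assumptions chosen only for illustration.

```python
import hashlib

def digest(data: bytes) -> str:
    # SHA-256 is an assumption; the source does not name the hash function.
    return hashlib.sha256(data).hexdigest()

def match_known(files: dict, known_hashes: set) -> list:
    """Return the names of files whose digest appears in the known-hash set."""
    return sorted(name for name, data in files.items()
                  if digest(data) in known_hashes)

# Hypothetical evidence files (name -> content) and a one-entry known-hash set.
files = {
    "a.jpg": b"known picture bytes",
    "b.jpg": b"holiday photo",
    "c.jpg": b"known picture bytes",
}
known = {digest(b"known picture bytes")}

hits = match_known(files, known)
print(hits)
```

Exact hashing only finds bit-identical copies; a file that has been re-encoded or resized would not match.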

Summary and Directions • The separation of feature extraction and analysis brings benefits to both phases. XIRAF extracts features automatically, which is essential when processing large input sets. • The use of XML as a common, intermediate output format for tools allows the integration of the output of diverse, independent tools that produce similar information. This handles both the heterogeneity present in the input data (e.g., different browser types) and the diversity of forensic analysis tools. • These benefits are demonstrated both by the timeline browser and by the child pornography detection program. • By storing extracted features in an XML database system, one can analyze those features using a single, general-purpose, powerful query language. In addition, we benefit automatically from advances that are made in the area of XML database systems. • Directions: Use semantic web technologies?
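The benefit of a common intermediate format can be illustrated with a small timeline sketch. The two browser-history tools, their output shapes, and the record fields below are hypothetical; plain Python dictionaries stand in for the XML annotations.

```python
from datetime import datetime, timezone

def from_browser_a(rows):
    # Hypothetical tool A reports (Unix epoch seconds, URL) pairs.
    return [{"time": datetime.fromtimestamp(ts, tz=timezone.utc),
             "source": "browser-a", "event": url} for ts, url in rows]

def from_browser_b(rows):
    # Hypothetical tool B reports dictionaries with ISO 8601 timestamps.
    return [{"time": datetime.fromisoformat(r["visited"]),
             "source": "browser-b", "event": r["url"]} for r in rows]

def timeline(events, start, end):
    """Return events inside [start, end], ordered by time."""
    return sorted((e for e in events if start <= e["time"] <= end),
                  key=lambda e: e["time"])

ts_a = int(datetime(2007, 10, 22, 9, 30, tzinfo=timezone.utc).timestamp())
events = from_browser_a([(ts_a, "http://example.org/a1")]) + from_browser_b([
    {"visited": "2007-10-22T09:00:00+00:00", "url": "http://example.org/b1"},
    {"visited": "2007-10-22T12:00:00+00:00", "url": "http://example.org/b2"},
])

window = timeline(events,
                  datetime(2007, 10, 22, 8, 0, tzinfo=timezone.utc),
                  datetime(2007, 10, 22, 10, 0, tzinfo=timezone.utc))
print([e["event"] for e in window])
```

Once both tools' output is normalized, the time-window query is written once and applies to every source.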

Abstract of Paper 2 • This paper defines what selective imaging is, and the types of selective imaging that can be performed. This is contrasted with intelligent imaging and the additional capabilities that have to be built into an imager for it to be ‘intelligent’. A selective information capture scenario is demonstrated using the digital evidence bag (DEB) storage format. A DEB is a universal container for digital evidence from any source that allows the provenance to be recorded and continuity to be maintained throughout the life of the investigation. The paper concludes by defining the ‘ultimate test’ for an intelligent and selective imager.

Selective Imaging • Selective imaging is a term that is generally associated with the decision not to acquire all the possible information during the capture process. • It is now recognized that ‘partial or selective file copying may be considered as an alternative’ when it may not be practical to acquire everything. • Techniques include manual selection, semi-automatic selection, automatic selection
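Automatic selection can be sketched as a filter over candidate files that also records why each file was chosen. The criteria (file extension and file signature), the `select_files` helper, and the sample data are assumptions for illustration, not the paper's implementation.

```python
import os

JPEG_MAGIC = b"\xff\xd8\xff"  # leading bytes of a JPEG file

def select_files(files, extensions, signatures):
    """Return (name, reason) pairs for files matching an extension or signature.

    files:      iterable of (name, content) pairs
    extensions: set of lower-case extensions such as {".jpg"}
    signatures: dict mapping a label to the leading bytes to look for
    """
    selected = []
    for name, data in files:
        ext = os.path.splitext(name)[1].lower()
        if ext in extensions:
            selected.append((name, f"extension:{ext}"))
            continue
        for label, magic in signatures.items():
            if data.startswith(magic):
                selected.append((name, f"signature:{label}"))
                break
    return selected

# Hypothetical candidate files; "renamed.dat" is a JPEG with a misleading name.
candidates = [
    ("photo.jpg", JPEG_MAGIC + b"\xe0 image data"),
    ("renamed.dat", JPEG_MAGIC + b"\xe1 image data"),
    ("notes.txt", b"hello"),
]

selection = select_files(candidates, {".jpg"}, {"jpeg": JPEG_MAGIC})
print(selection)
```

Keeping the reason alongside each selected file preserves the criterion used, which matters when the selection has to be justified later.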

Intelligent Imaging • Include the domain experts in the imaging process • How do you capture the knowledge of technical experts, who are familiar with the complexities of digital technology, and of legal domain experts, and then combine the two? • How do you know that you have captured everything relevant to the case under investigation and have not missed evidence of other offences?

Digital Evidence Bags • Both selective and intelligent imaging techniques offer many more options and capabilities than current bit stream imaging. • There are currently no commercial tools that perform selective imaging and adequately record the provenance of the selected information. • Furthermore, no method has existed that captures the criteria or method used by the examiner in deciding what to acquire. For example, was an arbitrary manual selection used, or was information captured based on category of information, file extension, file signature or hash set? • The authors' solution to these problems is the digital evidence bag (DEB) format. A DEB is a universal container for digital information from any source. It allows the provenance of digital information to be recorded and continuity to be maintained throughout the life of the exhibit. • Additionally, DEBs may be encapsulated within other DEBs. This feature differentiates the DEB structure from the monolithic formats commonly in use.
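The ideas of recorded provenance, recorded selection criteria, and nesting can be sketched with a small in-memory container. This is not the authors' DEB file format; the `EvidenceBag` class, its fields, and the log wording are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class EvidenceBag:
    """Toy container: items plus provenance, with optional nested bags."""
    source: str    # where the information came from
    criteria: str  # how the examiner decided what to acquire
    items: dict = field(default_factory=dict)     # name -> bytes
    children: list = field(default_factory=list)  # nested EvidenceBag objects
    log: list = field(default_factory=list)       # continuity record

    def add(self, name: str, data: bytes, examiner: str) -> None:
        self.items[name] = data
        checksum = hashlib.sha256(data).hexdigest()
        self.log.append(f"{examiner} added {name} sha256={checksum}")

    def nest(self, bag: "EvidenceBag") -> None:
        self.children.append(bag)
        self.log.append(f"nested bag from {bag.source}")

    def count(self) -> int:
        """Number of items in this bag and all bags nested inside it."""
        return len(self.items) + sum(c.count() for c in self.children)

# Hypothetical case: a bag for a laptop nested inside a bag for the whole case.
laptop = EvidenceBag(source="laptop-01", criteria="file signature: JPEG")
laptop.add("photo.jpg", b"\xff\xd8\xff image data", examiner="examiner-1")

case = EvidenceBag(source="case-42", criteria="manual selection")
case.add("memo.txt", b"meeting at noon", examiner="examiner-2")
case.nest(laptop)

print(case.count())
print(case.log)
```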

“The Ultimate Test” • The method and storage container used must be able to store sufficient information about the provenance of the information captured such that when the information is restored it is identical to that which would have been acquired should a bit stream image have been taken.
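One way to make this criterion concrete is to compare each selectively captured extent against the same byte range of a full bit stream image. In a real selective acquisition no full image exists, so the sketch below is only a thought experiment; the extent layout and helper names are assumptions.

```python
def capture(image: bytes, extents):
    """Selectively capture (offset, length) extents, keeping each offset."""
    return [{"offset": off, "data": image[off:off + length]}
            for off, length in extents]

def passes_ultimate_test(image: bytes, captured) -> bool:
    """True if every restored extent equals the bit stream image at its offset."""
    return all(
        image[c["offset"]:c["offset"] + len(c["data"])] == c["data"]
        for c in captured
    )

# Hypothetical 1024-byte "disk" and two selected extents.
disk = bytes(range(256)) * 4
captured = capture(disk, [(10, 20), (500, 100)])
ok_before = passes_ultimate_test(disk, captured)

# Simulate a faulty restore by altering one captured extent.
captured[0]["data"] = b"x" * 20
ok_after = passes_ultimate_test(disk, captured)

print(ok_before, ok_after)
```

The check only works because each extent carries its original offset, which is the kind of provenance the storage container has to preserve.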

Summary and Directions • The methodology described and demonstrated by the authors is claimed to be a big improvement over bit stream imaging methods currently used. • Directions • Better selection methods, more accurate?

Abstract of Paper 3 • Detecting manipulated images has become an important problem in many domains (including medical imaging, forensics, journalism and scientific publication) largely due to the recent success of image synthesis techniques and the accessibility of image editing software. Many previous signal-processing techniques are concerned with finding forgery through simple transformation (e.g. resizing, rotating, or scaling), yet little attention is given to examining the semantic content of an image, which is the main issue in recent image forgeries. Here, the authors present a complete workflow for finding the anomalies within images by combining the methods known in computer graphics and artificial intelligence. They first find perceptually meaningful regions using an image segmentation technique and classify these regions based on image statistics. They then use AI common-sense reasoning techniques to find ambiguities and anomalies within an image as well as perform reasoning across a corpus of images to identify a semantically based candidate list of potential fraudulent images. Their method introduces a novel framework for forensic reasoning, which allows detection of image tampering, even with nearly flawless mathematical techniques.

Introduction • Detecting manipulated images has become an important problem in many domains • Many previous signal-processing techniques are concerned with finding forgery through simple transformation (e.g. resizing, rotating, or scaling) • Need to examine the semantic content of an image • Authors present a complete workflow for finding the anomalies within images by combining the methods known in computer graphics and artificial intelligence

Introduction • In photo fakery, manipulation techniques fall into four categories: • Deletion of details: removing scene elements • Insertion of details: adding scene elements • Photomontage: combining multiple images • False captioning: misrepresenting image content

Technical Approach • Authors find perceptually meaningful regions using an image segmentation technique and classify these regions based on image statistics. • They then use AI common-sense reasoning techniques to find ambiguities and anomalies within an image as well as perform reasoning across a corpus of images to identify a semantically based candidate list of potential fraudulent images. • They claim their method introduces a novel framework for forensic reasoning, which allows detection of image tampering, even with nearly flawless mathematical techniques.

Technical Approach • Image Segmentation • Segment the source into regions of importance • Compare across images in a corpus • Classification • Segment-based classification • Common-sense reasoning • Handles classification ambiguities
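The last two stages can be illustrated with a toy example in which segmentation and classification are assumed to have already produced, for each region, a list of candidate labels with scores. The `SCENE_FACTS` table, the scoring rule, and the sample regions are hypothetical stand-ins; the authors' segmentation method and common-sense reasoning system are not reproduced here.

```python
# Hypothetical common-sense facts: labels one expects to see in each scene.
SCENE_FACTS = {
    "beach": {"sand", "sea", "sky"},
    "ski slope": {"snow", "sky", "trees"},
}

def scene_support(region_candidates, scene):
    """Sum, over regions, the best score among labels plausible for the scene."""
    expected = SCENE_FACTS[scene]
    return sum(
        max((score for label, score in candidates if label in expected),
            default=0.0)
        for candidates in region_candidates
    )

def most_plausible_scene(region_candidates):
    return max(SCENE_FACTS, key=lambda s: scene_support(region_candidates, s))

def is_suspicious(region_candidates, caption_scene):
    """Flag the image when the caption disagrees with the best-supported scene."""
    return most_plausible_scene(region_candidates) != caption_scene

# One list of (label, score) candidates per segmented region. The first region
# is ambiguous on image statistics alone: sand and snow look alike.
regions = [
    [("sand", 0.5), ("snow", 0.5)],
    [("sky", 0.9)],
    [("trees", 0.8), ("sea", 0.3)],
]

print(most_plausible_scene(regions))
print(is_suspicious(regions, "beach"), is_suspicious(regions, "ski slope"))
```

The ambiguous first region is resolved by context: the other regions support "ski slope" more strongly, so a "beach" caption is flagged as a candidate for false captioning.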

Summary and Directions • Introduces a hybrid method for image forensics. • Given a subset of a corpus as a suspicious candidate set, analyze the candidates through specific metrics that are optimized to find fakery given the image’s qualitative classification. This use of common-sense reasoning goes • Directions • To integrate the facts discovered in a photo corpus to help identify what evidence may be missing as well as what fact might be unique to this scenario. • ??

Appendix • Image Annotation and Classification: Lecture #24
