“XIRAF – XML-based indexing and querying for digital forensics”. A summary of the report written by W. Alink, R.A.F. Bhoedjang, P.A. Boncz, and A.P. de Vries.
The Problem
• Large amount of data, possibly terabytes
• Limited amount of time
• Higher chance of missing traces
• Diversity of data
• Too many specialized tools
• Difficult to integrate results
• Time constraints
• Knowledge constraints
Solution
• Separate feature extraction from analysis:
  • Feature extraction: the extraction of useful features from raw data (includes more than just file data)
  • Analysis: browsing, querying, and correlating
• One output format for forensic analysis tools (based on XML)
• XML for storing and querying the output of the tools
• Automate feature extraction
• Various current projects in the law enforcement community relate to automated feature extraction
XIRAF
• Prototype system that uses this approach
• “XML Information Retrieval Approach to digital Forensics”
• Automatic feature extraction from one or more disk images
• Stores data in an XML database
• Uses XQuery (an XML query language) to access both the database and the data from the disk image
Framework
• Three components:
  • Tool repository: feature extraction tools
  • Feature extraction manager: manages the invocation of the tools, merges their output, and stores it in the storage subsystem
  • Storage subsystem: composed of raw evidence (binary large objects) and extracted features (XML)
General overview of the process
• An image (binary data) is fed to the system
• The Feature Extraction Manager extracts useful features using the tool repository
• The Feature Extraction Manager stores the features in a single XML document (in the form of a tree)
• The Feature Extraction Manager can then run other tools on the found data and add their output to the XML document
• The data is stored in the storage subsystem, where both the binary data and the XML tree can be accessed
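The process above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not XIRAF's actual implementation: the tool functions (`partition_tool`, `string_tool`), the element names, and the attribute names are all hypothetical. It shows a manager that runs each tool in turn, merges the output into one XML tree, and lets later tools work on features found by earlier ones.

```python
import xml.etree.ElementTree as ET

# Hypothetical tools: each receives the raw image bytes plus the tree built
# so far, and returns new XML elements. All names here are assumptions.
def partition_tool(image, root):
    # Pretend the whole image is a single partition.
    return [ET.Element("partition", {"offset": "0", "length": str(len(image))})]

def string_tool(image, root):
    # Works on data found by an earlier tool: one <text> per <partition>.
    found = []
    for part in root.iter("partition"):
        start = int(part.get("offset"))
        end = start + int(part.get("length"))
        el = ET.Element("text", {"offset": str(start)})
        el.text = image[start:end].decode("ascii", errors="ignore")
        found.append(el)
    return found

def extract_features(image, tools):
    """Run every tool in order and merge its output into one XML tree."""
    root = ET.Element("image")
    for tool in tools:
        for feature in tool(image, root):
            root.append(feature)
    return root

tree = extract_features(b"hello evidence", [partition_tool, string_tool])
print(ET.tostring(tree, encoding="unicode"))
```

The raw bytes stay untouched; the tree only records offsets into them, which mirrors the split between raw evidence and extracted features in the storage subsystem.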
Forensic Applications
• Timeline browser
  • Mainstream tools do file-system browsing, which relies on file-system metadata
  • This XIRAF application can retrieve all XML fragments with a timestamp, gathered from different tools (which could include things like chat logs)
• Photo search
  • Finds digital images that meet desired conditions
  • Can consider camera model, date and time of recording, image resolution, and more
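To make the timeline and photo-search ideas concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` in place of XQuery. The feature tree, its element names, and the `timestamp` and `camera` attributes are assumptions made for illustration; the report's actual schema may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical feature tree merged from several tools; names are assumptions.
FEATURES = """
<image>
  <file name="report.doc" timestamp="2006-03-01T10:15:00"/>
  <chat>
    <message from="alice" timestamp="2006-03-01T09:30:00">hi</message>
  </chat>
  <photo name="img001.jpg" timestamp="2006-02-27T18:05:00" camera="ExampleCam"/>
  <file name="notes.txt"/>
</image>
"""

def timeline(root):
    # Select every element, at any depth, that carries a timestamp attribute,
    # regardless of which tool produced it.
    stamped = root.findall(".//*[@timestamp]")
    # ISO 8601 strings in one fixed format sort chronologically as plain text.
    return sorted(stamped, key=lambda el: el.get("timestamp"))

def photos_by_camera(root, camera):
    # Photo search on a single condition: the camera model.
    return root.findall(f".//photo[@camera='{camera}']")

root = ET.fromstring(FEATURES)
events = timeline(root)
for el in events:
    print(el.get("timestamp"), el.tag, el.get("name") or el.get("from"))

matches = photos_by_camera(root, "ExampleCam")
```

Because the query keys on the attribute rather than on the element type, chat messages and photos land in the same timeline as files, which is the advantage over browsing file-system metadata alone.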
Forensic Applications
• Child pornography detection
  • Uses hashes of files that are known to contain child pornography
  • Matches files against a database of hashes
  • The hash database is converted to XML and preloaded into XIRAF's XML database
  • The comparison is done during the feature extraction phase
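The hash-matching step can be sketched as follows. The slides do not say which hash algorithm or XML layout is used, so SHA-1, the `<hashdb>` and `<hash>` elements, and the `known` attribute are all assumptions for illustration; the sample data is harmless placeholder bytes.

```python
import hashlib
import xml.etree.ElementTree as ET

def sha1_hex(data):
    # SHA-1 is an assumption; the slides do not name the algorithm.
    return hashlib.sha1(data).hexdigest()

# Hypothetical hash database already converted to XML.
hashdb = ET.Element("hashdb")
ET.SubElement(hashdb, "hash", {"value": sha1_hex(b"known flagged content")})

def flag_known_files(files, hashdb):
    """Hash each file during extraction and mark those found in the database."""
    known = {h.get("value") for h in hashdb.iter("hash")}
    root = ET.Element("image")
    for name, data in files.items():
        digest = sha1_hex(data)
        el = ET.SubElement(root, "file", {"name": name, "sha1": digest})
        if digest in known:
            el.set("known", "true")
    return root

features = flag_known_files(
    {"a.jpg": b"known flagged content", "b.txt": b"harmless"}, hashdb
)
flagged = [f.get("name") for f in features.findall("file[@known='true']")]
print(flagged)
```

Doing the comparison while the features are being extracted means the match is stored in the tree once, so later queries only need to filter on the attribute.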
Conclusion/future work
• Too early to draw definitive conclusions (just a prototype)
• An increasing number of tools have started producing output in XML
• Mobile phone queries
• More knowledge bases
References
• W. Alink, R.A.F. Bhoedjang, P.A. Boncz, and A.P. de Vries. “XIRAF – XML-based indexing and querying for digital forensics”. Available at: http://dfrws.org/2006/proceedings/7-Alink.pdf