An Automated Timeline Reconstruction Approach for Digital Forensic Investigations Written by Christopher Hargreaves and Jonathan Patterson Presented by Jason McKenzie November 8th, 2013
Introduction • Reconstruction: a process in which an event or series of events is carefully examined in order to find out or show exactly what happened (Merriam-Webster) • Provenance: the origin or source of something • Low-level PC event: e.g. a file modification or a registry key update • High-level PC event: e.g. the connection of a USB device, such as a USB stick • Goal: construct a software prototype in Python that automatically reconstructs a timeline of events, using low-level events to infer high-level events and their provenance
Background • Reconstruction is an essential aspect of digital forensics • A key challenge in digital forensics is the large volume of information that needs to be analyzed • The population owns an increasing number of digital devices • Existing tools automate the extraction stage of a digital investigation and are useful for examining events that have occurred • There is a demand for explaining the sequence of digital events, so a tool that automatically reconstructs the events and produces a timeline is needed
Related Work • Related work comprises solutions that incorporate some form of (non-automatic) timeline generation • Timelines based on file system times • Use metadata from file systems to create a timeline • Modified, Accessed, and Created (MAC) times • The Sleuth Kit generates a timeline from file activity • EnCase creates a graphical “Timeline” view • Limitation: the times at which the contents of files are examined are not captured in file system metadata
Related Work (continued) • Timelines including times from inside files • Cyber Forensic Time Lab (CFTL) • Extracts system times from FAT and NTFS hard drives and from some file types • Has incomplete source information for extracted events • Log2timeline • Offers several enhancements and options that, when combined, can produce a timeline • Carbone and Bean addressed the need for a rich, event-filled timeline in their 2011 paper “Generating computer forensic super-timelines under Linux” • The key to creating an event-filled timeline is to capture more event times
Related Work (continued) • Visualizations • EnCase • Visual Timeline • Zeitline • Imports file system times from other programs through the use of Import Filters • Atomic events: events directly imported from the system • Complex events: comprised of atomic and other complex events • Allows filtering, searching, and the combination of atomic events into complex events • Aftertime • Performs enhanced timeline generation • Visualizes results as a histogram
Related Work (continued) • Summary • Importance of recovering times from inside files as well as using file system metadata • Two key challenges: • Too many events to effectively analyze • Difficult to visualize what is going on in the timeline due to the number of events • Highlighting patterns of activity to indicate areas of interest and maintaining records of the source of extracted data are both important
Methodology • As expressed previously, the large volume of events creates a problem for analysis and an inability to visualize the timeline • To counteract this, an approach to automate the process of combining “low-level” events into “high-level” events is being researched • By automating the conversion of low-level to high-level events, a summary of activity is produced that helps direct the investigation • To facilitate this, a software prototype was constructed
Methodology (continued) • Should existing frameworks be expanded to accommodate a timeline reconstruction system? • It would take extensive work to build upon an existing framework like log2timeline • Best to implement a new framework rather than adjust existing data structures or accommodate legacy languages • Python 3 was chosen for this project due to the readability of its code
Design • Overall design • Python Digital Forensic Timeline (PyDFT) • Supports low-level event extraction and high-level event reconstruction • Also supports case management, conversion between different date and time formats, and basic GUIs
Design (continued) • Generation of low-level events • Overview • Low-level events are file system times and times extracted from within files • Analysis is performed on a mounted file system, NOT a disk image • The recommended approach is to mount the disk image in read-only mode using Linux or Mac OS X • Extraction of file system times • Master File Table ($MFT) • Accessed directly on Linux or Mac OS X using the NTFS driver from Tuxera • Created, modified, accessed, and entry-modified times from the Standard Information Attribute are used to build four events for each file
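The four-events-per-file idea can be sketched in Python. Note this is a simplified stand-in: PyDFT reads the $MFT's Standard Information Attribute directly, whereas `os.stat()` on a mounted file system only exposes three of the four NTFS times (no separate created time on most platforms), so the field names and event structure below are illustrative assumptions.

```python
import os
from datetime import datetime, timezone

def file_system_events(path):
    """Build one low-level event per available file system timestamp.

    Simplified stand-in for PyDFT's $MFT extraction: os.stat() exposes
    only three of the four NTFS Standard Information times, so the
    'created' event is omitted here. Field names are illustrative.
    """
    st = os.stat(path)
    events = []
    for event_type, ts in (("modified", st.st_mtime),
                           ("accessed", st.st_atime),
                           ("entry modified", st.st_ctime)):
        events.append({
            "date_time": datetime.fromtimestamp(ts, tz=timezone.utc),
            "type": event_type,
            "evidence": path,
            "provenance": "file system metadata",
        })
    return events
```

Each returned event records both the time and where it came from, which is what later makes provenance traceable.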
Design (continued) • Generation of low-level events (continued) • Times from inside files • The Extraction Manager calls GetTimesFromInsideFiles() for each file in the mounted file system and checks for a matching time extractor • If one is found, it extracts information using the file pointer, file name, and file path • Any time information extracted is added to the low-level timeline • Time extractors include browsing history from Chrome, Firefox, and Internet Explorer; Skype; Windows Live Mail; etc.
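A minimal sketch of one such time extractor, for Chrome's browsing history. The `urls` table and `last_visit_time` column do exist in real Chrome `History` SQLite databases (timestamps are microseconds since 1601-01-01, the WebKit epoch), but the event dictionary shape is an assumption matching the sketches above, not PyDFT's actual code.

```python
import sqlite3
from datetime import datetime, timezone

# Seconds between the WebKit epoch (1601-01-01) and the Unix epoch (1970-01-01).
WEBKIT_EPOCH_OFFSET = 11644473600

def chrome_history_events(history_path):
    """Illustrative time extractor for a Chrome 'History' SQLite file.

    Returns one low-level event per URL, using its last visit time.
    """
    conn = sqlite3.connect(history_path)
    events = []
    for url, title, webkit_us in conn.execute(
            "SELECT url, title, last_visit_time FROM urls"):
        unix_seconds = webkit_us / 1_000_000 - WEBKIT_EPOCH_OFFSET
        events.append({
            "date_time": datetime.fromtimestamp(unix_seconds, tz=timezone.utc),
            "type": "url visited",
            "evidence": f"{title} ({url})",
            "provenance": "Chrome History database",
        })
    conn.close()
    return events
```

Each browser or application gets its own extractor of this shape, and the Extraction Manager simply dispatches files to whichever extractor recognizes them.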
Design (continued) • Generation of low-level events (continued) • Parsers and bridges • Parsers: process raw data structures and recover data in a usable form • Bridges: take information from parsers and map it onto a low-level event object • This design makes it easier to accommodate new parsers and makes the parser code easier to reuse
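The parser/bridge split can be illustrated with a hypothetical two-function example: the parser knows only the raw format, the bridge knows only the low-level event schema, so either side can be replaced independently. The log line format and function names here are invented for illustration.

```python
def parse_setupapi_line(line):
    """Parser: recover raw fields from one (simplified) setupapi log line."""
    timestamp, _, message = line.partition(" ")
    return {"timestamp": timestamp, "message": message.strip()}

def setupapi_bridge(record):
    """Bridge: map a parsed record onto the low-level event format."""
    return {
        "date_time": record["timestamp"],
        "type": "device installation",
        "evidence": record["message"],
        "provenance": "setupapi log",
    }

# The parser's output could equally be reused by a different bridge,
# e.g. one targeting another tool's event schema.
event = setupapi_bridge(parse_setupapi_line(
    "2013-11-08T11:28:30 Device install started"))
```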
Design (continued) • Generation of low-level events (continued) • Traceability • If extractor returns a low-level event, it also points to the raw data that produced the event. • Different types of provenance based upon event • Low-level event format • Different events have different provenance and have different fields • Id, date_time_min, date_time_max, evidence, provenance, etc.
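A sketch of the low-level event format as a Python class. Only the field names listed above (id, date_time_min, date_time_max, evidence, provenance) come from the presentation; the types and defaults are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class LowLevelEvent:
    """Sketch of the low-level event format; the min/max pair allows
    events whose time is only known to fall within a range."""
    id: int
    date_time_min: float   # earliest possible time (epoch seconds, assumed)
    date_time_max: float   # latest possible time
    evidence: str          # e.g. path of the file the event came from
    provenance: dict = field(default_factory=dict)  # pointer back to raw data
    type: str = ""
```

Because every event carries a `provenance` pointer to the raw data that produced it, a high-level event built on top of it can always be traced back to its source.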
Design (continued) • Generation of low-level events (continued) • Backing store for the low-level timeline • Back-end storage is required due to the use of Python classes • SQLite was chosen as the backing store; it allows multiple advanced queries • Summary • The extraction manager extracts low-level events that are converted to a standard format and added to the timeline • The timeline is stored in SQLite • Fields include date/time, provenance, and information about the raw data
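A minimal sketch of such a SQLite backing store. The table schema is an assumption; only the field names follow the slides.

```python
import sqlite3

def create_store(path=":memory:"):
    """Create a SQLite backing store for low-level events
    (illustrative schema, not PyDFT's actual one)."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS low_level_events (
        id INTEGER PRIMARY KEY,
        date_time_min REAL,
        date_time_max REAL,
        type TEXT,
        evidence TEXT,
        provenance TEXT)""")
    return conn

def add_event(conn, event):
    """Insert one standard-format low-level event into the store."""
    conn.execute(
        "INSERT INTO low_level_events "
        "(date_time_min, date_time_max, type, evidence, provenance) "
        "VALUES (?, ?, ?, ?, ?)",
        (event["date_time_min"], event["date_time_max"],
         event["type"], event["evidence"], event["provenance"]))
```

Keeping the timeline in SQLite rather than in memory is what later lets analyzers run arbitrary queries over millions of events.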
Design (continued) • Reconstruction of high-level events • Overview • Predetermined rules, implemented as plug-in scripts, automatically convert low-level events into high-level events • Basic event matching using test events • SQLite requires knowledge of SQL • By creating a test event with all the conditions of the desired low-level event, it is possible to add events to the high-level timeline without extensive knowledge of SQL queries • A comparison match (not an exact match) is made between test events and low-level events • Matching field values are turned into SQL searches for those fields, which then create high-level events
Design (continued) • Reconstruction of high-level events (continued) • Matching multiple artefacts • “Test events” serve as triggers, and any matches are used to construct a hypothesis of a high-level event • A low-level timeline is created in memory for a specific period determined by the analyzer • The analyzer searches for all low-level events occurring in this period • Matches that are found are considered supporting artefacts • Expected events that are not found are considered contradictory artefacts • One or more high-level events are created based upon these artefacts
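The windowed matching step above can be sketched as follows: gather all low-level events within some window of the trigger, then mark each expected pattern as supporting (found) or contradictory (absent). The flat-dictionary event shape and substring matching are simplifying assumptions.

```python
def classify_artefacts(trigger_time, window, expected_patterns, low_level_events):
    """Sketch of temporal-proximity matching: collect events within
    `window` seconds of the trigger, then sort expected patterns into
    supporting (matched) and contradictory (unmatched) artefacts."""
    nearby = [e for e in low_level_events
              if abs(e["date_time"] - trigger_time) <= window]
    supporting, contradictory = [], []
    for pattern in expected_patterns:
        matches = [e for e in nearby if pattern in e["evidence"]]
        if matches:
            supporting.extend(matches)
        else:
            contradictory.append(pattern)
    return supporting, contradictory
```

A high-level event hypothesis (e.g. "USB device connected") would then be created and annotated with both lists, so an examiner can see the evidence for and against it.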
Design (continued) • Reconstruction of high-level events (continued) • High level event format • Similar to low-level event format • Includes files, trigger_evidence_artefact, supporting_evidence_artefact, contradictory_evidence_artefact • High-level timeline output • Not stored in SQLite • Exports to XML and individual high-level event HTML reporting
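A sketch of the XML export mentioned above, using Python's standard library. The element and field names here are assumptions (guided by the artefact fields listed on the slide), not PyDFT's actual output schema.

```python
import xml.etree.ElementTree as ET

def high_level_events_to_xml(events):
    """Serialize high-level events to an XML string (illustrative schema)."""
    root = ET.Element("high_level_timeline")
    for ev in events:
        node = ET.SubElement(root, "event")
        for key in ("date_time", "description",
                    "trigger_evidence_artefact",
                    "supporting_evidence_artefact",
                    "contradictory_evidence_artefact"):
            ET.SubElement(node, key).text = str(ev.get(key, ""))
    return ET.tostring(root, encoding="unicode")
```

Exporting to XML (rather than storing in SQLite, as the low-level timeline is) makes the smaller high-level timeline easy to feed into reporting or third-party visualization tools.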
Design (continued) • Reconstruction of high-level events (continued) • Summary • The timeline is searched using “test events” that have similarities to the desired low-level events • One or more matches lead to one or more high-level events • Since low-level event information is preserved, each high-level event can still point back to the raw data that generated its low-level events • Produces two timelines • Low-level event timeline (not very readable) • High-level event timeline (human readable)
Results • Examples of high-level events constructed • Google searches • 11:28:30 Google search for ‘how to hack wifi’ • USB device connection • “Setup API entry for USB found (VIBL07AB PID:FCF6 Serial:07A80207B128BE08)”
Results (continued) • Visualization • Since there are usually not a large number of high-level events, it is possible to use a third-party program like Timeflow to display them graphically • The high-level timeline shown on the slide summarizes 2894 underlying low-level events (which are, obviously, not all displayed)
Results (continued) • Performance • Calculations based on Intel Core 2 Duo machines (2.2–2.8 GHz) with 4–8 GB of RAM • 1 million events, ~2 min per analyzer, 22 analyzers = 44 minutes to process 1 million events • Comparable to other indexing or searching forensic tools (“start search and walk away”) • Performance has not yet been optimized
Evaluation • The results reinforce that matching low-level events against “test events” (termed “temporal proximity pattern matching”) is effective at creating high-level events automatically • More analyzers and time extractors need to be developed to further demonstrate the feasibility of temporal proximity pattern matching • Low-level extractors still need to be implemented for some aspects of the disk, such as the Recycle Bin • It needs to be determined whether keeping high-level provenance information is required, since the associated low-level provenance is preserved
Evaluation (continued) • Although performance is within limits compared to other forensic tools, a bottleneck exists because each analyzer searches the timeline linearly for patterns • More analyzers mean a greater bottleneck • Needs optimization for multi-core processors • Optimizing SQLite secondary indexing could also improve performance • A way of verifying that the target PC's clock is correct needs to be implemented • More robust testing of the prototype is needed
Future work • Creation of more low-level event extractors • Creation of more analyzers • Formalizing low-level event information • Inputting data from other tools • Testing of framework against real world data • Adding complexity to analysis scripts, such as Bayesian networks • Development of more robust visual data tools for timelining
Conclusions • Illustrates the possibility of using pattern matching to automatically reconstruct high-level, human-understandable events, which in turn enables a readable visualization of the timeline • Preserves the provenance of low-level events • Not intended to replace a full forensic analysis by an experienced, trained analyst