
An Automated Timeline Reconstruction Approach for Digital Forensic Investigations


Presentation Transcript


  1. An Automated Timeline Reconstruction Approach for Digital Forensic Investigations Written by Christopher Hargreaves and Jonathan Patterson Presented by Jason McKenzie November 8th, 2013

  2. Introduction
  • Reconstruction: a process in which an event or series of events is carefully examined in order to find out or show exactly what happened (Merriam-Webster)
  • Provenance: the origin or source of something
  • Low-level PC event: a file modification, a registry key update
  • High-level PC event: connection of a USB device, such as a USB stick
  • Goal: construct a software prototype in Python that automatically reconstructs a timeline of events, using low-level events to infer high-level events and their provenance

  3. Background
  • Reconstruction is an essential aspect of digital forensics
  • A key challenge in digital forensics is the large volume of information that needs to be analyzed
  • People own an increasing number of digital devices
  • Existing tools automate the extraction stage of a digital investigation and are useful for examining events that have occurred
  • There is demand for explaining the sequence of digital events, so a tool that automatically reconstructs events and produces a timeline is needed

  4. Related Work
  • Related work comprises solutions that incorporate some form of (non-automatic) timeline generation
  • Timelines based on file system times
    • Use metadata from file systems to create a timeline
    • Modified, Accessed, and Created (MAC) times
    • The Sleuth Kit generates a timeline from file activity
    • EnCase creates a graphical "Timeline" view
    • Limitation: the times at which the contents of files are examined are not captured in file system metadata

  5. Related Work (continued)
  • Timelines including times from inside files
    • Cyber Forensic Time Lab (CFTL)
      • Extracts times from FAT and NTFS hard drives and from some file types
      • Source information for extracted events is incomplete
    • Log2timeline
      • Has several enhancements and options that, when combined, can produce a timeline
    • Carbone and Bean addressed the need for a rich, event-filled timeline in their 2011 paper "Generating computer forensic super-timelines under Linux"
    • The key to creating an event-filled timeline is to capture more event times

  6. Related Work (continued)
  • Visualizations
    • EnCase
      • Visual Timeline
    • Zeitline
      • Imports file system times from other programs through the use of Import Filters
      • Atomic events: events directly imported from the system
      • Complex events: composed of atomic and other complex events
      • Allows for filtering, searching, and combining atomic events into complex events
    • Aftertime
      • Performs enhanced timeline generation
      • Visualizes results as a histogram

  7. Related Work (continued)
  • Summary
    • Recovering times from inside files and using file system metadata are both important
    • Two key challenges:
      • Too many events to analyze effectively
      • Difficult to visualize what is going on in the timeline due to the number of events
    • Highlighting patterns of activity that indicate areas of interest, and maintaining records of the source of extracted data, are both important

  8. Methodology
  • As noted previously, the large volume of events creates a problem for analysis and makes the timeline hard to visualize
  • To counteract this, an approach is proposed that automates the process of combining "low-level" events into "high-level" events
  • Automating the conversion of low-level events to high-level events produces a summary of activity that can help direct the investigation
  • To facilitate this, a software prototype was constructed

  9. Methodology (continued)
  • Should existing frameworks be expanded to accommodate a timeline reconstruction system?
    • It would take extensive work to build upon an existing framework such as log2timeline
    • It is better to implement a new framework than to adjust existing data structures or work around legacy languages
  • Python 3 was chosen for this project because of the readability of its code

  10. Design
  • Overall design
    • Python Digital Forensic Timeline (PyDFT)
    • Supports low-level event extraction and high-level event reconstruction
    • Also supports case management, conversion between different date and time formats, and basic GUIs

  11. Design (continued)
  • Generation of low-level events
    • Overview
      • Low-level events are file system times and times extracted from within files
      • Analysis is performed on a mounted file system, NOT directly on a disk image
      • Recommended approach: mount the disk image read-only on Linux or Mac OS X
    • Extraction of file system times
      • Master File Table ($MFT)
      • Accessed directly on Linux or Mac OS X using the NTFS driver from Tuxera
      • The created, modified, accessed, and entry-modified times from the Standard Information Attribute are used to build four events for each file (a minimal sketch of this step follows)
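A minimal sketch of this extraction step, assuming a hypothetical MftRecord whose four Standard Information timestamps have already been parsed (PyDFT's actual class and field names are not given in the slides):

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class MftRecord:
        """Hypothetical parsed $MFT record; field names are illustrative."""
        path: str
        created: datetime
        modified: datetime
        accessed: datetime
        entry_modified: datetime

    def mft_record_to_events(record):
        """Build four low-level events from one file's
        Standard Information Attribute timestamps."""
        times = [
            ("created", record.created),
            ("modified", record.modified),
            ("accessed", record.accessed),
            ("entry modified", record.entry_modified),
        ]
        return [
            {
                "date_time": when,
                "evidence": f"file {kind}: {record.path}",
                "provenance": {"source": "$MFT Standard Information Attribute",
                               "file": record.path},
            }
            for kind, when in times
        ]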

  12. Design (continued)
  • Generation of low-level events (continued)
    • Times from inside files
      • The Extraction Manager calls GetTimesFromInsideFiles(), which walks the files in the mounted file system and checks each one against the available time extractors
      • If a matching extractor is found, it is given the file pointer, file name, and file path, and extracts time information from the file
      • Any time information extracted is added to the low-level timeline
      • Time extractors cover browsing history from Chrome, Firefox, and Internet Explorer, as well as Skype, Windows Live Mail, etc. (a sketch of a Chrome-style extractor follows)
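A minimal sketch of one such time extractor. The Chrome "History" file is an SQLite database whose urls table stores last_visit_time as microseconds since 1601-01-01 (the WebKit epoch); the event-dictionary layout is an assumption, not PyDFT's actual interface:

    import sqlite3
    from datetime import datetime, timedelta

    # Chrome timestamps count microseconds from the "WebKit epoch".
    WEBKIT_EPOCH = datetime(1601, 1, 1)

    def extract_chrome_history_times(history_path):
        """Yield one low-level event per recorded URL visit time
        found in a Chrome 'History' SQLite database."""
        conn = sqlite3.connect(history_path)
        try:
            for url, title, last_visit in conn.execute(
                    "SELECT url, title, last_visit_time FROM urls"):
                if not last_visit:
                    continue
                yield {
                    "date_time": WEBKIT_EPOCH + timedelta(microseconds=last_visit),
                    "evidence": f"URL visited: {url} ({title})",
                    "provenance": {"source": "Chrome History database",
                                   "file": history_path,
                                   "table": "urls"},
                }
        finally:
            conn.close()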

  13. Design (continued)
  • Generation of low-level events (continued)
    • Parsers and bridges
      • Parsers: process raw data structures and recover the data in a usable form
      • Bridges: take the information from parsers and map it onto low-level event objects
      • This design makes it easier to accommodate new parsers and makes the parser code easier to reuse (illustrated in the sketch below)
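A sketch of the split; all identifiers here are hypothetical, since the slides describe the pattern rather than these names. The parser knows only the artefact's format, while the bridge knows only the event model, so each can be reused or replaced independently:

    # Parser: knows only the artefact's format; returns plain records.
    def parse_browser_history(raw_rows):
        """Hypothetical parser: recover usable data from raw structures."""
        return [{"url": url, "visited": ts} for url, ts in raw_rows]

    # Bridge: knows only the low-level event format; no parsing logic.
    def browser_history_bridge(parsed_records, source_path):
        """Map parser output onto low-level event objects."""
        return [
            {
                "date_time": rec["visited"],
                "evidence": f"URL visited: {rec['url']}",
                "provenance": {"source": "browser history parser",
                               "file": source_path},
            }
            for rec in parsed_records
        ]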

  14. Design (continued)
  • Generation of low-level events (continued)
    • Traceability
      • When an extractor returns a low-level event, the event also points to the raw data that produced it
      • Different types of event carry different kinds of provenance
    • Low-level event format
      • Different events have different provenance and therefore different fields
      • id, date_time_min, date_time_max, evidence, provenance, etc. (a sketch of such an event object follows)
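A minimal sketch of a low-level event object using the fields named above (PyDFT's exact field set may differ; the min/max pair allows an event whose time is only known to a range to be represented):

    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class LowLevelEvent:
        id: int
        # A min/max pair rather than a single timestamp, so events whose
        # time is only known to a range or a resolution can be represented.
        date_time_min: datetime
        date_time_max: datetime
        evidence: str                                   # human-readable description
        provenance: dict = field(default_factory=dict)  # pointer to the raw source data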

  15. Design (continued)
  • Generation of low-level events (continued)
    • Backing store for the low-level timeline
      • Holding every event as an in-memory Python object does not scale, so back-end storage is required
      • SQLite was chosen as the backing store, and it allows multiple advanced queries
    • Summary
      • The Extraction Manager extracts low-level events, which are converted to a standard format and added to the timeline
      • The timeline is stored in SQLite
      • Fields include date/time, provenance, and information about the raw data (a schema sketch follows)
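A minimal sketch of such a backing store, with a column set mirroring the fields named above (the slides do not show PyDFT's actual schema):

    import sqlite3

    def open_timeline(db_path):
        """Create (if needed) and open the low-level timeline store."""
        conn = sqlite3.connect(db_path)
        conn.execute("""
            CREATE TABLE IF NOT EXISTS low_level_events (
                id            INTEGER PRIMARY KEY,
                date_time_min TEXT,
                date_time_max TEXT,
                evidence      TEXT,
                provenance    TEXT  -- serialized pointer to the raw source data
            )""")
        return conn

    def add_event(conn, event):
        """Insert one low-level event dictionary into the timeline."""
        conn.execute(
            "INSERT INTO low_level_events "
            "(date_time_min, date_time_max, evidence, provenance) "
            "VALUES (?, ?, ?, ?)",
            (event["date_time_min"], event["date_time_max"],
             event["evidence"], repr(event["provenance"])))
        conn.commit()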

  16. Design (continued)
  • Reconstruction of high-level events
    • Overview
      • Predetermined rules, implemented as plug-in scripts, automatically convert low-level events into high-level events
    • Basic event matching using test events
      • Querying the SQLite store directly requires knowledge of SQL
      • By creating a "test event" carrying all the conditions a low-level event must satisfy, events can be added to the high-level timeline without extensive knowledge of SQL queries
      • Test events are compared against low-level events for a partial match, not an exact match
      • The field values given in a test event produce SQL searches on those fields, and the matches are used to create high-level events (see the sketch below)
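A minimal sketch of this idea: a test event is just a dictionary of field-to-pattern conditions, and the matcher turns it into an SQL query so the analyst never writes SQL by hand (the field names and wildcard convention are assumptions):

    def query_from_test_event(test_event):
        """Turn a test event (a dict of field -> pattern) into an SQL
        query over the low-level timeline. SQL '%' wildcards give the
        partial-match behaviour rather than an exact match."""
        clauses, params = [], []
        for column, pattern in test_event.items():
            clauses.append(f"{column} LIKE ?")
            params.append(pattern)
        sql = "SELECT * FROM low_level_events WHERE " + " AND ".join(clauses)
        return sql, params

    # Usage: find candidate Google-search events without hand-written SQL.
    sql, params = query_from_test_event({"evidence": "%google.com/search%"})
    # matches = conn.execute(sql, params).fetchall()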

  17. Design (continued)
  • Reconstruction of high-level events (continued)
    • Matching multiple artefacts
      • "Test events" serve as triggers, and any matches are used to construct a hypothesis of a high-level event
      • A low-level timeline covering a specific period, determined by the analyzer, is created in memory
      • The analyzer searches for all low-level events occurring in this period
      • Matches that are found are considered supporting artefacts
      • Expected matches that are not found are considered contradictory artefacts
      • One or more high-level events are created based upon these artefacts (a minimal sketch follows)
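A minimal sketch of such an analyzer, assuming low-level events are dictionaries with date_time and evidence keys; the window size and substring patterns are illustrative, since each real analyzer defines its own trigger and expected artefacts:

    from datetime import timedelta

    def run_analyzer(events, trigger, expected_patterns,
                     window=timedelta(minutes=5)):
        """Temporal-proximity pattern matching around a trigger event:
        gather low-level events inside the window, classify expected
        patterns as supporting (found) or contradictory (missing), and
        emit one high-level event hypothesis."""
        start = trigger["date_time"] - window
        end = trigger["date_time"] + window
        nearby = [e for e in events if start <= e["date_time"] <= end]

        supporting, contradictory = [], []
        for pattern in expected_patterns:
            hits = [e for e in nearby if pattern in e["evidence"]]
            if hits:
                supporting.extend(hits)
            else:
                contradictory.append(f"expected but not found: {pattern}")

        return {
            "description": f"hypothesis triggered by: {trigger['evidence']}",
            "trigger_evidence_artefact": trigger,
            "supporting_evidence_artefact": supporting,
            "contradictory_evidence_artefact": contradictory,
        }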

  18. Design (continued)
  • Reconstruction of high-level events (continued)
    • High-level event format
      • Similar to the low-level event format
      • Includes files, trigger_evidence_artefact, supporting_evidence_artefact, contradictory_evidence_artefact
    • High-level timeline output
      • Not stored in SQLite
      • Exported as XML, plus individual HTML reports per high-level event (an export sketch follows)
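A minimal sketch of the XML export using Python's standard library (the element names are illustrative; the slides do not specify PyDFT's XML schema):

    import xml.etree.ElementTree as ET

    def export_high_level_timeline(high_level_events, out_path):
        """Write the high-level timeline as XML, one element per event,
        preserving the trigger/supporting/contradictory artefact lists."""
        root = ET.Element("high_level_timeline")
        for event in high_level_events:
            node = ET.SubElement(root, "event")
            ET.SubElement(node, "description").text = event["description"]
            ET.SubElement(node, "trigger").text = str(
                event["trigger_evidence_artefact"])
            for artefact in event["supporting_evidence_artefact"]:
                ET.SubElement(node, "supporting").text = str(artefact)
            for artefact in event["contradictory_evidence_artefact"]:
                ET.SubElement(node, "contradictory").text = str(artefact)
        ET.ElementTree(root).write(out_path, encoding="utf-8",
                                   xml_declaration=True)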

  19. Design (continued)
  • Reconstruction of high-level events (continued)
    • Summary
      • The timeline is searched using "test events" that resemble the desired low-level events
      • One or more matches lead to one or more high-level events
      • Because low-level event information is preserved, a high-level event can still point back to the raw data that generated each low-level event
      • Produces two timelines:
        • Low-level event timeline (not very readable)
        • High-level event timeline (human readable)

  20. Results
  • Examples of high-level events constructed
    • Google searches (a sketch of the search-term extraction follows)
      • 11:28:30 Google search for 'how to hack wifi'
    • USB device connection
      • "Setup API entry for USB found (VID:07AB PID:FCF6 Serial:07A80207B128BE08)"
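A sketch of how the search term behind such an event can be recovered from a browser-history URL using only the standard library (PyDFT's exact URL handling is not shown in the slides):

    from urllib.parse import urlparse, parse_qs

    def google_search_term(url):
        """If a visited URL is a Google search, recover the query
        (the 'q' parameter) for a 'Google search for ...' event."""
        parsed = urlparse(url)
        if "google." in parsed.netloc and parsed.path == "/search":
            terms = parse_qs(parsed.query).get("q")
            return terms[0] if terms else None
        return None

    # Usage:
    print(google_search_term("https://www.google.com/search?q=how+to+hack+wifi"))
    # -> 'how to hack wifi'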

  21. Results (continued)
  • Visualization
    • Because there are usually not many high-level events, a third-party program such as Timeflow can display them graphically (an export sketch follows)
    • In the example high-level timeline shown on the slide, 2,894 low-level events underlie the displayed high-level events (and are, of course, not displayed themselves)
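Since the high-level timeline is small, handing it to an external visualizer is a simple export; a minimal sketch follows (the tab-separated layout and column names are assumptions, not Timeflow's documented import format):

    import csv

    def export_for_visualization(high_level_events, out_path):
        """Write high-level events as a delimited file for a
        third-party timeline visualizer; check the target tool's
        expected columns before use."""
        with open(out_path, "w", newline="") as handle:
            writer = csv.writer(handle, delimiter="\t")
            writer.writerow(["date", "description"])
            for event in high_level_events:
                writer.writerow([event["date_time"], event["description"]])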

  22. Results (continued)
  • Performance
    • Figures based on an Intel Core 2 Duo (2.28-2.8 GHz) with 4-8 GB of RAM
    • 1 million events: ~2 min per analyzer x 22 analyzers = ~44 minutes to process 1 million events
    • Equivalent to other indexing or searching forensic tools ("start the search and walk away")
    • No plans to optimize performance

  23. Evaluation
  • The results reinforce that matching "test events" against low-level events, termed "temporal proximity pattern matching", is effective at creating high-level events automatically
  • More analyzers and time extractors need to be developed to further demonstrate the feasibility of temporal proximity pattern matching
  • Low-level extractors still need to be implemented for some parts of the disk, such as the Recycle Bin
  • Whether keeping high-level provenance information is required remains to be determined, since the associated low-level provenance is preserved

  24. Evaluation (continued)
  • Although performance is within the limits of comparable forensic tools, a bottleneck exists because each analyzer searches through the timeline linearly for patterns
    • More analyzers mean a greater bottleneck
    • Needs optimization for multi-core processors
    • Optimizing SQLite secondary indexing could improve performance
  • A way of verifying that the target PC's clock is correct needs to be implemented
  • The prototype needs more robust testing

  25. Future Work
  • Creation of more low-level event extractors
  • Creation of more analyzers
  • Formalizing low-level event information
  • Importing data from other tools
  • Testing the framework against real-world data
  • Adding complexity to analysis scripts, such as Bayesian networks
  • Development of more robust visual data tools for timelining

  26. Conclusions
  • Demonstrates that pattern matching can automatically reconstruct high-level, human-understandable events, which in turn yields a readable visualization of the timeline
  • Preserves the provenance of low-level events
  • Not intended to replace a full forensic analysis by an experienced, trained analyst
