1 / 27

Automated Reassembly of Document Fragments

This paper explores an automated solution for reassembling objects from fragmented digital evidence, addressing the challenges of scattered fragments in digital forensics. The process involves preprocessing, cryptanalysis, weight assignments, collating, and hierarchical reassembly. Using context-based statistical models and adjacency matrices, the method seeks to find a near-optimal solution to reconstruct the original document. Implementation includes Prediction by Partial Matching (PPM) and building a solution tree to prune and optimize the reassembly process. Experiments and results are discussed, highlighting the potential for future work in this area.

jmaclennan
Download Presentation

Automated Reassembly of Document Fragments

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Automated Reassembly of Document Fragments DFRWS 2002

  2. Outline • Introduction & Motivation • Stages in Reassembly • Reassembly Problem • Our Solution • Implementation & Experiments • Summary

  3. Introduction & Motivation

  4. Introduction • Reassembly of objects from mixed fragments • Common problem in: • Classical Forensics • Failure Analysis • Archaeology • Well studied, automated… • Is there a similar problem in digital forensics?

  5. F1 Split F2 Evidence F3 Fn-1 Keywords Fn Fragments Motivation • Digital evidence is • malleable • easily scattered • Fragmentation process

  6. Motivation… • Scenarios… • Hiding in Slack Space • Criminal splits the document and hides them selectively into slack spaces based on a password • Swap File • Addressing & state information is not available on the disk • Peer-to-peer systems • Fragments are assigned a sequence of keywords and scattered across the network • e.g.: FreeNet, M-o-o-t

  7. Stages of Reassembly

  8. F1 F2 Fx preprocessing collating reassembly G1 G1 Gy H1 H1 Hz Stages of Reassembly F1 F G1 G Fx H1 H Gy Hz

  9. Stages of Reassembly • Preprocessing • Cryptanalysis • Weight Assignments • Collating • Group together fragments of a document • Hierarchical approach • Reassembly • Reordering the fragments to form the original document

  10. Reassembly

  11. The Problem of Reassembly • Suppose we have fragments {A0, A1, … An} of document A • Compute a permutation X such that A= AX(0)||AX(1)|| … AX(n) • To compute A, we need to find adjacent fragments • To reassemble: • Need to find adjacent fragments • Automate the process

  12. wn fox jumps the quick bro over the lazy dog Quantifying Adjacency • An Example: A linguist may assign probabilities based on syntactic and semantic analysis • This process is language dependent

  13. abracadabra abr a context prediction Context-Based Statistical Models • Context based models are used in data compression • Predicts subsequent symbols based on current context • Works well on natural languages as well as other data types • Context models can be used to predict upcoming symbols and assign candidate probabilities

  14. Adjacency Matrix • Candidate probabilities of each pair of fragments form complete graph • A Hamiltonian path that maximizes the sum of candidate probabilities is our solution • But this problem is intractable • We will discuss a near optimal solution a b c d e a 0 c(a,b) …. b c(b,a)0 …. c d e c(e,a) a e b d c

  15. Steps in Reassembling • Build context model using all the fragments • Compute candidate probabilities for each pair • Find a Hamiltonian Path that maximizes the sum of candidate probabilities

  16. Implementation & Experiments

  17. Prediction by Partial Matching (PPM) abracadabra • Uses a suite of fixed order context models • Uses one or more orders to predict upcoming symbol • We process each fragment with PPM • Combine the statistics to form a model for all the fragments

  18. Candidate Probability • Slide a window of size d from one fragment into the other • At each position, use the window as context and determine the probability (pi) of next symbol • Candidate prob. C(1,2) = (p0* p1*…pd) fragment 1 fragment 2 a b r a c a d a b r a c d a b a

  19. a a d b c e c d e e b c e d e c c Solution Tree • Assumtions: • Fragments are recovered without data loss • First fragment is known/easily identified • Paths in complete path can be represented as a tree • Tree grows exponentially! • We have to prune the tree

  20. c d e c e b c e c c b Pruning a • At every level choose a node with the largest candidate probability • We can choose alpha nodes at each level • By looking at candidate probabilities beta levels deep d b c e b c e

  21. Experiments

  22. Data Set

  23. Reassembly for various types

  24. Compression ratio

  25. F1 Reassembly Concatenate Final Sequence F2 Fn Concatenated fragments Iterative Approach

  26. Summary • Introduced reassembly of scattered evidence • Experiments & results • Future work: • Identifying preprocessing heuristics • Compare performance with other models • Work on reassembling images

  27. Questions

More Related