Presenters: Muhammad Mohsin Butt (g201103010) COE589 Paper Presentation The Evolution of File Carving
Contents • Introduction • Background • Traditional Recovery • File Carving • Smart Carver • Conclusion
Introduction • This survey presents various file carving techniques. • File carving is a forensic technique that recovers data based on file structure and content. • No file system metadata is required. • The main focus of the paper is on file carving techniques for fragmented data.
Background • File System • The part of the OS that manages the creation, deletion, allocation, and various other operations on files. • FAT32 and NTFS are the most widely used file systems on Windows. • The basic unit of data storage on disk is the cluster. • Cluster sizes are usually multiples of 512 bytes.
Background • Recovery in FAT32 (File Allocation Table) • Files can be allocated in different ways: • Contiguous allocation. • Linked allocation. • Indexed allocation.
Background • Contiguous Allocation • Linked Allocation
Background • Indexed Allocation
Traditional Recovery Techniques • These recovery techniques use the file system's metadata to recover data. • Data Storage in FAT32
Traditional Recovery Techniques • Deletion and Recovery in FAT32.
File Carving • What if we don't have the file system metadata? • File carving recovers data without using file system information. • Knowledge of the structure of the files to be recovered is used instead. • File carving can be divided into two categories: • File carving for non-fragmented data. • File carving for fragmented data.
File Carving (First Generation) • Performs well on non-fragmented data. • In forensics, user data (images, documents, etc.) is what matters most for recovery. • The search pool is reduced by removing operating system files, which are detected using their MD5 hashes and keywords. • Byte sequences at prescribed offsets are used to identify files.
File Carving (First Generation) • Header and footer information of the files to be recovered is used. • A JPEG header cluster begins with the byte sequence FF D8. • A JPEG footer cluster contains the byte sequence FF D9. • Some file types have no footer. • A BMP image stores the file size and other information in its header. • For BMP, the number of unallocated clusters implied by the size in the header is merged for recovery.
File Carving (First Generation) • The Foremost tool implements both header-to-footer carving and carving based on the header and file-size information. • Scalpel, built on the Foremost engine, improves the performance and memory usage of these carving techniques. • Both suffer degraded performance when the data is fragmented.
Fragmentation • As files are edited, modified, and deleted, most hard drives become fragmented. • Fragmentation also depends on the allocation strategy of the file system. • Fragmentation is high in forensically important files such as email stores and Word documents. Why? • Because of constant editing, deletion, and addition, PST files are the most fragmented. • Wear-leveling algorithms in solid-state drives (SSDs) also cause fragmentation.
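Header-to-footer carving can be illustrated with a minimal Python sketch. It assumes headers appear only at cluster boundaries and that the file is stored contiguously; the function name `carve_header_footer` and the 512-byte default cluster size are illustrative choices, not taken from any tool.

```python
JPEG_HEADER = b"\xff\xd8"
JPEG_FOOTER = b"\xff\xd9"


def carve_header_footer(image: bytes, cluster_size: int = 512) -> list:
    """Return candidate JPEGs found between a header cluster and the next footer."""
    carved = []
    offset = 0
    while offset < len(image):
        # A header is only expected at the start of a cluster.
        if image[offset:offset + 2] == JPEG_HEADER:
            end = image.find(JPEG_FOOTER, offset + 2)
            if end != -1:
                # Everything from the header up to and including the footer.
                carved.append(image[offset:end + 2])
                # Resume at the first cluster boundary after the footer.
                offset = -(-(end + 2) // cluster_size) * cluster_size
                continue
        offset += cluster_size
    return carved
```

If the file is fragmented, the bytes between header and footer include foreign clusters and the carved result is corrupt, which is the weakness the later slides address.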
Fragmentation Fragmented File Recovery
Graph Theoretic Carvers • Recover fragmented files. • Recovery is formulated as a Hamiltonian path problem. • Solved using alpha-beta heuristics.
Hamiltonian Path Problem • Given a set of clusters: • Find a permutation of these clusters that recovers the correct file. • Identify pairs that are adjacent in the original document. • Assign a weight to each pair of clusters that represents the likelihood of one cluster following the other in the original file. • The best permutation is the one that maximizes the sum of the candidate weights of adjacent clusters.
Hamiltonian Path Problem • Formulated as a graph. • Vertices represent clusters. • Edges carry the weights between clusters. • The problem reduces to finding a maximum-weight Hamiltonian path in this graph.
Assigning Weights • Weight assignment is the key step in this type of carving. • The Prediction by Partial Matching (PPM) technique is used for assigning weights. • PPM works well for text.
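The formulation can be made concrete with a small brute-force sketch. It tries every ordering, so it is only usable for a handful of clusters and is not the alpha-beta heuristic mentioned in the slides. The name `best_cluster_order` and the `weights[i][j]` matrix layout are assumptions made for illustration.

```python
from itertools import permutations


def best_cluster_order(weights, start=0):
    """Exhaustively find the maximum-weight Hamiltonian path beginning at `start`.

    weights[i][j] is the likelihood that cluster j directly follows cluster i.
    """
    n = len(weights)
    others = [i for i in range(n) if i != start]
    best_path, best_score = None, float("-inf")
    for perm in permutations(others):
        path = (start, *perm)
        # Score of a path = sum of weights between adjacent clusters.
        score = sum(weights[a][b] for a, b in zip(path, path[1:]))
        if score > best_score:
            best_path, best_score = path, score
    return list(best_path), best_score
```

The factorial number of orderings is why practical carvers rely on heuristics instead of exhaustive search.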
Assigning Weights • Weight Assignment in Images
K-Vertex Disjoint Path Problem • The Hamiltonian path method assumes that all the clusters belong to the same file. • In real systems, multiple files are fragmented together. • Headers of the various files are identified in the pool of clusters. • A graph is again formed using weights. • k disjoint paths are then found in this graph, where k is the number of headers found in the previous step. • Developed primarily for recovering images.
K-Vertex Disjoint Path Problem • Various algorithms exist for finding k disjoint paths. • Unique Path (UP) algorithms provide the best performance. • Each cluster is assigned to only one file. • An incorrect assignment may result in two files being recovered incorrectly. • Parallel Unique Path algorithm. • Shortest Path First algorithm.
Parallel Unique Path (PUP) • A variation of Dijkstra's single-source shortest path algorithm. • Given k headers and a pool of clusters: • Find the best cluster match for each of the headers. • From the matches found in the previous step, take the best one and assign it to its header. • Remove the chosen cluster from the pool of available clusters. • Find the best match for the newly assigned cluster, then repeat from the selection step until all files are recovered.
Shortest Path First • This algorithm is based on the idea that the best recoveries have the lowest average path costs. • The average path cost is the sum of the weights between the clusters of a recovered file divided by the number of clusters. • Takes one image at a time. • Reconstructs the image. • After reconstruction, the clusters used are not removed from the cluster pool. • This process is repeated for all the images. • Of all the recovered images, the one with the lowest average path cost is taken as the best recovery. • The clusters associated with the best recovery are then removed, and the process repeats for the remaining images.
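The PUP steps can be sketched in Python as follows. This is a simplified illustration: clusters are represented by sortable IDs, `weight(a, b)` is an assumed caller-supplied function where higher means "b more likely follows a", and `is_end(c)` is an assumed test for a footer cluster. None of these names come from the paper.

```python
def parallel_unique_path(headers, pool, weight, is_end):
    """Grow all k files in parallel, one cluster per round."""
    paths = {h: [h] for h in headers}
    available = set(pool)
    active = {h for h in headers if not is_end(h)}
    while active and available:
        best = None
        # Best candidate for the current tail of every unfinished file.
        for h in sorted(active):
            tail = paths[h][-1]
            cand = max(sorted(available), key=lambda c: weight(tail, c))
            score = weight(tail, cand)
            if best is None or score > best[0]:
                best = (score, h, cand)
        # Commit only the single best match of this round.
        _, h, cand = best
        paths[h].append(cand)
        available.remove(cand)  # each cluster belongs to exactly one file
        if is_end(cand):
            active.discard(h)
    return paths
```

Because only one match is committed per round, a strong match for one file can claim a cluster before a weaker competing match takes it, but an early wrong assignment still damages two files.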
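A minimal sketch of the Shortest Path First loop, under stated assumptions: `cost(a, b)` is a caller-supplied function where lower means a better match, `is_end(c)` detects a footer cluster, and each trial reconstruction is done greedily. The greedy helper is an illustrative stand-in, not the reconstruction method of the paper.

```python
def greedy_path(header, pool, cost, is_end):
    """Trial reconstruction of one file; returns (path, average path cost)."""
    path, free = [header], set(pool)
    while free and not is_end(path[-1]):
        nxt = min(sorted(free), key=lambda c: cost(path[-1], c))
        path.append(nxt)
        free.remove(nxt)
    total = sum(cost(a, b) for a, b in zip(path, path[1:]))
    return path, total / len(path)


def shortest_path_first(headers, pool, cost, is_end):
    remaining, free, recovered = list(headers), set(pool), {}
    while remaining:
        # Reconstruct every remaining file against the same, unreduced pool.
        trials = {h: greedy_path(h, free, cost, is_end) for h in remaining}
        # Keep only the reconstruction with the lowest average path cost.
        best = min(remaining, key=lambda h: trials[h][1])
        recovered[best] = trials[best][0]
        # Only now are its clusters removed from the pool.
        free -= set(trials[best][0][1:])
        remaining.remove(best)
    return recovered
```

Every round reconstructs all remaining files, which is why this approach is slower than PUP.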
Results • Shortest Path First provides an accuracy of 88%. • PUP provides an accuracy of 83% but is faster. • Both require edge weights to be precomputed. • For large hard drives, having to compute a weight for the likelihood between every pair of clusters is a major drawback.
BiFragment Gap Carving • Most fragmented files in real-world data are bi-fragmented, i.e. split into two fragments. • This technique works for files with a known header and footer. • Files must be decodable or be verifiable via their structure. • Works by searching through the combinations of clusters between an identified header and footer.
Smart Carver • Can work on fragmented and non-fragmented data. • A wide variety of file types is supported. • Preprocessing • Data clusters are decrypted or decompressed. • Collating • Classification of clusters by file type. • Reassembly
Smart Carver (PreProcessing) • Compressed and encrypted drives are decompressed/decrypted in this stage. • Known clusters are removed from the disk based on file system metadata. • This increases speed and reduces the amount of data for the next phases. • Allocated files and operating-system-specific data can be pruned if they are of no forensic interest.
Smart Carver (Collating) • Classifies the disk clusters as belonging to certain file types. • Reduces the cluster pool when recovering files of each type. • Keyword/Pattern Matching • Looks for sequences that determine the type of a cluster. • E.g. <html> tags in a cluster mark it as part of an HTML file. • ASCII character frequency • A high frequency of ASCII characters indicates that the data is not video or image.
Smart Carver (Collating) • File Fingerprints • Use the Byte Frequency Distribution (BFD) to determine the type of a file. • The BFD is generated by creating a histogram of the byte values in the file. • A centroid model for each file type is created using the mean and standard deviation of each byte value. • These methods still have trouble differentiating JPEG from ZIP. • Still an active research topic.
Smart Carver (ReAssembly) • Reassembly can be done by: • Finding the starting fragment of a file, which contains the header. • Merging clusters that belong to the same fragment. • Finding the fragmentation point, i.e. the last cluster in the current fragment. • Finding the starting point of the next fragment. • Finding the ending point of the last fragment, i.e. the last cluster, which contains the footer.
Smart Carver (ReAssembly) • Merging of consecutive clusters can be done in two ways. • Keyword/Dictionary • Applies when a word is formed across the boundary between two clusters. • E.g. one cluster ends with "he" and the next starts with "llo World". Both can be merged. • File Structure • The file structure can help in merging. Length fields in headers indicate the length of the data. E.g. in a PNG file, if the length value of a chunk is k, then the CRC of the chunk is stored after its k bytes of data. If the data in between yields the same CRC, then all the clusters in between can be merged. Otherwise fragmentation is present.
Smart Carver (ReAssembly) • Sequential Hypothesis Testing Parallel Unique Path (SHT-PUP) algorithm for reassembly. • A modification of the PUP algorithm. • As in PUP, the best match is found for each of the k available headers, and the best of these is selected. • The clusters immediately following the newly selected cluster are then tested using sequential hypothesis testing until a fragmentation point is reached.
Smart Carver (ReAssembly) • Sequential Hypothesis Testing • This is done using the weight vector, i.e. the weights of all clusters in the pool. • Two hypotheses are tested. • One says that the clusters belong in sequence to the fragment. • The other says that they don't. • The likelihood ratio of the observed weights under the two hypotheses is used to decide between them.
Conclusion • Various file carving methods for fragmented files are presented in the survey. • Finding the best weight function is still an open research problem.
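The gap search can be sketched as below. It assumes the header and footer clusters have already been located and that `is_valid` is a caller-supplied validator (for example a decoder that accepts or rejects the candidate bytes). Both the function name and the validator interface are hypothetical.

```python
def bifragment_gap_carve(clusters, header_idx, footer_idx, is_valid):
    """Try every contiguous gap between header and footer until one validates."""
    span = clusters[header_idx:footer_idx + 1]
    n = len(span)
    # No gap at all: the file may simply be contiguous.
    whole = b"".join(span)
    if is_valid(whole):
        return whole
    # Remove `gap` clusters starting at `start`; header and footer always stay.
    for gap in range(1, n - 1):
        for start in range(1, n - gap):
            candidate = b"".join(span[:start] + span[start + gap:])
            if is_valid(candidate):
                return candidate
    return None
```

The number of candidates grows quadratically with the distance between header and footer, so the approach is practical only when that distance is modest and validation is fast.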
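A byte-frequency centroid classifier might look like the following sketch. The distance measure (absolute deviation scaled by the standard deviation, with a small `eps` to avoid division by zero) is an assumption chosen for simplicity, and all function names are illustrative.

```python
from collections import Counter
from statistics import mean, pstdev


def bfd(data: bytes) -> list:
    """Byte frequency distribution: normalized histogram over the 256 byte values."""
    counts = Counter(data)
    total = len(data) or 1
    return [counts.get(b, 0) / total for b in range(256)]


def build_centroid(samples):
    """Per-byte mean and standard deviation over sample files of one type."""
    hists = [bfd(s) for s in samples]
    means = [mean(h[b] for h in hists) for b in range(256)]
    stds = [pstdev([h[b] for h in hists]) for b in range(256)]
    return means, stds


def distance(data, centroid, eps=1e-6):
    means, stds = centroid
    hist = bfd(data)
    # Deviations count more for byte values whose frequency is normally stable.
    return sum(abs(hist[b] - means[b]) / (stds[b] + eps) for b in range(256))


def classify(data, centroids):
    """Return the name of the file type whose centroid is closest."""
    return min(centroids, key=lambda name: distance(data, centroids[name]))
```

Compressed formats such as JPEG and ZIP both have nearly flat histograms, which is consistent with the difficulty noted above in telling them apart.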
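The PNG structure check can be sketched with the standard `struct` and `zlib` modules. A PNG chunk consists of a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC computed over the type and data. The function name `png_chunk_is_intact` is illustrative.

```python
import struct
import zlib


def png_chunk_is_intact(buf: bytes, offset: int) -> bool:
    """Check the chunk at `offset`: 4-byte length, 4-byte type, data, 4-byte CRC."""
    if offset + 8 > len(buf):
        return False
    (length,) = struct.unpack(">I", buf[offset:offset + 4])
    end = offset + 8 + length  # first byte after the chunk data
    if end + 4 > len(buf):
        return False
    (stored,) = struct.unpack(">I", buf[end:end + 4])
    # The CRC covers the chunk type and data, not the length field.
    return zlib.crc32(buf[offset + 4:end]) == stored
```

If the check passes, the clusters spanned by the chunk can be merged. If it fails, a fragmentation point lies somewhere inside the chunk.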
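A sequential likelihood-ratio test over a run of matching weights can be sketched as follows. The two likelihood functions `p_seq` and `p_break`, the thresholds, and the choice to report the start of the undecided window are all assumptions for illustration. The slides do not give the exact form of the ratio.

```python
def find_fragmentation_point(weights, p_seq, p_break, upper=100.0, lower=0.01):
    """Return the index where the clusters stop belonging to the fragment, or None.

    weights[i] is the matching weight observed for the i-th following cluster.
    p_seq(w) and p_break(w) are the likelihoods of w under the two hypotheses.
    """
    ratio = 1.0
    window_start = 0
    for i, w in enumerate(weights):
        # Accumulate the likelihood ratio over the clusters seen so far.
        ratio *= p_break(w) / p_seq(w)
        if ratio >= upper:
            # "Not in sequence" accepted for the current window.
            return window_start
        if ratio <= lower:
            # "In sequence" accepted: merge these clusters and restart the test.
            ratio = 1.0
            window_start = i + 1
    return None
```

With toy likelihoods such as `p_seq = lambda w: 2 * w` and `p_break = lambda w: 2 * (1 - w)` for weights strictly between 0 and 1, a run of high weights followed by low weights is split at the first low-weight cluster.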