
peHash: A novel approach to Fast Malware Clustering

By: Georg Wicherski. Presenting: Rasika Bindoo.


Presentation Transcript


  1. peHash: A novel approach to Fast Malware Clustering. By: Georg Wicherski. Presenting: Rasika Bindoo

  2. Introduction • Thanks to honeypots, data collection is no longer a problem. • A drawback of honeypots is that they pollute malware databases. • Anti-virus scanners are slow. • peHash was therefore developed to cluster instances of the same polymorphic malware specimen.

  3. Other Attempts at Hashing • Spamsum, mrshash • n-gram signatures • Vx-Class

  4. peHash Function Design The function should have the following design characteristics: • It should not need to look into the contents of the sections. • It should have low computational complexity. • Scaling the bzip2 compression ratio to [0…7] ⊂ ℕ leads to the best matches.
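
The scaling step on this slide can be sketched as follows. The slide does not give the exact formula, so the linear mapping, the clamping of ratios above 1 and the handling of empty data are assumptions, and `compression_bucket` is a hypothetical name.

```python
import bz2


def compression_bucket(section_data: bytes) -> int:
    """Map the bzip2 compression ratio of some data to an integer in 0..7."""
    if not section_data:
        return 0  # assumption: empty data falls into the lowest bucket
    ratio = len(bz2.compress(section_data)) / len(section_data)
    # Incompressible data can grow slightly under bzip2, so clamp to 1.0.
    ratio = min(ratio, 1.0)
    # Assumption: linear scaling of the ratio onto the eight buckets 0..7.
    return min(7, int(ratio * 8))
```

Highly repetitive data lands in a low bucket and random-looking (packed or encrypted) data in the highest one, so two samples only need roughly similar compressibility to agree on this value.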

  5. Structural properties • Polymorphic instances of the same malware share the same structural Portable Executable properties. • The following properties are therefore used to distinguish between binaries: • Image characteristics. • Subsystem. • Stack commit size. • Heap commit size.

  6. Structural properties • The following structural information is used for each section in the Portable Executable: • Virtual address • Raw size • Section characteristics
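
As an illustration, the fields named on the two slides above can be read directly from the PE headers. This is a minimal sketch using only the standard library: it handles 32-bit (PE32) images only, does no bounds checking beyond what `struct` provides, and `read_pe_fields` is a hypothetical helper, not part of peHash itself.

```python
import struct


def read_pe_fields(data: bytes) -> dict:
    """Extract the image-level and per-section fields from a PE32 image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = e_lfanew + 4                      # COFF file header (20 bytes)
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    opt_size, characteristics = struct.unpack_from("<HH", data, coff + 16)
    opt = coff + 20                          # optional header
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic != 0x10B:                       # 0x10B = PE32; PE32+ is skipped here
        raise ValueError("only PE32 is handled in this sketch")
    (subsystem,) = struct.unpack_from("<H", data, opt + 68)
    (stack_commit,) = struct.unpack_from("<I", data, opt + 76)
    (heap_commit,) = struct.unpack_from("<I", data, opt + 84)
    sections = []
    table = opt + opt_size                   # section table, 40 bytes per entry
    for i in range(num_sections):
        base = table + 40 * i
        virt_address, raw_size = struct.unpack_from("<II", data, base + 12)
        (sec_chars,) = struct.unpack_from("<I", data, base + 36)
        sections.append({"virt_address": virt_address,
                         "raw_size": raw_size,
                         "characteristics": sec_chars})
    return {"characteristics": characteristics, "subsystem": subsystem,
            "stack_commit": stack_commit, "heap_commit": heap_commit,
            "sections": sections}
```

Only header fields are read; the section bodies are not touched at this stage.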

  7. Generation of hash values
     hash[0] := characteristics[0…7] V characteristics[8…15]
     hash[1] := subsystem[0…7] V subsystem[8…15]
     hash[2] := stackcommit[0…7] V stackcommit[8…15] V stackcommit[24…31]
     hash[3] := heapcommit[0…7] V heapcommit[8…15] V heapcommit[24…31]
     'V' symbolizes the XOR operation.
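
The four assignments on this slide fold each header field into a single byte. A minimal sketch, assuming that the bit ranges are inclusive and that bit 0 is the least significant bit (the slide does not state the numbering); `bits` and `header_hash_bytes` are hypothetical names.

```python
def bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi of value (inclusive; assumption: bit 0 is the LSB)."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def header_hash_bytes(characteristics: int, subsystem: int,
                      stack_commit: int, heap_commit: int) -> bytes:
    """Build hash[0..3] exactly as listed on the slide, one byte per field."""
    return bytes([
        bits(characteristics, 0, 7) ^ bits(characteristics, 8, 15),
        bits(subsystem, 0, 7) ^ bits(subsystem, 8, 15),
        bits(stack_commit, 0, 7) ^ bits(stack_commit, 8, 15)
        ^ bits(stack_commit, 24, 31),
        bits(heap_commit, 0, 7) ^ bits(heap_commit, 8, 15)
        ^ bits(heap_commit, 24, 31),
    ])
```

For example, `header_hash_bytes(0x0102, 2, 0x1000, 0x1000)` yields the four bytes `03 02 10 10`.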

  8. Generation of hash values • Sub-hash
     shash[0] := virtaddress[9…31]
     shash[2] := rawsize[8…31]
     shash[4] := characteristics[16…23] V characteristics[24…31]
     shash[5] := kolmogorov ∈ [0…7] ⊂ ℕ
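
The per-section sub-hash can be sketched in the same way. The slide gives the bit ranges but not how the values are laid out in memory, so the 8-byte little-endian layout below is an assumption and does not try to reproduce the slide's `shash` indices; bit 0 is again assumed to be the least significant bit. `section_subhash` is a hypothetical name, and `complexity_bucket` stands for the `kolmogorov` value in 0..7.

```python
def section_subhash(virt_address: int, raw_size: int,
                    characteristics: int, complexity_bucket: int) -> bytes:
    """Pack one section's structural fields into a fixed 8-byte sub-hash."""
    va = (virt_address >> 9) & 0x7FFFFF        # virtaddress[9…31], 23 bits
    rs = (raw_size >> 8) & 0xFFFFFF            # rawsize[8…31], 24 bits
    ch = ((characteristics >> 16) & 0xFF) ^ ((characteristics >> 24) & 0xFF)
    # Assumption: 3 + 3 + 1 + 1 bytes, multi-byte values stored little-endian.
    return (va.to_bytes(3, "little") + rs.to_bytes(3, "little")
            + bytes([ch, complexity_bucket & 0x07]))
```

Because the low bits of the virtual address and raw size are discarded, two sections whose raw sizes are `0x200` and `0x2FF` produce the same sub-hash when their other fields match.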

  9. Advantages of this hash function • Complexity is O(1). • The final hash value is the SHA1 of the hash buffer, which makes collisions difficult to create. • Hashes have a constant length regardless of the number of sections in the executable.
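
The final step can be sketched with Python's `hashlib`. The order in which the header bytes and section sub-hashes are concatenated is an assumption, and `finalize_pehash` is a hypothetical name.

```python
import hashlib
from typing import Sequence


def finalize_pehash(header_bytes: bytes, section_subhashes: Sequence[bytes]) -> str:
    """Hash the assembled buffer with SHA1 to obtain a fixed-length value."""
    # Assumption: image-level bytes first, then the sub-hashes in section order.
    buffer = header_bytes + b"".join(section_subhashes)
    return hashlib.sha1(buffer).hexdigest()
```

The buffer grows with the number of sections, but the result is always 40 hexadecimal characters.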

  10. Entry Points and Imports • The entry point value can easily be changed for each instance of a polymorphic specimen. • Most packers specify misleading Import Address Tables. • The import information can also be changed without any noteworthy effort. • Thus neither the entry point nor the imports are included in the hash function.

  11. Evaluation • peHash helps cluster polymorphic malware and also helps detect broken copies of already known threats.

  12. Evaluation • Files in the broken cluster share the same size. • They can be differentiated only by looking at the actual code or imports. • Hence peHash cannot tell them apart.

  13. Performance • Analysis needs to be carried out for only one sample per peHash cluster. • Performance is not related to binary size or section count.
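
Picking one sample per cluster amounts to grouping samples by their hash value. A minimal sketch, assuming the peHash of each sample has already been computed; the function names and the choice of the first-seen sample as representative are illustrative only.

```python
from collections import defaultdict


def cluster_by_pehash(samples):
    """Group (sample_name, pehash) pairs into clusters keyed by peHash."""
    clusters = defaultdict(list)
    for name, digest in samples:
        clusters[digest].append(name)
    return dict(clusters)


def representatives(clusters):
    """Pick one sample per cluster (assumption: the first one seen)."""
    return {digest: names[0] for digest, names in clusters.items()}
```

Only the representatives would then be passed to the slower analysis stage.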

  14. Conclusion • peHash provides a performant solution to the problem of seemingly new malware samples. • peHash can cluster large sets correctly using only basic information from Portable Executables. • peHash cannot be used to cluster variants of malware families, because that requires analyzing code structure.

  15. Thank You
