peHash: A novel approach to Fast Malware Clustering By: Georg Wicherski Presenting: Rasika Bindoo
Introduction • Data collection is no longer a problem because of honeypots. • Honeypots have the drawback of polluting malware databases. • Anti-virus scanners are slow. • peHash was therefore developed to cluster instances of the same polymorphic specimen.
Other Attempts at Hashing • spamsum, mrshash • n-gram signatures • VxClass
peHash Function Design The function should have the following design characteristics: • It should not need to look into the contents of the sections. • It should have low computational complexity. • Scaling the bzip2 compression ratio of each section to [0…7] ⊂ ℕ leads to the best matches.
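The scaled compression ratio can be illustrated with a minimal sketch. The function name `scaled_compression_ratio`, the clamping of ratios above 1, and the linear bucketing into eight values are assumptions made for illustration; the slides do not state the exact scaling formula.

```python
import bz2


def scaled_compression_ratio(data: bytes, buckets: int = 8) -> int:
    """Map the bzip2 compression ratio of `data` to an integer in [0, buckets - 1].

    Assumption: ratio = compressed size / original size, clamped to 1.0,
    then scaled linearly. The original scaling may differ.
    """
    if not data:
        return 0  # nothing to compress; treat an empty section as bucket 0
    ratio = len(bz2.compress(data)) / len(data)
    ratio = min(ratio, 1.0)  # incompressible data can grow slightly under bzip2
    return min(int(ratio * buckets), buckets - 1)
```

Highly repetitive section data lands in a low bucket, while packed or encrypted data lands in the highest one, so the value acts as a coarse fingerprint of how compressible a section is.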
Structural properties • Polymorphic instances of the same malware share the same structural Portable Executable (PE) properties. • The following properties are therefore used to distinguish between binaries: • Image characteristics • Subsystem • Stack commit size • Heap commit size
Structural properties • The following structural information is used for each section in the Portable Executable: • Virtual address • Raw size • Section characteristics
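As a rough illustration of where these fields live, the sketch below reads them from a PE image with the standard `struct` module. The names `read_pe_structure`, `PEStructure` and `SectionInfo` are hypothetical, and the parser is deliberately minimal: it assumes a well-formed PE32 or PE32+ file and does no bounds checking beyond the two signatures.

```python
import struct
from typing import List, NamedTuple


class SectionInfo(NamedTuple):
    virtual_address: int
    raw_size: int
    characteristics: int


class PEStructure(NamedTuple):
    characteristics: int
    subsystem: int
    stack_commit: int
    heap_commit: int
    sections: List[SectionInfo]


def read_pe_structure(image: bytes) -> PEStructure:
    """Extract the structural fields listed on the slides from a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)  # e_lfanew
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")

    coff = pe_offset + 4  # COFF file header (20 bytes)
    (num_sections,) = struct.unpack_from("<H", image, coff + 2)
    opt_size, characteristics = struct.unpack_from("<HH", image, coff + 16)

    opt = coff + 20  # optional header
    (magic,) = struct.unpack_from("<H", image, opt)
    (subsystem,) = struct.unpack_from("<H", image, opt + 68)
    if magic == 0x10B:  # PE32: 32-bit commit sizes
        (stack_commit,) = struct.unpack_from("<I", image, opt + 76)
        (heap_commit,) = struct.unpack_from("<I", image, opt + 84)
    elif magic == 0x20B:  # PE32+: 64-bit commit sizes
        (stack_commit,) = struct.unpack_from("<Q", image, opt + 80)
        (heap_commit,) = struct.unpack_from("<Q", image, opt + 96)
    else:
        raise ValueError("unknown optional header magic")

    sections = []
    table = opt + opt_size  # section table follows the optional header
    for i in range(num_sections):
        entry = table + 40 * i  # each section header is 40 bytes
        vaddr, raw_size = struct.unpack_from("<II", image, entry + 12)
        (sec_chars,) = struct.unpack_from("<I", image, entry + 36)
        sections.append(SectionInfo(vaddr, raw_size, sec_chars))

    return PEStructure(characteristics, subsystem, stack_commit,
                       heap_commit, sections)
```

Only header fields are read here; none of the section data is touched at this stage.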
Generation of hash values • hash[0] := characteristics[0…7] ⊕ characteristics[8…15] • hash[1] := subsystem[0…7] ⊕ subsystem[8…15] • hash[2] := stackcommit[0…7] ⊕ stackcommit[8…15] ⊕ stackcommit[24…31] • hash[3] := heapcommit[0…7] ⊕ heapcommit[8…15] ⊕ heapcommit[24…31] • '⊕' denotes the XOR operation.
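The four header bytes can be sketched as follows, using exactly the bit ranges listed on this slide. The helper `bits` and the function name `global_hash_bytes` are hypothetical names introduced for illustration.

```python
def bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive, bit 0 = least significant) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def global_hash_bytes(characteristics: int, subsystem: int,
                      stack_commit: int, heap_commit: int) -> bytes:
    """Fold each header field into one byte by XORing the listed bit ranges."""
    return bytes([
        bits(characteristics, 0, 7) ^ bits(characteristics, 8, 15),
        bits(subsystem, 0, 7) ^ bits(subsystem, 8, 15),
        bits(stack_commit, 0, 7) ^ bits(stack_commit, 8, 15)
        ^ bits(stack_commit, 24, 31),
        bits(heap_commit, 0, 7) ^ bits(heap_commit, 8, 15)
        ^ bits(heap_commit, 24, 31),
    ])
```

For example, `global_hash_bytes(0x0102, 2, 0x1000, 0x01000010)` folds the two bytes of the characteristics field into `0x03` and the heap commit size into `0x11`.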
Generation of hash values • Sub-hash for each section: • shash[0] := virtaddress[9…31] • shash[2] := rawsize[8…31] • shash[4] := characteristics[16…23] ⊕ characteristics[24…31] (⊕ is XOR) • shash[5] := kolmogorov ∈ [0…7] ⊂ ℕ
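A per-section sub-hash along these lines might look like the sketch below. The byte layout is an assumption: the slide's offsets (0, 2, 4, 5) are not reproduced, and instead each truncated field is packed whole as a little-endian value (3 bytes for the virtual address bits, 3 bytes for the raw size bits, then one byte each for the folded characteristics and the scaled compression value). The names `bits` and `section_subhash` are hypothetical.

```python
def bits(value: int, lo: int, hi: int) -> int:
    """Return bits lo..hi (inclusive, bit 0 = least significant) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def section_subhash(virtual_address: int, raw_size: int,
                    characteristics: int, kolmogorov: int) -> bytes:
    """Build an 8-byte sub-hash for one section (assumed layout, see lead-in)."""
    if not 0 <= kolmogorov <= 7:
        raise ValueError("kolmogorov must be in [0, 7]")
    va = bits(virtual_address, 9, 31)   # low 9 bits are dropped
    rs = bits(raw_size, 8, 31)          # low 8 bits are dropped
    ch = bits(characteristics, 16, 23) ^ bits(characteristics, 24, 31)
    return (va.to_bytes(3, "little") + rs.to_bytes(3, "little")
            + bytes([ch, kolmogorov]))
```

Dropping the low bits of the virtual address and raw size means that small differences between instances, below 512 and 256 bytes respectively, do not change the sub-hash.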
Advantages of this hash function • Complexity is O(1). • The SHA1 of the hash buffer is calculated to obtain the final hash value, which makes it difficult to create collisions. • Constant-length hashes are generated despite the variable number of sections in the executables.
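The final step can be sketched as hashing one buffer with SHA1. The function name `final_pehash` is hypothetical, and the buffer layout (header bytes first, then each section's sub-hash in section order) is an assumption, since the slides do not spell out how the buffer is assembled.

```python
import hashlib
from typing import Iterable


def final_pehash(global_bytes: bytes, section_subhashes: Iterable[bytes]) -> str:
    """Concatenate header bytes and per-section sub-hashes, then SHA1 the buffer."""
    buffer = bytearray(global_bytes)
    for sub in section_subhashes:
        buffer += sub  # assumed order: sections as they appear in the file
    return hashlib.sha1(bytes(buffer)).hexdigest()
```

Whether a file has one section or ten, the result is always a 40-character hexadecimal digest, which is what makes the hash length constant.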
Entry Points and Imports • The value of the entry point can easily be changed for each instance of a polymorphic specimen. • Most packers specify misleading Import Address Tables. • The import information can also be changed without any noteworthy effort. • Thus neither the entry point nor the imports are included in the hash function.
Evaluation • peHash helps cluster polymorphic malware and also helps detect broken copies of already known threats.
Evaluation • Files in the broken cluster share the same size. • They can be differentiated only by looking at the actual code or imports. • peHash does not look at either, so it cannot separate them.
Performance • Further analysis needs to be carried out for only one sample per peHash cluster. • Performance does not depend on binary size or section count.
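The "one sample per cluster" idea can be sketched as a simple grouping by hash value. The function names and the choice of the alphabetically first sample as the representative are assumptions for illustration; any member of a cluster would serve.

```python
from collections import defaultdict
from typing import Dict, List


def cluster_by_hash(hashes: Dict[str, str]) -> Dict[str, List[str]]:
    """Group sample names by their hash value (sample name -> digest)."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for sample, digest in hashes.items():
        clusters[digest].append(sample)
    return dict(clusters)


def pick_representatives(clusters: Dict[str, List[str]]) -> List[str]:
    """Choose one sample per cluster for further (expensive) analysis."""
    return sorted(sorted(members)[0] for members in clusters.values())
```

With three samples of which two share a digest, only two samples need to go through further analysis.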
Conclusion • peHash provides a performant solution to the problem of seemingly new malware samples. • peHash can accomplish correct clustering for large sets by using basic information from Portable Executables. • peHash cannot be used to cluster variants of malware families for which code structure has to be analyzed.