Detecting Software Theft via System Call Based Birthmarks Xinran Wang, Yoon-Chan Jhi, Sencun Zhu, Peng Liu ACSAC 2009
OUTLINE • Introduction and Related Work • System Call Based Birthmarks • System Design and Implementation • Evaluation • Discussion and Conclusion
Software Theft (or Plagiarism) • Reusing someone else’s code • Even if only a small part of the original program is reused • Obfuscation techniques used to hide the reuse • Different compilers • Different compiler optimization levels • SandMark
Defender • Software watermark • Theoretically, any watermark can be removed • Software birthmark • A unique characteristic that a program inherently possesses
Defender (Cont.) • Requirements • R1: Resiliency to obfuscation techniques • R2: Capability to detect theft of components • R3: Scalability to large-scale programs • R4: Applicability to binary executables • R5: Independence of platforms
Related Work • Software Birthmark • Static source code based birthmark • Static executable code based birthmark • Dynamic whole program path (WPP) based birthmark • Dynamic API based birthmark • Clone Detection • String-based, AST-based, token-based, and PDG-based • None of these satisfies all five requirements
System Call Based Birthmarks • Behavior based birthmarks • Unique behaviors in features and implementation details • SCSSB (System Call Short Sequence Birthmark) • IDSCSB (Input Dependant System Call Subsequence Birthmark)
SCSSB (System Call Short Sequence Birthmark) • Definition 1: (System Call Trace) • Definition 2: (System Call Sequence Set)
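The notation S(p, I, k) suggests a set built from the trace of program p on input I with a length parameter k. A minimal sketch, assuming that S(p, I, k) is the set of all contiguous length-k windows of the recorded system call trace (the function name and the sample trace are hypothetical, not taken from the slides):

```python
from typing import Iterable, Set, Tuple


def call_sequence_set(trace: Iterable[str], k: int) -> Set[Tuple[str, ...]]:
    """Return every contiguous window of k system calls found in the trace.

    Assumption: this models S(p, I, k), with `trace` standing in for the
    system call trace of program p on input I.
    """
    calls = list(trace)
    return {tuple(calls[i:i + k]) for i in range(len(calls) - k + 1)}


# Hypothetical trace, for illustration only.
trace = ["open", "read", "write", "read", "write", "close"]
for window in sorted(call_sequence_set(trace, 3)):
    print(window)
```

Storing the windows in a set drops repeats, so a loop that issues the same calls many times contributes each short sequence only once.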
SCSSB (System Call Short Sequence Birthmark) • Definition 3: (SCSSB: System Call Short Sequence Birthmark) SCSSB(p, I, k) is a subset of set S(p, I, k) that satisfies
SCSSB (System Call Short Sequence Birthmark) • Definition 4: (Containment) The containment of A in B is defined as: Here A is the birthmark of a plaintiff program or its component, and B is the birthmark of a suspect program.
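A small sketch of how a containment score could be computed, assuming the usual set-containment ratio |A ∩ B| / |A| over birthmarks represented as sets of short system call sequences. The ratio, the function name, and the sample data are assumptions made for illustration:

```python
from typing import Set, Tuple

Sequence = Tuple[str, ...]


def containment(a: Set[Sequence], b: Set[Sequence]) -> float:
    """Fraction of plaintiff birthmark `a` that also appears in suspect birthmark `b`.

    Assumption: containment is the set ratio |A & B| / |A|.
    """
    if not a:
        raise ValueError("plaintiff birthmark must not be empty")
    return len(a & b) / len(a)


# Hypothetical birthmarks built from length-2 sequences.
plaintiff = {("open", "read"), ("read", "write"), ("write", "close")}
suspect = {("open", "read"), ("read", "write"), ("write", "fsync"), ("fsync", "close")}
print(round(containment(plaintiff, suspect), 3))
```

The score is asymmetric by design: a suspect program may do far more than the plaintiff component, and the extra sequences in B do not lower the score.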
System Design and Implementation • System Call Tracer • System Call Abstraction • Birthmark Generator • Input Dependant System Call Subsequence Birthmarks
System Call Tracer • The simplest way • strace • With thread identifier • SATracer based on Valgrind • Prepare a list of all subroutines of the component in SATracer • The list is automatically generated by Elsa • SATracer checks the execution stack of the running thread whenever a system call is invoked
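For the strace route, a rough sketch of splitting a recorded trace by thread. It assumes the trace was written with `strace -f -o <file>`, where each line starts with the process or thread id, and it only handles complete call lines (signal lines, exit lines, and unfinished or resumed calls are skipped). The regular expression and function name are illustrative and are not part of SATracer:

```python
import re
from collections import defaultdict

# One complete call line, e.g.: 4242 read(3, "data", 4096) = 4
LINE = re.compile(
    r"^(?P<tid>\d+)\s+(?P<name>\w+)\((?P<args>.*)\)\s+=\s+"
    r"(?P<ret>-?\d+|0x[0-9a-fA-F]+|\?)"
)


def parse_trace(lines):
    """Group (name, args, return value) records by thread identifier."""
    per_thread = defaultdict(list)
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue  # not a complete system call line
        per_thread[int(match["tid"])].append(
            (match["name"], match["args"], match["ret"])
        )
    return dict(per_thread)


# Hypothetical trace lines, for illustration only.
sample = [
    '4242 openat(AT_FDCWD, "in.txt", O_RDONLY) = 3',
    '4243 write(1, "hi\\n", 3) = 3',
    '4242 read(3, "data", 4096) = 4',
    '4242 openat(AT_FDCWD, "missing", O_RDONLY) = -1 ENOENT (No such file or directory)',
    '4242 +++ exited with 0 +++',
]
for tid, calls in parse_trace(sample).items():
    print(tid, [name for name, _, _ in calls])
```

Keeping one list per thread avoids interleaving calls from concurrent threads into a single sequence.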
System Call Abstraction • Ignore system calls that do not represent the behavioral characteristics of the program • e.g., brk, mmap • Treat aliases or multiple versions of a system call as the same call • Ex: fstat(int fd, struct stat *sb) and stat(const char *path, struct stat *sb) • Ignore failed system calls
Birthmark Generator • Remove system calls that depend on the loading environment • Run multiple times with the same input • Remove the (noisy) system calls • Establish a database of common system call short sequences
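The three abstraction rules can be sketched as a single filtering pass. The ignore list and the alias table below contain only the examples named on the slide; treating a negative return value as failure is an assumption, and the function name is hypothetical:

```python
# Calls the slide lists as not representing program behavior.
IGNORED = {"brk", "mmap"}

# Alias table; only the fstat/stat pair comes from the slide.
ALIASES = {"fstat": "stat"}


def abstract_calls(calls):
    """Apply the abstraction rules to (name, return_value) records.

    Assumption: a negative return value marks a failed system call.
    """
    result = []
    for name, ret in calls:
        if ret < 0:
            continue  # failed call
        if name in IGNORED:
            continue  # memory-management noise
        result.append(ALIASES.get(name, name))
    return result


# Hypothetical raw records, for illustration only.
raw = [("brk", 0), ("open", 3), ("fstat", 0), ("stat", 0), ("open", -1), ("read", 10)]
print(abstract_calls(raw))
```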
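One way to read the generator steps, sketched under explicit assumptions: noise is removed by keeping only the short sequences that appear in every run on the same input, and the database of common sequences is subtracted afterwards. Both choices, along with all names and sample traces, are illustrative guesses rather than the authors' implementation:

```python
def kgrams(trace, k):
    """Set of contiguous length-k windows of a trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


def generate_birthmark(runs, common_sequences, k):
    """Build a birthmark from several traces recorded with the same input.

    Assumptions: a sequence is 'noisy' if it is missing from any run, and
    `common_sequences` plays the role of the common-sequence database.
    """
    if not runs:
        raise ValueError("at least one run is required")
    stable = set.intersection(*(kgrams(trace, k) for trace in runs))
    return stable - set(common_sequences)


# Hypothetical traces from two runs with the same input.
runs = [
    ["open", "read", "write", "close"],
    ["open", "read", "write", "getpid", "close"],
]
common = {("open", "read")}
print(generate_birthmark(runs, common, k=2))
```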
Input Dependant System Call Subsequence Birthmarks • Definition 7: (IDSCSB: Input Dependant System Call Subsequence Birthmark) • Containment:
Input Dependant System Call Subsequence Birthmarks • “file id” and “process id” parameters are ignored • Large parameters are hashed with MD5
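A sketch of the parameter handling described above. The parameter kind labels ("fd", "pid") and the size threshold for what counts as a large parameter are assumptions, since the slide gives neither:

```python
import hashlib

VOLATILE_KINDS = {"fd", "pid"}  # hypothetical labels for file id and process id
LARGE_THRESHOLD = 32            # assumed cutoff in bytes; not given in the slides


def normalize_call(name, params):
    """Return (name, normalized parameters) for one system call.

    `params` is a list of (kind, value) pairs; `kind` is a hypothetical tag.
    """
    normalized = []
    for kind, value in params:
        if kind in VOLATILE_KINDS:
            continue  # changes from run to run, so it is dropped
        data = value if isinstance(value, bytes) else str(value).encode()
        if len(data) > LARGE_THRESHOLD:
            normalized.append(hashlib.md5(data).hexdigest())
        else:
            normalized.append(data.decode("utf-8", "replace"))
    return (name, tuple(normalized))


# The same write issued on different descriptors normalizes identically.
first = normalize_call("write", [("fd", 3), ("buf", b"x" * 100)])
second = normalize_call("write", [("fd", 7), ("buf", b"x" * 100)])
print(first == second)
```

Hashing keeps comparisons cheap while still distinguishing calls whose large buffers differ.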
Evaluation • SCSSB and IDSCSB: • Evaluated against some advanced obfuscation techniques and 15 large real-world applications • SandMark implements 39 bytecode obfuscators • x86 Linux executables • GCJ 4.1.2
Evaluation (Cont.) • Programs • bzip2.c, gzip.c and oggenc.c • Impact of Compiler Optimization Levels • Five optimization switches (-O0, -O1, -O2, -O3 and -Os) of GCC (e.g., bzip2-O0, bzip2-O3, etc.) • Impact of Different Compilers • GCC, TCC and Watcom (e.g., bzip2-gcc, bzip2-tcc)
SCSSB Experiment I (Cont.) • JLex and JFlex
SCSSB Experiment I (Cont.) • Containment scores • JLex • CO: 87.9% • DO: 85.2% • JFlex • CO: 96% • DO: 96%
SCSSB Experiment II (Gecko) • Gecko: Layout engine used in all Mozilla software and its derivatives
IDSCSB Experiment I (JLex and JFlex) • Containment scores between the original and obfuscated JLex are all 100% • Scores between JLex and obfuscated JFlex are below 46% • Scores between JLex/JFlex and other programs are no more than 7%
Discussion • Counterattacks • System call injection attack • System call reordering attack • Limitations • If the program does not involve any system calls… • Need unique system call behaviors • The detection result of our tool depends on the threshold a user defines
Conclusion • A novel type of birthmark • Resilient to obfuscation; discriminates code obfuscated by SandMark, a state-of-the-art obfuscator • The first birthmark that: • Detects software component theft • Scales to detect large-scale software theft