Plagiarism Detection for Multithreaded Software Based on Thread-Aware Software Birthmarks

Detect software plagiarism in multithreaded programs using dynamic birthmarks optimized for thread scheduling nuances. Explore a novel approach for effective plagiarism detection.

Presentation Transcript


  1. Plagiarism Detection for Multithreaded Software Based on Thread-Aware Software Birthmarks Zhenzhou Tian zztian@stu.xjtu.edu.cn MOE Key Lab for Intelligent Networks and Network Security Xi’an Jiaotong University, China 2020/1/4

  2. Outline • Introduction • Thread-Aware Birthmark Methods • Evaluation • Unsolved Problems & Future Work

  3. Introduction • Software plagiarism is a serious threat to the healthy development of the software industry • Licenses are violated for commercial interests or unwittingly • Weak code protection awareness • Powerful automated code obfuscation tools • Software is distributed in binary form

  4. Introduction • A series of methods have been proposed for plagiarism detection • Software Watermarking • Inserts extra data • “a sufficiently determined attacker will eventually be able to defeat any watermark” • Static and Dynamic Software Birthmarks • Dynamic birthmarks are more resilient to semantics-preserving code obfuscations

  5. Introduction • A series of methods have been proposed for plagiarism detection • Software Watermarking • Static and Dynamic Software Birthmarks • The increasingly popular trend towards multithreaded programming brings new challenges to existing dynamic birthmark methods • Existing dynamic birthmarks remain optimized for sequential programs • They neglect the effect of thread scheduling • Two executions of a single program under the same input can be very different, rendering the existing methods ineffective

  6. Introduction • DKISB: dynamic key instruction sequence birthmark • SCSSB: system call short sequence birthmark

  7. Introduction • Contributions: • Two thread-aware dynamic birthmarks, TW-DKISB and TW-SCSSB, are proposed to detect software plagiarism • Operate directly on binary executables • Not limited to specific operating systems and languages • Resilient to various automated obfuscation techniques, including 29 different obfuscation techniques in SandMark

  8. Introduction • Contributions: • A prototype is implemented using the Pin instrumentation framework, and extensive experiments are conducted. • A suite of benchmarks is compiled for researchers to conduct experiments and present their findings: http://labs.xjtudlc.com/labs/benchmark.html

  9. Outline • Introduction • Thread-Aware Birthmark Methods • Evaluation • Unsolved Problems & Future Work

  10. Software Birthmark • A set of characteristics extracted from a program that reflects intrinsic properties of the program, and which can be used to identify the program uniquely. • Two types: Static and Dynamic software birthmarks • Dynamic birthmark defined by Myles

  11. Thread-Aware Dynamic Software Birthmark • Predetermining a thread schedule is very difficult • Instead of enforcing a thread schedule, try to shield executions from the influence of thread scheduling

  12. Thread-Aware Dynamic Software Birthmarks • Main Idea: Split then Aggregate • Execution order within each thread is relatively stable. • Project the trace on thread ids to obtain sub-traces (slices), and extract a slice birthmark from each • Aggregate all slice birthmarks. • Different traces of a program under the same input yield the same slices
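
The "split" step on slide 12 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the trace layout (a list of `(thread_id, event)` pairs) and the name `split_by_thread` are assumptions made for the example.

```python
from collections import defaultdict


def split_by_thread(trace):
    """Project an interleaved trace onto thread ids.

    `trace` is assumed to be a list of (thread_id, event) pairs in the
    order they were recorded. The relative order of events inside each
    thread is preserved; the interleaving between threads is discarded.
    """
    slices = defaultdict(list)
    for thread_id, event in trace:
        slices[thread_id].append(event)
    return dict(slices)


# Two hypothetical runs of the same program under the same input:
# the scheduler interleaves the threads differently in each run.
run_a = [(1, "open"), (2, "read"), (1, "write"), (2, "close")]
run_b = [(2, "read"), (1, "open"), (2, "close"), (1, "write")]

# The whole traces differ, but the per-thread slices are identical.
assert run_a != run_b
assert split_by_thread(run_a) == split_by_thread(run_b)
```

Comparing slices rather than whole traces is what removes the dependence on the scheduler in this sketch: only the order inside each thread is kept.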

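  13. Slice Birthmark & Program Birthmark • [Figure: k-grams are extracted from each slice to form slice birthmarks, which are combined into the program birthmark under the SA model (SAM) or the SS model (SSM)]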
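
A k-gram slice birthmark and one way of combining slice birthmarks can be sketched as below. The slide does not spell out the models, so two points here are assumptions: that a slice birthmark is a frequency map of k-grams, and that the SA model merges all slice birthmarks into a single map while the SS model keeps them as a collection. All function names are hypothetical.

```python
from collections import Counter


def kgrams(seq, k):
    """Return all contiguous windows of length k over seq, as tuples."""
    return [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def slice_birthmark(seq, k=3):
    """Slice birthmark: how often each k-gram occurs in one thread's slice."""
    return Counter(kgrams(seq, k))


def program_birthmark_sa(slices, k=3):
    """Assumed SA-style birthmark: sum the k-gram counts of all slices."""
    total = Counter()
    for seq in slices.values():
        total.update(slice_birthmark(seq, k))
    return total


def program_birthmark_ss(slices, k=3):
    """Assumed SS-style birthmark: keep one slice birthmark per thread."""
    return [slice_birthmark(seq, k) for seq in slices.values()]


slices = {1: ["open", "read", "read", "close"], 2: ["open", "read", "read"]}
sa = program_birthmark_sa(slices, k=2)
ss = program_birthmark_ss(slices, k=2)
assert sa[("open", "read")] == 2  # seen once in each thread
assert len(ss) == 2               # one slice birthmark per thread
```

Slide 34 lists the impact of K as not yet evaluated, so the default `k=3` here is arbitrary.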

  14. Thread-Aware Birthmark based Plagiarism Detection 5 main modules: • DAM: monitoring and recording • PP: constitute valid traces • BG: extract thread-aware birthmarks • BSC: calculate similarity scores • PD: determine detection result

  16. Dynamic Analysis Module • Monitors the execution of a program using Pin • DKISExtractor: performs dynamic taint analysis to identify and record key instructions • SysTracer: records each system call executed

  19. Pre-Processor & Birthmark Generator • Pre-Processor: filters out noise and extracts valid traces • Birthmark Generator: generates TW-DKISBs and TW-SCSSBs using the implemented SA and SS models

  22. Similarity Calculator & Plagiarism Decider • Similarity Calculator Four Similarity Metrics
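
The transcript does not name the four similarity metrics; only cosine similarity appears by name later, in the evaluation slides. The sketch below therefore assumes a commonly used set (cosine, Jaccard, Dice, containment) and assumes birthmarks are k-gram frequency maps. Treat the choice of metrics and the exact definitions as illustrative, not as the authors' formulas.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine of the angle between two k-gram frequency vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(a, b):
    """Shared k-grams divided by all distinct k-grams (counts ignored)."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0


def dice(a, b):
    """Twice the shared k-grams divided by the sum of both key-set sizes."""
    keys_a, keys_b = set(a), set(b)
    total = len(keys_a) + len(keys_b)
    return 2 * len(keys_a & keys_b) / total if total else 0.0


def containment(a, b):
    """Fraction of a's k-grams that also occur in b (asymmetric)."""
    keys_a, keys_b = set(a), set(b)
    return len(keys_a & keys_b) / len(keys_a) if keys_a else 0.0


p = Counter({("open", "read"): 2, ("read", "close"): 1})
q = Counter({("open", "read"): 2, ("read", "write"): 1})
for metric in (cosine, jaccard, dice, containment):
    print(metric.__name__, round(metric(p, q), 3))
```

All four functions return a value in [0, 1], which is what a threshold-based decision step needs.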

  23. Similarity Calculator & Plagiarism Decider • Similarity Calculator Bipartite matching
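
When a program birthmark is kept as a set of slice birthmarks, two programs can be compared by pairing up their slices so that the total pairwise similarity is maximal. The sketch below illustrates that idea with a brute-force search that is only practical for a handful of threads; a real tool would more likely use the Hungarian algorithm. Dividing by the larger slice count is an assumption of this example, and `matching_similarity` is a hypothetical name.

```python
from itertools import permutations


def matching_similarity(sim):
    """Score the best one-to-one pairing in a bipartite similarity matrix.

    sim[i][j] is the similarity between slice i of program P and
    slice j of program Q. Unmatched slices contribute zero because the
    total is divided by the larger of the two slice counts.
    """
    if not sim or not sim[0]:
        return 0.0
    rows, cols = len(sim), len(sim[0])
    if rows > cols:
        # Transpose so that every row can be assigned a distinct column.
        sim = [list(column) for column in zip(*sim)]
        rows, cols = cols, rows
    best = max(
        sum(sim[i][j] for i, j in enumerate(assignment))
        for assignment in permutations(range(cols), rows)
    )
    return best / cols


# Slice 0 of P pairs best with slice 0 of Q, and slice 1 with slice 1.
pairwise = [[1.0, 0.2],
            [0.3, 0.9]]
print(matching_similarity(pairwise))
```

Matching each slice to at most one counterpart keeps a single very common slice from inflating the score against every thread of the other program.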

  24. Similarity Calculator & Plagiarism Decider • Similarity Calculator • Decision Maker

  25. Outline • Introduction • Thread-Aware Birthmark Methods • Evaluation • Unsolved Problems & Future Work

  26. Evaluation • A high-quality birthmark should produce a low ratio of false classifications for a given threshold ɛ • Two properties to check: resilience and credibility

  27. Evaluating Resilience Property • Resilience to different compilers and optimization levels • [Captions: "Statistical differences for 20 versions of pigz"; "Similarity scores between binaries of pigz"]

  28. Evaluating Resilience Property • Resilience to specialized obfuscation tools • [Caption: "Cosine similarity between ConGzip and its 29 SandMark-obfuscated versions"]

  29. Evaluating Resilience Property • Resilience to specialized obfuscation tools • Allatori, DashO, Jshrink, ProGuard and RetroGuard • [Caption: "Resilience to Allatori-series obfuscation tools"]

  30. Evaluating Credibility Property • Similarity between independently implemented programs • 6 compression programs: lbzip, lrzip, pbzip2, pigz, plzip and rar • 5 audio players: cmus, mocp, mp3blaster, mplayer and sox • 10 web browsers: arora, chromium, dillo, dooble, epiphany, firefox, konqueror, luakit, midori and seaMonkey • [Caption: "Credibility evaluation of TW-SCSSBs using 10 web browsers"]

  31. Comparing with Traditional Birthmarks • Performance Evaluation Metric • By varying ɛ from 0 to 0.5, an F-Measure curve can be drawn • AUC: area under the F-Measure curve • [Table: Detection Criteria]
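
The detection criteria table did not survive in the transcript, so the sketch below assumes the three-way threshold rule that is conventional for birthmarks: a similarity of at least 1 − ɛ means plagiarism, at most ɛ means independent, and anything in between is inconclusive. The precision and recall definitions (plagiarism as the positive class, inconclusive pairs counted as missed) are also assumptions of this example, as are all names.

```python
def decide(similarity, eps):
    """Assumed three-way decision rule for a threshold eps in [0, 0.5]."""
    if similarity >= 1 - eps:
        return "plagiarism"
    if similarity <= eps:
        return "independent"
    return "inconclusive"


def f_measure(pairs, eps):
    """F-measure over (similarity, is_plagiarism) pairs at one eps."""
    tp = sum(1 for s, truth in pairs if truth and decide(s, eps) == "plagiarism")
    fp = sum(1 for s, truth in pairs if not truth and decide(s, eps) == "plagiarism")
    fn = sum(1 for s, truth in pairs if truth and decide(s, eps) != "plagiarism")
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(pairs, steps=50):
    """Trapezoidal area under the F-measure curve for eps from 0 to 0.5."""
    eps_values = [0.5 * i / steps for i in range(steps + 1)]
    scores = [f_measure(pairs, eps) for eps in eps_values]
    return sum(
        (scores[i] + scores[i + 1]) / 2 * (eps_values[i + 1] - eps_values[i])
        for i in range(steps)
    )


# Hypothetical labelled pairs: (similarity score, truly plagiarized?).
pairs = [(0.97, True), (0.92, True), (0.60, True), (0.08, False), (0.03, False)]
print(auc(pairs))
```

Under these assumptions the area is at most 0.5, the width of the ɛ range, so a birthmark whose F-measure stays high across thresholds scores close to that bound.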

  32. Comparing with Traditional Birthmarks • [Caption: "F-Measure curves for TW-SCSSB_SA, TW-SCSSB_SS, and SCSSB"]

  33. Outline • Introduction • Thread-Aware Birthmark Methods • Evaluation • Unsolved Problems & Future Work

  34. Unsolved Problems & Future Work • Problems • Partial and library plagiarism problems • Tool is preliminary • Impact of K is not evaluated • Future Work • Conduct experiments using other kinds of tools, such as shelling tools (UPX, ASProtect, etc.), and on real plagiarism cases • Improve our method to support partial plagiarism detection • Evaluate the effect of K on detection ability • Form a relatively mature tool

  35. Q&A

  36. Some Definitions
