Attack of the Clones: Detecting Cloned Applications on Android Markets. Jonathan Crussell (1,2), Clint Gibler (1), and Hao Chen (1). (1) University of California, Davis; (2) Sandia National Labs. Source: ESORICS 2012.
Outline • Introduction • Background • Threat Model • Clone Detection Approaches and Related Work • Methodology • Evaluation • Case Studies • Discussion • Conclusion
Introduction • Much of the user experience of Android relies on third-party apps. • Android has numerous marketplaces. • Protect users from malicious apps. • Protect developers from plagiarists.
Introduction • Developers can charge directly for their apps. • Alternatively, they can offer free apps that are ad-supported or contain in-game billing. • Some apps have two versions (paid and free). • Paid app -> cracked & released for free • Free app -> cloned with the ad libraries changed
Background • Android Markets • Android Application Structure
Threat Model-Definition of “Clone” • Clones occur when two applications have similar code but different ownership. • We ignore: third-party libraries; multiple versions of the same application if they have the same ownership.
Resistance to Evasion Techniques. • High level modifications • Method Restructurings • Control Flow Alterations • Addition/Deletion • Reordering
Non Goals • Find cloning in native code. • Determine which applications are the victims and which are clones.
Clone Detection Approaches-Feature Based • Feature based approaches analyze a program and extract a set of features. • Features range from the number or size of classes, methods, loops, or variables to the included libraries. • Drawback: low detection rate or high false positive rate.
Clone Detection Approaches-Structure Based • Structure based systems convert programs into a stream of tokens and then compare the streams between two programs. • More robust than feature based systems. • Examples: JPLAG, Winnowing and MOSS. • Comparing DEX byte code streams could be a quick and scalable way to find exactly or near-exactly copied code. • But byte code streams contain no higher level semantic knowledge about the code.
Clone Detection Approaches-PDG Based • Program Dependence Graph (PDG): each node is a statement; each edge shows a dependency between statements; there are two types of dependencies, data and control. • A data dependency edge between statements $s_1$ and $s_2$ exists if there is a variable in $s_2$ whose value depends on $s_1$. • A control dependency between two statements exists if the truth value of the first statement controls whether the second statement executes.
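To make the data dependency definition concrete, here is a minimal sketch that derives data dependency edges for straight-line code. The statement format (one defined variable plus the set of variables read) and the function name are assumptions made for illustration; they are not how the tools named in these slides represent programs, and control dependencies are not modelled.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy format: (variable the statement defines, variables it reads).
Statement = Tuple[str, Set[str]]

def data_dependency_edges(stmts: List[Statement]) -> Set[Tuple[int, int]]:
    """Return edges (i, j) meaning statement j reads a value defined by statement i."""
    last_def: Dict[str, int] = {}   # variable -> index of its most recent definition
    edges: Set[Tuple[int, int]] = set()
    for index, (defined, used) in enumerate(stmts):
        for var in used:
            if var in last_def:
                edges.add((last_def[var], index))
        last_def[defined] = index
    return edges

program = [
    ("a", set()),        # s0: a = 1
    ("b", set()),        # s1: b = 2
    ("c", {"a", "b"}),   # s2: c = a + b
    ("d", {"c"}),        # s3: d = c * 2
]
edges = data_dependency_edges(program)

# Swapping the two independent statements s0 and s1 yields the same edge set.
reordered = [program[1], program[0], program[2], program[3]]
reordered_edges = data_dependency_edges(reordered)
```

In this toy case, reordering the two independent assignments leaves the edge set unchanged, which hints at why a graph of data dependencies is harder to disturb than a token stream.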
Related Work • Androguard, DEXCD and DroidMOSS. • All these approaches are structure based or structure based approximations. • None of these tools use any semantic information to aid in detecting plagiarism.
Selecting Potentially Cloned Applications • The goal of an application plagiarist is to entice unwary users to choose her cloned application instead of the original. • Clones therefore tend to resemble the original in name and description.
Determining Application Similarity Based on Attributes • We use Solr to mimic the search engines on Android markets. • Attributes of the apps: name, package, market, owner, and description.
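The slides use Solr for this step. As a rough stand-in that needs no search server, the sketch below scores two apps by token overlap (Jaccard similarity) on selected attributes. The dictionary layout, the field choice, and the function names are assumptions for illustration only; this is not Solr's ranking.

```python
import re

def tokens(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def attribute_similarity(app_a, app_b, fields=("name", "description")):
    """Average Jaccard similarity of the token sets of each chosen field."""
    scores = []
    for field in fields:
        a = tokens(app_a.get(field, ""))
        b = tokens(app_b.get(field, ""))
        union = a | b
        scores.append(len(a & b) / len(union) if union else 0.0)
    return sum(scores) / len(scores)

# Hypothetical app records.
original = {"name": "Speedy Downloader", "description": "Download files fast"}
suspect = {"name": "Speedy Downloader Pro", "description": "Download files fast and free"}
unrelated = {"name": "Chess Master", "description": "Play board games"}

suspect_score = attribute_similarity(original, suspect)
unrelated_score = attribute_similarity(original, unrelated)
```

Pairs whose score exceeds some chosen cutoff would then be handed to the more expensive code comparison.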
Constructing PDGs • dex2jar: Convert both apps’ code from the DEX format to a JAR. • WALA: Construct PDGs for each method in every class of the applications. • Only data dependency edges: More robust against statement reordering, insertion and deletion.
Comparing PDGs-Excluding Common Libraries • Common libraries include the AdMob ad library, the Facebook API, etc. • We dumped both the package name and SHA-1 hash of known library files and recorded the most frequent SHA-1 hashes for each library.
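A minimal sketch of hash-based library exclusion, assuming an app is available as a mapping from file path to file bytes and that a set of known library SHA-1 hashes has already been collected. The paths, contents, and function names are hypothetical.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Hex-encoded SHA-1 digest of a file's contents."""
    return hashlib.sha1(data).hexdigest()

def strip_known_libraries(files: dict, known_library_hashes: set) -> dict:
    """Keep only the files whose SHA-1 hash is not a known library hash."""
    return {
        path: data
        for path, data in files.items()
        if sha1_hex(data) not in known_library_hashes
    }

# Hypothetical app contents: one app class and one ad library class.
app_files = {
    "com/example/Main.class": b"application code",
    "com/adlib/AdView.class": b"ad library code",
}
known_hashes = {sha1_hex(b"ad library code")}

remaining = strip_known_libraries(app_files, known_hashes)
```

Matching on content hashes rather than package names alone means a renamed package does not hide an unmodified library file.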
Lossless and Lossy Filters • Lossless filter: Removes PDGs from consideration that are smaller than a specified size (< 10 nodes). • Lossy filter: Calculate a frequency vector for each of the methods in the pair. • This vector counts how many times a specific node type occurs in the PDG. • Compare these two vectors using hypothesis testing (G-test).
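The two filters can be sketched as follows. The slides do not say which form of the G-test is used, so this sketch assumes a G-test of independence on the 2 x k table formed by the two frequency vectors. The node-type labels and the cutoff are hypothetical; 5.991 is the chi-square critical value at the 0.05 level for 2 degrees of freedom, which fits the three node types used here.

```python
import math
from collections import Counter

MIN_NODES = 10  # lossless filter: PDGs with fewer nodes are dropped

def passes_lossless(node_types):
    """node_types: list with one type label per PDG node."""
    return len(node_types) >= MIN_NODES

def g_statistic(types_a, types_b):
    """G statistic for the 2 x k table of node-type counts of two PDGs."""
    count_a, count_b = Counter(types_a), Counter(types_b)
    total_a, total_b = sum(count_a.values()), sum(count_b.values())
    total = total_a + total_b
    g = 0.0
    for node_type in set(count_a) | set(count_b):
        column = count_a[node_type] + count_b[node_type]
        for observed, row in ((count_a[node_type], total_a),
                              (count_b[node_type], total_b)):
            if observed > 0:  # a zero cell contributes nothing to the sum
                expected = row * column / total
                g += observed * math.log(observed / expected)
    return 2.0 * g

def passes_lossy(types_a, types_b, cutoff):
    """Keep the pair only if the two vectors are not significantly different."""
    return g_statistic(types_a, types_b) <= cutoff

pdg_a = ["assign"] * 6 + ["call"] * 4 + ["branch"] * 2
pdg_b = list(reversed(pdg_a))   # same node types, different order
pdg_c = ["call"] * 12           # very different type distribution

CUTOFF = 5.991  # assumed cutoff, see the lead-in
keep_ab = passes_lossy(pdg_a, pdg_b, CUTOFF)
keep_ac = passes_lossy(pdg_a, pdg_c, CUTOFF)
```

The filter is lossy because a pair that would have matched could, in principle, be discarded when its frequency vectors differ enough.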
Subgraph Isomorphism • Find a mapping between the nodes of one method’s PDG and the nodes of the other method’s PDG. • Subgraph isomorphism is NP-complete. • We use the VF2 algorithm.
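To show what such a mapping looks like, here is a naive backtracking search for an injective, edge-preserving mapping from a small directed graph into a larger one. It is not VF2, which prunes the search far more aggressively, and it checks only that edges of the small graph are preserved (a monomorphism). The graph encoding, with every node as a key mapped to its set of successors, is an assumption of this sketch.

```python
def find_subgraph_mapping(small, large):
    """Return a dict mapping nodes of `small` to nodes of `large`, or None."""
    small_nodes = list(small)
    large_nodes = list(large)

    def consistent(mapping, s_node, l_node):
        # A self-loop in the small graph must exist in the large graph too.
        if s_node in small[s_node] and l_node not in large[l_node]:
            return False
        # Every edge between s_node and an already mapped node must be preserved.
        for other_s, other_l in mapping.items():
            if other_s in small[s_node] and other_l not in large[l_node]:
                return False
            if s_node in small[other_s] and l_node not in large[other_l]:
                return False
        return True

    def extend(mapping, index):
        if index == len(small_nodes):
            return dict(mapping)
        s_node = small_nodes[index]
        used = set(mapping.values())
        for l_node in large_nodes:
            if l_node in used or not consistent(mapping, s_node, l_node):
                continue
            mapping[s_node] = l_node
            result = extend(mapping, index + 1)
            if result is not None:
                return result
            del mapping[s_node]  # backtrack
        return None

    return extend({}, 0)

chain = {1: {2}, 2: {3}, 3: set()}
edge = {"x": {"y"}, "y": set()}
triangle = {"a": {"b"}, "b": {"c"}, "c": {"a"}}

edge_mapping = find_subgraph_mapping(edge, chain)
triangle_mapping = find_subgraph_mapping(triangle, chain)
```

The single edge embeds into the chain, while the directed triangle does not because the chain has no cycle. The exponential worst case of this search is the reason the cheaper filters run first.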
Computing Similarity Scores • For each method $f$ (excluding the methods in known libraries) in application $A$, let $|f|$ be the number of nodes in this method’s PDG. Find the best match of this PDG in $B$’s PDGs and denote it as $m(f)$. • Similarity score: $\mathrm{sim}_A(B) = \frac{\sum_{f \in A} |m(f)|}{\sum_{f \in A} |f|}$
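A small worked example, assuming the score is the fraction of one application's PDG nodes that are covered by the best matches found in the other application (sum of matched nodes divided by sum of all nodes). The input format and the sample numbers are hypothetical.

```python
def similarity_score(methods):
    """methods: list of (pdg_size, best_match_size), one per non-library method."""
    total_nodes = sum(size for size, _ in methods)
    matched_nodes = sum(matched for _, matched in methods)
    return matched_nodes / total_nodes if total_nodes else 0.0

# Three hypothetical methods: fully matched, partly matched, unmatched.
methods_of_a = [(20, 20), (15, 9), (12, 0)]
score = similarity_score(methods_of_a)
print(f"similarity: {score:.1%}")
```

Because the denominator counts only one application's nodes, the score is not symmetric: scoring the pair in the other direction can give a different value.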
Evaluation • 75,000 free apps from 13 Android markets. • Randomly selected 9,400 pairs from the potential clones. • Hadoop: parallelize DNADroid. • HDFS: share data across a small cluster. • The average throughput of DNADroid on this small cluster is 0.71 application pairs per minute.
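For a sense of scale, the arithmetic below estimates how long the sampled pairs would take if every pair ran at the reported average throughput. That assumption is ours; the slides do not state the total running time.

```python
pairs = 9_400             # randomly selected potential clone pairs (from the slide)
pairs_per_minute = 0.71   # average throughput on the small cluster (from the slide)

minutes = pairs / pairs_per_minute
hours = minutes / 60
days = hours / 24
print(f"{minutes:.0f} minutes = {hours:.1f} hours = {days:.1f} days")
```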
“Benign” Cloning • DNADroid found 30 pairs in which both applications have a 100% similarity score. • Example: translation.
Changes to Advertising Libraries • We can see when an application has most likely been cloned for monetary gain. • Ex: XWind Downloader • For the 141 apps, we found that 91 (65%) of these pairs had different libraries, all of which included changes to advertising libraries.
Malware Added to an Application • “HippoSMS” is a malicious application that requires 10 permissions. • It shares the same package name as a Chinese video player that requires 11 permissions. • 6 permissions that the video player doesn’t use.
Two Variants of the Same Malware • Two malicious apps that are identified by VirusTotal as being variants of the “BaseBridge” malware family. • Both applications have been stripped of meaningful class and method names. • DNADroid found coverages of 35% and 28% between the two variants.
Use of Freeware Cracking Tool in the Wild • AntiLVL: decompiles an app with baksmali, inserts a new file (SmaliHook.class), and hides AntiLVL’s modifications from the app itself by returning the original file size, MD5, and signatures. • Targets: Android License Verification Library (LVL), Amazon Appstore DRM and Verizon DRM. • 189 of 310 applications contained SmaliHook.class. • 235 of 310 contained references to AntiLVL in their signature files. • Although only 8% of our total apps were acquired from Chinese markets, 88% of the apps with AntiLVL traces were from Chinese markets.
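A hedged sketch of how the two traces mentioned here could be checked. It assumes the app is available as a zip-style archive (for example the JAR produced by dex2jar) and that the signature files live under META-INF/; the function name and the sample archive are hypothetical, and the slides do not describe the authors' actual scan.

```python
import io
import zipfile

def antilvl_traces(archive_bytes: bytes) -> dict:
    """Report whether an archive shows either of the two AntiLVL traces."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        names = archive.namelist()
        has_hook = any(name.endswith("SmaliHook.class") for name in names)
        mentions_antilvl = any(
            b"antilvl" in archive.read(name).lower()
            for name in names
            if name.upper().startswith("META-INF/")
        )
    return {"smali_hook": has_hook, "signature_mention": mentions_antilvl}

def build_archive(entries: dict) -> bytes:
    """Build an in-memory zip archive from {path: bytes} for the example."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for path, data in entries.items():
            archive.writestr(path, data)
    return buffer.getvalue()

# Hypothetical archives: one with both traces, one clean.
cracked = build_archive({
    "lohan/SmaliHook.class": b"hook code",
    "META-INF/MANIFEST.MF": b"Created-By: AntiLVL",
})
clean = build_archive({
    "com/example/Main.class": b"app code",
    "META-INF/MANIFEST.MF": b"Created-By: example",
})

cracked_report = antilvl_traces(cracked)
clean_report = antilvl_traces(clean)
```

Checking both traces separately matters because, per the counts on this slide, some apps showed one trace without the other.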
False Positive • Since it is a serious allegation to claim an application is a clone, we design DNADroid to have a very low false positive rate.
False Negative • We assume that cloned applications often have attributes similar to the original; clones with dissimilar attributes are never selected for comparison. • There exist advanced program transformations that can evade PDG-based clone detection.
Comparison to Other Approaches • Androguard: missed 18%. • DEXCD had problems running on the pairs DNADroid identified. • DroidMOSS is not currently publicly available.
Performance • DNADroid is more expensive but results in fewer false positives and false negatives.
Conclusion • DNADroid is a tool for finding clones on a large scale. • We evaluated DNADroid on applications crawled from 13 Android markets: identified at least 141 apps that have been cloned; found an additional 310 apps that were cracked with AntiLVL. • We describe five case studies. • DNADroid has a very low false positive rate. • DNADroid is an effective tool.