ESEC/FSE 2018, CCF-A Presented by Yichi Zhang 2019-4-4
Experiment report or conference paper? • Idea: incremental, but pragmatic • Results outweigh implementation: a systematic comparison of existing tools • Output: take-aways for personal research
Focus on Taint Analysis
Motivation • “What tool is the optimal choice in which application context?” • Challenges: • 1. Incongruence in Sink/Source Labelling • 2. Incongruence in Output format • 3. Imprecise “Ground truth”
Figure 2: Imprecise “ground truth” in DroidBench and ICC-Bench • The benchmark descriptions leave open which statement is the source, which is the sink, and which nodes lie in between, and they are sometimes incorrect.
Main Contribution • 1. Android App Analysis Query Language (AQL): Questions • Figure 3: Get all flows in one .apk • Figure 4: Yes/no question: does a given flow exist?
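The two question kinds can be modeled as plain data. This is a minimal sketch under assumptions: the class names `FlowsIn` and `FlowExists` are hypothetical, and the rendered string only approximates the “Flows IN ... ?” form named in Figure 5 (the `App('...')` wrapper is assumed, not taken from the AQL grammar).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowsIn:
    """Hypothetical model of 'get all flows in one apk' (Figure 3)."""
    apk: str

    def render(self) -> str:
        # Approximation of the "Flows IN ... ?" form; App('...') is assumed.
        return f"Flows IN App('{self.apk}') ?"


@dataclass(frozen=True)
class FlowExists:
    """Hypothetical model of the yes/no question (Figure 4)."""
    apk: str
    source: str  # statement where sensitive data enters, e.g. getDeviceId()
    sink: str    # statement where it leaves the app, e.g. sendTextMessage(...)


questions = [
    FlowsIn("app.apk"),
    FlowExists("app.apk", "getDeviceId()", "sendTextMessage(...)"),
]
```

Frozen dataclasses make questions hashable and comparable by value, so identical questions asked against different tools can be matched up directly.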
Main Contribution • 1. Android App Analysis Query Language (AQL): Answers • Figure 5: Answer to “Flows IN ... ?”
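An answer can be viewed as a set of source-to-sink flows, and a yes/no question is then answered by checking membership in that set. A minimal sketch, assuming the hypothetical names `Flow` and `Answer`; the real AQL answer format carries more detail (methods, classes, apps) than this two-field flow.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Flow:
    source: str  # statement where sensitive data enters the app
    sink: str    # statement where it leaves the app


@dataclass
class Answer:
    # frozenset is immutable and hashable, so it is a safe dataclass default.
    flows: FrozenSet[Flow] = frozenset()

    def has_flow(self, source: str, sink: str) -> bool:
        # A yes/no question reduces to membership in the answer's flow set.
        return Flow(source, sink) in self.flows


answer = Answer(frozenset({Flow("getDeviceId()", "sendTextMessage(...)")}))
```

Here `answer.has_flow("getDeviceId()", "sendTextMessage(...)")` is true, while any other pair is false.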
Main Contribution • 2. AQL System (from a user perspective) • Input: AQL question • Output: AQL answer • Procedure: • 1. Configure a tool with an analysis target and runtime parameters • 2. Run the tool • 3. Convert the tool's output into an AQL answer • Figure 6: AQL system
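The three-step procedure can be sketched as a small pipeline. Everything here is hypothetical: `ToolConfig`, `fake_tool`, the parameter keys, and the one-`source -> sink`-per-line output format stand in for real tools, whose outputs differ and need one converter each.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Flow:
    source: str
    sink: str


@dataclass
class ToolConfig:
    # Step 1 input: analysis target plus runtime parameters (hypothetical keys).
    apk: str
    params: Dict[str, str] = field(default_factory=dict)


# A tool is modeled as a function from its configuration to raw text output.
Tool = Callable[[ToolConfig], str]


def to_answer(raw: str) -> FrozenSet[Flow]:
    """Step 3: convert raw output into flows.

    Assumes a made-up format with one 'source -> sink' pair per line;
    lines without an arrow are ignored.
    """
    flows = set()
    for line in raw.splitlines():
        if "->" in line:
            src, snk = (part.strip() for part in line.split("->", 1))
            flows.add(Flow(src, snk))
    return frozenset(flows)


def answer_question(tool: Tool, apk: str, params: Dict[str, str]) -> FrozenSet[Flow]:
    config = ToolConfig(apk, params)  # 1. configure
    raw = tool(config)                # 2. run
    return to_answer(raw)             # 3. convert


def fake_tool(config: ToolConfig) -> str:
    # Stand-in for a real analysis tool: reports one leak for any input.
    return f"Analyzing {config.apk}\ngetDeviceId() -> sendTextMessage(...)\n"
```

For example, `answer_question(fake_tool, "app.apk", {"timeout": "60"})` yields a single `Flow("getDeviceId()", "sendTextMessage(...)")`. Keeping conversion separate from execution mirrors the slide: only step 3 has to know a tool's output format.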
Main Contribution • 3. Benchmark Refinement and Execution Wizard (BREW) (from a user perspective) • Input: .apk file • Output: ground truth, i.e. the exact data leaks and their number • Procedure: • 1. Case identification • 2. Source/sink labeling (SuSi, machine-learning based) • 3. Flows are preselected automatically; the user deselects wrong ones manually • 4. Generate the ground truth as an AQL answer • 5. Compare an analysis tool's result with the ground truth
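Steps 3 to 5 can be sketched with plain sets. Assumptions: candidate flows are given as input, the source/sink labels (produced by SuSi in BREW) are modeled as simple string sets, and the function names are hypothetical rather than BREW's API.

```python
from typing import Dict, FrozenSet, Set, Tuple

Flow = Tuple[str, str]  # (source statement, sink statement)


def preselect(candidates: Set[Flow], sources: Set[str], sinks: Set[str]) -> Set[Flow]:
    """Step 3a: keep candidate flows whose endpoints carry source/sink labels."""
    return {(src, snk) for src, snk in candidates if src in sources and snk in sinks}


def ground_truth(preselected: Set[Flow], deselected: Set[Flow]) -> FrozenSet[Flow]:
    """Steps 3b and 4: drop the flows the user deselected manually."""
    return frozenset(preselected - deselected)


def compare(result: Set[Flow], truth: FrozenSet[Flow]) -> Dict[str, Set[Flow]]:
    """Step 5: split a tool's result into found, missed and spurious flows."""
    return {
        "found": set(truth & result),
        "missed": set(truth - result),
        "spurious": set(result - truth),
    }


candidates = {
    ("getDeviceId()", "sendTextMessage(...)"),
    ("getDeviceId()", "Log.i(...)"),
    ("toString()", "Log.i(...)"),
}
pre = preselect(candidates, {"getDeviceId()"}, {"sendTextMessage(...)", "Log.i(...)"})
truth = ground_truth(pre, {("getDeviceId()", "Log.i(...)")})
report = compare({("getDeviceId()", "Log.i(...)")}, truth)
```

In this toy run the tool misses the real leak and reports the deselected one, so `report` lists one missed and one spurious flow.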
Main Contribution • 4. Ground truths, e.g. 21 newly developed apps: 18 apps providing 18 positive and 6 negative benchmark cases, and 3 apps dedicated to the ICC/IAC features • Plus 22 precise positive benchmark cases for DialDroid, which encompasses 30 large real-world apps.
Analysis Tools in the study Figure 7: Tools involved in the Study
Result 1: Do Android app analysis tools keep their promises*? • Figure 8: Results for supported features • *Promises: supported features and accuracy
Result 1: Do Android app analysis tools keep their promises? • Figure 9(a): F-scores on different benchmark suites
Result 1: Do Android app analysis tools keep their promises? • Figure 9(b): F-scores on different benchmark suites
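The F-scores in Figure 9 combine precision and recall over benchmark cases. A minimal sketch, assuming the standard balanced F1 measure; the slides do not spell out the exact definition used in the study. The same value can also be written as 2·TP / (2·TP + FP + FN).

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        # No correct detections at all: define F1 as 0 instead of dividing by 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, 8 true positives, 2 false positives and 4 false negatives give precision 0.8, recall 8/12 and an F-score of 16/22 ≈ 0.727.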
Result 2: How do the tools compare to each other with respect to accuracy? • Figure 10: F-scores per feature on DroidBench 3.0 • On average, FlowDroid and Amandroid perform best.
Result 3: Which tools support large-scale analyses of real-world apps? • Figure 11: None of the tools successfully analyzes all 30 apps; DIDFail and FlowDroid perform best.
Result 3: Which tools support large-scale analyses of real-world apps? • Figure 12: Ability to analyze newer apps (API 26: Android 8.0; API 19: Android 4.4) • Failures on newer apps stem from the tools' dependencies on ApkTool (a decompiler) and ApkCombiner (used for the IAC feature).
Limitation • 1. The analysis tools were run with their default configurations. • Implication: before relying on these results, check whether a tool supports additional parameters that could improve its results. • 2. Bugs in the AQL system: because some tools' output formats are imprecise, the translation into AQL answers over-approximates. • Implication: overloaded methods (same name, different parameters) may be treated as the same method.
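Why an imprecise output format leads to over-approximation can be seen by comparing two ways of identifying a method. This is an illustrative sketch: `MethodRef`, both matching functions and the class `com.example.Leaker` are hypothetical, not taken from the AQL system.

```python
from typing import NamedTuple, Tuple


class MethodRef(NamedTuple):
    cls: str
    name: str
    params: Tuple[str, ...]  # parameter types; lost if a tool prints names only


def same_by_name(a: MethodRef, b: MethodRef) -> bool:
    # Over-approximation: ignores parameter types, so overloads collide.
    return a.cls == b.cls and a.name == b.name


def same_by_signature(a: MethodRef, b: MethodRef) -> bool:
    # Precise: class, name and parameter types must all match.
    return a == b


send_one = MethodRef("com.example.Leaker", "send", ("java.lang.String",))
send_two = MethodRef("com.example.Leaker", "send", ("java.lang.String", "int"))
```

With name-only matching, a flow reported in `send(String, int)` would also count as a flow in `send(String)`, which is the conflation described in the limitation.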
Discussion Ideas: 1. This toolchain makes it easier to run analyses with different analysis tools. 2. The precisely defined ground truths can be reused in further research. 3. BREW can be used to generate new ground truths and compare the performance of analysis tools, e.g. to check whether a tool still finds a flow correctly after the code has been altered. Bounced-off Ideas: ........
Benchmark case • Component: An App, or combination of Apps • Positive case: the flow is expected to be detected • Negative case: the flow is not expected to be detected • Success: True positive and true negative • Failure: False positive and false negative
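The success/failure definitions above map a case's expectation and a tool's detection onto the four confusion-matrix outcomes. A minimal sketch; the names `Outcome`, `classify` and `is_success` are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    TRUE_POSITIVE = "TP"
    TRUE_NEGATIVE = "TN"
    FALSE_POSITIVE = "FP"
    FALSE_NEGATIVE = "FN"


def classify(positive_case: bool, detected: bool) -> Outcome:
    """Positive case: the flow should be detected; negative case: it should not."""
    if positive_case:
        return Outcome.TRUE_POSITIVE if detected else Outcome.FALSE_NEGATIVE
    return Outcome.FALSE_POSITIVE if detected else Outcome.TRUE_NEGATIVE


def is_success(outcome: Outcome) -> bool:
    # Success means true positive or true negative; anything else is a failure.
    return outcome in (Outcome.TRUE_POSITIVE, Outcome.TRUE_NEGATIVE)
```

Counting these outcomes over all cases of a benchmark suite yields the TP, FP and FN totals from which per-suite F-scores are computed.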