Improving Static Analysis Performance Using Rule-Filtering Technique. D. Chen, R. Huang, et al. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, P.R. China. chendeng8899@hust.edu.cn
Abstract • Static analysis is an efficient approach to software assurance. It has been reported that static analysis is most effective when performed interactively throughout the software development process, which places high demands on performance. This paper concentrates on rule-based static analysis tools and proposes an optimized rule-checking algorithm to improve their performance. Our technique filters rules according to their characteristic objects before checking them against a specific source file. It is based on the observation that a source file typically contains violations of only a small subset of the rules rather than all of them.
To investigate the feasibility and effectiveness of our technique, we implemented it in PMD, an open source static analysis tool, and used the result to perform an evaluation. The evaluation shows that our approach achieves an average performance improvement of 28.7% compared with the original PMD. Additionally, our technique incurs only trivial runtime overhead.
Introduction • Static analysis is one of the most important techniques for software assurance. Integrating bug-finding tools into the development process is an important direction for future research, and it requires the tools to run fast.
Introduction—Existing techniques to improve performance of static analysis tools • To address the performance issue, many techniques have been proposed, such as incremental analysis and distributed technology. However, they have the following drawbacks: • -Incremental analysis: in the worst case it may require a complete reanalysis; • -Distributed technology: building a robust distributed system is a complex task; • -Other techniques: they may reduce the precision of static analysis, which is a key quality measure for bug-finding tools.
Introduction—Working principle of rule-based static analysis tools • In this paper, we aim to improve the performance of rule-based static analysis tools. A rule-based static analysis tool detects vulnerabilities by matching vulnerability rules lexically or syntactically against program source code or other artifacts. Its architecture is illustrated in the figure on the right.
The rule-checking algorithm used by the rule-checking engine is a decisive factor in a tool's performance, especially when the vulnerability rule library is large. We improve the performance of rule-based static analysis tools by providing an optimized rule-checking algorithm.
Introduction—Optimized rule-checking algorithm • To date, most rule-checking algorithms are based on the visitor pattern: each rule is checked against the whole AST using a depth-first or breadth-first tree traversal. The problem is that a large number of unnecessary AST nodes (nodes that cannot contain violations of the rule) may be visited, which greatly increases the time overhead.
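To make the cost concrete, here is a minimal Python sketch of visitor-style checking. The `Node` and `Rule` classes, the node kinds and the rule names are hypothetical and do not reflect PMD's actual API; the point is only that every rule walks every node, so the number of visits grows as (number of rules) × (number of AST nodes).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node: a kind label plus child nodes."""
    kind: str
    children: list = field(default_factory=list)


class Rule:
    """Hypothetical rule that flags every node whose kind equals `target`."""

    def __init__(self, name, target):
        self.name = name
        self.target = target
        self.visited = 0        # how many nodes this rule inspected
        self.violations = []

    def visit(self, node):
        # Depth-first traversal: the rule inspects every node,
        # whether or not the node could ever violate it.
        self.visited += 1
        if node.kind == self.target:
            self.violations.append(node)
        for child in node.children:
            self.visit(child)


def check_all(rules, ast):
    """Run every rule over the whole tree and return the total visit count."""
    for rule in rules:
        rule.visit(ast)
    return sum(rule.visited for rule in rules)


# A four-node tree checked by two rules costs 2 * 4 = 8 node visits.
ast = Node("CompilationUnit", [
    Node("ClassDeclaration", [
        Node("MethodDeclaration", [Node("MethodCall")]),
    ]),
])
rules = [Rule("NoThreadCreation", "ThreadCreation"), Rule("NoCalls", "MethodCall")]
total_visits = check_all(rules, ast)
```

In this toy run the first rule finds nothing, yet it still pays for a full traversal. That wasted traversal is what filtering rules up front is meant to avoid.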
We improve the rule-checking algorithm in the following manner: given a source file f to be validated, our technique first filters out unnecessary rules according to their characteristic objects, and only then checks the remaining vulnerability rules against the file. This should improve performance because a source file typically contains violations of only a small subset of the rules in the library rather than all of them. Additionally, the filtering itself incurs little runtime overhead.
Introduction—Our contribution • A novel rule-filtering technique that can improve the performance of rule-based static analysis tools. • A formal language for describing the characteristic objects of rules. • A prototype tool, EPMD, that implements our technique. • An empirical evaluation that shows the feasibility and effectiveness of our technique.
Our Technique—Characteristic Objects • Our technique filters vulnerability rules in terms of their Characteristic Objects. • We call all the classes that a rule relies on the Characteristic Objects of that rule. Take the rule shown on the right as an example: it is described using XPath expressions, and its characteristic objects are the classes MessageDrivenBean and SessionBean.
Our Technique—CObject Expression • Note that the characteristic objects MessageDrivenBean and SessionBean are in a disjunction relationship: the rule is relevant when either class is present. To describe characteristic objects as well as the relationships among them, we propose a prefix notation called the CObject expression.
A CObject expression consists of the full package names of characteristic objects (full package names make the expressions easier to evaluate), commas, parentheses and operator symbols. CObject expressions are defined recursively as follows. Given the full package name t of a characteristic object and two CObject expressions R and S, the following are CObject expressions:
-exist(t): the existence operation, which determines whether the characteristic object t has been imported into a source file. • -neg(R): the negation operation, which reverses the Boolean value of its operand. • -and(R, S): the conjunction operation, which performs a logical conjunction on two Boolean operands. • -or(R, S): the disjunction operation, which performs a logical disjunction on two Boolean operands.
The CObject expression of the rule illustrated above is as follows: • or(exist(javax.ejb.SessionBean), exist(javax.ejb.MessageDrivenBean))
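The recursive definition maps directly onto a small recursive-descent parser. The Python sketch below is a hand-written illustration, not the ANTLR-based implementation used in EPMD; the function names and the nested-tuple tree representation are assumptions made for this example.

```python
import re

# A token is either a name (operator or dotted class name) or a punctuation mark.
TOKEN = re.compile(r"\s*([A-Za-z_][\w.]*|[(),])")
ARITY = {"exist": 1, "neg": 1, "and": 2, "or": 2}
PUNCTUATION = {"(", ")", ","}


def tokenize(text):
    """Split a CObject expression into names and punctuation tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def _expect(tokens, i, symbol):
    """Check that token i is `symbol` and return the index of the next token."""
    if i >= len(tokens) or tokens[i] != symbol:
        raise ValueError(f"expected {symbol!r} at token {i}")
    return i + 1


def _parse_expr(tokens, i):
    """Parse one expression starting at token i; return (tree, next index)."""
    if i >= len(tokens) or tokens[i] not in ARITY:
        raise ValueError(f"expected an operator at token {i}")
    op = tokens[i]
    i = _expect(tokens, i + 1, "(")
    if op == "exist":
        # exist takes a class name, not a nested expression.
        if i >= len(tokens) or tokens[i] in PUNCTUATION:
            raise ValueError(f"expected a class name at token {i}")
        args = [tokens[i]]
        i += 1
    else:
        args = []
        for k in range(ARITY[op]):
            if k > 0:
                i = _expect(tokens, i, ",")
            subtree, i = _parse_expr(tokens, i)
            args.append(subtree)
    i = _expect(tokens, i, ")")
    return (op, *args), i


def parse(text):
    """Parse a whole CObject expression into a nested tuple tree."""
    tokens = tokenize(text)
    tree, end = _parse_expr(tokens, 0)
    if end != len(tokens):
        raise ValueError("unexpected tokens after the expression")
    return tree


tree = parse("or(exist(javax.ejb.SessionBean), exist(javax.ejb.MessageDrivenBean))")
```

Each tuple holds the operator followed by its arguments, so `tree[0]` is `"or"` and the two remaining elements are the `exist` subtrees. A malformed expression such as `or(exist(a.B))` raises `ValueError` because the second operand is missing.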
Our Technique—Evaluating CObject Expression • Before checking vulnerability rules against a source file f, our technique first filters the rules according to the evaluation results of their CObject expressions. For each rule r, if its expression evaluates to true, we check r against f; otherwise, we discard the rule for this file. In this subsection, we discuss how CObject expressions are evaluated.
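The filtering step itself can be sketched in a few lines of Python. This is an illustration under stated assumptions: the `Rule` class, the rule names and the string-matching checks are hypothetical, and each CObject expression is modelled as a Python predicate over the set of imported names instead of being evaluated from its textual form.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

Predicate = Callable[[FrozenSet[str]], bool]


# Combinators mirroring the four CObject operators.
def exist(name: str) -> Predicate:
    return lambda imports: name in imports


def neg(r: Predicate) -> Predicate:
    return lambda imports: not r(imports)


def and_(r: Predicate, s: Predicate) -> Predicate:
    return lambda imports: r(imports) and s(imports)


def or_(r: Predicate, s: Predicate) -> Predicate:
    return lambda imports: r(imports) or s(imports)


@dataclass
class Rule:
    """Hypothetical rule: a name, a filter predicate and a (costly) check."""
    name: str
    applies: Predicate
    check: Callable[[str], List[str]]


def analyze(rules, imports, source):
    """Check only the rules whose CObject predicate holds for this file."""
    imports = frozenset(imports)
    checked, reports = [], []
    for rule in rules:
        if not rule.applies(imports):
            continue                      # rule discarded for this file
        checked.append(rule.name)
        reports.extend(rule.check(source))
    return checked, reports


ejb_rule = Rule(
    "EjbRule",
    or_(exist("javax.ejb.SessionBean"), exist("javax.ejb.MessageDrivenBean")),
    lambda src: ["EjbRule: thread created"] if "new Thread(" in src else [],
)
list_rule = Rule(
    "ListRule",
    exist("java.util.List"),
    lambda src: ["ListRule: List used"] if "List" in src else [],
)

source = "import java.util.List;\nclass A { List xs; void run() { new Thread().start(); } }"
checked, reports = analyze([ejb_rule, list_rule], ["java.util.List"], source)
```

Here `EjbRule` is never run because the file imports neither EJB class, so only `ListRule` appears in `checked`.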
Since a CObject expression is written in prefix notation, it can be evaluated using the traditional stack-based approach. In detail, given a CObject expression e, we scan e from right to left. For each token η encountered, we perform one of the following actions according to its type: • -If η is an operand, we push η onto a stack T. • -If η is an operator, we pop its operands from the top of T, apply the corresponding operation, and push the result back onto T. • Once all tokens have been processed, a single Boolean value is left in T, which is the evaluation result of e.
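The right-to-left scan described above can be sketched as follows. This is a minimal Python illustration, not EPMD's code: it assumes a well-formed expression, drops parentheses and commas (every operator has a fixed arity, so they are not needed for evaluation), and takes the imported names as a plain list.

```python
import re

ARITY = {"exist": 1, "neg": 1, "and": 2, "or": 2}


def evaluate(expression, imports):
    """Evaluate a well-formed CObject expression against a list of imported names."""
    # Keep only operator names and dotted class names; punctuation is dropped.
    tokens = re.findall(r"[A-Za-z_][\w.]*", expression)
    stack = []
    for token in reversed(tokens):            # scan from right to left
        if token not in ARITY:
            stack.append(token)               # operand: push the class name
            continue
        # Operator: pop as many operands as its arity requires.
        args = [stack.pop() for _ in range(ARITY[token])]
        if token == "exist":
            result = args[0] in imports       # sequential membership test on a list
        elif token == "neg":
            result = not args[0]
        elif token == "and":
            result = args[0] and args[1]
        else:                                 # "or"
            result = args[0] or args[1]
        stack.append(result)
    if len(stack) != 1 or not isinstance(stack[0], bool):
        raise ValueError("malformed CObject expression")
    return stack[0]


expr = "or(exist(javax.ejb.SessionBean), exist(javax.ejb.MessageDrivenBean))"
without_ejb = evaluate(expr, ["java.util.List"])
with_ejb = evaluate(expr, ["java.util.List", "javax.ejb.SessionBean"])
```

With no EJB import the stack ends up holding `False` and the rule would be pruned; once `javax.ejb.SessionBean` is imported, the same expression evaluates to `True`.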
The exist operation is performed using linear search. In detail, let λ be the full package name of a characteristic object and let I be the set of import statements at the beginning of a source file f. To determine whether λ has been imported into f (that is, to evaluate exist(λ) on f), we search for λ in I sequentially.
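A minimal Python sketch of this lookup is shown below. The regular expression and both function names are assumptions made for illustration; the sketch only recognizes single-type imports such as `import a.b.C;` and deliberately ignores wildcard and static imports, whose treatment is not described here.

```python
import re

# Matches single-type imports only, e.g. "import javax.ejb.SessionBean;".
IMPORT = re.compile(r"^\s*import\s+([\w.]+)\s*;", re.MULTILINE)


def collect_imports(java_source):
    """Return the imported full package names in order of appearance."""
    return IMPORT.findall(java_source)


def exist(name, imports):
    """Linear search for `name` in the list of imports."""
    for imported in imports:
        if imported == name:
            return True
    return False


java_source = """
package demo;

import java.util.List;
import javax.ejb.SessionBean;

public class Cart implements SessionBean { }
"""
imports = collect_imports(java_source)
```

For this file `exist("javax.ejb.SessionBean", imports)` holds while `exist("javax.ejb.MessageDrivenBean", imports)` does not. Since a file usually has few import statements, the sequential scan stays cheap.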
Our Technique—An evaluation example • As an example, we evaluate the CObject expression or(exist(O1), exist(O2)) on the input program ρ shown above, where O1 and O2 denote the classes javax.ejb.SessionBean and javax.ejb.MessageDrivenBean respectively.
The evaluation process is illustrated on the right, where T and F denote Boolean true and false respectively. From the table, we find that the evaluation result F is consistent with our manual inspection, so the corresponding rule should be pruned away.
Evaluation • To evaluate our technique, we implemented it in a prototype tool called EPMD, which was developed from the open source static analysis tool PMD, and measured the performance improvement it achieves.
Evaluation—EPMD • We implemented our technique in the open source static analysis tool PMD and called the extended version EPMD. The primary modifications that we made to PMD are as follows: • -Implementation of a module that evaluates CObject expressions. • -Addition of CObject expressions to the vulnerability rules of PMD.
To evaluate CObject expressions, we built a lexer and a parser with the help of ANTLR to recognize the tokens and validate the syntax of CObject expressions respectively.
Evaluation—Subjects • We used six large-scale open source applications as subjects in our evaluation; they are listed in the table on the right.
Evaluation—Performance evaluation with EPMD • To investigate the effectiveness of our technique, we analyzed the subject applications using EPMD with rule sets RS and RS* respectively. Rule set RS consists of 45 rules excerpted from PMD. RS* contains the same rules as RS, except that each rule has an additional property holding its CObject expression. Accordingly, rules are filtered when EPMD runs with rule set RS* but not when it runs with RS.
We ran EPMD five times for each combination of subject and rule set. We then compared the average execution time of EPMD with rule sets RS and RS*. • Experiment setup: an Intel(R) Core(TM) i3-2100 3.1GHz machine with 3GB of memory running Windows XP.
From the experimental results, we find that the execution time with rule set RS* is less than that with RS for every application. The smallest reduction is 15.6%, on Eclipse, and the largest is 45.3%, on PMD. The average time reduction is 28.7%. A possible side effect of our technique is an increase in false positives and false negatives. To check for this, we compared the bug reports generated in each group and found that the bugs reported by EPMD with rule sets RS and RS* were the same (in both number and content).
Conclusions and Future Work • In this paper, we proposed a novel rule-filtering algorithm based on characteristic objects. To describe characteristic objects, we designed a prefix notation called the CObject expression. By evaluating CObject expressions using a stack-based approach, our technique determines whether a rule should be checked against a specific source file.
We also presented EPMD, a prototype tool developed from PMD that implements our technique. In the evaluation, we ran five groups of tests using EPMD on six real-world applications. The experimental results strongly suggest that our technique is effective in improving static analysis performance.
Although we discussed our technique in terms of the Java language, it is applicable to programs written in most mainstream programming languages, such as C/C++ and C#. Note that not all languages have import statements as Java does; C/C++ uses header files instead. On the other hand, the effectiveness of our technique may vary across languages. For instance, the limitations (1) and (2) discussed in Section III do not exist in C/C++ programs, which makes our technique more useful for them. Applying our technique to other programming languages and investigating it further are left to future work.