Linking Syntactic and Semantic Models of Java Source Code within a Program Transformation System V. Winter, J. Guerrero, A. James, C. Reinke
Outline • Introduction • Motivation: The need for static analysis • Why transformation systems are interesting in this setting • Creating a rule in PMD • Creating a rule in Sextant • GPS-Traverse • Overview • Example: Constructing a call-graph • Technical details of GPS-Traverse
Source-Code Analysis • Is heavily employed across the public and private sectors, including: • the top 5 commercial banks • 5 of the top 7 computer software companies • 3 of the top 5 commercial aerospace and defense industry leaders • the 3 largest armed services of the US • 3 of the leading 4 accounting firms • 2 of the top 3 insurance companies
Source-Code Analysis • It has been argued that source-code analysis can play an important role with respect to software assurance within an Agile development process • The FDA is recommending (and may eventually mandate) the use of static-analysis tools for the development of medical device software. • GrammaTech’s CodeSonar is a static-analysis tool that the FDA is currently using to investigate failures in recalled medical devices.
Static-Analysis Tools • Are frequently rule-based • Utilize a variety of software models (e.g., AST, call-graph, control-flow graph) • In an OO implementation, involve traversals of object structures using the visitor pattern • Make use of pattern recognition (e.g., matching) • May transform source code (e.g., inserting markers/annotations to control analysis) • Query software models • Aggregate information
Creating a rule in PMD: Avoid using while-loops without curly braces
Creating a rule in PMD • Step 1: Figure out what to look for. In this case we want to capture the convention that while-loops must use braces. • Construct a compilation unit containing an instance of the syntactic property you want to detect.
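A compilation unit consistent with the AST on the AST Generation slide might look like the sketch below. The names Example, bar, baz, and buz.doSomething come from that AST. The Buz class, the two field declarations, and the assignment that ends the loop are assumptions added only so that the unit compiles and terminates; they would contribute extra nodes that the slide's AST does not show.

```java
// Hypothetical helper type: the slides only show the name buz.doSomething.
class Buz {
    int calls = 0;

    void doSomething() {
        calls++;
        Example.baz = false; // assumption: stop the loop after one iteration
    }
}

class Example {
    // Assumed declarations so the unit compiles; not part of the slide's AST.
    static boolean baz = true;
    static Buz buz = new Buz();

    void bar() {
        while (baz)
            buz.doSomething(); // brace-less body: the instance the rule should flag
    }

    public static void main(String[] args) {
        new Example().bar();
        System.out.println("doSomething calls: " + buz.calls);
    }
}
```

The body of the while-loop is a bare statement rather than a block, which is exactly the property the rule has to detect.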
AST Generation • PMD uses JavaCC to generate an AST (Abstract Syntax Tree) corresponding to the source code.
    CompilationUnit
     TypeDeclaration
      ClassDeclaration:(package private)
       UnmodifiedClassDeclaration(Example)
        ClassBody
         ClassBodyDeclaration
          MethodDeclaration:(package private)
           ResultType
           MethodDeclarator(bar)
            FormalParameters
           Block
            BlockStatement
             Statement
              WhileStatement
               Expression
                PrimaryExpression
                 PrimaryPrefix
                  Name:baz
               Statement
                StatementExpression:null
                 PrimaryExpression
                  PrimaryPrefix
                   Name:buz.doSomething
                  PrimarySuffix
                   Arguments
Pattern Selection • Select and generalize the smallest portion of the AST (shown on the AST Generation slide) that contains the pattern in which you are interested. • Make sure you discriminate good patterns from bad patterns (e.g., blocks versus no blocks). • Consult the Java grammar as needed.
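The check that this pattern selection leads to can be sketched on a toy AST. The Node class and WhileBraceCheck below are hypothetical and are not the PMD API; they only assume, following the slide's AST, that a WhileStatement node has an Expression child followed by a Statement child, and that a braced body shows up as a Block directly under that Statement.

```java
import java.util.ArrayList;
import java.util.List;

// Toy AST node (hypothetical, not a PMD class): a label plus child nodes.
class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
        this.label = label;
        for (Node k : kids) children.add(k);
    }
}

class WhileBraceCheck {
    // Returns every WhileStatement whose body Statement does not start with a Block.
    static List<Node> violations(Node root) {
        List<Node> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    private static void collect(Node n, List<Node> found) {
        if (n.label.equals("WhileStatement") && !bodyIsBlock(n)) {
            found.add(n);
        }
        for (Node c : n.children) collect(c, found);
    }

    // Assumed child layout: index 0 is the Expression, index 1 is the body Statement.
    private static boolean bodyIsBlock(Node whileStmt) {
        if (whileStmt.children.size() < 2) return false;
        Node body = whileStmt.children.get(1);
        return !body.children.isEmpty()
                && body.children.get(0).label.equals("Block");
    }

    // Builds a small tree with one while-loop, with or without a braced body.
    static Node sample(boolean withBraces) {
        Node body = new Node("Statement",
                new Node(withBraces ? "Block" : "StatementExpression"));
        return new Node("CompilationUnit",
                new Node("WhileStatement", new Node("Expression"), body));
    }
}
```

For example, WhileBraceCheck.violations(WhileBraceCheck.sample(false)) reports the single brace-less loop, while the braced variant yields no violations.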
Add Rule to Ruleset • Add the newly created rule to the PMD ruleset
    <?xml version="1.0"?>
    <ruleset name="My custom rules"
        xmlns="http://pmd.sf.net/ruleset/1.0.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://pmd.sf.net/ruleset/1.0.0 http://pmd.sf.net/ruleset_xml_schema.xsd"
        xsi:noNamespaceSchemaLocation="http://pmd.sf.net/ruleset_xml_schema.xsd">
      <rule name="WhileLoopsMustUseBracesRule"
          message="Avoid using 'while' statements without curly braces"
          class="WhileLoopsMustUseBracesRule">
        <description>
          Avoid using 'while' statements without using curly braces
        </description>
        <priority>3</priority>
        <example>
    <![CDATA[
        public void doSomething() {
          while (true)
            x++;
        }
    ]]>
        </example>
      </rule>
    </ruleset>
Creating a rule in Sextant: Avoid using while-loops without curly braces
Create Basic Rule Pattern
    strategy WhileLoopsMustUseBracesRule:
      Statement[:] while( <Expression>_1 ) <Statement>_1 [:]
      →
      Statement[:] while( <Expression>_1 ) <Statement>_1 [:]
Add Specific Pattern Constraint
    strategy WhileLoopsMustUseBracesRule:
      Statement[:] while( <Expression>_1 ) <Statement>_1 [:]
      →
      Statement[:] while( <Expression>_1 ) <Statement>_1 [:]
      if { not(<Statement>_1 = Statement[:] <Block>_1 [:]) }
Add Metric/Action
    strategy WhileLoopsMustUseBracesRule:
      Statement[:] while( <Expression>_1 ) <Statement>_1 [:]
      →
      Statement[:] while( <Expression>_1 ) <Statement>_1 [:]
      if { not(<Statement>_1 = Statement[:] <Block>_1 [:])
           andalso sml.addViolation(<Statement>_1) }
Observations • Primitive operations in transformation systems include: • Parsing • Matching • Traversal • The software models that transformation systems typically operate on are terms – either concrete or abstract syntax trees. • This makes the foundational framework of transformation systems well suited to rule-based source-code analysis, especially for systems whose rules have syntax-based specifications.
Semantic Rules: Use equals() instead of == to compare objects
Java’s Integer Cache • Some rules require semantic analysis • The implementation of such rules requires the ability to query semantic models (i.e., software models other than an AST)
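The reason this rule needs more than syntax can be seen in a small sketch of the Integer cache. The class and method names below are illustrative only. The Java Language Specification guarantees that boxing an int in the range -128 to 127 yields the same object each time; outside that range the result of == depends on the JVM and its configuration, so the sketch prints that case without asserting a value.

```java
class IntegerCacheDemo {
    // Compares two boxed copies of the same int by reference.
    static boolean sameReference(int value) {
        Integer a = value; // autoboxing
        Integer b = value;
        return a == b;     // reference comparison, not value comparison
    }

    // Compares two boxed copies of the same int by value.
    static boolean sameValue(int value) {
        Integer a = value;
        Integer b = value;
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println("127 == 127:      " + sameReference(127));
        // Outside the guaranteed cache range the answer is JVM-dependent.
        System.out.println("128 == 128:      " + sameReference(128));
        System.out.println("128 equals 128:  " + sameValue(128));
    }
}
```

A checker cannot decide from the tokens alone whether the operands of == are objects or primitives; it has to know their types, which is the kind of question a semantic model answers.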
GPS-Traverse: Linking Syntactic and Semantic Models within a Transformation System
GPS-Traverse • GPS-Traverse • enables contextual information to be transparently tracked during transformation. • is a collection of transformations whose purpose is to associate terms with the contexts in which they are defined • This association is based on: • Structural properties • Nested classes • Local classes • Anonymous classes • Frame variables currently in scope • Generic variables currently in scope
In Summary… • GPS-Traverse: term → context • In turn, a tuple of the form (term, context) provides the basis for a variety of semantic analysis functions • A particularly useful analysis function of this kind is called resolution
Resolution • Resolution is a semantic analysis function that operates on terms denoting references • The resolution function used by Java is highly complex and involves: • Static evaluation • Type analysis • Overloading, overriding, shadowing • Generic analysis • Local analysis • Visibility – public, protected, package private, private • Subtyping • Imports: single-type, on-demand, and static
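Two of the listed ingredients, overloading and shadowing, can be illustrated with a short sketch showing why a reference cannot be resolved from its name alone. The class and member names are made up for the example.

```java
class ResolutionDemo {
    static String name = "field";

    static String describe(Object o) { return "describe(Object)"; }
    static String describe(String s) { return "describe(String)"; }

    // Overloading: the target is chosen from the static (declared) type of the argument.
    static String overloadByStaticType() {
        Object o = "text";  // static type Object, runtime type String
        return describe(o); // resolves to describe(Object)
    }

    // Shadowing: the local variable hides the field of the same name.
    static String shadowing() {
        String name = "local";
        return name;        // resolves to the local, not ResolutionDemo.name
    }

    public static void main(String[] args) {
        System.out.println(overloadByStaticType());
        System.out.println(shadowing());
    }
}
```

In both methods the reference (describe, name) is only meaningful together with its context: the declared type of the argument in one case and the variables in scope in the other.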
Uses of Resolution • Resolution is a prerequisite for a variety of software-based analysis and manipulation activities such as: • Bootstrapping semantic models • Software metrics • API usage analysis • Refactoring • Slicing • Migration – a well-formed complement of slicing • Join point recognition • Resolution-informed transformation is well suited to many of these activities • Finally, resolution-informed transformation can also play a key role in the construction of semantic models of software, such as the call graph of a software system
Technical Details: Bascinet, the TL System, and Sextant
Bascinet • A NetBeans-based IDE supporting the development of TL programs • Syntax-directed editors for TL, ML, and EBNF files • Code-folding for both TL and ML • Hyperlinks from MLton compiler output to ML source code • Integrated with third-party visualization tools such as Cytoscape, GraphViz, and TreeMap • Solves some key system-level problems: • Discrete concurrent (forgetful) application of a transformation to a file hierarchy: { transformation } x { file1, file2, … } • Continuous sequential (stateful) application of a transformation to a file hierarchy: state1 = transformation(state0, file1), state2 = transformation(state1, file2)
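The difference between the two application modes can be sketched in Java as a map versus a fold. This is only an analogy under assumed names (ApplicationModes, discrete, continuous); it says nothing about how Bascinet itself is implemented.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.stream.Collectors;

class ApplicationModes {
    // Discrete (forgetful): each file is transformed on its own, so the work
    // can run concurrently and no information flows from one file to the next.
    static <F, R> List<R> discrete(Function<F, R> transformation, List<F> files) {
        return files.parallelStream().map(transformation).collect(Collectors.toList());
    }

    // Continuous (stateful): the state produced for one file is the input
    // state for the next, so the files must be processed in sequence.
    static <S, F> S continuous(BiFunction<S, F, S> transformation, S state0, List<F> files) {
        S state = state0;
        for (F file : files) {
            state = transformation.apply(state, file);
        }
        return state;
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.java", "bb.java");
        System.out.println(discrete((String f) -> f.length(), files));
        System.out.println(continuous((Integer acc, String f) -> acc + f.length(), 0, files));
    }
}
```

An analysis that needs whole-program information, such as a call graph spanning several files, fits the stateful mode; a per-file style check fits the forgetful one.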
The TL System • Input: GLR Parser • Output: Abstract Prettyprinter • TL – A language for specifying higher-order transformation • First-order matching on concrete syntax trees • First-order and higher-order generic traversals • Standard combinators plus special-purpose combinators • Modular • Partially type-checked • ML – A functional programming language tightly integrated with TL • Computation is expressed in terms of modules written in TL and ML.
TL • The terms being manipulated are concrete syntax trees • The computational unit is the conditional rewrite rule: term_lhs → term_rhs if { condition } • Rules (also called strategies) can be bound to identifiers: r: term_lhs → term_rhs if { condition } • Strategies can be constructed by composing rules using a variety of combinators: r1 <+ r2 and r1 <; r2 • Strategies can be applied to terms using traversals and iterators: TDL myStrategy myTerm
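The idea of composing conditional rewrite rules can be sketched in Java. The slide does not define the combinators, so the sketch rests on assumptions: `<+` is read as left-biased choice (try r1, and only if it does not apply try r2), `<;` is read as left-to-right sequential composition, and a rule that does not apply is treated as leaving the term unchanged. The Rule interface and its method names are hypothetical, and integers stand in for terms.

```java
import java.util.Optional;

// Hypothetical model of a conditional rewrite rule: an empty result means
// the rule did not apply to the given term.
interface Rule<T> {
    Optional<T> tryApply(T term);

    // Assumed reading of r1 <+ r2: use r1 if it applies, otherwise try r2.
    default Rule<T> choice(Rule<T> other) {
        return term -> {
            Optional<T> first = tryApply(term);
            return first.isPresent() ? first : other.tryApply(term);
        };
    }

    // Assumed reading of r1 <; r2: apply r1, then apply r2 to the result;
    // a rule that does not apply leaves the term as it was.
    default Rule<T> seq(Rule<T> next) {
        return term -> {
            Optional<T> first = tryApply(term);
            Optional<T> second = next.tryApply(first.orElse(term));
            return second.isPresent() ? second : first;
        };
    }
}

class RuleDemo {
    // n → n / 2 if { n is even }
    static final Rule<Integer> HALVE =
            n -> n % 2 == 0 ? Optional.of(n / 2) : Optional.empty();

    // n → n + 1 (unconditional)
    static final Rule<Integer> INCREMENT = n -> Optional.of(n + 1);

    public static void main(String[] args) {
        System.out.println(HALVE.choice(INCREMENT).tryApply(4));
        System.out.println(HALVE.choice(INCREMENT).tryApply(3));
        System.out.println(HALVE.seq(INCREMENT).tryApply(4));
    }
}
```

A traversal such as TDL would then apply a composed strategy at the nodes of a term; that part is omitted here because the slide gives no details of the traversal order.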
    import_closed GPS.Locator
    module CyclomaticComplexity
      strategy initialize: ...
      strategy outputResults: ...
      strategy collectMetrics:
        TDL( GPS.Locator.enter <; ccAnalysis <; GPS.Locator.exit )
      strategy ccAnalysis: MethodCC <+ ConstructorCC
      strategy MethodCC: ...
      strategy ConstructorCC: ...
    end // module
GPS-Traverse • Transformationally maintains a semantic model which can be queried in a variety of ways: • getContextKey • getEnclosingContextKey • currentContextType • enclosingContextType • withinContextType • inMethod • isGeneric • isLocalGeneric • isVar
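One way to picture how enter and exit steps around an analysis could maintain a queryable context is a stack that is pushed when a traversal enters a class or method and popped when it leaves. The sketch below is a hypothetical Java analogy, not the GPS-Traverse implementation. It borrows three query names from the list above, and their behavior here is an assumption: getContextKey returns the innermost context, getEnclosingContextKey the one around it, and inMethod is true when the innermost context is a method.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

class ContextTracker {
    enum Kind { CLASS, METHOD }

    private static final class Context {
        final Kind kind;
        final String key;

        Context(Kind kind, String key) {
            this.kind = kind;
            this.key = key;
        }
    }

    // Innermost context is at the head of the deque.
    private final Deque<Context> stack = new ArrayDeque<>();

    // Called when the traversal enters a class or method declaration.
    void enter(Kind kind, String name) {
        String prefix = stack.isEmpty() ? "" : stack.peek().key + ".";
        stack.push(new Context(kind, prefix + name));
    }

    // Called when the traversal leaves the declaration again.
    void exit() {
        stack.pop();
    }

    String getContextKey() {
        return stack.isEmpty() ? null : stack.peek().key;
    }

    String getEnclosingContextKey() {
        Iterator<Context> it = stack.iterator();
        if (!it.hasNext()) return null;
        it.next(); // skip the innermost context
        return it.hasNext() ? it.next().key : null;
    }

    boolean inMethod() {
        return !stack.isEmpty() && stack.peek().kind == Kind.METHOD;
    }
}
```

After entering class Example and then method bar, getContextKey() returns "Example.bar" and getEnclosingContextKey() returns "Example"; a rule applied between enter and exit can therefore ask where it is without walking the tree itself.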
    strategy CallGraph:
      <SelectorOptExpression>_methodCall
      →
      <SelectorOptExpression>_methodCall
      if { isMethodCall <SelectorOptExpression>_methodCall
           andalso sml.GPS_inMethod()
           andalso <key>_methodContext = sml.GPS_getContextKey() // semantic query
           andalso <key>_calledMethod = sml.resolve( <key>_methodContext, <SelectorOptExpression>_methodCall )
           andalso sml.outputPP( <key>_methodContext )
           andalso sml.output(" calls ")
           andalso sml.outputPP( <key>_calledMethod )
         }

    strategy isMethodCall:
      // basic call
      SelectorOptExpression[:] <TypeArgsOpt>_1 <Id>_1 <Arguments>_1 [:]
      →
      SelectorOptExpression[:] <TypeArgsOpt>_1 <Id>_1 <Arguments>_1 [:]
      <+
      // embedded call
      ...
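The shape of the CallGraph strategy (find a call, ask which method encloses it, resolve the callee, record "caller calls callee") can be mirrored in a small Java sketch. Everything here is hypothetical: the toy Node type, the node kinds "Method" and "Call", and above all the resolver, which is reduced to a lookup table from simple names to keys. Real Java resolution is far more involved than a table lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class CallGraphSketch {
    // Toy syntax tree node: kind is "Class", "Method", or "Call" in this sketch.
    static final class Node {
        final String kind;
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String kind, String name, Node... kids) {
            this.kind = kind;
            this.name = name;
            for (Node k : kids) children.add(k);
        }
    }

    // Returns caller -> callees. The resolver stands in for the resolution function.
    static Map<String, Set<String>> build(Node root, Map<String, String> resolver) {
        Map<String, Set<String>> graph = new TreeMap<>();
        walk(root, null, resolver, graph);
        return graph;
    }

    private static void walk(Node n, String currentMethod,
                             Map<String, String> resolver,
                             Map<String, Set<String>> graph) {
        if (n.kind.equals("Method")) {
            currentMethod = n.name; // context for everything below this node
        }
        if (n.kind.equals("Call") && currentMethod != null) {
            // Names missing from the table are recorded unresolved, as written.
            String callee = resolver.getOrDefault(n.name, n.name);
            graph.computeIfAbsent(currentMethod, k -> new TreeSet<>()).add(callee);
        }
        for (Node c : n.children) {
            walk(c, currentMethod, resolver, graph);
        }
    }
}
```

Calls that occur outside any method are skipped, which corresponds to the sml.GPS_inMethod() guard in the strategy.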
The End Questions?