Source Code Analysis Using BAT Reverse Engineering (Source Code Analysis)
What is Static Analysis?
• Mining source code for information.
• Using that information to present abstractions of, and answer questions about, software structure.

What can we get from source code analysis?
• The type of information is model dependent.
• In almost any language, we can find information about variable usage: who uses a variable, where, etc.
• In an OO environment, we can find which classes use other classes, which classes form the base of an inheritance structure, etc.
• We can also find potential dead code: blocks of code that can never be executed when the program runs.

BAT
• A tool that lets us perform static analysis on Java programs (class files).
• Builds an XML database of the entities and relationships in a system.
• Several tools can be used to query and visualize the data.

Entities
• ‘Entities’ are the individual items that live in the system, together with the attributes associated with them. Some examples:
  • Classes, along with information about their superclass, their scope, and where in the code they exist.
  • Methods/functions, with their return type, parameter list, etc.
  • Variables, with their types, whether or not they are static, etc.

Relationships
• ‘Relationships’ are interactions between the entities in the system. Relationships include:
  • Classes inheriting from one another.
  • Methods in one class calling the methods of another class, and methods within the same class calling one another.
  • One variable referencing another variable.

Creating BAT Databases
• BAT is really a library that can process JAR files.
• BATAnalyzer is a small app wrapped around BAT that returns a full XML database for later processing.
• Found at: BATROOT/analyzer/src
• To run:
    export PATH=/usr/remote/serg/jdk1.5.0_11/bin/:$PATH
    java -Xmx2G -cp /usr/remote/serg/bin/bat2toxml.jar:/usr/remote/serg/bin/batanalyzer.jar batanalyzer.Main <JAR> <OUTPUT>
• -Xmx2G: Java needs a lot of memory to process large projects.
• -cp: puts the BAT API on the classpath.
• batanalyzer.Main: the call to the analyzer.
• <JAR>: the project to analyze.
• <OUTPUT>: the XML output file.

Provided Tools to Deal with BAT
• bdef – a Bash wrapper around XSLT queries that returns entity information.
• bref – a Bash wrapper around XSLT queries that returns relationship information.
• dot – a visualization tool that takes the output of a query and displays it as a graph.
• On TUX, add the scripts to your path:
    export PATH=$PATH:/usr/remote/serg/bin/

bdef Syntax
• bdef retrieves information from the entity database based on a query and returns the results as an ASCII table.
    bdef xml_file entity_kind entity_name [attr=val]
• xml_file is the XML file containing the extracted database.
• entity_kind is the ‘type’ of entity to retrieve.
• entity_name is a pattern to match against entity names.
• attr=val are bindings to match against attributes of the entity.

Entity Kinds
• The bdef and bref commands recognize several entity ‘kinds’:
  • m is for Method
  • c is for Class
  • f is for Field
  • - matches any entity_kind

Entity Names
• An entity name can take several forms, all following regex patterns:
  • An explicit name (e.g., ‘myTempStringVar’)
  • A wildcard pattern (e.g., ‘myTemp.*’)
  • A complete wildcard, denoted by ‘.*’

Attribute=Value
• Attribute=Value settings further restrict a query based on a condition specified as a regex.
• Any field is searchable.
• The most common restriction is to limit the query to a specific file, or to filter out a file. E.g.,
    bdef file.xml - - filename=FileIDoLike.java
    bdef file.xml - - filename=[^(FileIDoNOTLike.java)]

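In standard regex syntax, a bracket expression such as [^(…)] is a negated character class that matches a single character, so it may not exclude a whole filename; how bdef's own matcher treats it is not stated here. A cautious alternative is to post-filter the colon-separated output with standard tools. The sketch below assumes the 14-field method layout and uses made-up sample records (the Car.java record and the helper names are hypothetical).

```shell
# Sample records in the colon-separated method layout (illustrative data only).
sample_records() {
cat <<'EOF'
getWidth:World:World.java:public:false:false:false:false:false:false:false:false:int:()
getHeight:World:World.java:public:false:false:false:false:false:false:false:false:int:()
move:Car:Car.java:public:false:false:false:false:false:false:false:false::()
EOF
}

# exclude_file FILE: drop every record whose filename column (field 3) equals FILE.
exclude_file() {
    awk -F ':' -v skip="$1" '$3 != skip'
}

sample_records | exclude_file "World.java"
```

With the real tool, the same filter would sit at the end of a pipeline, for example bdef file.xml m ".*" | exclude_file "FileIDoNOTLike.java", assuming bdef emits the colon-separated form when piped.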
Fields
• Class: name, filename, scope, deprecated, final, abstract
• Method: name, class, filename, scope, static, deprecated, final, abstract, varargs, bridge, native, synchronized, return, parameters
• Field: name, class, filename, type, scope, static, deprecated, final, transient, volatile, enum

Example Query
• Assume that we want to find all the methods in a specific file (in this case, World.java) whose names start with ‘get’. Our query would look like the following:
    bdef sim.xml m "get.*" filename="World\.java"
• World.java is part of a discrete event simulator and contains information about the simulation environment.

Example Results (bdef)
    bdef sim.xml m "get.*" filename="World\.java"

    getWorldArray:World:World.java:public:false:false:false:false:false:false:false:false:
    getWorldString:World:World.java:public:false:false:false:false:false:false:false:false:
    getWorldString:World:World.java:public:false:false:false:false:false:false:false:false:
    getWorldMaskString:World:World.java:public:false:false:false:false:false:false:false:false:
    getEmpty:World:World.java:public:false:false:false:false:false:false:false:false:
    getWidth:World:World.java:public:false:false:false:false:false:false:false:false:
    getHeight:World:World.java:public:false:false:false:false:false:false:false:false:

Results Explained
• The bdef query returned a set of colon-separated records. The columns mean the following:
  • name: the name of the method
  • class: the class the method belongs to
  • filename: the file containing this method
  • scope: the scope of the method
  • static: whether the method is static
  • deprecated: whether the method is deprecated
  • final: whether the method is final
  • abstract: whether the method is abstract
  • varargs: whether the method uses variable arguments
  • bridge: whether the method is a bridge
  • native: whether the method is native
  • synchronized: whether the method is synchronized
  • return: the method’s return type
  • parameters: the types of the parameters accepted

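To make the column order concrete, the following sketch pairs each value of a record with its column name. The helper name label_fields is hypothetical, and the record is sample data written in the 14-field layout listed above rather than captured tool output.

```shell
# label_fields: read colon-separated method records on stdin and print each
# value next to its column name, in the order listed above.
label_fields() {
    awk -F ':' '
        BEGIN {
            n = split("name class filename scope static deprecated final abstract varargs bridge native synchronized return parameters", label, " ")
        }
        { for (i = 1; i <= n; i++) printf "%-12s %s\n", label[i], $i }
    '
}

# Illustrative record; the values are sample data, not real tool output.
printf '%s\n' 'getWidth:World:World.java:public:false:false:false:false:false:false:false:false:int:()' | label_fields
```

Each input record produces 14 lines, one per column, which makes it easy to see which field number to pass to other tools.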
Exercise
• This exercise combines some Unix utilities with bdef. It involves two things:
  • Counting the number of methods of class World (in World.java).
  • Printing a list of methods in the form of their name, return type, and parameter list.

Using Unix (Part One)
• To count the number of lines in a document, one can use the command-line tool wc.
• The -l option makes it count lines.
• Piping to it makes it count the lines of output from a program.
    {bdef query} | wc -l
  counts the number of lines in the output of a bdef query.

The solution is …
• The solution to the first problem is:
    bdef sim.xml m ".*" filename="World\.java" | wc -l

Using Unix (Part Two)
• For the second question, we will again use the unformatted output of bdef.
• This time, we need to take note of the format of the unformatted output. We’ll limit this to the unformatted output for methods.
• Each field of the unformatted output is delimited by a colon. The fields we care about are the name, return type, and parameter list. These are fields 1, 13, and 14, respectively.

Using Unix (Part Two)
• The final piece of the puzzle is extracting the specific fields from the output.
• The cut utility will do nicely. We give it a delimiter and a list of field numbers, and it returns those fields for each line.
• The delimiter flag for cut is -d. The field-list flag is -f, followed by a series of comma-separated numbers.

The solution is …
• Our target query is thus:
    bdef sim.xml m ".*" class="World" | cut -d ":" -f 1,13,14

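The two pipelines can be tried without the database by feeding sample records to the same utilities. This is a minimal sketch: the helper name sample_methods is hypothetical and the records are illustrative data written in the 14-field method layout.

```shell
# Two sample method records in the 14-field layout (illustrative data only).
sample_methods() {
cat <<'EOF'
getWidth:World:World.java:public:false:false:false:false:false:false:false:false:int:()
checkBounds:World:World.java:public:false:false:false:false:false:false:false:false:boolean:(Location,)
EOF
}

# Keep only the name (1), return type (13), and parameter list (14).
sample_methods | cut -d ':' -f 1,13,14

# Count the records, as in the first exercise.
sample_methods | wc -l
```

The first pipeline prints one shortened record per method; the second prints the number of records, which some systems pad with leading spaces.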
Output for Exercise
• Question One: 13
• Question Two:
    <init>::(int,int,)
    removeEntity::(Location,)
    moveEntity::(Location,Location,)
    addEntity::(Location,)
    checkBounds:boolean:(Location,)
    checkLocation:boolean:(Location,)
    getWorldArray:char[][]:()
    getWorldString:java.lang.String:(char[][],)
    getWorldString:java.lang.String:()
    getWorldMaskString:java.lang.String:(java.util.Vector,java.util.Vector,)
    setBox::(char[][],int,int,int,int,char,)
    getEmpty:char:()
    getWidth:int:()
    getHeight:int:()
    <clinit>::()
• Not very pretty, but useful (we hope…).

bref
• bref is a tool that displays relationship information by linking one entity to another.

bref Syntax
    bref xml kind1 name1 kind2 name2
• xml is the XML file containing the database.
• kind1 and kind2 are entity kinds.
• name1 and name2 are entity names.

Example Query
• Here’s a query to find all class-class relationships in the database:
    bref sim.xml c ".*" c ".*"

Example Results (bref)
    bref sim.xml c ".*" c ".*"

    "AutoCar" -> "Car"
    "AutoControl" -> "java.lang.Object"
    "Car" -> "Entity"
    "CarControlException" -> "java.lang.Exception"
    "CarCrashException" -> "java.lang.Exception"
    "CarMoveController" -> "Entity"
    "CarOutOfBounds" -> "java.lang.Exception"
    "CarParkTrafficGenerator" -> "Entity"
    …

Results Explained
• bref returned a list of class pairs.
• Each line represents a relationship between two entities.
• The entity on the left is the first entity asked for.
• The entity on the right is the second entity asked for.

Exercise – bref
• In these exercises, we’ll examine various relations between the entities of a system.
• We’ll go over:
  • Inheritance relationships.
  • Method-method relationships.
  • How to write a shell script using the BAT tools.

Exercise #1
• We’ve already seen how to find the entire inheritance tree of our example, so this exercise should be easy:
• Find all the classes that Entity inherits from, and all the classes that subclass it.

Inheritance Relation
• The relation between classes that we are interested in is subclassing.
• But which entity in the relation subclasses the other?
• The answer is that the first entity subclasses the second.

Inheritance Relation (Cont’d)
• The answer to the question “which class is Entity a subclass of?” is:
    bref sim.xml c "Entity" c ".*"
• Analogously, we can find which classes subclass Entity:
    bref sim.xml c ".*" c "Entity"

Exercise #2
• This exercise concentrates on method-to-method relations.
• Our task is to find the fan-in and fan-out of a method.
• We’ll use the World.addEntity method in the example.

Definition: Fan-In/Fan-Out
• Fan-In: the fan-in of a function/method is the number of functions/methods that invoke it.
• Fan-Out: the fan-out of a function/method is the number of functions/methods that it invokes.

Finding Fan-In, Fan-Out
• The fan-in of a method can be calculated as follows:
    bref sim.xml m ".*" m "World.addEntity" | wc -l
• The fan-out of a method can be calculated analogously:
    bref sim.xml m "World.addEntity" m ".*" | wc -l

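The same counts can be computed from a saved list of relationships. The sketch below assumes that method relationships are printed in the same "A" -> "B" arrow form shown for classes, which is not confirmed here; the caller names and the helpers fan_in and fan_out are hypothetical.

```shell
# Hypothetical caller -> callee pairs in the arrow format shown for class relationships.
sample_calls() {
cat <<'EOF'
"Car.move" -> "World.addEntity"
"AutoCar.step" -> "World.addEntity"
"World.addEntity" -> "World.checkBounds"
EOF
}

# fan_in NAME: number of pairs whose right-hand side is NAME.
fan_in() {
    awk -v target="\"$1\"" '$3 == target { n++ } END { print n + 0 }'
}

# fan_out NAME: number of pairs whose left-hand side is NAME.
fan_out() {
    awk -v target="\"$1\"" '$1 == target { n++ } END { print n + 0 }'
}

sample_calls | fan_in "World.addEntity"
sample_calls | fan_out "World.addEntity"
```

Both helpers count pairs, so if a tool lists the same caller/callee pair more than once, piping through sort -u first would avoid inflating the result.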
Exercise #3
• In this exercise, we’ll write a shell script to determine whether one class is an ancestor or a descendant of another.

Descendant Relation
• A class X is a descendant of class Y if X subclasses Y, or X’s superclass is a descendant of Y.
• This sets up a nice recursion, which will make our job easy.

Shell Scripting
• Our first step is to come up with an exact specification of what we want:
• Given two classes, D and A, our script should report 1 if D is a descendant of A, and 0 otherwise.

Shell Scripting…
• Our first coding step is to decide which shell to use. For this exercise, we’ll use the C shell.
• This makes our shebang line:
    #!/bin/csh

Shell Scripting
• To make this a little nicer to look at, we’ll write a few small helper scripts:
  • One to report whether one class subclasses another.
  • One to return the ‘name’ field from unformatted BAT output.
  • One to return the names of all the classes that inherit from a given class.

Helper Script (does_subclass)
• Our first script is pretty simple:
    #!/bin/csh
    # Usage: does_subclass xml_file class1 class2
    # Prints 1 if bref reports a class1 -> class2 relationship, 0 otherwise.
    @ z = `bref $1 c $2 c $3 | wc -l` != 0
    echo ${z}

Helper Script (get_name)
• Our get_name script only has to return the value of one field, so a small script will do:
    cut -d " " -f1

Helper Script (subclasses)
• A script to get all the subclasses of a class is also relatively trivial:
    bref $1 c ".*" c $2 | get_name

The Actual Script (ancestor)
• Since our relation is recursive, we start the code by taking care of the base case: D is a direct subclass of A (a parent-child relationship).
    #!/bin/csh
    # Usage: ancestor xml_file D A
    if (`bref $1 c $2 c $3 | wc -l` != 0) then
        echo 1
        exit
    endif

The Rest of the Script
• The rest of the script deals with the recursion. We have to check every subclass of A to see whether it is an ancestor of D.
    foreach child (`subclasses $1 $3`)
        if (`ancestor $1 $2 $child`) then
            echo 1
            exit
        endif
    end
    # No subclass of A leads down to D.
    echo 0

However…
• There is a better way to do this: traverse up from the descendant.
• A class can have multiple subclasses.
• In Java, a class has only one superclass.
• We’ll call this the ancestor relation, defined as:
  • X is an ancestor of Y if X is Y’s superclass,
  • or X is an ancestor of Y’s superclass.
• We’ll write two small helper scripts to do the rewrite.

Helper Scripts, II (other_name)
• A script to get the name of the second entity of a relation could be useful:
    cut -d " " -f3

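Because each relationship line has the form "A" -> "B", splitting on spaces puts the first entity in field 1, the arrow in field 2, and the second entity in field 3. The sketch below shows this on one line from the sample results. Stripping the surrounding quotes with tr is an added step that may or may not be needed, depending on how the names are used afterwards.

```shell
# One relationship line in the format shown in the bref results.
line='"AutoCar" -> "Car"'

# Field 1 is the first entity, field 3 the second (field 2 is the arrow).
first=$(printf '%s\n' "$line" | cut -d ' ' -f1 | tr -d '"')
second=$(printf '%s\n' "$line" | cut -d ' ' -f3 | tr -d '"')

echo "$first subclasses $second"
```

Note that this split relies on entity names containing no spaces.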
Helper Scripts, II (parent)
• A second script, which returns the parent of a class if it exists, would be:
    #!/bin/csh
    bref $1 c $2 c ".*" | other_name

Making the Finished Product
• First take care of the base case of the recursion (the class directly subclasses the target):
    #!/bin/csh
    if (`does_subclass $1 $2 $3`) then
        echo 1
        exit
    endif

Last Bit o’ Code
• The rest of the code deals with recursing up the inheritance tree:
    if (`parent $1 $2 | wc -l` != 0) then
        ancestor $1 `parent $1 $2` $3
    else
        echo 0
    endif

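The upward walk can be tried without a BAT database by running it over a fixed list of inheritance pairs. The following is a minimal sketch in POSIX sh rather than csh, and it uses a loop instead of recursion. The function names edges, parent, and is_descendant are hypothetical, and the pairs are copied from the sample bref results.

```shell
# Inheritance pairs copied from the sample bref results (child -> parent).
edges() {
cat <<'EOF'
"AutoCar" -> "Car"
"Car" -> "Entity"
"CarCrashException" -> "java.lang.Exception"
"CarMoveController" -> "Entity"
EOF
}

# parent CLASS: print the superclass of CLASS, or nothing if none is recorded.
parent() {
    edges | awk -v child="\"$1\"" '$1 == child { gsub(/"/, "", $3); print $3; exit }'
}

# is_descendant D A: print 1 if A is reached by following parents up from D, else 0.
is_descendant() {
    current=$(parent "$1")
    while [ -n "$current" ]; do
        if [ "$current" = "$2" ]; then
            echo 1
            return 0
        fi
        current=$(parent "$current")
    done
    echo 0
}

is_descendant AutoCar Entity
is_descendant Car AutoCar
```

Since each class has at most one recorded parent, the loop follows a single chain and stops as soon as no parent is found, which is the property that makes walking up simpler than walking down.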