Chord: A Program Analysis Platform for Java. CS 6340.
What is Chord? • Static and dynamic program analysis framework for Java • Started in 2006 as static Checker of races and deadlocks • Publicly available under New BSD License • Key goals: • versatile: applies to various analyses, domains, platforms • extensible: users can build own analyses atop given ones • productive: facilitates rapid prototyping of analyses • robust: deterministic, handles partial programs, etc.
Key Features of Chord • Many standard static and dynamic analyses • Writing/solving analyses using Datalog/BDDs • Analyses as “building blocks” • Context-sensitive static analysis framework • Dynamic analysis framework
Outline of Lecture • Getting Started with Chord • Program Representation • Analysis Using Datalog/BDDs • Chaining Analyses Together • Context-Sensitive Analysis
Downloading Chord • Stable Binary Release • http://jchord.googlecode.com/files/chord-bin-2.0.tar.gz • Stable Source Release • http://jchord.googlecode.com/files/chord-src-2.0.tar.gz (mandatory) • Chord’s source code + JARs of libraries used by Chord • http://jchord.googlecode.com/files/chord-libsrc-2.0.tar.gz (optional) • (adapted) Java source code of libraries used by Chord • Latest Development Snapshot • svn checkout http://jchord.googlecode.com/svn/trunk/ chord • Or check out only relevant directories under trunk/: • main/ (released as chord-src above) • libsrc/ (released as chord-libsrc above) • test/ (Chord’s regression test suite) • … (many more)
Compiling Chord • Requirements: • JVM for Java 5 or higher • Apache Ant • C++ compiler (not needed by default) • Optional: edit chord.properties • to enable C BuDDy library: set chord.use.buddy=true • to enable C++ JVMTI agent: set chord.use.jvmti=true • Run in main directory: ant compile • Directory listing shown for main/: build.xml, chord.properties, agent/, bdd/, doc/, examples/, lib/, src/, web/, chord.jar, libbuddy.so | buddy.dll | libbuddy.dylib, libchord_instr_agent.so
Running Chord • Requirements: JVM for Java 5 or higher • no other dependencies (e.g., Eclipse) • Run either command in any directory: • ant -f <...>/build.xml [-Dkey_i=val_i]* run • requires Apache Ant • not available in Binary Release • java -cp <…>/chord.jar [-Dkey_i=val_i]* chord.project.Boot • where <…> denotes path of Chord’s main/ directory • -Dkey_i=val_i sets value of system property key_i to val_i
Chord Properties • All inputs to Chord are specified via System Properties • conventionally named chord.* (e.g., chord.work.dir) • Three choices with decreasing precedence: • On command line via -Dkey=val format • use to specify properties specific to the current Chord run • Via user-specified file denoted by chord.props.file • use to specify properties specific to program being analyzed (e.g., its main class, classpath, etc.) • default value = "[chord.work.dir]/chord.properties" • Via pre-defined file main/chord.properties • use to specify properties that must hold in every Chord run (e.g., maximum memory to be used by JVM)
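The three-level precedence above can be sketched in plain Java. This is only an illustration of the lookup order, not Chord's actual property-loading code: the class `PropertyPrecedence` and its `resolve` method are hypothetical, and the property values are made up for the example.

```java
import java.util.Properties;

/** Hypothetical helper, not part of Chord: resolves a property by precedence. */
final class PropertyPrecedence {
    /**
     * Returns the value of key from the first source that defines it:
     * the command line, then the user-specified file, then the pre-defined file.
     */
    static String resolve(String key, Properties commandLine,
                          Properties userFile, Properties predefined) {
        Properties[] sources = {commandLine, userFile, predefined};
        for (Properties source : sources) {
            String value = source.getProperty(key);
            if (value != null) {
                return value;
            }
        }
        return null; // not set in any of the three sources
    }

    public static void main(String[] args) {
        Properties commandLine = new Properties();
        Properties userFile = new Properties();
        Properties predefined = new Properties();
        // Illustrative values only.
        predefined.setProperty("chord.verbose", "1");
        userFile.setProperty("chord.verbose", "2");
        userFile.setProperty("chord.main.class", "foo.Main");
        commandLine.setProperty("chord.verbose", "0");

        // The command-line value shadows both files.
        System.out.println(resolve("chord.verbose", commandLine, userFile, predefined));
        // Only the user file defines the main class.
        System.out.println(resolve("chord.main.class", commandLine, userFile, predefined));
    }
}
```

A property set with -D on the command line shadows the same key in either file, which is why per-run settings belong there.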
Architecture of Chord • [Diagram] Inputs: Java program (program source, program bytecode, program inputs) • Components shown: bytecode translator (joeq) producing program quadcode; bytecode instrumentor (javassist); static analysis, Datalog analysis, and dynamic analysis; bddbddb and BuDDy; Classic or Modern Runtime; saxon XSLT and Java2HTML • Outputs shown: analysis result in XML, analysis result in HTML • Example shown: the user demands one program analysis to run; the analyses producing domains D1 and D2 and relations R1, R2, and R12 each start, block on the domains and relations they need (e.g., "starts, blocks on D1", "starts, blocks on R2, D2", "starts, blocks on D1, D2, R1, R12"), resume once those are available, and run to finish
Setting Up a Java Program for Analysis • Directory layout of example/: src/foo/Main.java ..., classes/foo/Main.class ..., lib/src/taz/..., lib/jar/taz.jar, chord.properties, chord_output/bddbddb/ • Contents of chord.properties:
chord.main.class=foo.Main
chord.class.path=classes:lib/jar/taz.jar
chord.src.path=src:lib/src
chord.run.ids=0,1
chord.args.0="-thread 1 -n 10"
chord.args.1="-thread 2 -n 50"
• Command to run in Chord’s main directory: • ant -Dchord.work.dir=<…>/example run
Java Program Representations • Java source code (.java) -> javac -> Java bytecode (.class) • Java bytecode (.class) -> javap -> disassembled Java bytecode
Example: Java Source Code • File test/HelloWorld.java:
1: package test;
2:
3: public class HelloWorld {
4:     public static void main(String[] args) {
5:         System.out.println("Hello World!");
6:     }
7: }
Pretty-Printing Java Bytecode
javap -private -verbose -classpath <CLASS_PATH> [-bootclasspath <BOOT_CLASS_PATH>] <CLASS_NAME>
• Run "javac -g" on .java files to keep debug info (lines, vars, source) in .class files
public class test.HelloWorld extends java.lang.Object
  SourceFile: "HelloWorld.java"
  Constant pool:
  const #1 = Method #6.#20; // java/lang/Object."<init>":()V
  ...
public static void main(java.lang.String[]);
  Code:
   Stack=2, Locals=1, Args_size=1
   0: getstatic #2; // Field java/lang/System.out:Ljava/io/PrintStream;
   3: ldc #3; // String Hello World!
   5: invokevirtual #4; // Method java/io/PrintStream.println:...
   8: return
  LineNumberTable:
   line 5: 0
   line 6: 8
  LocalVariableTable:
   Start Length Slot Name Signature
   0     9      0    args [Ljava/lang/String;
Java Program Representations • Java source code (.java) -> javac -> Java bytecode (.class) • Java bytecode (.class) -> javap -> disassembled Java bytecode • Java bytecode (.class) -> Joeq -> quadcode
Pretty-Printing Quadcode
ant -Dchord.work.dir=<WORK_DIR> -Dchord.out.file=<OUTPUT_FILE> -Dchord.print.classes=<CLASS_NAMES> -Dchord.verbose=0 run
• Alternative options: -Dchord.print.methods=<METHOD_SIGNS>, -Dchord.print.all.classes=true
• Replace any `$` by `#` to prevent shell interpretation
Class: test.HelloWorld
Method: main:([Ljava/lang/String;)V@test.HelloWorld
    0#1
    5#3
    5#2
    8#4
Control flow graph:
BB0 (ENTRY) (in: <none>, out: BB2)
BB2 (in: BB0 (ENTRY), out: BB1 (EXIT))
1: GETSTATIC_A T1, .out
3: MOVE_A T2, AConst: "Hello World!"
2: INVOKEVIRTUAL_V println:(Ljava/lang/String;)V@java.io.PrintStream, (T1,T2)
4: RETURN_V
BB1 (EXIT) (in: BB2, out: <none>)
Exception handlers: []
Register factory: Registers: 3
Type Hierarchy • jq_Type has subclasses jq_Primitive and jq_Reference • jq_Reference has subclasses jq_Class and jq_Array • (all defined in package joeq.Class)
chord.program.Program API • static Program g() • the Program object representing the program being analyzed • IndexSet<jq_Type> getTypes() • all types in classes that may be loaded • IndexSet<jq_Reference> getClasses() • all classes that may be loaded • IndexSet<jq_Method> getMethods() • all methods that may be called
joeq.Class.jq_Class API • String getName() • fully-qualified name of the class, e.g., "java.lang.String[]" • jq_InstanceField[] getDeclaredInstanceFields() • all instance fields declared in the class • jq_StaticField[] getDeclaredStaticFields() • all static fields declared in the class • jq_InstanceMethod[] getDeclaredInstanceMethods() • all instance methods declared in the class • jq_StaticMethod[] getDeclaredStaticMethods() • all static methods declared in the class
joeq.Class.jq_Method API • String getName().toString() • name of the method • String getDesc().toString() • descriptor of the method, e.g., "(Ljava/lang/String;)V" • jq_Class getDeclaringClass() • declaring class of the method • ControlFlowGraph getCFG() • control-flow graph of the method • Quad getQuad(int bci) • first quad at the given bytecode offset (null if missing) • int getLineNumber(int bci) • line number of the given bytecode offset (-1 if missing) • String toString() • ID of the method in format mName:mDesc@cName
Control Flow Graphs (CFGs) • Each CFG contains: • a set of registers (register factory) • a directed graph whose nodes are basic blocks and edges denote control flow • Register Factory: • one register per argument (local variables) • named R0, R1, …, Rn • one register per temporary (stack variables) • named Tn+1, Tn+2, …, Tm • Basic Block (BB): • sequence of primitive statements (quads) • unique entry BB: no quads and no incoming edges • unique exit BB: no quads and no outgoing edges
joeq.Compiler.Quad.ControlFlowGraph API • RegisterFactory getRegisterFactory() • set of all local variables • EntryOrExitBasicBlock entry() • unique entry basic block • EntryOrExitBasicBlock exit() • unique exit basic block • List<BasicBlock> reversePostOrder() • list of all basic blocks in reverse post-order • jq_Method getMethod() • containing method of the CFG
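For readers unfamiliar with the ordering returned by reversePostOrder(), here is a minimal sketch of reverse post-order on a generic graph. It is not joeq's implementation: the class `ReversePostOrder` is hypothetical, and basic blocks are modeled as plain strings with a successor map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch, not joeq code: reverse post-order over a successor map. */
final class ReversePostOrder {
    /** Returns the blocks reachable from entry in reverse post-order. */
    static List<String> compute(String entry, Map<String, List<String>> successors) {
        List<String> postOrder = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        visit(entry, successors, seen, postOrder);
        Collections.reverse(postOrder);
        return postOrder;
    }

    /** Depth-first search that appends a block only after all its successors. */
    private static void visit(String block, Map<String, List<String>> successors,
                              Set<String> seen, List<String> postOrder) {
        if (!seen.add(block)) {
            return; // already visited
        }
        List<String> next = successors.get(block);
        if (next != null) {
            for (String succ : next) {
                visit(succ, successors, seen, postOrder);
            }
        }
        postOrder.add(block);
    }
}
```

In an acyclic graph this order places every block before its successors, which is why it is a common iteration order for forward dataflow analyses.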
joeq.Compiler.Quad.BasicBlock API • int size() • number of quads in the basic block • Quad getQuad(int index) • quad at the given 0-based index • List<BasicBlock> getPredecessors() • list of immediate predecessor basic blocks • List<BasicBlock> getSuccessors() • list of immediate successor basic blocks • jq_Method getMethod() • containing method of the basic block
Quad Instructions • Each quad contains an operator and up to 4 operands • Example: getfield l = b.f:
Operand lo = Getfield.getDest(q);
Operand bo = Getfield.getBase(q);
if (lo instanceof RegisterOperand && bo instanceof RegisterOperand) {
    Register l = ((RegisterOperand) lo).getRegister();
    Register b = ((RegisterOperand) bo).getRegister();
    jq_Field f = Getfield.getField(q).getField();
    ...
}
Kinds of Quads • joeq.Compiler.Quad.Operator • Move, Getstatic, Branch, Invoke • Phi, Putstatic, IntIfCmp, InvokeVirtual • Unary, Getfield, Goto, InvokeStatic • Binary, Putfield, Jsr, InvokeInterface • New, ALoad, Ret • NewArray, AStore, LookupSwitch • MultiNewArray, Checkcast, TableSwitch • Alength, Instanceof • Monitor, Return
joeq.Compiler.Quad.Quad API • Operator getOperator() • kind of the quad • int getBCI() • bytecode offset of the quad in its containing method • String toByteLocStr() • unique identifier of the quad in format offset!mName:mDesc@cName • String toJavaLocStr() • location of the quad in format fileName:lineNum in Java source code • String toLocStr() • location of the quad in both Java bytecode and source code • String toVerboseStr() • verbose description of the quad (its location plus contents) • BasicBlock getBasicBlock() • containing basic block of the quad
Traversing Quadcode
import chord.program.Program;
import joeq.Class.jq_Method;
import joeq.Compiler.Quad.*;

// Visitor invoked on each quad; override only the kinds of interest.
QuadVisitor qv = new QuadVisitor.EmptyVisitor() {
    public void visitNew(Quad q) { ... }
    public void visitPhi(Quad q) { ... }
    ...
};

// Visit every quad of every non-abstract method in the analysis scope.
Program program = Program.g();
for (jq_Method m : program.getMethods()) {
    if (!m.isAbstract()) {
        ControlFlowGraph cfg = m.getCFG();
        for (BasicBlock bb : cfg.reversePostOrder())
            for (Quad q : bb.getQuads())
                q.accept(qv);
    }
}
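Java Program Representations • Java source code (.java) -> javac -> Java bytecode (.class) • Java bytecode (.class) -> javap -> disassembled Java bytecode • Java bytecode (.class) -> Joeq -> quadcode • Java source code (.java) -> j2h or Java2HTML -> HTMLized Java source code (.html)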
HTMLizing Java Source Code • Programmatically:
import chord.program.Program;
Program program = Program.g();
program.HTMLizeJavaSrcFiles();
• From command line: • Use j2h: ant -Djava.dir=<JAVA_DIR> -Dhtml.dir=<HTML_DIR> j2h_xref • Use Java2HTML: ant -Djava.dir=<JAVA_DIR> -Dhtml.dir=<HTML_DIR> j2h_fast
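Java Program Representations • Java source code (.java) -> javac -> Java bytecode (.class) • Java bytecode (.class) -> javap -> disassembled Java bytecode • Java bytecode (.class) -> Joeq -> quadcode • Java source code (.java) -> j2h or Java2HTML -> HTMLized Java source code (.html) • Jasmin code (.j), connected in the diagram via Chord and Jasmin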
Analysis Scope Construction • Determines which parts of the program to analyze • Computed in either of these cases: • chord.build.scope=true • chord.program.Program.g() is called • Algorithm specified by chord.scope.kind=[rta|cha|dynamic] • Rapid Type Analysis (RTA) • Class Hierarchy Analysis (CHA) • Dynamic Analysis • All three algorithms require specifying: • chord.main.class=<MAIN CLASS> • chord.class.path=<CLASSPATH>
Analysis Scope Representation • Reachable Methods • stored in file specified by chord.methods.file (default = "[chord.out.dir]/methods.txt") • one line per method, in format mname:mdesc@cname • Resolved Reflection • stored in file specified by chord.reflect.file (default = "[chord.out.dir]/reflect.txt") • four sections, one per kind of reflective call site: • # resolvedClsForNameSites: calls to Class Class.forName(String) • # resolvedObjNewInstSites: calls to Object Class.newInstance() • # resolvedConNewInstSites: calls to Object Constructor.newInstance(Object[]) • # resolvedAryNewInstSites: calls to Object Array.newInstance(Class, int) • each section lists lines in format bci!mname:mdesc@cname->cname1,cname2,...,cnameN
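As an illustration of the resolved-reflection line format bci!mname:mdesc@cname->cname1,...,cnameN, the following sketch splits one such line into its parts. The class `ReflectSite` is hypothetical and is not how Chord itself reads the file. It assumes that method names and descriptors contain no ':' or '@' characters, and the sample line in the usage note is invented.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical value class, not part of Chord: one parsed resolved-reflection line. */
final class ReflectSite {
    final int bci;                      // bytecode offset of the call site
    final String methodName;            // mname
    final String methodDesc;            // mdesc
    final String className;             // cname of the containing class
    final List<String> resolvedClasses; // cname1,...,cnameN

    private ReflectSite(int bci, String methodName, String methodDesc,
                        String className, List<String> resolvedClasses) {
        this.bci = bci;
        this.methodName = methodName;
        this.methodDesc = methodDesc;
        this.className = className;
        this.resolvedClasses = resolvedClasses;
    }

    /** Parses a line of the form bci!mname:mdesc@cname->cname1,...,cnameN. */
    static ReflectSite parse(String line) {
        int bang = line.indexOf('!');
        int colon = line.indexOf(':', bang + 1);
        int arrow = line.indexOf("->", colon + 1);
        int at = line.lastIndexOf('@', arrow); // last '@' before the arrow
        if (bang < 1 || colon < 0 || arrow < 0 || at < colon) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        return new ReflectSite(
                Integer.parseInt(line.substring(0, bang)),
                line.substring(bang + 1, colon),
                line.substring(colon + 1, at),
                line.substring(at + 1, arrow),
                Arrays.asList(line.substring(arrow + 2).split(",")));
    }
}
```

For example, parsing the invented line `5!main:([Ljava/lang/String;)V@foo.Main->foo.A,foo.B` yields offset 5, method `main`, containing class `foo.Main`, and the two resolved classes `foo.A` and `foo.B`.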
Rapid Type Analysis (RTA) • Preferred (and default) scope construction algorithm • Allows specifying reflection resolution via chord.reflect.kind=[none|static|dynamic] • Preferred way to resolve reflection is ‘dynamic’, which requires specifying how to run the program: • chord.run.ids=id1,…,idN • chord.args.id1=<ARGS1>, …, chord.args.idN=<ARGSN>
Dynamic Analysis Based Scope Construction • Runs program and observes which classes are loaded • Requires JVMTI (set chord.use.jvmti=true in file main/chord.properties) • Requires specifying how to run the program: • chord.run.ids=id1,…,idN • chord.args.id1=<ARGS1>, …, chord.args.idN=<ARGSN> • All methods of each loaded class are deemed reachable • Currently no support for reflection resolution
Additional Analysis Scope Features • Scope Reuse • Enables using scope constructed by a previous run of Chord • Constructs scope from files specified by chord.methods.file and chord.reflect.file • Specified via chord.reuse.scope=true • Scope Exclusion • Enables excluding certain classes from scope • Treats all methods in such classes as no-ops • Specified via three properties: • 1. chord.std.scope.exclude (default = "") • 2. chord.ext.scope.exclude (default = "") • 3. chord.scope.exclude (default = "[chord.std.scope.exclude],[chord.ext.scope.exclude]")
Native Method Stubs • Specified in file main/src/chord/program/stubs/stubs.txt in format: • mname:mdesc@cname stub_cname • where stub_cname denotes a class implementing: • public interface joeq.Compiler.Quad.ICFGBuilder { public ControlFlowGraph run(jq_Method m); } • Example: start:()V@java.lang.Thread chord.program.stubs.ThreadStartCFGBuilder
Example Native Method Stub • The stub builds a CFG equivalent to: void start() { this.run(); return; }
public ControlFlowGraph run(jq_Method m) {
    jq_Class c = m.getDeclaringClass();
    jq_Method n = c.getDeclaredInstanceMethod(new jq_NameAndDesc("run", "()V"));
    RegisterFactory f = new RegisterFactory(0, 1);
    Register r = f.getOrCreateLocal(0, c);
    ControlFlowGraph cfg = new ControlFlowGraph(m, 1, 0, f);
    // Quad 1: this.run()
    Quad q1 = Invoke.create(0, m, Invoke.INVOKEVIRTUAL_V.INSTANCE, null, new MethodOperand(n), 1);
    Invoke.setParam(q1, 0, new RegisterOperand(r, c));
    // Quad 2: return
    Quad q2 = Return.create(1, m, RETURN_V.INSTANCE);
    BasicBlock bb = cfg.createBasicBlock(1, 1, 2, null);
    bb.appendQuad(q1);
    bb.appendQuad(q2);
    // Link entry -> bb -> exit
    BasicBlock eb = cfg.entry(), xb = cfg.exit();
    eb.addSuccessor(bb); bb.addPredecessor(eb);
    bb.addSuccessor(xb); xb.addPredecessor(bb);
    return cfg;
}
Program Domain • Building block for analyses based on Datalog/BDDs • Represents an indexed set of values of a fixed kind • typically artifacts from program being analyzed (e.g., set of all methods in the program) • Assigns unique 0-based index to each value • everything in Datalog/BDDs must be numbered • indices given in order in which values are added • order affects efficiency of running analysis on large sets • initial indices (0, 1, ...) typically given to frequently-used values (e.g., the main method) • O(1) access to value given index, and vice versa
Writing a Program Domain Analysis • Domain M: all methods in the program • main method has index 0 • java.lang.Thread.start() method has index 1
package chord.analyses.method;

@Chord(name = "M")
public class DomM extends ProgramDom<jq_Method> {
    @Override
    public void fill() {
        Program program = Program.g();
        add(program.getMainMethod());
        jq_Method start = program.getThreadStartMethod();
        if (start != null)
            add(start);
        for (jq_Method m : program.getMethods())
            add(m);
    }
}
Running a Program Domain Analysis • ant -Dchord.work.dir=<…> -Dchord.run.analyses=M run
Running a Program Domain Analysis • Output files in chord_output/bddbddb/: M.map and M.dom • M.map (<N> lines, one per method in index order):
main:([Ljava/lang/String;)V@Bldg
start:()V@java.lang.Thread
<init>:()V@Bldg
…
• M.dom:
M <N> M.map
chord.project.analyses.ProgramDom<T> API • void setName(String name) • set name of domain • boolean add(T val) • add value to domain if not present; return true if added • int getOrAdd(T val) • add value to domain if not present; return its index in either case • void save() • save domain to disk (.dom and .map files) • String toUniqueString(T val) • unique string representation of value • int size() • number of values in domain • T get(int index) • value having the given index; IndexOutOfBoundsException if not found • int indexOf(T val) • index of given value; -1 if not found • Note: values once added cannot be removed!
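The indexing behavior listed above (insertion-order indices, lookup in both directions, no removal) can be mimicked with a list plus a hash map. The class `IndexedDomain` below is a hypothetical stand-in written for illustration. It is not Chord's ProgramDom and omits naming and saving to disk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an indexed domain; not Chord's ProgramDom. */
final class IndexedDomain<T> {
    private final List<T> values = new ArrayList<>();        // index -> value
    private final Map<T, Integer> indices = new HashMap<>(); // value -> index

    /** Adds val if absent; returns true if it was added. */
    boolean add(T val) {
        if (indices.containsKey(val)) {
            return false;
        }
        indices.put(val, values.size()); // next free 0-based index
        values.add(val);
        return true;
    }

    /** Adds val if absent; returns its index in either case. */
    int getOrAdd(T val) {
        add(val);
        return indices.get(val);
    }

    /** Returns the value at index; throws IndexOutOfBoundsException if none. */
    T get(int index) {
        return values.get(index);
    }

    /** Returns the index of val, or -1 if val is not in the domain. */
    int indexOf(T val) {
        Integer index = indices.get(val);
        return index == null ? -1 : index;
    }

    /** Returns the number of values in the domain. */
    int size() {
        return values.size();
    }
}
```

Because values are never removed, an index stays valid once assigned, so tuples stored as index vectors need no renumbering.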
Program Relation • Building block for analyses based on Datalog/BDDs • Represents a set of tuples over one or more fixed program domains • Represented symbolically as a BDD • enables storing and manipulating large relations efficiently • Provides various relational operations • projection, selection, join, etc. • BDD size and efficiency of operations depend heavily on how the relation's content is encoded, rather than on the number of tuples • ordering of values within program domains • relative ordering between program domains
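To make the relational operations concrete, here is a sketch of selection, projection, and join over explicitly enumerated tuples of domain indices. This is deliberately not a BDD implementation: it shows only what the operations compute, not how Chord or bddbddb compute them symbolically. The class `TupleRelations` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical explicit-tuple sketch of relational operations; no BDDs involved. */
final class TupleRelations {
    /** Selection: keeps the tuples whose given column equals value. */
    static Set<List<Integer>> select(Set<List<Integer>> rel, int column, int value) {
        Set<List<Integer>> out = new LinkedHashSet<>();
        for (List<Integer> tuple : rel) {
            if (tuple.get(column) == value) {
                out.add(tuple);
            }
        }
        return out;
    }

    /** Projection: keeps only the given columns, dropping duplicate tuples. */
    static Set<List<Integer>> project(Set<List<Integer>> rel, int... columns) {
        Set<List<Integer>> out = new LinkedHashSet<>();
        for (List<Integer> tuple : rel) {
            List<Integer> projected = new ArrayList<>();
            for (int c : columns) {
                projected.add(tuple.get(c));
            }
            out.add(projected);
        }
        return out;
    }

    /** Join: pairs tuples that agree on the join columns; the shared column appears once. */
    static Set<List<Integer>> join(Set<List<Integer>> left, int leftCol,
                                   Set<List<Integer>> right, int rightCol) {
        Set<List<Integer>> out = new LinkedHashSet<>();
        for (List<Integer> l : left) {
            for (List<Integer> r : right) {
                if (l.get(leftCol).equals(r.get(rightCol))) {
                    List<Integer> joined = new ArrayList<>(l);
                    for (int c = 0; c < r.size(); c++) {
                        if (c != rightCol) {
                            joined.add(r.get(c));
                        }
                    }
                    out.add(joined);
                }
            }
        }
        return out;
    }
}
```

The nested loops in `join` cost time proportional to the product of the relation sizes, which is the kind of blow-up a symbolic BDD representation is meant to avoid on large relations.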
Writing a Program Relation Analysis • Relation MI: tuples (m, i) such that method m contains call i
package chord.analyses.invk;

@Chord(name = "MI", sign = "M0,I0:M0_I0")
public class RelMI extends ProgramRel {
    @Override
    public void fill() {
        DomI domI = (DomI) doms[1];
        for (Quad q : domI) {
            jq_Method m = q.getMethod();
            add(m, q);
        }
    }
}
• M0,I0: Domain names • Order mnemonically (hard to change over time) • Suffix 0, 1, etc. distinguishes repeating domains • M0_I0: Domain order • Only dictates performance • Can also be I0_M0 or I0xM0 • Easy to change over time
Writing a Program Relation Analysis • Relation VT: tuples (v, t) such that local variable v has type t
package chord.analyses.var;

@Chord(name = "VT", sign = "V0,T0:T0_V0")
public class RelVT extends ProgramRel {
    @Override
    public void fill() {
        // Pseudocode loop: iterate over every RegisterOperand of every quad.
        for (each RegisterOperand o of each quad) {
            Register v = o.getRegister();
            jq_Type t = o.getType();
            add(v, t);
        }
    }
}
Running a Program Relation Analysis • ant -Dchord.work.dir=<…> -Dchord.run.analyses=VT run
Running a Program Relation Analysis • Output files in chord_output/bddbddb/: V.dom, T.dom, V.map, T.map, VT.bdd • Contents of VT.bdd:
# V0:2 T0:2
# 1 2
# 3 4
6 4
2 1 4 3
7 4 0 1
6 3 7 1
5 3 0 7
4 2 5 0
3 2 6 5
2 1 3 4
Program Relation as Binary Function • Variable v0 has types t1, t2, t3 • Variable v1 has type t3 • Variable v2 has type t3 • Relation VT = { (0, 1), (0, 2), (0, 3), (1, 3), (2, 3) }
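One way to read the relation above as a function over bits: give each domain two bits and let the function return true exactly on the bit patterns that encode a tuple of VT. The sketch below assumes a simple layout (the two high bits hold the variable index, the two low bits hold the type index). The real bit layout depends on the chosen domain order, so this shows the idea rather than Chord's encoding. The class `RelationAsBooleanFunction` is hypothetical.

```java
/** Hypothetical sketch: the relation VT viewed as a function over four bits. */
final class RelationAsBooleanFunction {
    static final int BITS_PER_DOMAIN = 2;

    /** The tuples (v, t) of relation VT from the slide. */
    static final int[][] VT = {{0, 1}, {0, 2}, {0, 3}, {1, 3}, {2, 3}};

    /** Assumed layout: the high two bits hold v, the low two bits hold t. */
    static int encode(int v, int t) {
        return (v << BITS_PER_DOMAIN) | t;
    }

    /** Characteristic function: true iff the 4-bit input encodes a tuple of VT. */
    static boolean contains(int bits) {
        for (int[] tuple : VT) {
            if (encode(tuple[0], tuple[1]) == bits) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Print the full truth table over all 16 inputs.
        for (int bits = 0; bits < (1 << (2 * BITS_PER_DOMAIN)); bits++) {
            int v = bits >> BITS_PER_DOMAIN;
            int t = bits & ((1 << BITS_PER_DOMAIN) - 1);
            System.out.println("v=" + v + " t=" + t + " -> " + contains(bits));
        }
    }
}
```

A BDD stores this same function as a shared decision graph over the four bits instead of as an explicit list of tuples.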