A staged static program analysis to improve the performance of runtime monitoring

Eric Bodden, McGill University

Tracematches - Where can we optimize? [Diagram labels: McGill/Waterloo; Oxford; leak elimination / indexing]

This talk: remove overhead through static program analysis

Staged analysis: remove as many shadows (instrumentation points) as early as possible, but always in a sound way

Quick check - idea • library of tracematches: company rules, domain-specific rules, generic API rules, … (e.g. FailSafeIter, ASyncIter) • client programs (e.g. Program 1) • not all the tracematches apply to all the programs

Quick check - example An example... java.util.Collections Collection c = Collections.synchronizedCollection(myC); synchronized(c) { } Iterator i = c.iterator(); while(i.hasNext()) foo(i.next());

Tracematch "ASyncIteration"
tracematch(Object c) {
    sym sync after returning(c):
        call(* Collections.synchr*(..));
    sym asyncIter before:
        call(* Collection+.iterator()) && target(c) && if(!Thread.holdsLock(c));

    sync asyncIter
    {
        System.err.println("Iterations over " + c + " must be synchronized!");
    }
}

Quick check - example [Automaton diagram: edges labelled sync, aSyncIter and skip(aSyncIter)]
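A minimal sketch of the quick check, assuming (the slides do not spell this out) that a tracematch can be dropped for a given program as soon as one symbol that every match needs has no shadow in that program. The names `QuickCheck` and `canDiscard` and the shadow counts are hypothetical.

```java
import java.util.Map;
import java.util.Set;

final class QuickCheck {
    /**
     * Returns true if the tracematch can never match in this program,
     * i.e. some symbol that every match needs has zero shadows.
     * (Assumed criterion, not taken from the slides.)
     */
    static boolean canDiscard(Set<String> requiredSymbols, Map<String, Integer> shadowCounts) {
        for (String symbol : requiredSymbols) {
            if (shadowCounts.getOrDefault(symbol, 0) == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // ASyncIteration needs both a sync event and an asyncIter event.
        Set<String> required = Set.of("sync", "asyncIter");
        // Hypothetical program that iterates over collections
        // but never calls Collections.synchr*(..).
        Map<String, Integer> counts = Map.of("asyncIter", 12);
        System.out.println("discard tracematch: " + canDiscard(required, counts));
    }
}
```

In this reading, a program without any call to `Collections.synchr*(..)` needs no instrumentation for ASyncIteration at all, whatever the number of iterator calls.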

Interlude - Shadows [Diagram labels: shadows sync(c=c1), asyncIter(c=c2), asyncIter(c=c3); objects o1, o2]

Preparation phase ("cg" phase) • right before the flow-insensitive stage • builds context-sensitive points-to sets • demand-driven refinement-based analysis • as a side effect builds a call graph (which can then be used in the flow-sensitive stage) • very expensive • can run for several minutes • no way to get around it

Flow-insensitive analysis • uses points-to information • idea: even if some groups of objects could lead to a complete match, many others cannot • example: even if a program uses synchronized collections, most collections will never be synchronized

Flow-insensitive analysis - Step 1 • First have to determine what is required for a complete match. • Edges generated by a Kleene-* sub-expression are “not required”. • Hence, use a special Thompson Construction...
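Step 1 can be illustrated on the pattern's syntax tree instead of on the Thompson automaton the slide refers to (a deliberate simplification). The sketch collects the labels a match cannot do without: everything under a Kleene star is skipped, while `+` still requires its body. The class `RequiredLabels` and its helpers are hypothetical, and alternation is left out.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class RequiredLabels {
    /** A pattern node adds the labels it requires to the given set. */
    interface Regex {
        void collect(Set<String> out);
    }

    /** A single symbol is always required. */
    static Regex sym(String label) {
        return out -> out.add(label);
    }

    /** A sequence requires whatever each of its parts requires. */
    static Regex seq(Regex... parts) {
        return out -> {
            for (Regex part : parts) {
                part.collect(out);
            }
        };
    }

    /** Kleene star: zero iterations are allowed, so nothing is required. */
    static Regex star(Regex inner) {
        return out -> { };
    }

    /** Plus: at least one iteration, so the body is required. */
    static Regex plus(Regex inner) {
        return inner::collect;
    }

    static Set<String> required(Regex pattern) {
        Set<String> out = new LinkedHashSet<>();
        pattern.collect(out);
        return out;
    }

    public static void main(String[] args) {
        // Observer pattern from the slides: create notify+
        Regex observer = seq(sym("create"), plus(sym("notify")));
        System.out.println(required(observer));
    }
}
```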

Example – observer pattern • pattern: create notify+ • [Automaton diagram: edges labelled create, notify, notify and skip(create)]

Flow-insensitive analysis - Step 2 Path Infos: • a state machine can accept a sequence of events possibly over multiple paths • for each such path, record what needs to be visited to reach a final state • store result in a path info

Example – observer pattern [Automaton diagram: edges labelled create, notify and skip(create)] • path info: ( labels = [create,notify], skip = {create} )

Flow-insensitive analysis - Step 3 • each path info represents a single complete path • conceptually have to check for each set of shadows if this set contains all the labels of at least one such path info • Problem: may well have 1000+ shadows, which leads to 2^1000+ subsets!

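Example: path info ( labels={a,b,c}, skip={f} ) • shadows per label: a: {a1}, b: {b1,b2}, c: {c1}, f: {f1,f2} • candidate shadow groups: {a1, b1, c1} and {a1, b2, c1} • [Diagram: points-to objects o1, o2, o3; two entries crossed out] • {a1, b1, c1} is a “complete & consistent shadow group” • for this path info keep a1, b1, c1, f1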
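One plausible way to avoid enumerating all subsets is to pick exactly one shadow per required label and to abandon a partial group as soon as its points-to sets stop overlapping. The sketch below does this for a single tracematch variable and treats a group as consistent when all its points-to sets share at least one object; both are simplifying assumptions. `ShadowGroups` and the points-to sets in `slideExample` are made up so that the result agrees with the slide above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ShadowGroups {
    /** Shadows to keep for one path info (labels, skip). */
    static Set<String> shadowsToKeep(List<String> labels, Set<String> skip,
                                     Map<String, List<String>> shadowsByLabel,
                                     Map<String, Set<String>> pointsTo) {
        Set<String> keep = new TreeSet<>();
        if (!labels.isEmpty()) {
            extend(labels, 0, new ArrayList<>(), null, skip, shadowsByLabel, pointsTo, keep);
        }
        return keep;
    }

    private static void extend(List<String> labels, int index, List<String> group,
                               Set<String> common, Set<String> skip,
                               Map<String, List<String>> shadowsByLabel,
                               Map<String, Set<String>> pointsTo, Set<String> keep) {
        if (index == labels.size()) {
            // Complete and consistent group: keep its shadows ...
            keep.addAll(group);
            // ... and every skip shadow that may refer to the same objects.
            for (String skipLabel : skip) {
                for (String s : shadowsByLabel.getOrDefault(skipLabel, List.of())) {
                    if (!Collections.disjoint(common, pointsTo.getOrDefault(s, Set.of()))) {
                        keep.add(s);
                    }
                }
            }
            return;
        }
        for (String shadow : shadowsByLabel.getOrDefault(labels.get(index), List.of())) {
            Set<String> narrowed = new HashSet<>(pointsTo.getOrDefault(shadow, Set.of()));
            if (common != null) {
                narrowed.retainAll(common);
            }
            if (narrowed.isEmpty()) {
                continue; // inconsistent: prune this branch early
            }
            group.add(shadow);
            extend(labels, index + 1, group, narrowed, skip, shadowsByLabel, pointsTo, keep);
            group.remove(group.size() - 1);
        }
    }

    /** The slide's path info with hypothetical points-to sets. */
    static Set<String> slideExample() {
        Map<String, List<String>> shadows = Map.of(
                "a", List.of("a1"), "b", List.of("b1", "b2"),
                "c", List.of("c1"), "f", List.of("f1", "f2"));
        Map<String, Set<String>> pointsTo = Map.of(
                "a1", Set.of("o1"), "b1", Set.of("o1"), "b2", Set.of("o2"),
                "c1", Set.of("o1", "o3"), "f1", Set.of("o1"), "f2", Set.of("o2"));
        return shadowsToKeep(List.of("a", "b", "c"), Set.of("f"), shadows, pointsTo);
    }

    public static void main(String[] args) {
        System.out.println(slideExample());
    }
}
```

With these made-up points-to sets, the branch through b2 is pruned immediately, so only a1, b1, c1 and the overlapping skip shadow f1 survive.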

Flow-insensitive analysis • This way, it usually runs within seconds. • in addition to the points-to analysis in the "cg" phase as mentioned previously • Precision depends heavily on the precision of the underlying points-to analysis. • Need context information for factory methods.

Flow-sensitive analysis Step 1 - Model construction With all remaining shadows... • Build a state machine for each method on a path from the entry point to a shadow. • Combine state machines interprocedurally to one large FSM. • Build thread contexts. (explained later)

Flow-sensitive analysis - example Step 1 - Model construction [Diagram: state machine with shadow edges sync(c=c3), aSyncIter(c=c4), aSyncIter(c=c1), sync(c=c4)]

Flow-sensitive analysis Step 2 - Fixed-point iteration

[Diagram: state machine with shadow edges sync(c=c3), aSyncIter(c=c4), aSyncIter(c=c1), sync(c=c4)]
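The fixed-point iteration can be pictured as a standard worklist algorithm that propagates sets of tracematch automaton states along the edges of the program model. The sketch below is heavily simplified and not the analysis from the talk: it ignores variable bindings, skip loops and threads, and it conservatively keeps the old state at every shadow because the shadow might bind a different object. `FixedPoint`, `Edge` and the encoding of the automaton as a nested map are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FixedPoint {
    /** Program edge; label is a symbol name, or "" for a statement without shadow. */
    static final class Edge {
        final int from;
        final int to;
        final String label;

        Edge(int from, int to, String label) {
            this.from = from;
            this.to = to;
            this.label = label;
        }
    }

    /**
     * delta maps automaton state -> (symbol -> successor state).
     * Returns, per program node, the automaton states that may hold there.
     */
    static Map<Integer, Set<Integer>> solve(List<Edge> edges, int entry, int initialState,
                                            Map<Integer, Map<String, Integer>> delta) {
        Map<Integer, Set<Integer>> result = new HashMap<>();
        result.computeIfAbsent(entry, k -> new HashSet<>()).add(initialState);
        Deque<Integer> worklist = new ArrayDeque<>();
        worklist.add(entry);
        while (!worklist.isEmpty()) {
            int node = worklist.remove();
            for (Edge e : edges) {
                if (e.from != node) {
                    continue;
                }
                Set<Integer> out = new HashSet<>();
                for (int q : result.get(node)) {
                    out.add(q); // conservative: the event may concern another object
                    Integer next = delta.getOrDefault(q, Map.of()).get(e.label);
                    if (next != null) {
                        out.add(next);
                    }
                }
                // Re-process the successor only if its state set grew.
                if (result.computeIfAbsent(e.to, k -> new HashSet<>()).addAll(out)) {
                    worklist.add(e.to);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Automaton for "sync asyncIter": 0 --sync--> 1 --asyncIter--> 2 (final)
        Map<Integer, Map<String, Integer>> delta =
                Map.of(0, Map.of("sync", 1), 1, Map.of("asyncIter", 2));
        // Program: sync, then a loop that iterates.
        List<Edge> program = List.of(new Edge(0, 1, "sync"), new Edge(1, 2, ""),
                new Edge(2, 3, "asyncIter"), new Edge(3, 2, ""));
        System.out.println(solve(program, 0, 0, delta));
    }
}
```

The state sets only grow and are bounded by the number of automaton states, so the loop terminates. A shadow whose target node never reaches the final state could be removed under these assumptions.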

Flow-sensitive analysis Problems • need to model multithreading soundly • also, it can take a long time to reach the fixed point in general • many possible combinations of shadows (arrows) and bindings (diamonds) • if points-to sets overlap a lot, disjuncts “travel long distances”, history components grow

Benchmarks Tested a number of different tracematches

Benchmarks ... on the entire DaCapo suite

Runtime overheads [Chart legend: runtime overheads > 10%; speedups due to different scheduling order]

After Quick check • removed entire overhead for 7 out of 18 benchmarks

After Flow-insensitive analysis • significant improvements in 6 cases • all shadows removed for lucene/LeakingSync • no improvements in remaining 5 cases

After Flow-sensitive analysis • no improvements whatsoever • ran out of memory for bloat/FailSafeIter and bloat/HasNext

Reasons for bad performance of flow-sensitive analysis • missing must-alias information for HasNext(Elem) • problems with overlapping points-to sets • have to cut off fixed point iteration too often • very conservative handling of multi-threading Suggestions are very welcome!

Runtime before optimizations (bloat/FailSafeIter and bloat/HasNext omitted)

Runtime after optimizations (bloat/FailSafeIter and bloat/HasNext omitted)

Conclusions • static program analysis can often improve the runtime performance of finite state monitors a lot • even flow-insensitive analysis works surprisingly well • precise points-to information is crucial • flow-sensitive analysis seems like a huge challenge, probably not worthwhile

Future work • handle negative updates through must-alias information • Nomair Naeem, now at Waterloo • incorporate must control-flow information • more fine-grained handling of threads • "may happen in parallel" analysis, Lin Li • asynchronous products / handshaking protocols • Can we use the same tricks as at runtime? • "garbage-collect" disjuncts • indexing

Discussion points • How can the analysis be improved? • Can it be reused for other formalisms? • Is flow-sensitivity really worth the effort? • How else can the analysis results be useful? • Technical Report abc-2006-4, http://www.aspectbench.org/
