Approaches to Reflective Method Invocation. Dr. Ian Rogers, Dr. Jisheng Zhao, and Prof. Ian Watson The University of Manchester. Third International Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems (ICOOOLPS 2008) July 7, Paphos, Cyprus.
Reflective Method Invocation • Motivation • Allow dynamic extension of applications by the use of methods/constructors not known at compile time • Uses include Java Beans, JNI code • Overheads • Creating representation • Invocation • Parameter boxing
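The three overheads listed above can be seen in a plain use of the standard `java.lang.reflect` API. This is a minimal sketch; the class `ReflectiveCall` and its `add` method are hypothetical names chosen for illustration and do not come from the talk.

```java
import java.lang.reflect.Method;

public class ReflectiveCall {
    // Hypothetical target; the caller only knows its name at run time.
    public static int add(int a, int b) {
        return a + b;
    }

    static int callByName(String methodName, int a, int b) {
        try {
            // Overhead 1: look up and create the Method object.
            Method m = ReflectiveCall.class.getMethod(methodName, int.class, int.class);
            // Overheads 2 and 3: a and b are boxed to Integer and packed into an
            // Object[] (varargs) before the call is dispatched through invoke.
            Object result = m.invoke(null, a, b);
            // The int result comes back boxed and is unboxed here.
            return (Integer) result;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callByName("add", 2, 3)); // prints 5
    }
}
```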
Implementation with out-of-line code • Out-of-line code is code that performs the bridge from regular Java bytecode to the dynamic method • Written in native code, C, assembler, etc. • Used in Jikes RVM – performance would indicate also used in IBM DK and BEA JRockit
Optimizing out-of-line code • Objects representing methods are immutable • Constant methods that are invoked can have the invocation and parameter boxing overheads eliminated • Constant methods are created by calls to pure routines or by chasing initialized final references
Bytecode generation • Bytecode to implement a reflective method call can be dynamically generated at runtime by creating a special class that performs the method invocation • Pros • Bytecode is interpreted so can boost performance of even interpreted code • Not reliant on finding method as a constant value • Cons • Cost of producing and storing bytecode • Used in Sun’s HotSpot VM
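To make the idea of a "special class that performs the method invocation" concrete, the sketch below writes such a class by hand. It is only an illustration: a VM using this approach would emit equivalent bytecode at run time, and the `Invoker` interface and `StringConcatInvoker` class are hypothetical names, not HotSpot's actual internals.

```java
public class GeneratedInvokerSketch {
    // Hypothetical interface that every generated class would implement.
    interface Invoker {
        Object invoke(Object target, Object[] args);
    }

    // Hand-written stand-in for the class that would be generated for
    // String.concat(String). The body is ordinary bytecode (casts plus a
    // direct call), so even an interpreter can run it without native glue.
    static final class StringConcatInvoker implements Invoker {
        @Override
        public Object invoke(Object target, Object[] args) {
            return ((String) target).concat((String) args[0]);
        }
    }

    static Object demo() {
        Invoker invoker = new StringConcatInvoker();
        return invoker.invoke("foo", new Object[] {"bar"});
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints foobar
    }
}
```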
Eager and lazy bytecode generation • Eager • Generate class on construction of method object • Field holding generated object can be final • Lazy • Generate class on first method invocation • Use hashtable to hold object potentially avoiding storage overhead • Use of pure methods can eliminate hashtable lookup in opt compiled code
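The eager and lazy strategies can be contrasted in a small sketch. Everything here is an assumption made for illustration: `generate` stands in for real class generation by simply wrapping `Method.invoke`, a `ConcurrentHashMap` plays the role of the hashtable, and the counter exists only to show when generation happens.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class InvokerCache {
    interface Invoker {
        Object invoke(Object target, Object[] args);
    }

    // Counts how many invokers have been "generated" (demo only).
    static final AtomicInteger GENERATED = new AtomicInteger();

    // Stand-in for run-time class generation: it merely wraps Method.invoke.
    static Invoker generate(Method m) {
        GENERATED.incrementAndGet();
        return (target, args) -> {
            try {
                return m.invoke(target, args);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        };
    }

    // Eager: pay the generation cost at construction; the field can be final.
    static final class EagerMethod {
        private final Invoker invoker;
        EagerMethod(Method m) { this.invoker = generate(m); }
        Object invoke(Object target, Object... args) { return invoker.invoke(target, args); }
    }

    // Lazy: generate on first invocation and keep the result in a hash table,
    // so methods that are never invoked cost no storage.
    static final class LazyMethods {
        private static final Map<Method, Invoker> INVOKERS = new ConcurrentHashMap<>();
        static Object invoke(Method m, Object target, Object... args) {
            return INVOKERS.computeIfAbsent(m, InvokerCache::generate).invoke(target, args);
        }
    }

    static Method lookup(Class<?> type, String name) {
        try {
            return type.getMethod(name);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Method length = lookup(String.class, "length");
        System.out.println(LazyMethods.invoke(length, "abcd"));
        System.out.println(LazyMethods.invoke(length, "ab"));
        System.out.println("invokers generated: " + GENERATED.get());
    }
}
```

In the lazy variant every call pays for a table lookup; the slide's point is that an optimizing compiler can remove that lookup when it knows the lookup routine is pure and the method object is constant.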
Synthetic performance simplification
Conclusions • Maximum performance achievable by simplification or bytecode generation • Bytecode generation cheap enough to beat simplification • Eager bytecode generation gives best DaCapo execution time improvement • Lazy bytecode generation gives best DaCapo mean speed up
Pure Method Analysis within Jikes RVM Dr. Jisheng Zhao, Dr. Ian Rogers, Dr. Chris Kirkham and Prof. Ian Watson The University of Manchester Third International Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems (ICOOOLPS 2008) July 7, Paphos, Cyprus
What’s a pure method • Greek next day of the week • Input day of week • Output next day of week
Literal argument gives another literal as a result • DEUTERA -> TRITH • TRITH -> TETARTH • TETARTH -> PEMPTH • PEMPTH -> PARASKEUH • PARASKEUH -> SABBATO • SABBATO -> KURIAKH • KURIAKH -> DEUTERA
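The mapping above could be written as the following pure method. This is one possible sketch rather than the code used in the talk: the class name `GreekDays` and the array-based implementation are assumptions.

```java
public class GreekDays {
    enum Day { DEUTERA, TRITH, TETARTH, PEMPTH, PARASKEUH, SABBATO, KURIAKH }

    // The array itself is mutable storage even though the field is final,
    // so proving purity needs evidence that nobody writes to it after setup.
    private static final Day[] DAYS = Day.values();

    // Pure: the result depends only on the argument and nothing is modified.
    static Day getNextDay(Day day) {
        return DAYS[(day.ordinal() + 1) % DAYS.length];
    }

    public static void main(String[] args) {
        System.out.println(getNextDay(Day.TRITH));   // prints TETARTH
        System.out.println(getNextDay(Day.KURIAKH)); // prints DEUTERA
    }
}
```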
How can we optimize? • getNextDay(TRITH) The answer can only be TETARTH
How can we optimize? • X = getToday(); • Y = getNextDay(X); • … • Z = getNextDay(X); Must generate the same result, so copy 1st result
Other optimizations • Escape analysis • Can eliminate synchronization if method object is passed to pure method • Dead code elimination • Unused results of pure methods that don’t throw exceptions can have instructions eliminated • Memoization
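Memoization, the last item above, can be sketched as a wrapper that caches results per argument. The `Memoizer` class is a hypothetical helper written for this illustration; the wrapper is only valid when the wrapped function is pure, because the cached result replaces every later call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class Memoizer {
    // Wraps f so that each distinct argument is computed at most once.
    // Not thread-safe; a sketch for single-threaded use.
    static <A, R> Function<A, R> memoize(Function<A, R> f) {
        Map<A, R> cache = new HashMap<>();
        return arg -> {
            if (cache.containsKey(arg)) {
                return cache.get(arg); // reuse the earlier result
            }
            R result = f.apply(arg);
            cache.put(arg, result);
            return result;
        };
    }

    public static void main(String[] args) {
        int[] calls = {0}; // instrumentation for the demo only
        Function<Integer, Integer> square = memoize(x -> {
            calls[0]++;
            return x * x;
        });
        System.out.println(square.apply(7)); // prints 49
        System.out.println(square.apply(7)); // prints 49
        System.out.println(calls[0]);        // prints 1
    }
}
```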
Knowing something is pure • Implementation of getNextDay may use a map or other potentially mutable data storage • Stationary field analysis, amongst others, shows this is unlikely [Unkel and Lam ’08, Rogers, Zhao and Watson ’08]
Means of determining purity • Programmer-provided annotations • Simple bytecode analysis • e.g. a method having a method call, load or store wouldn’t be pure • Optimizing compiler analysis • examine bytecode after optimization to determine purity
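The simple bytecode analysis can be modelled as a single conservative scan over a method body. The sketch below is a toy: the `Op` categories are invented for illustration (real bytecode has many more opcodes), and it assumes that "load or store" on the slide means heap accesses to fields and arrays rather than local variables.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class SimplePurityCheck {
    // Toy instruction categories, not real JVM opcodes.
    enum Op {
        ARITHMETIC, LOCAL_LOAD, LOCAL_STORE, BRANCH, RETURN,
        FIELD_LOAD, FIELD_STORE, ARRAY_LOAD, ARRAY_STORE, INVOKE
    }

    // Assumption: any call or heap access disqualifies the method.
    private static final Set<Op> IMPURE = EnumSet.of(
            Op.FIELD_LOAD, Op.FIELD_STORE, Op.ARRAY_LOAD, Op.ARRAY_STORE, Op.INVOKE);

    // Conservative: "false" means "not provably pure by this simple test",
    // which is why a later optimizing-compiler pass can find more pure methods.
    static boolean isSimplyPure(List<Op> body) {
        for (Op op : body) {
            if (IMPURE.contains(op)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Op> addOne = Arrays.asList(Op.LOCAL_LOAD, Op.ARITHMETIC, Op.RETURN);
        List<Op> getter = Arrays.asList(Op.LOCAL_LOAD, Op.FIELD_LOAD, Op.RETURN);
        System.out.println(isSimplyPure(addOne)); // prints true
        System.out.println(isSimplyPure(getter)); // prints false
    }
}
```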
Conclusions • Pure methods provide optimization opportunities • 1469 methods are determined to be pure in Jikes RVM boot image through simple analysis • Optimizing compiler analysis provides further runtime improvement • Runtime optimizing compiler analysis limited as few methods are compiled by optimizing compiler, simple analysis still possible
Related work • A. Salcianu and M. Rinard • Static offline analysis handling pointers to objects created within method • Haiying Xu, Christopher J. F. Pickett, and Clark Verbrugge • Multiple levels of pure-ness found through SOOT framework • Basis for memoization optimization in an interpreter that fails to regain the overhead
Boot Image Layout for Jikes RVM Dr. Ian Rogers, Dr. Jisheng Zhao, and Prof. Ian Watson The University of Manchester Third International Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems (ICOOOLPS 2008) July 7, Paphos, Cyprus
What is boot image layout? • Boot image captures the state of a VM when it starts • This state includes • code to run when the VM starts • objects required for that execution • As no threads are active the only live objects are literals or static fields
Depth-first traversal • [Diagram, animated over six slides: static field Foo foo references a Foo object (String value1, String value2); each String object (char[] value, int count, int offset) references a char[] (int length, char…)] • Objects are written to the boot image in the order char[], String, char[], String, Foo
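The order in which the depth-first slides fill the boot image (char[], String, char[], String, Foo) matches a post-order walk, where an object is written after everything it references. The sketch below reproduces that order on a toy object graph; the `Obj` class is a hypothetical model, not the Jikes RVM boot image writer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class DepthFirstLayout {
    // Toy heap object: a type name plus outgoing references.
    static final class Obj {
        final String type;
        final List<Obj> refs;
        Obj(String type, Obj... refs) {
            this.type = type;
            this.refs = Arrays.asList(refs);
        }
    }

    // Returns the type names in the order objects would be laid out.
    static List<String> layout(Obj root) {
        List<String> image = new ArrayList<>();
        Set<Obj> seen = Collections.newSetFromMap(new IdentityHashMap<Obj, Boolean>());
        visit(root, seen, image);
        return image;
    }

    private static void visit(Obj o, Set<Obj> seen, List<String> image) {
        if (!seen.add(o)) {
            return; // already laid out
        }
        for (Obj ref : o.refs) {
            visit(ref, seen, image); // referents first
        }
        image.add(o.type); // then the object itself
    }

    // The graph from the slides: Foo -> two Strings -> one char[] each.
    static Obj example() {
        Obj value1 = new Obj("String", new Obj("char[]"));
        Obj value2 = new Obj("String", new Obj("char[]"));
        return new Obj("Foo", value1, value2);
    }

    public static void main(String[] args) {
        System.out.println(layout(example())); // prints [char[], String, char[], String, Foo]
    }
}
```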
Problems • References within objects need to be scanned by a stop-the-world GC • References are distributed throughout boot image • Can optimize this by observing that only mutable references need to be scanned
Visualizing depth-first traversal’s references Red = Reference White = Non-reference 2666 pages contain references
Breadth-first traversal • [Diagram, animated over five slides: the same object graph, with a queue holding the references still to be visited] • Objects are written to the boot image in the order Foo, String, String, char[], char[]
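The breadth-first slides end with the boot image holding Foo, String, String, char[], char[], so the reference-carrying objects sit together at the front. A minimal queue-based sketch on the same toy graph follows; `Obj` is again a hypothetical model rather than the real boot image writer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class BreadthFirstLayout {
    // Toy heap object: a type name plus outgoing references.
    static final class Obj {
        final String type;
        final List<Obj> refs;
        Obj(String type, Obj... refs) {
            this.type = type;
            this.refs = Arrays.asList(refs);
        }
    }

    static List<String> layout(Obj root) {
        List<String> image = new ArrayList<>();
        Set<Obj> seen = Collections.newSetFromMap(new IdentityHashMap<Obj, Boolean>());
        Deque<Obj> queue = new ArrayDeque<>();
        seen.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            Obj o = queue.remove(); // oldest queued reference first
            image.add(o.type);      // write the object
            for (Obj ref : o.refs) {
                if (seen.add(ref)) {
                    queue.add(ref); // defer its referents
                }
            }
        }
        return image;
    }

    // The graph from the slides: Foo -> two Strings -> one char[] each.
    static Obj example() {
        Obj value1 = new Obj("String", new Obj("char[]"));
        Obj value2 = new Obj("String", new Obj("char[]"));
        return new Obj("Foo", value1, value2);
    }

    public static void main(String[] args) {
        System.out.println(layout(example())); // prints [Foo, String, String, char[], char[]]
    }
}
```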
Visualizing breadth-first traversal’s references Red = Reference White = Non-reference 2017 pages contain references
Prioritized traversal • Breadth-first traversal queues references • The next element to be removed from a queue can be prioritized • Make the breadth first queue a prioritized queue and then implement comparators
Criteria for prioritization • Name of object’s class • Object’s type reference ID • Size of object • Number of references within object • Number of mutable references within object • Density – references ÷ object size • These criteria can be chained • the next comparator is used when the result of the comparison is identical • Future work: profiling, in particular allocation frequency
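Chaining criteria maps naturally onto Java's `Comparator.thenComparing`, with a `PriorityQueue` replacing the breadth-first queue. The sketch below orders by density (highest first) and breaks ties by class name. The choice and direction of criteria, the object sizes, and the extra `int[]` field are all assumptions made for illustration; the talk does not state which direction each criterion was sorted in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

public class PrioritizedLayout {
    // Toy heap object with a made-up size in bytes.
    static final class Obj {
        final String type;
        final int size;
        final List<Obj> refs;
        Obj(String type, int size, Obj... refs) {
            this.type = type;
            this.size = size;
            this.refs = Arrays.asList(refs);
        }
        double density() { return (double) refs.size() / size; }
    }

    // Chained criteria: the next comparator decides only when the previous ties.
    static final Comparator<Obj> PRIORITY =
            Comparator.comparingDouble((Obj o) -> o.density()).reversed()
                      .thenComparing((Obj o) -> o.type);

    static List<String> layout(Obj root) {
        List<String> image = new ArrayList<>();
        Set<Obj> seen = Collections.newSetFromMap(new IdentityHashMap<Obj, Boolean>());
        PriorityQueue<Obj> queue = new PriorityQueue<>(PRIORITY);
        seen.add(root);
        queue.add(root);
        while (!queue.isEmpty()) {
            Obj o = queue.remove(); // highest-priority pending object
            image.add(o.type);
            for (Obj ref : o.refs) {
                if (seen.add(ref)) {
                    queue.add(ref);
                }
            }
        }
        return image;
    }

    // Foo -> int[] plus two Strings; each String -> one char[].
    static Obj example() {
        Obj data = new Obj("int[]", 32);
        Obj value1 = new Obj("String", 24, new Obj("char[]", 32));
        Obj value2 = new Obj("String", 24, new Obj("char[]", 32));
        return new Obj("Foo", 24, data, value1, value2);
    }

    public static void main(String[] args) {
        System.out.println(layout(example()));
        // prints [Foo, String, String, char[], char[], int[]]
    }
}
```

A plain breadth-first walk of the same graph would write the `int[]` straight after `Foo`; with the prioritized queue, all objects that contain references come first and the reference-free arrays are grouped at the end.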
Different prioritization approaches and number of pages used
Results from prioritization • Name is a better prioritization criterion than type reference ID • Places arrays of primitive types together • Density ignoring final fields is better than density including final fields • Best scheme can avoid references on ~5 MB worth of pages (5,316,608 bytes) when compared to depth-first traversal
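The byte figure can be related to the page counts quoted on the visualization slides. The following is a worked check under two assumptions that the slides do not state: that pages are 4096 bytes, and that the baseline is the 2666 reference-containing pages reported for depth-first traversal.

```latex
\frac{5{,}316{,}608\ \text{bytes}}{4096\ \text{bytes/page}} = 1298\ \text{pages},
\qquad
2666 - 1298 = 1368\ \text{pages still containing references}
```

The division comes out to a whole number of pages, which is consistent with the 4096-byte assumption.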
Visualizing different approaches Depth-first Breadth-first Prioritized
Conclusions • Boot image layout using a prioritized traversal can dramatically reduce the pages that are traversed during a stop-the-world GC • Effect on performance negligible • Careful selection of benchmark may demonstrate improvement
Related work • Layout of objects is considered for object inlining as well as for copying GCs • Xianglong Huang, Stephen M. Blackburn, Kathryn S. McKinley, J. Eliot B. Moss, Zhenlin Wang, and Perry Cheng. The garbage collection advantage: Improving program locality. • Wen-ke Chen, Sanjay Bhansali, Trishul M. Chilimbi, Xiaofeng Gao, and Weihaw Chuang. Profile-guided proactive garbage collection for locality optimization. • Michael S. Lam, Paul R. Wilson, and Thomas G. Moher. Object type directed garbage collection to improve locality.