R Unbox Variable Lookup Optimization Apr. 2012
Outline • Unbox Variable Cache Structure • Unbox Variable Lookup Instructions and Compiler Transformation • Performance Evaluation • Obvious Performance Gain in ForLoopAdd example
Unbox Variable Cache Structure • A software cache for unboxed scalar variables • Data: • 64 bits per cell: stores an int or a real • Length: the size of the current frame's constant table • The symbol's index is used to access the cache directly • May waste some space, but is simple • State: • Currently uses 4 bits per cell • Bits 0-1: state (INVALID, VALID, MODIFIED) • Bits 2-3: cache type (Logical, Int, Real) • Counter for each frame • modified_count: the total number of modified cache cells
[Figure: the current frame with its 64-bit data cache and its state cache]
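The layout described on this slide can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the class name `UnboxCache`, the numeric codes chosen for the states and types, and the choice to store reals as raw IEEE-754 bits are all assumptions; only the 64-bit cells, the 4-bit state layout, the direct indexing by symbol index, and `modified_count` come from the slide.

```python
import struct

# Bits 0-1 hold the state, bits 2-3 hold the cache type (layout from the slide).
# The numeric codes below are assumed for illustration.
INVALID, VALID, MODIFIED = 0, 1, 2
LOGICAL, INT, REAL = 0, 1, 2

MASK64 = 0xFFFFFFFFFFFFFFFF


class UnboxCache:
    """Per-frame cache indexed directly by constant-table symbol index."""

    def __init__(self, const_table_size):
        # One 64-bit data cell and one 4-bit state cell per constant-table slot.
        # Slots that never name a scalar variable are simply wasted.
        self.data = [0] * const_table_size
        self.state = [0] * const_table_size  # 0 decodes as INVALID
        self.modified_count = 0

    def store(self, index, value, ctype, state):
        """Write a scalar into cell `index` with the given type and state."""
        if ctype == REAL:
            # Keep the raw IEEE-754 bit pattern of the double.
            bits = struct.unpack("<Q", struct.pack("<d", value))[0]
        else:
            # Ints and logicals are kept as 64-bit two's complement.
            bits = int(value) & MASK64
        if state == MODIFIED and (self.state[index] & 0b11) != MODIFIED:
            self.modified_count += 1
        self.data[index] = bits
        self.state[index] = (ctype << 2) | state

    def load(self, index):
        """Return (state, type, value) decoded from cell `index`."""
        packed = self.state[index]
        state, ctype = packed & 0b11, (packed >> 2) & 0b11
        bits = self.data[index]
        if ctype == REAL:
            value = struct.unpack("<d", struct.pack("<Q", bits))[0]
        elif ctype == INT:
            value = bits - (1 << 64) if bits >> 63 else bits
        else:
            value = bool(bits)
        return state, ctype, value
```

Indexing by symbol index trades space for speed: a lookup is a single array access with no hashing or environment walk.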
Cache State • Simple state change policies
[State diagram: states INVALID, VALID, MODIFIED; edge labels: "first time get var and unbox", "get var", "set unbox var", "write back", "get var/set unbox var"]
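One plausible reading of the state diagram is the table-driven sketch below. Which edge connects which pair of states is an assumption inferred from the edge labels (in particular, that a write-back returns a MODIFIED cell to VALID); the event names `get`, `set` and `write_back` are hypothetical shorthand for those labels.

```python
INVALID, VALID, MODIFIED = "INVALID", "VALID", "MODIFIED"

# (current state, event) -> next state.
# Edge placement is assumed from the labels on the slide, not confirmed.
TRANSITIONS = {
    (INVALID, "get"): VALID,           # first get: look up, unbox, fill the cell
    (VALID, "get"): VALID,             # later gets hit the cache
    (INVALID, "set"): MODIFIED,        # set unbox var
    (VALID, "set"): MODIFIED,          # set unbox var
    (MODIFIED, "get"): MODIFIED,       # get var on a dirty cell
    (MODIFIED, "set"): MODIFIED,       # set unbox var on a dirty cell
    (MODIFIED, "write_back"): VALID,   # boxed copy now matches the cell
}


def step(state, event):
    """Return the next state; events with no listed edge leave it unchanged."""
    return TRANSITIONS.get((state, event), state)


def run_events(events, state=INVALID):
    """Apply a sequence of events starting from `state`."""
    for event in events:
        state = step(state, event)
    return state
```

Under this reading, only MODIFIED cells need any work at write-back time, which is what makes a per-frame modified counter useful as a fast check.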
Unbox Variable Lookup Instructions • Get variable • GETLOGICALUNBOX, GETINTUNBOX, GETREALUNBOX • Each merges the semantics of GETVAR, GUARD and UNBOX • Three operands: 1) symbol index; 2) expected type; 3) guard failure PC • Set variable • SETUNBOXVAR • Writes a scalar value into the cache • A new variable is not yet defined in the current frame at this point; it is defined at write-back • Pop a value from the stack • POPUNBOX • Slightly different from POP: it must also pop the unbox type stack • Write back modified scalar variables • UNBOXWRITEBACK • Boxes all modified variables and sets them in the current frame
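The instruction semantics listed above might look roughly like this in an interpreter loop. The sketch is illustrative only: `Frame`, the function names, the use of one-element Python lists as stand-ins for boxed R vectors, and the decision to re-check the type on a cache hit are all assumptions rather than details taken from the slides.

```python
INVALID, VALID, MODIFIED = range(3)


class Frame:
    """Hypothetical frame: boxed variables plus the unbox cache."""

    def __init__(self, n_symbols):
        self.env = {}                        # symbol index -> boxed value (1-element list)
        self.cache = [None] * n_symbols      # unboxed scalars
        self.state = [INVALID] * n_symbols
        self.modified_count = 0


def get_unbox(frame, stack, index, expected_type, fail_pc, next_pc):
    """GETxxxUNBOX: GETVAR + GUARD + UNBOX. Returns the PC to continue at."""
    if frame.state[index] == INVALID:
        boxed = frame.env.get(index)                       # GETVAR
        if boxed is None or len(boxed) != 1 or type(boxed[0]) is not expected_type:
            return fail_pc                                 # GUARD failed
        frame.cache[index] = boxed[0]                      # UNBOX into the cache
        frame.state[index] = VALID
    elif type(frame.cache[index]) is not expected_type:
        return fail_pc                                     # assumed: guard also checks hits
    stack.append(frame.cache[index])
    return next_pc


def set_unbox_var(frame, stack, index):
    """SETUNBOXVAR: copy the scalar on top of the stack into the cache."""
    if frame.state[index] != MODIFIED:
        frame.modified_count += 1
    frame.cache[index] = stack[-1]   # the value stays on the stack for a later pop
    frame.state[index] = MODIFIED


def unbox_write_back(frame):
    """UNBOXWRITEBACK: box every modified cell and define it in the frame."""
    if frame.modified_count == 0:
        return
    for index, state in enumerate(frame.state):
        if state == MODIFIED:
            frame.env[index] = [frame.cache[index]]
            frame.state[index] = VALID
    frame.modified_count = 0
```

Note that a variable written with `set_unbox_var` does not appear in `frame.env` until `unbox_write_back` runs, which mirrors the "define it when writing back" rule.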
Unbox Variable Lookup Compiler Transformation • Current compilation passes • Decoding Pass • Build Jump Target • Type Annotation Pass • Unbox Opportunity Identification Pass • Code Unbox Opt Transformation Pass • Will use the new instructions • Code Clean Pass • PC Fix Pass • Jump Target Fix • Encoding Pass • [Callout: add new policies]
Unbox Opportunity Identification Pass • SETVAR • Original: the top stack element should be a boxed value • New: the top stack element may be unboxed • This can expose more unbox opportunities
PC  STMT
1   LDCONST, 1
3   SETVAR, 2
5   POP
[Callout: can unbox this one]
Code Unbox Opt Transformation Pass • SETVAR • If the top stack element is a scalar value • SETVAR → UNBOX; SETUNBOXVAR; BOX • Insert a GUARD if needed (when the top stack element's type comes from the profile) • The final BOX keeps the stack's shape unchanged throughout this pass • CALL, RETURN • Add UNBOXWRITEBACK in front of each
Before:
PC  STMT
1   LDCONST, 1   //Real
3   SETVAR, 2
5   POP
After:
PC  STMT
1   LDCONST, 1
3   UNBOXREAL
4   SETUNBOXVAR, 2
6   BOXREAL
7   POP
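As a rough sketch of this rewrite, the pass can be modeled as a single walk over the instruction list. The tuple encoding of instructions, the function name `unbox_transform`, and the `scalar_setvars` map (standing in for the result of the type annotation and identification passes) are assumptions; GUARD insertion and PC renumbering are left out.

```python
def unbox_transform(code, scalar_setvars):
    """Rewrite SETVAR on scalars and insert write-backs before CALL/RETURN.

    code:           list of (opcode, *operands) tuples
    scalar_setvars: {position in code: "REAL" | "INT" | "LOGICAL"} for each
                    SETVAR whose top-of-stack value is known to be a scalar
    """
    out = []
    for pos, instr in enumerate(code):
        op = instr[0]
        if op == "SETVAR" and pos in scalar_setvars:
            scalar_type = scalar_setvars[pos]
            out.append(("UNBOX" + scalar_type,))        # unbox the value on the stack
            out.append(("SETUNBOXVAR",) + instr[1:])    # write it into the cache
            out.append(("BOX" + scalar_type,))          # restore the stack's shape
        elif op in ("CALL", "RETURN"):
            out.append(("UNBOXWRITEBACK",))             # flush modified cells first
            out.append(instr)
        else:
            out.append(instr)
    return out
```

Re-boxing immediately after SETUNBOXVAR looks wasteful, but it means later instructions still see the stack they expect; removing the redundant pair is left to a later cleanup pass.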
Code Clean Pass (Instruction Combine) • GETVAR + GUARD + UNBOX • Depending on the type, transformed into GETLOGICALUNBOX, GETINTUNBOX or GETREALUNBOX • BOX + POP • Transformed into POPUNBOX • There is no need to box again, but the unbox type stack must still be popped
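The two combine rules can be sketched as a small peephole pass. The tuple encoding, the operand layout assumed for GUARD (expected type, then failure PC), and the opcode names UNBOXINT, UNBOXLOGICAL, BOXINT and BOXLOGICAL (formed by analogy with UNBOXREAL and BOXREAL) are assumptions; a real pass would also have to avoid combining across jump targets.

```python
# Typed unbox opcode -> merged lookup opcode (names partly assumed by analogy).
UNBOX_TO_GET = {
    "UNBOXLOGICAL": "GETLOGICALUNBOX",
    "UNBOXINT": "GETINTUNBOX",
    "UNBOXREAL": "GETREALUNBOX",
}
BOX_OPS = {"BOXLOGICAL", "BOXINT", "BOXREAL"}


def clean(code):
    """Combine GETVAR+GUARD+UNBOX and BOX+POP into single instructions."""
    out, i = [], 0
    while i < len(code):
        a = code[i]
        b = code[i + 1] if i + 1 < len(code) else None
        c = code[i + 2] if i + 2 < len(code) else None
        if (a[0] == "GETVAR" and b is not None and b[0] == "GUARD"
                and c is not None and c[0] in UNBOX_TO_GET):
            # Operands: symbol index, expected type, guard failure PC.
            out.append((UNBOX_TO_GET[c[0]], a[1], b[1], b[2]))
            i += 3
        elif a[0] in BOX_OPS and b is not None and b[0] == "POP":
            # The boxed value would be discarded anyway, so skip boxing it.
            out.append(("POPUNBOX",))
            i += 2
        else:
            out.append(a)
            i += 1
    return out
```

Matching on an explicit opcode set rather than an "UNBOX" prefix matters here, since UNBOXWRITEBACK shares that prefix but is not a typed unbox.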
Transformation Example • RealAdd
run <- function() { a <- 101; b <- a+202; print(b); };
Original:
PC  STMT
1   LDCONST, 1
3   SETVAR, 2
5   POP
6   GETVAR, 2
8   LDCONST, 3
10  ADD, 4
12  SETVAR, 5
14  POP
15  GETFUN, 6
17  MAKEPROM, 7
19  CALL, 8
21  RETURN
Optimized:
PC  STMT
1   LDCONSTREAL, 1
3   SETUNBOXVAR, 2
5   POP_UNBOX
6   GETREALUNBOX, 2, 2, 8
10  LDCONSTREAL, 3
12  REALADD
13  SETUNBOXVAR, 5
15  POP_UNBOX
16  GETFUN, 6
18  MAKEPROM, 7
20  UNBOXWRITEBACK
21  CALL, 8
23  UNBOXWRITEBACK
24  RETURN
Special Handling in ForLoop • The ForLoop updates the loop variable in STEPFOR • The semantics resemble those of a GETVAR • If the loop variable is Logical/Integer/Real • Just get the scalar value and load it into the cache • Save the value • Change the cache state to VALID
Performance Evaluation on ForLoop Examples • Examples: ForIntAdd, ForIntAdd3 • Experiment methodology • Running method • Run “run()” 10 times • 1st time: profile • 2nd time: trigger compilation/optimization and start using the new code • 3rd-10th times: just use the optimized code • All of the following results are normalized • Average of 5 runs in each case • Normalized using only the 3rd-10th runs • And normalized to 1 iteration
ForIntAdd:
run <- function() { r <- 11; for( i in 1:1000000) { r <- r + i; } print(r); };
ForIntAdd3:
run <- function() { r <- 11; for( i in 1:1000000) { r <- r + 1 + i + 2; } print(r); };
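The normalization described above can be written out as a short calculation. This is a sketch under one reading of the methodology: each of the 5 runs is taken to be a separate trial of 10 calls to `run()`, the first two calls of each trial are discarded as warm-up, and the result is divided by the loop's iteration count. The function name and the timing values are made up for illustration and are not measurements from the slides.

```python
from statistics import mean


def per_iteration_time(trials, iterations, warmup_calls=2):
    """Average steady-state time per loop iteration.

    trials:       one list of per-call timings (seconds) for each trial
    iterations:   loop iterations executed by one call to run()
    warmup_calls: leading calls to discard (profiling call + compiling call)
    """
    # Keep only the calls that ran purely optimized code (3rd to 10th).
    steady = [mean(calls[warmup_calls:]) for calls in trials]
    # Average across trials, then normalize to a single iteration.
    return mean(steady) / iterations


# Synthetic timings, NOT measurements: 5 trials of 10 calls each.
synthetic_trials = [[5.0, 9.0] + [2.0] * 8 for _ in range(5)]
synthetic_result = per_iteration_time(synthetic_trials, iterations=1_000_000)
```

Dropping the first two calls keeps the one-off profiling and compilation costs out of the steady-state figure.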
Performance Result • ForIntAdd • ForIntAdd3
Analysis - Performance Gain Source • Much more efficient variable lookup • Much less object creation during SETVAR • ForLoop example: the same effect as hoisting all box/unbox operations out of the loop
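The hoisting effect can be illustrated by counting allocations in a toy model of the ForIntAdd loop. Everything here is a simplified stand-in: `Box` models a boxed R scalar, a dictionary models the frame, and a Python local models the cache cell. It shows only the difference in object creation, not the real interpreter's behavior.

```python
class Box:
    """Toy boxed scalar that counts how many boxes have been created."""
    created = 0

    def __init__(self, value):
        Box.created += 1
        self.value = value


def boxed_loop(n):
    """Every assignment boxes a fresh object, as plain SETVAR would."""
    env = {"r": Box(11.0)}
    for i in range(1, n + 1):
        env["i"] = Box(i)                                  # loop variable update
        env["r"] = Box(env["r"].value + env["i"].value)    # unbox, add, box, set
    return env["r"].value


def cached_loop(n):
    """Scalars stay unboxed in the loop; one box is created at write-back."""
    env = {"r": Box(11.0)}
    r = env["r"].value           # unbox once, before the loop
    for i in range(1, n + 1):
        r = r + i                # all work on unboxed scalars
    env["r"] = Box(r)            # write back once, after the loop
    return env["r"].value
```

In this model the boxed version creates two objects per iteration, while the cached version creates two objects in total regardless of the iteration count.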
Next Step • Work on larger test cases • Code pieces extracted from real benchmarks • E.g. Shootout • Add new instructions/compiler transformations • To support the code used in these new test cases