Andy Wharmby. Understanding PHP Opcodes. PHP Opcodes. Presentation splits into 3 sections Generation of opcodes ZEND_COMPILE Generation of the Interpreter code Interpreter comes in many flavours!! Execution of opcodes ZEND_EXECUTE. Execution path for a script.
Andy Wharmby Understanding PHP Opcodes
PHP Opcodes
Presentation splits into 3 sections:
- Generation of opcodes: ZEND_COMPILE
- Generation of the interpreter code (the interpreter comes in many flavours)
- Execution of opcodes: ZEND_EXECUTE
Execution path for a script php_execute_script() zend_execute_scripts() zend_compile_file() zend_execute() user call (function/method) include/require
zend_compile_file
- Function pointer that can be overridden to call an alternative compiler, e.g. by B-compiler
- By default resolves to a call to compile_file() in zend_language_scanner.c
- Compilation is broken down into 2 steps:
  - Lexical analysis of the source PHP script into tokens
  - Parsing of the resulting tokens into opcodes
[Diagram: PHP script -> Lexical Analyser -> tokens -> Parser -> byte codes]
Lexical Analysis
- Lexical analyser code in zend_language_scanner.c is generated from zend_language_scanner.l using "flex"
- Exposed to userspace as token_get_all()
    <?php
    $tokens = token_get_all("<?php echo 'Hello World'; ?>");
    foreach ($tokens as $token) {
        if (is_array($token)) {
            // array token: [0] is the token id, [1] is the matched text
            printf("%s \t %s\n", token_name($token[0]), $token[1]);
        } else {
            // single-character token returned as a plain string
            printf("\t'%s'\n", $token);
        }
    }
    ?>
Output:
    T_OPEN_TAG                    <?php
    T_ECHO                        echo
    T_WHITESPACE
    T_CONSTANT_ENCAPSED_STRING    'Hello World'
        ';'
    T_CLOSE_TAG                   ?>
Parsing
- Next the tokens are compiled into opcodes
- Parser code is in zend_language_parser.c, which is generated from zend_language_parser.y by "Bison"
- Calls code in zend_compile.c to generate opcodes
[Diagram: tokens (T_OPEN_TAG <?php, T_ECHO echo, T_WHITESPACE, T_CONSTANT_ENCAPSED_STRING 'Hello World', ';', T_CLOSE_TAG ?>) -> Parser -> ZEND_OP, ZEND_OP, ZEND_OP, ZEND_OP]
Non-PHP statements
- What does the compiler do with any non-PHP statements in the input script, e.g. HTML?
- All such statements are compiled into ECHO statements
- So at execution time the statements are just output as is
    <!-- example for PHP 5.0.0 final release -->
    <?php
    $domain = "localhost";
    $user = "root"; #note "MIKE" is unacceptable
    $password = "";
    $conn = mysql_connect( $domain, $user, $password );
    if($conn) {
        $msg = "Congratulations !!!! $user, You connected to MySQL";
    }
    ?>
    <html>
     <head>
      <title>Connecting user</title>
     </head>
     <body>
      <h3>
       <?php echo( $msg ); ?>
      </h3>
     </body>
    </html>
Non-PHP statements
    line   #   op                     fetch  ext  operands
    -------------------------------------------------------------------------------
    3      0   ECHO                               '%3C%21--+example+for+PHP+5.0.0+final+release+--%3E%0D%0A%0D%0A'
    5      1   ASSIGN                             !0, 'localhost'
    6      2   ASSIGN                             !1, 'root'
    7      3   ASSIGN                             !2, ''
    9      4   INIT_FCALL_BY_NAME                 'mysql_connect'
               …….. snip ….
           22  ADD_STRING                         ~5, ~5, 'MySQL'
           23  ASSIGN                             !4, ~5
    14     24  JMP                                ->25
    26     25  ECHO                               '%0D%0A%3Chtml%3E%0D%0A%0D%0A+%3Chead%3E%0D%0A++%3Ctitle%3EConnecting+user%3C%2Ftitle%3E%0D%0A+%3C%2Fhead%3E%0D%0A%0D%0A+%3Cbody%3E%0D%0A++%3Ch3%3E+%0D%0A+++'
           26  ECHO                               !4
    30     27  ECHO                               '+%0D%0A++%3C%2Fh3%3E%0D%0A+%3C%2Fbody%3E%0D%0A%0D%0A%3C%2Fhtml%3E'
           28  RETURN                             1
           29  ZEND_HANDLE_EXCEPTION
Opcodes
Each opcode consists of:
- Opcode handler
- 1 or 2 input operands
- Optional result operand
- Optional "extended value"; its meaning is opcode dependent, e.g. on a ZEND_CAST it defines the target type
- Line number in the original source script
- Opcode number, range 0 - 151; all are listed in zend_vm_opcodes.h
Total size of each zend_op is 96 bytes
Some operations consist of 2 opcodes, e.g. ZEND_ASSIGN_OBJ, where the 2nd opcode is set to ZEND_OP_DATA
    struct _zend_op {
        opcode_handler_t handler;
        znode result;
        znode op1;
        znode op2;
        ulong extended_value;
        uint lineno;
        zend_uchar opcode;
    };
znode
- One for each operand and result; each znode is 24 bytes
- Type can be as follows:
  - IS_CONST (0x1): program literal
  - IS_TMP_VAR (0x2): temporary variable with no name, an intermediate result
  - IS_VAR (0x4): temporary variable with a name defined in the symbol table
  - IS_UNUSED (0x8): operand not specified
  - IS_CV (0x10): optimized version of VAR
- For some opcodes the type of znode is implied, e.g. for a JMP opcode the op1 znode defines the jump target address in "jmp_addr"
- EA defines "extended attributes" whose meaning is opcode dependent, e.g. on a ZEND_UNSET_VAR it defines if the variable is static or not
    typedef struct _znode {
        int op_type;
        union {
            zval constant;
            zend_uint var;
            zend_uint opline_num;
            zend_op_array *op_array;
            zend_op *jmp_addr;
            struct {
                zend_uint var; /* dummy */
                zend_uint type;
            } EA;
        } u;
    } znode;
zend_compile_file()
- Returns a pointer to the zend_op_array for global scope
  - despite the name, it is not an array but a structure
- zend_op_array contains a pointer to an array of opcodes, plus much more including:
  - pointer to an array of compiled variable details. More on these later.
  - count of the number of temporaries (TMP + VAR) required by the opcodes, i.e. the number of Zend Engine registers used
  - pointer to a hash table for all statics defined by the function; the Hashtable is created and populated by the compiler if needed
- Compiler produces one zend_op_array for:
  - global scope: this is the one returned to the caller of zend_compile_file and is saved in EG(active_op_array)
  - each user function: added to the thread's function table by the compiler
  - each user class method: added to the function table for the class by the compiler
zend_compile_file()
- Initial opcode array allocated by init_op_array() in zend_opcode.c
  - allocated from heap
  - sufficient for just 64 opcodes
- Reallocated by get_next_op() each time it is full
  - the new array is 4 times the current size
- Storage for the opcode array is freed by a call to destroy_op_array() at request end
  - for global scope, called from zend_execute_scripts()
  - for functions and methods, called by the Hash Table dtor routine. More later
    struct _zend_op_array {
        …….
        zend_uint *refcount;
        zend_op *opcodes;    /* -> ZEND_OP, ZEND_OP, ZEND_OP, ... */
        zend_uint last, size;
        zend_compiled_variable *vars;
        int last_var, size_var;
        zend_uint T;
        zend_brk_cont_element *brk_cont_array;
        zend_uint last_brk_cont;
        zend_uint current_brk_cont;
        zend_try_catch_element *try_catch_array;
        int last_try_catch;
        /* static variables support */
        HashTable *static_variables;
        ….. etc
    };
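The growth policy described above (start with 64 slots, quadruple when full) can be sketched in a few lines of C. This is only an illustration: the sketch_* names and the reduced structures are hypothetical stand-ins, and plain malloc/realloc is assumed in place of the engine's own allocator.

```c
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-ins for zend_op and zend_op_array. */
typedef struct { unsigned char opcode; } sketch_op;

typedef struct {
    sketch_op *opcodes;  /* heap-allocated opcode storage */
    unsigned int last;   /* number of opcodes emitted so far */
    unsigned int size;   /* number of slots currently allocated */
} sketch_op_array;

/* Start with room for 64 opcodes. Returns 0 on success, -1 on failure. */
int sketch_init_op_array(sketch_op_array *a)
{
    a->last = 0;
    a->size = 64;
    a->opcodes = malloc(a->size * sizeof(sketch_op));
    return a->opcodes ? 0 : -1;
}

/* Hand out the next free slot, growing to 4x the current size when full. */
sketch_op *sketch_get_next_op(sketch_op_array *a)
{
    if (a->last >= a->size) {
        unsigned int new_size = a->size * 4;
        sketch_op *grown = realloc(a->opcodes, new_size * sizeof(sketch_op));
        if (!grown) {
            return NULL;  /* old storage is still valid and still owned by a */
        }
        a->opcodes = grown;
        a->size = new_size;
    }
    return &a->opcodes[a->last++];
}

/* Release the opcode storage, as would happen at request end. */
void sketch_destroy_op_array(sketch_op_array *a)
{
    free(a->opcodes);
    a->opcodes = NULL;
    a->last = a->size = 0;
}
```

With this policy the 65th opcode triggers the first reallocation, taking the array from 64 to 256 slots.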
Opcodes
Not all opcode information can be determined as the opcodes are generated by the compiler, e.g. the target address for a JMP opcode. So after all opcodes are generated a 2nd pass is made over the opcode array to fill in the missing information:
- set the target for all jump opcodes
  - during compilation jump targets are opcode array indexes; these are changed to absolute addresses
- set the opcode handler as defined by the executor generated:
  - CALL: address of handler function
  - GOTO: address of label
  - SWITCH: identifier (int) for handling CASE block
- for any operands (op1 or op2) which are CONSTANTS, modify the zval to is_ref=1, refcount=2 to ensure the zval is copied
- trim the opcode array to the required size, i.e. free unused storage
See pass_two() in zend_opcode.c
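As a rough illustration of the jump-patching step of that second pass, the sketch below rewrites each jump target from an array index into an absolute address. The sk_* names, the opcode values and the reduced operand union are assumptions made for the example, not the engine's definitions.

```c
#include <stddef.h>

/* Hypothetical opcode numbers, for illustration only. */
enum { SK_NOP = 0, SK_JMP = 1 };

typedef struct sk_op sk_op;
struct sk_op {
    unsigned char opcode;
    union {
        unsigned int opline_num;  /* during compilation: index into the opcode array */
        sk_op *jmp_addr;          /* after the second pass: absolute address */
    } target;
};

/* Second pass: convert every jump target from an index to an address. */
void sk_pass_two(sk_op *ops, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (ops[i].opcode == SK_JMP) {
            unsigned int index = ops[i].target.opline_num;
            ops[i].target.jmp_addr = &ops[index];
        }
    }
}
```

Patching once at compile time means the executor can follow a jump by loading a pointer, with no index arithmetic on every jump.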
Functions and Classes
- During MINIT 2 hash tables are built which the compiler uses:
  - GLOBAL_FUNCTION_TABLE: populated with the names of all built-in functions, and functions defined by any enabled extensions
  - GLOBAL_CLASS_TABLE: populated with default classes and any classes defined by enabled extensions
- The compiler ADDs to both tables during the compile step
  - any new entries are then removed again at request shutdown
- In a non-ZTS environment the compiler updates the GLOBAL hash tables
- In a ZTS environment the GLOBAL tables are read-only
  - a separate r/w copy of each table is created for each new thread and populated from the GLOBAL table in compiler_globals_ctor()
Function tables
- One function table per thread
  - address stored in executor globals (EG)
- Function table is a Hashtable mapping function name to "zend_function"
- zend_function structure is itself a union of structures
- Populated with built-in functions, extension functions etc. by copying GLOBAL_FUNCTION_TABLE, built during MINIT, in compiler_globals_ctor()
  - type == ZEND_INTERNAL_FUNCTION
  - zend_function == zend_internal_function
- During each request functions defined by the user script are added to the function table at compile time
  - type == ZEND_USER_FUNCTION
  - zend_function == zend_op_array
    typedef union _zend_function {
        zend_uchar type;
        struct {
            zend_uchar type; /* never used */
            char *function_name;
            zend_class_entry *scope;
            ………. <SNIP > ………….
            zend_bool pass_rest_by_reference;
            unsigned char return_reference;
        } common;
        zend_op_array op_array;
        zend_internal_function internal_function;
    } zend_function;
State of play after compile complete
    <?php
    function add5($a) { return $a + 5; }
    function sub5($a) { return $a - 5; }
    $a = add5(10);
    $b = sub5(15);
    ?>
[Diagram: executor_globals (active_op_array, symbol_table, active_symbol_table, function_table, class_table). active_op_array addresses the zend_op_array for global scope and its opcodes; symbol_table holds GLOBALS, _ENV, HTTP_ENV_VARS, …….; function_table holds the <internal func> entries (zend_internal_function) followed by add5 and sub5 (each a zend_op_array)]
Function tables
- User entries are removed from the global function table during RSHUTDOWN processing by a call to shutdown_executor()
- As user function entries are added after all internal functions, the code uses the zend_hash_reverse_apply() function to traverse the thread's function table entries backwards, removing entries until type != ZEND_USER_FUNCTION
- Removal triggers the HT dtor routine ZEND_FUNCTION_DTOR, which in turn calls destroy_op_array() to free the opcode array and other structures which hang off the zend_op_array
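The backwards traversal works because user entries always sit at the tail of the table. A minimal sketch of that idea follows, using a plain array in place of the engine's hash table; the sk_* names are hypothetical and the destructor call is only indicated by a comment.

```c
#include <stddef.h>

/* Hypothetical reduced function-table entry. */
enum sk_fn_type { SK_INTERNAL_FUNCTION = 1, SK_USER_FUNCTION = 2 };

typedef struct {
    const char *name;
    enum sk_fn_type type;
} sk_function;

/*
 * Walk the table from the end, dropping user entries until the first
 * non-user entry is reached. Returns the number of entries that remain.
 */
size_t sk_remove_user_functions(const sk_function *table, size_t count)
{
    while (count > 0 && table[count - 1].type == SK_USER_FUNCTION) {
        /* in the engine, the entry's destructor would free its op_array here */
        count--;
    }
    return count;
}
```

Stopping at the first internal entry avoids visiting the (typically much larger) set of built-in and extension functions on every request shutdown.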
Class_table One class table per thread Address stored in executor globals (EG) Class table is a Hashtable mapping class name to “zend_class_entry” Populated with default classes and extension defined classes by copying GLOBAL_CLASS_TABLE built during MINIT in compiler_globals_ctor() During each request classes defined by user script are added to class table at compile time Each class has its own function table and compiler adds an entry for each method defined by a class
State of play after compile complete
    <?php
    class Dog {
        function bark() { print "Woof!"; }
        function sit() { print "Sit!!"; }
    }
    $pooch = new Dog;
    $pooch->bark();
    $pooch->sit();
    ?>
[Diagram: executor_globals (active_op_array, symbol_table, active_symbol_table, function_table, class_table). active_op_array addresses the zend_op_array for global scope and its opcodes; symbol_table holds GLOBALS, _ENV, HTTP_ENV_VARS, …….; function_table holds the <internal func> entries (zend_internal_function); class_table holds the <internal class> entries, each with a function_table of <internal method> entries, followed by Dog, whose function_table holds bark and sit (each a zend_op_array)]
Class table
- User class entries are removed by shutdown_executor() by traversing the thread's class table backwards, removing all entries until type != ZEND_USER_CLASS
- Removal triggers the HT dtor routine ZEND_CLASS_DTOR, which in turn calls destroy_zend_class()
- destroy_zend_class() calls zend_hash_destroy() on the class's function_table, which walks the HT and calls the dtor ZEND_FUNCTION_DTOR on each entry as described earlier
Static variables Local scope but value retained across calls Hashtable allocated by compiler per function or method when first static variable defined Referenced by zend_op_array structure Statics added to Hashtable as found by compiler
Examining compile results Two tools available for analysing results of compile VLD Parsekit Both available from PECL
VLD
- Dumps opcodes for a given PHP script
- Written by Derick Rethans
- Download from PECL: http://pecl.php.net/package/vld/0.8.0
- Simple configuration: --enable-vld[=shared]
- Invoked via command line switches:
    php -dvld.active=[0|1] -dvld.execute=[0|1] -f <php script>
- Can override defaults in php.ini
VLD
No config.w32 file for Windows
    ARG_ENABLE("vld", "Enable Vulcan Opcode decoder", "no");
    if (PHP_VLD != "no") {
        EXTENSION("vld", "vld.c srm_oparray.c");
    }
VLD output
    <?php
    $a = 5;
    $b = 10;
    $c = $a + $b + 99;
    echo "c= $c \n";
    ?>
    php -f test.php -dvld.active=1
KEY: ! == compiled variable, $ == variable, ~ == temporary
    line   #   op                     fetch  ext  operands
    --------------------------------------------------------------------------
    2      0   ECHO                               '%0A'
    4      1   ASSIGN                             !0, 5
    5      2   ASSIGN                             !1, 10
    6      3   ADD                                ~2, !0, !1
           4   ADD                                ~3, ~2, 99
           5   ASSIGN                             !2, ~3
    8      6   INIT_STRING                        ~5
           7   ADD_STRING                         ~5, ~5, 'c'
           8   ADD_STRING                         ~5, ~5, '%3D+'
           9   ADD_VAR                            ~5, ~5, !2
           10  ADD_STRING                         ~5, ~5, '+'
           11  ADD_CHAR                           ~5, ~5, 10
           12  ECHO                               ~5
    11     13  RETURN                             1
           14  ZEND_HANDLE_EXCEPTION
    c= 114
There are TMPs defined for the results of the ASSIGN opcodes here, but they are not used and VLD does not list them
Why all these "+" in VLD output for CONSTs?
    <?php
    echo "Hello World";
    echo "Hello                                World";
    echo "Hello + World";
    ?>
    line   #   op                     fetch  ext  operands
    -------------------------------------------------------------------------------
    2      0   ECHO                               '%0D%0A+'
    4      1   ECHO                               'Hello+World'
    5      2   ECHO                               'Hello++++++++++++++++++++++++++++++++World'
    6      3   ECHO                               'Hello+%2B+World'
    9      4   RETURN                             1
           5   ZEND_HANDLE_EXCEPTION
Answer: VLD calls php_url_encode() on the CONST to format it before output, which amongst other things converts all spaces to "+". Internally white space is stored as 0x20, as you would expect.
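To make the effect concrete, here is a small C sketch of this style of URL encoding: alphanumerics and a few punctuation characters pass through, a space becomes "+", and everything else becomes %XX. It is an approximation written for illustration, not the source of php_url_encode(); the function name and the exact set of unescaped characters are assumptions.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Encode src into dst (dst_size bytes, including the terminating NUL).
 * Returns 0 on success, -1 if dst is too small.
 */
int sk_url_encode(const char *src, char *dst, size_t dst_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    for (; *src; src++) {
        unsigned char c = (unsigned char)*src;
        if (isalnum(c) || c == '-' || c == '_' || c == '.') {
            if (o + 1 >= dst_size) return -1;
            dst[o++] = (char)c;          /* safe character: copy unchanged */
        } else if (c == ' ') {
            if (o + 1 >= dst_size) return -1;
            dst[o++] = '+';              /* space becomes a plus sign */
        } else {
            if (o + 3 >= dst_size) return -1;
            dst[o++] = '%';              /* anything else becomes %XX */
            dst[o++] = hex[c >> 4];
            dst[o++] = hex[c & 15];
        }
    }
    if (o >= dst_size) return -1;
    dst[o] = '\0';
    return 0;
}
```

Under these assumptions the literal "Hello + World" encodes to Hello+%2B+World, matching the listing: the two spaces become "+" and the real plus sign becomes %2B.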
parsekit
- PHP opcode analyser written by Sara Golemon
  - meant for development and debug only; some code is not thread safe
- Download from PECL: http://pecl.php.net/package/parsekit
- Simple configuration: --enable-parsekit[=shared]
- Implements 5 functions:
  - parsekit_compile_string
  - parsekit_compile_file
  - parsekit_func_arginfo
  - parsekit_opcode_flags
  - parsekit_opcode_name
parsekit
- array parsekit_compile_string ( string phpcode [, array &errors [, int options]] )
  - compiles and then analyzes the supplied string
  - errors: 2 dimensional array of errors encountered during compile; example of use in parsekit/examples
  - options: either PARSEKIT_SIMPLE or PARSEKIT_QUIET; PARSEKIT_QUIET results in more verbose output
- array parsekit_compile_file ( string filename [, array &errors [, int options]] )
  - as above, but takes the name of a .php file as input
- array parsekit_func_arginfo (mixed function)
  - returns the arg_info data for a given user defined function/method
- long parsekit_opcode_flags (long opcode)
  - returns flags which define return type, operand types etc. for an opcode
- string parsekit_opcode_name (long opcode)
  - returns the name for a given opcode
parsekit_compile_string: SIMPLE output
    <?php
    $oparray = parsekit_compile_string('echo "HelloWorld";', $errors, PARSEKIT_SIMPLE);
    var_dump($oparray);
    ?>
Output:
    array(5) {
      [0]=> string(36) "ZEND_ECHO UNUSED 'HelloWorld' UNUSED"
      [1]=> string(30) "ZEND_RETURN UNUSED NULL UNUSED"
      [2]=> string(42) "ZEND_HANDLE_EXCEPTION UNUSED UNUSED UNUSED"
      ["function_table"]=> NULL
      ["class_table"]=> NULL
    }
array(20) { ["type"]=> int(2) ["type_name"]=> string(18) "ZEND_USER_FUNCTION" ["fn_flags"]=> int(0) ["num_args"]=> int(0) ["required_num_args"]=> int(0) ["pass_rest_by_reference"]=> bool(false) ["uses_this"]=> bool(false) ["line_start"]=> int(0) ["line_end"]=> int(0) ["return_reference"]=> bool(false) ["refcount"]=> int(1) ["last"]=> int(3) ["size"]=> int(3) ["T"]=> int(0) ["last_brk_cont"]=> int(0) ["current_brk_cont"]=> int(-1) ["backpatch_count"]=> int(0) ["done_pass_two"]=> bool(true) ["filename"]=> string(49) "C:\Testcases\helloWorld.php" ["opcodes"]=> array(3) { [0]=> array(5) { ["opcode"]=> int(40) ["opcode_name"]=> string(9) "ZEND_ECHO" ["flags"]=> int(768) ["op1"]=> array(3) { ["type"]=> int(1) ["type_name"]=> string(8) "IS_CONST" ["constant"]=> &string(11) "Hello World" } ["lineno"]=> int(3) etc….. parsekit-compile-file: QUIET output <?php $oparray = parsekit_compile_string('echo "HelloWorld";', $errors, PARSEKIT_QUIET); var_dump($oparray); ?>
parsekit_func_arginfo
    <?php
    function foo ($a, stdClass $b, &$c) { }
    $oparray = parsekit_func_arginfo('foo');
    var_dump($oparray);
    ?>
Output:
    array(3) {
      [0]=> array(3) {
        ["name"]=> string(1) "a"
        ["allow_null"]=> bool(true)
        ["pass_by_reference"]=> bool(false)
      }
      [1]=> array(4) {
        ["name"]=> string(1) "b"
        ["class_name"]=> string(8) "stdClass"
        ["allow_null"]=> bool(false)
        ["pass_by_reference"]=> bool(false)
      }
      [2]=> array(3) {
        ["name"]=> string(1) "c"
        ["allow_null"]=> bool(true)
        ["pass_by_reference"]=> bool(true)
      }
    }
parsekit_opcode_name and parsekit_opcode_flags
    <?php
    $opname = parsekit_opcode_name(61);
    var_dump($opname);
    ?>
    string(21) "ZEND_DO_FCALL_BY_NAME"
    <?php
    $opflags = parsekit_opcode_flags(61);
    var_dump($opflags);
    ?>
    int(16777218)
Flags define whether the opcode takes op1 and op2, defines EA, sets a result etc.
Execution path for a script php_execute_script() zend_execute_scripts() zend_compile_file() zend_execute() user call (function/method) include/require
PHP Interpreter
- Can be generated in many flavours
  - 12 different versions possible
- Generated by a chunk of PHP code: zend_vm_gen.php
  - you need to understand regular expressions before attempting to read this code
- Interpreter is generated from the definition of each opcode in zend_vm_def.h and the skeletal interpreter body in zend_vm_execute.skl
Interpreter generation process
[Diagram: zend_vm_def.h + zend_vm_execute.skl -> zend_vm_gen.php -> zend_vm_execute.h + zend_vm_opcodes.h]
zend_vm_execute.skl
The {%...%} markers are triggers for zend_vm_gen.php to insert generated code
    {%DEFINES%}
    ZEND_API void {%EXECUTOR_NAME%}(zend_op_array *op_array TSRMLS_DC)
    {
        zend_execute_data execute_data;
        {%HELPER_VARS%}
        {%INTERNAL_LABELS%}
        if (EG(exception)) {
            return;
        }
        /* Initialize execute_data */
        EX(fbc) = NULL;
        EX(object) = NULL;
        EX(old_error_reporting) = NULL;
        if (op_array->T < TEMP_VAR_STACK_LIMIT) {
            EX(Ts) = (temp_variable *) do_alloca(sizeof(temp_variable) * op_array->T);
        } else {
            EX(Ts) = (temp_variable *) safe_emalloc(sizeof(temp_variable), op_array->T, 0);
        }
        …… etc
zend_vm_def.h
ZEND_VM_HANDLER arguments: opcode, opcode name, types accepted for op1, types accepted for op2
    ZEND_VM_HANDLER(1, ZEND_ADD, CONST|TMP|VAR|CV, CONST|TMP|VAR|CV)
    {
        zend_op *opline = EX(opline);
        zend_free_op free_op1, free_op2;
        add_function(&EX_T(opline->result.u.var).tmp_var,
            GET_OP1_ZVAL_PTR(BP_VAR_R),
            GET_OP2_ZVAL_PTR(BP_VAR_R) TSRMLS_CC);
        FREE_OP1();
        FREE_OP2();
        ZEND_VM_NEXT_OPCODE();
    }
- add_function() is a helper function
- GET_OP1_ZVAL_PTR, FREE_OP1 and friends are triggers for the PHP code to replace text
- ZEND_VM_NEXT_OPCODE() is also replaced, although this is just a macro!
Interpreter generation process
    Usage information: php zend_vm_gen.php [options]
    Options:
      --with-vm-kind=CALL|SWITCH|GOTO - select threading model (default is CALL)
      --without-specializer           - disable executor specialization
      --with-old-executor             - enable old executor
      --with-lines                    - enable #line directives
- --with-vm-kind defines the execution method
  - CALL: each opcode handler is defined as a function
  - SWITCH: each opcode handler is a case block in one huge switch statement
  - GOTO: a label is defined for each opcode handler
- --without-specializer means only one handler per opcode
  - with specializers a handler is generated for each possible combination of operand types
  - a reported 20% speedup with specializers enabled over the old executor
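The difference between the CALL and SWITCH kinds can be seen on a toy accumulator machine. The sketch below is purely illustrative: the three opcodes and all sk_* names are invented for the example and bear no relation to the real Zend opcodes. The GOTO kind is not shown, since it relies on compiler-specific computed gotos.

```c
#include <stddef.h>

/* Hypothetical three-instruction machine operating on one accumulator. */
enum { SK_INC, SK_DOUBLE, SK_HALT };

/* SWITCH kind: every handler is a case block in one switch statement. */
int sk_run_switch(const unsigned char *code)
{
    int acc = 0;
    for (size_t pc = 0; ; pc++) {
        switch (code[pc]) {
        case SK_INC:    acc += 1; break;
        case SK_DOUBLE: acc *= 2; break;
        default:        return acc;  /* SK_HALT or an unknown opcode */
        }
    }
}

/* CALL kind: every handler is a function, reached through a table. */
typedef int (*sk_handler)(int *acc);  /* returns nonzero to stop */

static int sk_inc(int *acc)    { *acc += 1; return 0; }
static int sk_double(int *acc) { *acc *= 2; return 0; }
static int sk_halt(int *acc)   { (void)acc; return 1; }

static const sk_handler sk_handlers[] = { sk_inc, sk_double, sk_halt };

int sk_run_call(const unsigned char *code)
{
    int acc = 0;
    for (size_t pc = 0; ; pc++) {
        if (code[pc] > SK_HALT) return acc;          /* guard the table lookup */
        if (sk_handlers[code[pc]](&acc)) return acc; /* dispatch by function pointer */
    }
}
```

Both loops compute the same result for the same program; what changes is only how control reaches the handler body, which is the property the --with-vm-kind option selects.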
Interpreter generation process
- --with-old-executor enables a runtime decision to call the old pre-ZE2 type executor, which is a CALL type executor with no specializers
  - zend_vm_use_old_executor() is defined to switch executor model
  - no current callers though
- --with-lines results in the addition of #line directives to the generated zend_vm_execute.h
    #line 28 "C:\PHPDEV\php5.2-200612111130\Zend\zend_vm_def.h"
    static int ZEND_ADD_SPEC_CONST_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
    {
        zend_op *opline = EX(opline);
        add_function(&EX_T(opline->result.u.var).tmp_var, …. etc
- The default interpreter which is checked into CVS is generated as follows:
    php zend_vm_gen.php --with-vm-kind=CALL
Specialization
- With specialization enabled a handler is generated for each valid combination of input operand types
- Each input operand (op1 and op2) can take 1 of 5 types:
  - TMP
  - VAR
  - CV
  - CONST
  - UNUSED
- This gives a theoretical 25 opcode handlers for each opcode
zend_vm_def.h
    ZEND_VM_HANDLER(1, ZEND_ADD, CONST|TMP|VAR|CV, CONST|TMP|VAR|CV)
    {
        zend_op *opline = EX(opline);
        zend_free_op free_op1, free_op2;
        add_function(&EX_T(opline->result.u.var).tmp_var,
            GET_OP1_ZVAL_PTR(BP_VAR_R),
            GET_OP2_ZVAL_PTR(BP_VAR_R) TSRMLS_CC);
        FREE_OP1();
        FREE_OP2();
        ZEND_VM_NEXT_OPCODE();
    }
ZEND_ADD without specialization static int ZEND_ADD_HANDLER(ZEND_OPCODE_HANDLER_ARGS) { zend_op *opline = EX(opline); zend_free_op free_op1, free_op2; add_function(&EX_T(opline->result.u.var).tmp_var, get_zval_ptr(&opline->op1, EX(Ts), &free_op1, BP_VAR_R), get_zval_ptr(&opline->op2, EX(Ts), &free_op2, BP_VAR_R) TSRMLS_CC); FREE_OP(free_op1); FREE_OP(free_op2); ZEND_VM_NEXT_OPCODE(); } Handler calls non-type specific routines to get zval * for op1 and op2
ZEND_ADD with specialization static int ZEND_ADD_SPEC_CONST_CONST_HANDLER (ZEND_OPCODE_HANDLER_ARGS) { zend_op *opline = EX(opline); add_function(&EX_T(opline->result.u.var).tmp_var, &opline->op1.u.constant, &opline->op2.u.constant TSRMLS_CC); ZEND_VM_NEXT_OPCODE(); } static int ZEND_ADD_SPEC_CONST_VAR_HANDLER (ZEND_OPCODE_HANDLER_ARGS) { zend_op *opline = EX(opline); zend_free_op free_op2; add_function(&EX_T(opline->result.u.var).tmp_var, &opline->op1.u.constant, _get_zval_ptr_var(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC) TSRMLS_CC); if (free_op2.var) {zval_ptr_dtor(&free_op2.var);}; ZEND_VM_NEXT_OPCODE(); } static int ZEND_ADD_SPEC_CONST_TMP_HANDLER (ZEND_OPCODE_HANDLER_ARGS) { zend_op *opline = EX(opline); zend_free_op free_op2; add_function(&EX_T(opline->result.u.var).tmp_var, &opline->op1.u.constant, _get_zval_ptr_tmp(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC) TSRMLS_CC); zval_dtor(free_op2.var); ZEND_VM_NEXT_OPCODE(); } …. and 13 other handlers Handlers call type specific routines to get zval * for op1 and op2
zend_vm_gen.php $op1_get_zval_ptr = array( "ANY" => "get_zval_ptr(&opline->op1, EX(Ts), &free_op1, \\1)", "TMP" => "_get_zval_ptr_tmp(&opline->op1, EX(Ts), &free_op1 TSRMLS_CC)", "VAR" => "_get_zval_ptr_var(&opline->op1, EX(Ts), &free_op1 TSRMLS_CC)", "CONST" => "&opline->op1.u.constant", "UNUSED" => "NULL", "CV" => "_get_zval_ptr_cv(&opline->op1, EX(Ts), \\1 TSRMLS_CC)", ); $op2_get_zval_ptr = array( "ANY" => "get_zval_ptr(&opline->op2, EX(Ts), &free_op2, \\1)", "TMP" => "_get_zval_ptr_tmp(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC)", "VAR" => "_get_zval_ptr_var(&opline->op2, EX(Ts), &free_op2 TSRMLS_CC)", "CONST" => "&opline->op2.u.constant", "UNUSED" => "NULL", "CV" => "_get_zval_ptr_cv(&opline->op2, EX(Ts), \\1 TSRMLS_CC)", …..<snip> function gen_code(….) …… $code = preg_replace( array( ......... "/GET_OP1_ZVAL_PTR\(([^)]*)\)/", "/GET_OP2_ZVAL_PTR\(([^)]*)\)/", ........ ), array( ....... ....... $op1_get_zval_ptr[$op1], $op2_get_zval_ptr[$op2], ....... ), $code);
Generated code not always the best!
Input: zend_vm_def.h
    ZEND_VM_HANDLER(71, ZEND_INIT_ARRAY, CONST|TMP|VAR|UNUSED|CV, CONST|TMP|VAR|UNUSED|CV)
    {
        zend_op *opline = EX(opline);
        array_init(&EX_T(opline->result.u.var).tmp_var);
        if (OP1_TYPE == IS_UNUSED) {
            ZEND_VM_NEXT_OPCODE();
    #if !defined(ZEND_VM_SPEC) || OP1_TYPE != IS_UNUSED
        } else {
            ZEND_VM_DISPATCH_TO_HANDLER(ZEND_ADD_ARRAY_ELEMENT);
    #endif
        }
    }
Output: zend_vm_execute.h
    static int ZEND_INIT_ARRAY_SPEC_CONST_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS)
    {
        zend_op *opline = EX(opline);
        array_init(&EX_T(opline->result.u.var).tmp_var);
        if (IS_CONST == IS_UNUSED) {
            ZEND_VM_NEXT_OPCODE();
    #if 0 || IS_CONST != IS_UNUSED
        } else {
            return ZEND_ADD_ARRAY_ELEMENT_SPEC_CONST_CONST_HANDLER(ZEND_OPCODE_HANDLER_ARGS_PASSTHRU);
    #endif
        }
    }
Mapping opcode to a handler
- Generated zend_vm_execute.h contains an array to map opcodes to handlers
  - without specializers the array has just 151 entries
  - with specializers 3775 (151 * 25) entries
- zend_execute.c defines a function to enable the compiler to determine the correct handler for a given opcode:
    zend_vm_set_opcode_handler(zend_op *op)
- Decodes the type information for op1 and op2 in the supplied "zend_op" and picks the appropriate handler from the array of handlers. The handler returned will be either:
  - function pointer for the handler when CALL
  - id of the handler routine for SWITCH
  - address of the handler's label for GOTO
- Mapping is performed at compile time
  - pass_two() of the compiler calls zend_vm_set_opcode_handler() to patch the handler into all generated opcodes
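One way to picture the lookup into a table with 25 slots per opcode is shown below. The op_type flag values are the ones listed on the znode slide; the dense 0..4 ordering and the sk_* function names are assumptions made for this sketch, not taken from the generated source.

```c
/* Operand type flags as listed on the znode slide. */
enum {
    SK_IS_CONST   = 0x1,
    SK_IS_TMP_VAR = 0x2,
    SK_IS_VAR     = 0x4,
    SK_IS_UNUSED  = 0x8,
    SK_IS_CV      = 0x10
};

/* Map a type flag to a dense code 0..4 (ordering assumed); -1 if invalid. */
int sk_type_code(int op_type)
{
    switch (op_type) {
    case SK_IS_CONST:   return 0;
    case SK_IS_TMP_VAR: return 1;
    case SK_IS_VAR:     return 2;
    case SK_IS_UNUSED:  return 3;
    case SK_IS_CV:      return 4;
    default:            return -1;
    }
}

/*
 * Index into a flat handler table holding 25 slots per opcode:
 * 5 possible op1 types times 5 possible op2 types.
 */
int sk_handler_index(int opcode, int op1_type, int op2_type)
{
    int c1 = sk_type_code(op1_type);
    int c2 = sk_type_code(op2_type);
    if (opcode < 0 || c1 < 0 || c2 < 0) {
        return -1;
    }
    return opcode * 25 + c1 * 5 + c2;
}
```

Because the table is flat, slots for operand combinations an opcode does not accept still exist, which is why the count is described as a theoretical 25 handlers per opcode.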
zend_execute
- By default the zend_execute function pointer addresses the generated execute() routine in zend_vm_execute.h
- This is called by zend_execute_scripts() with:
  - a pointer to the zend_op_array for global scope, and
  - if ZTS is enabled, the tsrm_ls pointer
- Executor keeps state data for the current user function in a zend_execute_data structure, which is allocated in the execute() stack frame
- Address of the currently executing function's zend_execute_data is stored in EG
    struct _zend_execute_data {
        struct _zend_op *opline;
        zend_function_state function_state;
        zend_function *fbc; /* Function Being Called */
        zend_op_array *op_array;
        zval *object;
        union _temp_variable *Ts;
        zval ***CVs;
        zend_bool original_in_execution;
        HashTable *symbol_table;
        struct _zend_execute_data *prev_execute_data;
        zval *old_error_reporting;
    };
execute()
- On entry acquire storage for:
  - Temporary variables
    - number of temporary variables used by the function is stored in the "T" field of zend_op_array
    - storage is allocated on the stack if alloca() is available and T < 2000
    - if alloca is not available or there are 2000+ temporaries then it is allocated by emalloc from the heap
  - CV cache
    - number of compiled variables used is stored in the "last_var" field of zend_op_array
    - allocated on the stack regardless of size if alloca is available, or by emalloc otherwise
- Initialize zend_execute_data
  - initialize EX(opline) to address the first opcode to execute
  - EX(symbol_table) = EG(active_symbol_table)
  - EX(prev_execute_data) = EG(current_execute_data);
  - EG(current_execute_data) = &execute_data;
    <?php
    function foo() { … }
    ……
    foo();
[Diagram: executor_globals.current_execute_data -> zend_execute_data for foo() -> (prev_execute_data) zend_execute_data for global scope -> null]
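The chaining of zend_execute_data structures through prev_execute_data can be sketched as a linked list of stack frames. Everything below is a simplified illustration with hypothetical sk_* names; a file-scope pointer plays the role of EG(current_execute_data), and restoring it on exit is assumed to mirror what the engine does when a function returns.

```c
#include <stddef.h>

/* Hypothetical reduced form of zend_execute_data. */
typedef struct sk_execute_data {
    const char *function_name;
    struct sk_execute_data *prev_execute_data;
} sk_execute_data;

/* Stands in for EG(current_execute_data). */
static sk_execute_data *sk_current_execute_data = NULL;

/* Count the frames reachable from the current one. */
size_t sk_call_depth(void)
{
    size_t depth = 0;
    for (sk_execute_data *ex = sk_current_execute_data; ex; ex = ex->prev_execute_data) {
        depth++;
    }
    return depth;
}

/*
 * Enter a function: link a frame held in this C stack frame into the chain,
 * optionally make nested calls, then unlink it again on the way out.
 * Returns the chain depth observed at the innermost call.
 */
size_t sk_execute(const char *name, int nested_calls)
{
    sk_execute_data execute_data;
    execute_data.function_name = name;
    execute_data.prev_execute_data = sk_current_execute_data;
    sk_current_execute_data = &execute_data;

    size_t depth = nested_calls > 0
        ? sk_execute("callee", nested_calls - 1)
        : sk_call_depth();

    sk_current_execute_data = execute_data.prev_execute_data;
    return depth;
}
```

Keeping the frame on the C stack means no allocation is needed per call, and walking prev_execute_data from the current frame yields the active call chain, ending at the global scope frame whose prev pointer is null.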
Operand Types
Operands op1 and op2 can be either:
- VAR ($): temporary variable into which the interpreter caches zval * and zval ** for a defined symbol
- TMP (~): temporary variable where the interpreter keeps an intermediate result. For example, for $a = $b + $c the sum of $b and $c will be stored in a TMP before being assigned to $a
- CV (!): compiled variable. Optimized version of a VAR. More to follow shortly
- CONSTANT: program literal, e.g. $a = "hello"
  - symbols are also constants
  - ZVAL allocated by the compiler
  - ZVAL has is_ref=1, refcount=2 to force a split on assignment
- UNUSED: operand not defined for the opcode
Result operand can be VAR, TMP or CV
Temporary Variables: VAR and TMP
- "Ts" field of zend_execute_data addresses an array of temp_variables
  - size of the array is based on information gathered by the compiler
- The "var" field in the operand's znode contains the offset into the "temp_variables" array
  - T and EX_T macros are provided to do this
- Temporaries are each 24 bytes
- Temporary variables are NOT re-used by the compiler
    typedef union _temp_variable {
        zval tmp_var;
        struct {
            zval **ptr_ptr;
            zval *ptr;
            zend_bool fcall_returned_reference;
        } var;
        struct {
            zval **ptr_ptr;
            zval *ptr;
            zend_bool fcall_returned_reference;
            zval *str;
            zend_uint offset;
        } str_offset;
        zend_class_entry *class_entry;
    } temp_variable;

    typedef struct _znode {
        int op_type;
        union {
            zval constant;
            zend_uint var;
            zend_uint opline_num;
            zend_op_array *op_array;
            zend_op *jmp_addr;
            struct {
                zend_uint var; /* dummy */
                zend_uint type;
            } EA;
        } u;
    } znode;

    struct _zend_execute_data {
        struct _zend_op *opline;
        …….
        union _temp_variable *Ts;
        ….. etc
    };