HipHop Compiler for PHP • Transforming PHP into C++ • HipHop Compiler Team • Facebook, Inc. • May 2010
PHP is easy to read • <?php • function tally($count) { • $sum = 0; • for ($i = 0; $i < $count; ++$i) { • $sum += $i; • } • return $sum; • } • print tally(10) . "\n";
PHP syntax is similar to C++/Java • <?php • class Tool extends Object { • public $name; • public function use($target) {} • } • $tool = new Tool(); • $tool->name = 'hammer'; • $tool->use($nail);
PHP Statements and Expressions • FunctionStatement, • ClassStatement, • InterfaceStatement, • ClassVariable, • ClassConstant, • MethodStatement, • StatementList, • BlockStatement, • IfBranchStatement, • IfStatement, • WhileStatement, • DoStatement, • ForStatement, • SwitchStatement, • CaseStatement, • BreakStatement, • ContinueStatement, • ReturnStatement, • GlobalStatement, • StaticStatement, • EchoStatement, • UnsetStatement, • ExpStatement, • ForEachStatement, • CatchStatement, • TryStatement, • ThrowStatement, • ExpressionList, • AssignmentExpression, • SimpleVariable, • DynamicVariable, • StaticMemberExpression, • ArrayElementExpression, • DynamicFunctionCall, • SimpleFunctionCall, • ScalarExpression, • ObjectPropertyExpression, • ObjectMethodExpression, • ListAssignment, • NewObjectExpression, • UnaryOpExpression, • IncludeExpression, • BinaryOpExpression, • QOpExpression, • ArrayPairExpression, • ClassConstantExpression, • ParameterExpression, • ModifierExpression, • ConstantExpression, • EncapsListExpression,
PHP is weakly typed • <?php • $a = 12345; • $a = "hello"; • $a = array(12345, "hello", array()); • $a = new Object(); • $c = $a + $b; // integer or array • $c = $a . $b; // implicit casting to strings
Core PHP library is small • - Most are in functional style • - ~200 to 500 basic functions • <?php • $len = strlen("hello"); // C library • $ret = curl_exec($curl); // open source
PHP is easy to debug • <?php • function tally($count) { • $sum = 0; • for ($i = 0; $i < $count; ++$i) { • $sum += $i; • var_dump($sum); • } • return $sum; • }
PHP is easy to learn • easy to read • easy to write • easy to debug • Hello, World!
Why is the Zend Engine slow? • Byte-code interpreter • Dynamic symbol lookups • functions, variables, constants • class methods, properties, constants • Weak typing • zval • array()
Transforming PHP into C++ • g++ is a native code compiler • static binding • functions, variables, constants • class methods, properties, constants • type inference • integers, strings, arrays, objects, variants • struct, vector, map, array
Static Binding – Function Calls • <?php • $ret = foo($a); • // C++ • Variant v_ret; • Variant v_a; • v_ret = f_foo(v_a);
Dynamic Function Calls • <?php • $func = 'foo'; • $ret = $func($a); • // C++ • Variant v_ret; • Variant v_a; • String v_func; • v_func = "foo"; • v_ret = invoke(v_func, CREATE_VECTOR1(v_a));
Function Invoke Table • Variant invoke(CStrRef func, CArrRef params) { • int64 hash = hash_string(func); • switch (hash) { • case 1234: • if (func == "foo") return foo(params[0]); • } • throw FatalError("function not found"); • }
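The hash, switch, then string-compare dispatch above can be sketched in standalone C++. This is only an illustration under stated assumptions: `std::string` and `std::vector<int64_t>` stand in for the slide's `CStrRef`, `CArrRef` and `Variant`, the FNV-1a hash is an arbitrary choice (the slides do not say which hash is used), and `f_foo` / `f_bar` are hypothetical generated functions.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for compiled PHP functions.
int64_t f_foo(int64_t a) { return a + 1; }
int64_t f_bar(int64_t a) { return a * 2; }

// FNV-1a, picked only because it can be evaluated at compile time (C++14);
// the hash used by the real compiler is not stated in the slides.
constexpr uint64_t hash_string(const char* s) {
    uint64_t h = 14695981039346656037ull;
    for (; *s != '\0'; ++s) {
        h ^= static_cast<unsigned char>(*s);
        h *= 1099511628211ull;
    }
    return h;
}

// Dispatch by name: switch on the hash, then confirm with a string compare
// so that a hash collision cannot call the wrong function.
int64_t invoke(const std::string& func, const std::vector<int64_t>& params) {
    switch (hash_string(func.c_str())) {
        case hash_string("foo"):
            if (func == "foo") return f_foo(params.at(0));
            break;
        case hash_string("bar"):
            if (func == "bar") return f_bar(params.at(0));
            break;
    }
    throw std::runtime_error("function not found: " + func);
}
```

Computing the case labels with a `constexpr` function keeps the table readable; a code generator could just as well emit precomputed integer literals, as the slide's `case 1234:` suggests.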
Re-declared Functions • <?php • if ($condition) { • function foo($a) { return $a + 1;} • } else { • function foo($a) { return $a + 2;} • } • $ret = foo($a); • // C++ • if (v_condition) { • g->i_foo = i_foo$$0; • } else { • g->i_foo = i_foo$$1; • } • g->i_foo(v_a);
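A minimal sketch of the same idea in standalone C++: the name is bound late through a function pointer held in a globals object. The names `foo_0`, `foo_1`, `Globals` and `run` are illustrative assumptions, and `int64_t` replaces the slide's `Variant`.

```cpp
#include <cassert>
#include <cstdint>

// Two bodies for the same PHP function name
// (the slide writes them as i_foo$$0 and i_foo$$1).
int64_t foo_0(int64_t a) { return a + 1; }
int64_t foo_1(int64_t a) { return a + 2; }

struct Globals {
    // Points at whichever declaration was executed.
    int64_t (*i_foo)(int64_t) = nullptr;
};

// Mirrors the if/else around the two declarations, followed by the call.
int64_t run(Globals* g, bool condition, int64_t a) {
    g->i_foo = condition ? foo_0 : foo_1;
    return g->i_foo(a);
}
```

Only calls to re-declared functions pay for the pointer indirection; every other call can still be bound statically.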
Volatile Functions • <?php • if (!function_exists('foo')) { • bar($a); • } else { • foo($a); • } • function foo($a) {} • // C++ • if (!f_function_exists("foo")) { • f_bar(v_a); • } else { • f_foo(v_a); • } • g->declareFunction("foo");
Static Binding – Variables • <?php • $foo = 'hello'; • function foo($a) { • global $foo; • $bar = $foo . $a; • return $bar; • } • // C++ • String f_foo(CStrRef v_a) { • Variant &gv_foo = g->GV(foo); • String v_bar; • v_bar = concat(toString(gv_foo), v_a); • return v_bar; • }
GlobalVariables Class • class GlobalVariables : public SystemGlobals { • public: • // Direct Global Variables • Variant gv_foo; • // Indirect Global Variables for large compilation • enum _gv_enums { • gv_foo, • }; • Variant gv[1]; • };
Dynamic Variables • <?php • function foo() { • $b = 10; • $a = 'b'; • echo($$a); • } • // C++ • void f_foo() { • class VariableTable : public RVariableTable { • public: • int64 &v_b; String &v_a; • VariableTable(int64 &r_b, String &r_a) : v_b(r_b), v_a(r_a) {} • virtual Variant getImpl(const char *s) { • // hash – switch – strcmp • } • } variableTable(v_b, v_a); • echo(variableTable.get("b")); • }
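The variable table can be sketched as a small class that holds references to the function's locals and resolves a name at run time. This is a simplified illustration: it returns strings instead of a `Variant`, uses plain `strcmp` instead of the hash-then-switch step, and has no `RVariableTable` base class.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Exposes a function's locals by name; the locals themselves remain
// ordinary, statically typed C++ variables.
class VariableTable {
public:
    VariableTable(int64_t& b, std::string& a) : v_b(b), v_a(a) {}

    // Returns the value as a string; a real Variant would keep its type.
    std::string get(const char* name) const {
        if (std::strcmp(name, "b") == 0) return std::to_string(v_b);
        if (std::strcmp(name, "a") == 0) return v_a;
        return "";  // simplification: unknown names yield an empty string
    }

private:
    int64_t& v_b;
    std::string& v_a;
};

// Equivalent of: $b = 10; $a = 'b'; return $$a;
std::string f_foo() {
    int64_t v_b = 10;
    std::string v_a = "b";
    VariableTable table(v_b, v_a);
    return table.get(v_a.c_str());
}
```

Because the table stores references, code that never uses `$$a` still reads and writes `v_b` directly, with no lookup cost.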
Static Binding – Constants • <?php • define('FOO', 'hello'); • echo FOO; • // C++ • echo("hello" /* FOO */);
Dynamic Constants • <?php • if ($condition) { • define('FOO', 'hello'); • } else { • define('FOO', 'world'); • } • echo FOO; • // C++ • if (v_condition) { • g->declareConstant("FOO", g->k_FOO, "hello"); • } else { • g->declareConstant("FOO", g->k_FOO, "world"); • } • echo(toString(g->k_FOO));
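One plausible shape for `declareConstant`, sketched in standalone C++. The signature follows the call on the slide, but the body is an assumption: it models PHP's rule that a second `define()` of the same name fails and leaves the first value in place. `Globals` and `run` are illustrative names.

```cpp
#include <cassert>
#include <set>
#include <string>

struct Globals {
    std::string k_FOO;              // storage slot for the constant FOO
    std::set<std::string> defined;  // names defined so far

    // Assigns the slot on the first definition only; returns false when the
    // name was already defined, as PHP's define() does.
    bool declareConstant(const std::string& name, std::string& slot,
                         const std::string& value) {
        if (!defined.insert(name).second) return false;
        slot = value;
        return true;
    }
};

// Mirrors the conditional define() followed by a read of FOO.
std::string run(Globals* g, bool condition) {
    if (condition) {
        g->declareConstant("FOO", g->k_FOO, "hello");
    } else {
        g->declareConstant("FOO", g->k_FOO, "world");
    }
    return g->k_FOO;
}
```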
Static Binding with Classes • Class methods • Class properties • Class constants • Re-declared classes • Deriving from re-declared classes • Volatile classes
Summary: the dynamic symbol lookup problem is nicely solved • 90-10 rule • Dynamic binding is a general form of static binding • Generated code is a superset of static binding and dynamic binding
Problem 2: Weak Typing • Type Inference • Runtime Type Info (RTTI)-Guided Optimization • Type Hints • Strongly Typed Collection Classes
Type Inference Example • <?php • $a = 10; • $a = 'string'; • // C++ • Variant v_a;
Why is strong typing faster? • $a = $b + $c; • // weakly typed: types are checked at run time • if (is_integer($b) && is_integer($c)) { • $a = (int)$b + (int)$c; • } else if (is_array($b) && is_array($c)) { • $a = array_merge((array)$b, (array)$c); • } else { • … • } • // strongly typed C++: a single addition • int64 v_a = v_b + v_c;
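The cost difference can be made concrete with a small hand-rolled tagged value. This is a simplified sketch: the `Variant` here carries only an integer or a double (not the arrays of the slide), and `add_variant` / `add_int` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// A minimal tagged value; the real Variant covers many more types.
struct Variant {
    enum Kind { Int, Dbl };
    Kind kind;
    int64_t i;
    double d;
};

// Weakly typed addition: every call inspects both tags before adding.
Variant add_variant(const Variant& b, const Variant& c) {
    if (b.kind == Variant::Int && c.kind == Variant::Int) {
        return Variant{Variant::Int, b.i + c.i, 0.0};
    }
    double x = (b.kind == Variant::Int) ? static_cast<double>(b.i) : b.d;
    double y = (c.kind == Variant::Int) ? static_cast<double>(c.i) : c.d;
    return Variant{Variant::Dbl, 0, x + y};
}

// Strongly typed addition: no tags, no branches, a single add.
int64_t add_int(int64_t b, int64_t c) { return b + c; }
```

When type inference proves both operands are integers, the compiler can emit the second form and skip the branching entirely.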
Type Inference Blockers • <?php • function foo() { • if ($success) return 10; // integer • return false; // doh' • } • $arr[$a] = 10; // doh' • ++$a; // $a can be a string actually! • $a = $a + 1; // $a can become a double, ouch!
RTTI-Guided Optimization • <?php • function foo($x) { • ... • } • foo(10); • foo('test'); • // C++ • void foo(Variant x) { • ... • }
Type Specialization Method 1 • template<typename T> • void foo(T x) { • // generate code with generic T (tough!) • } • Pros: smaller generated code • Cons: no type propagation
Type Specialization Method 2 • void foo(int64 x) { • // generate code assuming x is integer • } • void foo(Variant x) { • // generate code assuming x is variant • } • Pros: type propagation • Cons: variant case is not optimized
Type Specialization Method 3 • void foo(int64 x) { • // generate code assuming x is integer • } • void foo(Variant x) { • if (is_integer(x)) { • foo(x.toInt64()); return; • } • // generate code assuming x is variant • } • Pros: optimized for integer case • Cons: large code size
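Method 3 can be sketched with two overloads, where the generic one forwards to the specialized one whenever the run-time type allows it. The `Variant` struct and the returned strings are illustrative assumptions, used here only to make the chosen path observable.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// A minimal tagged value holding an integer or a string.
struct Variant {
    enum Kind { Int, Str };
    Kind kind;
    int64_t i;
    std::string s;

    bool isInteger() const { return kind == Int; }
    int64_t toInt64() const { return i; }
};

// Specialized body: generated assuming x is an integer.
std::string foo(int64_t x) { return "int:" + std::to_string(x * 2); }

// Generic body: checks the run-time type first and forwards integers to
// the specialized version; everything else takes the slow path.
std::string foo(const Variant& x) {
    if (x.isInteger()) return foo(x.toInt64());
    return "variant:" + x.s;
}
```

Both bodies must be generated for each specialized function, which is the code-size cost the slide lists.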
Type Hints • <?php • function foo(int $a) { • string $b; • } • class bar { • public array $c; • } • bar $d;
Strongly Typed Collection Classes • That omnipotent "array" in PHP • Swapping out underlying implementation: • Array escalation • PHP classes: • Vector • Set • Map: unordered • Then Array: ordered map
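One way to read "array escalation" is a container that starts with a cheap vector representation and switches to an insertion-ordered map only when a key breaks the 0..n-1 pattern. The sketch below illustrates that reading; the class layout and method names are assumptions, not the HipHop runtime's actual design, and values are limited to integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

class Array {
public:
    void set(int64_t key, int64_t value) {
        if (!escalated_) {
            std::size_t n = vec_.size();
            if (key >= 0 && static_cast<std::size_t>(key) < n) {
                vec_[static_cast<std::size_t>(key)] = value;  // overwrite
                return;
            }
            if (key >= 0 && static_cast<std::size_t>(key) == n) {
                vec_.push_back(value);  // append keeps keys dense
                return;
            }
            escalate();  // key breaks the 0..n-1 pattern
        }
        auto it = index_.find(key);
        if (it != index_.end()) {
            entries_[it->second].second = value;
            return;
        }
        index_[key] = entries_.size();
        entries_.emplace_back(key, value);
    }

    bool isVector() const { return !escalated_; }

    // Keys in insertion order, as a PHP array would iterate them.
    std::vector<int64_t> keys() const {
        std::vector<int64_t> out;
        if (!escalated_) {
            for (std::size_t i = 0; i < vec_.size(); ++i)
                out.push_back(static_cast<int64_t>(i));
        } else {
            for (const auto& e : entries_) out.push_back(e.first);
        }
        return out;
    }

private:
    // Copies the dense vector into the ordered-map representation.
    void escalate() {
        for (std::size_t i = 0; i < vec_.size(); ++i) {
            index_[static_cast<int64_t>(i)] = i;
            entries_.emplace_back(static_cast<int64_t>(i), vec_[i]);
        }
        vec_.clear();
        escalated_ = true;
    }

    bool escalated_ = false;
    std::vector<int64_t> vec_;                             // vector form
    std::vector<std::pair<int64_t, int64_t>> entries_;     // ordered entries
    std::unordered_map<int64_t, std::size_t> index_;       // key -> position
};
```

Code that only appends never leaves the vector form, so the common list-like usage avoids hashing altogether.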
Compiler-Friendly Scripting Language • If all the problems described here are considered when designing a new scripting language, will it run faster than Java?