Compiler optimizations based on call-graph flattening
Carlo Alberto Ferraris
professor Silvano Rivoira
Master of Science in Telecommunication Engineering
Third School of Engineering: Information Technology
Politecnico di Torino
July 6th, 2011
Increasing complexities. Everyday objects are becoming multi-purpose, networked, interoperable, customizable, reusable, upgradeable.
Increasing complexities. Everyday objects are becoming more and more complex.
Increasing complexities. The software that runs smart objects is becoming more and more complex.
Diminishing resources. Systems have to be resource-efficient.
Diminishing resources. Systems have to be resource-efficient. Resources come in many different flavours.
Diminishing resources. Systems have to be resource-efficient. Resources come in many different flavours. Power: especially valuable in battery-powered scenarios such as mobile, sensor and third-world applications.
Diminishing resources. Systems have to be resource-efficient. Resources come in many different flavours. Power, density: a critical factor in data-center and product design.
Diminishing resources. Systems have to be resource-efficient. Resources come in many different flavours. Power, density, computational: CPU, RAM, storage, etc. are often growing more slowly than the potential applications.
Diminishing resources. Systems have to be resource-efficient. Resources come in many different flavours. Power, density, computational, development: development time and costs should be as low as possible, for low time-to-market and high profitability.
Diminishing resources. Systems have to be resource-efficient. Resources come in many non-orthogonal flavours: power, density, computational, development.
Abstractions. We need to modularize and hide the complexity: operating systems, frameworks, libraries, managed languages, virtual machines, …
Abstractions. We need to modularize and hide the complexity: operating systems, frameworks, libraries, managed languages, virtual machines, … All of this comes with a cost: generic solutions are generally less efficient than ad-hoc ones.
Abstractions. We need to modularize and hide the complexity. Palm webOS: user interface running on HTML+CSS+JavaScript.
Abstractions. We need to modularize and hide the complexity. A JavaScript PC emulator: running Linux inside a browser.
Optimizations. We need to modularize and hide the complexity without sacrificing performance.
Optimizations. We need to modularize and hide the complexity without sacrificing performance. Compiler optimizations trade compilation time for shorter development and execution times.
Vestigial abstractions. The natural subdivision of code into functions is maintained in the compiler and all the way down to the processor. Each function is self-contained, with strict conventions regulating how it relates to other functions.
Vestigial abstractions. Processors don't care about functions; respecting the conventions is just additional work: push the contents of the registers and the return address on the stack, then jump to the callee; execute the callee, then jump back to the return address; restore the registers from the stack.
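As a rough illustration (a hand-written sketch; the exact sequence depends on the calling convention in use), even a trivial call drags in bookkeeping that has nothing to do with the computation itself:

int add_one(int n) {          // one instruction of useful work...
    return n + 1;
}

int caller(int n) {
    // ...yet around this call the convention requires: place n in the
    // argument register, push the return address (done by the call
    // instruction), jump to add_one, execute it, jump back, and assume
    // every caller-saved register has been clobbered.
    return add_one(n);
}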
Vestigial abstractions. Many optimizations are simply not feasible when functions are present:

int replace(int *ptr, int value) {
    int tmp = *ptr;
    *ptr = value;
    return tmp;
}

int A(int *ptr, int value) {
    return replace(ptr, value);
}

int B(int *ptr, int value) {
    replace(ptr, value);
    return value;
}

void *malloc(size_t size) {
    void *ret;
    // [various checks]
    ret = imalloc(size);
    if (ret == NULL)
        errno = ENOMEM;
    return ret;
}

// ...
type *ptr = malloc(size);
if (ptr == NULL)
    return NOT_ENOUGH_MEMORY;
// ...
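If the compiler could see through the call boundaries it could exploit each call's context: in B the value loaded by replace is never used, and in the malloc example the store to errno is immediately followed by the caller's own NULL check. A hand-written sketch (not compiler output) of what contextual optimization could make of A and B:

int A(int *ptr, int value) {  // A is just replace under another name
    int tmp = *ptr;
    *ptr = value;
    return tmp;
}

int B(int *ptr, int value) {  // the loaded value was dead: no load, no tmp
    *ptr = value;
    return value;
}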
Vestigial abstractions. Many optimizations are simply not feasible when functions are present:

interpreter_setup();
while ((opcode = get_next_instruction()))
    interpreter_step(opcode);
interpreter_shutdown();

void interpreter_step(int opcode) {
    switch (opcode) {
    case opcode_instruction_A:
        execute_instruction_A();
        break;
    case opcode_instruction_B:
        execute_instruction_B();
        break;
    // ...
    default:
        abort();  // illegal opcode!
    }
}
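With the call boundary gone, interpreter_step's switch fuses with the fetch loop and becomes visible to the loop optimizer, which can thread each handler's exit directly into the next dispatch. A hand-flattened sketch (the extern declarations and opcode values are assumptions added to make it self-contained):

#include <stdlib.h>

enum { opcode_instruction_A = 1, opcode_instruction_B };  // assumed values

extern void interpreter_setup(void);
extern void interpreter_shutdown(void);
extern int  get_next_instruction(void);  // assumed: 0 means "done"
extern void execute_instruction_A(void);
extern void execute_instruction_B(void);

void run_interpreter(void) {
    int opcode;
    interpreter_setup();
    while ((opcode = get_next_instruction())) {
        // interpreter_step merged into its only caller: no call
        // overhead, and the switch can be jump-threaded with the loop.
        switch (opcode) {
        case opcode_instruction_A: execute_instruction_A(); break;
        case opcode_instruction_B: execute_instruction_B(); break;
        // ...
        default: abort();  // illegal opcode!
        }
    }
    interpreter_shutdown();
}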
Vestigial abstractions. Many optimization efforts are directed at working around the overhead caused by functions. Inlining clones the body of the callee into the caller: it is the optimal solution w.r.t. calling overhead, but it causes code-size increase and cache pollution, so it is useful only on small, hot functions.
Call-graph flattening. What if we dismiss functions during early compilation…
Call-graph flattening. What if we dismiss functions during early compilation and track the control flow explicitly instead?
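Concretely, a call becomes a jump to the callee's body and a return becomes a jump driven by an explicitly tracked continuation value. A minimal hand-written sketch (not actual compiler output) for a pair of trivial functions:

// Original:  int a(int n) { return n + 1; }
//            int b(int n) { return a(n); }
// Flattened: one function, no call/return instructions.
enum cont { RET_TO_B, RET_TO_CALLER };

int b_flattened(int n) {
    enum cont cont;
    int result;

    // body of b: "call" a(n)
    cont = RET_TO_B;            // remember where to resume: the "return address"
    goto a_body;

a_body:                         // body of a: return n + 1
    result = n + 1;
    switch (cont) {             // a's "return": jump to the saved continuation
    case RET_TO_B:      goto b_after_call;
    case RET_TO_CALLER: return result;  // would serve other callers of a
    }

b_after_call:                   // b resumes here; a's result is in 'result'
    return result;
}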
Call-graph flattening. We get most benefits of inlining, including the ability to perform contextual code optimizations, without the code-size issues.
Call-graph flattening. We get most benefits of inlining, including the ability to perform contextual code optimizations, without the code-size issues. Where's the catch?
Call-graph flattening. The load on the compiler increases greatly, both directly, due to CGF itself, and indirectly, due to subsequent optimizations. Worst-case complexity (in number of edges) is quadratic w.r.t. the number of call sites being transformed (heuristics may help).
Call-graph flattening. During CGF we need to statically keep track of all live values across all call sites in all functions. A value is alive if it will be needed by subsequent instructions:

A = 5, B = 9, C = 0; // live: A, B
C = sqrt(B);         // live: A, C
return A + C;
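Because there is no per-call stack frame to hold them, every value that is live across a call site must be assigned an explicit slot in the flattened code at compile time. A hand-written sketch of the idea on the snippet above:

#include <math.h>

double live_across_call(void) {
    double A = 5, B = 9, C = 0;  // live: A, B
    // A is live across the call: it needs a slot that survives the
    // jump into sqrt's code and back. B dies at the call, so its
    // slot can be reused for C or for another function's live values.
    C = sqrt(B);                 // live: A, C
    return A + C;
}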
Call-graph flattening. Basically, the compiler has to statically emulate ahead-of-time all the possible stack usages of the program. This has already been done on microcontrollers, where it resulted in a 23% decrease of stack usage (and a 5% performance increase).
Call-graph flattening. The indirect cause of increased compiler load comes from standard optimizations that are run after CGF. CGF does not create new branches (each call and return instruction is turned into exactly one jump), but other optimizations can.
Call-graph flattening. The indirect cause of increased compiler load comes from standard optimizations that are run after CGF. Most optimizations are designed to operate on small functions with limited numbers of branches.
Call-graph flattening. Many possible application scenarios besides inlining.
Call-graph flattening. Many possible application scenarios besides inlining. Code motion: move instructions across function boundaries; avoid unneeded computations, alleviate register pressure, improve cache locality.
Call-graph flattening. Many possible application scenarios besides inlining. Code motion, macro compression: find similar code sequences in different parts of the code and merge them; reduce code size and cache pollution.
Call-graph flattening. Many possible application scenarios besides inlining. Code motion, macro compression, nonlinear control flow: CGF natively supports nonlinear control flows; almost-zero-cost exception handling and coroutines (see the sketch after this list).
Call-graph flattening. Many possible application scenarios besides inlining. Code motion, macro compression, nonlinear control flow, stackless execution: no runtime stack is needed in fully-flattened programs.
Call-graph flattening. Many possible application scenarios besides inlining. Code motion, macro compression, nonlinear control flow, stackless execution, stack protection: effective stack-poisoning attacks become much harder or even impossible.
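Once control flow is explicit, for example, a coroutine switch is just an update of the continuation variable rather than a stack switch. A minimal hand-written sketch of the idea (not output of the pass):

#include <stdio.h>

// Two "coroutines" flattened into one dispatcher: yielding to the
// other one costs an assignment and a jump; no stack is switched.
void ping_pong(int rounds) {
    enum { PING, PONG, STOP } cont = PING;
    while (cont != STOP) {
        switch (cont) {
        case PING:
            printf("ping\n");
            cont = PONG;                          // "yield" to pong
            break;
        case PONG:
            printf("pong\n");
            cont = (--rounds > 0) ? PING : STOP;  // yield back or stop
            break;
        default:
            break;
        }
    }
}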
Implementation. To test whether CGF is applicable to complex architectures as well, and to validate some of the ideas presented in the thesis, a pilot implementation was written against the open-source LLVM compiler framework.
Implementation. It operates on LLVM-IR and is host- and target-architecture agnostic; roughly 800 lines of C++ code in 4 classes. The pilot implementation cannot flatten recursive, indirect or variadic call sites, but such call sites can still be used alongside flattened code.
Implementation.
1. Enumerate suitable functions.
2. Enumerate suitable call sites (and their live values).
3. Create the dispatch function and populate it with code.
4. Transform the call sites.
5. Propagate live values.
6. Remove the original functions or create wrappers.
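The last step keeps flattened code interoperable: external callers still expect an ordinary function, so a thin wrapper with the original signature forwards into the dispatch function. A sketch of its possible shape, with hypothetical names (the argument roles mirror the rdi/rsi/rdx usage visible in the assembly listing below):

// Hypothetical wrapper for b(). 'dispatch' is the single flattened
// function; 'entry' selects which outer dispatcher to start from and
// the result comes back through an out-pointer.
extern void dispatch(void *entry, int *arg, int *ret);
extern char entry_b;             // stands for b's outer dispatcher label

int b(int n) {                   // ABI-compatible wrapper
    int ret;
    dispatch(&entry_b, &n, &ret);
    return ret;
}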
Examples.

int a(int n) {
    return n + 1;
}

int b(int n) {
    int i;
    for (i = 0; i < 10000; i++)
        n = a(n);
    return n;
}
Examples.

int a(int n) {
    return n + 1;
}

int b(int n) {
    n = a(n);
    n = a(n);
    n = a(n);
    n = a(n);
    return n;
}
.type .Ldispatch,@function
.Ldispatch:
        movl $.Ltmp4, %eax      # store the return dispatcher of a in eax
        jmpq *%rdi              # jump to the requested outer dispatcher
.Ltmp2:                         # outer dispatcher of b
        movl $.LBB2_4, %eax     # store the address of block %10
.Ltmp0:                         # outer dispatcher of a
        movl (%rsi), %ecx       # load the argument n into ecx
        jmp .LBB2_4
.Ltmp8:                         # block %17
        movl $.Ltmp6, %eax
        jmp .LBB2_4
.Ltmp6:                         # block %18
        movl $.Ltmp7, %eax
.LBB2_4:                        # block %10
        movq %rax, %rsi
        incl %ecx               # n = n + 1
        movl $.Ltmp8, %eax
        jmpq *%rsi              # indirectbr
.Ltmp4:                         # return dispatcher of a
        movl %ecx, (%rdx)       # store the return value (in ecx) through the
        ret                     # pointer in rdx and return to the wrapper
.Ltmp7:                         # return dispatcher of b
        movl %ecx, (%rdx)
        ret