Specialization Tools and Techniques for Systematic Optimization of System Software


Presentation Transcript


  1. Specialization Tools and Techniques for Systematic Optimization of System Software Dylan McNamee, Jonathan Walpole, Calton Pu, Crispin Cowan, Charles Krasic, Ashvin Goel, Perry Wagle Presented by: Rizal Arryadi

  2. Background • Much complexity in OS code arises from the requirement to handle all possible system states. • There is a conflict between correctness across all applications and high performance for individual applications • The micro-kernel approach to OS design illustrates the conflict between performance, modularity, and portability

  3. Approaches • Write general-purpose code, but optimize it for a few anticipated common cases • Problem: “common” cases vary • Explicit customization: incorporate customizability into the system structure • SPIN, Exokernel, Synthesis, Mach, etc. • Problems: • Burden on system tuners • Limited access to global system state → optimization opportunities are reduced • Inferred customization (specialization): • Automatically derived optimizations • Create optimized code for common cases • Restricts code, rather than extending it

  4. Process Specialization • Also called partial evaluation • Consider a program P, taking 2 arguments S and D, producing a result R: run P(S,D) = R. A specialization of P wrt S is a program Ps such that: run Ps(D) = run P(S,D) = R • Creates optimized code for common cases • The burden on the system tuner is reduced • But requires more complex analysis of the system • Tedious, error-prone • The result is more complex, harder to debug & maintain
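As a concrete illustration (not from the paper), here is a minimal C sketch of partial evaluation: a generic power function P(S,D), where S is the exponent and D is the base, specialized with respect to S = 3 yields a residual function Ps(D) in which the loop has been evaluated away.

#include <stdio.h>

/* General-purpose code: P(S, D), where S = exponent and D = base. */
int power(int base, int exponent) {
    int result = 1;
    for (int i = 0; i < exponent; i++)
        result *= base;
    return result;
}

/* Residual code Ps(D): power() specialized wrt exponent == 3.
 * The loop and the static argument have been evaluated away. */
int power_3(int base) {
    return base * base * base;
}

int main(void) {
    /* run P(S,D) = run Ps(D) = R */
    printf("%d %d\n", power(5, 3), power_3(5));   /* prints: 125 125 */
    return 0;
}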

  5. Objective of this paper • Provides a toolkit that reduces the manual work involved in specialization • Evaluates the toolkit’s effectiveness on operating system components

  6. Fundamentals of Specialization • Specialization Predicates • States of the system known in advance • Partial Evaluation • Given specialization predicates, separate the static parts from the dynamic parts • Guards • Enable/disable specialized code when specialization predicate terms are modified

  7. Three Kinds of Specialization • Static specialization • Predicates are known at compile time • Partial evaluation can be applied before execution • Dynamic specialization • Defers specialization until runtime • The values of spec predicates are not established until some point during execution • Once established, they hold for the remainder of execution • Optimistic specialization • Spec predicates only hold for bounded time intervals (aka quasi-invariants)

  8. Steps to specialize a system • Identify specialization predicates • Use the developer’s knowledge • Locate code that can be optimized • Estimate the net performance improvement • Generate specialized code • Use partial evaluation to automate it • Check when specialization predicates hold • Needed for dynamic and optimistic specialization • Locate all places that can cause predicates to change, and “guard” them • Replace specialized code • Replugging: replacing one version of specialized code with another • Not surprisingly, this raises synchronization issues

  9. Tempo: specialized code generator • A partial evaluator for C programs • Challenges in binding-time analysis • Side effects in C • Pointers & aliases • Structures & arrays • Functions that modify global state • Ignored in conventional approaches but captured well by Tempo • Tempo features: • Use sensitivity: accurate treatment of “nonliftable values” • Flow sensitivity: a variable can be static at some program points and dynamic at others • Context sensitivity: assigns a specific binding-time description for each calling context • Return sensitivity: the return value of a side-effecting function can be static • Static specialization • Compile-time and run-time specialization

  10. Enabling/Disabling Specialized Code: TypeGuard • Place guards at the sites of modifications to specialization predicate terms • To be efficient, the predicate terms must be used more frequently than they are modified • Problems: • May include too many sites • Different variables with the same name (locally defined) • Aliases → pass by reference • Spec predicate terms are often not simple scalars, but fields of dynamically allocated structures

  11. Enabling/Disabling Specialized Code: TypeGuard • Type-based static analysis • Two-phase approach: • Phase 1 • Statically identify structure types whose fields are spec predicate terms • Extend them to include a spec predicate ID (SPID) • Identify statements that update a guarded field • Insert the guarding code • Phase 2 • Set the SPID dynamically when specialized code is enabled • Clear it when specialized code is disabled • Check it when a spec predicate term is modified

  12. Enabling/Disabling Specialized Code: TypeGuard • Guarding example: current->uid = bar would become: if (current->SPID != NULL) current->SPID->update_uid(bar); else current->uid = bar;
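To make the transformation above concrete, here is a hypothetical sketch of what the phase-1 rewrite could look like; the struct layout, the spid_t type, and the set_uid helper are illustrative assumptions, not code from the paper or from Linux.

/* Hypothetical sketch of the TypeGuard transformation; type and function
 * names other than uid, SPID, and update_uid are illustrative assumptions. */

typedef struct spid {
    void (*update_uid)(int new_uid);   /* guard action: performs the write and
                                          disables/respecializes the code */
} spid_t;

struct task_struct {
    int     uid;
    /* ... other fields ... */
    spid_t *SPID;   /* added in phase 1: non-NULL while specialized code that
                       depends on this structure's fields is enabled */
};

/* Each statically identified update of a guarded field is rewritten so the
 * write is intercepted whenever a specialization predicate depends on it. */
void set_uid(struct task_struct *current, int bar) {
    if (current->SPID != NULL)
        current->SPID->update_uid(bar);   /* guarded write: may trigger replugging */
    else
        current->uid = bar;               /* unguarded fast path */
}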

  13. Enabling/Disabling Specialized Code: MemGuard • TypeGuard issues warnings about alias-producing operations, which must be validated manually • MemGuard guarantees complete guard coverage • Uses memory-protection hardware to write-protect pages that contain spec predicate terms • The write-fault handler checks whether the address being written is a spec predicate term; if so, it performs the guarded write, which may trigger replugging • Uses HW memory protection → guaranteed to capture all writes to spec predicate terms • Drawbacks: • Coarse granularity • High overheads
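A minimal user-level sketch of the MemGuard idea, assuming POSIX mprotect and a SIGSEGV handler; the real mechanism lives in the kernel with its own write-fault handler, and the names here (memguard_protect, guarded_addr) are illustrative.

#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static void     *guarded_page;    /* page containing spec predicate terms */
static uintptr_t guarded_addr;    /* address of one guarded term on that page */

/* Every write to the protected page traps here.  If the faulting address is a
 * spec predicate term, the guard action (e.g. disabling specialized code and
 * replugging) would run; either way the write is then allowed to proceed. */
static void write_fault(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    if ((uintptr_t)info->si_addr == guarded_addr) {
        /* spec predicate term modified: invalidate/replug specialized code here */
    }
    /* Unprotect so the faulting write can retry; a real implementation would
     * re-protect afterwards, which is one source of MemGuard's high overhead. */
    mprotect(guarded_page, (size_t)sysconf(_SC_PAGESIZE), PROT_READ | PROT_WRITE);
}

/* Write-protect the whole page holding a spec predicate term: this is the
 * coarse-granularity drawback, since unrelated writes to the page also fault. */
void memguard_protect(void *page, void *term) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    guarded_page = page;
    guarded_addr = (uintptr_t)term;
    mprotect(page, (size_t)sysconf(_SC_PAGESIZE), PROT_READ);
}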

  14. Replugger • Replaces the current code with code that is consistent with the new state of the specialization predicate • Problem: concurrent replugging and function invocation • Solution: synchronization using locks • Factors affecting the design: • Concurrent invocation of the same repluggable function • Can be avoided by associating functions with threads • Concurrency between replugging and invocation • With concurrent invocation (i.e. the counting replugger): • Use a counter to detect whether threads are executing the repluggable function • Use a stub function, holding_tank, to avoid invocation while the function is being replugged • Without concurrent invocation: • Use a boolean flag
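Below is a minimal sketch of a counting replugger using C11 atomics; the structure (an invocation counter, a holding_tank stub installed during replugging) follows the description above, but the specific names and the spin-based waiting are illustrative assumptions, not the paper's implementation.

#include <stdatomic.h>

typedef int (*filter_fn)(const char *pkt, int len);

static _Atomic(filter_fn) current_fn;       /* the repluggable function pointer */
static atomic_int         in_flight;        /* threads currently executing it */
static atomic_int         replug_pending;   /* set while a replug is in progress */

/* Installed in current_fn while replugging: invocations that arrive mid-replug
 * land here, give up their slot in the count so the replugger can finish,
 * wait, and then call through to the newly installed version. */
static int holding_tank(const char *pkt, int len) {
    atomic_fetch_sub(&in_flight, 1);
    while (atomic_load(&replug_pending))
        ;                                   /* spin; a real system would block */
    atomic_fetch_add(&in_flight, 1);        /* restore the caller's count */
    return atomic_load(&current_fn)(pkt, len);
}

/* Invocation path: count ourselves in, then call through the pointer. */
int invoke_filter(const char *pkt, int len) {
    atomic_fetch_add(&in_flight, 1);
    int r = atomic_load(&current_fn)(pkt, len);
    atomic_fetch_sub(&in_flight, 1);
    return r;
}

/* Replug path: divert new callers to the holding tank, wait for in-flight
 * invocations of the old version to drain, then install the new version
 * (this also serves to install the very first version). */
void replug(filter_fn new_fn) {
    atomic_store(&replug_pending, 1);
    atomic_store(&current_fn, holding_tank);
    while (atomic_load(&in_flight) > 0)
        ;                                   /* wait for executing threads */
    atomic_store(&current_fn, new_fn);
    atomic_store(&replug_pending, 0);
}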

  15. Experiments: Specializing RPC • Applying Tempo to Sun RPC’s marshaling process • Static specialization: the spec predicates are available when the stubs are generated • Specialization opportunities: • Some fields in the data structures used by the marshaling code have values known at stub-generation time • Some details: • Encoding/decoding dispatch elimination • Buffer-overflow-check elimination • Exit-status propagation • Marshaling loop unrolling
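The following toy example (not the actual Sun RPC/XDR code, and ignoring XDR's big-endian encoding) illustrates three of these opportunities at once: when the stub generator knows the message is exactly three 32-bit integers, the generic dispatch-and-loop marshaler reduces to straight-line residual code with the dispatch, the per-field overflow checks, and the loop evaluated away.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic marshaler: interprets a type descriptor at run time. */
enum xdr_type { XDR_INT32 /* , XDR_STRING, ... */ };

int marshal_generic(char *buf, size_t buflen, const enum xdr_type *desc,
                    int nfields, const int32_t *vals) {
    size_t off = 0;
    for (int i = 0; i < nfields; i++) {        /* dispatch + bounds check per field */
        if (desc[i] != XDR_INT32) return -1;   /* encoding dispatch */
        if (off + 4 > buflen) return -1;       /* buffer-overflow check */
        memcpy(buf + off, &vals[i], 4);
        off += 4;
    }
    return (int)off;
}

/* Residual code after specialization wrt desc = {XDR_INT32, XDR_INT32, XDR_INT32}:
 * the dispatch, per-field overflow checks, and the marshaling loop are gone. */
int marshal_3ints(char *buf, size_t buflen, const int32_t *vals) {
    if (buflen < 12) return -1;                /* single hoisted check */
    memcpy(buf + 0, &vals[0], 4);
    memcpy(buf + 4, &vals[1], 4);
    memcpy(buf + 8, &vals[2], 4);
    return 12;
}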

  16. Experiments: Specializing RPC (Cont.) • Performance

  17. Experiments: Specializing Packet Filters • BSD Packet Filter (BPF): • An interface for selecting packets from a network interface • Specialization opportunities: • BPF is executed many times • To be specialized: the packet interpreter • “Specializing an interpreter wrt a particular program effectively compiles that program” • Either static or dynamic specialization applies • Static: the packet filter program is available in advance • Dynamic: if the packet filter program is presented immediately before execution → specialization overheads are included in the run time

  18. Experiments: Specializing Packet Filters (cont.) • A packet is read and filtered by calling the BPF filter function • Parameters: the packet filter program, a packet, the length of the original packet, and the amount of data present • The BPF interpreter and its recursion (code shown on the original slide)
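As an illustration of what specializing the packet interpreter means, here is a toy sketch (the instruction set and names are invented and much simpler than real BPF): a generic interpreter with the same parameter shape as the BPF filter function, and the residual code obtained by specializing it with respect to a fixed three-instruction filter program.

enum op { LOAD_BYTE, JEQ_OR_REJECT, ACCEPT };
struct insn { enum op op; unsigned k; };

/* Generic interpreter: (program, packet, wire length, amount of data present). */
unsigned filter_interp(const struct insn *pc, const unsigned char *p,
                       unsigned wirelen, unsigned buflen) {
    unsigned acc = 0;
    for (;;) {
        switch (pc->op) {                 /* fetch/dispatch per instruction */
        case LOAD_BYTE:
            if (pc->k >= buflen) return 0;
            acc = p[pc->k];
            break;
        case JEQ_OR_REJECT:
            if (acc != pc->k) return 0;   /* reject the packet */
            break;
        case ACCEPT:
            return wirelen;               /* accept: return capture length */
        }
        pc++;
    }
}

/* Residual code for the fixed program
 *   { {LOAD_BYTE, 0}, {JEQ_OR_REJECT, 0x45}, {ACCEPT, 0} }:
 * the loop, the dispatch, and the program fetches disappear, which is the
 * sense in which specializing the interpreter compiles the filter program. */
unsigned filter_ipv4_hdr(const unsigned char *p, unsigned wirelen,
                         unsigned buflen) {
    if (buflen < 1) return 0;
    return (p[0] == 0x45) ? wirelen : 0;
}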

  19. Experiments: Specializing Packet Filters (Cont.) • Performance • Code size: • Unspecialized: 550 lines • Specialized (6-instruction filter): 366 lines • Specialized (10-instruction filter): 576 lines

  20. Experiments: Specializing Signals • Statement: ret = kill(pid, n) • Specialization opportunity: a process might repeatedly send the same signal to the same destination process → both task_structs won’t change

  21. Experiments: Specializing Signals (cont.) • Optimistic specialization: • If any predicate terms are modified between signals, the specialized code becomes invalid (e.g. the destination process exits) • TypeGuard is used to identify the locations that require guarding
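A hypothetical sketch of what the guarded fast path could look like; the names (sig_spec, send_sig_fast, deliver_to, kill_generic) and the cached fields are illustrative assumptions, not the paper's or Linux's actual code.

/* Quasi-invariants cached after the first kill(pid, n): guards inserted by
 * TypeGuard clear `valid` whenever one of the underlying terms changes
 * (e.g. the destination task_struct goes away when the process exits). */
struct sig_spec {
    int   valid;        /* cleared by guards when a quasi-invariant breaks */
    int   cached_pid;   /* spec predicate: same destination pid */
    int   cached_sig;   /* spec predicate: same signal number */
    void *dest_task;    /* destination task, resolved once */
};

/* Stand-ins for the real general-purpose and specialized delivery paths. */
static int kill_generic(int pid, int sig)  { (void)pid;  (void)sig; return 0; }
static int deliver_to(void *task, int sig) { (void)task; (void)sig; return 0; }

/* Fast path: while the quasi-invariants hold, skip pid lookup, permission
 * checks, and signal-number validation; otherwise fall back to the generic
 * path (which could also respecialize and refill the cache). */
int send_sig_fast(struct sig_spec *spec, int pid, int sig) {
    if (spec->valid && pid == spec->cached_pid && sig == spec->cached_sig)
        return deliver_to(spec->dest_task, sig);   /* specialized residual code */
    return kill_generic(pid, sig);                 /* guard tripped or first call */
}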

  22. Experiments: Specializing Signals (Cont.) • Performance • Application-level impact: 100,000 producer-consumer iterations, buffer size 4 • Unspecialized: 11.9 s • Specialized: 5.6 s • Code size: 4 functions into 1, 59 LOC into 18

  23. Related Work • Multistage programming • Programs that generate other programs • Tempo is: • Automatic • Heterogeneous • Both a static and a run-time generation system • A two- or three-stage system • Aspect-oriented programming: “Different aspects of a system’s behavior tend to have their own natural form, so while one abstraction framework might do a good job of capturing one aspect, it will do a less good job capturing others”
