Memory-manager/Scheduler co-design: Optimising event-driven servers

Sapan Bhatia (INRIA) [speaker] Charles Consel (INRIA) Julia Lawall (DIKU)

Outline • Problem: • Memory wall • Effect on highly concurrent servers • Our approach: • Allocation strategy • Scheduling algorithm • Implementation: • Event-driven servers • Program-analysis tools • Conclusion & future work

Problem • Memory wall • Memory accesses are expensive: 1-2 orders of magnitude more than a CPU cycle • Highly concurrent programs are particularly affected • E.g., a server handling hundreds of requests at once

Cache behaviour under concurrency (1)

The Stingy Allocator • Goal: • Control the placement of objects in the cache • Ensure that the server's data does not overflow the cache • Why (1)? • Why (2)?

Controlling the placement of objects • A virtual-memory mapping that maps objects into the cache [Figure: mapping from memory to cache]

Staying within the cache [Figure: objects O1 to O5 placed in the cache] • 1. Allocator-oriented solution • 2. Scheduler-oriented solution

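Configuring the allocator • Constraints: • $\sum_{O_l \in L} \mathrm{size}(O_l) \cdot n_{O_l} + \sum_{O_a \in A} \mathrm{size}(O_a) \le \text{cache size}$ • $A_i \mathrel{\mathrm{dom}} A_j \wedge A_j \mathrel{\mathrm{dom}} D_i \Rightarrow n_{O_i} \ge n_{O_j}$ • Objective function (I-cache): • $M_w(N) = \sum_{s \in S} w_s \cdot N / \min_{O_l \in L_s} n_{O_l}$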

Application to existing programs • We target event-driven programs: • The standard way to implement high-performance servers • The scheduler is implemented in the application • Tools for implementing the allocator: • Memwalk, Stingygen, Stingify • Last step: modify the scheduling algorithm

Event-driven programs • Features: • Scheduler • Tasks • Events

Steps in the optimization • Annotate the elements of the program: scheduler, stages… • Analyze memory behaviour using Memwalk • Generate a server-specific allocator using Stingygen • Rewrite calls to the allocator using Stingify • Modify the scheduler

Memwalk

Stingygen • Output of Memwalk → Stingygen → “Memory Map” • “Memory Map” + Stingylib = “Customized allocator”

Stingify • Before:
    char *hdr_string, *route_string;
    hdr_string = alloc(max_hdr_len);
    route_string = alloc(max_route_len);
• After:
    char *hdr_string, *route_string;
    hdr_string = stingy_alloc(1);
    route_string = stingy_alloc(2);

Changing the scheduler • Done manually (for now) • Our scheduling policy: • Add the following to the selection criterion: if (stingy_query(<stage>)==NO_MEM) then dont_select() • New priorities: • Highest: full throttle (stingy_query returns FULL_MEM) • Second-highest: higher up is better

Results for TUX • Data-cache misses: -75% • Throughput: +40%

Conclusion • Cache problems are pronounced in concurrent servers • Our approach combines scheduling and memory management • We apply it to event-driven programs • Program-analysis tools (Memwalk, Stingygen, Stingify) help apply the approach

Future work • Port to multi-processed programs • Facilitate the modification of the scheduler • Push the notion of cache reservation deeper into the OS

Thank you! Questions?

Thank you once again. This was joint work with Charles Consel and Julia Lawall.
