Automatic Generation of Code-Centric Graphs for Understanding Shared-Memory Communication
Dan Grossman, University of Washington
February 25, 2010
Joint work
sampa.cs.washington.edu (“safe multiprocessing architectures”)
Key idea
• Build a “communication graph”
  • Nodes: units of code (e.g., functions)
  • Directed edges: shared-memory communication, i.e., the source node writes data that the destination node later reads (e.g., foo → bar)
• This view is code-centric, not data-centric
  • No indication of which addresses or how much data
  • No indication of the locking protocol
  • Fundamentally complementary to data-centric views
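As a concrete illustration of the data model (a sketch only, not the tool's actual representation; all names are invented), a communication graph is just a set of directed writer-to-reader edges over function names:

    #include <set>
    #include <string>
    #include <utility>

    // One directed edge: the source function wrote data that the
    // destination function later read.
    typedef std::pair<std::string, std::string> Edge;   // (writer, reader)
    typedef std::set<Edge> CommGraph;                   // nodes are implicit

    // Record communication from 'writer' to 'reader'. Self-edges
    // (writer == reader) are meaningful: two threads running the
    // same function can communicate.
    void addEdge(CommGraph& g, const std::string& writer, const std::string& reader) {
        g.insert(Edge(writer, reader));
    }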
Execution graphs
First idea: a dynamic program-monitoring tool that outputs a graph
• Run the program with instrumentation (100x+ slowdown for now); see the sketch below
  • For a write to address addr by thread T1 running function f:
      table[addr] := (T1, f)
  • For a read of address addr by thread T2 running function g:
      if table[addr] == (T1, f) and T1 != T2, then include an edge from f to g
  • Note that we can have f == g
• Show the graph to developers off-line
  • Program understanding
  • Concurrency metrics
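A minimal single-lock sketch of that bookkeeping, reusing the Edge/CommGraph idea above; the hook names (onWrite, onRead) are invented, and a real tool would shard this state to keep the slowdown tolerable:

    #include <map>
    #include <mutex>
    #include <set>
    #include <string>
    #include <utility>

    typedef std::pair<int, std::string> Writer;     // (thread id, function)
    static std::map<const void*, Writer> table;     // last writer per address
    static std::set<std::pair<std::string, std::string> > graph;
    // NOTE: state_lock protects the tool's own state; it is deliberately NOT
    // held across the program's actual memory access, which is why the
    // "memory-table races" discussed on later slides can still occur.
    static std::mutex state_lock;

    // Invoked just before thread 'tid', running function 'fn', writes 'addr'.
    void onWrite(const void* addr, int tid, const std::string& fn) {
        std::lock_guard<std::mutex> lk(state_lock);
        table[addr] = Writer(tid, fn);
    }

    // Invoked just before thread 'tid', running function 'fn', reads 'addr'.
    void onRead(const void* addr, int tid, const std::string& fn) {
        std::lock_guard<std::mutex> lk(state_lock);
        std::map<const void*, Writer>::iterator it = table.find(addr);
        if (it != table.end() && it->second.first != tid)        // inter-thread only
            graph.insert(std::make_pair(it->second.second, fn)); // writer -> reader; fn may equal the writer
    }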
Simple!
“It’s mostly that easy,” thanks to two modern technologies:
• PIN dynamic binary instrumentation
  • Essential: runs real C/C++ apps without rebuilding or reinstalling them
  • Drawback: x86 only
• Prefuse visualization framework
  • Essential: layout and navigation of large graphs
  • Drawback: hard for reviewers to appreciate the interactivity
But of course there’s a bit more to the story…
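For flavor, a bare-bones pintool along these lines might look as follows. This is an untested sketch under stated assumptions: it calls the hypothetical onWrite/onRead hooks from the previous sketch, shows no locking, and ignores the multi-operand memory instructions a real tool must handle:

    #include "pin.H"
    #include <string>

    void onWrite(const void* addr, int tid, const std::string& fn);  // from the sketch above
    void onRead(const void* addr, int tid, const std::string& fn);

    static VOID RecordWrite(THREADID tid, VOID* addr, const std::string* fn) {
        onWrite(addr, (int)tid, *fn);
    }
    static VOID RecordRead(THREADID tid, VOID* addr, const std::string* fn) {
        onRead(addr, (int)tid, *fn);
    }

    // Called for every instruction Pin translates; hook loads and stores.
    static VOID Instruction(INS ins, VOID* v) {
        if (!INS_IsMemoryWrite(ins) && !INS_IsMemoryRead(ins)) return;
        // Resolve the enclosing function name once, at instrumentation time
        // (leaked deliberately: it must outlive the instrumented code).
        std::string* fn = new std::string(RTN_FindNameByAddress(INS_Address(ins)));
        if (INS_IsMemoryWrite(ins))
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordWrite,
                           IARG_THREAD_ID, IARG_MEMORYWRITE_EA, IARG_PTR, fn, IARG_END);
        if (INS_IsMemoryRead(ins))
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordRead,
                           IARG_THREAD_ID, IARG_MEMORYREAD_EA, IARG_PTR, fn, IARG_END);
    }

    int main(int argc, char* argv[]) {
        PIN_InitSymbols();                      // needed to resolve function names
        PIN_Init(argc, argv);
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_StartProgram();                     // never returns
        return 0;
    }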
From an idea to a project
• Kinds of graphs: one dynamic execution isn’t always what you want
• Nodes: function “inlining” is essential
• Semantics: why our graphs are “unsound” but “close enough”
• Empirical evaluation
  • Case studies: useful graphs for real applications
  • Metrics: using graphs to characterize program structure
• Ongoing work
Graphs
• Execution graph: the graph from one program execution
• Testing graph: the union of the graphs from runs over a test suite (a union sketch follows)
  • Multiple runs can catch edges arising from nondeterminism
• Program graph: exactly the edges possible over all interleavings and all inputs
  • Undecidable, of course
• Specification graph: the edges the designer wishes to allow
  • Ongoing work: concise and modular specifications
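The testing graph, for instance, is literally a union of edge sets; a sketch (re-declaring the CommGraph alias from the earlier sketch for self-containment):

    #include <set>
    #include <string>
    #include <utility>
    #include <vector>

    typedef std::set<std::pair<std::string, std::string> > CommGraph;

    // Testing graph = union of the execution graphs from each test run.
    CommGraph testingGraph(const std::vector<CommGraph>& runs) {
        CommGraph u;
        for (size_t i = 0; i < runs.size(); ++i)
            u.insert(runs[i].begin(), runs[i].end());
        return u;
    }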
Inclusions
Execution Graph ⊆ Testing Graph ⊆ Program Graph
Comparing the testing graph against the specification graph, the two can diverge in either direction (an inclusion-check sketch follows):
• Tests don’t cover the spec: write more tests and/or restrict the spec
• Spec doesn’t cover the tests: find the bug and/or relax the spec
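Because graphs are edge sets, the inclusion check is a one-liner; a sketch using std::includes over the sorted sets:

    #include <algorithm>
    #include <set>
    #include <string>
    #include <utility>

    typedef std::set<std::pair<std::string, std::string> > CommGraph;

    // Does the spec allow everything testing observed? (Testing ⊆ Spec)
    bool specCoversTests(const CommGraph& testing, const CommGraph& spec) {
        return std::includes(spec.begin(), spec.end(),
                             testing.begin(), testing.end());
    }

If this returns false, the offending edges point at either a bug or an overly strict spec; the diff on the next slide computes exactly those edges.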
Diffs
(Automated) graph difference is also valuable, as in the sketch below:
• Across runs with the same input
  • Reveals communication nondeterminism
• Across runs with different inputs
  • Reveals how communication depends on the input
• Versus a precomputed testing graph
  • Reveals anomalies with respect to behavior already seen
• …
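A graph difference is just a set difference over edges; a sketch:

    #include <algorithm>
    #include <iterator>
    #include <set>
    #include <string>
    #include <utility>

    typedef std::set<std::pair<std::string, std::string> > CommGraph;

    // Edges present in 'observed' but absent from 'baseline' -- e.g., a new
    // run versus the precomputed testing graph, or testing versus a spec.
    CommGraph newEdges(const CommGraph& observed, const CommGraph& baseline) {
        CommGraph d;
        std::set_difference(observed.begin(), observed.end(),
                            baseline.begin(), baseline.end(),
                            std::inserter(d, d.begin()));
        return d;
    }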
From an idea to a project
• Kinds of graphs: one dynamic execution isn’t always what you want
• Nodes: function “inlining” is essential
• Semantics: why our graphs are “unsound” but “close enough”
• Empirical evaluation
  • Case studies: useful graphs for real applications
  • Metrics: using graphs to characterize program structure
• Ongoing work
A toy program (skeleton)

    queue q; // global, mutable
    void enqueue(T* obj) { … }
    T* dequeue() { … }

    void consumer() { … T* t = dequeue(); … t->f … }
    void producer() { … T* t = …; t->f = …; enqueue(t); … }

Program: multiple threads call producer and consumer
Resulting graph: enqueue → dequeue (via the queue’s internals) and producer → consumer (via t->f)
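To make the skeleton concrete, here is one runnable way to fill it in; every body below is invented for illustration, with C++11 threads and a mutex-protected std::queue standing in for the elided details:

    #include <mutex>
    #include <queue>
    #include <thread>

    struct T { int f; };

    std::queue<T*> q;            // global, mutable: shared between threads
    std::mutex q_lock;

    void enqueue(T* obj) {
        std::lock_guard<std::mutex> g(q_lock);   // enqueue writes the queue's internals...
        q.push(obj);
    }

    T* dequeue() {
        std::lock_guard<std::mutex> g(q_lock);   // ...and dequeue reads them: edge enqueue -> dequeue
        if (q.empty()) return 0;                 // return NULL if empty, for brevity
        T* t = q.front(); q.pop(); return t;
    }

    void producer() { T* t = new T; t->f = 42; enqueue(t); }  // writes t->f ...
    void consumer() {                                          // ... read here: edge producer -> consumer
        T* t = dequeue();
        if (t) { int v = t->f; (void)v; delete t; }
    }

    int main() {
        std::thread p(producer), c(consumer);    // the slide's program runs many of each
        p.join(); c.join();
        return 0;
    }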
Multiple abstraction levels

    // use q as a task queue with
    // multiple enqueuers and dequeuers
    queue q; // global, mutable
    void enqueue(int i) { … }
    int dequeue() { … }

    void f1() { … enqueue(i) … }
    void f2() { … enqueue(j) … }
    void g1() { … dequeue() … }
    void g2() { … dequeue() … }

Graph without inlining: enqueue → dequeue
Multiple abstraction levels

    // use q1, q2, q3 to set up a pipeline
    queue q1, q2, q3; // global, mutable
    void enqueue(queue q, int i) { … }
    int dequeue(queue q) { … }

    void f1() { … enqueue(q1,i) … }
    void f2() { … dequeue(q1) … … enqueue(q2,j) … }
    void g1() { … dequeue(q2) … … enqueue(q3,k) … }
    void g2() { … dequeue(q3) … }

Graph without inlining: enqueue → dequeue, exactly as before; the pipeline structure is invisible at this level
“Inlining” enqueue & dequeue

    // use q as a task queue with
    // multiple enqueuers and dequeuers
    queue q; // global, mutable
    void enqueue(int i) { … }
    int dequeue() { … }

    void f1() { … enqueue(i) … }
    void f2() { … enqueue(j) … }
    void g1() { … dequeue() … }
    void g2() { … dequeue() … }

Graph with enqueue and dequeue inlined: f1 → g1, f1 → g2, f2 → g1, f2 → g2 (any enqueuer to any dequeuer)
“Inlining” enqueue & dequeue

    // use q1, q2, q3 to set up a pipeline
    queue q1, q2, q3; // global, mutable
    void enqueue(queue q, int i) { … }
    int dequeue(queue q) { … }

    void f1() { … enqueue(q1,i) … }
    void f2() { … dequeue(q1) … … enqueue(q2,j) … }
    void g1() { … dequeue(q2) … … enqueue(q3,k) … }
    void g2() { … dequeue(q3) … }

Graph with enqueue and dequeue inlined: f1 → f2 → g1 → g2; the pipeline emerges
Inlining moral
• Different abstraction levels view communication differently
  • All layers are important to someone
  • Queue layer: pipeline stages communicate only via queues
  • Higher layer: the stages actually form a pipeline
• Current tool: the programmer specifies which functions to inline
  • This controls instrumentation overhead, but changing the list requires re-running
  • Ongoing work: maintaining full call stacks efficiently
• In our experience: inline most “util” functions and DLL calls
  • Also “prune” custom allocators and one-time initializations from the graph
  • Pruning is different from inlining; it can be done offline (see the sketches below)
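Two sketches of the mechanics just described, with all names invented. Attribution under inlining walks past inlined frames (assuming a per-thread call stack is available); pruning is a pure offline filter over the edge set:

    #include <set>
    #include <string>
    #include <utility>
    #include <vector>

    typedef std::set<std::pair<std::string, std::string> > CommGraph;

    // Inlining: attribute an access to the nearest enclosing function that
    // the user did NOT mark as inlined.
    std::string attribute(const std::vector<std::string>& callStack,
                          const std::set<std::string>& inlined) {
        for (int i = (int)callStack.size() - 1; i >= 0; --i)
            if (!inlined.count(callStack[i]))
                return callStack[i];
        return "<unknown>";
    }

    // Pruning: drop every edge that touches an uninteresting node, e.g., a
    // custom allocator or a one-time initializer. Needs no re-run.
    CommGraph prune(const CommGraph& g, const std::set<std::string>& drop) {
        CommGraph out;
        for (CommGraph::const_iterator e = g.begin(); e != g.end(); ++e)
            if (!drop.count(e->first) && !drop.count(e->second))
                out.insert(*e);
        return out;
    }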
From an idea to a project
• Kinds of graphs: one dynamic execution isn’t always what you want
• Nodes: function “inlining” is essential
• Semantics: why our graphs are “unsound” but “close enough”
• Empirical evaluation
  • Case studies: useful graphs for real applications
  • Metrics: using graphs to characterize program structure
• Ongoing work
Allowing memory-table races

The instrumentation (update before a write, lookup before a read) is not atomic with the access it shadows, so an unlucky schedule can record misleading edges:

    time  Thread 1, in f      Thread 2, in g      Thread 3, in h
     |    update(1,&f,&x);
     |                        update(2,&g,&x);
     |                        x = false;
     |    x = true;
     |                                            lookup(3,&h,x);
     |                                            if(x) {
     |                                              update(3,&h,&y);
     |                                              y = 42;
     |                                            }
     |    lookup(1,&f,y);
     v    return y;

The table says g wrote x last, so the tool records the edge g → h; x is actually true, so h writes y, f reads it, and the tool also records h → f. The generated graph (g → h → f) is impossible: had g’s write really been the one read, x would have been false and h would never have written y. But each edge is possible!
Soundness moral
• The graph is correct in the absence of data races
• Data races can break the atomicity between our instrumentation and the actual memory access
  • Hence a wrong edge may be emitted
  • But a different schedule could have emitted that edge
  • We do not change the program’s semantics (its possible behaviors)
• As the example showed, the set of edges produced may be wrong not just for that execution but for any execution
• There are other ways to resolve these trade-offs (one is sketched below)
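One such alternative, sketched under invented names: holding a lock across both the metadata update and the shadowed access would make the recorded edges sound even for racy programs, but it serializes exactly the racing accesses and so changes which schedules the program can exhibit, which is the trade-off the slides decline:

    #include <mutex>
    #include <string>

    void onWrite(const void* addr, int tid, const std::string& fn);  // from the earlier sketch

    std::mutex shadow_lock;  // one global lock for brevity; a real tool would shard by address

    // Sound-but-intrusive variant: the metadata update and the store can no
    // longer be split apart by a racing thread.
    void instrumentedStore(int* addr, int val, int tid, const std::string& fn) {
        std::lock_guard<std::mutex> g(shadow_lock);
        onWrite(addr, tid, fn);
        *addr = val;
    }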
From an idea to a project
• Kinds of graphs: one dynamic execution isn’t always what you want
• Nodes: function “inlining” is essential
• Semantics: why our graphs are “unsound” but “close enough”
• Empirical evaluation
  • Case studies: useful graphs for real applications
  • Metrics: using graphs to characterize program structure
• Ongoing work
How big are the graphs?
With “appropriate” inlining and pruning:
• Chosen by grad students unfamiliar with the underlying apps
• Raw numbers are in a similar ballpark across applications (see the paper)
What about huge programs?
Communication does not grow nearly as quickly as program size
• The resulting graphs are big, but interactive visualization makes them surprisingly interesting
• You really need to see the graphs…
• Physics-based layout takes a minute or two (done ahead of time)
Node degree
The graphs are very sparse
• Most nodes have low degree
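Degree is cheap to compute from the edge set; a sketch of the tally behind this observation:

    #include <map>
    #include <set>
    #include <string>
    #include <utility>

    typedef std::set<std::pair<std::string, std::string> > CommGraph;

    // Count incident edges per node; in our graphs most counts are small.
    std::map<std::string, int> degrees(const CommGraph& g) {
        std::map<std::string, int> d;
        for (CommGraph::const_iterator e = g.begin(); e != g.end(); ++e) {
            ++d[e->first];    // out-edge for the writer
            ++d[e->second];   // in-edge for the reader
        }
        return d;
    }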
Changes across runs
• Little graph difference between runs with the same inputs
  • 5 of 15 programs were still “finding” new edges by the fifth run
  • A way to measure “observed nondeterminism”
• More difference across runs with different inputs
  • Depends on the application (0%-50% new edges)
  • Edges in the intersection over all inputs are an interesting subset
Ongoing work
• A Java tool, rather than C/C++ binary instrumentation
• A real programming model for specifying allowed communication
  • Conciseness and modularity are the key points
• Performance
  • Currently a heavyweight debugging tool (like Valgrind)
  • Just performant enough to avoid time-outs!
Thanks!
http://www.cs.washington.edu/homes/bpw/osha/
http://sampa.cs.washington.edu