Automatic Generation of Code-Centric Graphs for Understanding Shared-Memory Communication
Dan Grossman, University of Washington
February 25, 2010
Joint work
sampa.cs.washington.edu (“safe multiprocessing architectures”)
Key idea
• Build a “communication graph”
  • Nodes: units of code (e.g., functions)
  • Directed edges: shared-memory communication, i.e., the source node writes data that the destination node later reads (e.g., foo → bar)
• This view is code-centric, not data-centric
  • No indication of which addresses or how much data
  • No indication of the locking protocol
  • Fundamentally complementary to data-centric views
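As a concrete illustration of the data model (a sketch only, not the tool's actual representation; all names are invented), a communication graph is just a set of directed writer-to-reader edges over function names:

    #include <set>
    #include <string>
    #include <utility>

    // One directed edge: the source function wrote data that the
    // destination function later read.
    typedef std::pair<std::string, std::string> Edge;   // (writer, reader)
    typedef std::set<Edge> CommGraph;                   // nodes are implicit

    // Record communication from 'writer' to 'reader'. Self-edges
    // (writer == reader) are meaningful: two threads running the
    // same function can communicate.
    void addEdge(CommGraph& g, const std::string& writer, const std::string& reader) {
        g.insert(Edge(writer, reader));
    }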
Execution graphs
First idea: a dynamic program-monitoring tool that outputs a graph
• Run the program with instrumentation (100x+ slowdown for now); see the sketch below
  • For a write to address addr by thread T1 running function f:
      table[addr] := (T1, f)
  • For a read of address addr by thread T2 running function g:
      if table[addr] == (T1, f) and T1 != T2, then include an edge from f to g
  • Note that we can have f == g
• Show the graph to developers off-line
  • Program understanding
  • Concurrency metrics
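A minimal single-lock sketch of that bookkeeping, reusing the Edge/CommGraph idea above; the hook names (onWrite, onRead) are invented, and a real tool would shard this state to keep the slowdown tolerable:

    #include <map>
    #include <mutex>
    #include <set>
    #include <string>
    #include <utility>

    typedef std::pair<int, std::string> Writer;     // (thread id, function)
    static std::map<const void*, Writer> table;     // last writer per address
    static std::set<std::pair<std::string, std::string> > graph;
    // NOTE: state_lock protects the tool's own state; it is deliberately NOT
    // held across the program's actual memory access, which is why the
    // "memory-table races" discussed on later slides can still occur.
    static std::mutex state_lock;

    // Invoked just before thread 'tid', running function 'fn', writes 'addr'.
    void onWrite(const void* addr, int tid, const std::string& fn) {
        std::lock_guard<std::mutex> lk(state_lock);
        table[addr] = Writer(tid, fn);
    }

    // Invoked just before thread 'tid', running function 'fn', reads 'addr'.
    void onRead(const void* addr, int tid, const std::string& fn) {
        std::lock_guard<std::mutex> lk(state_lock);
        std::map<const void*, Writer>::iterator it = table.find(addr);
        if (it != table.end() && it->second.first != tid)        // inter-thread only
            graph.insert(std::make_pair(it->second.second, fn)); // writer -> reader; fn may equal the writer
    }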
Simple!
“It’s mostly that easy,” thanks to two modern technologies:
• PIN dynamic binary instrumentation
  • Essential: runs real C/C++ apps without rebuilding or reinstalling them
  • Drawback: x86 only
• Prefuse visualization framework
  • Essential: layout and navigation of large graphs
  • Drawback: hard for reviewers to appreciate the interactivity
But of course there’s a bit more to the story…
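For flavor, a bare-bones pintool along these lines might look as follows. This is an untested sketch under stated assumptions: it calls the hypothetical onWrite/onRead hooks from the previous sketch, shows no locking, and ignores the multi-operand memory instructions a real tool must handle:

    #include "pin.H"
    #include <string>

    void onWrite(const void* addr, int tid, const std::string& fn);  // from the sketch above
    void onRead(const void* addr, int tid, const std::string& fn);

    static VOID RecordWrite(THREADID tid, VOID* addr, const std::string* fn) {
        onWrite(addr, (int)tid, *fn);
    }
    static VOID RecordRead(THREADID tid, VOID* addr, const std::string* fn) {
        onRead(addr, (int)tid, *fn);
    }

    // Called for every instruction Pin translates; hook loads and stores.
    static VOID Instruction(INS ins, VOID* v) {
        if (!INS_IsMemoryWrite(ins) && !INS_IsMemoryRead(ins)) return;
        // Resolve the enclosing function name once, at instrumentation time
        // (leaked deliberately: it must outlive the instrumented code).
        std::string* fn = new std::string(RTN_FindNameByAddress(INS_Address(ins)));
        if (INS_IsMemoryWrite(ins))
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordWrite,
                           IARG_THREAD_ID, IARG_MEMORYWRITE_EA, IARG_PTR, fn, IARG_END);
        if (INS_IsMemoryRead(ins))
            INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordRead,
                           IARG_THREAD_ID, IARG_MEMORYREAD_EA, IARG_PTR, fn, IARG_END);
    }

    int main(int argc, char* argv[]) {
        PIN_InitSymbols();                      // needed to resolve function names
        PIN_Init(argc, argv);
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_StartProgram();                     // never returns
        return 0;
    }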
From an idea to a project
• Kinds of graphs: one dynamic execution isn’t always what you want
• Nodes: function “inlining” is essential
• Semantics: why our graphs are “unsound” but “close enough”
• Empirical evaluation
  • Case studies: useful graphs for real applications
  • Metrics: using graphs to characterize program structure
• Ongoing work
Graphs
• Execution graph: the graph from one program execution
• Testing graph: the union of the graphs from runs over a test suite (a union sketch follows)
  • Multiple runs can catch edges arising from nondeterminism
• Program graph: exactly the edges possible over all interleavings and all inputs
  • Undecidable, of course
• Specification graph: the edges the designer wishes to allow
  • Ongoing work: concise and modular specifications
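The testing graph, for instance, is literally a union of edge sets; a sketch (re-declaring the CommGraph alias from the earlier sketch for self-containment):

    #include <set>
    #include <string>
    #include <utility>
    #include <vector>

    typedef std::set<std::pair<std::string, std::string> > CommGraph;

    // Testing graph = union of the execution graphs from each test run.
    CommGraph testingGraph(const std::vector<CommGraph>& runs) {
        CommGraph u;
        for (size_t i = 0; i < runs.size(); ++i)
            u.insert(runs[i].begin(), runs[i].end());
        return u;
    }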
Inclusions
Execution Graph ⊆ Testing Graph ⊆ Program Graph
Comparing the testing graph against the specification graph, the two can diverge in either direction (an inclusion-check sketch follows):
• Tests don’t cover the spec: write more tests and/or restrict the spec
• Spec doesn’t cover the tests: find the bug and/or relax the spec
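Because graphs are edge sets, the inclusion check is a one-liner; a sketch using std::includes over the sorted sets:

    #include <algorithm>
    #include <set>
    #include <string>
    #include <utility>

    typedef std::set<std::pair<std::string, std::string> > CommGraph;

    // Does the spec allow everything testing observed? (Testing ⊆ Spec)
    bool specCoversTests(const CommGraph& testing, const CommGraph& spec) {
        return std::includes(spec.begin(), spec.end(),
                             testing.begin(), testing.end());
    }

If this returns false, the offending edges point at either a bug or an overly strict spec; the diff on the next slide computes exactly those edges.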
Diffs
(Automated) graph difference is also valuable, as in the sketch below:
• Across runs with the same input
  • Reveals communication nondeterminism
• Across runs with different inputs
  • Reveals how communication depends on the input
• Versus a precomputed testing graph
  • Reveals anomalies with respect to behavior already seen
• …
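A graph difference is just a set difference over edges; a sketch:

    #include <algorithm>
    #include <iterator>
    #include <set>
    #include <string>
    #include <utility>

    typedef std::set<std::pair<std::string, std::string> > CommGraph;

    // Edges present in 'observed' but absent from 'baseline' -- e.g., a new
    // run versus the precomputed testing graph, or testing versus a spec.
    CommGraph newEdges(const CommGraph& observed, const CommGraph& baseline) {
        CommGraph d;
        std::set_difference(observed.begin(), observed.end(),
                            baseline.begin(), baseline.end(),
                            std::inserter(d, d.begin()));
        return d;
    }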
From an idea to a project
• Kinds of graphs: one dynamic execution isn’t always what you want
• Nodes: function “inlining” is essential
• Semantics: why our graphs are “unsound” but “close enough”
• Empirical evaluation
  • Case studies: useful graphs for real applications
  • Metrics: using graphs to characterize program structure
• Ongoing work
A toy program (skeleton)

    queue q; // global, mutable
    void enqueue(T* obj) { … }
    T* dequeue() { … }

    void consumer() { … T* t = dequeue(); … t->f … }
    void producer() { … T* t = …; t->f = …; enqueue(t); … }

Program: multiple threads call producer and consumer
Resulting graph: enqueue → dequeue (via the queue’s internals) and producer → consumer (via t->f)
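To make the skeleton concrete, here is one runnable way to fill it in; every body below is invented for illustration, with C++11 threads and a mutex-protected std::queue standing in for the elided details:

    #include <mutex>
    #include <queue>
    #include <thread>

    struct T { int f; };

    std::queue<T*> q;            // global, mutable: shared between threads
    std::mutex q_lock;

    void enqueue(T* obj) {
        std::lock_guard<std::mutex> g(q_lock);   // enqueue writes the queue's internals...
        q.push(obj);
    }

    T* dequeue() {
        std::lock_guard<std::mutex> g(q_lock);   // ...and dequeue reads them: edge enqueue -> dequeue
        if (q.empty()) return 0;                 // return NULL if empty, for brevity
        T* t = q.front(); q.pop(); return t;
    }

    void producer() { T* t = new T; t->f = 42; enqueue(t); }  // writes t->f ...
    void consumer() {                                          // ... read here: edge producer -> consumer
        T* t = dequeue();
        if (t) { int v = t->f; (void)v; delete t; }
    }

    int main() {
        std::thread p(producer), c(consumer);    // the slide's program runs many of each
        p.join(); c.join();
        return 0;
    }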
Multiple abstraction levels

    // use q as a task queue with
    // multiple enqueuers and dequeuers
    queue q; // global, mutable
    void enqueue(int i) { … }
    int dequeue() { … }

    void f1() { … enqueue(i) … }
    void f2() { … enqueue(j) … }
    void g1() { … dequeue() … }
    void g2() { … dequeue() … }

Graph without inlining: enqueue → dequeue
Multiple abstraction levels

    // use q1, q2, q3 to set up a pipeline
    queue q1, q2, q3; // global, mutable
    void enqueue(queue q, int i) { … }
    int dequeue(queue q) { … }

    void f1() { … enqueue(q1,i) … }
    void f2() { … dequeue(q1) … … enqueue(q2,j) … }
    void g1() { … dequeue(q2) … … enqueue(q3,k) … }
    void g2() { … dequeue(q3) … }

Graph without inlining: enqueue → dequeue, exactly as before; the pipeline structure is invisible at this level
“Inlining” enqueue & dequeue

    // use q as a task queue with
    // multiple enqueuers and dequeuers
    queue q; // global, mutable
    void enqueue(int i) { … }
    int dequeue() { … }

    void f1() { … enqueue(i) … }
    void f2() { … enqueue(j) … }
    void g1() { … dequeue() … }
    void g2() { … dequeue() … }

Graph with enqueue and dequeue inlined: f1 → g1, f1 → g2, f2 → g1, f2 → g2 (any enqueuer to any dequeuer)
“Inlining” enqueue & dequeue

    // use q1, q2, q3 to set up a pipeline
    queue q1, q2, q3; // global, mutable
    void enqueue(queue q, int i) { … }
    int dequeue(queue q) { … }

    void f1() { … enqueue(q1,i) … }
    void f2() { … dequeue(q1) … … enqueue(q2,j) … }
    void g1() { … dequeue(q2) … … enqueue(q3,k) … }
    void g2() { … dequeue(q3) … }

Graph with enqueue and dequeue inlined: f1 → f2 → g1 → g2; the pipeline emerges
Inlining moral
• Different abstraction levels view communication differently
  • All layers are important to someone
  • Queue layer: pipeline stages communicate only via queues
  • Higher layer: the stages actually form a pipeline
• Current tool: the programmer specifies which functions to inline
  • This controls instrumentation overhead, but changing the list requires re-running
  • Ongoing work: maintaining full call stacks efficiently
• In our experience: inline most “util” functions and DLL calls
  • Also “prune” custom allocators and one-time initializations from the graph
  • Pruning is different from inlining; it can be done offline (see the sketches below)
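Two sketches of the mechanics just described, with all names invented. Attribution under inlining walks past inlined frames (assuming a per-thread call stack is available); pruning is a pure offline filter over the edge set:

    #include <set>
    #include <string>
    #include <utility>
    #include <vector>

    typedef std::set<std::pair<std::string, std::string> > CommGraph;

    // Inlining: attribute an access to the nearest enclosing function that
    // the user did NOT mark as inlined.
    std::string attribute(const std::vector<std::string>& callStack,
                          const std::set<std::string>& inlined) {
        for (int i = (int)callStack.size() - 1; i >= 0; --i)
            if (!inlined.count(callStack[i]))
                return callStack[i];
        return "<unknown>";
    }

    // Pruning: drop every edge that touches an uninteresting node, e.g., a
    // custom allocator or a one-time initializer. Needs no re-run.
    CommGraph prune(const CommGraph& g, const std::set<std::string>& drop) {
        CommGraph out;
        for (CommGraph::const_iterator e = g.begin(); e != g.end(); ++e)
            if (!drop.count(e->first) && !drop.count(e->second))
                out.insert(*e);
        return out;
    }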
From an idea to a project
• Kinds of graphs: one dynamic execution isn’t always what you want
• Nodes: function “inlining” is essential
• Semantics: why our graphs are “unsound” but “close enough”
• Empirical evaluation
  • Case studies: useful graphs for real applications
  • Metrics: using graphs to characterize program structure
• Ongoing work
Allowing memory-table races

The instrumentation (update before a write, lookup before a read) is not atomic with the access it shadows, so an unlucky schedule can record misleading edges:

    time  Thread 1, in f      Thread 2, in g      Thread 3, in h
     |    update(1,&f,&x);
     |                        update(2,&g,&x);
     |                        x = false;
     |    x = true;
     |                                            lookup(3,&h,x);
     |                                            if(x) {
     |                                              update(3,&h,&y);
     |                                              y = 42;
     |                                            }
     |    lookup(1,&f,y);
     v    return y;

The table says g wrote x last, so the tool records the edge g → h; x is actually true, so h writes y, f reads it, and the tool also records h → f. The generated graph (g → h → f) is impossible: had g’s write really been the one read, x would have been false and h would never have written y. But each edge is possible!
Soundness moral
• The graph is correct in the absence of data races
• Data races can break the atomicity between our instrumentation and the actual memory access
  • Hence a wrong edge may be emitted
  • But a different schedule could have emitted that edge
  • We do not change the program’s semantics (its possible behaviors)
• As the example showed, the set of edges produced may be wrong not just for that execution but for any execution
• There are other ways to resolve these trade-offs (one is sketched below)
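One such alternative, sketched under invented names: holding a lock across both the metadata update and the shadowed access would make the recorded edges sound even for racy programs, but it serializes exactly the racing accesses and so changes which schedules the program can exhibit, which is the trade-off the slides decline:

    #include <mutex>
    #include <string>

    void onWrite(const void* addr, int tid, const std::string& fn);  // from the earlier sketch

    std::mutex shadow_lock;  // one global lock for brevity; a real tool would shard by address

    // Sound-but-intrusive variant: the metadata update and the store can no
    // longer be split apart by a racing thread.
    void instrumentedStore(int* addr, int val, int tid, const std::string& fn) {
        std::lock_guard<std::mutex> g(shadow_lock);
        onWrite(addr, tid, fn);
        *addr = val;
    }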
From an idea to a project
• Kinds of graphs: one dynamic execution isn’t always what you want
• Nodes: function “inlining” is essential
• Semantics: why our graphs are “unsound” but “close enough”
• Empirical evaluation
  • Case studies: useful graphs for real applications
  • Metrics: using graphs to characterize program structure
• Ongoing work
How big are the graphs?
With “appropriate” inlining and pruning:
• Chosen by grad students unfamiliar with the underlying apps
• Raw numbers are in a similar ballpark across applications (see the paper)
What about huge programs?
Communication does not grow nearly as quickly as program size
• The resulting graphs are big, but interactive visualization makes them surprisingly interesting
• You really need to see the graphs…
• Physics-based layout takes a minute or two (done ahead of time)
Node degree
The graphs are very sparse
• Most nodes have low degree
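Degree is cheap to compute from the edge set; a sketch of the tally behind this observation:

    #include <map>
    #include <set>
    #include <string>
    #include <utility>

    typedef std::set<std::pair<std::string, std::string> > CommGraph;

    // Count incident edges per node; in our graphs most counts are small.
    std::map<std::string, int> degrees(const CommGraph& g) {
        std::map<std::string, int> d;
        for (CommGraph::const_iterator e = g.begin(); e != g.end(); ++e) {
            ++d[e->first];    // out-edge for the writer
            ++d[e->second];   // in-edge for the reader
        }
        return d;
    }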
Changes across runs
• Little graph difference between runs with the same inputs
  • 5 of 15 programs were still “finding” new edges by the fifth run
  • A way to measure “observed nondeterminism”
• More difference across runs with different inputs
  • Depends on the application (0%-50% new edges)
  • Edges in the intersection over all inputs are an interesting subset
Ongoing work
• A Java tool, rather than C/C++ binary instrumentation
• A real programming model for specifying allowed communication
  • Conciseness and modularity are the key points
• Performance
  • Currently a heavyweight debugging tool (like Valgrind)
  • Just performant enough to avoid time-outs!
Thanks!
http://www.cs.washington.edu/homes/bpw/osha/
http://sampa.cs.washington.edu