Scalable Context-sensitive Points-to Analysis using Multi-dimensional Bloom Filter.

Rupesh Nasre. Indian Institute of Science, India. Jointly with: Dr. Kaushik Rajan, Prof. R. Govindarajan, Prof. Uday P. Khedker. Dec 14, 2009.

Outline. • Introduction and Motivation. • Bloom filter. • Multi-dimensional bloom filter. • Performance. • Client analysis.

What is Pointer Analysis? Pointer analysis is the mechanism of statically finding out the possible run-time values of a pointer. Example: a = &x; b = a; if (b == *p) { ... } else { ... } Here a points to x, and a and b are aliases. We deal with C/C++. Alias analysis versus points-to analysis.

Normalized input. • address-of assignment: a = &x • copy assignment: a = b • load assignment: a = *p • store assignment: *p = a Our analysis is flow-insensitive and context-sensitive.

Context sensitivity.
caller1() { fun(&x); }
caller2() { fun(&y); }
fun(int *ptr) { a = ptr; }
intra-procedural: {(a, x), (a, y), (a, z), ...}.
context-insensitive: {(a, x), (a, y)}.
context-sensitive: {(a, x)} along main-...-caller1-fun, {(a, y)} along main-...-caller2-fun.
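The difference between the context-insensitive and the context-sensitive result for a can be sketched in a few lines of Python. The `calls` table is a hand-written model of the slide's program (an assumption for illustration), not the output of a real analysis.

```python
# Hand-written model of the slide's program: each caller passes one address
# to fun(ptr), and fun executes a = ptr.
calls = {"caller1": "x", "caller2": "y"}  # caller -> variable whose address is passed

# Context-insensitive: the arguments of all calls are merged into one set for a.
context_insensitive = {("a", target) for target in calls.values()}

# Context-sensitive: one points-to set per calling context.
context_sensitive = {caller: {("a", target)} for caller, target in calls.items()}

print(sorted(context_insensitive))
print(context_sensitive)
```

Keeping one set per context is what makes the result precise, and also what makes storage grow with the number of contexts.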

Context sensitivity.
main() { S1: f(&x); S2: f(&y); }
f(a) { S3: g(a); S4: g(b); }
g(z) { ... }
• Storage requirement increases exponentially.
Along S1-S3-S5-S7, a points to {x1, x3, x5, x7}.
Along S1-S3-S5-S8, a points to {x1, x3, x5, x8}.
Along S1-S3-S6-S7, a points to {x1, x3, x6, x7}.
Along S1-S3-S6-S8, a points to {x1, x3, x6, x8}.
Along S1-S4-S5-S7, a points to {x1, x4, x5, x7}.
Along S1-S4-S5-S8, a points to {x1, x4, x5, x8}.
Along S1-S4-S6-S7, a points to {x1, x4, x6, x7}.
Along S1-S4-S6-S8, a points to {x1, x4, x6, x8}.
Along S2...
Exponential blow-up of contexts.
Invocation graph: main calls f at S1 and at S2; each invocation of f calls g at S3 and at S4.
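The blow-up can be made concrete by enumerating the call paths. The sketch below assumes, as the path list suggests, four call levels with two call sites each; the `levels` table is illustrative only.

```python
from itertools import product

# Two call sites per level, four levels deep (assumed from the path list).
levels = [("S1", "S2"), ("S3", "S4"), ("S5", "S6"), ("S7", "S8")]

# Every combination of one call site per level is a distinct context.
contexts = ["-".join(path) for path in product(*levels)]

print(len(contexts))   # number of contexts: 2 ** number of levels
print(contexts[:4])
```

Each additional call level doubles the number of contexts, and a context-sensitive analysis stores points-to information for every one of them.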

Tackling scalability issues. How about exploiting commonality across contexts to store points-to information? e.g. BDDs (Berndl et al, PLDI 2003, Whaley et al, PLDI 2004). How about not storing complete contexts? e.g. k-cfa approach (O. Shivers, PhD Thesis, CMU, 1991), one level flow (M. Das, PLDI 2000). Can we have a probabilistic data structure that approximates the storage? Can we control the false-positive rate?

Observation. Points-to information is sparse. Average dereference size (a few tens) <<< number of address-taken variables (a few millions).

Outline. • Introduction and Motivation. • Bloom filter. • Multi-dimensional bloom filter. • Performance. • Client analysis. This is the first work to use a bloom filter for program analysis.

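Bloom Filter. A bloom filter is a probabilistic data structure for membership queries, and is typically implemented as a fixed-sized array of bits. Figure: storing elements e1, e2, e3 with two hash functions. Under hash1, e1 and e3 map to the same bit and e2 to another; under hash2, e1 and e2 map to the same bit and e3 to another.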

Bloom Filter. N = size of bloom filter in bits, n = number of elements added, h = number of hash functions, P = false positive rate. $P = \frac{(1/2)^{h}}{1 - nh/N}$. E.g. for N = 1,000,000, n = 10,000, h = 4, P = 6.5%; for h = 8, P = 0.4%. Gives a probabilistic guarantee on precision loss.
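The two quoted rates can be checked numerically. The formula is garbled in this transcript; the sketch below assumes it reads P = (1/2)^h / (1 - nh/N), a form chosen because it reproduces both quoted figures. The function name is hypothetical.

```python
def false_positive_rate(N, n, h):
    """False positive rate, assuming P = (1/2)**h / (1 - n*h/N)."""
    return 0.5 ** h / (1 - n * h / N)

p4 = false_positive_rate(N=1_000_000, n=10_000, h=4)
p8 = false_positive_rate(N=1_000_000, n=10_000, h=8)

print(f"h=4: {p4:.1%}")
print(f"h=8: {p8:.1%}")
```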

Storing points-to information in bloom filter. points-to information: {(p, a), (a, x), (a, y), (q, a)}. hash(p, a) = 2, hash(a, x) = 0, hash(a, y) = 2, hash(q, a) = 5. Bit vector with positions 0 to 9: bit 0 is set for (a, x), bit 2 is set for both (p, a) and (a, y), bit 5 is set for (q, a). Does p point to a? What all variables does p point to?
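The two questions on this slide behave very differently on a flat bloom filter, as the sketch below suggests. The hash values for the four stored pairs come from the slide; the fallback hash (CRC32 modulo 10) for all other pairs and the `VARIABLES` list are assumptions made so the example runs.

```python
import zlib

SLIDE_HASH = {("p", "a"): 2, ("a", "x"): 0, ("a", "y"): 2, ("q", "a"): 5}
VARIABLES = ["a", "p", "q", "x", "y"]
SIZE = 10

def pair_hash(pointer, obj):
    # Slide values where given; otherwise an assumed CRC32-based fallback.
    fallback = zlib.crc32(f"{pointer},{obj}".encode()) % SIZE
    return SLIDE_HASH.get((pointer, obj), fallback)

bits = [0] * SIZE
for pointer, obj in SLIDE_HASH:
    bits[pair_hash(pointer, obj)] = 1

def may_point_to(pointer, obj):
    # One probe answers "does pointer point to obj?" (false positives possible).
    return bits[pair_hash(pointer, obj)] == 1

def pointees(pointer):
    # No direct lookup exists: every candidate object has to be probed.
    return [v for v in VARIABLES if may_point_to(pointer, v)]

print(may_point_to("p", "a"))
print(pointees("a"))
```

Membership is a single probe, but enumerating the pointees of a pointer means probing every variable, which is one motivation for giving pointers and pointees their own dimensions.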

Processing input. • address-of assignment: a = &x • copy assignment: a = b ?? • load assignment: a = *p ?? • store assignment: *p = a ??

A 2-D structure. Figure: one row per pointer (p, q, r, ...), each row holding one bit vector per hash function (hash1, hash2).

A multi-dimensional structure. Figure: a grid of bits indexed along four axes: pointers (p, q, r), contexts, hash functions and pointees. We call it multi-dimensional bloom filter or simply multi-bloom.

Accessing multibloom. Earlier: bloom[v], with v = hash(pointer, context, object). Now: multibloom[p][c][x], with p = hash(pointer), c = hash(context), x = hash(object).

Multibloom. Multibloom is a 5-tuple: <P, C, H, B, M> P = number of entries for pointers. C = number of entries for contexts. H = number of hash functions. B = bit-vector size for each hash function. M = number of entries for multi-level pointers. You can play around with parameters keeping the total size of the bloom filter under control, with a probabilistic guarantee over precision loss.
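A minimal sketch of such a structure follows. It is not the authors' implementation: the M dimension is left out, the class and method names are hypothetical, and the hash functions (CRC32 reduced modulo the dimension size) are an assumption chosen only to keep the example deterministic.

```python
import zlib

class Multibloom:
    """Sketch with P pointer entries, C context entries, H hash functions
    and B bits per hash function. The M dimension is omitted here."""

    def __init__(self, P, C, H, B):
        self.P, self.C, self.H, self.B = P, C, H, B
        # mb[p][c][i] is a B-bit vector stored as a Python int.
        self.mb = [[[0] * H for _ in range(C)] for _ in range(P)]

    @staticmethod
    def _hash(name, seed, size):
        # Assumed hash: CRC32 of "seed:name", reduced to the dimension size.
        return zlib.crc32(f"{seed}:{name}".encode()) % size

    def _slot(self, pointer, context):
        return (self._hash(pointer, "ptr", self.P),
                self._hash(context, "ctx", self.C))

    def add(self, pointer, context, obj):
        """Record that pointer may point to obj in the given context."""
        p, c = self._slot(pointer, context)
        for i in range(self.H):
            self.mb[p][c][i] |= 1 << self._hash(obj, i, self.B)

    def may_point_to(self, pointer, context, obj):
        """True only if the object's bit is set under every hash function."""
        p, c = self._slot(pointer, context)
        return all((self.mb[p][c][i] >> self._hash(obj, i, self.B)) & 1
                   for i in range(self.H))

    def size_in_bits(self):
        return self.P * self.C * self.H * self.B

mb = Multibloom(P=64, C=4, H=4, B=10)
mb.add("a", "main-caller1-fun", "x")
print(mb.may_point_to("a", "main-caller1-fun", "x"))
print(mb.size_in_bits())
```

In this sketch the memory footprint is fixed by the four parameters alone, which is what lets the size stay under control regardless of how many contexts the program has.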

Handling copy statement (a = b).
c = hash(context);
for each hash function i {
    for each bucket j {
        source = mb[b][c][i][j];
        destination = mb[a][c][i][j];
        mb[a][c][i][j] = destination bitwise-or source;
    }
}
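The copy rule can be sketched in Python as below. This is an illustration under simplifying assumptions: pointers are already reduced to entry indices, and each bit vector is one Python integer, so the inner loop over buckets collapses into a single OR. The returned flag (not on the slide) is added so that a caller could detect when nothing changes.

```python
def handle_copy(mb, a, b, c, H):
    """Process a = b in context entry c: OR b's bit vectors into a's.

    Returns True if any bit of a changed.
    """
    changed = False
    for i in range(H):
        merged = mb[a][c][i] | mb[b][c][i]
        if merged != mb[a][c][i]:
            mb[a][c][i] = merged
            changed = True
    return changed

# Two pointer entries (0 for a, 1 for b), one context entry, two hash functions.
mb = [[[0b00000, 0b00000]], [[0b00010, 0b10000]]]
first = handle_copy(mb, a=0, b=1, c=0, H=2)
second = handle_copy(mb, a=0, b=1, c=0, H=2)
print(first, second, mb[0][0])
```

The second call changes nothing, which is the condition a flow-insensitive analysis uses to stop iterating.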

Example. • h(x) = 1, h(y) = 4, hs(p1) = 0, hs(p2) = 1.
Statement and multibloom processing:
• p1 = &x: set bit 1 corresponding to x.
• p2 = &y: set bit 4 corresponding to y.
• p3 = &p1: bitwise-OR p1's bucket.
• p4 = &p2: bitwise-OR p2's bucket.
• p3 = p4: bitwise-OR corresponding buckets of p3 and p4.
• p5 = *p3: bitwise-OR p3's buckets, bitwise-OR with p5's bucket.

Example, step by step. • h(x) = 1, h(y) = 4, hs(p1) = 0, hs(p2) = 1.
• Initially: all bit vectors of p1 to p5 are empty.
• After p1 = &x: bit 1 is set for p1.
• After p2 = &y: bit 4 is set for p2.
• After p3 = &p1: bucket hs(p1) = 0 of p3 holds bit 1.
• After p4 = &p2: bucket hs(p2) = 1 of p4 holds bit 4.
• After p3 = p4: p3 holds bit 1 in bucket 0 and bit 4 in bucket 1.
• After p5 = *p3: p5 holds bits 1 and 4, so p5 may point to x or y.
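The example can be replayed with a small simulation. The hash values are those given on the slide; the layout is an assumption made for illustration: every pointer owns M = 2 buckets, each a bit vector stored as a Python integer, and single-level pointers use bucket 0 only. All function names are hypothetical.

```python
M = 2                      # buckets per pointer (assumed)
h = {"x": 1, "y": 4}       # bit positions from the slide
hs = {"p1": 0, "p2": 1}    # bucket positions from the slide

state = {p: [0] * M for p in ("p1", "p2", "p3", "p4", "p5")}

def address_of_var(p, var):
    """p = &var where var is not a pointer: set var's bit in bucket 0."""
    state[p][0] |= 1 << h[var]

def address_of_ptr(p, q):
    """p = &q where q is a pointer: OR q's bits into bucket hs(q) of p."""
    state[p][hs[q]] |= state[q][0]

def copy_ptr(p, q):
    """p = q: OR corresponding buckets."""
    for j in range(M):
        state[p][j] |= state[q][j]

def load(p, q):
    """p = *q: OR all of q's buckets together, then into p's bucket."""
    merged = 0
    for j in range(M):
        merged |= state[q][j]
    state[p][0] |= merged

address_of_var("p1", "x")
address_of_var("p2", "y")
address_of_ptr("p3", "p1")
address_of_ptr("p4", "p2")
copy_ptr("p3", "p4")
load("p5", "p3")

for name, buckets in state.items():
    print(name, [format(b, "05b") for b in buckets])
```

At the end p5 carries the bits of both x and y, matching the final statement of the example.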

Performance. • Benchmarks: SPEC 2000 C/C++, httpd, sendmail. • Framework: LLVM. • Platform: Intel Xeon, 2GHz clock, 4MB L2, 3GB RAM. • Precision: NoAlias percentage. NoAlias % is the percentage of queries that return NoAlias for all pairs of pointers in each function of the program. It need not be 100% for an exact analysis.
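As an illustration of the metric, the sketch below computes a NoAlias percentage over all pointer pairs within each function. It uses explicit points-to sets as a stand-in for multibloom queries, and the function name, program and sets are invented for the example.

```python
from itertools import combinations

def no_alias_percentage(functions, points_to):
    """functions: {function name: [pointers]}; points_to: {pointer: set of objects}."""
    queries = no_alias = 0
    for pointers in functions.values():
        for p, q in combinations(pointers, 2):
            queries += 1
            if not (points_to[p] & points_to[q]):  # disjoint sets: NoAlias
                no_alias += 1
    return 100.0 * no_alias / queries if queries else 0.0

functions = {"f": ["a", "b", "c"], "g": ["d", "e"]}
points_to = {"a": {"x"}, "b": {"x"}, "c": {"y"}, "d": {"z"}, "e": {"w"}}
result = no_alias_percentage(functions, points_to)
print(result)
```

Here a and b really do alias, so even these exact sets give less than 100%, which is why the metric need not reach 100% for an exact analysis.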

Multibloom parameters. Figure: the multibloom grid with its axes labelled Pointers (P), Contexts (C), hash functions (H) and Pointees (B). Another dimension M for multi-level pointers.

Performance (vortex). Configurations are written as (C-H-B) = (number of entries for contexts, number of hash functions, size of the bit-vector per hash function), for example (4-4-10).

Experimental evaluation: Time(s).

Experimental evaluation: Memory (KB).

Experimental evaluation: Precision (NoAlias %). User has control over the trade-off between memory, analysis time and precision.

Client analysis: Mod/Ref analysis. Output: NoModRef, Ref, Mod, ModRef. Precision: NoModRef %.

Experimental evaluation: Mod/Ref.

Related work. • L. O. Andersen, Program analysis and specialization for the C programming language, PhD Thesis, DIKU, 1994. • B. Steensgaard, Points-to Analysis in Almost Linear Time, POPL 1996. • J. Whaley and M. S. Lam, Cloning-Based Context-Sensitive Pointer Alias Analysis Using Binary Decision Diagrams, PLDI 2004. • B. Hardekopf and C. Lin, The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code, PLDI 2007. • V. Kahlon, Bootstrapping: a technique for scalable flow and context-sensitive pointer alias analysis, PLDI 2008.

Take away. By using a multi-dimensional bloom filter, one can trade off the precision, memory and time of an analysis to suit one's needs, with a probabilistic guarantee on precision loss.

Rupesh Nasre. Indian Institute of Science, India. nasre@csa.iisc.ernet.in.

Example. Program: p3 = p1, p2 = p3, p1 = &x, p2 = &y, p3 = p2. Let h1(x) = 0, h2(x) = 5, h1(y) = 3, h2(y) = 3. Each pointer (p1, p2, p3) has one bit vector per hash function (h1, h2), with bit positions 0 to 11.
Iteration 1.
• p3 = p1 and p2 = p3: no change, all bit vectors are still empty.
• p1 = &x: p1 gets bit 0 under h1 and bit 5 under h2.
• p2 = &y: p2 gets bit 3 under h1 and bit 3 under h2.
• p3 = p2: p3 gets bit 3 under h1 and bit 3 under h2.
Iteration 2.
• p3 = p1: p3 now has bits 0 and 3 under h1, and bits 3 and 5 under h2.
• p2 = p3: p2 now has bits 0 and 3 under h1, and bits 3 and 5 under h2.
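The iteration can be reproduced with a short fixed-point loop. The program and hash values are those of the example; the representation (one Python integer per hash function and pointer) and the helper names are assumptions for illustration.

```python
h1 = {"x": 0, "y": 3}
h2 = {"x": 5, "y": 3}
program = [("copy", "p3", "p1"), ("copy", "p2", "p3"),
           ("addr", "p1", "x"), ("addr", "p2", "y"), ("copy", "p3", "p2")]

# bits[p] = [bit vector under h1, bit vector under h2]
bits = {p: [0, 0] for p in ("p1", "p2", "p3")}

def step(kind, dst, src):
    """Apply one statement; return True if dst's bit vectors changed."""
    old = list(bits[dst])
    if kind == "addr":                 # dst = &src
        bits[dst][0] |= 1 << h1[src]
        bits[dst][1] |= 1 << h2[src]
    else:                              # dst = src
        bits[dst][0] |= bits[src][0]
        bits[dst][1] |= bits[src][1]
    return bits[dst] != old

def may_point_to(p, v):
    return bool((bits[p][0] >> h1[v]) & 1) and bool((bits[p][1] >> h2[v]) & 1)

iterations = 0
changed = True
while changed:                         # repeat until a full pass changes nothing
    iterations += 1
    changed = False
    for stmt in program:
        if step(*stmt):
            changed = True

print(iterations)
print({p: [v for v in ("x", "y") if may_point_to(p, v)] for p in bits})
```

The loop needs a third pass only to confirm that nothing changes after iteration 2; the resulting sets are p1 -> {x} and p2, p3 -> {x, y}.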
