Rupesh Nasre. Computer Science and Automation Indian Institute of Science Advisor: Prof. R. Govindarajan Feb 22, 2010 Points-to Analysis as a System of Linear Equations
What is Pointer Analysis? • a = &x; (a points to x.) • b = a; (a and b are aliases.) • if (b == *p) { … } else { … } (Is this condition always satisfied?) Pointer Analysis is a mechanism to statically find out the run-time values of a pointer.
Why Pointer Analysis? • For Parallelization. • fun(p) || fun(q); • For Optimization. • a = p + 2; • b = q + 2; • For Bug-Finding. • For Program Understanding. • ... Clients of Pointer Analysis.
Placement of Pointer Analysis. [Figure: Pointer Analysis feeds its clients (Parallelizing compiler, Lock synchronizer, Memory leak detector, Data flow analyzer, String vulnerability finder, Affine expression analyzer, Type analyzer, Program slicer), which in turn yield improved runtime, secure code, better compile time, and better debugging.]
Normalized Input. • p = &q address-of • p = q copy • p = *q load • *p = q store
Why as a Linear System? • Scalability. • Code sizes going into billions. • Analyses trade off at least one of memory requirement, analysis time, precision. • Linear algebra is a mature topic.
Outline. • Introduction. • First-cut approach. • Prime-factorization approach. • Evaluation.
First-cut Approach: Transformations • p = &q → p = q - 1 • p = q → p = q • p = *q → p = q + 1 • *p = q → p + 1 = q Each address-taken variable (&v) would be assigned a unique value.
First-cut Approach: Example. • Program: a = &x; p = &a; b = *p; c = b; • Transform: a = x - 1, p = a - 1, b = p + 1, c = b. • Solve: x = r, a = r - 1, p = r - 2, b = r - 1, c = r - 1. • Interpret: a, b, c point to x; p points to a. • Because a, b and c all share the value r - 1, the solution also reads as p points to b and p points to c: an imprecise analysis.
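The example can be replayed with a few lines of substitution. This is a minimal sketch, assuming each equation is stored as a tuple (lhs, rhs, k) meaning lhs = rhs + k and that every value is kept as an integer offset from the free variable r; the helper names solve and points_to are hypothetical.

```python
def solve(equations, free_var):
    """Express every variable as an integer offset from free_var (offset 0)."""
    offset = {free_var: 0}
    pending = list(equations)
    while pending:
        progress = False
        for eq in list(pending):
            lhs, rhs, k = eq
            if rhs in offset:              # right-hand side already known
                offset[lhs] = offset[rhs] + k
                pending.remove(eq)
                progress = True
        if not progress:
            raise ValueError("equations cannot be resolved by substitution")
    return offset

def points_to(offset):
    """Read 'v points to w' whenever v's value is exactly one less than w's."""
    return {v: sorted(w for w in offset if offset[w] == offset[v] + 1)
            for v in offset}

# a = x - 1, p = a - 1, b = p + 1, c = b, with x = r as the free variable
equations = [("a", "x", -1), ("p", "a", -1), ("b", "p", 1), ("c", "b", 0)]
offset = solve(equations, "x")
pts = points_to(offset)
```

Here pts["p"] comes out as a, b and c, which reproduces the imprecision noted on the slide: the solver cannot separate variables that happen to share the value r - 1.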
Issues with First-cut Approach. • Dereferencing: a = &x and *a = x are semantically different, but a = &x transforms to a = x - 1 and *a = x transforms to a + 1 = x, which are mathematically the same. • Multiple assignments: a = &x; a = &y; transforms to a = x - 1; a = y - 1; which has no solution, since x and y carry distinct values. • Cyclic assignments: a = &a; transforms to a = a - 1, which has no solution. • Symmetry of assignment: as an equation, a = b implies b = a, although the assignment itself is directional.
Outline. • Introduction. • First-cut approach. • Prime-factorization approach. • Evaluation.
Important Ideas. • Address of a variable as a prime number. • Points-to set as a multiplication of primes. • Variable renaming to avoid inconsistency.
Prime-factorization Approach: Transformations • p = &q → p = p_i * prime(&q) • p = q → p = p_i * q • p = *q → p = p_i * (q + 1) • *p = q handled separately. Here p_i denotes the renamed earlier version of p (a0, p0, … in the example). Each address-taken variable (&v) would be assigned a unique prime number.
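Points-to Information Lattice. • Top: 3*5*7*11*… • Products of three primes: 3*5*7, 3*5*11, 3*7*11, 5*7*11… • Products of two primes: 15, 21, 33, 35, 55, 77… • Single primes: 3, 5, 7, 11… • Bottom: 1. Precision increases toward the bottom. We start with larger primes to avoid the composition gap problem.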
Algorithm Outline. • do { • equations = Linearize(constraints); • solution = LinSolve(equations); • points-to = Interpret(solution); • constraints += AddConstraints(store-constraints, points-to); • } while points-to information changes;
Example. • Program: a = &x; p = &a; b = *p; c = b; with &x = 17, &a = 101 and a0 = b0 = c0 = p0 = 1. • Transform: a = a0*17, p = p0*101, b = b0*(p+1), c = c0*b. • Solve: a = 17, p = 101, b = 102, c = 102. • Interpret: 102 => 1 + 101 => 1 dereference on 101 => 1 dereference on &a => a => 17, giving a = 17, p = 101, b = 17, c = 17.
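The arithmetic of this example can be checked with a short script. It is a sketch under stated assumptions: the helpers factor_over and interpret are hypothetical names, only a single level of dereference (one "+ 1") is undone, and store statements are not handled. The primes 17 and 101 are the ones used on the slide.

```python
from math import gcd

ADDRESS = {"x": 17, "a": 101}              # &x = 17, &a = 101
NAME_OF = {q: v for v, q in ADDRESS.items()}

def factor_over(value):
    """Return the names whose address primes multiply to exactly value, else None."""
    names = set()
    for q, name in NAME_OF.items():
        while value % q == 0:
            names.add(name)
            value //= q
    return names if value == 1 else None

def interpret(raw):
    """Turn solved integers into points-to products, undoing one '+ 1' per load."""
    final = {}
    for var, value in raw.items():
        if factor_over(value) is not None:  # already a product of address primes
            final[var] = value
            continue
        pointees = factor_over(value - 1)   # value = (product of primes) + 1
        if pointees is None:
            raise ValueError(f"cannot interpret {var} = {value}")
        product = 1
        for name in pointees:               # dereference: merge what each pointee holds
            held = final[name]
            product = product * held // gcd(product, held)
        final[var] = product
    return final

a0 = p0 = b0 = c0 = 1
a = a0 * 17        # a = &x
p = p0 * 101       # p = &a
b = b0 * (p + 1)   # b = *p
c = c0 * b         # c = b
raw = {"a": a, "p": p, "b": b, "c": c}
final = interpret(raw)
```

The value 102 has 17 as a factor but is not a pure product of the address primes, so the sketch falls back to reading it as 101 + 1, which matches the interpretation step on the slide.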
Solution Properties. • Integrality. • Only addition and multiplication over integers. • Feasibility. • No negative weight cycle. • Uniqueness. • Each variable is defined only once.
Soundness. • If &x = 7, &y = 11 and p points to x and y, then p is a multiple of 77. • Base: p points to x and y by direct assignment. • Induction: p points to x and y due to an indirect assignment (copy, load, store). • Prove that all indirect assignments are safe. • Argument: Multiplication moves the dataflow fact upwards in the lattice. Assumption: No problem due to composition gaps. p1 + k1 is not misinterpreted as p2 + k2. The assumption can be enforced by careful offline selection of primes.
Precision. • If &x = 7, &y = 11 and p is a multiple of 77, then p points to x and y. • Argument: Prime factorization is unique. • Thus, 77 can be decomposed only as 7*11. • Prove that none of the address-of, copy, load, store statements add extra primes into the composition. Assumption: No problem due to composition gaps. p1 + k1 is not misinterpreted as p2 + k2. The assumption can be enforced by careful offline selection of primes.
Properties. • If the value of a pointer p is a prime number, then it defines a must-point-to relation, else it is a may-point-to relation. • If the value of p is 1, then p is unused. • If pointers p1 and p2 have the same value, then p1 and p2 are pointer equivalent. • Variables x and y are location equivalent when &x dividing the value of a pointer p implies that &x*&y also divides that value. • Pointers p1 and p2 are aliases if gcd(p1, p2) != 1.
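Several of these properties reduce to elementary number checks. The sketch below assumes &x = 7 and &y = 11, as on the Soundness and Precision slides; the function names classify, are_aliases and pointer_equivalent are hypothetical.

```python
from math import gcd

def is_prime(n):
    """Trial-division primality test, adequate for small illustrative values."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def classify(value):
    """Classify a pointer by its solved value."""
    if value == 1:
        return "unused"
    return "must-point-to" if is_prime(value) else "may-point-to"

def are_aliases(p1, p2):
    """Two pointers are aliases when their values share a prime factor."""
    return gcd(p1, p2) != 1

def pointer_equivalent(p1, p2):
    """Pointers with the same value have the same points-to set."""
    return p1 == p2

ADDR_X, ADDR_Y = 7, 11    # &x = 7, &y = 11
p = ADDR_X * ADDR_Y       # p may point to x or y
q = ADDR_X                # q must point to x
r = ADDR_Y                # r must point to y
s = 1                     # s is unused
```

For these values p aliases both q and r, while q and r do not alias each other because gcd(7, 11) is 1.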
Outline. • Introduction. • First-cut approach. • Prime-factorization approach. • Evaluation.
Evaluation. Benchmarks: SPEC 2000, httpd, sendmail. Configuration: Intel Xeon, 2 GHz clock, 4 MB L2 cache, 3 GB RAM. Analysis: Context-sensitive, Flow-insensitive.
Summary. • We proposed a novel representation of points-to information using prime factorization. • We solved pointer analysis as a system of linear equations. • We empirically showed that it is competitive with state-of-the-art algorithms.
Rupesh Nasre. nasre@csa.iisc.ernet.in Computer Science and Automation Indian Institute of Science Advisor: Prof. R. Govindarajan Feb 22, 2010 Points-to Analysis as a System of Linear Equations
Our Contributions. • Ordering points-to statements in an intelligent way to improve the analysis time. • Dynamic partitioning of points-to statements for a prioritized points-to analysis. • Probabilistic points-to analysis using bloom filters. • Points-to analysis as a set of linear equations.