Calculating Stack Distances Efficiently. George Almasi, Calin Cascaval, David Padua {galmasi,cascaval,padua}@cs.uiuc.edu
What this talk is, and is not, about
This talk is about:
• Algorithms to calculate stack distance histograms
• Speed/memory optimization of trace analysis to create stack distance histograms
This talk is not about:
• Why stack distance histograms are or are not useful
• Relative merits of inter-reference distance vs. stack distance
• Speed/memory optimization of applications
Two measures of locality
• Inter-reference distance: the number of other references between two references to the same address in the trace
• Stack distance: the number of distinct addresses referenced between two references to the same address
Example trace: a b c d b c d e a. For the two references to a, inter-reference distance = 7 and stack distance = 4.
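Stack Distances As Cache Misses
• Compute the number of cache hits and misses for a cache of size $C$ as follows:
$$\text{hits}(C) = \sum_{\delta=1}^{C} s(\delta), \qquad \text{misses}(C) = \sum_{\delta=C+1}^{\infty} s(\delta)$$
where $s(\delta)$ is the stack distance histogram: the number of references at stack distance $\delta$.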
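The two sums can be sketched in a few lines of Python. This is only an illustration: the function name `hits_and_misses`, the dictionary representation of the histogram, and the use of `math.inf` for first-time references are assumptions, not part of the talk. Distances are counted from 1, matching the lower bound of the hits sum.

```python
import math

def hits_and_misses(histogram, cache_size):
    """Split a stack distance histogram into hits and misses.

    histogram maps a stack distance (counted from 1) to the number of
    references observed at that distance. First-time references are
    assumed to be stored under the key math.inf.
    """
    # References at distance 1..C fit in the cache: hits.
    hits = sum(n for d, n in histogram.items() if 1 <= d <= cache_size)
    # References at distance C+1..infinity do not: misses.
    misses = sum(n for d, n in histogram.items() if d > cache_size)
    return hits, misses

# Hypothetical histogram, for illustration only.
example = {1: 5, 2: 3, 4: 2, math.inf: 4}
print(hits_and_misses(example, 2))
```

Because the histogram is independent of the cache size, one pass over the trace is enough to evaluate every value of C.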
Inter-reference distance
• Given that at time t, ref(t) = x:
  • find t0, the time of the last previous reference to x
  • the inter-reference distance follows from t and t0
• Efficient implementation: a (hash) table H(x) = t0, the trace index of the last reference to x
• Memory usage: ~2x the original program
• Cost: O(1) per reference
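A minimal sketch of the hash table approach, assuming the trace is an in-memory sequence and taking the distance as t - t0 - 1 (the number of other references in between, following the definition given earlier in the talk). The function name and the use of `None` for first references are illustrative choices.

```python
def inter_reference_distances(trace):
    """Return the inter-reference distance of every reference in trace.

    The distance is None for the first reference to an address.
    """
    last_seen = {}          # H(x) = t0, index of the last reference to x
    distances = []
    for t, x in enumerate(trace):
        t0 = last_seen.get(x)
        # Assumption: count the references strictly between t0 and t.
        distances.append(None if t0 is None else t - t0 - 1)
        last_seen[x] = t    # O(1) update per reference
    return distances

print(inter_reference_distances("abcdbcdea"))
```

For the example trace a b c d b c d e a, the last entry is 7, the inter-reference distance of the second reference to a.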
Stack distance
[Figure: snapshots of the LRU stack as the trace is processed; Depth(x) marks the position of the referenced address x in the stack, with depths 1 and 3 labeled.]
Stack distance
• Simulates an infinite cache with LRU replacement policy
• Nice properties (inclusion!)
• Naïve implementation: stack as linked list/array
  • m = 250,000 average maximum stack depth
  • list traversal/array updates: O(m) per trace element
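The naïve implementation can be sketched as follows, using a Python list as the LRU stack with the most recent address at index 0. The function name is hypothetical, and the distance is reported as the number of distinct addresses between the two references, so the depth in the stack is this value plus one.

```python
def naive_stack_distances(trace):
    """Stack distance of every reference, via an explicit LRU stack.

    Returns None for the first reference to an address. Each step costs
    O(m) for a stack of depth m: one search plus one list update.
    """
    stack = []              # stack[0] is the most recently used address
    distances = []
    for x in trace:
        if x in stack:
            depth = stack.index(x)   # distinct addresses above x
            distances.append(depth)
            stack.pop(depth)         # remove x from its old position
        else:
            distances.append(None)
        stack.insert(0, x)           # x becomes the new stack top
    return distances

print(naive_stack_distances("abcdbcdea"))
```

On the trace a b c d b c d e a the last entry is 4, the stack distance of the second reference to a.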
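Insight: stack is contained in trace
• Trace (time →): a b b g g e d f z f c e b c d a g
• Stack before the final reference to g, from bottom to top: g z f e b c d a
• The stack is the trace with everything but the most recent reference to each address removed.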
Holes
• Index tx in the trace is a hole if the address ref(tx) has already been referenced again at a later time ty, with tx < ty < t.
• Using holes, we can say: stackdist(t) = refdist(t) - #holes(t0 to t)
• How many holes are there between t0 and t?
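The identity can be checked with a deliberately simple sketch that stores one flag per trace index and counts holes by scanning. The names are hypothetical, refdist is taken as t - t0 - 1, and the linear scan is only for clarity; it stands in for the tree structures that make this count fast.

```python
def stack_distances_via_holes(trace):
    """stackdist(t) = refdist(t) - #holes(t0 to t), with a brute-force count."""
    last_seen = {}                      # address -> index of last reference
    is_hole = [False] * len(trace)      # is_hole[i]: trace[i] was referenced again
    distances = []
    for t, x in enumerate(trace):
        t0 = last_seen.get(x)
        if t0 is None:
            distances.append(None)      # first reference: no previous t0
        else:
            ref_dist = t - t0 - 1       # other references between t0 and t
            holes = sum(is_hole[t0 + 1:t])
            distances.append(ref_dist - holes)
            is_hole[t0] = True          # the old reference to x is now a hole
        last_seen[x] = t
    return distances

print(stack_distances_via_holes("abcdbcdea"))
```

For the second reference to a in a b c d b c d e a, refdist is 7 and the first references to b, c and d are holes, giving 7 - 3 = 4.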
An interval tree of holes
[Figure: the trace between the previous reference to a (at t0) and the current reference to a (at t), with holes marked; the example tree holds the hole intervals k:k, k+2:k+3 and k+4:k+5.]
• Single tree operation: count_and_add(t0)
• Determines the number of holes between t0 and t, then adds a new hole at t0
• Adding a hole can create a new interval or fuse two existing ones
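Operations on the interval tree
• Add to interval edge: count_and_add(p), p = n+1. The interval k:n becomes k:n+1.
• Create new interval: count_and_add(p), p > n+1. The interval k:n is unchanged and a new interval p:p is created.
• Join two intervals: count_and_add(p), p = n+1. The intervals k:n and n+2:p are fused into the single interval k:p.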
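The three cases can be sketched with a sorted list of disjoint intervals instead of a balanced tree. This is a simplification, not the RB/AVL structure discussed in the talk: the class name and fields are hypothetical, and the count is a linear sum where a real tree would keep per-subtree totals to reach O(log m).

```python
import bisect

class HoleIntervals:
    """Disjoint, sorted intervals of holes: starts[i]..ends[i] inclusive."""

    def __init__(self):
        self.starts = []
        self.ends = []

    def count_and_add(self, p):
        """Count the holes at positions > p, then record p as a new hole.

        Assumes p is not already a hole, which holds because each trace
        index becomes a hole at most once.
        """
        i = bisect.bisect_right(self.starts, p)   # first interval starting after p
        count = sum(e - s + 1 for s, e in zip(self.starts[i:], self.ends[i:]))
        touches_left = i > 0 and self.ends[i - 1] == p - 1
        touches_right = i < len(self.starts) and self.starts[i] == p + 1
        if touches_left and touches_right:        # join two intervals
            self.ends[i - 1] = self.ends[i]
            del self.starts[i]
            del self.ends[i]
        elif touches_left:                        # extend an interval's upper edge
            self.ends[i - 1] = p
        elif touches_right:                       # extend an interval's lower edge
            self.starts[i] = p
        else:                                     # create a new interval p:p
            self.starts.insert(i, p)
            self.ends.insert(i, p)
        return count

holes = HoleIntervals()
for p in (5, 7, 6):
    print(p, holes.count_and_add(p), list(zip(holes.starts, holes.ends)))
```

Adding 5 and 7 creates two single-element intervals; adding 6 counts the one hole above it and fuses everything into the interval 5:7.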
Pre-allocated hole trees
• Basics:
  • tree is pre-allocated, binary, balanced
  • each node contains a number: the number of holes in its right subtree
  • memory used by a node depends on the node's depth
• A modified version of the B&K algorithm:
  • holes instead of references
  • binary instead of n-ary
  • better memory usage
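Pre-allocated hole trees
[Figure: a pre-allocated binary tree over the trace a b b g e d f z f c e b c d a, with each node labeled by the number n of holes in its right subtree.]
• Walking from the root to the leaf for t0: when the path goes right, the new hole lands in that node's right subtree, so n = n+1; when the path goes left, all holes in the right subtree lie after t0, so count += n.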
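One way to read this walk is sketched below, with the tree stored as an implicit array (node i has children 2i and 2i+1). The class and function names are hypothetical, the capacity is rounded up to a power of two for simplicity, and every counter is a plain integer, so the depth-dependent node sizes mentioned in the talk are not modeled.

```python
class PreallocatedHoleTree:
    """Complete binary tree over trace indices 0..capacity-1.

    right_count[node] is the number of holes in that node's right subtree.
    """

    def __init__(self, trace_length):
        self.capacity = 1
        while self.capacity < trace_length:
            self.capacity *= 2
        self.right_count = [0] * self.capacity   # internal nodes 1..capacity-1

    def count_and_add(self, p):
        """Count the holes at positions > p and mark p as a hole."""
        count, node, lo, hi = 0, 1, 0, self.capacity
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if p < mid:                  # going left: right subtree lies after p
                count += self.right_count[node]
                node, hi = 2 * node, mid
            else:                        # going right: new hole is in right subtree
                self.right_count[node] += 1
                node, lo = 2 * node + 1, mid
        return count

def stack_distances_preallocated(trace):
    tree = PreallocatedHoleTree(len(trace))
    last_seen, distances = {}, []
    for t, x in enumerate(trace):
        t0 = last_seen.get(x)
        if t0 is None:
            distances.append(None)       # first reference to x
        else:
            distances.append(t - t0 - 1 - tree.count_and_add(t0))
        last_seen[x] = t
    return distances

print(stack_distances_preallocated("abcdbcdea"))
```

Each reference costs one root-to-leaf walk, O(log n) for a trace of length n, with no pointers to follow.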
Many Questions
• Q: Why holes and not stack elements? A: Holes need 1/2 the maintenance of stack elements.
• Q: Will the interval tree grow to ∞? A: No. Intervals fuse together spontaneously.
• Q: How big will the tree be? A: # of intervals = O(stack depth). A tree of stack elements would be the same size.
• Q: Will the tree be unbalanced? A: Yes, because it tends to grow on one side.
More questions
• Q: What kind of interval tree? A: RB and AVL.
• Q: Which is better? A: AVL is better.
• Q: Why? A: Shorter average tree height (h+1 vs. 2h), and not all operations change the tree structure.
Comparisons
Interval trees:
• exec time O(log(m))
• memory usage O(m)
• AVL better than RB
• pointer chasing, bad locality
Pre-allocated trees:
• exec time O(log(n))
• memory usage O(n): hits practical limit
• holes are better: reduced maintenance
• no pointer chasing, good locality
Conclusions
• Stack distances with holes:
  • using RB/AVL interval trees
  • using pre-allocated trees
• Using holes reduces linear overhead by 20-40% for both kinds of algorithms.