Calculating Stack Distances Efficiently

This talk presents algorithms for computing stack distance histograms and for reducing the time and memory cost of the trace analysis that produces them. It relates stack distances to cache hits and misses, introduces holes in the trace as a way to compute stack distances, and describes two implementations: RB/AVL interval trees of holes and pre-allocated binary trees. Using holes reduces linear overhead by 20-40% for both kinds of algorithms.

jarnagin

Presentation Transcript


  1. Calculating Stack Distances Efficiently George Almasi, Calin Cascaval, David Padua {galmasi,cascaval,padua}@cs.uiuc.edu

  2. What this talk is, and is not, about
     This talk is about:
     • Algorithms to calculate stack distance histograms
     • Speed/memory optimization of trace analysis to create stack distance histograms
     This talk is not about:
     • Why stack distance histograms are/are not useful
     • Relative merits of inter-reference distance vs. stack distance
     • Speed/memory optimization of applications

  3. Two measures of locality
     • Inter-reference distance: the number of other references between two references to the same address in the trace
     • Stack distance: the number of distinct addresses referenced between two references to the same address
     Example trace: a b c d b c d e a. Between the two references to a, the inter-reference distance is 7 and the stack distance is 4.
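
As a minimal sketch of the two definitions (Python; the function names are illustrative and not from the talk), both distances can be computed directly, if slowly, by scanning back through the trace:

```python
def inter_reference_distance(trace, t):
    """Number of other references between trace[t] and the previous reference to the same address."""
    x = trace[t]
    for t0 in range(t - 1, -1, -1):
        if trace[t0] == x:
            return t - t0 - 1
    return None  # first reference to x: no previous reference exists


def stack_distance(trace, t):
    """Number of distinct addresses between trace[t] and the previous reference to the same address."""
    x = trace[t]
    for t0 in range(t - 1, -1, -1):
        if trace[t0] == x:
            return len(set(trace[t0 + 1:t]))
    return None  # first reference to x


trace = list("abcdbcdea")
print(inter_reference_distance(trace, 8))  # 7
print(stack_distance(trace, 8))            # 4
```

Both functions cost time proportional to the distance being measured, which is what the rest of the talk sets out to avoid.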

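  4. Stack Distances As Cache Misses
     • Compute the number of cache hits and misses for a cache of size C as follows, where s(Δ) is the histogram count at stack distance Δ:
       $\mathrm{hits}(C) = \sum_{\Delta=1}^{C} s(\Delta)$
       $\mathrm{misses}(C) = \sum_{\Delta=C+1}^{\infty} s(\Delta)$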
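
The hit and miss counts for a cache of size C follow from a stack distance histogram by summing its entries up to C and beyond C. The sketch below assumes a fully associative LRU cache, distances counted from 1, and first-time references stored under an infinite distance; the function name and the histogram values are made up for illustration.

```python
import math


def hits_and_misses(histogram, cache_size):
    """histogram maps a stack distance (1, 2, ..., or math.inf) to a reference count."""
    hits = sum(n for d, n in histogram.items() if d <= cache_size)
    misses = sum(n for d, n in histogram.items() if d > cache_size)
    return hits, misses


# Made-up histogram: 50 references at distance 1, 30 at distance 2, and so on.
histogram = {1: 50, 2: 30, 4: 15, 8: 4, math.inf: 1}
print(hits_and_misses(histogram, 4))  # (95, 5)
```

One histogram therefore answers the hit/miss question for every cache size at once, without re-running the trace.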

  5. Inter-reference distance
     • Given that at time t, ref(t) = x
     • Find t0, the time of the last previous reference to x
     • The inter-reference distance follows from t and t0
     • Efficient implementation: a (hash) table H(x) = t0, the trace index of the last reference to x
     • Memory usage: ~2x original program
     • Cost: O(1) per reference
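
The hash-table approach might look like the following sketch. It assumes the distance is counted as t - t0 - 1, that is, the number of other references in between; the names `last_seen` and `inter_reference_distances` are illustrative.

```python
def inter_reference_distances(trace):
    last_seen = {}  # plays the role of H(x) = t0
    distances = []
    for t, x in enumerate(trace):
        t0 = last_seen.get(x)
        # None marks the first reference to an address.
        distances.append(None if t0 is None else t - t0 - 1)
        last_seen[x] = t  # O(1) expected update per reference
    return distances


print(inter_reference_distances("abcdbcdea"))
# [None, None, None, None, 2, 2, 2, None, 7]
```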

  6. Stack distance [Figure: successive states of the LRU stack as the trace is processed; Depth(x) marks the depth at which the referenced address x is found before it moves to the top.]

  7. Stack distance
     • Simulates an infinite cache with LRU replacement policy
     • Nice properties (inclusion!)
     • Naïve implementation: stack as linked list/array
       • m = 250,000: average maximum stack depth
       • list traversal/array updates: O(m) per trace element
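
A sketch of the naïve list-based implementation, for comparison with the faster methods that follow. The depth is counted from 0 here, so it equals the number of distinct addresses referenced since the previous reference to x; a depth counted from 1 would be one larger. The function name is illustrative.

```python
def naive_stack_distances(trace):
    stack = []  # stack[0] is the most recently used address
    distances = []
    for x in trace:
        if x in stack:
            depth = stack.index(x)  # linear search: O(m) per trace element
            distances.append(depth)
            stack.pop(depth)        # remove x from its old position
        else:
            distances.append(None)  # first reference to x
        stack.insert(0, x)          # x becomes the new stack top
    return distances


print(naive_stack_distances("abcdbcdea"))
# [None, None, None, None, 2, 2, 2, None, 4]
```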

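  8. Insight: stack is contained in trace
     [Figure: the trace a b b g g e d f z f c e b c d a g laid out along a time axis, with the final reference to g arriving at Time=t. The stack at that point consists of the most recent occurrence of each address in the trace, g z f e b c d a, with the stack top at the most recent end.]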
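
To illustrate the insight, the LRU stack can be recovered from the trace alone by keeping only the last occurrence of each address. This is a sketch with an illustrative function name, applied to the slide's trace up to (but not including) the final reference to g.

```python
def stack_from_trace(trace):
    last = {}
    for t, x in enumerate(trace):
        last[x] = t  # later occurrences overwrite earlier ones
    # Order addresses by their last occurrence, most recent first (stack top first).
    ordered = sorted(last.items(), key=lambda item: item[1], reverse=True)
    return [x for x, _ in ordered]


trace = "a b b g g e d f z f c e b c d a".split()
print(stack_from_trace(trace))
# ['a', 'd', 'c', 'b', 'e', 'f', 'z', 'g']
```

Read from the bottom up, this is g z f e b c d a. Every other trace position holds an address that was referenced again later.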

  9. Holes
     • Index tx in the trace is a hole if the address ref(tx) has been referenced again at a later time ty, with tx < ty < t.
     • Using holes, we can say: stackdist(t) = refdist(t) - #holes(t0 to t)
     • How many holes are there between t0 and t?
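
The hole identity can be checked with a deliberately simple sketch that marks holes in a boolean array and counts them with a linear scan. It assumes refdist(t) = t - t0 - 1 and counts holes strictly between t0 and t; the names are illustrative, and the linear scan is the part a tree is meant to replace.

```python
def stack_distances_with_holes(trace):
    last_seen = {}
    is_hole = [False] * len(trace)
    distances = []
    for t, x in enumerate(trace):
        t0 = last_seen.get(x)
        if t0 is None:
            distances.append(None)  # first reference to x
        else:
            holes = sum(is_hole[t0 + 1:t])  # holes strictly between t0 and t
            distances.append((t - t0 - 1) - holes)
            is_hole[t0] = True  # the old reference to x is now a hole
        last_seen[x] = t
    return distances


print(stack_distances_with_holes("abcdbcdea"))
# [None, None, None, None, 2, 2, 2, None, 4]
```

For the last reference to a, refdist is 7 and the positions of the first b, c and d are holes, giving 7 - 3 = 4.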

  10. An interval tree of holes
      [Figure: the trace between t0 (previous reference to a) and t (current reference to a), with holes and non-hole references marked; the tree nodes hold the hole intervals k:k, k+2:k+3 and k+4:k+5.]
      • Single tree operation: count_and_add(t0)
      • Determines the number of holes between t0 and t; adds a new hole at t0
      • Adding a hole can create a new interval, or fuse two existing ones

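  11. Operations on the interval tree
      • Add to interval edge: count_and_add(p), p = n+1. The interval k:n becomes k:n+1.
      • Create new interval: count_and_add(p), p > n+1. The interval k:n is unchanged and a new interval p:p is created.
      • Join two intervals: count_and_add(p), p = n+1. The intervals k:n and n+2:p become k:n+1 and n+2:p, which fuse into k:p.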
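
The three cases of count_and_add can be sketched with two sorted Python lists and the standard `bisect` module standing in for the balanced tree. This is a simplification and not the talk's data structure: counting here is linear in the number of intervals, whereas an RB or AVL tree with per-subtree hole counts would do it in logarithmic time. The class name and the assumption that p is never already a hole are illustrative.

```python
import bisect


class HoleIntervals:
    def __init__(self):
        self.starts = []  # sorted first positions of the hole intervals
        self.ends = []    # matching last positions (inclusive)

    def count_and_add(self, p):
        """Return the number of holes after position p, then add a hole at p."""
        i = bisect.bisect_left(self.starts, p)  # intervals i.. lie entirely after p
        count = sum(e - s + 1 for s, e in zip(self.starts[i:], self.ends[i:]))
        joins_left = i > 0 and self.ends[i - 1] == p - 1
        joins_right = i < len(self.starts) and self.starts[i] == p + 1
        if joins_left and joins_right:  # join two intervals
            self.ends[i - 1] = self.ends[i]
            del self.starts[i], self.ends[i]
        elif joins_left:                # add to the right edge of an interval
            self.ends[i - 1] = p
        elif joins_right:               # add to the left edge of an interval
            self.starts[i] = p
        else:                           # create a new interval p:p
            self.starts.insert(i, p)
            self.ends.insert(i, p)
        return count


holes = HoleIntervals()
for p in (5, 7):
    holes.count_and_add(p)
print(list(zip(holes.starts, holes.ends)))  # [(5, 5), (7, 7)]
print(holes.count_and_add(6))               # 1
print(list(zip(holes.starts, holes.ends)))  # [(5, 7)]
```

Because all holes lie before the current time t, counting the holes after t0 is the same as counting the holes between t0 and t.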

  12. Pre-allocated hole trees
      • Basics:
        • the tree is pre-allocated, binary, and balanced
        • each node contains a number: the number of holes in its right subtree
        • memory used by a node depends on the node's depth
      • A modified version of the B&K algorithm:
        • holes instead of references
        • binary instead of n-ary
        • better memory usage

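  13. Pre-allocated hole trees
      [Figure: a pre-allocated binary tree over the trace a b b g e d f z f c e b c d a, with a hole counter n in each node. Along the path for a trace position, a node is either updated with n = n+1 or read with count += n.]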
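
One way to realize such a tree is an implicit complete binary tree stored in a flat array, where each internal node counts the holes in its right subtree. The sketch below is an assumption about how the two update rules fit together, not the authors' code: it uses one Python integer per node and ignores the depth-dependent node sizes mentioned in the talk. The names `HoleTree` and `stack_distances` are illustrative.

```python
class HoleTree:
    """Implicit complete binary tree; right_holes[v] counts the holes in v's right subtree."""

    def __init__(self, trace_length):
        self.leaves = 1
        while self.leaves < trace_length:
            self.leaves *= 2
        # Internal nodes live at indices 1..leaves-1; index 0 is unused.
        self.right_holes = [0] * self.leaves

    def count_and_add(self, p):
        """Return the number of holes at positions > p, then record a hole at p."""
        count = 0
        node = self.leaves + p  # leaf that stands for trace position p
        while node > 1:
            parent = node // 2
            if node % 2 == 0:
                count += self.right_holes[parent]  # came from the left: count += n
            else:
                self.right_holes[parent] += 1      # came from the right: n = n + 1
            node = parent
        return count


def stack_distances(trace):
    tree = HoleTree(len(trace))
    last_seen = {}
    distances = []
    for t, x in enumerate(trace):
        t0 = last_seen.get(x)
        if t0 is None:
            distances.append(None)  # first reference to x
        else:
            holes = tree.count_and_add(t0)  # all holes lie before t
            distances.append((t - t0 - 1) - holes)
        last_seen[x] = t
    return distances


print(stack_distances("abcdbcdea"))
# [None, None, None, None, 2, 2, 2, None, 4]
```

Each call walks a single leaf-to-root path, so the cost per reference is logarithmic in the trace length, and the flat array avoids pointer chasing.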

  14. Many Questions
      Q: Why holes and not stack elements?
      A: Holes need 1/2 the maintenance of stack elements.
      Q: Will the interval tree grow without bound?
      A: No. Intervals fuse together spontaneously.
      Q: How big will the tree be?
      A: # of intervals = O(stack depth). A tree of stack elements would be the same size.
      Q: Will the tree be unbalanced?
      A: Yes, because it tends to grow on one side.

  15. More questions
      Q: What kind of interval tree?
      A: RB and AVL.
      Q: Which is better?
      A: AVL is better.
      Q: Why?
      A: Shorter average tree height (h+1 vs. 2h), and not all operations change the tree structure.

  16. Comparisons
      Interval trees:
      • exec time O(log(m))
      • memory usage O(m)
      • AVL better than RB
      • pointer chasing, bad locality
      Pre-allocated trees:
      • exec time O(log(n))
      • memory usage O(n): hits practical limit
      • holes are better: reduced maintenance
      • no pointer chasing, good locality

  17. Results: hole interval trees

  18. Results: preallocated trees

  19. Conclusions
      • Stack distances with holes:
        • using RB/AVL interval trees
        • using pre-allocated trees
      • Using holes reduces linear overhead by 20-40% for both kinds of algorithms.
