Dynamic Detection of Streams in Memory References†
Tushar Mohan¹, Bronis R. de Supinski², Sally A. McKee¹, Frank Mueller³ and Andy Yoo²
¹ School of Computing, University of Utah, Salt Lake City, UT 84112
² Lawrence Livermore National Laboratory, Center for Applied Scientific Computing, Livermore, CA 94551
³ Department of Computer Science, North Carolina State University, 448 EGRC, Raleigh, NC 27695
† This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory, under contract UCRL-MI-144546 and the National Science Foundation award 0073532.
• What is it?
  • A multithreaded tool that dynamically finds streams in the reference trace of a running application.
• A stream?
  • A sequence of references in arithmetic progression.
  • e.g., in 0, 4, 8, 12, 13, 16, 19, 20, 24 the stream 0, 4, 8, 12, 16, 20, 24 is represented as {start element = 0 | stride = 4 | length = 7}
  • In the code below, the elements of A, B and C form three separate streams.
        for (i=0; i<N; i++)
            A[i] = B[i] + C[i]
    A: 200, 204, 208 ...    B: 320, 324, 328 ...    C: 404, 408, 412 ...
    {200, 4, N}    {320, 4, N}    {404, 4, N}
  • Complete issuing sequence: 320, 404, 200 ; 324, 408, 204 ; ...
• Setup
  [Figure: Application Source Code, Compile + Instrument, Instrumentation routines, Link, Binary; Thread 1 and Thread 2, Control Pipe, Data Pipe, dsd (Stream Detection Process)]
• Results
  [Figure: Results]
• The need?
  • Predicting the performance benefits of caches and smart stream-prefetching controllers (these depend on the number and lengths of streams)
  • Designing sophisticated, software-assisted prefetching schemes
  • Characterizing memory access patterns on the fly
  • Evaluating different parallelization strategies
• Stream Detection Process (dsd)
  • Each dsd thread consists of a Stream Table and a Pool.
  • Stream Table: holds the detected streams compactly. It is stored as a chained hash of stream records, and the hash key is the stream's expected successor.
  • Pool: holds recent references that have not yet been classified into any stream. The differences between pool elements are stored in the pool as well.
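The Stream Table described above can be illustrated with a small chained hash keyed by each stream's expected successor. This is only a sketch under stated assumptions, not the dsd implementation: the names `Stream`, `StreamTable`, `table_add`, `table_extend` and the bucket count `NBUCKETS` are hypothetical, and the successor is assumed to be start + stride × length.

```c
#include <assert.h>
#include <stdlib.h>

#define NBUCKETS 64  /* hypothetical table size */

/* One detected stream {start, stride, length}, chained within a bucket. */
typedef struct Stream {
    long start, stride, length;
    struct Stream *next;
} Stream;

typedef struct { Stream *bucket[NBUCKETS]; } StreamTable;

/* Address the stream expects to see next (assumed definition). */
long successor(const Stream *s) {
    return s->start + s->stride * s->length;
}

unsigned long slot(long addr) {
    return (unsigned long)addr % NBUCKETS;
}

/* Store a new stream record under its expected successor.
   Returns NULL if memory is exhausted. */
Stream *table_add(StreamTable *t, long start, long stride, long length) {
    assert(stride != 0 && length > 0);
    Stream *s = malloc(sizeof *s);
    if (!s) return NULL;
    s->start = start;
    s->stride = stride;
    s->length = length;
    unsigned long h = slot(successor(s));
    s->next = t->bucket[h];
    t->bucket[h] = s;
    return s;
}

/* If some stream expects `addr`, lengthen it and re-hash it under its
   new expected successor. Returns the stream, or NULL if none matched. */
Stream *table_extend(StreamTable *t, long addr) {
    Stream **link = &t->bucket[slot(addr)];
    for (; *link; link = &(*link)->next) {
        Stream *s = *link;
        if (successor(s) != addr) continue;
        *link = s->next;               /* unlink from the old chain */
        s->length++;
        unsigned long h = slot(successor(s));
        s->next = t->bucket[h];        /* relink under the new key */
        t->bucket[h] = s;
        return s;
    }
    return NULL;
}
```

With a zero-initialized table, `table_add(&t, 4, 4, 3)` files the stream 4, 8, 12 under key 16. A later `table_extend(&t, 16)` finds it with one bucket lookup, raises its length to 4, and re-files it under key 20. Keying on the successor is what keeps the per-reference cost independent of how many streams are live.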
• Consider the sequence: 4, 8, 13, 12, 16
  [Figure: References, Differences, Pool, Stream Table (hash); expected successor: 20]
• Features
  • Dynamic: no compile-time detection
  • Multithreaded: works on parallel programs
  • Low overhead
    • No storage space for traces needed
    • O(n) complexity, where n is the trace length
    • Application slowdown factor: 50-200, two orders of magnitude better than post-processing full traces
  • Only linear patterns detected
  • Flexible enough for real programs
    • Applied on umt98, bc
    • Interleaving streams and noise permitted:
      4, 8, 12, 16, 1, 3, 20, 24, 5, 7, 2, 9, 11, 28
      Streams detected: {start=4, stride=4, length=7}, {start=1, stride=2, length=6}
      Noise: 2
  • Implemented for Solaris™ and Linux™ running on x86
• Future Work
  • Obtaining timing statistics for parallel programs on SMP machines
  • Exploring different parallelization strategies in kernels like DAXPY with the aid of dsd
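The interleaved example under Features can be reproduced with a small pool-based detector. The sketch below is an illustration under assumptions, not the dsd algorithm itself: it assumes a stream is created once three references are in arithmetic progression, it recomputes differences rather than storing them in the pool, and it scans a short array instead of hashing on the expected successor. `Detector`, `observe`, `MAX_STREAMS` and `POOL_SIZE` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STREAMS 16   /* hypothetical capacities */
#define POOL_SIZE   8

typedef struct { long start, stride, length; } Stream;

typedef struct {
    Stream streams[MAX_STREAMS];
    size_t nstreams;
    long   pool[POOL_SIZE];   /* unclassified references, oldest first */
    size_t npool;
} Detector;

/* Remove pool entry i, preserving the order of the rest. */
static void pool_remove(Detector *d, size_t i) {
    for (; i + 1 < d->npool; i++) d->pool[i] = d->pool[i + 1];
    d->npool--;
}

/* Classify one reference from the trace. */
void observe(Detector *d, long addr) {
    assert(d != NULL);

    /* 1. Extend a stream whose expected successor is addr
          (linear scan here; the poster uses a hash for this step). */
    for (size_t k = 0; k < d->nstreams; k++) {
        Stream *s = &d->streams[k];
        if (s->start + s->stride * s->length == addr) {
            s->length++;
            return;
        }
    }

    /* 2. Look for pooled a (older) and b (newer) with b - a == addr - b;
          a, b, addr then start a new stream of length 3. */
    for (size_t i = 0; i < d->npool && d->nstreams < MAX_STREAMS; i++) {
        for (size_t j = i + 1; j < d->npool; j++) {
            long a = d->pool[i], b = d->pool[j];
            if (b != a && b - a == addr - b) {
                Stream *s = &d->streams[d->nstreams++];
                s->start = a;
                s->stride = b - a;
                s->length = 3;
                pool_remove(d, j);   /* higher index first */
                pool_remove(d, i);
                return;
            }
        }
    }

    /* 3. Otherwise keep it in the pool, evicting the oldest if full. */
    if (d->npool == POOL_SIZE) pool_remove(d, 0);
    d->pool[d->npool++] = addr;
}
```

Calling `observe` on each element of 4, 8, 12, 16, 1, 3, 20, 24, 5, 7, 2, 9, 11, 28 with a zero-initialized `Detector` leaves two streams, {4, 4, 7} and {1, 2, 6}, and the single reference 2 in the pool as noise. The bounded pool means stale noise eventually falls out, so work per reference stays constant and the whole pass is linear in the trace length.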