Memory Hierarchy Adaptivity: An Architectural Perspective

Memory Hierarchy Adaptivity: An Architectural Perspective • Alex Veidenbaum • AMRM Project, sponsored by DARPA/ITO

Opportunities for Adaptivity • Cache organization • Cache performance “assist” mechanisms • Hierarchy organization • Memory organization (DRAM, etc.) • Data layout and address mapping • Virtual memory • Compiler assist

Opportunities - Cont’d • Cache organization: adapt what? • Size: NO • Associativity: NO • Line size: MAYBE • Write policy: YES (fetch, allocate, write-back/write-through) • Mapping function: MAYBE

Opportunities - Cont’d • Cache “Assist”: prefetch, write buffer, victim cache, etc. between different levels. • Adapt what? • Which mechanism(s) to use • Mechanism “parameters”

Opportunities - Cont’d • Hierarchy Organization: • Where are cache assist mechanisms applied? • Between L1 and L2 • Between L1 and Memory • Between L2 and Memory • What are the data-paths like? • Is prefetch, victim cache, write buffer data written into the cache? • How much parallelism is possible in the hierarchy?

Opportunities - Cont’d • Memory Organization • Cached DRAM? • Interleave change? • PIM

Opportunities - Cont’d • Data layout and address mapping • In theory, something can be done but… • MP case is even worse • Adaptive address mapping or hashing based on ???

Opportunities - Cont’d • Compiler assist • Can select initial configuration • Pass hints on to hardware • Generate code to collect run-time info and adjust execution • Adapt configuration after being “called” at certain intervals during execution • Select/run-time optimize code

Opportunities - Cont’d • Virtual Memory can adapt • Page size? • Mapping? • Page prefetching/read ahead • Write buffer (file cache) • The above under multiprogramming?

Applying Adaptivity • What drives adaptivity? • Performance impact, overall and/or relative • “Effectiveness”, e.g. miss rate • Processor stall introduced • Program characteristics • When to perform adaptive action • Run time: use feedback from hardware • Compile time: insert code, set up hardware
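The drivers listed above (miss rate as a measure of effectiveness, and the processor stall introduced) can be turned into a run-time trigger. The following is a minimal sketch, not a mechanism described in the slides: the counter names, the `IntervalCounters` type, and both threshold values are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class IntervalCounters:
    """Hypothetical hardware counters sampled over one observation interval."""
    accesses: int
    misses: int
    stall_cycles: int
    total_cycles: int


def miss_rate(c: IntervalCounters) -> float:
    """Fraction of accesses that missed; 0.0 if nothing was accessed."""
    return c.misses / c.accesses if c.accesses else 0.0


def stall_fraction(c: IntervalCounters) -> float:
    """Fraction of cycles the processor spent stalled on memory."""
    return c.stall_cycles / c.total_cycles if c.total_cycles else 0.0


def should_adapt(c: IntervalCounters,
                 miss_threshold: float = 0.05,
                 stall_threshold: float = 0.20) -> bool:
    """Trigger an adaptive action when either metric exceeds its threshold.

    Both thresholds are illustrative assumptions, not values from the source.
    """
    return miss_rate(c) > miss_threshold or stall_fraction(c) > stall_threshold
```

A quiet interval, such as `IntervalCounters(1000, 10, 50, 1000)`, leaves the configuration alone, while one with a 10% miss rate triggers adaptation under these assumed thresholds.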

Where to Implement • In Software: compiler and/or OS • (Static) Knowledge of program behavior • Factored into optimization and scheduling • Extra code, overhead • Lack of dynamic run-time information • Rate of adaptivity • requires recompilation, OS changes

Where to Implement - Cont’d • Hardware • dynamic information available • fast decision mechanism possible • transparent to software (thus safe) • delay and clock rate limit algorithm complexity • difficult to maintain long-term trends • little knowledge of program behavior

Where to Implement - Cont’d • Hardware/software • Software can set coarse hardware parameters • Hardware can supply software dynamic info • Perhaps more complex algorithms can be used • Software modification required • Communication mechanism required
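The hardware/software split above can be pictured as a two-way interface: software writes a coarse parameter, hardware exposes counters. The sketch below is a toy model under stated assumptions. The class `AdaptiveCacheModel`, its methods, the choice of buffer count as the coarse parameter, the doubling rule, and the 10% threshold are all hypothetical and do not come from the slides.

```python
class AdaptiveCacheModel:
    """Toy stand-in for the hardware side (hypothetical interface)."""

    def __init__(self, buffer_count: int = 4):
        self.buffer_count = buffer_count  # coarse parameter set by software
        self._accesses = 0
        self._misses = 0

    def set_buffer_count(self, count: int) -> None:
        """Software -> hardware: set a coarse configuration parameter."""
        if count < 1:
            raise ValueError("buffer count must be at least 1")
        self.buffer_count = count

    def record_access(self, hit: bool) -> None:
        """Hardware-internal bookkeeping for each cache access."""
        self._accesses += 1
        if not hit:
            self._misses += 1

    def read_and_clear_counters(self) -> tuple[int, int]:
        """Hardware -> software: return (accesses, misses) and reset them."""
        snapshot = (self._accesses, self._misses)
        self._accesses = 0
        self._misses = 0
        return snapshot


def software_policy(hw: AdaptiveCacheModel,
                    miss_threshold: float = 0.10,
                    max_buffers: int = 16) -> int:
    """Assumed policy: double the buffer count when the miss rate is high."""
    accesses, misses = hw.read_and_clear_counters()
    if accesses and misses / accesses > miss_threshold:
        hw.set_buffer_count(min(hw.buffer_count * 2, max_buffers))
    return hw.buffer_count
```

Because the policy runs in software, it could be replaced by something more elaborate than hardware timing would allow, at the cost of the communication mechanism the slide mentions.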

Current Investigation • L1 cache assist • See wide variability in assist mechanism effectiveness • Between individual programs • Within a program as a function of time • Propose hardware mechanisms to select between assist types and allocate buffer space • Give compiler an opportunity to set parameters

Mechanisms Used • Prefetching • Stream Buffers • Stride-directed, based on address alone • Miss Stride: prefetch the same address using the number of intervening misses • Victim Cache • Write Buffer • All placed after L1
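Stride-directed prefetching "based on address alone" can be illustrated with a small model. This is a minimal sketch under an assumed policy (issue a prefetch once the same non-zero stride has been seen twice in a row); the class name and that confirmation rule are illustrative assumptions, not details from the slides.

```python
class StridePrefetcher:
    """Detects a constant stride in a sequence of miss addresses."""

    def __init__(self):
        self.last_addr = None    # most recent address observed
        self.last_stride = None  # difference between the last two addresses

    def observe(self, addr: int):
        """Record one address; return an address to prefetch, or None."""
        prefetch = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # Assumed rule: prefetch only when the stride repeats.
            if stride != 0 and stride == self.last_stride:
                prefetch = addr + stride
            self.last_stride = stride
        self.last_addr = addr
        return prefetch


p = StridePrefetcher()
predictions = [p.observe(a) for a in (100, 164, 228, 292)]
# predictions == [None, None, 292, 356]
```

The first two addresses only establish the stride of 64; from the third address on, the next address in the sequence is predicted.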

Mechanisms Used - Cont’d • A mechanism can be used by itself or • All are used at once • Buffer space size and organization fixed • No adaptivity involved

Observed Behavior • Programs respond differently to each mechanism, e.g. none is a consistent winner • Within a program, the same holds over time: the relative effectiveness of the mechanisms changes during execution.

Observed Behavior - Cont’d • Both of the above facts indicate a likely improvement from adaptivity • Select a better one among mechanisms • Even more can be expected from adaptively re-allocating from the combined buffer pool • To reduce stall time • To reduce the number of misses
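Selecting "a better one among mechanisms" could, for example, compare the stall time each mechanism produced in the last interval. The sketch below assumes per-mechanism stall measurements are available and adds a switching margin to avoid oscillation; the function name, the margin, and the use of stall cycles as the sole criterion are assumptions for illustration.

```python
def select_mechanism(current: str,
                     stall_cycles: dict[str, int],
                     margin: float = 0.10) -> str:
    """Pick the assist mechanism to use for the next interval.

    stall_cycles maps mechanism name -> stall cycles observed (or estimated)
    in the last interval. The current mechanism is kept unless another one
    beats it by more than `margin` (an assumed hysteresis value).
    """
    best = min(stall_cycles, key=stall_cycles.get)
    if best == current:
        return current
    if stall_cycles[best] < (1.0 - margin) * stall_cycles[current]:
        return best
    return current
```

With a 10% margin, a rival at 950 stall cycles does not displace a current mechanism at 1000, but a rival at 800 does.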

Proposed Adaptive Mechanism • Hardware: • a common pool of 2-4 word buffers • a set of possible policies, a subset of: • Stride-directed prefetch • PC-based prefetch • History-based prefetch • Victim cache • Write buffer

Adaptive Hardware - Cont’d • Performance monitors for each type/buffer • misses, stall time on hit, thresholds • Dynamic buffer allocator among mechanisms • Allocation and monitoring policy: • Predict future behavior from observed past • Observe over a time interval dT, set the configuration for the next interval • Save performance trends in next-level tags (< 8 bits)
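The allocation policy above (observe over an interval dT, then set the allocation for the next one) can be sketched as a proportional split of the common buffer pool. This is one possible policy under stated assumptions, not the proposed hardware: the "useful hits" metric, the one-buffer minimum per mechanism, and the rule that rounding leftovers go to the best performer are all illustrative choices.

```python
def reallocate(pool_size: int,
               useful_hits: dict[str, int],
               min_per_mechanism: int = 1) -> dict[str, int]:
    """Divide a common buffer pool among mechanisms for the next interval.

    useful_hits maps mechanism name -> hits its buffers supplied during the
    interval just observed (a hypothetical monitor output).
    """
    names = list(useful_hits)
    spare = pool_size - min_per_mechanism * len(names)
    if spare < 0:
        raise ValueError("pool too small for the minimum allocation")

    # Every mechanism keeps a minimum so it can still be monitored.
    alloc = {name: min_per_mechanism for name in names}

    total = sum(useful_hits.values())
    for name in names:
        if total == 0:
            alloc[name] += spare // len(names)  # no evidence: split evenly
        else:
            alloc[name] += spare * useful_hits[name] // total

    # Buffers lost to integer rounding go to the best-performing mechanism.
    best = max(names, key=lambda n: useful_hits[n])
    alloc[best] += pool_size - sum(alloc.values())
    return alloc


next_alloc = reallocate(
    16, {"stride_prefetch": 60, "victim_cache": 30, "write_buffer": 10})
# next_alloc == {"stride_prefetch": 10, "victim_cache": 4, "write_buffer": 2}
```

Keeping a minimum allocation reflects the "predict future behavior from observed past" idea: a mechanism with no buffers produces no observations, so it could never be selected again.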

Further opportunities to adapt • L2 cache organization • variable-size line • L2 non-sequential prefetch • In-memory assists (DRAM)

MP Opportunities • Even longer latency • Coherence, hardware or software • Synchronization • Prefetch under and beyond the above • Avoid coherence if possible • Prefetch past synchronization • Assist Adaptive Scheduling
