Memory Hierarchy Adaptivity An Architectural Perspective. Alex Veidenbaum AMRM Project sponsored by DARPA/ITO. Opportunities for Adaptivity. Cache organization Cache performance “assist” mechanisms Hierarchy organization Memory organization (DRAM, etc) Data layout and address mapping
Memory Hierarchy Adaptivity: An Architectural Perspective. Alex Veidenbaum. AMRM Project sponsored by DARPA/ITO
Opportunities for Adaptivity • Cache organization • Cache performance “assist” mechanisms • Hierarchy organization • Memory organization (DRAM, etc) • Data layout and address mapping • Virtual Memory • Compiler assist
Opportunities - Cont’d • Cache organization: adapt what? • Size: NO • Associativity: NO • Line size: MAYBE • Write policy: YES (fetch, allocate, write-back/write-through) • Mapping function: MAYBE
Opportunities - Cont’d • Cache “Assist”: prefetch, write buffer, victim cache, etc. between different levels. • Adapt what? • Which mechanism(s) to use • Mechanism “parameters”
Opportunities - Cont’d • Hierarchy Organization: • Where are cache assist mechanisms applied? • Between L1 and L2 • Between L1 and Memory • Between L2 and Memory • What are the data-paths like? • Is prefetch, victim cache, write buffer data written into the cache? • How much parallelism is possible in the hierarchy?
Opportunities - Cont’d • Memory Organization • Cached DRAM? • Interleave change? • PIM
Opportunities - Cont’d • Data layout and address mapping • In theory, something can be done but… • MP case is even worse • Adaptive address mapping or hashing based on ???
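The slide leaves open what an adaptive mapping or hashing function would be based on. As an illustration only, the sketch below contrasts the conventional modulo set index with an XOR-folded index, a well-known alternative that is not taken from the slides. The line size, set count, and function names are assumptions chosen for the example.

```python
LINE_BYTES = 32   # assumed cache line size in bytes
NUM_SETS = 256    # assumed number of sets (power of two)

def modulo_index(addr: int) -> int:
    """Conventional mapping: low-order line-address bits select the set."""
    return (addr // LINE_BYTES) % NUM_SETS

def xor_index(addr: int) -> int:
    """XOR-folded mapping: mix the next group of address bits into the index."""
    line = addr // LINE_BYTES
    return (line ^ (line // NUM_SETS)) % NUM_SETS

# Addresses spaced exactly one cache size apart all collide under modulo mapping.
stride = LINE_BYTES * NUM_SETS
addrs = [i * stride for i in range(8)]
modulo_sets = {modulo_index(a) for a in addrs}
xor_sets = {xor_index(a) for a in addrs}
print(len(modulo_sets), len(xor_sets))
```

With this access pattern the modulo mapping sends every address to one set, while the XOR-folded mapping spreads them over eight sets. An adaptive scheme could in principle switch between such functions, but the slides do not say how the choice would be made.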
Opportunities - Cont’d • Compiler assist • Can select initial configuration • Pass hints on to hardware • Generate code to collect run-time info and adjust execution • Adapt configuration after being “called” at certain intervals during execution • Select/run-time optimize code
Opportunities - Cont’d • Virtual Memory can adapt • Page size? • Mapping? • Page prefetching/read ahead • Write buffer (file cache) • The above under multiprogramming?
Applying Adaptivity • What drives adaptivity? • Performance impact, overall and/or relative • “Effectiveness”, e.g. miss rate • Processor stall introduced • Program characteristics • When to perform an adaptive action • Run time: use feedback from hardware • Compile time: insert code, set up hardware
Where to Implement • In Software: compiler and/or OS • (Static) Knowledge of program behavior • Factored into optimization and scheduling • Extra code, overhead • Lack of dynamic run-time information • Rate of adaptivity • requires recompilation, OS changes
Where to Implement - Cont’d • Hardware • dynamic information available • fast decision mechanism possible • transparent to software (thus safe) • delay and clock rate limit algorithm complexity • difficult to maintain long-term trends • little knowledge of program behavior
Where to Implement - Cont’d • Hardware/software • Software can set coarse hardware parameters • Hardware can supply software dynamic info • Perhaps more complex algorithms can be used • Software modification required • Communication mechanism required
Current Investigation • L1 cache assist • See wide variability in the effectiveness of assist mechanisms • Between individual programs • Within a program as a function of time • Propose hardware mechanisms to select between assist types and allocate buffer space • Give the compiler an opportunity to set parameters
Mechanisms Used • Prefetching • Stream buffers • Stride-directed, based on address alone • Miss stride: prefetch the same address using the number of intervening misses • Victim cache • Write buffer • All placed after L1
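As a rough illustration of stride-directed prefetching based on address alone, the sketch below watches the L1 miss-address stream and issues a prefetch once the same nonzero stride has been seen twice in a row. The class name, the two-in-a-row confirmation rule, and the one-address lookahead are assumptions for the example, not details from the slides.

```python
class StridePrefetcher:
    """Detects a constant stride in the miss-address stream (no PC is used)."""

    def __init__(self):
        self.last_addr = None    # address of the previous miss
        self.last_stride = None  # difference between the two previous misses

    def on_miss(self, addr):
        """Return an address to prefetch, or None if no stable stride is seen."""
        prefetch = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # Prefetch only when the same nonzero stride repeats.
            if stride != 0 and stride == self.last_stride:
                prefetch = addr + stride
            self.last_stride = stride
        self.last_addr = addr
        return prefetch

pf = StridePrefetcher()
results = [pf.on_miss(a) for a in (100, 164, 228, 292, 1000)]
print(results)  # [None, None, 292, 356, None]
```

The first two misses only establish the stride of 64; the third and fourth confirm it and trigger prefetches, and the jump to 1000 breaks the pattern.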
Mechanisms Used - Cont’d • A mechanism can be used by itself, or all can be used at once • Buffer space size and organization are fixed • No adaptivity involved
Observed Behavior • Programs benefit differently from each mechanism, i.e. none is a consistent winner • Within a program, the same holds over time: the most effective mechanism changes as execution proceeds
Observed Behavior - Cont’d • Both of the above facts indicate a likely improvement from adaptivity • Select a better one among mechanisms • Even more can be expected from adaptively re-allocating from the combined buffer pool • To reduce stall time • To reduce the number of misses
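Selecting the better mechanism could look roughly like the sketch below: at the end of each interval, switch to whichever mechanism would have removed the most misses, but only if it beats the current one by a margin. The per-mechanism counts, the margin, and the assumption that inactive mechanisms can still be monitored are all hypothetical; the slides do not specify a selection rule.

```python
def select_mechanism(misses_removed, current, margin=4):
    """Choose the mechanism for the next interval from last-interval counts.

    misses_removed: dict mapping mechanism name -> misses it removed (or
    would have removed) in the last interval. Assumed to be available.
    margin: hysteresis, so small differences do not cause constant switching.
    """
    best = max(misses_removed, key=misses_removed.get)
    if misses_removed[best] > misses_removed.get(current, 0) + margin:
        return best
    return current

# Hypothetical counts for three consecutive intervals.
intervals = [
    {"stream_buffer": 40, "victim_cache": 10, "write_buffer": 5},
    {"stream_buffer": 8, "victim_cache": 30, "write_buffer": 6},
    {"stream_buffer": 24, "victim_cache": 22, "write_buffer": 3},
]
current = "victim_cache"
choices = []
for counts in intervals:
    current = select_mechanism(counts, current)
    choices.append(current)
print(choices)  # ['stream_buffer', 'victim_cache', 'victim_cache']
```

In the third interval the stream buffer leads by only two misses, which is inside the margin, so the selection stays with the victim cache.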
Proposed Adaptive Mechanism • Hardware: • a common pool of 2-4 word buffers • a set of possible policies, a subset of: • Stride-directed prefetch • PC-based prefetch • History-based prefetch • Victim cache • Write buffer
Adaptive Hardware - Cont’d • Performance monitors for each type/buffer • misses, stall time on hit, thresholds • Dynamic buffer allocator among mechanisms • Allocation and monitoring policy: • Predict future behavior from observed past • Observe over a time interval dT, set for next • Save performance trends in next-level tags (< 8 bits)
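One way to picture the "observe over dT, set for next" policy is a proportional split of the common buffer pool, sketched below. The pool size, the per-mechanism hit counts, the minimum of one buffer per mechanism, and the largest-remainder rounding are assumptions made for illustration; the slides do not give the actual allocation algorithm.

```python
def reallocate(pool_size, useful_hits, min_each=1):
    """Split pool_size buffers among mechanisms for the next interval,
    in proportion to the useful hits each produced in the last interval dT."""
    names = list(useful_hits)
    spare = pool_size - min_each * len(names)
    if spare < 0:
        raise ValueError("pool too small for the minimum allocation")
    # With no hits observed at all, fall back to an even split.
    weights = useful_hits if sum(useful_hits.values()) > 0 else {n: 1 for n in names}
    total = sum(weights.values())
    # Integer share first, then hand out leftovers by largest remainder.
    alloc = {n: min_each + spare * weights[n] // total for n in names}
    leftover = pool_size - sum(alloc.values())
    by_remainder = sorted(names, key=lambda n: (spare * weights[n]) % total,
                          reverse=True)
    for n in by_remainder[:leftover]:
        alloc[n] += 1
    return alloc

# Hypothetical monitor readings for one interval and a 16-buffer pool.
alloc = reallocate(16, {"prefetch": 60, "victim": 30, "write": 10})
print(alloc)  # {'prefetch': 9, 'victim': 5, 'write': 2}
```

Keeping a minimum allocation per mechanism lets its monitor continue to observe hits, so a mechanism that becomes useful later can win buffers back.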
Further opportunities to adapt • L2 cache organization • variable-size line • L2 non-sequential prefetch • In-memory assists (DRAM)
MP Opportunities • Even longer latency • Coherence, hardware or software • Synchronization • Prefetch under and beyond the above • Avoid coherence if possible • Prefetch past synchronization • Assist Adaptive Scheduling