Data Latency
Rich Altmaier, Software and Services Group
CPU Architecture Contribution
• Data-intensive workloads are bound by memory latency
• Little use and reuse of each cache line once it is fetched
• Often pointer chasing, which is hard to prefetch
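To make "pointer chasing" concrete, the sketch below builds a linked structure as a random cycle of indices and walks it. The helper names (`build_chain`, `chase`, `sequential_sum`) are hypothetical and used only for illustration. The key property is that each step's address comes from the value just loaded, so the hardware cannot know where to fetch next until the previous read completes. A sequential scan, by contrast, has addresses known in advance. Python's interpreter overhead hides the actual cache effects, so treat this as a picture of the access pattern that would be written in C or a similar language, not as a benchmark.

```python
import random


def build_chain(n, seed=0):
    """Return next_index, a single random cycle over n slots (hypothetical helper).

    next_index[i] is the slot visited after slot i. The random order means
    consecutive visits land on unrelated addresses.
    """
    order = list(range(n))
    random.Random(seed).shuffle(order)
    next_index = [0] * n
    # Link each slot to the next one in shuffled order, wrapping at the end.
    for a, b in zip(order, order[1:] + order[:1]):
        next_index[a] = b
    return next_index


def chase(next_index, start, steps):
    """Follow the chain for `steps` hops and return the final slot."""
    i = start
    for _ in range(steps):
        # Serial dependency: the next address is unknown until this read returns.
        i = next_index[i]
    return i


def sequential_sum(values):
    """Contrast case: every address is predictable, so prefetching can run ahead."""
    total = 0
    for v in values:
        total += v
    return total


nxt = build_chain(1000, seed=1)
print(chase(nxt, 0, 1000))  # a full lap of a single cycle returns to the start
```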
CPU Architecture Contribution
• Large instruction cache
  • Captures a sophisticated code loop, especially in databases
• Last-level cache shared across cores
  • Nehalem added this for both instructions and data
  • Without it, each core holds its own copy of the instructions, and data and lock cache lines have to move between caches
• Integrated Memory Controller
  • A big latency win in Nehalem
• QPI (QuickPath Interconnect) for socket-to-socket cache line movement
  • Introduced in Nehalem; faster than the FSB (front-side bus)
CPU Architecture Contribution
• Improvements in branch prediction
  • Successful prediction of more complex branching structures
• Total number of outstanding cache line reads per socket
  • Improved in Nehalem
  • Exploited by out-of-order execution
  • Exploited by Hyper-Threading (database benchmarks usually enable it and gain)
  • An opportunity to tune data structures so that reads can proceed in parallel
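One way to "tune data structures for parallel reading" is to split a single long pointer chain into several independent chains and advance them round-robin. Within each round the k loads do not depend on one another, so an out-of-order core (or a second Hyper-Threading context) could keep up to k cache-line misses outstanding instead of one. By Little's law, the rate of completed misses is roughly the number outstanding divided by the latency, so more independent reads in flight means more useful work per unit of memory latency. The helpers below (`build_chains`, `chase_interleaved`) are hypothetical; this is a sketch of the data layout under that assumption, and Python itself will not show the hardware speedup.

```python
import random


def build_chains(n, k, seed=0):
    """Split n slots into k independent random cycles (hypothetical helper).

    Returns (next_index, heads): next_index[i] is the successor of slot i,
    and heads[j] is a starting slot of chain j.
    """
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n so every chain is non-empty")
    slots = list(range(n))
    random.Random(seed).shuffle(slots)
    next_index = [0] * n
    heads = []
    for j in range(k):
        chain = slots[j::k]  # every k-th shuffled slot belongs to chain j
        for a, b in zip(chain, chain[1:] + chain[:1]):
            next_index[a] = b
        heads.append(chain[0])
    return next_index, heads


def chase_interleaved(next_index, heads, steps):
    """Advance every chain one hop per round and return the current slots.

    The k reads inside one round are independent of each other, which is
    what lets hardware overlap their misses in a compiled implementation.
    """
    cur = list(heads)
    for _ in range(steps):
        for j in range(len(cur)):
            cur[j] = next_index[cur[j]]
    return cur


nxt, heads = build_chains(1200, 4, seed=3)
# Each chain has 1200 / 4 = 300 slots, so 300 rounds bring every chain home.
print(chase_interleaved(nxt, heads, 300) == heads)
```

The same total number of hops is performed as with one long chain; only the dependency structure changes, from one serial sequence to k overlapping ones.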
System Architecture Contribution
• Larger physical memory
• Faster memory (lower latency)
• Faster I/O, and more ports, for data movement
• SSDs: a big boost to IOPS (I/Os per second)
  • Filesystem reads and writes are usually small and scattered
  • No big sequential operations
• Faster networking
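Because filesystem traffic here is small and scattered, storage is limited by how many requests per second it can serve rather than by raw sequential bandwidth. The arithmetic below shows that relationship: throughput equals IOPS times request size, and by Little's law the average number of requests in flight equals IOPS times latency. The function names and every number in the usage lines are illustrative assumptions, not measurements of any particular disk or SSD.

```python
def throughput_bytes_per_s(iops, request_bytes):
    """Bytes per second delivered by `iops` requests of `request_bytes` each."""
    return iops * request_bytes


def iops_needed(target_bytes_per_s, request_bytes):
    """Requests per second needed to reach a target throughput (rounded up)."""
    return -(-target_bytes_per_s // request_bytes)


def outstanding_ios(iops, latency_us):
    """Little's law: average requests in flight = rate x time in system."""
    return iops * latency_us / 1_000_000


# Illustrative 4 KiB random reads at two hypothetical device rates.
print(throughput_bytes_per_s(100, 4096))     # 409600 B/s at 100 IOPS
print(throughput_bytes_per_s(50_000, 4096))  # 204800000 B/s at 50,000 IOPS
print(iops_needed(100 * 1024 * 1024, 4096))  # 25600 IOPS for 100 MiB/s
print(outstanding_ios(50_000, 200))          # 10.0 requests in flight at 200 us
```

With small requests, the only way to raise throughput is to raise IOPS, which is why storage for data-intensive work is configured for IOPS rather than sequential bandwidth.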
Summary
• Large, shared cache
• Latency reduction from the Integrated Memory Controller and QPI socket-to-socket links
• Total number of outstanding reads
• Branch prediction
• Storage configured for IOPS