An Analysis of Database Workload Performance on Simultaneous Multithreaded Processors

Jack L. Lo, Luiz André Barroso, Susan Eggers, Kourosh Gharachorloo, Henry Levy, Sujay Parekh

Motivation
• DBMS workloads differ from scientific workloads
• DBMS workloads are intrinsically multithreaded
• DBMS workloads are memory intensive, which leads to low processor utilization
• Cache sharing under SMT could degrade memory performance

Objectives
• Characterize the memory-system behavior of database systems
• Evaluate the negative effects of cache sharing introduced by SMT, and try to eliminate them
• Evaluate SMT performance on DBMS workloads

Methodology
• SMT model
  • Based on an out-of-order, superscalar architecture
  • Each cycle, up to 8 instructions can be fetched from up to 2 of the 8 hardware contexts
  • Functional units: 6 integer, 4 floating-point
  • Caches: 128 KB instruction + 128 KB data, 16 MB L2
• Workloads
  • Oracle DBMS and Digital UNIX
  • On-line transaction processing (OLTP)
  • Decision support system (DSS)
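The fetch rule on this slide (up to 8 instructions per cycle, drawn from at most 2 of the 8 hardware contexts) can be illustrated with a minimal Python sketch. The slide does not say how the two contexts are chosen; the sketch assumes a priority that favors contexts with the fewest instructions already in flight. The function name `select_fetch` and both of its input dictionaries are hypothetical.

```python
def select_fetch(in_flight, ready, width=8, max_threads=2):
    """Choose which contexts fetch this cycle and how many instructions each.

    in_flight: context id -> instructions already in the front end
               (assumed priority metric, not stated on the slide)
    ready:     context id -> instructions the context could supply now
    Returns a dict: context id -> instructions fetched this cycle.
    """
    # Only contexts that can supply instructions compete; the lowest
    # in-flight count wins, ties are broken by context id.
    candidates = sorted(
        (c for c in ready if ready[c] > 0),
        key=lambda c: (in_flight.get(c, 0), c),
    )
    plan = {}
    remaining = width
    for ctx in candidates[:max_threads]:
        take = min(ready[ctx], remaining)
        if take > 0:
            plan[ctx] = take
            remaining -= take
    return plan


if __name__ == "__main__":
    in_flight = {0: 5, 1: 2, 2: 9, 3: 2}
    ready = {0: 8, 1: 3, 2: 8, 3: 8}
    print(select_fetch(in_flight, ready))
```

When the first chosen context cannot fill all 8 slots, the second context uses the remainder, so the fetch bandwidth is shared rather than left idle.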

Database Workload Characterization
• The dominant processes access 3 memory segments:
  • Instruction text
  • Program Global Area (PGA)
  • Shared Global Area (SGA)
    • SGA buffer cache
    • SGA other

Memory Behavior
• High instruction miss rate for OLTP
• Large memory footprint
• High instruction/data reuse
• Replacement is too frequent

Locality Profiles

Multi-Thread Cache Interference
• Two types of interference
  • Destructive interference
    • One thread’s data replaces another thread’s data
    • More conflict misses
  • Constructive interference
    • Data loaded by one thread is used by another simultaneously scheduled thread
    • Fewer misses
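The two kinds of interference can be reproduced with a toy direct-mapped cache shared by two interleaved access streams. This is a minimal sketch, not the simulator used in the study; the class `DirectMappedCache`, the helper `interleaved_misses`, and the cache geometry (4 sets, 64-byte lines) are assumptions chosen to keep the example small.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that only counts misses."""

    def __init__(self, num_sets, line_size):
        self.num_sets = num_sets
        self.line_size = line_size
        self.blocks = [None] * num_sets  # block number held by each set
        self.misses = 0

    def access(self, addr):
        block = addr // self.line_size
        index = block % self.num_sets
        if self.blocks[index] == block:
            return True  # hit
        self.blocks[index] = block  # evict whatever was there
        self.misses += 1
        return False


def interleaved_misses(stream_a, stream_b, num_sets=4, line_size=64):
    """Misses when two threads share one cache and alternate accesses."""
    cache = DirectMappedCache(num_sets, line_size)
    for a, b in zip(stream_a, stream_b):
        cache.access(a)
        cache.access(b)
    return cache.misses


if __name__ == "__main__":
    # Addresses 0 and 256 map to the same set (256 = 4 sets * 64 bytes),
    # so the threads keep evicting each other: destructive interference.
    print("destructive:", interleaved_misses([0] * 4, [256] * 4))
    # Both threads touch the same line, so only the first access misses:
    # constructive interference.
    print("constructive:", interleaved_misses([0] * 4, [0] * 4))
```

Run separately on private caches, each stream in the destructive case would miss only once; sharing the cache turns every access into a conflict miss.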

Identifying the source of misses
• PGA misses are the dominant factor
• They are caused by destructive interference

Page-mapping Policies
• Affect L2 cache conflicts
• Two policies
  • Page coloring
    • Exploits spatial locality
  • Bin hopping
    • Exploits temporal locality
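The difference between the two policies can be sketched in a few lines of Python. The sketch assumes the usual formulations: page coloring derives a page's cache bin (color) from its virtual page number, while bin hopping hands out bins in round-robin order as pages are first touched. The names `page_coloring` and `BinHopping`, the `(process id, virtual page number)` key, and the choice of 4 bins are hypothetical.

```python
def page_coloring(vpn, num_colors):
    """Bin chosen from the virtual page number: consecutive virtual pages
    land in consecutive bins (spatial locality)."""
    return vpn % num_colors


class BinHopping:
    """Bins assigned round-robin in first-touch order: pages touched close
    together in time land in different bins (temporal locality)."""

    def __init__(self, num_colors):
        self.num_colors = num_colors
        self.next_color = 0
        self.assigned = {}  # (process id, vpn) -> bin

    def color_for(self, pid, vpn):
        key = (pid, vpn)
        if key not in self.assigned:
            self.assigned[key] = self.next_color
            self.next_color = (self.next_color + 1) % self.num_colors
        return self.assigned[key]


if __name__ == "__main__":
    num_colors = 4
    vpn = 0x100  # same virtual page in two different processes

    # Page coloring: both processes get the same bin, so the pages conflict.
    print([page_coloring(vpn, num_colors) for _pid in (1, 2)])

    # Bin hopping: the second process to touch the page gets the next bin.
    hopper = BinHopping(num_colors)
    print([hopper.color_for(pid, vpn) for pid in (1, 2)])
```

The example uses processes that place a structure at the same virtual address, which is the situation where the two policies diverge most visibly.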

Effect of Page-mapping policies

Application-Level Offsetting
• Affects L1 cache conflicts
• Offsets the conflicting structures of different processes
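A small Python sketch shows why offsetting helps. It assumes that every process places the conflicting structure at the same base address, so without an offset all copies index the same L1 set. The base address, the 8 KB per-process offset, and the cache geometry (64-byte lines, 1024 sets) are hypothetical values picked for illustration, as are the function names.

```python
def l1_set_index(addr, line_size=64, num_sets=1024):
    """Set that an address maps to in a cache with the given geometry."""
    return (addr // line_size) % num_sets


def offset_base(base, process_id, offset_bytes):
    """Shift a process's structure by a per-process offset."""
    return base + process_id * offset_bytes


if __name__ == "__main__":
    base = 0x20000000    # hypothetical shared starting address
    offset_bytes = 8192  # hypothetical per-process offset

    # Without offsetting, all 8 processes map to the same set.
    print({l1_set_index(base) for _pid in range(8)})

    # With offsetting, each process starts in a different set.
    print(sorted(
        l1_set_index(offset_base(base, pid, offset_bytes))
        for pid in range(8)
    ))
```

The offset only spreads the processes apart if it is not a multiple of the span covered by one way of the cache (line size times number of sets); otherwise every shifted base maps back to the same set.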

SMT Performance on DBMS Workloads
• SMT is highly effective at tolerating the high miss rates

Architecture Metrics

Conclusions
• Although database workloads have large footprints, substantial reuse results in a small, cacheable “critical” working set
• The additional data cache conflicts caused by SMT can be nearly eliminated
• SMT’s latency tolerance is highly effective for database applications
