STEPS Towards Cache-Resident Transaction Processing Yifei Tao Kitsuregawa Lab
Outline • Background • Steps: Introducing cache-resident code • Applying Steps to OLTP workloads • Summary
Background • OLTP (OnLine Transaction Processing) is one of the core technologies in RDBMSs, especially in business settings. • OLTP code is large and complex, and is rarely restructured to take advantage of new hardware; a new methodology is needed that improves performance by sharing application code in the CPU cache without changing the code itself.
Background • Research [AD+99][LB+98][SBG02] shows that OLTP workloads are delayed predominantly by instruction cache misses, especially L1-I misses.
Background • To maximize L1-I cache utilization and minimize stalls: 1. application code should have few branches 2. most importantly, the "working set" code footprint should fit in the L1-I cache • Unfortunately, OLTP workloads exhibit exactly the opposite behavior.
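The second condition above can be stated as a simple size check. A minimal sketch, assuming a 32 KB L1-I cache (a common size, not stated in the slides); the footprint figures in the assertions are likewise illustrative:

```python
# Toy illustration: a code path is L1-I-resident only if its instruction
# footprint fits in the cache. 32 KB is an assumed L1-I size.
L1I_BYTES = 32 * 1024

def fits_l1i(footprint_bytes, cache_bytes=L1I_BYTES):
    """True if a code path's instruction footprint fits in the L1-I cache."""
    return footprint_bytes <= cache_bytes

print(fits_l1i(24 * 1024))   # a compact code path fits
print(fits_l1i(500 * 1024))  # a typical OLTP code path does not
```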
Contents • Background • Steps: Introducing cache-resident code • Applying Steps to OLTP workloads • Summary
What is STEPS? • Synchronized Transactions through Explicit Processor Scheduling • Multiplexes concurrent transactions to exploit their common code paths. • One transaction paves the cache with instructions, while close followers enjoy a nearly miss-free execution.
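The scheduling idea can be sketched with coroutines: each transaction yields at the boundary of every cache-sized code segment, and the scheduler runs the whole team through the same segment back-to-back, so the leader warms the L1-I cache and the followers reuse it. This is a hypothetical simulation of the interleaving, not the actual Steps implementation (which switches real thread contexts inside a DBMS):

```python
def transaction(tid, segments):
    """A transaction as a coroutine that yields after each code segment."""
    for seg in segments:
        yield (tid, seg)  # "execute" one cache-sized code segment, then yield

def steps_schedule(n_txns, segments):
    """Run every transaction through segment 1, then segment 2, and so on,
    so each segment stays cache-hot while the whole team executes it."""
    team = [transaction(t, segments) for t in range(n_txns)]
    trace = []
    for _ in segments:
        for txn in team:          # all transactions run the same hot segment
            trace.append(next(txn))
    return trace

trace = steps_schedule(3, ["parse", "lookup", "commit"])
# Each segment is executed by all three transactions consecutively:
# (0,'parse'), (1,'parse'), (2,'parse'), (0,'lookup'), ...
print(trace)
```

With plain round-robin-per-transaction scheduling, the trace would instead interleave different segments, evicting each other's instructions; grouping by segment is the whole point of the technique.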
Fast, efficient context-switching • Typical context-switching mechanisms occupy a significant portion of the L1-I cache and take hundreds of processor cycles to run. • Steps executes only the core context-switch code and updates only CPU state, deferring updates to thread-specific software structures, such as the ready queue, until they must be updated.
Instruction misses • Executing an operation P: • α: ratio by which cache misses decrease in a warm cache, 0 < α ≤ 1 • β: sharing ratio, 0 < β < 1
Gain analysis of Steps • Comparing Steps to Shore, the L1-I cache miss reduction is computed as (1 − #misses after / #misses before) · 100%. • For index fetch, α = 0.373 and β = 0.033, giving an overall reduction in L1-I misses of 82%–87% for 10 threads, and 90%–96% for 100 threads.
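The reduction metric above is straightforward to compute as a helper. The miss counts in the example below are hypothetical; the 82%–87% bounds quoted for index fetch come from the paper's analytical model, not from this code:

```python
def miss_reduction(misses_before, misses_after):
    """Percent reduction in L1-I cache misses:
    (1 - misses_after / misses_before) * 100%."""
    return (1.0 - misses_after / misses_before) * 100.0

# E.g. if 10 plain threads incur 10,000 misses and a Steps team of 10
# incurs 1,500, the reduction is 85% -- inside the quoted 82%-87% range.
print(miss_reduction(10_000, 1_500))  # -> 85.0
```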
Contents • Background • Steps: Introducing cache-resident code • Applying Steps to OLTP workloads • Summary
TPC-C • OLTP benchmark • Models a business application with transactions including New Order, Payment, Delivery, and Stock-Level • Experiment 1: Payment • Experiment 2: New Order, Payment, Stock-Level
Outline • Background • Steps: Introducing cache-resident code • Applying Steps to OLTP workloads • Summary
Summary • Introduced Steps • Steps shows that OLTP workload stalls can be reduced by targeting instruction cache misses. • Simulation with TPC-C shows that L1-I cache misses and branch mispredictions decrease.