STEPS Towards Cache-Resident Transaction Processing Yifei Tao Kitsuregawa Lab
Outline • Background • Steps: Introducing cache-resident code • Applying Steps to OLTP workloads • Summary
Background • OLTP (OnLine Transaction Processing) is one of the core technologies in RDBMSs, especially in business settings. • OLTP code is large and complex, and is rarely restructured to take advantage of new hardware; a new methodology is needed that improves performance by sharing application code in the CPU cache without changing the code itself.
Background • Research [AD+99][LB+98][SBG02] shows that OLTP workloads are delayed predominantly by instruction cache misses, especially L1-I misses.
Background • To maximize L1-I cache utilization and minimize stalls: 1. application code should have few branches 2. most importantly, the "working set" code footprint should fit in the L1-I cache • Unfortunately, OLTP workloads exhibit exactly the opposite behavior.
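The second condition above can be stated as a simple size check. A minimal sketch, assuming a 32 KB L1-I cache (a common size, not stated in the slides); the footprint figures in the assertions are likewise illustrative:

```python
# Toy illustration: a code path is L1-I-resident only if its instruction
# footprint fits in the cache. 32 KB is an assumed L1-I size.
L1I_BYTES = 32 * 1024

def fits_l1i(footprint_bytes, cache_bytes=L1I_BYTES):
    """True if a code path's instruction footprint fits in the L1-I cache."""
    return footprint_bytes <= cache_bytes

print(fits_l1i(24 * 1024))   # a compact code path fits
print(fits_l1i(500 * 1024))  # a typical OLTP code path does not
```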
Contents • Background • Steps: Introducing cache-resident code • Applying Steps to OLTP workloads • Summary
What is STEPS? • Synchronized Transactions through Explicit Processor Scheduling • Multiplexes concurrent transactions to exploit their common code paths. • One transaction paves the cache with instructions, while close followers enjoy a nearly miss-free execution.
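The scheduling idea can be sketched with coroutines: each transaction yields at the boundary of every cache-sized code segment, and the scheduler runs the whole team through the same segment back-to-back, so the leader warms the L1-I cache and the followers reuse it. This is a hypothetical simulation of the interleaving, not the actual Steps implementation (which switches real thread contexts inside a DBMS):

```python
def transaction(tid, segments):
    """A transaction as a coroutine that yields after each code segment."""
    for seg in segments:
        yield (tid, seg)  # "execute" one cache-sized code segment, then yield

def steps_schedule(n_txns, segments):
    """Run every transaction through segment 1, then segment 2, and so on,
    so each segment stays cache-hot while the whole team executes it."""
    team = [transaction(t, segments) for t in range(n_txns)]
    trace = []
    for _ in segments:
        for txn in team:          # all transactions run the same hot segment
            trace.append(next(txn))
    return trace

trace = steps_schedule(3, ["parse", "lookup", "commit"])
# Each segment is executed by all three transactions consecutively:
# (0,'parse'), (1,'parse'), (2,'parse'), (0,'lookup'), ...
print(trace)
```

With plain round-robin-per-transaction scheduling, the trace would instead interleave different segments, evicting each other's instructions; grouping by segment is the whole point of the technique.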
Fast, efficient context-switching • Typical context-switching mechanisms occupy a significant portion of the L1-I cache and take hundreds of processor cycles to run. • Steps executes only the core context-switch code and updates only CPU state, deferring updates to thread-specific software structures, such as the ready queue, until they must be updated.
Instruction misses • Executing an operation P: • α: ratio by which cache misses decrease in a warm cache, 0 < α ≤ 1 • β: sharing ratio, 0 < β < 1
Gain analysis of Steps • Comparing Steps to Shore, the L1-I cache miss reduction is computed as (1 − #misses after / #misses before) · 100%. • For index fetch, α = 0.373 and β = 0.033, giving an overall reduction in L1-I misses of 82%–87% for 10 threads, and 90%–96% for 100 threads.
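The reduction metric above is straightforward to compute as a helper. The miss counts in the example below are hypothetical; the 82%–87% bounds quoted for index fetch come from the paper's analytical model, not from this code:

```python
def miss_reduction(misses_before, misses_after):
    """Percent reduction in L1-I cache misses:
    (1 - misses_after / misses_before) * 100%."""
    return (1.0 - misses_after / misses_before) * 100.0

# E.g. if 10 plain threads incur 10,000 misses and a Steps team of 10
# incurs 1,500, the reduction is 85% -- inside the quoted 82%-87% range.
print(miss_reduction(10_000, 1_500))  # -> 85.0
```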
Contents • Background • Steps: Introducing cache-resident code • Applying Steps to OLTP workloads • Summary
TPC-C • OLTP benchmark • Models a business application with transactions including New Order, Payment, Delivery, and Stock-Level • Experiment 1: Payment • Experiment 2: New Order, Payment, Stock-Level
Outline • Background • Steps: Introducing cache-resident code • Applying Steps to OLTP workloads • Summary
Summary • Introduced Steps • Steps shows that OLTP workload stalls can be reduced by targeting instruction cache misses. • Simulation with TPC-C shows that L1-I cache misses and branch mispredictions decrease.