MonetDB: A column-oriented DBMS Ryan Johnson CSC2531
The memory wall has arrived • CPU performance: +70%/year • Memory performance: latency −50%/decade, bandwidth +20%/year (est.) • Why? • DRAM focus on capacity (+70%/year) • Physical limitations (pin counts, etc.) • Assumption that caches "solve" the latency problem => DBMS spends 95% of its time waiting for memory
The problem: data layouts • Logical layout: 2-D relation => unrealizable in a linear address space! • N-ary storage model (NSM), aka "slotted pages" • Easy row updates, but strided access to columns => low cache locality for read-intensive workloads … "NSM layouts considered harmful"
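The locality argument above can be made concrete with a back-of-the-envelope model. The sketch below counts how many distinct cache lines a scan of one 4-byte column touches under NSM (one value per record, values separated by the record stride) versus a packed column layout; all sizes are illustrative assumptions, not figures from the slides.

```python
# Hypothetical model: cache lines touched when scanning one 4-byte column.
# Sizes below are assumptions chosen for illustration.
CACHE_LINE = 64        # bytes per cache line (typical)
RECORD_SIZE = 128      # bytes per NSM record (assumed)
VALUE_SIZE = 4         # bytes per column value
N = 100_000            # records scanned

def lines_touched(stride, n):
    """Count distinct cache lines covering n values placed every `stride` bytes."""
    return len({(i * stride) // CACHE_LINE for i in range(n)})

nsm_lines = lines_touched(RECORD_SIZE, N)  # one useful value per record
dsm_lines = lines_touched(VALUE_SIZE, N)   # values packed back to back
print(nsm_lines / dsm_lines)               # 16.0 under these assumptions
```

Under these numbers the row layout pulls 16x more cache lines through the memory hierarchy for the same scan, which is the "low cache locality" the slide refers to.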
Coping with The Wall • Innovation: decompose all data vertically • Columns stored separately, rejoined at runtime • Binary Association Table (BAT) replaces the relation • List of (recordID, columnValue) pairs • Compression and other tricks ==> 1 byte/entry • BAT + clever algorithms => cache locality => winner!
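A minimal sketch of the decomposition described above: each column of a relation becomes its own BAT of (OID, value) pairs, a query touching one column scans only that BAT, and full records are rejoined on OID only for qualifying rows. The relation, column names, and predicate here are hypothetical examples.

```python
# Hypothetical relation: (name, age) rows.
rows = [("alice", 30), ("bob", 25), ("carol", 41)]

# Vertical decomposition: one BAT of (oid, value) pairs per column,
# keyed by a dense record ID.
bat_name = [(oid, name) for oid, (name, _) in enumerate(rows)]
bat_age  = [(oid, age)  for oid, (_, age)  in enumerate(rows)]

# A predicate on `age` scans only bat_age (better cache locality).
adults = [oid for oid, age in bat_age if age >= 30]

# Rejoin on OID to materialize full records for qualifying rows only.
name_of = dict(bat_name)
age_of  = dict(bat_age)
result = [(name_of[oid], age_of[oid]) for oid in adults]
print(result)  # [('alice', 30), ('carol', 41)]
```

The dict-based rejoin is just for readability; the next slides show why MonetDB can do this join far more cheaply.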
Exploring deeper • Performance study (motivation) • Physical data layouts • Cache-optimized algorithms • Evaluating MonetDB performance • Implications and lingering questions
NSM: access latency over time • Read one column (record size varies along the x-axis) • Latency increases ~10x as accesses per cache line drop toward 1 (slope changes at the L1/L2 line size)
Efficient physical BAT layout • Idea #1: "virtual OID" • Optimizes the common case • Dense, monotonic OIDs • All BATs sorted by OID • Idea #2: compression • Exploits small domains • Boosts cache locality, effective memory bandwidth • Joining two BATs on OID has O(n) cost! • Out-of-band values? Can't we compress NSM also? How to handle gaps?
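The two ideas above can be sketched together. With virtual OIDs, a dense monotonic OID never needs to be stored: the array index *is* the OID, so joining two BATs on OID degenerates to an O(n) positional zip. Dictionary compression of a small domain gets each entry down toward 1 byte. The column contents and dictionary here are hypothetical.

```python
# Idea #1 -- virtual OID: the OID column is implicit (it equals the
# array position), so an OID join is a single O(n) positional pass.
name_col = ["alice", "bob", "carol"]   # BAT with virtual OIDs 0..2
age_col  = [30, 25, 41]                # same virtual OIDs
joined = list(zip(name_col, age_col))  # no hash table, no sort

# Idea #2 -- dictionary compression of a small domain: store 1-byte
# codes instead of full values (assumed domain of city names).
city_dict = ["NYC", "SFO"]             # domain values, stored once
city_col  = bytes([0, 1, 0])           # 1 byte per entry
decoded   = [city_dict[c] for c in city_col]
print(joined, decoded)
```

This also makes the slide's open questions concrete: a value outside the dictionary ("out-of-band") or a deleted row (a gap in the dense OID sequence) breaks the neat index-equals-OID assumption.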
Cache-friendly hash join • Hash partitioning: one pass but trashes L1/L2 • #clusters > #cache lines • Radix-partitioning: limit active #partitions by making more passes Recall:CPU is cheap compared to memory access
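One simplified way to sketch the multi-pass idea (pass count, bits per pass, and the low-bits-first order are assumptions here; MonetDB's actual radix-cluster parameters are tuned to cache and TLB sizes): each pass partitions on a few radix bits, so the number of *active* output partitions per pass stays small enough for their write buffers to fit in cache.

```python
# Sketch of radix clustering. Each pass fans out into only
# 2**BITS_PER_PASS partitions, keeping active targets cache-resident.
BITS_PER_PASS = 2   # fan-out of 4 per pass (hypothetical)
PASSES = 2          # total radix bits = 4 -> 16 final clusters

def radix_cluster(keys):
    clusters = [keys]
    for p in range(PASSES):
        shift = p * BITS_PER_PASS
        mask = (1 << BITS_PER_PASS) - 1
        nxt = []
        for cl in clusters:
            buckets = [[] for _ in range(1 << BITS_PER_PASS)]
            for k in cl:
                buckets[(k >> shift) & mask].append(k)
            nxt.extend(buckets)
        clusters = nxt
    return clusters

clusters = radix_cluster(list(range(32)))
# Every final cluster holds keys that agree on their low 4 bits, so a
# hash join only probes within pairs of matching (small) clusters.
```

Trading one pass for several is exactly the slide's point: extra CPU work and sequential passes are cheap next to the random cache misses a single wide-fan-out partitioning pass would cause.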
Great, but how well does it work? • Three metrics of interest • L1/L2 misses (= suffer the latency of a memory access) • TLB misses (even more expensive than a cache miss) • Query throughput (higher is better) • Should be able to explain throughput using the other metrics • The given model makes very good predictions => Memory really is (and remains!) the bottleneck
A few graphs • Radix-clustering behavior as cardinality varies • Radix-clustered HJ vs. other algorithms • Big win: stability as cardinalities vary
Implications and discussion points • Cache-friendliness really matters (even with I/O) • Traditional DBMSs are memory-bound • Vertically decomposed data: superior density • Data brought into cache only if actually needed • Compression gives a further density boost • Questions to consider... • Queries accessing many columns? • How to build a good query optimizer? • Performance of transactional workloads? • Update-intensive, concurrency control, ... • What about inserts/updates (touch many BATs)? • What about deletes/inserts (bad for compression)?