In-Memory Columnar tests using LHC physics analysis benchmark

Maaike Limper, 2 June 2014

Test setup
• Single instance on a 32-core machine with 512 GB memory
• Using Beta version 2

Test bug-fixes in Beta 2
• Simultaneous population of multiple tables now works in Beta 2
• Removing a table from IMC with “ALTER TABLE … NO INMEMORY” now works
• Remaining issue reported in the beta forum: re-population when changing in-memory properties

In-Memory Columnar table sizes
• Tested COMPRESS FOR QUERY vs COMPRESS FOR CAPACITY HIGH
• “EF” -> trigger data, only booleans: best compression
• “MET” -> table with floats and doubles: worst compression

v$inmemory_area
• inmemory_size = 120 GB
• 64KB POOL: nearly empty (1/4 of inmemory_size)
• 1MB POOL: nearly full (3/4 of inmemory_size)
• Could an option be added to use the smaller 64 KB POOL for read-only data?
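The pool split quoted above can be turned into absolute sizes with a little arithmetic. This is only a sketch: the function name `pool_sizes_gb` and its parameters are made up for illustration, and the 1/4 vs 3/4 split is taken from the slide rather than from any documented setting.

```python
def pool_sizes_gb(inmemory_size_gb, small_pool_fraction=0.25):
    """Split the in-memory area into the 64 KB pool and the 1 MB pool.

    small_pool_fraction is the share given to the 64 KB pool
    (1/4 on the slide); the 1 MB pool gets the remainder.
    """
    small = inmemory_size_gb * small_pool_fraction
    large = inmemory_size_gb - small
    return small, large


small_gb, large_gb = pool_sizes_gb(120)
print(f"64 KB pool: {small_gb:.0f} GB, 1 MB pool: {large_gb:.0f} GB")
```

With inmemory_size = 120 GB this gives 30 GB for the nearly empty 64 KB pool and 90 GB for the nearly full 1 MB pool.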

In-Memory Population
• By default spawns 2 × CPU-cores “space-background-workers” (= 64 on my test setup)
• Large memory consumption: the system starts using swap space!
• Consumes all CPU in the system
• I manually set _max_spacebg_slaves=16 to prevent problems while populating

In-Memory Population
• IMC population with 16 spacebg-slaves on a 32-core machine:
  • Each slave takes 100% of 1 CPU core
  • Total CPU usage is ~50% of the system
• Compression used: COMPRESS FOR CAPACITY HIGH
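The worker counts and CPU figures on the population slides follow from simple arithmetic, sketched below. The helper names are hypothetical, and the model assumes each worker keeps exactly one core fully busy, as observed on this test setup; it is not a description of how the database schedules its workers.

```python
def default_workers(cpu_cores):
    """Default number of population workers reported on the slides: 2 x cores."""
    return 2 * cpu_cores


def cpu_fraction(workers, cpu_cores):
    """Fraction of total CPU used, assuming one fully busy core per worker.

    Capped at 1.0 because the machine cannot exceed 100% utilisation.
    """
    return min(1.0, workers / cpu_cores)


cores = 32
print(default_workers(cores))                       # 64 workers by default
print(cpu_fraction(16, cores))                      # 0.5 with 16 workers
print(cpu_fraction(default_workers(cores), cores))  # 1.0 with the default
```

Under this model the default of 64 workers saturates all 32 cores, while capping the workers at 16 leaves about half of the machine free.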

In-Memory Population
• CPU usage is just as high when using the default compression, COMPRESS FOR QUERY

In-Memory Population
• 25 minutes to populate a 94.6 GB table with 340 columns and 50 million rows (“electron”) using 16 workers
• Same population time for both compression levels: COMPRESS FOR CAPACITY HIGH and COMPRESS FOR QUERY (default)
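For a feel of what these numbers mean as a rate, the population time can be converted into throughput. This is a back-of-the-envelope sketch: `population_rate` is a made-up helper, 1 GB is assumed to be 1024 MB, and the work is assumed to be spread evenly over the 16 workers.

```python
def population_rate(size_gb, rows, minutes, workers):
    """Convert a population run into throughput figures."""
    seconds = minutes * 60
    mb_per_s = size_gb * 1024 / seconds  # assumes 1 GB = 1024 MB
    return {
        "mb_per_s": mb_per_s,
        "mb_per_s_per_worker": mb_per_s / workers,  # assumes an even split
        "rows_per_s": rows / seconds,
    }


rate = population_rate(size_gb=94.6, rows=50_000_000, minutes=25, workers=16)
for name, value in rate.items():
    print(f"{name}: {value:,.1f}")
```

For the “electron” table this works out to roughly 65 MB/s overall, about 4 MB/s per worker, and about 33,000 rows per second.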

Measuring query time
• In the following slides I compare the query time when reading data from the In-Memory Columnar store with the query time when reading data stored in the buffer cache
• The speed-up factor depends on the compression level used for IMC; here I show results for:
  • COMPRESS FOR QUERY (default)
  • COMPRESS FOR CAPACITY HIGH

Default compression: query time
• IMC 40x faster than cache for a (very) simple query

Default compression: query time
• IMC 15x faster than cache for a simple query with group-by

Default compression: query time
• IMC 6x faster than cache for a more complex query with a window function

CAPACITY HIGH: query time
• IMC 10x faster than cache for a (very) simple query

CAPACITY HIGH: query time
• IMC 5x faster than cache for a simple query with group-by

CAPACITY HIGH: query time
• IMC 2.5x faster than cache for a more complex query with a window function

Preliminary conclusion
• Compression:
  • COMPRESS FOR QUERY compresses 2x less than COMPRESS FOR CAPACITY HIGH
  • COMPRESS FOR QUERY gives on average ~3x faster queries
• Default number of workers (2 × CPU-cores) is too high (I think)
  • Can use a lot of CPU and memory, which may result in swapping
  • No way to stop population once in progress; it can hang the DB
  • I would recommend #workers = ½ CPU-cores
• Trying to get a good “cache” vs “IMC” benchmark
  • Looks good for simple queries,
  • but I’d like to test more complex queries as well, in progress…
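As a rough cross-check of the “~3x faster” figure, the speed-up factors quoted on the query-time slides can be compared directly. The sketch assumes the buffer-cache baseline time is the same for both compression levels, so that the ratio of the two speed-ups equals the ratio of the IMC query times; the dictionary keys are labels chosen here for illustration.

```python
# Speed-up of IMC over the buffer cache, as quoted on the slides.
speedup_query = {"simple": 40.0, "group_by": 15.0, "window": 6.0}
speedup_capacity_high = {"simple": 10.0, "group_by": 5.0, "window": 2.5}

# How much faster COMPRESS FOR QUERY is than COMPRESS FOR CAPACITY HIGH,
# assuming an identical buffer-cache baseline for each query.
ratios = {
    name: speedup_query[name] / speedup_capacity_high[name]
    for name in speedup_query
}
mean_ratio = sum(ratios.values()) / len(ratios)

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
print(f"mean: {mean_ratio:.1f}x")
```

The per-query ratios come out at 4.0x, 3.0x and 2.4x, with a mean of about 3.1x, which is consistent with the “on average ~3x” statement.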
