In-Memory Columnar tests using LHC physics analysis benchmark. Maaike Limper, 2 June 2014
Test setup • Single instance on a 32-core machine with 512 GB memory • Using Beta version 2
Tested bug fixes in Beta 2 • Simultaneous population of multiple tables now works in Beta 2 • Removing a table from the IMC with “ALTER TABLE … NO INMEMORY” now works • Remaining issue reported in the beta forum: re-population when changing in-memory properties
In-Memory Columnar table sizes • Tested COMPRESS FOR QUERY vs. COMPRESS FOR CAPACITY HIGH • “EF”: trigger data, booleans only; best compression • “MET”: table with floats and doubles; worst compression
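The compression comparison comes down to one ratio per table: on-disk size divided by populated in-memory size. A minimal sketch, assuming both sizes are known; in the released product they could be read from the BYTES and INMEMORY_SIZE columns of v$im_segments, but whether Beta 2 exposes the same view is an assumption. The helper name `compression_ratio` and the sizes in the example are hypothetical, not measured values.

```python
GIB = 1024 ** 3


def compression_ratio(on_disk_bytes: int, inmemory_bytes: int) -> float:
    """How many times smaller the populated in-memory copy is than on disk.

    Assumption: inputs come from something like the BYTES and INMEMORY_SIZE
    columns of v$im_segments (released product; not checked for Beta 2).
    """
    if inmemory_bytes <= 0:
        raise ValueError("in-memory size must be positive")
    return on_disk_bytes / inmemory_bytes


if __name__ == "__main__":
    # Hypothetical sizes for one table at two compression levels,
    # for illustration only; these are not measured values.
    on_disk = 40 * GIB
    for level, imc_bytes in [("QUERY", 8 * GIB), ("CAPACITY HIGH", 4 * GIB)]:
        print(f"{level}: {compression_ratio(on_disk, imc_bytes):.1f}x")
```

Because the ratio is on-disk size over in-memory size, a level with half the compression ratio needs twice the in-memory space for the same table.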
v$inmemory_area • inmemory_size = 120 GB • 64KB POOL: nearly empty (1/4 of inmemory_size) • 1MB POOL: nearly full (3/4 of inmemory_size) • Could an option be added to use the smaller 64KB POOL for read-only data?
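A quick check of the pool sizes implied by the observed 1/4 : 3/4 split of the 120 GB inmemory_size. The helper `pool_split` is hypothetical, and the split is the one seen on this test system rather than a documented constant. In the released product the 1MB POOL holds the column-formatted data and the 64KB POOL holds metadata, which would explain why only the 1MB POOL fills up; that Beta 2 behaves the same way is an assumption.

```python
def pool_split(inmemory_size_gb: float) -> dict[str, float]:
    """Split inmemory_size into the two pools using the observed 1/4 : 3/4 ratio."""
    return {
        "64KB POOL": inmemory_size_gb * 0.25,  # nearly empty in the test
        "1MB POOL": inmemory_size_gb * 0.75,   # nearly full in the test
    }


if __name__ == "__main__":
    print(pool_split(120))  # {'64KB POOL': 30.0, '1MB POOL': 90.0}
```

With these numbers only about 90 GB of the 120 GB setting is usable for column data, while roughly 30 GB in the 64KB POOL sits mostly idle.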
In-Memory Population • By default, population spawns 2 x CPU cores “space-background-workers” (= 64 on my test setup) • Large memory consumption; the system starts using swap space! • Consumes all CPU in the system • I manually set _max_spacebg_slaves=16 to prevent problems while populating
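The worker counts on this slide follow from the number of CPU cores. A minimal sketch with hypothetical helper names: the default is the 2 x cores observed in Beta 2, and the alternative is the "half the CPU cores" recommendation from the conclusion. How the resulting value is applied to _max_spacebg_slaves is not shown here.

```python
def default_workers(cpu_cores: int) -> int:
    """Default observed in Beta 2: 2 x CPU cores space-background workers."""
    return 2 * cpu_cores


def recommended_workers(cpu_cores: int) -> int:
    """Recommendation from the slides: half the CPU cores (at least one)."""
    return max(1, cpu_cores // 2)


if __name__ == "__main__":
    cores = 32
    print(default_workers(cores), recommended_workers(cores))  # 64 16
```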
In-Memory Population • IMC population with 16 spacebg slaves on a 32-core machine: • Each slave takes 100% of one CPU core • Total CPU usage is ~50% of the system • Figure: CPU usage with COMPRESS FOR CAPACITY HIGH
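The ~50% figure is what one would expect if every slave saturates exactly one core. The sketch below (hypothetical function name; it assumes one fully busy core per worker and ignores other load) reproduces both observations: 16 workers give half the machine, and the default 64 workers saturate all 32 cores.

```python
def expected_cpu_usage(workers: int, cpu_cores: int,
                       cores_per_worker: float = 1.0) -> float:
    """Fraction of total CPU used, assuming each worker saturates its cores.

    Usage is capped at 1.0 because workers cannot use more cores than exist.
    """
    busy_cores = min(workers * cores_per_worker, cpu_cores)
    return busy_cores / cpu_cores


if __name__ == "__main__":
    print(expected_cpu_usage(16, 32))  # 0.5
    print(expected_cpu_usage(64, 32))  # 1.0
```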
In-Memory Population • CPU usage is just as high when using the default compression • Figure: CPU usage with COMPRESS FOR QUERY (default)
In-Memory Population • 25 minutes to populate the 94.6 GB “electron” table (340 columns, 50 million rows) with 16 workers • Population takes the same time with either compression level • Figures: COMPRESS FOR CAPACITY HIGH, COMPRESS FOR QUERY (default)
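The "electron" population time can be turned into average rates. A small sketch, assuming 1 GB = 1000 MB, that the 25 minutes is the wall-clock time for the whole table, and that the work is spread evenly over the 16 workers; `population_rates` is a hypothetical helper. It gives about 63 MB/s (roughly 33,000 rows/s) in total, or about 3.9 MB/s (roughly 2,100 rows/s) per worker.

```python
def population_rates(size_gb: float, rows: int, minutes: float,
                     workers: int) -> dict[str, float]:
    """Average population rates, assuming 1 GB = 1000 MB and an even load."""
    seconds = minutes * 60
    mb_per_s = size_gb * 1000 / seconds
    rows_per_s = rows / seconds
    return {
        "MB/s": mb_per_s,
        "MB/s per worker": mb_per_s / workers,
        "rows/s": rows_per_s,
        "rows/s per worker": rows_per_s / workers,
    }


if __name__ == "__main__":
    # "electron" table: 94.6 GB, 50 million rows, 25 minutes, 16 workers.
    for name, value in population_rates(94.6, 50_000_000, 25, 16).items():
        print(f"{name}: {value:,.1f}")
```

Since population took the same time at both compression levels, these rates apply to either setting on this machine.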
Measuring query time • In the following slides I compare query time when reading data from the In-Memory Columnar store with query time when reading data stored in the buffer cache • The speed-up factor depends on the compression level used for the IMC; here I show results for: • COMPRESS FOR QUERY (default) • COMPRESS FOR CAPACITY HIGH
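One way to produce such speed-up factors is to time the same query against both sources and divide. The harness below is a generic sketch, not the method used for these slides: `median_runtime` and `speedup` are hypothetical helpers, and the two workloads are stand-ins. In a real run they would execute the same SQL with and without in-memory scans (in the released product, the INMEMORY_QUERY session parameter can disable them). Warm-up runs are discarded so the buffer-cache case really reads from cache.

```python
import statistics
import time
from typing import Callable


def median_runtime(run_query: Callable[[], object], repeats: int = 5,
                   warmup: int = 1) -> float:
    """Median wall-clock time in seconds of run_query, after warm-up runs."""
    for _ in range(warmup):
        run_query()  # discarded: lets the data settle into cache
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(cache_seconds: float, imc_seconds: float) -> float:
    """Speed-up of the IMC over the buffer cache (> 1 means the IMC is faster)."""
    return cache_seconds / imc_seconds


if __name__ == "__main__":
    # Stand-in workloads (hypothetical); replace with functions that run
    # the same SQL statement against the buffer cache and against the IMC.
    t_cache = median_runtime(lambda: sum(range(2_000_000)))
    t_imc = median_runtime(lambda: sum(range(50_000)))
    print(f"speed-up: {speedup(t_cache, t_imc):.1f}x")
```

The median is used instead of the mean so that a single slow run (for example, one disturbed by background population) does not skew the factor.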
Default compression: query time • IMC 40x faster than the buffer cache for a (very) simple query
Default compression: query time • IMC 15x faster than the buffer cache for a simple query with group-by
Default compression: query time • IMC 6x faster than the buffer cache for a more complex query with a window function
CAPACITY HIGH: query time • IMC 10x faster than the buffer cache for a (very) simple query
CAPACITY HIGH: query time • IMC 5x faster than the buffer cache for a simple query with group-by
CAPACITY HIGH: query time • IMC 2.5x faster than the buffer cache for a more complex query with a window function
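The "~3x faster" claim for COMPRESS FOR QUERY can be cross-checked from the six speed-up factors reported on these slides. The sketch assumes that, for each query, both compression levels were compared against the same buffer-cache time, so the IMC time ratio equals the ratio of the two speed-ups. The per-query ratios are 4.0, 3.0 and 2.4, giving an arithmetic mean of about 3.13 and a geometric mean of about 3.07.

```python
from statistics import fmean, geometric_mean

# Speed-up factors over the buffer cache, as reported on the slides.
speedups = {
    "very simple query": {"QUERY": 40.0, "CAPACITY HIGH": 10.0},
    "group-by query": {"QUERY": 15.0, "CAPACITY HIGH": 5.0},
    "window-function query": {"QUERY": 6.0, "CAPACITY HIGH": 2.5},
}

# Assumption: same buffer-cache baseline per query, so
# t_capacity_high / t_query = speedup_query / speedup_capacity_high.
ratios = {q: s["QUERY"] / s["CAPACITY HIGH"] for q, s in speedups.items()}

if __name__ == "__main__":
    for q, r in ratios.items():
        print(f"{q}: COMPRESS FOR QUERY {r:.1f}x faster than CAPACITY HIGH")
    print(f"arithmetic mean: {fmean(ratios.values()):.2f}")
    print(f"geometric mean:  {geometric_mean(ratios.values()):.2f}")
```

Both averages land close to 3, consistent with the conclusion; the gap between the two compression levels also shrinks as the queries become more complex.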
Preliminary conclusion • Compression: • COMPRESS FOR QUERY gives about 2x less compression than CAPACITY HIGH • COMPRESS FOR QUERY gives on average ~3x faster queries • The default number of workers, 2 x CPU cores, is too many (I think) • It can use a lot of CPU and memory and may result in swapping • There is no way to stop population once it is in progress; it can hang the DB • I would recommend setting the number of workers to ½ the number of CPU cores • Still trying to get a good “cache” vs. “IMC” benchmark • Looks good for simple queries, but I’d like to test more complex queries as well; in progress…