Understanding & Tuning Compaction Algorithms
Nicolas Spiegelberg, Software Engineer, Facebook
HBase Users Group, January 23, 2013
Log Structured Merge Tree
[Diagram: a server hosts multiple shards; each shard holds ColumnFamilies; each ColumnFamily buffers writes in a Memstore that flushes to HFiles]
• Data in an HFile is sorted and carries a block index for efficient retrieval
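To make the write path concrete, here is a minimal sketch of an LSMT memstore: an in-memory sorted map flushed to a new immutable, sorted file once it grows past a threshold. This is illustrative toy code, not HBase's implementation; the class name and the count-based threshold are assumptions.

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Toy LSMT write path: puts land in a sorted in-memory map (the "memstore"),
// which is flushed to a new immutable, sorted file once it grows large enough.
public class ToyMemstore {
    private final ConcurrentSkipListMap<String, String> memstore = new ConcurrentSkipListMap<>();
    private final Path dir;
    private final int flushThreshold = 4;  // illustrative; real thresholds are byte-based
    private int fileSeq = 0;

    public ToyMemstore(Path dir) { this.dir = dir; }

    public void put(String key, String value) throws IOException {
        memstore.put(key, value);
        if (memstore.size() >= flushThreshold) {
            flush();
        }
    }

    // Flush never rewrites an existing file -- it only creates a new one,
    // which is why LSMT writes are fast and avoid stalls.
    private void flush() throws IOException {
        Path file = dir.resolve("hfile-" + (fileSeq++) + ".txt");
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(file))) {
            for (Map.Entry<String, String> e : memstore.entrySet()) {  // already sorted
                out.println(e.getKey() + "\t" + e.getValue());
            }
        }
        memstore.clear();
    }
}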
About LSMT
Write algorithms are relatively trivial
• Write a new, immutable file
• Avoid stalls
Read algorithms are varied
• Compaction
• Server-side filters
• Block index
• Bloom filter
Compactions: Intro
Critical for read performance
• Merge N files into fewer
• Reduce read IO when earlier filters don't help enough
• The most complicated part of an LSMT: deciding what HFiles to select and when to merge them
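The merge step itself is a k-way merge of already-sorted inputs. A minimal sketch of that idea (a hypothetical helper, not HBase's scanner code); real compactions also drop shadowed versions and deletes, which this omits:

import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// K-way merge of N sorted key streams -- the core operation of a compaction.
// Each iterator stands in for a sorted HFile; the output is one merged, sorted run.
public class KWayMerge {
    public static List<String> merge(List<Iterator<String>> inputs) {
        // Heap entries: (current key, index of the source iterator)
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(Map.Entry.<String, Integer>comparingByKey());
        for (int i = 0; i < inputs.size(); i++) {
            if (inputs.get(i).hasNext()) {
                heap.add(new AbstractMap.SimpleEntry<>(inputs.get(i).next(), i));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Map.Entry<String, Integer> top = heap.poll();
            out.add(top.getKey());                        // emit the smallest key
            Iterator<String> src = inputs.get(top.getValue());
            if (src.hasNext()) {                          // refill from the same source
                heap.add(new AbstractMap.SimpleEntry<>(src.next(), top.getValue()));
            }
        }
        return out;
    }
}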
Compactions: Disclaimers
Assumptions
• Only general algorithms are covered
• Coprocessors are available for some common applications
• A relatively stable read+write workload is assumed
Sigma Compaction
Default algorithm in HBase 0.90
#1. File selection is based on a summation of sizes:
size[i] < (size[0] + size[1] + … + size[i-1]) * C
#2. Compact only if at least N eligible files are found.
+ trivial implementation, minimal overwrites
- non-deterministic latency, files have variable lifetimes, no incremental benefit
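A minimal sketch of this selection rule, following the slide's formula rather than the exact 0.90 source; the names and minFiles handling are illustrative:

import java.util.ArrayList;
import java.util.List;

// Sketch of sigma (size-sum) file selection: include file i while
// size[i] < C * (size[0] + ... + size[i-1]), and only compact if at
// least minFiles are eligible.
public class SigmaSelection {
    public static List<Integer> select(long[] sizes, double ratio, int minFiles) {
        List<Integer> picked = new ArrayList<>();
        long runningSum = 0;
        for (int i = 0; i < sizes.length; i++) {
            if (i > 0 && sizes[i] >= runningSum * ratio) {
                break;  // this file is too big relative to the ones before it
            }
            picked.add(i);
            runningSum += sizes[i];
        }
        return picked.size() >= minFiles ? picked : List.of();  // need N eligible files
    }
}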
Compactions: Configuration All Compaction Algorithms • hbase.hstore.compaction.ratio • hbase.hstore.compaction.min • hbase.hregion.majorcompaction • hbase.offpeak.start.hour • hbase.offpeak.end.hour • hbase.hstore.compaction.ratio.offpeak
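As an illustration, the same knobs set programmatically via Hadoop's Configuration; the keys are the real ones listed above, but every value here is an example, not a recommendation:

import org.apache.hadoop.conf.Configuration;

public class CompactionTuning {
    // Illustrative values only -- tune against your own workload.
    public static Configuration tuned() {
        Configuration conf = new Configuration();
        conf.setFloat("hbase.hstore.compaction.ratio", 1.2f);         // C in the selection rule
        conf.setInt("hbase.hstore.compaction.min", 3);                // need >= N eligible files
        conf.setLong("hbase.hregion.majorcompaction", 86400000L);     // major compaction period, ms (24h)
        conf.setInt("hbase.offpeak.start.hour", 0);                   // off-peak window start (0-23)
        conf.setInt("hbase.offpeak.end.hour", 6);                     // off-peak window end (0-23)
        conf.setFloat("hbase.hstore.compaction.ratio.offpeak", 5.0f); // looser ratio off-peak
        return conf;
    }
}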
Tiered Compaction
Default algorithm in BigTable / HBase
#1. File selection is based on size relative to a pivot file p:
size[i] * C >= size[p] <= size[k] / C, for i < p < k
#2. Compact only if at least N eligible files are found.
(groups files into "tiers")
+ trivial implementation, more deterministic behavior, medium-size files stay warm
- more file seeks necessary, still write-biased, no incremental benefit
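A sketch of the tier-grouping idea, under the assumption that files arrive ordered largest (oldest) first; hypothetical code, not BigTable's:

import java.util.ArrayList;
import java.util.List;

// Sketch of tiered grouping: consecutive files whose sizes are within a
// factor of `ratio` of each other form one tier; a tier is a compaction
// candidate once it holds enough files.
public class TieredSelection {
    public static List<List<Long>> tiers(long[] sizes, double ratio) {
        List<List<Long>> tiers = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long size : sizes) {  // assumes sizes sorted descending
            if (!current.isEmpty() && size * ratio < current.get(current.size() - 1)) {
                tiers.add(current);          // size dropped by more than C: start a new tier
                current = new ArrayList<>();
            }
            current.add(size);
        }
        if (!current.isEmpty()) tiers.add(current);
        return tiers;
    }
}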
Compactions: Configuration
Tiered compaction
• Enable via "hbase.hstore.compaction.CompactionPolicy"
• Default.NumCompactionTiers
• Default.Tier.X
• MaxSize
• MaxAgeInDisk
Compactions: Work Queues
• Problem: starvation (a long-running large compaction can block many small ones)
• Solution:
• Handle large & small compactions differently
• Allow a configurable "throttle" to determine which queue a compaction enters
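A sketch of the two-queue idea (illustrative, not the RegionServer code): requests whose total input size exceeds a throttle point go to the large pool, so small compactions never wait behind multi-GB ones. Pool sizes and the byte-based throttle are assumptions.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of split compaction queues: a configurable throttle point routes
// each request to the small or large pool, preventing starvation.
public class CompactionQueues {
    private final ExecutorService small = Executors.newFixedThreadPool(2);
    private final ExecutorService large = Executors.newFixedThreadPool(1);
    private final long throttlePoint;   // bytes; analogous to the ThrottlePoint setting

    public CompactionQueues(long throttlePoint) { this.throttlePoint = throttlePoint; }

    public void submit(long totalInputBytes, Runnable compaction) {
        if (totalInputBytes > throttlePoint) {
            large.submit(compaction);   // big jobs queue behind other big jobs only
        } else {
            small.submit(compaction);
        }
    }
}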
Compactions: Configuration
Compaction work queues
• hbase.regionserver.thread.compaction.small
• hbase.regionserver.thread.compaction.large
• hbase.regionserver.thread.compaction.throttle ("ThrottlePoint")
Leveled Compaction
Default algorithm in LevelDB
#1. Bucket files into levels separated by roughly an order of magnitude in size (~10x)
#2. Shard the compaction across files (not just the block index)
#3. Compact only the shard that grows over a certain size
+ optimized for read-heavy use, faster compaction turnaround, easy to cache-on-compact
- complicated algorithm, heavy rewrites under write-dominated use, time-range filters less effective
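A sketch of the leveling arithmetic, modeled loosely on LevelDB's published design; the ~10x growth factor comes from the slide, everything else (names, base size) is an illustrative assumption:

// Sketch of leveled size targets: level L holds roughly 10x the data of
// level L-1, and only a level that exceeds its target needs compaction,
// so each pass rewrites a small shard rather than whole files.
public class LeveledTargets {
    static final double GROWTH = 10.0;            // ~10x per level, as in LevelDB

    static long targetBytes(int level, long baseBytes) {
        return (long) (baseBytes * Math.pow(GROWTH, level));
    }

    // Pick the first level whose actual size exceeds its target, or -1 if none.
    static int levelToCompact(long[] levelBytes, long baseBytes) {
        for (int level = 0; level < levelBytes.length; level++) {
            if (levelBytes[level] > targetBytes(level, baseBytes)) {
                return level;
            }
        }
        return -1;
    }
}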
Time-Series Compaction
• A log-structured merge tree is time-ordered data storage!
[Diagram: memstore flushes to HFiles sharded by hour, day, …]
• Implement with a coprocessor
• Time-boundary based: shard HFiles on hour, day, etc.
• Optimized for time-series data
• Optimized for write-biased queries
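A sketch of time-boundary sharding (illustrative coprocessor-style logic, not a real HBase API): bucket HFiles by the hour their data falls in and compact only within a bucket, so old buckets go cold and are never rewritten.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Sketch of time-boundary sharding: group files by the hour (or day) of
// their newest cell and compact only within a bucket -- a good fit for
// append-only time-series data.
public class TimeShardedCompaction {
    // fileMaxTimestamps: max cell timestamp per HFile, in epoch millis.
    public static Map<Long, List<Integer>> bucketByHour(long[] fileMaxTimestamps) {
        Map<Long, List<Integer>> buckets = new HashMap<>();
        for (int i = 0; i < fileMaxTimestamps.length; i++) {
            long hour = TimeUnit.MILLISECONDS.toHours(fileMaxTimestamps[i]);
            buckets.computeIfAbsent(hour, h -> new ArrayList<>()).add(i);
        }
        return buckets;  // compact each bucket's files together, never across buckets
    }
}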
Compactions: Associated JIRAs
• 0.90: Sigma Compactions (HBASE-3209)
• 0.92: Multi-Threaded Compactions (HBASE-1476)
• 0.96: Tier-based Compaction (HBASE-6371 & HBASE-7055)
• Future: Make Compactions Pluggable (HBASE-7516), Leveled Compaction (HBASE-7519)
Compactions: High-Level Thoughts
Variables
• Disk IO on HFile reads
• Disk & network IO on compactions (read + write)
Compactions: High-Level Thoughts
Related questions
• Is the data mutated or appended?
• Mutates benefit from lazy seeks but cause disk bloat
• HFile reduction is less useful as row queries grow larger
• Are you missing critical filters?
• Explicit vs. implicit requests
• Cache on write/compact (CacheConfig)
• Time range / column filters
• Bloom filters: a non-trivial decision; measure before enabling