HBase Operations
Swati Agarwal, Thomas Pan
eBay Inc.

Overview
• Pre-production cluster handling production data sets and workloads
• Data storage for listed items, which drives eBay search indexing
• Data storage for ranking data in the future
• Leverage MapReduce in the same cluster to build the search index

Why HBase?
• Column-oriented data store on top of HDFS
• Availability
• Tightly integrated with the Hadoop Map/Reduce framework
• No schema: storage can easily evolve and expand
• Provides efficient scans and key-based lookups
• Supports versioning
• Data consistency: atomic row updates
• Scalability
• Good support from the open-source community

HBase Cluster
• 225 Data Nodes
  • Region Server
  • Task Tracker
  • Data Node
• 14 Enterprise Nodes
  • Primary Name Node
  • Secondary Name Node
  • Job Tracker Node
  • 5 ZooKeeper Nodes
  • HBase Master
  • CLI Node
  • Ganglia Reporting Nodes
  • Spare Nodes for Failover
• Node Hardware
  • 12 × 2 TB hard drives
  • 72 GB RAM
  • 24 cores with hyper-threading

Hadoop/HBase Configuration
• Region Server
  • HBase Region Server JVM heap size: -Xmx15g (15 GB)
  • Number of HBase Region Server handlers: hbase.regionserver.handler.count=50 (matching the number of active regions)
  • Region size: hbase.hregion.max.filesize=53687091200 (50 GB, to avoid automatic splits)
  • Turn off automatic major compaction: hbase.hregion.majorcompaction=0
• Map Reduce
  • Number of Data Node threads: dfs.datanode.handler.count=100
  • Number of Name Node threads: dfs.namenode.handler.count=1024 (Todd:
  • Name Node heap size: -Xmx30g (30 GB)
  • Turn off map speculative execution: mapred.map.tasks.speculative.execution=false
  • Turn off reduce speculative execution: mapred.reduce.tasks.speculative.execution=false
• Client settings
  • HBase RPC timeout: hbase.rpc.timeout=600000 (10-minute client-side timeout)
  • HBase client pause: hbase.client.pause=3000
• HDFS
  • Block size: dfs.block.size=134217728 (128 MB)
  • Data Node xceiver count: dfs.datanode.max.xcievers=131072
  • Number of mappers per node: mapred.tasktracker.map.tasks.maximum=8
  • Number of reducers per node: mapred.tasktracker.reduce.tasks.maximum=6
  • Swap turned off

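The raw byte and millisecond values on the configuration slide are easy to misread, so a short sanity check can help. The sketch below only converts the numbers quoted on the slide into human-readable units. It assumes binary units (1 GB = 1024³ bytes) and that hbase.client.pause, like hbase.rpc.timeout, is expressed in milliseconds. The dictionary is an illustration, not a real configuration loader.

```python
# Sanity-check the raw values quoted on the configuration slide.
# Assumption: sizes use binary units; both timeouts are in milliseconds.
GIB = 1024 ** 3
MIB = 1024 ** 2

config = {
    "hbase.hregion.max.filesize": 53687091200,  # bytes
    "dfs.block.size": 134217728,                # bytes
    "hbase.rpc.timeout": 600000,                # milliseconds
    "hbase.client.pause": 3000,                 # milliseconds (assumed unit)
}

region_size_gib = config["hbase.hregion.max.filesize"] / GIB
block_size_mib = config["dfs.block.size"] / MIB
rpc_timeout_min = config["hbase.rpc.timeout"] / 1000 / 60
client_pause_s = config["hbase.client.pause"] / 1000

# How many HDFS blocks a region would span if it grew to the maximum size.
blocks_per_full_region = (
    config["hbase.hregion.max.filesize"] // config["dfs.block.size"]
)

print(f"max region size : {region_size_gib:g} GiB")
print(f"HDFS block size : {block_size_mib:g} MiB")
print(f"RPC timeout     : {rpc_timeout_min:g} min")
print(f"client pause    : {client_pause_s:g} s")
print(f"blocks/region   : {blocks_per_full_region}")
```

The conversions agree with the annotations on the slide (50 GB regions, 128 MB blocks, a 10-minute RPC timeout).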
HBase Tables
• Multiple tables in a single cluster storing inventory data (item, seller)
• Column families per table: <= 3
• Number of columns: < 200
• 1.45 billion rows total
• Max row size: ~20 KB
• Average row size: ~10 KB
• 13.01 TB of data
• Bulk load speed: ~500 million items in 30 minutes
• Random write updates: 25K records per minute
• Scan speed: 2004 rows per second per region server (averaging 3 versions), 465 rows per second per region server (averaging 10 versions)
• Scan speed with filters: 325~353 rows per second per region server

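The table statistics above can be cross-checked with a little arithmetic. The sketch below is a back-of-the-envelope calculation, not a benchmark. It assumes binary terabytes, that all 225 data nodes run a region server (as the cluster slide suggests), and that scan throughput scales linearly across region servers, which a real cluster may not achieve.

```python
# Back-of-the-envelope checks on the table statistics quoted above.
# Assumptions: 1 TB = 1024**4 bytes, 225 region servers, linear scan scaling.
TIB = 1024 ** 4

total_rows = 1.45e9
total_bytes = 13.01 * TIB

# Average row size implied by total data volume and row count.
implied_avg_row_kb = total_bytes / total_rows / 1024

# Bulk load: ~500 million items in 30 minutes.
bulk_rows_per_sec = 500e6 / (30 * 60)

# Random writes: 25K records per minute.
random_writes_per_sec = 25_000 / 60

# Aggregate scan rates if every region server scans in parallel.
region_servers = 225
scan_rate_v3 = 2004 * region_servers    # rows/s, ~3 versions on average
scan_rate_v10 = 465 * region_servers    # rows/s, ~10 versions on average
full_scan_minutes_v3 = total_rows / scan_rate_v3 / 60

print(f"implied average row size : {implied_avg_row_kb:.1f} KB")
print(f"bulk load rate           : {bulk_rows_per_sec:,.0f} rows/s")
print(f"random write rate        : {random_writes_per_sec:,.0f} rows/s")
print(f"aggregate scan rate (v3) : {scan_rate_v3:,} rows/s")
print(f"aggregate scan rate (v10): {scan_rate_v10:,} rows/s")
print(f"full scan estimate (v3)  : {full_scan_minutes_v3:.0f} min")
```

The implied average row size comes out a little under 10 KB, which is consistent with the ~10 KB figure on the slide.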
HBase Tables (cont.)
• Pre-split into 3600 regions per table
• Each table is split into roughly equal-sized regions
• Important to pick well-distributed keys
• Currently using bit reversal
• Region splitting has been disabled by setting a very large region size
• Major compaction on demand
• Purge rows periodically
• Balance regions among region servers on demand

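One way to picture the pre-split is to compute evenly spaced boundaries over the 64-bit key space. The sketch below is an illustration under stated assumptions, not the authors' tooling: the function name `split_points` is hypothetical, and keys are assumed to be encoded as 8-byte big-endian values so that byte-wise ordering matches numeric ordering.

```python
from typing import List


def split_points(num_regions: int, key_bits: int = 64) -> List[bytes]:
    """Return the num_regions - 1 boundaries that cut an unsigned
    key space of key_bits bits into equal-width ranges.

    Boundaries are big-endian so that lexicographic byte order
    matches numeric order.
    """
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    space = 1 << key_bits
    width = key_bits // 8
    return [
        ((i * space) // num_regions).to_bytes(width, "big")
        for i in range(1, num_regions)
    ]


keys = split_points(3600)
print(len(keys), "split points")
print("first boundary:", keys[0].hex())
print("last boundary :", keys[-1].hex())

# With 225 region servers, an even balance gives this many regions
# of each table per server.
regions_per_server = 3600 // 225
print("regions per server per table:", regions_per_server)
```

Because splitting is disabled, boundaries like these stay fixed for the life of the table, which is why the key distribution matters so much.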
RowKey Scheme and Sharding
• RowKey
  • 64-bit unsigned integer
  • Bit reversal of the document ID
  • Example: document ID 2 becomes RowKey 0x4000000000000000
• HBase creates regions with even RowKey ranges
• Each map task maps to one region

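The bit-reversal scheme above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. The function names `reverse_bits64` and `row_key` are hypothetical, and the 8-byte big-endian encoding is an assumption.

```python
def reverse_bits64(n: int) -> int:
    """Reverse the bit order of a 64-bit unsigned integer."""
    if not 0 <= n < (1 << 64):
        raise ValueError("value does not fit in 64 unsigned bits")
    result = 0
    for _ in range(64):
        # Move the lowest bit of n onto the low end of result.
        result = (result << 1) | (n & 1)
        n >>= 1
    return result


def row_key(document_id: int) -> bytes:
    """Encode the bit-reversed ID as 8 big-endian bytes (assumed encoding)."""
    return reverse_bits64(document_id).to_bytes(8, "big")


# The example from the slide: document ID 2.
print(hex(reverse_bits64(2)))

# Consecutive IDs land far apart in the key space.
for doc_id in range(8):
    print(doc_id, row_key(doc_id).hex())
```

Reversal moves the fast-changing low bits of sequential IDs into the high bits of the key, so consecutive documents spread across regions instead of piling onto one. Applying the function twice returns the original ID, so a key maps back to its document.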
Monitoring Systems
• Ganglia
• Nagios alerts
  • Table consistency: hbck
  • Table balancing: in-house tool
  • Region size
  • CPU usage
  • Memory usage
  • Disk failures
  • HDFS block count
  • …
• In-house job monitoring system
  • Based on OpenTSDB
  • Job counters

Challenges/Issues
• HBase stability
  • HDFS issues, such as Name Node failure, can impact HBase
  • Map/Reduce jobs can impact HBase region servers, for example through high memory usage
  • Regions stuck in migration
• HBase health monitoring
• HBase table maintenance
  • HBase table regions become unbalanced
  • Major compaction after row purges and updates
• Software upgrades cause significant downtime
• Normal hardware failures may cause issues
  • Stuck regions due to a failed hard disk
  • Region servers deadlocked due to the JVM
• Testing

Future Direction
• High scalability
  • Scale out a table with more regions
  • Scale out the whole cluster with more data
• High availability
  • No downtime for upgrades
• Adopt coprocessors
• Near-real-time indexing

Community Acknowledgement
• Kannan Muthukkaruppan
• Karthik Ranganathan
• Lars George
• Michael Stack
• Ted Yu
• Todd Lipcon
• Konstantin Shvachko