Experience with HiBench: From Micro-Benchmarks toward End-to-End Pipelines
WBDB 2013 Workshop Presentation
Lan Yi (lan.yi@intel.com), Senior Software Engineer, Intel China Software Center
2013.07.16
HiBench
• Micro Benchmarks: Sort, WordCount, TeraSort
• Web Search: Nutch Indexing, Page Rank
• Machine Learning: Bayesian Classification, K-Means Clustering
• HDFS: Enhanced DFSIO
• Different from GridMix, SWIM?
• Micro benchmark? Isolated components? End-to-end benchmark?
• We need an ETL-Recommendation pipeline
See our paper “The HiBench Benchmark Suite: Characterization of the MapReduce-Based Data Analysis” in ICDE’10 workshops (WISS’10)
ETL-Recommendation (hammer)
[Figure: pipeline data flow on a HIVE-Hadoop cluster (data warehouse). Labels recoverable from the diagram:]
• Inputs: TPC-DS sales updates; cookies updates (h1, h2, h24)
• ETL: ETL-sales, ETL-logs
• Tables: WP log table (ip, agent, retcode, cookies); sales tables
• Pref: Pref-sales (sales preferences), Pref-logs (browsing preferences), Pref-comb (user-item preferences)
• Mahout: item-based Collaborative Filtering (CF), item-item similarity matrix
• Test: offline test, CF test, test data
• Statistics & Measurements
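ETL-Recommendation (hammer)
• Task Dependencies
[Figure: dependency graph over the following tasks; the edges did not survive extraction]
• ETL-sales
• ETL-logs
• Pref-sales
• Pref-logs
• Pref-comb
• Item-based Collaborative Filtering
• Offline test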
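A task dependency graph like this one can be scheduled in "waves" of tasks whose prerequisites are all done. The sketch below is illustrative only: the edges in DEPENDS_ON are an assumption inferred from the task names (the slide's actual arrows are not available in this text), and execution_waves is a hypothetical helper, not part of HiBench.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# ASSUMED edges, inferred from the task names only; the slide's real
# dependency arrows are not recoverable from the extracted text.
# Each key maps to the set of tasks that must finish before it starts.
DEPENDS_ON = {
    "ETL-sales": set(),
    "ETL-logs": set(),
    "Pref-sales": {"ETL-sales"},
    "Pref-logs": {"ETL-logs"},
    "Pref-comb": {"Pref-sales", "Pref-logs"},
    "Item-based Collaborative Filtering": {"Pref-comb"},
    "Offline test": {"Item-based Collaborative Filtering"},
}


def execution_waves(depends_on):
    """Group tasks into waves; tasks within one wave have no mutual dependencies."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose prerequisites are done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave as finished
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(DEPENDS_ON), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Under these assumed edges, the two ETL tasks can run concurrently, as can the two Pref tasks that follow them; the remaining stages run one after another.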
Empirical Data (hammer)
• CPU: Intel Xeon E5-2600 @ 2.2 GHz (Sandy Bridge), 2 x 8 x HT = 32 cores
• Memory: 192G
• Disk: WD 7200, 0.3 x 12 x 4 = 14.4T
• Network: 1000M, 300M~400M/s
• Cluster: 4 nodes, RHEL 6.2, CDH 4.1.2
• Software: HiBench etl-recomm branch, HiTune-0.9
• Data: sales ~14G (TPC-DS scale 100), logs ~105G
LinkBench
• Benchmark for Social Graph Service
  • Originally developed by Facebook on top of MySQL
  • Simulates social graph workloads similar to Facebook’s online service
  • Key workload properties match Facebook’s real production workload
  • Different from analytical workloads
• Our Work
  • Port LinkBench to HBase
  • On top of Phoenix (SQL support over HBase)
Resources
• HiBench: https://github.com/intel-hadoop/HiBench
• HiBench ETL-Recomm Branch: https://github.com/intel-hadoop/HiBench/tree/etl-recomm
• LinkBench: https://github.com/intel-hadoop/linkbench
• HiTune: https://github.com/intel-hadoop/HiTune
• Phoenix: https://github.com/intel-hadoop/phoenix