The most professional mobile app analytics and developer services platform • Wang Chunguo (王春国) • email: wangchunguo@umeng.com • WeChat/QQ: 715356603
Agenda • Mobile Big Data • Tech stack • Real-time Dataflow • Hadoop architecture • Data Warehouse • Solutions • Q&A
Mobile Data Features • Diversity • Fragmentation • Multi-dimensional • High frequency • High-speed growth • Low quality
10+ billion installations • ~3+ billion requests, peak 60,000/s • ~5 TB+ per day • ~1000 nodes • 2–2.5 billion messages • 500+ jobs • 16 thousand+ apps • 65 thousand+ developers
Java, Scala, Python, Shell, C … • Kafka, Storm • Hive, Pig • MapReduce • Redis, MongoDB, HBase • Excel, R • Finagle • Git
Protobuf • Serializes structured data (think XML) • Flexible, efficient, simple • Language-independent • Smaller • Faster • Simpler format • Less ambiguous
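To make the "smaller" point concrete, here is a minimal sketch of how the Protocol Buffers wire format encodes a single integer field as a varint. It is hand-written for illustration and does not use the protobuf library; the function names and the XML snippet are hypothetical, and real messages would normally go through generated classes.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # set the continuation bit: more bytes follow
        else:
            out.append(low7)         # last byte: continuation bit clear
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: tag = (field_number << 3) | wire type 0, then the value."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)


# Field number 1 holding the value 150, next to a hypothetical XML equivalent.
binary = encode_varint_field(1, 150)
xml = b"<request><count>150</count></request>"
print(binary.hex(), len(binary), len(xml))
```

The binary form carries only a numeric tag and the value, while the XML form repeats the element names as text in every record.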
Hive ORCFile Features • Reduces the NameNode's load • Light-weight indexes: skip row groups, seek to a given row • Block-mode compression • Bounds the amount of memory needed for reading or writing • Metadata stored using Protocol Buffers
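The "skip row groups" idea can be sketched in a few lines. This is a simplified in-memory model, not the ORC reader: it assumes one min/max entry per fixed-size row group, and the names `RowGroupStats`, `build_index` and `scan_equal` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroupStats:
    start_row: int   # first row number covered by this group
    min_value: int   # smallest column value in the group
    max_value: int   # largest column value in the group


def build_index(values: List[int], stride: int) -> List[RowGroupStats]:
    """Record min/max statistics for every `stride` rows of a column."""
    index = []
    for start in range(0, len(values), stride):
        group = values[start:start + stride]
        index.append(RowGroupStats(start, min(group), max(group)))
    return index


def scan_equal(values: List[int], index: List[RowGroupStats],
               stride: int, target: int) -> Tuple[List[int], int]:
    """Find rows equal to `target`, reading only groups whose range can contain it."""
    hits, groups_read = [], 0
    for stats in index:
        if target < stats.min_value or target > stats.max_value:
            continue  # the index proves this group cannot match, so skip it
        groups_read += 1
        group = values[stats.start_row:stats.start_row + stride]
        for offset, value in enumerate(group):
            if value == target:
                hits.append(stats.start_row + offset)
    return hits, groups_read


values = list(range(100))               # a sorted column of 100 rows
index = build_index(values, stride=10)  # 10 row groups
hits, groups_read = scan_equal(values, index, 10, 42)
print(hits, groups_read)
```

In this sketch only one of the ten row groups is read. How much is skipped in practice depends on how well the data is clustered on the filtered column.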
LZMA Compression • Faster compression speed • Faster decompression speed • Smaller memory requirements for decompression • Smaller code size for decompression
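A quick way to try LZMA on sample data is Python's standard `lzma` module. The sketch below only checks a round trip and prints compressed sizes next to zlib for comparison; the log record is made up, and actual ratios and speeds depend on the data and the chosen preset, so it does not attempt to measure the speed claims above.

```python
import lzma
import zlib

# Hypothetical log line, repeated to imitate a highly redundant log file.
record = b"2014-05-01\tappkey=abc123\tevent=launch\n"
raw = record * 10_000

lzma_bytes = lzma.compress(raw)   # default preset, .xz container
zlib_bytes = zlib.compress(raw)   # default zlib level, for comparison only

restored = lzma.decompress(lzma_bytes)

print("raw:", len(raw), "lzma:", len(lzma_bytes), "zlib:", len(zlib_bytes))
```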
Blend Scheduler • Fair Scheduler • Map Slot <-> Reduce Slot • More efficient • Full use of cluster resources
Data Skew • Row key design: date+appkey • Row key design: md5(date+app_key)[0:4]+date+appkey
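The second row key design can be sketched as follows. This assumes the four-character prefix is taken from the hexadecimal MD5 digest (the slide does not say whether hex or raw bytes are used); the function name and sample app keys are hypothetical.

```python
import hashlib


def salted_row_key(date: str, appkey: str) -> str:
    """Build md5(date+appkey)[0:4] + date + appkey, using the hex digest (an assumption)."""
    digest = hashlib.md5((date + appkey).encode("utf-8")).hexdigest()
    return digest[0:4] + date + appkey


# Keys for the same date no longer share a common prefix, so writes for one day
# spread across the key space instead of landing on one region.
date = "20140501"
keys = [salted_row_key(date, f"app{i:04d}") for i in range(1000)]
prefixes = {key[:4] for key in keys}
print(keys[0], len(prefixes))
```

Because the prefix is derived from date and appkey, a reader that knows both can recompute the full key for a point lookup. The trade-off is that rows for one date are no longer contiguous, so a scan by date alone has to cover many key ranges.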
Bulk Load • MapReduce -> put HBase Table • HDFS -> HFile -> Table • 4 min 10 s