Inmar Visits Hadoop Summit North America 2014
Triad Hadoop Users Group
19 Jun 2014
Bryce Lohr
• Scalability
  • Yahoo: 455PB, 32,000+ nodes, 6,000+ node “large cluster”
  • BrightRoll: 20B+ events/month, 100ms max response time on ad auctions
  • RocketFuel: 50B events/day, 34k cores, 41PB, 8x growth in 1 year
  • OpenSOC: analyzes 1.2B packets/sec
Bryce Lohr
• Yahoo
  • YARN alone reduced the resources required for the same number of jobs by ~30%
• BrightRoll
  • Flume, HDFS, HBase
  • Use moving markers in files as the stream event source
  • Salted row keys in HBase increase parallelism
• RocketFuel
  • MapReduce, HBase
  • Continuous model training; attributes stored in HBase as protobufs
  • Salted row keys increase write throughput
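Both BrightRoll and RocketFuel mention salted row keys. The slides do not show how either company builds them, so the following is only a minimal sketch of the general idea: prefix each key with a small, deterministic bucket number so that sequential keys spread across regions instead of piling onto one. The function name `salted_row_key`, the bucket count of 16, the `|` separator, and the sample key format are all assumptions made for illustration.

```python
import hashlib

# Assumed number of salt buckets; in practice this is usually chosen to
# match the number of regions or region servers. Not taken from the slides.
NUM_BUCKETS = 16


def salted_row_key(natural_key: str, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Return the natural key prefixed with a deterministic salt bucket.

    The salt is derived from a hash of the key itself, so a reader can
    recompute the prefix and still do a point lookup.
    """
    digest = hashlib.md5(natural_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    # Zero-pad so every prefix has the same width and sorts consistently.
    width = len(str(num_buckets - 1))
    return f"{bucket:0{width}d}|{natural_key}".encode("utf-8")


if __name__ == "__main__":
    # Hypothetical sequential event keys that would otherwise hit one region.
    for i in range(5):
        print(salted_row_key(f"20140619-{i:06d}"))
```

The trade-off is that a range scan over the natural key order now needs one scan per bucket, so salting suits write-heavy workloads better than scan-heavy ones.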
Bryce Lohr
• OpenSOC
  • Hadoop, Flume, Storm, Kafka, ElasticSearch
  • Kafka: num.io.threads = n disks, worker threads = producers
  • HBase: row key based on IP in hex
  • Storm: good error handling prevents retry storms, watch CPU usage
Know your data inside and out!
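The OpenSOC slide mentions an HBase row key based on the IP address in hex, without giving the exact layout. As a rough sketch under that limitation, the snippet below encodes an address as fixed-width hexadecimal, which makes the keys sort in numeric address order rather than in dotted-string order. The function name `ip_hex_row_key`, the `|` separator, and the optional suffix are hypothetical and not OpenSOC's actual scheme.

```python
import ipaddress


def ip_hex_row_key(ip: str, suffix: str = "") -> str:
    """Encode an IP address as fixed-width hex, optionally followed by a suffix.

    IPv4 addresses become 8 hex digits and IPv6 addresses 32, so keys of the
    same address family compare lexicographically in numeric order.
    """
    addr = ipaddress.ip_address(ip)
    width = 8 if addr.version == 4 else 32
    hex_ip = format(int(addr), f"0{width}x")
    # The suffix (for example a timestamp) is an illustrative assumption.
    return f"{hex_ip}|{suffix}" if suffix else hex_ip


if __name__ == "__main__":
    print(ip_hex_row_key("10.0.0.1"))
    print(ip_hex_row_key("192.168.1.10", "1403136000"))
```

With dotted strings, "10.0.0.1" sorts before "9.0.0.1"; with the hex form, the order follows the numeric value of the address, which keeps neighbouring addresses adjacent in a scan.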
Jeff Clouse
• Advice for companies getting started with Hadoop
  • Have compelling business cases
  • Have a clear strategy and a long-term plan
• Plugging Hadoop into the Enterprise
  • Needs to be part of the enterprise, not an add-on
  • Has implications for the entire enterprise architecture
  • Use Pig as the data analytics platform
• Machine Learning
  • Many systems rely on Machine Learning to tune Hadoop’s outputs
  • More data makes simple algorithms more meaningful
jeff.clouse@inmar.com Jeff Clouse