Inmar Visits Hadoop Summit North America 2014


Presentation Transcript


  1. Inmar Visits Hadoop Summit North America 2014
     Triad Hadoop Users Group
     19 Jun 2014

  2. Bryce Lohr
     • Scalability
       • Yahoo: 455PB, 32,000+ nodes, 6,000+ node “large cluster”
       • BrightRoll: 20B+ events/month, 100ms max response time on ad auctions
       • RocketFuel: 50B events/day, 34k cores, 41PB, 8x growth in 1 year
       • OpenSOC: analyzes 1.2B packets/sec

  3. Bryce Lohr
     • Yahoo
       • YARN alone reduced the resources required for the same number of jobs by ~30%
     • BrightRoll
       • Flume, HDFS, HBase
       • Use moving markers in files as the stream event source
       • Salted row keys in HBase increase parallelism
     • RocketFuel
       • MapReduce, HBase
       • Continuous model training; attributes stored in HBase as protobufs
       • Salted row keys increase write throughput (see the sketch below)
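
     Both the BrightRoll and RocketFuel notes credit salted HBase row keys with better parallelism and write throughput. Below is a minimal Java sketch of the technique, not code from either talk; the 16-bucket salt, the key layout, and the class name are illustrative assumptions.

        import java.nio.charset.StandardCharsets;

        // Sketch of a salted HBase row key. Prefixing the natural key with a
        // small hash-derived bucket spreads sequential keys (e.g. timestamps)
        // across regions instead of hot-spotting a single region server.
        public class SaltedRowKey {

            private static final int SALT_BUCKETS = 16; // assumed; usually matched to region count

            public static byte[] salted(String naturalKey) {
                // The same input always maps to the same bucket, so point reads still work.
                int bucket = Math.floorMod(naturalKey.hashCode(), SALT_BUCKETS);
                String key = String.format("%02d-%s", bucket, naturalKey);
                return key.getBytes(StandardCharsets.UTF_8);
            }

            public static void main(String[] args) {
                System.out.println(new String(salted("2014-06-19T12:00:00Z|event-42"), StandardCharsets.UTF_8));
            }
        }

     The trade-off is on the read side: a range scan now has to fan out across every salt bucket, which is why the bucket count is typically kept small.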

  4. Bryce Lohr
     • OpenSOC
       • Hadoop, Flume, Storm, Kafka, ElasticSearch
       • Kafka: set num.io.threads to the number of disks; match worker threads to the number of producers
       • HBase: row key based on the IP address in hex (see the sketch below)
       • Storm: good error handling prevents retry storms; watch CPU usage
     Know your data inside and out!
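
     The “row key based on IP in hex” note translates directly into code. A minimal Java sketch of that encoding follows; the class and method names are illustrative, and the exact key layout OpenSOC used is not specified in the slide. The Kafka point, by contrast, is broker configuration (num.io.threads) rather than application code.

        import java.net.InetAddress;
        import java.net.UnknownHostException;

        // Sketch of an HBase row key built from an IP address encoded as
        // fixed-width hex. Fixed width keeps keys lexicographically sortable,
        // so addresses in the same subnet land in adjacent rows and can be
        // fetched with a prefix scan.
        public class IpHexRowKey {

            // 192.168.1.10 -> "c0a8010a"
            public static String toHexKey(String ip) throws UnknownHostException {
                byte[] octets = InetAddress.getByName(ip).getAddress();
                StringBuilder sb = new StringBuilder(octets.length * 2);
                for (byte b : octets) {
                    sb.append(String.format("%02x", b & 0xff)); // mask to treat the byte as unsigned
                }
                return sb.toString();
            }

            public static void main(String[] args) throws UnknownHostException {
                System.out.println(toHexKey("192.168.1.10")); // prints c0a8010a
            }
        }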

  5. Jeff Clouse
     • Advice for companies getting started with Hadoop
       • Have compelling business cases
       • Have a clear strategy and a long-term plan
     • Plugging Hadoop into the Enterprise
       • Needs to be part of the enterprise, not an add-on
       • Has implications for the entire enterprise architecture
       • Use Pig as the data analytics platform
     • Machine Learning
       • Many systems rely on Machine Learning to tune Hadoop’s outputs
       • More data makes simple algorithms more meaningful

  6. Jeff Clouse – jeff.clouse@inmar.com
