Learn about the advantages of running Apache Hadoop on the open cloud: configuring, managing, and running MapReduce jobs against the Swift filesystem. Covers use cases, benefits, and challenges of Hadoop in the cloud, with examples on the Hortonworks Data Platform.
Apache Hadoop on the Open Cloud
David Dobbins, Nirmal Ranganathan
Who is using Apache Hadoop
• Traditionally = Developers
• Increasingly = Business Users / Data Scientists
• Why does this matter?
Advantages of using the cloud
• Flexible
• Fast
• Easy
Demo
• Run an actual job
Swift Filesystem for Hadoop: HADOOP-8545
The challenges of running MapReduce jobs against Swift:
• Identity management
• Block size
• Object store vs file paths
• Direct API into Swift from HDFS
• New filesystem URL, swift://
• Read from, write to local & remote Swift clusters
• Keep long-lived data in Swift; upload while Hadoop cluster off-line
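The swift:// URL and the "object store vs file paths" challenge can be illustrated with a small sketch. It is not the HADOOP-8545 implementation. It assumes the swift://CONTAINER.SERVICE/path layout used by the hadoop-openstack module, and the container name "logs" and service name "rackspace" in the usage note are hypothetical. The second function shows how a directory listing might be emulated over a flat set of object names, which is the kind of translation a filesystem layer over an object store has to perform.

```python
from typing import Iterable, List, NamedTuple
from urllib.parse import urlsplit


class SwiftLocation(NamedTuple):
    container: str    # Swift container holding the objects
    service: str      # name of a configured Swift endpoint (assumed URL layout)
    object_name: str  # object key, without a leading slash


def parse_swift_url(url: str) -> SwiftLocation:
    """Split swift://CONTAINER.SERVICE/path into its three parts."""
    parts = urlsplit(url)
    if parts.scheme != "swift":
        raise ValueError(f"not a swift:// URL: {url!r}")
    container, sep, service = parts.netloc.partition(".")
    if not sep or not container or not service:
        raise ValueError(f"expected swift://CONTAINER.SERVICE/path, got {url!r}")
    return SwiftLocation(container, service, parts.path.lstrip("/"))


def list_directory(object_names: Iterable[str], directory: str) -> List[str]:
    """Emulate a directory listing over a flat namespace of object names.

    An object store has no real directories, so the immediate children of
    `directory` are derived from the names that share its prefix.
    """
    prefix = directory.strip("/")
    prefix = prefix + "/" if prefix else ""
    children = set()
    for name in object_names:
        if name.startswith(prefix):
            rest = name[len(prefix):]
            if rest:
                # Keep only the first path segment below the prefix.
                children.add(rest.split("/", 1)[0])
    return sorted(children)
```

For example, parse_swift_url("swift://logs.rackspace/2013/08/access.log") yields container "logs", service "rackspace", and object name "2013/08/access.log". Listing "2013" over the names "2013/08/access.log" and "2013/09/access.log" returns the pseudo-directories "08" and "09".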
MapReduce to Swift (via “HDFS”)
[Diagram labels: Application X, MapReduce, HDFS (each shown twice), Proxy, Swift]
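The layering on this slide, where the application and MapReduce talk to a filesystem interface and the storage behind it is swappable, can be sketched as a toy. Everything here is hypothetical and only illustrates the idea: InMemoryStore stands in for a real backend, BACKENDS plays the role of a scheme-to-filesystem registry, and the URLs in the usage note are made up. It does not reflect Hadoop's actual classes.

```python
from typing import Dict
from urllib.parse import urlsplit


class InMemoryStore:
    """Hypothetical stand-in for a storage backend; keeps bytes in a dict."""

    def __init__(self, label: str) -> None:
        self.label = label
        self._data: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._data[path] = data

    def read(self, path: str) -> bytes:
        return self._data[path]


# Toy registry: the URL scheme selects the backend, as a filesystem
# abstraction layer would. Both entries are in-memory fakes.
BACKENDS: Dict[str, InMemoryStore] = {
    "hdfs": InMemoryStore("hdfs"),
    "swift": InMemoryStore("swift"),
}


def filesystem_for(url: str) -> InMemoryStore:
    """Pick a backend from the URL scheme."""
    scheme = urlsplit(url).scheme
    if scheme not in BACKENDS:
        raise ValueError(f"no filesystem registered for scheme {scheme!r}")
    return BACKENDS[scheme]


def word_count(input_url: str) -> Dict[str, int]:
    """A job that never needs to know which backend holds its input."""
    fs = filesystem_for(input_url)
    text = fs.read(urlsplit(input_url).path).decode("utf-8")
    counts: Dict[str, int] = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts
```

The same word_count call works on an hdfs:// or a swift:// input URL. Only the scheme in the path changes, while the job code stays the same.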
Cloud Big Data Platform
• Hortonworks Data Platform
• HDP 1.1
• HDP 1.3
• Pig, Hive, HCatalog
• Coming soon: HDP 2.0
Cloud Big Data Platform
• Secure by default
• Comes pre-optimized
• Web UI, CLI, REST API
Why an Open Platform matters
• Sandbox on Rackspace Cloud
• RAX Resell
• Sandbox VM
@caffiend @rnirmal http://www.rackspace.com/big-data