Apache Hadoop on the Open Cloud. David Dobbins, Nirmal Ranganathan
Who is using Apache Hadoop • Traditionally = Developers • Increasingly = Business Users / Data Scientists • Why does this matter?
Configuring and managing a Hadoop cluster is hard • Resources / Expertise
Advantages of using the cloud • Flexible • Fast • Easy
Demo • Run an actual job
Swift Filesystem for Hadoop: HADOOP-8545. The challenges of running MapReduce jobs against Swift: • Identity management • Block size • Object store vs file paths • Direct API into Swift from HDFS • New filesystem URL, swift:// • Read from and write to local and remote Swift clusters • Keep long-lived data in Swift; upload while the Hadoop cluster is offline
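The "object store vs file paths" point can be illustrated with a small sketch of how a swift:// URL might be broken down. This is not code from the talk. It assumes the `swift://container.service/path` layout used by the hadoop-openstack module, where the host part names a Swift container plus a configured service; the function name `parse_swift_url`, the `SwiftLocation` type, and the service name `rackspace` in the usage line are hypothetical.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class SwiftLocation(NamedTuple):
    """Pieces of a swift:// URL (hypothetical helper type)."""
    container: str    # Swift container holding the object
    service: str      # name of the configured Swift endpoint (assumed convention)
    object_name: str  # flat object name; slashes are part of the name


def parse_swift_url(url: str) -> SwiftLocation:
    """Split an assumed swift://container.service/path URL into its parts."""
    parts = urlsplit(url)
    if parts.scheme != "swift":
        raise ValueError(f"not a swift:// URL: {url!r}")

    # The host is assumed to be "<container>.<service>".
    container, dot, service = parts.netloc.partition(".")
    if not dot or not container or not service:
        raise ValueError(f"expected swift://container.service/path, got {url!r}")

    # Swift has no real directories: everything after the container
    # is a single object name that happens to contain slashes.
    return SwiftLocation(container, service, parts.path.lstrip("/"))


if __name__ == "__main__":
    # "rackspace" is a made-up service name for illustration only.
    print(parse_swift_url("swift://logs.rackspace/2013/06/access.log"))
```

The filesystem layer presents the object name as a directory path to MapReduce, while Swift itself stores one flat key per object inside the container.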
MapReduce to Swift (via “HDFS”) [Diagram labels: Application X, MapReduce, HDFS (shown twice); Proxy; Swift]
Cloud Big Data Platform • Hortonworks Data Platform • HDP 1.1 • HDP 1.3 • Pig, Hive, HCatalog • Coming soon: HDP 2.0
Cloud Big Data Platform • Secure by default • Comes pre-optimized • Web UI, CLI, REST API
Why an Open Platform matters Sandbox on Rackspace Cloud RAX Resell Sandbox VM
@caffiend @rnirmal http://www.rackspace.com/big-data