Introduction to Apache Hadoop

Introduction to Apache Hadoop. CSCI 572: Information Retrieval and Search Engines Summer 2010. Outline. What is Hadoop? Where did it come from? What are the current versions of Hadoop? What can it do?. Apache Hadoop. The brainchild of Doug Cutting

Presentation Transcript


  1. Introduction to Apache Hadoop CSCI 572: Information Retrieval and Search Engines Summer 2010

  2. Outline • What is Hadoop? • Where did it come from? • What are the current versions of Hadoop? • What can it do?

  3. Apache Hadoop • The brainchild of Doug Cutting • Built out by brilliant engineers and contributors from Yahoo, Facebook, Cloudera, and other companies • Started in 2006 when code was spun out of Nutch; became a top-level Apache project in 2008 • Has grown into a really large project at Apache with a significant ecosystem

  4. How to get started • Hadoop (0.20.0/0.20.2) • Put your Java hat on • Go here: http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html • If you want to do this on Windows, get Cygwin, VMware, or something else that lets you run Linux • Run the MapReduce examples in local mode • Check the data generated in your HDFS • Scaling it out • Amazon Elastic MapReduce • Setting it up on your own cluster: DataNodes, TaskTrackers, and the JobTracker

  5. Basic Operations • Listing files • ./bin/hadoop fs -ls • Writing files • ./bin/hadoop fs -put • Running MapReduce jobs • mkdir input • cp conf/*.xml input • ./bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+' • cat output/*
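To make the grep example on the slide above concrete, here is a minimal sketch in plain Python of what that job computes: it finds every string matching the regular expression and counts how often each one occurs. This is an illustration only and does not use any Hadoop API. The function name `grep_count`, the sample lines, and the tie-breaking rule in the sort are assumptions made for the example.

```python
import re
from collections import Counter


def grep_count(lines, pattern):
    """Count occurrences of each regex match across all lines.

    Hypothetical helper for illustration; not part of Hadoop.
    """
    regex = re.compile(pattern)
    counts = Counter()
    for line in lines:
        # "Map" step: emit every matching substring found in the line.
        for match in regex.finditer(line):
            # "Reduce" step, folded in: sum the occurrences per match.
            counts[match.group(0)] += 1
    # Most frequent first; ties broken alphabetically (an assumption,
    # chosen only to make the result deterministic).
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up sample input standing in for the copied conf/*.xml files.
sample_lines = [
    "<name>dfs.replication</name>",
    "dfs.name.dir and dfs.replication",
    "no match here",
]
result = grep_count(sample_lines, r"dfs[a-z.]+")
for text, count in result:
    print(count, text)
```

On a real cluster the same counting is spread across many machines, which is the point of running it as a MapReduce job rather than a single loop.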

  6. Advanced Topics • Writing your Mappers and Reducers • Check out the MapReduce Tutorial here: http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html • It includes code for several examples, including Word Count
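The Word Count example mentioned above is written in Java against the Hadoop API. As a rough sketch of the idea only, the three phases (map, shuffle, reduce) can be simulated in plain Python on a single machine. Every function name here (`map_words`, `shuffle`, `reduce_counts`, `word_count`) is hypothetical and chosen for illustration; none of it is Hadoop code.

```python
from collections import defaultdict


def map_words(line):
    """Mapper: emit a (word, 1) pair for each whitespace-separated word."""
    for word in line.split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, values):
    """Reducer: sum the counts collected for a single word."""
    return word, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(pairs)
    return dict(reduce_counts(word, values)
                for word, values in sorted(grouped.items()))


counts = word_count(["hello hadoop", "hello world"])
print(counts)
```

In a real job you write only the mapper and the reducer; the framework handles the grouping between them and distributes the work across the cluster.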

  7. Other Hadoop ecosystem projects • HBase • Modeled on Google's Bigtable • Hive • Built at Facebook, provides a SQL-like interface on HDFS • Chukwa • Log processing • Pig • Scientific data analysis language on top of MapReduce and HDFS • ZooKeeper • Distributed systems management

  8. No releases in a while • Stick with 0.20.x

  9. Wrap-up • Lots more information at • http://hadoop.apache.org • http://hadoop.apache.org/mapreduce/ • http://hadoop.apache.org/hdfs/ • Project ideas • Implement a GIS or geometric algorithm in MapReduce • Write a REST interface to control HDFS and MapReduce • Add new Writable input data formats • Integrate Solr and Hadoop

  10. Acknowledgements • Material inspired by discussions and talks on the Apache mailing lists for Hadoop and by discussions with the rest of the Hadoop community
