A short presentation introducing Apache Hadoop: what it is, what it can do, and which other products are associated with it.
Apache Hadoop • What is it? • Architecture • Related Projects • Large Users
Hadoop – What is it? • An open-source system written in Java • Supports very large data sets • Supports large clusters of servers • Designed to run on existing low-cost commodity hardware • Distributes processing work across the cluster • Distributes data storage across the cluster • Provides resilience through automatic handling of node failures
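To make the storage and resilience points concrete, here is a minimal Python sketch of a toy setup: a file is cut into fixed-size blocks and each block is copied to several data nodes, so losing a node does not lose data. The 4-byte block size, the node names and the helper functions (split_into_blocks, place_replicas, all_blocks_available) are hypothetical. Real HDFS uses much larger blocks (128 MB by default in recent versions), a default replication factor of 3, and rack-aware rather than round-robin replica placement.

```python
# Toy sketch of HDFS-style storage: split a file into blocks and replicate
# each block on several data nodes. Hypothetical helpers; simplifications:
# tiny block size and round-robin placement (real HDFS is rack-aware).

BLOCK_SIZE = 4    # bytes per block (real HDFS default is 128 MB)
REPLICATION = 3   # copies per block (matches the HDFS default)


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the file into consecutive fixed-size blocks; the last may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    The returned mapping plays the role of the block metadata a name node keeps.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of data nodes")
    return {
        block_id: [nodes[(block_id + k) % len(nodes)] for k in range(replication)]
        for block_id in range(num_blocks)
    }


def all_blocks_available(placement: dict[int, list[str]], failed: set[str]) -> bool:
    """True if every block still has at least one replica on a live node."""
    return all(any(node not in failed for node in replicas)
               for replicas in placement.values())


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    blocks = split_into_blocks(b"hello hadoop world")
    placement = place_replicas(len(blocks), nodes)
    print(len(blocks), placement[0])
    print(all_blocks_available(placement, failed={"dn1", "dn2"}))
```

With four nodes and three copies of each block, any two nodes can fail and every block is still readable; losing three nodes can leave some blocks with no live copy.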
Hadoop – Architecture • Hadoop consists of four modules: • Hadoop Common – common utilities that support the other Hadoop modules • Hadoop MapReduce – parallel processing of data stored in Hadoop • Hadoop YARN – job scheduler and cluster resource manager • Hadoop Distributed File System (HDFS) – a master/slave file system that spreads Hadoop data over a very large cluster of slave data nodes, controlled by a single master name node
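The MapReduce model mentioned above can be illustrated with the classic word-count example. The following is a single-process Python sketch, not Hadoop's Java API: the function names are hypothetical, each input "split" is a plain string, and the shuffle step that Hadoop performs across the network is simulated with a dictionary.

```python
# Single-process sketch of the MapReduce model (map -> shuffle -> reduce)
# using word count. Hypothetical function names; in real Hadoop each phase
# runs as distributed tasks over HDFS blocks.

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(split: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all counts for one word."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # Each split could be mapped on a different node; here they run in turn.
    intermediate = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["Hadoop stores data", "Hadoop processes data in parallel"]
    print(word_count(splits))
```

In a real cluster, YARN schedules the map and reduce tasks on the cluster's nodes, preferably close to the HDFS blocks they read; the three-phase structure stays the same.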
Hadoop – Related Projects • Pig – platform for analysing large data sets • Hive – data warehouse system for Hadoop • Mahout – machine learning and data mining • Avro – data serialization system • ZooKeeper – coordination service for building distributed applications • Chukwa – data collection and analysis
Hadoop – Related Projects • Hue – web-based user interface for Hadoop • Oozie – workflow scheduler • Hama – bulk synchronous parallel (BSP) framework for massive scientific computations • Nutch – web crawler • HBase – non-relational database
Hadoop – Large Users • Yahoo – 10,000-core Linux cluster • Facebook – 100 petabytes, growing at 0.5 petabytes a day • Amazon – it is possible to run Hadoop on Amazon's EC2 and S3
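To give a rough sense of scale for the Facebook figure, assuming the quoted growth rate of half a petabyte a day stays constant over a year:

```latex
0.5\ \text{PB/day} \times 365\ \text{days/year} = 182.5\ \text{PB/year}
```

At that rate the 100 PB store would grow to about 282.5 PB within a year.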
Contact Us • Feel free to contact us at • www.semtech-solutions.co.nz • info@semtech-solutions.co.nz • We offer IT project consultancy • We are happy to hear about your problems • You pay only for the hours you need to solve them