This presentation introduces Big Data and Hadoop.
ENGINEERING BIG DATA WITH HADOOP
By International School of Engineering {We Are Applied Engineering}
Disclaimer: Some of the images and content have been taken from multiple online sources. This presentation is intended only for knowledge sharing, not for any commercial purpose.
OVERVIEW
• WHAT IS BIG DATA?
• EXPLOSION OF DATA
• DATA CONTRIBUTIONS
• DATA EXPLOSION
• WHO ARE THE PLAYERS?
• BIG DATA – BIG PICTURE – LANDSCAPE
• BIG DATA – ENTERPRISE ROLES
• WHAT IS HADOOP?
• EVOLUTION OF HADOOP
• HADOOP ECOSYSTEM
• HADOOP ECOSYSTEM MAP
• HADOOP: 30,000 FEET VIEW
• BIG DATA & ANALYTICS CASE STUDIES
• VIDEO OF HADOOP ECOSYSTEM
WHAT IS BIG DATA?
• "High-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making." (Gartner)
• HIGH VOLUME
• HIGH VARIETY
• HIGH VELOCITY
Source: http://www.emc.com/leadership/digital-universe/iview/index.htm
Source: http://www.emc.com/collateral/about/news/idc-emc-digital-universe-2011-infographic.pdf
WHAT IS HADOOP?
• Flexible: structured/unstructured, text/binary, schema/schema-less
• 100% open source
• Scalable: petabytes of data, thousands of nodes
Source: http://cloudtimes.org/2013/06/25/hadoop-as-a-service-market-growing/
EVOLUTION OF HADOOP How does an Elephant Sneak up on you?
HADOOP ECOSYSTEM
Components: Chukwa, Sqoop, ZooKeeper, Pig, Avro, HBase, Mahout, Flume, MapReduce Engine, Whirr, Hama, Hadoop Distributed File System, Hive, Hadoop Common
HADOOP ECOSYSTEM MAP Source: http://indoos.wordpress.com/2010/08/16/hadoop-ecosystem-world-map/
Hadoop Evolution – Map Explained!
• How did it all start? With huge amounts of data on the web.
• Nutch was built to crawl this web data.
• This huge data had to be saved, so HDFS was born.
• How to use this data? The MapReduce framework was built for coding and running analytics, in Java or in any other language through Hadoop Streaming.
• How to get unstructured data in (web logs, click streams, Apache logs, server logs)? FUSE, WebDAV, Chukwa, Flume, Scribe.
• HIHO and Sqoop load data into HDFS, so RDBMS can join the Hadoop bandwagon!
Continued
• High-level interfaces are required over low-level MapReduce programming: Pig, Hive, Jaql.
• BI tools with advanced UI reporting (drill-down etc.): Intellicus.
• Workflow tools over MapReduce processes and high-level languages: Oozie.
• Monitoring and managing Hadoop, running jobs/Hive, and viewing HDFS at a high level: Hue, Karmasphere, Eclipse plugin, Cacti, Ganglia.
• Support frameworks: Avro (serialization), ZooKeeper (coordination).
• More high-level interfaces/uses: Mahout, Elastic MapReduce.
• OLTP is also possible: HBase.
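The "any language" route through Hadoop Streaming noted in the evolution slides can be sketched in Python. Streaming stages exchange plain text lines, by default with a tab between key and value, and the reducer receives its input sorted by key. The word-count task and the names `mapper`, `reducer` and `run_local` are illustrative assumptions, not part of the original deck; `run_local` only imitates the framework's sort step on one machine.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts for each word; the input must already be sorted by key."""
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines):
    """Hypothetical local stand-in for the framework: map, sort by key, reduce."""
    return list(reducer(sorted(mapper(lines))))


# Small demonstration on two input lines.
for record in run_local(["big data big", "data hadoop"]):
    print(record)
```

On a real cluster each function would live in its own script that reads `sys.stdin` and prints to standard output, and the framework, not `sorted`, would perform the shuffle between the two stages.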
HADOOP: 30,000 FEET VIEW
• Distribute data initially
• Let processors/nodes work on local data
• Minimize data transfer over the network
• Replicate data multiple times for increased availability
• Write applications at a high level
• Programmers should not have to worry about network programming, temporal dependencies, low-level infrastructure, etc.
• Minimize talking between nodes (shared-nothing)
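The principles above can be illustrated with a toy simulation in Python. It is only a sketch under stated assumptions: the round-robin placement, the replica count of 3, and the names `place_blocks`, `schedule` and `run_job` are hypothetical and do not reproduce Hadoop's actual placement or scheduling logic. Each block is stored on several nodes, each task runs on a node that already holds its block, and only small per-node totals are merged at the end.

```python
from collections import Counter

REPLICATION = 3  # assumed replica count for this sketch


def place_blocks(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def schedule(placement, failed=()):
    """For each block, pick the least-loaded live node that holds a replica."""
    load = Counter()
    assignment = {}
    for block, replicas in placement.items():
        live = [node for node in replicas if node not in failed]
        if not live:
            raise RuntimeError(f"block {block} has no live replica")
        chosen = min(live, key=lambda node: load[node])
        assignment[block] = chosen
        load[chosen] += 1
    return assignment


def run_job(blocks, assignment):
    """Each node sums its local blocks; only the per-node totals are merged."""
    partial = Counter()
    for block, node in assignment.items():
        partial[node] += sum(blocks[block])  # work happens where the data is
    return sum(partial.values()), dict(partial)


nodes = ["n1", "n2", "n3", "n4"]
blocks = [[1, 2], [3, 4], [5, 6], [7, 8]]
placement = place_blocks(len(blocks), nodes)

total, per_node = run_job(blocks, schedule(placement))
total_after_failure, _ = run_job(blocks, schedule(placement, failed={"n1"}))
print(total, total_after_failure)
```

Because every block has replicas on other nodes, the job still produces the same total when `n1` is marked as failed, and no block ever has to be copied to a node that does not already store it.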
Case Studies BIG DATA & ANALYTICS
For a detailed description of the Hadoop ecosystem components, check out our video on
International School of Engineering Plot no 63/A, 1st Floor, Road No 13, Film Nagar, Jubilee Hills, Hyderabad-500033 For Individuals (+91) 9502334561/62 For Corporates (+91) 9618 483 483 Facebook: www.facebook.com/insofe Slide share: www.slideshare.net/INSOFE