Explore the impact of big data on the IT industry, the introduction of Hadoop, HDFS, and MapReduce, and prominent users of Hadoop. Learn how Hadoop addresses the complexities of processing large data sets efficiently.
HADOOP: The Solution for Big Data. J. Sai Krishna and G. Sravya Lahari, 2nd B.Tech (CSE), K.O.R.M College of Engineering, Kadapa
Contents • Data – trends in storing data • Big data – problems in the IT industry • Introduction to Hadoop • HDFS (Hadoop Distributed File System) • MapReduce • Prominent users of Hadoop • Conclusion
Data – trends in storing data • What is data? Any real-world symbol (a character, a number, or a special character), or a group of such symbols, is data. It may be visual, audio, written, etc.
Big data • What is big data? In IT, it is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. • As of 2012, the limits on the size of data sets that could feasibly be processed in a reasonable time were on the order of exabytes of data.
Big data and the problems with it • Every day, about 0.5 petabytes of updates are made to Facebook, including 40 million photos. • Every day, YouTube receives enough video to be watched continuously for one year. • Limitations due to large data sets are encountered in many areas, including meteorology, genomics, complex physics simulations, and biological and environmental research. • Large data sets also affect Internet search, finance, and business informatics. • The challenges include capture, retrieval, storage, search, sharing, analysis, and visualization.
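To put the Facebook figure on this slide in perspective, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 petabyte = 10^15 bytes), a sustained read rate of 100 MB/s per machine, and a perfectly even split of the work. The read rate, the machine count, and the function name `scan_seconds` are illustrative assumptions, not figures from the presentation.

```python
PETABYTE = 10**15  # decimal units: 1 PB = 10^15 bytes
MEGABYTE = 10**6


def scan_seconds(total_bytes, bytes_per_second, machines=1):
    """Ideal time to read total_bytes once, split evenly across machines."""
    return total_bytes / (bytes_per_second * machines)


daily_updates = 0.5 * PETABYTE  # figure quoted on the slide
disk_rate = 100 * MEGABYTE      # assumed read rate per machine, in bytes/second

single = scan_seconds(daily_updates, disk_rate)
cluster = scan_seconds(daily_updates, disk_rate, machines=1000)

print(f"one machine:   {single / 86400:.1f} days")
print(f"1000 machines: {cluster / 60:.1f} minutes")
```

Under these assumptions a single machine would need roughly two months just to read one day of updates, while a thousand machines reading in parallel would need under an hour and a half. Real clusters lose some of that gain to coordination and network overhead, which this sketch ignores.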
THEN WHAT COULD BE THE SOLUTION FOR BIG DATA? HADOOP
What is Hadoop? • It is open-source software written in Java. • The Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. • It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
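One of the "simple programming models" referred to above is MapReduce. The following is a minimal single-process sketch of the idea in plain Python. It does not use Hadoop's API: the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical, and each inner list in `chunks` merely stands in for a block of data that a real cluster would store and process on a separate machine.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    """Run map, shuffle, and reduce over every chunk of input lines."""
    mapped = []
    for chunk in chunks:      # on a cluster, chunks would be mapped in parallel
        for line in chunk:
            mapped.extend(map_words(line))
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Two chunks standing in for data blocks held on two different machines.
chunks = [["big data needs hadoop"], ["hadoop stores big data"]]
print(word_count(chunks))
```

The programmer only writes the map and reduce functions. In this sketch the loop and the grouping run in one process, whereas a framework such as Hadoop takes over splitting the input, running the map step close to where each block is stored, and grouping the results for the reduce step.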
The project includes these modules: • Hadoop Common • Hadoop Distributed File System (HDFS) • Hadoop MapReduce
1. Hadoop Common • It provides access to the filesystems supported by Hadoop. • The Hadoop Common package contains the JAR files and scripts needed to start Hadoop. • The package also provides source code, documentation, and a contribution section which includes projects from the Hadoop community (Avro, Cassandra, Chukwa, HBase, Hive, Mahout, Pig, ZooKeeper).