BIG DATA HADOOP
Background
The exponential growth of data: challenges for Google, Yahoo, Amazon and Microsoft in web search and indexing
• The volume of data being made publicly available increases every year. An organization's future success will be dictated to a large extent by its ability to extract value from other organizations’ data.
• The three Vs: variety, velocity and volume of the data
Data Storage & Analysis
• The storage capacity of hard drives has increased, but access speeds have not kept up.
• A 1 terabyte disk is now the norm and the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data from the disk. Reading zettabytes of data this way would take far too long.
• Alternative solution: read from multiple disks in parallel.
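The read-time figure above can be checked with a few lines of arithmetic. This is a minimal sketch using the slide's round numbers (1 TB, 100 MB/s, decimal units); the 100-disk case is an illustrative assumption, not a figure from the slides, and it assumes the data is split evenly and read fully in parallel.

```python
# Back-of-the-envelope read times using the slide's round numbers.
DISK_SIZE_MB = 1_000_000     # 1 TB expressed in MB (decimal units)
TRANSFER_MB_PER_S = 100      # sustained sequential read speed

def read_time_hours(size_mb, speed_mb_s, disks=1):
    """Hours to read size_mb when it is split evenly across `disks`
    drives that are all read in parallel (an idealized assumption)."""
    return size_mb / (speed_mb_s * disks) / 3600

single = read_time_hours(DISK_SIZE_MB, TRANSFER_MB_PER_S)
# Hypothetical setup: the same terabyte spread over 100 disks.
parallel = read_time_hours(DISK_SIZE_MB, TRANSFER_MB_PER_S, disks=100)

print(f"1 disk:    {single:.2f} hours")
print(f"100 disks: {parallel * 3600:.0f} seconds")
```

One disk needs roughly 2.78 hours, while the idealized 100-disk layout needs about 100 seconds, which is the motivation for reading from many disks at once.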
Background
Data Storage & Analysis
• Problems in reading from and writing to multiple disks:
• Multiple pieces of hardware are prone to failure, so the probability of data loss is high.
• Solution for data loss: replication. RAID works on the same principle of keeping redundant copies.
• Data analysis needs to combine data from many disks, which brings its own challenges.
• What is needed is a reliable shared storage and analysis system.
Hello, Hadoop!
• Nutch project by Doug Cutting
• Google GFS and MapReduce: distributed data storage and processing
• Yahoo development project
• Doug Cutting: Apache Hadoop, an open source framework
• "Hadoop" is a made-up name
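The failure and replication points can be illustrated with a small probability sketch. The failure probability (5%), the cluster size (100 disks) and the replica count (3) are hypothetical values chosen for illustration, and the model assumes disks fail independently, which real hardware only approximates.

```python
def any_failure_probability(p_fail, disks):
    """Probability that at least one of `disks` drives fails,
    assuming independent failures with probability p_fail each."""
    return 1 - (1 - p_fail) ** disks

def block_loss_probability(p_fail, replicas):
    """Probability that every replica of a block is lost,
    assuming each replica sits on a different, independent disk."""
    return p_fail ** replicas

P_FAIL = 0.05  # hypothetical per-disk failure probability

print(f"some disk fails (100 disks): {any_failure_probability(P_FAIL, 100):.3f}")
print(f"block lost with 1 copy:      {block_loss_probability(P_FAIL, 1):.6f}")
print(f"block lost with 3 copies:    {block_loss_probability(P_FAIL, 3):.6f}")
```

Under these assumptions a failure somewhere in the cluster is almost certain, yet losing all copies of a replicated block is far less likely than losing a single copy.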
HADOOP ARCHITECTURE
HDFS
• Runs on top of the existing file system
• Designed for: streaming data access patterns, very large files, commodity hardware, high throughput rather than low latency
• Not a good fit for: lots of small files, low-latency data access, multiple writes
MapReduce
1) Map
2) Reduce
3) Code for the MR job, written in Java, Python, etc.
4) Automatic parallelization
5) Fault tolerance
• Housekeeping is built in
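The map and reduce steps can be sketched in plain Python with the classic word-count example. This is a single-process simulation for illustration only: it does not use the Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical. In a real cluster the framework handles the grouping step, the parallel execution and the recovery from failures.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: collapse all counts for one word into a single total."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle and reduce sequentially over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = run_job(["big data hadoop", "hadoop stores big files"])
print(counts)
```

Because each call to `map_phase` depends only on its own line and each call to `reduce_phase` only on its own key, the calls can be spread over many machines, which is what makes automatic parallelization possible.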
HDFS
• HDFS block size: 64 MB to 128 MB
• Why is it so large?
[Diagram: Client, Name Node, Secondary Name Node and multiple Data Nodes; labels: heartbeating, block replication and balancing]
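The slide leaves the block-size question open. The usual explanation is that large blocks keep disk seek time small relative to transfer time. The sketch below estimates that overhead, assuming a 10 ms seek time (an assumed typical value, not from the slides) and the 100 MB/s transfer rate mentioned earlier.

```python
SEEK_TIME_S = 0.010        # assumed average seek time: 10 ms
TRANSFER_MB_PER_S = 100    # transfer rate quoted earlier in the slides

def seek_overhead(block_mb):
    """Fraction of the total time for reading one block that is spent
    seeking, given one seek followed by one sequential transfer."""
    transfer_s = block_mb / TRANSFER_MB_PER_S
    return SEEK_TIME_S / (SEEK_TIME_S + transfer_s)

# 0.004 MB (4 KB) stands in for a typical local file system block size.
for block_mb in (0.004, 1, 64, 128):
    print(f"{block_mb:>7} MB block: {seek_overhead(block_mb):.1%} of time spent seeking")
```

With a 4 KB block almost all of the time goes to seeking, while at 64 MB to 128 MB the seek cost falls to around one or two percent, so the disk spends nearly all its time transferring data.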