Hadoop Training: https://www.edureka.co/hadoop
This "What is HDFS" presentation introduces the Hadoop Distributed File System (HDFS) and its features, with a practical demonstration. It covers:
1. What is DFS and why do we need it?
2. What is HDFS?
3. HDFS architecture
4. HDFS replication factor
5. HDFS commands demonstration on a production Hadoop cluster
Complete Hadoop playlist: https://goo.gl/hzUO0m
Hadoop: HDFS & MapReduce Copyright © 2017, edureka and/or its affiliates. All rights reserved.
Hadoop: HDFS and MapReduce
Hadoop is a framework that lets us store and process large data sets in a parallel and distributed fashion.
What is DFS?
[Figure: Local File System vs. Distributed File System, with storage units labeled 1 TB and 4 TB]
Why DFS?
What is HDFS?
HDFS is a distributed file system that allows you to store large data sets across the cluster.
▪ Who distributes the data across the cluster?
▪ Who is responsible for managing the data?
▪ How is the data accessed?
[Figure: four storage units labeled 1 TB each]
HDFS Architecture
NameNode (master node):
▪ Master daemon
▪ Maintains and manages the DataNodes
▪ Records metadata
▪ Receives heartbeats and block reports from all the DataNodes
DataNodes (slave nodes):
▪ Slave daemons
▪ Store the actual data
▪ Serve read and write requests from the clients
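The division of labor above can be illustrated with a toy model. This is only a sketch, not Hadoop code: the class `ToyNameNode`, its method names, and the 30-second timeout are all hypothetical choices made for illustration. It shows the NameNode keeping metadata only (which DataNode reported which block, and when each DataNode last sent a heartbeat) while the data itself would live on the DataNodes.

```python
class ToyNameNode:
    """Hypothetical, simplified stand-in for a NameNode (not the Hadoop API)."""

    def __init__(self, heartbeat_timeout=30.0):
        # Timeout value is an arbitrary illustration, not a Hadoop default.
        self.heartbeat_timeout = heartbeat_timeout
        self.last_heartbeat = {}   # DataNode id -> time of last heartbeat
        self.block_locations = {}  # block id -> set of DataNode ids

    def receive_heartbeat(self, datanode_id, now):
        # A heartbeat tells the master that the DataNode is still alive.
        self.last_heartbeat[datanode_id] = now

    def receive_block_report(self, datanode_id, block_ids):
        # A block report replaces whatever this DataNode reported before.
        for holders in self.block_locations.values():
            holders.discard(datanode_id)
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(datanode_id)

    def live_datanodes(self, now):
        # A DataNode counts as live if its last heartbeat is recent enough.
        return sorted(
            node for node, seen in self.last_heartbeat.items()
            if now - seen <= self.heartbeat_timeout
        )

    def locate(self, block_id, now):
        # Clients ask the master where a block lives, then read from DataNodes.
        live = set(self.live_datanodes(now))
        return sorted(self.block_locations.get(block_id, set()) & live)


namenode = ToyNameNode(heartbeat_timeout=30.0)
namenode.receive_heartbeat("dn1", now=0.0)
namenode.receive_heartbeat("dn2", now=0.0)
namenode.receive_block_report("dn1", ["blk_1", "blk_2"])
namenode.receive_block_report("dn2", ["blk_1"])
namenode.receive_heartbeat("dn1", now=40.0)  # dn2 stays silent

print(namenode.live_datanodes(now=50.0))
print(namenode.locate("blk_1", now=50.0))
```

Time is passed in explicitly as `now` so the example is deterministic; a real system would use a clock.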
How Are Files Stored in HDFS?
HDFS Data Blocks
➢ Each file is stored on HDFS as blocks
➢ The default size of each block is 128 MB in Apache Hadoop 2.x (64 MB in Apache Hadoop 1.x)
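As a rough worked example of the block arithmetic, the sketch below splits a file size into block sizes using the 128 MB default mentioned above. The function name `split_into_blocks` and the 500 MB file are assumptions chosen for illustration; the last block holds only the leftover bytes and is not padded to the full block size.

```python
MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 128 * MB  # Apache Hadoop 2.x default, per the slide


def split_into_blocks(file_size, block_size=DEFAULT_BLOCK_SIZE):
    """Return the size in bytes of each block a file would occupy."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The final block only holds the bytes that are left over.
        sizes.append(remainder)
    return sizes


blocks = split_into_blocks(500 * MB)
print(len(blocks), [size // MB for size in blocks])  # 4 [128, 128, 128, 116]
```

Passing `block_size=64 * MB` reproduces the Hadoop 1.x default for comparison.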
What if the DataNode Containing the Data Crashes?
DataNode Failure
Scenario: One of the DataNodes containing the data blocks has crashed
Solution: Replication Factor
Replication Factor
Solution: Each data block is replicated (three times by default), and the replicas are distributed across different DataNodes
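To see why replication protects against a DataNode crash, here is a minimal sketch. It assumes a simple round-robin placement, which is not HDFS's actual placement policy (the real one also takes racks into account); the function names and node names are hypothetical. The only property it demonstrates is the one stated above: every block's replicas sit on different DataNodes, so losing one node still leaves copies elsewhere.

```python
def place_replicas(block_ids, datanodes, replication_factor=3):
    """Assign each block to `replication_factor` distinct DataNodes.

    Round-robin placement is an illustrative assumption, not the HDFS policy.
    """
    if replication_factor > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    placement = {}
    for index, block_id in enumerate(block_ids):
        placement[block_id] = [
            datanodes[(index + offset) % len(datanodes)]
            for offset in range(replication_factor)
        ]
    return placement


def surviving_replicas(placement, failed_node):
    """Return, per block, the replicas that remain after one node crashes."""
    return {
        block_id: [node for node in nodes if node != failed_node]
        for block_id, nodes in placement.items()
    }


datanodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(["b1", "b2", "b3"], datanodes)
after_crash = surviving_replicas(placement, failed_node="dn2")

for block_id, nodes in after_crash.items():
    print(block_id, nodes)
```

With a replication factor of 3, every block in this example still has at least two replicas after `dn2` fails.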
Demo