HDFS - Hadoop Overview 2-

HDFS-Hadoop Overview 2- 2009.01.20 유현정

Data Replication • All blocks in a file except the last block are the same size. • The block size and replication factor are configurable per file. • The NameNode periodically receives a Heartbeat and a Blockreport from each of the DataNodes in the cluster. • DataNodes send Heartbeats to the NameNode, and the NameNode uses them to detect DataNode failure. • Each DataNode periodically sends a Blockreport, a list of all blocks it holds, to the NameNode.
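The Heartbeat-based failure detection above can be sketched roughly as follows. This is a minimal illustration, not HDFS code: the class name `HeartbeatMonitor`, its methods, and the timeout value are all hypothetical.

```python
import time

# Hypothetical timeout; the real value is a cluster configuration detail.
HEARTBEAT_TIMEOUT_S = 30.0


class HeartbeatMonitor:
    """Tracks the last Heartbeat time per DataNode (illustrative only)."""

    def __init__(self, timeout_s=HEARTBEAT_TIMEOUT_S):
        self.timeout_s = timeout_s
        self.last_seen = {}  # DataNode id -> timestamp of its last Heartbeat

    def record_heartbeat(self, node_id, now=None):
        # Called whenever a Heartbeat arrives from a DataNode.
        self.last_seen[node_id] = time.time() if now is None else now

    def dead_nodes(self, now=None):
        # A DataNode is considered failed once its Heartbeat is overdue.
        now = time.time() if now is None else now
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
```

Passing `now` explicitly keeps the sketch deterministic; a real monitor would simply use the clock.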

Replica Placement • For the common case, replication factor == 3 • One replica on one node in the local rack • Another on a different node in the local rack • The last on a different node in a different rack • If replication factor > 3, additional replicas are randomly placed

Replica Placement • This policy does not impact data reliability and availability guarantees. • However, it does reduce the aggregate network bandwidth used when reading data (because the data is stored in two racks rather than three). • The replicas of a file are not evenly distributed across racks. • This policy is a work in progress.
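The placement policy for a replication factor of 3 can be sketched as below. This is only an illustration under stated assumptions: the function `place_replicas`, the `topology` dictionary, and the choice of the writer's own node for the first replica are hypothetical and are not taken from the HDFS source.

```python
import random


def place_replicas(topology, writer_node, replication=3, rng=random):
    """Pick target nodes for one block.

    topology: dict mapping rack name -> list of node names (assumed layout).
    writer_node: node used for the first replica (an assumption of this sketch).
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    local_rack = rack_of[writer_node]
    chosen = [writer_node]  # replica 1: one node in the local rack

    def pick(candidates):
        candidates = [n for n in candidates if n not in chosen]
        if candidates:
            chosen.append(rng.choice(candidates))

    if replication >= 2:
        # Replica 2: a different node in the local rack.
        pick(topology[local_rack])
    if replication >= 3:
        # Replica 3: a node in a different rack.
        pick([n for rack, nodes in topology.items()
              if rack != local_rack for n in nodes])
    # Any additional replicas are placed randomly.
    while len(chosen) < replication:
        remaining = [n for n in rack_of if n not in chosen]
        if not remaining:
            break
        chosen.append(rng.choice(remaining))
    return chosen[:replication]
```

For example, with `{"r1": ["a", "b", "c"], "r2": ["d", "e"]}` and writer `"a"`, the result holds `"a"`, one of `"b"`/`"c"`, and one of `"d"`/`"e"`, so the block lives in two racks.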

Replica Selection • To minimize global bandwidth consumption and read latency, HDFS tries to satisfy a read request from a replica that is closest to the reader.
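One way to picture "closest to the reader" is a simple distance ranking. The metric below (0 for the same node, 1 for the same rack, 2 otherwise) is an assumption made for illustration, and `distance` and `closest_replica` are hypothetical names.

```python
def distance(reader, replica):
    """reader and replica are (rack, node) tuples; smaller means closer.

    The 0/1/2 scale is an assumed metric, not the one HDFS uses internally.
    """
    if reader == replica:
        return 0  # replica is on the reader's own node
    if reader[0] == replica[0]:
        return 1  # replica is in the reader's rack
    return 2      # replica is in another rack


def closest_replica(reader, replicas):
    # Serve the read from the replica with the smallest distance.
    return min(replicas, key=lambda replica: distance(reader, replica))
```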

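SafeMode • On startup, the NameNode enters SafeMode • Replication of data blocks does not occur while the NameNode is in SafeMode • The NameNode leaves SafeMode after checking the percentage of safely replicated data blocks • It then checks the list of data blocks that have fewer replicas than the specified replication factor • The NameNode replicates these blocks to other DataNodes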
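The SafeMode exit check and the follow-up scan for under-replicated blocks might look roughly like this. The function names, the `min_replicas` parameter, and the default threshold are hypothetical values chosen for the sketch.

```python
def can_leave_safemode(replica_counts, min_replicas=1, threshold=0.999):
    """replica_counts: dict block id -> number of replicas reported so far.

    A block counts as safely replicated once it has at least min_replicas
    replicas. Both min_replicas and threshold are assumed values.
    """
    if not replica_counts:
        return True
    safe = sum(1 for count in replica_counts.values() if count >= min_replicas)
    return safe / len(replica_counts) >= threshold


def under_replicated(replica_counts, wanted):
    """Blocks with fewer replicas than their file's replication factor.

    wanted: dict block id -> specified replication factor.
    """
    return sorted(
        block for block, count in replica_counts.items()
        if count < wanted[block]
    )
```

After leaving SafeMode, the blocks returned by `under_replicated` are the ones the NameNode would schedule for replication to other DataNodes.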

NameNode Meta-data • The NameNode uses a transaction log called the EditLog to persistently record every change that occurs to file system metadata. • E.g., creating a file, deleting a file, or changing the replication factor of a file • The entire file system namespace, including the mapping of blocks to files and file system properties, is stored in a file called the FsImage. • The EditLog and the FsImage are stored as files in the NameNode’s local file system.

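Checkpoint • When the NameNode starts up, • The NameNode reads the FsImage and EditLog from disk, applies all transactions from the EditLog to the FsImage, and saves the new version of the FsImage to disk • The EditLog's transactions are then discarded because they have been persisted in the FsImage • Currently, a checkpoint occurs only at NameNode startup • Support for periodic checkpointing is being implemented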
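The checkpoint step can be illustrated with an in-memory stand-in for the FsImage and EditLog. The record format (tuples such as `("create", path, replication)`) and the function names are hypothetical; the real on-disk formats are different.

```python
def apply_edit(fsimage, edit):
    """Apply one logged transaction to the namespace image (illustrative)."""
    op, path = edit[0], edit[1]
    if op == "create":
        fsimage[path] = {"replication": edit[2]}
    elif op == "delete":
        fsimage.pop(path, None)
    elif op == "set_replication":
        fsimage[path]["replication"] = edit[2]
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, editlog):
    """Replay the EditLog onto a copy of the FsImage.

    Returns the new image and an empty log, since the replayed
    transactions are now captured in the image.
    """
    new_image = {path: dict(meta) for path, meta in fsimage.items()}
    for edit in editlog:
        apply_edit(new_image, edit)
    return new_image, []
```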

The communication protocol • Layered on top of the TCP/IP protocol • Client Protocol : client ↔ NameNode • DataNode Protocol : DataNodes ↔ NameNode • A Remote Procedure Call (RPC) abstraction wraps both the Client Protocol and the DataNode Protocol. • The NameNode never initiates any RPCs • The NameNode only responds to requests issued by DataNodes or clients

Robustness • The three common types of failure • NameNode failures • DataNode failures • Network partitions

Data Disk Failure • A network partition can cause a subset of DataNodes to lose connectivity with the NameNode. • The NameNode detects this using Heartbeat messages (a missing Heartbeat signals the loss) • Re-replication may become necessary for several reasons: • A DataNode may become unavailable (it is treated as dead) • A replica may become corrupted • A hard disk on a DataNode may fail • The replication factor of a file may be increased

NameNode Failure • A single point of failure • Currently, automatic restart and failover of the NameNode software to another machine are not supported

Data Correctness/Integrity • Use Checksums to validate data • Use CRC32 • DataNode stores the checksum.
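Checksum validation with CRC32 can be sketched using Python's standard `zlib.crc32`. The helper names `store_block` and `verify_block` are hypothetical, and the sketch ignores how and where HDFS actually persists checksums.

```python
import zlib


def store_block(data: bytes):
    """Return the block together with the CRC32 checksum saved alongside it."""
    return data, zlib.crc32(data)


def verify_block(data: bytes, stored_checksum: int) -> bool:
    """Recompute the CRC32 on read and compare it with the stored value."""
    return zlib.crc32(data) == stored_checksum
```

If `verify_block` returns `False`, the replica is treated as corrupted and the reader would need to fetch the block from another replica.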

Snapshots • A feature that stores a copy of the data as of a particular point in time • Not currently supported

Replication Pipelining • A DataNode receives data from the previous DataNode in the pipeline and, at the same time, forwards it to the next DataNode in the pipeline • The data is pipelined from one DataNode to the next.
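The pipelining idea can be approximated with chained generators, where each stage stores a packet and passes it on before the next packet arrives. This is a single-threaded sketch, so it only imitates the ordering, not the true concurrency of a real pipeline; `datanode` and `write_through_pipeline` are hypothetical names.

```python
def datanode(name, incoming, storage, events):
    """Store each packet locally, then hand it to the next stage."""
    for packet in incoming:
        storage.setdefault(name, []).append(packet)
        events.append((name, packet))
        yield packet


def write_through_pipeline(packets, node_names):
    """Push packets through DataNodes chained in the given order."""
    storage = {}  # DataNode name -> packets stored on that node
    events = []   # order in which (node, packet) writes happened
    stream = iter(packets)
    for name in node_names:
        stream = datanode(name, stream, storage, events)
    for _ in stream:  # draining the last stage drives the whole chain
        pass
    return storage, events
```

The `events` list shows that the first packet reaches every DataNode before the second packet enters the pipeline, rather than each node waiting for the whole block.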

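File Deletes and Undeletes • When a file is deleted by a user or an application, it is not immediately removed from HDFS • It is first renamed to a file in the /trash directory • While it remains in /trash, it can be restored • After a set time, the NameNode deletes the file from the namespace • The file and its associated blocks are then freed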

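File Deletes and Undeletes • The /trash directory holds the most recent copy of a deleted file. • As long as the file remains in /trash, the deletion can still be undone • Current default policy: • Files that have been in /trash for more than 6 hours are deleted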
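The delete, undelete, and expiry behavior can be sketched with a toy namespace. The `Namespace` class and its methods are hypothetical; only the 6-hour retention comes from the default policy stated above.

```python
TRASH_RETENTION_S = 6 * 60 * 60  # 6 hours, the default policy quoted above


class Namespace:
    """Toy namespace with a trash area (illustrative only)."""

    def __init__(self):
        self.files = {}  # path -> file data
        self.trash = {}  # original path -> (file data, deletion time)

    def delete(self, path, now):
        # Deleting moves the file into trash instead of removing it.
        self.trash[path] = (self.files.pop(path), now)

    def undelete(self, path):
        # Restoring works only while the file is still in trash.
        data, _ = self.trash.pop(path)
        self.files[path] = data

    def expire_trash(self, now):
        # Permanently remove entries older than the retention period.
        expired = [
            path for path, (_, deleted_at) in self.trash.items()
            if now - deleted_at > TRASH_RETENTION_S
        ]
        for path in expired:
            del self.trash[path]
        return sorted(expired)
```

Once `expire_trash` drops an entry, the blocks behind that file can be freed and the deletion can no longer be undone.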
