
The Google File System

Presentation Transcript


  1. The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung (Google*) Presented by 정학수, 최주영

  2. Outline • Introduction • Design Overview • System Interactions • Master Operation • Fault Tolerance and Diagnosis • Measurements • Conclusions

  3. Introduction • GFS was designed to meet the demands of Google’s data processing needs. • Design emphases • Component failures are the norm rather than the exception • Files are huge by traditional standards • Most files are mutated by appending new data rather than overwriting existing data

  4. DESIGN OVERVIEW

  5. Assumptions • The system is built from inexpensive components that often fail • It stores a modest number of large files, typically 100 MB or larger • Workloads consist mostly of large streaming reads and small random reads • Writes are mostly large, sequential appends to files • Atomic appends for many concurrent clients with minimal synchronization overhead are essential • High sustained bandwidth is more important than low latency

  6. Interface • Files are organized hierarchically in directories and identified by pathnames • The usual operations (create, delete, open, close, read, write) are supported, plus snapshot and record append

  7. Architecture • A GFS cluster consists of a single master and multiple chunkservers, accessed by many clients • Designed for system-to-system interaction, not for user-to-system interaction

  8. Single Master • A single master simplifies the design and allows chunk placement and replication decisions to use global knowledge • Clients ask the master which chunkservers to contact, then read and write file data directly from the chunkservers

  9. Chunk Size • Large chunk size: 64 MB • Advantages • Reduces client-master interaction • Reduces network overhead (a client can keep a persistent TCP connection to the chunkserver) • Reduces the size of the metadata stored on the master • Disadvantage • A chunk can become a hot spot when many clients access the same small file
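
To see why a large chunk size shrinks master metadata, a rough back-of-the-envelope calculation helps (the 1 TB file below is an assumed example, not a figure from the slides):

```python
# Rough arithmetic sketch: master metadata cost of 64 MB chunks.
# The 1 TB file size is a hypothetical example for illustration.
CHUNK_SIZE = 64 * 2**20            # 64 MB per chunk
METADATA_PER_CHUNK = 64            # less than 64 bytes of metadata per chunk

file_size = 2**40                  # a hypothetical 1 TB file
num_chunks = file_size // CHUNK_SIZE
metadata_bytes = num_chunks * METADATA_PER_CHUNK

print(num_chunks)                  # 16384 chunks
print(metadata_bytes / 2**20)      # ~1 MB of master metadata for 1 TB of file data
```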

  10. Metadata • All metadata is kept in the master’s memory • Less than 64 bytes of metadata per 64 MB chunk • Three types • The file and chunk namespaces • The mapping from files to chunks • The locations of each chunk’s replicas

  11. Metadata (Cont’d) • In-memory data structures • Master operations are fast • Periodic scans of the entire state are easy and efficient • Operation log • Contains a historical record of critical metadata changes • Replicated on multiple remote machines • The master responds to a client only after the corresponding log record has been flushed • Recovery works by replaying the operation log
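
A minimal sketch of these metadata types together with a write-ahead operation log follows; the class and method names are illustrative assumptions, not GFS internals, and remote replication of the log is only noted in a comment.

```python
# Sketch of the master's in-memory metadata plus a write-ahead operation log.
# Names (Master, create_file, oplog.jsonl) are illustrative, not GFS code.
import json

class Master:
    def __init__(self, log_path="oplog.jsonl"):
        self.namespace = {}         # full pathname -> file attributes
        self.file_to_chunks = {}    # pathname -> ordered list of chunk handles
        self.chunk_locations = {}   # chunk handle -> set of chunkserver ids
                                    # (not logged; rebuilt from chunkserver reports)
        self.log_path = log_path

    def _log_record(self, record):
        # Critical metadata changes are logged (and, in GFS, also replicated to
        # remote machines) before the master replies to the client.
        with open(self.log_path, "a") as log:
            log.write(json.dumps(record) + "\n")

    def create_file(self, path):
        self._log_record({"op": "create", "path": path})
        self.namespace[path] = {"length": 0}
        self.file_to_chunks[path] = []

    def recover(self):
        # Recovery: rebuild the namespace and mappings by replaying the log.
        with open(self.log_path) as log:
            for line in log:
                record = json.loads(line)
                if record["op"] == "create":
                    self.namespace[record["path"]] = {"length": 0}
                    self.file_to_chunks[record["path"]] = []
```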

  12. Consistency Model • Consistent • All clients always see the same data, regardless of which replica they read from • Defined • Consistent, and clients see what the mutation wrote in its entirety • Inconsistent • Different clients may see different data at different times

  13. SYSTEM INTERACTION

  14. Leases and Mutation Order • Leases maintain a consistent mutation order across replicas while minimizing the master’s management overhead • The master grants a chunk lease to one of the replicas, which becomes the primary • The primary picks a serial order for all mutations to the chunk • All replicas follow this order when applying mutations
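
The sketch below illustrates the lease mechanism under simplified assumptions (no lease timeouts or failures): the master hands the lease to one replica, and that primary assigns the serial numbers every replica applies in the same order. All names are illustrative.

```python
# Sketch of lease-based mutation ordering; all names are illustrative.
import itertools

class LeaseMaster:
    def __init__(self):
        self.primary = {}            # chunk handle -> chunkserver holding the lease

    def grant_lease(self, chunk, replicas):
        # The master grants the lease to one replica, which becomes the primary.
        if chunk not in self.primary:
            self.primary[chunk] = replicas[0]
        return self.primary[chunk]

class Primary:
    def __init__(self):
        self._serial = itertools.count(1)

    def order(self, mutation):
        # The primary assigns a serial number; every replica applies mutations
        # in increasing serial order, so all replicas stay consistent.
        return next(self._serial), mutation

# Usage: the same mutation order is applied at every replica.
master = LeaseMaster()
primary_id = master.grant_lease("chunk-1", ["cs-a", "cs-b", "cs-c"])
primary = Primary()
ordered = [primary.order(m) for m in ("write A", "write B")]
print(primary_id, ordered)    # cs-a [(1, 'write A'), (2, 'write B')]
```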

  15. Leases and Mutation Order (Cont’d)

  16. Data Flow • Control flow and data flow are decoupled to use network bandwidth fully • To avoid network bottlenecks and high-latency links, each machine forwards the data to the closest machine that has not yet received it • Latency is minimized by pipelining the data transfer over TCP connections
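
A small simulation of the forwarding chain is sketched below, under the assumption that data is pushed linearly from the client to the closest chunkserver and onward in fixed-size pieces; it models only the forwarding order, not real network parallelism, and the server names are invented.

```python
# Sketch of pipelined data flow along a chain of chunkservers (order only).
# Server names are illustrative; real GFS overlaps transfers over TCP.
def push_data(data, chain, piece_size=64 * 1024):
    """Push `data` down `chain`, one fixed-size piece at a time."""
    received = {server: bytearray() for server in chain}
    for start in range(0, len(data), piece_size):
        piece = data[start:start + piece_size]
        # Each piece travels the whole chain before the next piece is sent,
        # mirroring how pipelining lets a server forward piece k while it is
        # still receiving piece k+1.
        for server in chain:
            received[server] += piece
    return received

# Usage: 1 MB pushed along a chain ordered by network "closeness".
chain = ["chunkserver-a", "chunkserver-b", "chunkserver-c"]
received = push_data(b"x" * (1 << 20), chain)
assert all(len(buf) == 1 << 20 for buf in received.values())
```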

  17. Atomic Record Appends • Record append: an atomic append operation • The client specifies only the data • GFS appends the data at an offset of GFS’s choosing and returns that offset to the client • Many clients can append to the same file concurrently • Such files often serve as multiple-producer/single-consumer queues or contain merged results from many clients
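
A toy version of the record-append contract is sketched below, assuming a single in-memory chunk: the client supplies only the data, and the system chooses and returns the offset.

```python
# Sketch of the record append contract on a single in-memory chunk (toy only).
class AppendOnlyChunk:
    def __init__(self):
        self.data = bytearray()

    def record_append(self, record: bytes) -> int:
        offset = len(self.data)   # the file system, not the client, picks the offset
        self.data += record
        return offset

# Usage: many producers append to the same queue-like file and get back
# the offsets where their records landed.
chunk = AppendOnlyChunk()
offsets = [chunk.record_append(b"result-%d" % i) for i in range(3)]
assert chunk.data[offsets[1]:offsets[2]] == b"result-1"
```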

  18. Snapshot • Makes a copy of a file or a directory tree almost instantaneously • Uses standard copy-on-write techniques
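
A hedged copy-on-write sketch follows, assuming per-chunk reference counts: the snapshot duplicates only the list of chunk handles, and a chunk is copied lazily on the first write after the snapshot. The ChunkStore and File classes are illustrative, not GFS code.

```python
# Copy-on-write snapshot sketch at chunk granularity; illustrative only.
class ChunkStore:
    def __init__(self):
        self.data, self.refcount, self._next = {}, {}, 0

    def new_chunk(self, payload=b""):
        handle, self._next = self._next, self._next + 1
        self.data[handle], self.refcount[handle] = payload, 1
        return handle

class File:
    def __init__(self, store, chunks):
        self.store, self.chunks = store, list(chunks)

    def snapshot(self):
        # Snapshot copies only the chunk list and bumps reference counts.
        for handle in self.chunks:
            self.store.refcount[handle] += 1
        return File(self.store, self.chunks)

    def write(self, index, payload):
        handle = self.chunks[index]
        if self.store.refcount[handle] > 1:
            # First write after a snapshot: copy the shared chunk, mutate the copy.
            self.store.refcount[handle] -= 1
            handle = self.store.new_chunk(self.store.data[handle])
            self.chunks[index] = handle
        self.store.data[handle] = payload

# Usage: the snapshot keeps seeing the old contents after the original is written.
store = ChunkStore()
original = File(store, [store.new_chunk(b"v1")])
snap = original.snapshot()
original.write(0, b"v2")
assert store.data[snap.chunks[0]] == b"v1"
assert store.data[original.chunks[0]] == b"v2"
```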

  19. MASTER OPERATION

  20. Namespace Management and Locking • Namespace • A lookup table mapping full pathnames to metadata • Locking • Locks over regions of the namespace allow many operations to be active at once while ensuring proper serialization • Concurrent mutations in the same directory are allowed • Deadlock is prevented by acquiring locks in a consistent total order
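
The lock set for one operation can be sketched as below, assuming the scheme of read locks on every ancestor path plus a read or write lock on the target itself, acquired in one consistent (sorted) order so deadlock cannot arise. The lock_plan helper is hypothetical and only computes the lock set; real lock objects are omitted.

```python
# Sketch of namespace locking: read-lock all ancestor directories, read- or
# write-lock the target path, acquire in a consistent sorted order (no deadlock).
def lock_plan(path, write=False):
    parts = path.strip("/").split("/")
    ancestors = ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    plan = [(p, "read") for p in ancestors]
    plan.append((path, "write" if write else "read"))
    return sorted(plan)          # one total order over pathnames for everyone

# Creating /home/user/foo only read-locks /home and /home/user, so two clients
# can create different files in the same directory concurrently.
print(lock_plan("/home/user/foo", write=True))
# [('/home', 'read'), ('/home/user', 'read'), ('/home/user/foo', 'write')]
```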

  21. Replica Placement • Goals: maximize data reliability and availability, and maximize network bandwidth utilization • Spread replicas across machines • Spread chunk replicas across racks

  22. Creation, Re-replication, Rebalancing • Creation • Chunks are created on demand by writers • Re-replication • Triggered when the number of available replicas falls below a user-specified goal • Rebalancing • Replicas are moved periodically for better disk space and load balancing
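
A tiny sketch of how re-replication might be prioritized is shown below, assuming chunks furthest below their replication goal are cloned first (consistent with the boosted priority for single-replica chunks mentioned on slide 35); the helper name and goal value are hypothetical.

```python
# Sketch of re-replication prioritization; illustrative scheduling only.
def replication_queue(live_replicas, goal=3):
    """live_replicas: chunk handle -> number of currently available replicas."""
    deficit = {c: goal - n for c, n in live_replicas.items() if n < goal}
    # Chunks missing the most replicas (e.g. down to a single copy) come first.
    return sorted(deficit, key=deficit.get, reverse=True)

print(replication_queue({"c1": 2, "c2": 1, "c3": 3}))   # ['c2', 'c1']
```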

  23. Garbage Collection • Lazy reclamation • The deletion is logged immediately • The file is renamed to a hidden name that includes the deletion timestamp • Hidden files are removed three days later • Undelete works by renaming the file back to its normal name • Regular scan • Heartbeat messages are exchanged with each chunkserver • Orphaned chunks are identified and their metadata is erased
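
Lazy reclamation can be sketched as follows, assuming hidden names that embed the deletion timestamp and a periodic scan that drops anything older than three days; the Namespace class and name format are illustrative, not GFS internals.

```python
# Sketch of lazy file deletion and reclamation; names are illustrative.
import time

THREE_DAYS = 3 * 24 * 3600

class Namespace:
    def __init__(self):
        self.files = {}                       # pathname -> metadata

    def delete(self, path, now=None):
        now = int(time.time() if now is None else now)
        # The deletion is logged, then the file is renamed to a hidden name
        # that carries the deletion timestamp.
        hidden = f"{path}.deleted.{now}"
        self.files[hidden] = self.files.pop(path)
        return hidden

    def undelete(self, hidden):
        # Undelete simply renames the hidden file back to its normal name.
        self.files[hidden.split(".deleted.")[0]] = self.files.pop(hidden)

    def gc_scan(self, now=None):
        now = int(time.time() if now is None else now)
        for name in list(self.files):
            if ".deleted." in name and now - int(name.rsplit(".", 1)[1]) > THREE_DAYS:
                # Metadata is erased here; orphaned chunk replicas are reclaimed
                # later through heartbeat exchanges with the chunkservers.
                del self.files[name]
```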

  24. Stale Replica Detection • The master maintains a version number for each chunk • Replicas that report an older version are detected as stale • Stale replicas are removed during regular garbage collection
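
A short sketch of the version check, assuming the master bumps a chunk's version when it grants a new lease and treats any replica reporting an older version as stale; the chunkserver names and version numbers are made up.

```python
# Sketch of stale replica detection via chunk version numbers (illustrative).
master_version = 7                               # bumped on each new lease grant
reported = {"cs-a": 7, "cs-b": 7, "cs-c": 6}     # cs-c missed a mutation while down

stale = [cs for cs, version in reported.items() if version < master_version]
print(stale)    # ['cs-c']  -- removed during the next regular garbage collection
```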

  25. FAULT TOLERANCE AND DIAGNOSIS

  26. High Availability • Fast recovery • The master and chunkservers restore their state and start in seconds • Chunk replication • Different replication levels for different parts of the file namespace • The master clones existing replicas as chunkservers go offline or as corrupted replicas are detected through checksum verification

  27. High Availability (Cont’d) • Master replication • The operation log and checkpoints are replicated on multiple machines • If the master machine or its disk fails, monitoring infrastructure outside GFS starts a new master process • Shadow masters • Provide read-only access even when the primary master is down

  28. Data Integrity • Checksums are used to detect corruption • One checksum for every 64 KB block in each chunk • Kept in memory and stored persistently with logging • Read • The chunkserver verifies the checksums of the blocks it touches before returning any data • Write (append) • Incrementally update the checksum for the last partial block • Compute new checksums for any new blocks

  29. Data Integrity (Cont’d) • Write (overwrite) • Read and verify the first and last blocks of the range being overwritten, then perform the write • Compute and record the new checksums • During idle periods • Chunkservers scan and verify inactive chunks
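
The sketch below illustrates per-block checksumming with CRC32 over 64 KB blocks, under the simplifying assumption of a single in-memory chunk: reads verify every block they touch, and appends recompute only the checksum of the last (possibly partial) block plus any new blocks. CRC32 and the class name are assumed choices, not the paper's.

```python
# Sketch of per-64KB-block checksums; CRC32 and the class are illustrative choices.
import zlib

BLOCK = 64 * 1024

class ChecksummedChunk:
    def __init__(self):
        self.data = bytearray()
        self.sums = []                               # one checksum per 64 KB block

    def _recompute(self, i):
        checksum = zlib.crc32(self.data[i * BLOCK:(i + 1) * BLOCK])
        if i < len(self.sums):
            self.sums[i] = checksum
        else:
            self.sums.append(checksum)

    def append(self, record):
        self.data += record
        # Only the last (possibly partial) block and any newly created blocks
        # need their checksums computed.
        first = max(len(self.sums) - 1, 0)
        for i in range(first, (len(self.data) + BLOCK - 1) // BLOCK):
            self._recompute(i)

    def read(self, offset, length):
        # Verify every block overlapping the requested range before returning data.
        for i in range(offset // BLOCK, (offset + length + BLOCK - 1) // BLOCK):
            if zlib.crc32(self.data[i * BLOCK:(i + 1) * BLOCK]) != self.sums[i]:
                raise IOError(f"checksum mismatch in block {i}")
        return bytes(self.data[offset:offset + length])

# Usage sketch
chunk = ChecksummedChunk()
chunk.append(b"a" * 100000)                       # spans two 64 KB blocks
assert chunk.read(0, 100000) == b"a" * 100000     # both blocks verified first
```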

  30. MEASUREMENTS

  31. Micro-benchmarks • GFS cluster • 1 master • 2 master replicas • 16 chunkservers • 16 clients • The server machines are connected to one switch and the client machines to another • The two switches are connected by a 1 Gbps link

  32. Micro-benchmarks Figure 3: Aggregate Throughputs. Top curves show theoretical limits imposed by our network topology. Bottom curves show measured throughputs. They have error bars that show 95% confidence intervals, which are illegible in some cases because of low variance in measurements.

  33. Real World Clusters Table 2: Characteristics of two GFS clusters

  34. Real World Clusters Table 3: Performance Metrics for Two GFS Clusters

  35. Real World Clusters • Experiment in cluster B • Killed a single chunkserver containing 15,000 chunks (600 GB of data) • All chunks were restored in 23.2 minutes, at an effective replication rate of 440 MB/s • Killed two chunkservers, each with roughly 16,000 chunks (660 GB of data) • 266 chunks were left with only a single replica • These were re-replicated at higher priority and restored to at least 2x replication within 2 minutes
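
The reported replication rate is consistent with simple arithmetic (a rough sanity check, treating 600 GB as 600 × 1024 MB):

```python
# Sanity check of the reported restore rate: 600 GB restored in 23.2 minutes.
restored_mb = 600 * 1024                # 600 GB expressed in MB
elapsed_s = 23.2 * 60
print(round(restored_mb / elapsed_s))   # ~441 MB/s, matching the ~440 MB/s figure
```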

  36. Conclusions • GFS demonstrates the qualities essential for supporting large-scale data processing workloads • Treats component failure as the norm • Optimizes for huge files • Extends and relaxes the standard file system interface • Fault tolerance is provided by • Constant monitoring • Replication of crucial data • Fast and automatic recovery • Checksums to detect data corruption • Delivers high aggregate throughput to many concurrent readers and writers
