The Google File System
Presentation by: Eric Frohnhoefer
CS5204 – Operating Systems
Assumptions
• Built from inexpensive commodity components
  • Cheap components frequently fail
• Modest number of large files
  • Few million files, each 100 MB or larger
• Support for large streaming reads and small random reads
• Files written once, then appended
• High sustained bandwidth favored over low latency
Design Decisions
• Single master, multiple chunkservers
• File structure
  • Fixed-size 64 MB chunks
  • Each chunk divided into 64 KB blocks
  • 32-bit checksum computed for each block
  • Each chunk replicated across 3+ chunkservers
• Familiar interface
  • Create, delete, open, close, read, and write
  • Snapshot and record append
• No caching of file data
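The fixed chunk and block sizes make it cheap to translate a byte offset in a file into a chunk and a checksum block. The following is a minimal sketch of that arithmetic, not GFS code; the function name `locate` is hypothetical.

```python
from typing import Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as on the slide
BLOCK_SIZE = 64 * 1024         # 64 KB checksum blocks, as on the slide


def locate(offset: int) -> Tuple[int, int, int]:
    """Map a file byte offset to (chunk index, offset in chunk, block index in chunk).

    `locate` is a hypothetical helper used only for illustration.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    chunk_index, chunk_offset = divmod(offset, CHUNK_SIZE)
    block_index = chunk_offset // BLOCK_SIZE
    return chunk_index, chunk_offset, block_index
```

With these sizes each chunk holds 1024 checksum blocks, so a client only needs the chunk index to ask the master where the data lives.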
Architecture
• Single master
  • Manages namespace and locking
  • Manages chunk placement, creation, re-replication, and rebalancing
  • Garbage collection
Architecture
• Chunkserver
  • Serves chunks directly to clients
  • Stores 64 MB chunks and a checksum for each 64 KB block
  • Reports the chunks it holds to the master
  • Verifies chunk contents during idle periods
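Per-block checksums let a chunkserver find out which 64 KB block of a chunk has gone bad. The sketch below illustrates the idea with CRC-32 from Python's `zlib`; the slides only say the checksum is 32 bits, so the choice of CRC-32 and the function names are assumptions.

```python
import zlib
from typing import List

BLOCK_SIZE = 64 * 1024  # 64 KB checksum blocks, as on the slide


def block_checksums(chunk: bytes) -> List[int]:
    """Return one 32-bit checksum per 64 KB block (CRC-32 is an assumed algorithm)."""
    return [
        zlib.crc32(chunk[start:start + BLOCK_SIZE])
        for start in range(0, len(chunk), BLOCK_SIZE)
    ]


def corrupted_blocks(chunk: bytes, expected: List[int]) -> List[int]:
    """Return the indices of blocks whose checksum no longer matches the stored one."""
    actual = block_checksums(chunk)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

A scan like `corrupted_blocks` is the kind of check a chunkserver could run during idle periods, reporting any mismatch so the chunk can be re-replicated from a good copy.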
Metadata
• Namespace
• Logical mapping from files to chunk locations on chunkservers
  • Kept up to date with heartbeat messages from chunkservers
• Metadata stored in memory
  • Quick access
  • Less than 64 bytes of metadata for each 64 MB chunk
• Operations log
  • Historical record of changes made to metadata
Dennis Kafura – CS5204 – Operating Systems
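Keeping metadata in memory is practical because the per-chunk cost is tiny compared with the chunk itself. As a rough estimate, taking the slide's figure of 64 bytes per 64 MB chunk as an upper bound and ignoring per-file namespace entries, the master's chunk metadata can be sized as follows (the function name is hypothetical):

```python
CHUNK_SIZE = 64 * 1024 ** 2      # 64 MB per chunk, as on the slide
METADATA_PER_CHUNK = 64          # bytes per chunk, treated as an upper bound


def chunk_metadata_bytes(stored_bytes: int) -> int:
    """Estimate master memory needed for chunk metadata covering `stored_bytes` of file data."""
    chunks = -(-stored_bytes // CHUNK_SIZE)  # ceiling division
    return chunks * METADATA_PER_CHUNK
```

For 1 PiB of file data this gives 2^24 chunks and therefore about 1 GiB of chunk metadata, a ratio of roughly one byte of metadata per megabyte stored.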
Consistency Model
• States of a file region:
  • Consistent: all clients see the same data, whichever replica they read from
  • Defined: consistent, and clients see what the mutation wrote in its entirety
• Namespace mutations are atomic and serializable
• Client requires additional logic
  • Remove inconsistent records
  • Remove repeated records
  • Add checksums and unique identifiers to records
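The client-side logic listed above can be sketched as a reader that drops records failing their checksum and records whose identifier has already been seen. The record layout, the use of CRC-32, and all names here are assumptions for illustration, not the format Google's applications use.

```python
import zlib
from typing import Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    record_id: str   # unique identifier added by the writer (hypothetical layout)
    payload: bytes
    checksum: int    # CRC-32 of the payload (assumed algorithm)


def make_record(record_id: str, payload: bytes) -> Record:
    """Writer side: attach a checksum and a unique identifier to the payload."""
    return Record(record_id, payload, zlib.crc32(payload))


def valid_unique(records: Iterable[Record]) -> Iterator[Record]:
    """Reader side: skip corrupt records and duplicates left by retried appends."""
    seen = set()
    for record in records:
        if zlib.crc32(record.payload) != record.checksum:
            continue  # inconsistent or padded region
        if record.record_id in seen:
            continue  # repeated record
        seen.add(record.record_id)
        yield record
```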
Mutation Operation
• Write operation:
  • Client asks the master for the locations of the primary and secondary chunkservers.
  • Master assigns a primary chunkserver and replies to the client.
  • Client pushes the data to all replicas. Each replica stores it in an LRU buffer.
  • Client sends a write request to the primary chunkserver.
  • Primary assigns a serial number and forwards the request to all secondary chunkservers.
  • Secondary chunkservers reply to the primary with the operation status.
  • Primary replies to the client with the operation status.
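The separation of data flow (push to every replica) from control flow (write request to the primary) can be shown with a small in-memory simulation. This is a sketch under simplifying assumptions: the class names are hypothetical, there is no network or failure handling, and a plain dict stands in for the LRU buffer.

```python
from typing import Dict, List


class Replica:
    """A chunkserver holding one replica of a chunk (hypothetical model)."""

    def __init__(self) -> None:
        self.buffer: Dict[str, bytes] = {}  # pushed data awaiting a write request
        self.chunk = bytearray()
        self.applied: List[int] = []        # serial numbers in the order applied

    def push(self, data_id: str, data: bytes) -> None:
        """Data flow: the client pushes data before any write is requested."""
        self.buffer[data_id] = data

    def apply(self, serial: int, data_id: str, offset: int) -> bool:
        """Control flow: apply buffered data at the given offset."""
        data = self.buffer.pop(data_id)
        end = offset + len(data)
        if len(self.chunk) < end:
            self.chunk.extend(bytes(end - len(self.chunk)))
        self.chunk[offset:end] = data
        self.applied.append(serial)
        return True


class Primary(Replica):
    """The replica chosen by the master to order mutations."""

    def __init__(self, secondaries: List[Replica]) -> None:
        super().__init__()
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id: str, offset: int) -> bool:
        """Assign a serial number, apply locally, then forward to every secondary."""
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data_id, offset)
        statuses = [s.apply(serial, data_id, offset) for s in self.secondaries]
        return all(statuses)  # status reported back to the client
```

Because only the primary hands out serial numbers, every replica applies concurrent mutations in the same order.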
Mutation Operation
• Atomic record append:
  • Similar to O_APPEND mode in Unix, but without race conditions among multiple writers.
  • Record is written at least once.
  • Same logic flow as a write, except that the primary appends the record and tells the secondary chunkservers the exact location.
  • Used heavily by Google applications.
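The key difference from a write is that the primary, not the client, picks the offset. A minimal sketch of that decision follows. The padding step, where a record that does not fit causes the chunk to be filled and the client to retry on the next chunk, follows the GFS paper rather than the slides, and the class and method names are hypothetical.

```python
from typing import Optional

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as on the slides


class AppendChunk:
    """Tracks how much of a chunk the primary has already handed out."""

    def __init__(self, capacity: int = CHUNK_SIZE) -> None:
        self.capacity = capacity
        self.used = 0

    def record_append(self, length: int) -> Optional[int]:
        """Return the offset chosen for the record, or None if the client must retry
        on the next chunk (the remainder of this chunk is then padded)."""
        if length > self.capacity:
            raise ValueError("record larger than a chunk")
        if self.used + length > self.capacity:
            self.used = self.capacity  # pad to the end of the chunk
            return None
        offset = self.used
        self.used += length
        return offset
```

A retried append may leave the same record in the file more than once, which is why the guarantee is "at least once" and why readers need to filter duplicates.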
Mutation Operation
• Snapshot operation:
  • Master receives the snapshot request and revokes outstanding leases.
  • After the leases are revoked, the master logs the operation.
  • An in-memory copy of the file or directory metadata is created.
  • A chunk copy is created, on the same chunkserver, only when the chunk is next mutated.
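Snapshot is a copy-on-write scheme: the snapshot shares chunks with the source until one side is written. The sketch below models only the master's metadata. The reference counts follow the GFS paper's description, and the class and method names are hypothetical.

```python
from typing import Dict, List


class SnapshotMaster:
    """Master-side metadata for copy-on-write snapshots (hypothetical model)."""

    def __init__(self) -> None:
        self.files: Dict[str, List[int]] = {}  # path -> chunk handles
        self.refcount: Dict[int, int] = {}     # chunk handle -> number of files using it
        self.next_handle = 0

    def _new_handle(self) -> int:
        handle = self.next_handle
        self.next_handle += 1
        self.refcount[handle] = 1
        return handle

    def create(self, path: str, num_chunks: int) -> None:
        self.files[path] = [self._new_handle() for _ in range(num_chunks)]

    def snapshot(self, src: str, dst: str) -> None:
        """Duplicate metadata only; both files now point at the same chunks."""
        self.files[dst] = list(self.files[src])
        for handle in self.files[dst]:
            self.refcount[handle] += 1

    def prepare_write(self, path: str, index: int) -> int:
        """Return the handle to write to, copying the chunk first if it is shared."""
        handle = self.files[path][index]
        if self.refcount[handle] > 1:
            self.refcount[handle] -= 1
            # In GFS the chunkservers holding the old chunk copy it locally.
            handle = self._new_handle()
            self.files[path][index] = handle
        return handle
```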
Master’s Responsibilities
• Snapshot of /home/user to /save/user:
  • Read locks acquired on /home and /save
  • Write locks acquired on /save/user and /home/user
• Namespace management
  • Each entry has an associated read-write lock
  • Allows concurrent mutations in the same directory
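The locking rule can be sketched as a function that returns the locks one operation needs (read locks on every ancestor, a write lock on the target) plus a check for whether two lock sets conflict. The function names are hypothetical and this is an illustration of the rule, not the master's implementation.

```python
from typing import Iterable, List, Tuple

Lock = Tuple[str, str]  # (path, "read" or "write")


def locks_for(path: str) -> List[Lock]:
    """Read locks on each ancestor directory, write lock on the full path."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        raise ValueError("path must name at least one component")
    prefixes = ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return [(p, "read") for p in prefixes[:-1]] + [(prefixes[-1], "write")]


def conflicts(a: Iterable[Lock], b: Iterable[Lock]) -> bool:
    """Two lock sets conflict if they share a path and at least one side wants to write."""
    held = dict(a)
    return any(path in held and "write" in (mode, held[path]) for path, mode in b)
```

The snapshot of /home/user to /save/user takes `locks_for("/home/user") + locks_for("/save/user")`. Creating /home/user/foo needs a read lock on /home/user, which conflicts with the snapshot's write lock, while two creations of different files in /home/user do not conflict with each other.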
Master’s Responsibilities
• Periodic communication with chunkservers
  • Collects state and tracks cluster health
• Replica placement
  • Maximize reliability and bandwidth utilization
  • Distribute chunks across multiple racks
• Chunk creation
  • Place new replicas on chunkservers with below-average disk space utilization
  • Limit the number of recent creations on each chunkserver
  • Replicate across racks
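The three chunk-creation criteria can be combined into a simple selection routine. This is a sketch of one way to apply them, not the master's actual policy: the `Server` fields, the `max_recent` threshold, and the tie-breaking order are all assumptions.

```python
from typing import List, NamedTuple


class Server(NamedTuple):
    name: str
    rack: str
    utilization: float      # fraction of disk space in use
    recent_creations: int   # chunks created on this server recently


def pick_servers(servers: List[Server], replicas: int = 3, max_recent: int = 2) -> List[str]:
    """Choose chunkservers for a new chunk: below-average utilization,
    few recent creations, and as many distinct racks as possible."""
    if not servers:
        return []
    average = sum(s.utilization for s in servers) / len(servers)
    candidates = sorted(
        (s for s in servers if s.utilization < average and s.recent_creations < max_recent),
        key=lambda s: s.utilization,
    )
    chosen: List[Server] = []
    used_racks = set()
    for s in candidates:  # first pass: at most one server per rack
        if len(chosen) < replicas and s.rack not in used_racks:
            chosen.append(s)
            used_racks.add(s.rack)
    for s in candidates:  # second pass: fill remaining slots from any rack
        if len(chosen) < replicas and s not in chosen:
            chosen.append(s)
    return [s.name for s in chosen]
```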
Master’s Responsibilities
• Re-replication
  • Occurs when the number of replicas falls below the user-specified goal
  • Re-replication is prioritized
• Rebalancing
  • Master examines the current replica distribution and moves replicas for better disk space usage and load balancing.
• Garbage collection
  • Master logs the deletion immediately
  • File is renamed and given a deletion timestamp
  • File is actually deleted later, after a configurable interval
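The lazy deletion used for garbage collection can be sketched as a rename followed by a periodic sweep. The hidden-name format, the class name, and the explicit `now` parameter are assumptions made to keep the example deterministic.

```python
from typing import Dict, List


class Namespace:
    """Master namespace with lazy deletion (hypothetical model)."""

    def __init__(self, grace_seconds: float) -> None:
        self.grace_seconds = grace_seconds
        self.files: Dict[str, List[int]] = {}   # path -> chunk handles
        self.deleted_at: Dict[str, float] = {}  # hidden name -> deletion timestamp

    def delete(self, path: str, now: float) -> str:
        """Rename the file to a hidden name carrying the deletion timestamp."""
        hidden = f"{path}.deleted.{int(now)}"  # hypothetical naming scheme
        self.files[hidden] = self.files.pop(path)
        self.deleted_at[hidden] = now
        return hidden

    def collect(self, now: float) -> List[str]:
        """Remove hidden files whose grace period has expired; return their names."""
        expired = [h for h, t in self.deleted_at.items() if now - t >= self.grace_seconds]
        for hidden in expired:
            del self.files[hidden]
            del self.deleted_at[hidden]
        return expired
```

Until `collect` removes the hidden file, the deletion can still be undone by renaming it back, which is one reason for deferring the actual removal.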
High Availability
• Fast recovery
• Chunk replication
  • Default of 3 replicas
  • Distributed across multiple racks
• Shadow master
  • Master state is fully replicated.
  • Mutations are committed only once the log has been written on all replicas.
  • Provides read-only access even when the master is down
Performance
• Cluster characteristics
• Cluster performance
Amazon S3
• RESTful and SOAP-style interfaces
• BitTorrent for distributed download
• 99.999999999% durability and 99.99% uptime
• Replicated 3 times across 2 datacenters
• Cost
  • Storage: $0.14 / GB / month
  • Bandwidth: $0.10 / GB
  • Requests: $0.01 / 1,000 requests
• Permissions controlled by access control lists (ACLs)
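As a worked example of the cost model, the sketch below combines the three prices quoted on the slide into a monthly estimate. The prices are the slide's figures from the time of the presentation, and the function is a hypothetical helper that ignores tiering and any other charges.

```python
STORAGE_PER_GB_MONTH = 0.14  # dollars, from the slide
BANDWIDTH_PER_GB = 0.10      # dollars, from the slide
PER_1000_REQUESTS = 0.01     # dollars, from the slide


def monthly_cost(stored_gb: float, transferred_gb: float, requests: int) -> float:
    """Estimate one month's bill from storage, bandwidth, and request counts."""
    return (
        stored_gb * STORAGE_PER_GB_MONTH
        + transferred_gb * BANDWIDTH_PER_GB
        + requests / 1000 * PER_1000_REQUESTS
    )
```

Storing 100 GB, transferring 50 GB, and issuing 200,000 requests comes to 14 + 5 + 2 = 21 dollars for the month.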
Conclusions
• Simple solution
• Seamlessly handles hardware failures
• Purpose-built for Google’s needs
  • Large files
  • High read throughput
  • Record appends