Problem-solving on large-scale clusters: theory and applications
Lecture 4: GFS & Course Wrap-up
Today’s Outline
• File Systems Overview
  • Uses
  • Common distributed file systems
• GFS
  • Goals
  • System design
  • Consistency model
  • Performance
• Introduction to Distributed Systems
  • MapReduce as a distributed system
File System Uses
Uses:
• Persistence of data
• Inter-process communication
• Shared namespace of objects
Requirements:
• Lots of files
• Random access
• Permissions
• Consistency
Common Distributed FSs
Traditional:
• NFS
• SMB
• AFS
What about:
• Kazaa
• BitTorrent
• Kerberos
GFS Goals
Common usage and environment patterns:
• Regular component failure
• Few, large, multi-GB files
• Reads are streaming
• Writes are appends
• Control over client implementation
What are the consequences for…
• data caching?
• latency and bandwidth balance?
• consistency?
• control of data flow?
GFS System Design
System attributes and components:
• A single master controls the file namespace
• Data is broken into 64 MB “chunks”
• Chunks are replicated across many “chunk servers”
• Clients talk directly to chunk servers for data
From GFS paper
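The master's role in this design can be sketched as a lookup from (file, byte offset) to a chunk and its replicas. This is a minimal illustration, not GFS's actual API; the class and method names are invented, and real GFS adds chunk handles, leases, and versioning.

```python
# Hypothetical sketch of the master's chunk lookup, assuming 64 MB chunks.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB

class Master:
    def __init__(self):
        # file name -> list of (chunk_handle, [chunk server addresses])
        self.namespace = {}

    def lookup(self, filename, byte_offset):
        """Translate (file, offset) into a chunk handle and its replica set."""
        chunk_index = byte_offset // CHUNK_SIZE
        handle, replicas = self.namespace[filename][chunk_index]
        return handle, replicas

# The client then contacts one of the returned chunk servers directly;
# the master never sees file data.
master = Master()
master.namespace["/logs/web.0"] = [("handle-0", ["cs1", "cs2", "cs3"])]
handle, replicas = master.lookup("/logs/web.0", 10 * 1024 * 1024)
```

Because the master answers only small metadata queries while bulk data flows client-to-chunk-server, the single master stays off the data path.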
GFS Write Flow
Control flow:
• Client requests the chunk list from the master
• Master responds with primary/secondary chunk servers
• Client starts pipelining data to the chunk servers
• Client asks the primary chunk server to commit the write
• Primary chunk server signals the write ordering to the replica servers
• Chunk servers respond with a successful commit
• Client is notified of the successful write
From GFS paper
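The control flow above can be simulated with a toy model. All names here are invented; real GFS uses leases and per-mutation serial numbers, which are collapsed into a single counter in this sketch.

```python
# Toy simulation of the GFS write control flow. Each "server" is just a dict.
def gfs_write(client_data, primary, secondaries):
    # Steps 1-2: the client has already learned the primary/secondary
    # replica set from the master (passed in here).
    replicas = [primary] + secondaries
    # Step 3: the client pipelines data to every chunk server, which
    # buffers it without committing.
    for server in replicas:
        server["buffer"] = client_data
    # Steps 4-5: the client asks the primary to commit; the primary picks
    # a serial order and forwards that order to the secondaries, so every
    # replica applies mutations in the same sequence.
    order = primary["next_serial"] = primary.get("next_serial", 0) + 1
    for server in replicas:
        server.setdefault("chunk", []).append((order, server.pop("buffer")))
    # Steps 6-7: all replicas acknowledge; the client sees a good write.
    return all(server["chunk"][-1][0] == order for server in replicas)

primary, s1, s2 = {}, {}, {}
ok = gfs_write(b"record", primary, [s1, s2])
```

The key design point the sketch preserves: data movement (step 3) is decoupled from commit ordering (step 5), so the expensive pipelining can use the network topology while only the primary decides ordering.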
Consequences of the Design
Questions:
• Where are the bottlenecks?
• What if a replica fails?
• What if the primary fails?
• What if the master fails?
• Why do writes need to be ordered?
How do you work around these issues?
From GFS paper
GFS Consistency: Terms
Two new terms:
• Consistent: all chunk servers have the same data
• Defined: the result of the “last write” is fully available
• A defined chunk is also a consistent chunk
Questions:
• Is data corrupt if it is inconsistent?
• Is data corrupt if it is undefined?
• Can applications use data in either state?
GFS Consistency: Mutations
Consistency for types of writes:
• Single random write: consistent and defined
• Single append: consistent and defined
• Concurrent random write: consistent
• Aborted write: inconsistent
• Concurrent append: final location is defined
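The consistent/defined distinction can be made concrete with a toy model of replica contents. The helper functions below are invented for this sketch, not GFS APIs.

```python
# Toy illustration of "consistent" vs "defined" over three replicas of
# one chunk, each modeled as a byte string.
def consistent(replicas):
    """All replicas hold identical bytes."""
    return len(set(replicas)) == 1

def defined(replicas, last_write):
    """Consistent, AND the data is exactly what the last write produced."""
    return consistent(replicas) and replicas[0] == last_write

# A single serial write succeeds everywhere: consistent and defined.
r = [b"AAAA", b"AAAA", b"AAAA"]
assert consistent(r) and defined(r, b"AAAA")

# Two concurrent writers' fragments were interleaved the same way on
# every replica: consistent, but neither writer's full intent survives,
# so the region is not defined with respect to either write.
r = [b"AABB", b"AABB", b"AABB"]
assert consistent(r) and not defined(r, b"BBBB")

# An aborted write left the replicas disagreeing: inconsistent.
r = [b"AAAA", b"A??A", b"AAAA"]
assert not consistent(r)
```

This is why applications can still use consistent-but-undefined data (every reader sees the same bytes) while inconsistent data is the genuinely dangerous state.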
GFS Single Master Handling
Single master = bottleneck = single point of failure
• Master persists changes to multiple replicas
• Can delegate to “shadow masters”
• Naming is done via DNS (easy failover)
How does this work:
• If the network is partitioned?
• Over multiple data centers?
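The DNS-based failover idea can be sketched with a name registry: clients resolve a stable name, and failover just repoints it. The registry dict stands in for DNS here; all names are hypothetical.

```python
# Sketch of name-based master failover, assuming clients always resolve a
# stable name ("gfs-master") rather than caching a host address.
registry = {"gfs-master": "master-1.dc1"}

def resolve(name):
    """Stand-in for a DNS lookup."""
    return registry[name]

def fail_over(name, new_host):
    """Repoint the stable name; clients find the new master on next lookup."""
    registry[name] = new_host

assert resolve("gfs-master") == "master-1.dc1"
fail_over("gfs-master", "shadow-2.dc1")   # promote a shadow master
assert resolve("gfs-master") == "shadow-2.dc1"
```

The partition question on the slide is exactly where this sketch breaks down: under a network partition, two sides can each believe their resolved master is live, which real systems handle with leases or quorum, not DNS alone.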
GFS Performance
Some performance numbers:
• 1 MB replicates in about 80 ms
• 342-node cluster
• 72 TB available, 55 TB used
• 735 K files, 22 K dead files, 992 K chunks
• 13 GB chunk server metadata
• 48 MB master metadata
• Read rate: ~580 MB/s
• Write rate: ~2 MB/s
• Master ops: ~320 ops/s
• 100 Mbit full duplex with Gigabit backbone
Data from GFS paper
GFS Performance, Cont’d
More performance numbers:
• 227-node cluster
• 180 TB available, 155 TB used
• 737 K files, 232 K dead files, 1550 K chunks
• 21 GB chunk server metadata
• 60 MB master metadata
• Read rate: ~380 MB/s
• Write rate: ~100 MB/s
• Master ops: ~500 ops/s
Data from GFS paper
And now …
• An overview of the concepts we’ve been alluding to all quarter: parallel and distributed systems
Parallel vs. Distributed Computing
• Parallel computing: dividing a problem into identical tasks to be executed at the same time on multiple machines or threads
• Distributed computing: dividing a problem into (possibly identical) tasks to be executed on multiple machines or threads, but generally on machines separated by a network
• Parallel computing is often a limited form of distributed computing
Is the MapReduce programming model parallel or distributed?
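The "identical tasks at the same time" definition above can be shown with a minimal example in the MapReduce spirit: the same function applied to disjoint slices of the input, with partial results combined at the end. This sketch uses Python's standard thread pool purely for illustration.

```python
# A minimal parallel computation: 4 identical tasks over disjoint slices.
from concurrent.futures import ThreadPoolExecutor

def square_all(chunk):
    """The identical task each worker runs (a 'map' step)."""
    return [x * x for x in chunk]

data = list(range(8))
# Strided slices: no element appears in two tasks, so there are no
# data dependencies between workers.
chunks = [data[i::4] for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(square_all, chunks))

# Combine partial results (a trivial 'reduce' step).
result = sorted(x for part in partials for x in part)
```

Run on one machine's threads this is parallel computing; ship the same `square_all` task to workers across a network and it becomes distributed, which is why MapReduce is reasonably described as both.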
Requirements: Parallel Computing
• Requirement: minimal (to no) data dependencies!
• Also nice: a static data source
• Nice to have: minimal communication overhead
  • Replicating state
  • Coordinating / scheduling tasks (e.g., administrative overhead)
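The data-dependency requirement is easiest to see by contrast. In this sketch (function names are ours, not from the course), the first computation touches only its own slice and parallelizes trivially; the second needs the previous step's result at every step, forcing serial execution.

```python
def partial_sum_of_squares(chunk):
    # No cross-element dependency: slices can run on separate workers
    # and the partial sums can be added in any order afterwards.
    return sum(x * x for x in chunk)

def running_total(values):
    # Each step depends on the one before it (acc), so this cannot be
    # split into independent identical tasks without restructuring.
    totals, acc = [], 0
    for v in values:
        acc += v
        totals.append(acc)
    return totals
```

This is the same property MapReduce exploits: map tasks look like `partial_sum_of_squares` (independent), and anything shaped like `running_total` has to be pushed into the reduce phase or reformulated.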
Requirements: Distributed System
• See worksheet / activity
Distributed System Design (1 of 2)
• From studying large (but not necessarily distributed) systems, we know that distributed systems trade off one guarantee for another
  • Ergo, you need to know what you’re designing for, e.g., your use cases
• From studying MapReduce, we know that successful distributed systems minimize data dependencies and administrative communication
Distributed System Design (2 of 2)
• From MapReduce & GFS, we know that a distributed system must assume that components fail:
  • The network may not be reliable
  • Latency can be high
  • Bandwidth is not infinite
  • Data may get corrupted
  • Attacks may come from malicious users
  • Hardware fails: motherboards, disks, and memory chips will all break
  • … etc …
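A common defensive pattern against the first two failure modes above (unreliable network, high latency) is bounded retries with exponential backoff. This is a generic sketch, not anything specific to GFS or MapReduce; `flaky_fetch` is an invented stand-in for any remote call.

```python
import random
import time

def flaky_fetch():
    """Invented stand-in for an RPC that fails about half the time."""
    if random.random() < 0.5:
        raise ConnectionError("network partition")
    return b"chunk data"

def fetch_with_retries(fetch, attempts=5, base_delay=0.01):
    """Retry a failing call a bounded number of times, backing off
    exponentially so a struggling server is not hammered."""
    for i in range(attempts):
        try:
            return fetch()
        except ConnectionError:
            time.sleep(base_delay * 2 ** i)  # 10 ms, 20 ms, 40 ms, ...
    raise TimeoutError("all attempts failed")
```

Retries alone are not enough in a real system: if the operation is not idempotent (an append, say), a retry after a lost acknowledgment can duplicate data, which is one reason GFS only promises at-least-once record appends.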