RAMCloud Overview

• Storage for datacenters
• 1000-10000 commodity servers
• 32-64 GB DRAM/server
• All data always in RAM
• Durable and available
• Performance goals:
  • High throughput: 1M ops/sec/server
  • Low-latency access: 5-10µs RPC
[Figure: Application Servers and Storage Servers within a Datacenter]
CS 142 Lecture Notes: Large-Scale Web Applications
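
As a rough back-of-the-envelope check, the per-server figures above can be multiplied out to cluster-wide totals. This is only a sketch: the variable names are illustrative, and the totals are simple products of the slide's ranges, not measured results.

```python
# Cluster-wide totals implied by the per-server figures on the slide.
GB = 10**9
TB = 10**12

servers = (1_000, 10_000)              # commodity servers (low, high)
dram_per_server = (32 * GB, 64 * GB)   # DRAM per server (low, high)
ops_per_server = 1_000_000             # throughput goal, ops/sec/server

min_capacity = servers[0] * dram_per_server[0]
max_capacity = servers[1] * dram_per_server[1]
min_ops = servers[0] * ops_per_server
max_ops = servers[1] * ops_per_server

print(f"total DRAM: {min_capacity // TB}-{max_capacity // TB} TB")
print(f"aggregate throughput: {min_ops:,}-{max_ops:,} ops/sec")
```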

Example Configurations
For $100-200K today:
• One year of Amazon customer orders
• One year of United flight reservations

RAMCloud Motivation: Latency
• Large-scale apps struggle with high latency
• Facebook: can only make 100-150 internal requests per page
[Figure: Traditional Application (UI, App. Logic, and Data Structures on a single machine; << 1µs latency) vs. Web Application (UI and Bus. Logic on Application Servers, with separate Storage Servers, in a datacenter; 0.5-10ms latency)]

RAMCloud Motivation: Latency
• RAMCloud goal: large scale and low latency
• Enable new class of applications:
  • Crowd-level collaboration
  • Large-scale graph algorithms
[Figure: Traditional Application (single machine; << 1µs latency) vs. Web Application (Application Servers and Storage Servers in a datacenter; 0.5-10ms latency, with 5-10µs as the RAMCloud target)]

RAMCloud Motivation: Technology
Disk access rate not keeping up with capacity:
• Disks must become more archival
• More information must move to memory

RAMCloud Research Issues
• Data durability/availability
• Fast RPCs
• Data model, concurrency/consistency model
• Data distribution, scaling
• Automated management
• Multi-tenancy
• Client-server functional distribution
• Node architecture

Data Durability/Availability
• Data must be durable and available when write RPC returns
• Unattractive approaches:
  • Replicate in other memories (too expensive)
  • Synchronous disk write (100-1000x too slow)
• Our approach: buffered logging
[Figure: Storage Servers, each with DRAM and a disk; a write goes to one server's DRAM, log entries go to other servers, and the log is written to disk asynchronously in batches]
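
The buffered-logging idea can be sketched as a small in-process model. This is an illustration under stated assumptions, not RAMCloud's implementation: the `Master` and `Backup` classes, the `batch_size` parameter, and the use of three backups are all hypothetical, and the asynchronous flush is modeled as an explicit call rather than a background thread.

```python
class Backup:
    """Hypothetical backup: buffers log entries in DRAM, flushes in batches."""

    def __init__(self, batch_size=2):
        self.buffer = []            # entries held in DRAM, not yet on disk
        self.disk = []              # stand-in for the on-disk log
        self.batch_size = batch_size

    def append(self, entry):
        # Acknowledge as soon as the entry is buffered: no disk I/O
        # happens on the write path.
        self.buffer.append(entry)
        return True

    def flush_if_full(self):
        # In a real system this would run asynchronously; here it is
        # called explicitly to keep the sketch deterministic.
        if len(self.buffer) >= self.batch_size:
            self.disk.extend(self.buffer)
            self.buffer.clear()


class Master:
    """Hypothetical master: keeps all data in DRAM, logs writes to backups."""

    def __init__(self, backups):
        self.table = {}
        self.backups = backups

    def write(self, key, value):
        self.table[key] = value
        acks = [b.append((key, value)) for b in self.backups]
        return all(acks)            # write returns once every backup has buffered


backups = [Backup(batch_size=2) for _ in range(3)]
master = Master(backups)

master.write("a", 1)
for b in backups:
    b.flush_if_full()               # buffers hold one entry: nothing flushed yet
pending_after_first_write = [len(b.buffer) for b in backups]

master.write("b", 2)
for b in backups:
    b.flush_if_full()               # batch is full: both entries go to disk
```

The point of the design is that the write path touches only memory, while disk writes are amortized over a batch. The window between buffering and flushing is what the power-loss concern on the next slide is about.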

Buffered Logging, cont’d
• Potential problem: power loss
  • Per-server battery backup?
  • Nonvolatile memory on disk controllers?
• Potential problem: crash recovery
  • If master crashes, data unavailable until recovered from disks on backups
  • Read 64 GB from one disk? 10 minutes
  • Our goal: recover in 1-2 seconds
• Solution: take advantage of system scale
  • Scatter backup data across many servers
  • Recover in parallel

Recovery, First Try
• Scatter log segments randomly across all servers
• After crash, all backups read disks in parallel (64 GB / 1000 backups @ 100 MB/sec = 0.6 sec)
• Collect all backup data on replacement master (64 GB / 1 GB/sec ~ 60 sec: too slow!)
[Figure: many Backups sending data to a single Replacement Master]
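
The arithmetic on this slide can be reproduced directly. The sketch below uses only the slide's figures (64 GB per master, 1000 backups, 100 MB/sec per disk, 1 GB/sec into the replacement master); the variable names are illustrative. It shows that scattering fixes the disk bottleneck but leaves the single replacement master's network link as the limiting step.

```python
GB = 10**9
MB = 10**6

data = 64 * GB          # DRAM contents of the crashed master
disk_bw = 100 * MB      # bytes/sec read from one backup disk
nic_bw = 1 * GB         # bytes/sec into the replacement master
num_backups = 1000

# Baseline: all 64 GB read back from a single disk.
single_disk_secs = data / disk_bw

# First try, phase 1: every backup reads its share in parallel.
parallel_read_secs = (data / num_backups) / disk_bw

# First try, phase 2: everything funnels into one replacement master.
collect_secs = data / nic_bw

print(f"one disk:      {single_disk_secs / 60:.1f} min")
print(f"parallel read: {parallel_read_secs:.2f} sec")
print(f"collect:       {collect_secs:.0f} sec")
```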

Recovery, Second Try
• Divide each master's data into partitions
• Recover each partition on a separate server:
  • 100 partitions, 640 MB each
  • 1 GB/sec NIC per replacement master
  • Recovery time < 1 sec
[Figure: the Dead Master's data recovered from many Backups onto multiple Replacement Masters]
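
A minimal sketch of partitioned recovery, using the slide's numbers. How data is divided into partitions is not stated on the slide; hashing each key with CRC32 is purely an assumption made here for illustration, and `partition_of` and `partition_recovery` are hypothetical helper names.

```python
import zlib

GB = 10**9
MB = 10**6


def partition_recovery(total_bytes, num_partitions, nic_bw):
    """Return (bytes per partition, seconds to load one partition)."""
    per_partition = total_bytes / num_partitions
    return per_partition, per_partition / nic_bw


def partition_of(key, num_partitions):
    # Assumption: partitions are chosen by a deterministic hash of the key.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


size, secs = partition_recovery(64 * GB, 100, 1 * GB)
print(f"{size / MB:.0f} MB per partition, {secs:.2f} sec per replacement master")

# Group a dead master's log entries by the replacement master that replays them.
log = [("user:1", "alice"), ("user:2", "bob"), ("order:7", "book")]
by_partition = {}
for key, value in log:
    by_partition.setdefault(partition_of(key, 100), []).append((key, value))
```

Because each replacement master pulls only its own partition, the per-master transfer time drops by the number of partitions, and all partitions load concurrently.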
