FAWN: Fast Array of Wimpy Nodes A technical paper presentation in fulfillment of the requirements of CIS 570 – Advanced Computer Systems – Fall 2013 Scott R. Sideleau ssideleau@umassd.edu 14-Nov-2013
Overview • Identify the problem space • FAWN as a solution • Architecture principles • Unique key-value storage • Evaluate and benchmark a 21-node FAWN cluster • Identify when FAWN makes sense
Theoretical Problem Space • The CPU-I/O gap • Modern CPUs are so fast relative to storage that they spend much of their time idle, waiting on I/O • CPU power consumption scales superlinearly with performance • The ever-larger caches needed to keep superscalar pipelines fed are one driver • Dynamic Voltage and Frequency Scaling (DVFS) is inefficient • e.g., Intel SpeedStep technology • Even at low utilization, the CPU still draws roughly 50% of peak power
What’s the real problem? • Electricity is expensive! • Home usage is measured in kW; data center usage in MW • Facebook spends up to $1 million a month on electricity • Across only three data centers! • Oregon, USA • Virginia, USA • Sweden
Facebook’s Not Playing Around • Its fourth data center will be powered by renewable wind energy • Iowa, USA http://goo.gl/sFmmxz (dated 14-Nov-2013)
Proposed Solution • Fast Array of Wimpy Nodes (FAWN) • Bridge the CPU-I/O gap • Use slower CPUs paired with fast Flash storage • Reduce power consumption per node • Embedded CPUs consume significantly less power • Address distributed storage for the new architecture • A new key-value storage system, FAWN-KV (its key-to-node mapping is sketched below) • A complementary per-node data store, FAWN-DS
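The paper’s FAWN-KV organizes back-ends into a ring and assigns each key to a node by consistent hashing. A minimal sketch of that mapping idea follows; the names (Ring, add_node, lookup), the SHA-1 keyspace, and the choice of 4 virtual IDs per node are illustrative assumptions, not the paper’s implementation.

import bisect
import hashlib

def _hash(value: bytes) -> int:
    # Map into a large circular keyspace (160-bit SHA-1 here)
    return int.from_bytes(hashlib.sha1(value).digest(), "big")

class Ring:
    def __init__(self):
        self._points = []   # sorted ring positions of virtual node IDs
        self._owners = {}   # ring position -> physical node name

    def add_node(self, name: str, vnodes: int = 4):
        # Each physical node owns several virtual IDs on the ring
        for i in range(vnodes):
            pos = _hash(f"{name}/{i}".encode())
            bisect.insort(self._points, pos)
            self._owners[pos] = name

    def lookup(self, key: bytes) -> str:
        # A key belongs to the first virtual ID clockwise from its hash
        i = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[i]]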
Understanding Flash Storage • Fast random reads • Up to 175x faster than HDDs • Performance varies widely across makes/models • Efficient I/O • Very low power draw • Far higher queries-per-Joule than HDDs • Slow random writes • The erase/write cycle is expensive • This motivates log-structured (i.e., sequential) data storage, sketched below
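A minimal sketch of the per-node storage idea (FAWN-DS in the paper): an in-DRAM index over an append-only log on flash, so every write is sequential and a read costs one random access. The LogStore name, record framing, and full-key index are simplifications for illustration; the real FAWN-DS uses a compact hash index that stores only key fragments.

import os
import struct

class LogStore:
    def __init__(self, path: str):
        self._log = open(path, "a+b")  # append-only data log on flash
        self._index = {}               # key -> (value offset, value length)

    def put(self, key: bytes, value: bytes):
        self._log.seek(0, os.SEEK_END)
        offset = self._log.tell()
        # Record layout: [key_len][val_len][key][value]; a delete would
        # append a tombstone record rather than erase in place.
        self._log.write(struct.pack(">II", len(key), len(value)) + key + value)
        self._index[key] = (offset + 8 + len(key), len(value))

    def get(self, key: bytes) -> bytes:
        offset, length = self._index[key]   # one lookup, one random read
        self._log.seek(offset)
        return self._log.read(length)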
Optimized Maintenance Functions • Split (sketched below) • Used when adding a node to the cluster • One sequential pass: each entry is read and written to one of two new data stores, according to its key range • Merge • Used when removing a node from the cluster • The two stores’ key ranges are mutually exclusive, so one data store is simply appended to the other • Compact • Cleans up entries in a data store • Orphaned, out-of-range, and deleted entries are skipped; the rest are written sequentially to a new data store
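Reusing the illustrative LogStore above, here is what Split amounts to; in_left_range is a hypothetical predicate standing in for the key-range test. Merge is even simpler (disjoint ranges, so one log is appended to the other), and Compact is the same single pass over one store with the dead entries skipped.

def split(old: LogStore, left_path: str, right_path: str, in_left_range):
    # One sequential pass: each live entry is appended to whichever
    # new store owns its key range.
    left, right = LogStore(left_path), LogStore(right_path)
    for key in list(old._index):
        target = left if in_left_range(key) else right
        target.put(key, old.get(key))
    return left, right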
Node Leave • Rather than split the data stores, nodes merge them • In reality, this means… • Adding a new replica into each chain the departing node belonged to • So, the processing is the same as for a join event
Failure Detection • Nodes are assumed to be fail-stop • Front-end and back-end nodes gossip at a known rate • On timeout, the front-end initiates a leave operation for the failed node (a minimal sketch follows) • The current design only copes with node failures • Coping with network failures requires future work
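A minimal sketch of the timeout side of this, under the fail-stop assumption; the gossip interval, the 3x timeout policy, and the FrontEnd/initiate_leave names are illustrative, not the paper’s values.

import time

GOSSIP_INTERVAL = 1.0                      # seconds; illustrative rate
HEARTBEAT_TIMEOUT = 3 * GOSSIP_INTERVAL    # illustrative timeout policy

class FrontEnd:
    def __init__(self):
        self._last_seen = {}  # back-end name -> time of last heartbeat

    def on_heartbeat(self, node: str):
        self._last_seen[node] = time.monotonic()

    def check_failures(self):
        now = time.monotonic()
        for node, seen in list(self._last_seen.items()):
            if now - seen > HEARTBEAT_TIMEOUT:
                # Fail-stop: a silent node is treated as dead, and a
                # leave operation is initiated on its behalf.
                del self._last_seen[node]
                self.initiate_leave(node)

    def initiate_leave(self, node: str):
        # Hypothetical hook: would trigger the merge/chain-repair
        # processing described under “Node Leave” above.
        pass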
Single Node Evaluation • Performance almost entirely dependent on flash media
21-Node Evaluation • In general, the back-ends prove to be well-matched: load stays evenly balanced across nodes
21-Node Evaluation • Relatively responsive through maintenance operations
21-Node Evaluation • Slightly slower than production key-value systems • Worst-case response times are on par
21-Node Evaluation • Power draw is low and consistent across operations • Queries per Joule is an order of magnitude higher than in traditional production distributed systems (see the worked arithmetic below) • Nodes execute roughly 1 billion instructions per Joule • Running at 1/3 the frequency of a server-class CPU • While drawing 1/10 (or less) of the power
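Queries per Joule is simply the sustained query rate divided by power draw (a watt is one Joule per second). The numbers below are illustrative placeholders, not the paper’s measurements; they only show how an order-of-magnitude gap arises.

# queries/Joule = (queries/second) / watts
fawn_qps, fawn_watts = 1300, 4          # one wimpy node (illustrative)
server_qps, server_watts = 5000, 250    # one traditional server (illustrative)

print(fawn_qps / fawn_watts)      # 325.0 queries/Joule
print(server_qps / server_watts)  # 20.0 queries/Joule, an order of magnitude lower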
When does FAWN matter? • It depends on the workload…
Thanks very much! Questions?