FAWN: A Fast Array of Wimpy Nodes

Presented by: Clint Sbisa & Irene Haque

Motivation • Large-scale data-intensive applications: Facebook, LinkedIn, Dynamo • CPU-I/O gap: storage, network and memory bottlenecks; low CPU utilization • CPU power: slower CPUs do more work per unit of energy (1 billion vs. 100 million instructions per Joule); existing energy-saving techniques are inefficient • Memory power

FAWN • Data-intensive, computationally simple workloads • Small objects: 100 B to 1 KB • Cluster of embedded CPUs using flash storage: efficient, fast random reads, slow random writes • FAWN-KV: key-value storage, consistent hashing • FAWN-DS: data store, log-structured

FAWN-DS • Log-structured key-value store • Contains all values in a key range for each virtual ID • Maps 160-bit keys to values in the log • Hash index bucket = i low-order index bits of the key • Key fragment = next 15 low-order bits • Each 6-byte in-memory hash index entry stores the key fragment and a pointer
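The bucket and key-fragment extraction described on this slide can be sketched in a few lines. This is a minimal illustration, not the FAWN-DS source: the function names, the choice of SHA-1 to obtain a 160-bit key, the value i = 16, and the exact byte layout of the 6-byte entry (15-bit fragment plus one flag bit, then a 4-byte offset) are all assumptions made for the example.

```python
import hashlib
import struct

INDEX_BITS = 16   # "i" on the slide; 16 is an assumed value for illustration
FRAG_BITS = 15    # key fragment width, as stated on the slide


def key_from_name(name: bytes) -> int:
    """Derive a 160-bit integer key. SHA-1 is an assumption; it simply yields 160 bits."""
    return int.from_bytes(hashlib.sha1(name).digest(), "big")


def split_key(key160: int, index_bits: int = INDEX_BITS):
    """Return (bucket, fragment): the i low-order bits, then the next 15 bits."""
    bucket = key160 & ((1 << index_bits) - 1)
    fragment = (key160 >> index_bits) & ((1 << FRAG_BITS) - 1)
    return bucket, fragment


def pack_entry(fragment: int, log_offset: int, valid: bool = True) -> bytes:
    """Pack one index entry into 6 bytes (assumed layout: 2-byte field + 4-byte pointer)."""
    head = (fragment << 1) | int(valid)      # 15-bit fragment plus a flag bit
    return struct.pack("<HI", head, log_offset)


bucket, fragment = split_key(key_from_name(b"user:42"))
entry = pack_entry(fragment, log_offset=4096)
```

Because only 15 bits of the key are kept in memory, a matching fragment is a hint rather than proof; a lookup still has to compare the full key stored in the log record the pointer refers to.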

FAWN-DS • Basic functions: Store, Lookup, Delete • Virtual node maintenance: Split, Merge, Compact • Concurrent operations
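To make the Store, Lookup, Delete and Compact operations concrete, here is a toy append-only store. It is a sketch under simplifying assumptions: the log lives in a Python list rather than on flash, the index is an ordinary dict rather than the compact hash index, and the class and method names are hypothetical.

```python
class TinyLogStore:
    """Toy log-structured store: every write is an append, the index points at the newest record."""

    def __init__(self):
        self.log = []     # append-only records: (key, value); value None marks a delete
        self.index = {}   # key -> position of the latest live record in the log

    def store(self, key, value):
        self.log.append((key, value))
        self.index[key] = len(self.log) - 1

    def lookup(self, key):
        pos = self.index.get(key)
        return None if pos is None else self.log[pos][1]

    def delete(self, key):
        if key in self.index:
            self.log.append((key, None))   # tombstone record, appended like any write
            del self.index[key]

    def compact(self):
        """Rewrite the log with only live records, preserving their order."""
        live = sorted(self.index.items(), key=lambda item: item[1])
        self.log = [(key, self.log[pos][1]) for key, pos in live]
        self.index = {key: i for i, (key, _) in enumerate(self.log)}


db = TinyLogStore()
db.store("a", "v1")
db.store("a", "v2")   # the old record becomes garbage until compaction
db.store("b", "x")
db.delete("b")
```

Writes never modify existing records, which is what lets this design avoid slow random writes on flash; the cost is that stale records accumulate until a compaction pass reclaims them.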

FAWN-KV • Consistent hashing of back-end VIDs • Management node assigns each front-end a portion of the circular key space • Front-end nodes: each manages its key space and forwards out-of-range requests • Back-end nodes (VIDs): each contacts a front-end when joining and owns a key range
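The idea of VIDs owning ranges of a circular key space can be illustrated with a small consistent-hashing ring. This is a generic sketch rather than FAWN-KV code: the `Ring` class, the VID names, and the use of SHA-1 for ring positions are assumptions, and ownership is taken to be the first VID at or after the key's position going clockwise.

```python
import bisect
import hashlib


def ring_pos(name: str) -> int:
    """Position on the ring; SHA-1 is an assumed hash that yields a 160-bit value."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


class Ring:
    def __init__(self, vids):
        self._points = sorted((ring_pos(v), v) for v in vids)
        self._positions = [pos for pos, _ in self._points]

    def owner(self, key: str) -> str:
        """Return the VID whose position is the key's successor on the ring."""
        i = bisect.bisect_left(self._positions, ring_pos(key))
        return self._points[i % len(self._points)][1]   # wrap around past the last VID


vids = ["vid-%d" % i for i in range(8)]       # hypothetical VID names
ring = Ring(vids)
keys = ["key-%d" % i for i in range(100)]
before = {k: ring.owner(k) for k in keys}

# Adding one VID moves only the keys that fall into its new range.
bigger = Ring(vids + ["vid-new"])
moved = [k for k in keys if bigger.owner(k) != before[k]]
```

Every key in `moved` now belongs to `vid-new`; all other keys keep their previous owner, which is why a join only needs to split a single existing key range.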

FAWN-KV • Chain replication
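As a rough illustration of chain replication in general: a write enters at the head of a chain of replicas, is passed down the chain, and is acknowledged by the tail, which also serves reads. The sketch below is in-process and ignores failures and networking; `Replica` and `build_chain` are hypothetical names, not part of FAWN-KV.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.next = None   # successor in the chain; None for the tail

    def put(self, key, value):
        """Apply the write locally, then forward it; the tail's name is returned as the ack."""
        self.data[key] = value
        if self.next is not None:
            return self.next.put(key, value)
        return self.name

    def get(self, key):
        """Intended to be called on the tail, which has seen every acknowledged write."""
        return self.data.get(key)


def build_chain(names):
    replicas = [Replica(n) for n in names]
    for current, successor in zip(replicas, replicas[1:]):
        current.next = successor
    return replicas[0], replicas[-1]


head, tail = build_chain(["A", "B", "C"])
acked_by = head.put("k", "v")
```

Reading from the tail means a client never sees a value that has not yet reached every replica in the chain.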

FAWN-KV • Join: split key range, pre-copy, chain insertion, log flush • Leave: merge key range, join into each chain

Individual Node Performance • Lookup speed • Bulk store speed: 23.2 MB/s, or 96% of raw speed

Individual Node Performance • Put speed • BerkeleyDB, for comparison: 0.07 MB/s, which shows the necessity of log-based filesystems

Individual Node Performance • Read- and write-intensive workloads

System Benchmarks • System throughput and power consumption

Impact of Ring Membership Changes • Query throughput during node join and maintenance operations

Impact of Ring Membership Changes • Query latency

Alternative Architectures • Large dataset, low query rate → FAWN+Disk • Small dataset, high query rate → FAWN+DRAM • Middle range → FAWN+SSD

Conclusion • Fast and energy efficient processing of random read-intensive workloads • Over an order of magnitude more queries per Joule than traditional disk-based systems
