Sheepdog: Yet Another All-In-One Storage For Openstack


Presentation Transcript


  1. Sheepdog: Yet Another All-In-One Storage For Openstack
     Openstack Hong Kong Summit, Liu Yuan, 2013.11.8

  2. Who I Am
     • Active contributor to various open source projects such as Sheepdog, QEMU, Linux Kernel, Xen, Openstack, etc.
     • Top contributor to the Sheepdog project; co-maintainer with Kazutaka Morita of NTT Japan since January 2011
     • Technical lead of the Sheepdog-based storage projects used internally at www.taobao.com
     • Contacts
       • Email: namei.unix@gmail.com
       • Micro Blog: @淘泰来

  3. Agenda
     • Introduction - Sheepdog Overview
     • Exploration - Sheepdog Internals
     • Openstack - Sheepdog Goal
     • Roadmap - Features From The Future
     • Industry - How Industry Uses Sheepdog

  4. Sheepdog Overview Introduction

  5. What Is Sheepdog
     • Distributed object storage system in user space
     • Manages disks and nodes
       • Aggregates the capacity and the power (IOPS + throughput)
       • Hides hardware failures
       • Dynamically grows or shrinks the scale
     • Manages data
       • Provides redundancy mechanisms (replication and erasure coding) for high availability
       • Secures the data with auto-healing and auto-rebalancing mechanisms
     • Provides services
       • Virtual volumes for QEMU VMs and iSCSI TGT (fully supported upstream)
       • RESTful container (Openstack Swift and Amazon S3 compatible, in progress)
       • Storage for Openstack Cinder, Glance, Nova (available for Havana)
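     A minimal sketch of the volume service described above, using the dog and QEMU commands of that era (exact flags may differ by release):

       # format a freshly started cluster with 3 copies, create a 10G volume
       $ dog cluster format --copies=3
       $ dog vdi create vol0 10G

       # boot a QEMU VM directly from the Sheepdog volume
       $ qemu-system-x86_64 -m 1024 -drive file=sheepdog:vol0,if=virtio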

  6. Sheepdog Architecture
     • No meta servers!
     • Zookeeper: membership management and message queue
       • Node event: global rebalance
       • Disk event: local rebalance
     • Global consistent hash ring and P2P sheep daemons
     [Diagram: peer sheep daemons, each acting as gateway and store, administered by the dog tool; each node keeps a private hash ring over its weighted local disks (1TB, 2TB, 4TB); disks are hot-pluggable and auto-unplugged on EIO.]
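     A plausible bring-up under this architecture, as a hedged sketch (the Zookeeper endpoints are placeholders; sheep's -c flag selects the cluster driver):

       # on every node: join the cluster via Zookeeper, serve /var/lib/sheepdog
       $ sheep -c zookeeper:zk1:2181,zk2:2181,zk3:2181 /var/lib/sheepdog

       # check membership from any node
       $ dog node list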

  7. Why Sheepdog
     • Minimal assumptions about the underlying kernel and file system
       • Any type of file system that supports extended attributes (xattr)
       • Only requires kernel version >= 2.6.32
     • Full of features
       • Snapshot, clone, incremental backup, cluster-wide snapshot, discard, etc.
       • User-defined replication/erasure coding scheme on a per-VDI (Virtual Disk Image) basis
     • Auto node/disk management
       • Easy to set up a cluster with thousands of nodes
       • A single daemon can manage an unlimited number of disks in one node as efficiently as RAID0 (example below)
       • As many as 6k+ for a single cluster
     • Small
       • Fast and very small memory footprint (less than 50 MB even when busy)
       • Easy to hack and maintain, 35K lines of C code as of now
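     For the multi-disk point above, a hedged example of sheep's comma-separated path list, where the first path holds metadata and the rest are data disks (paths are illustrative):

       # one daemon, many disks, striped RAID0-style across /disk1../disk3
       $ sheep /var/lib/sheepdog,/disk1,/disk2,/disk3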

  8. Sheepdog Internals Exploration

  9. Sheepdog In A Nutshell
     [Diagram: a VM's /dev/vda is served over the network through a gateway; data is replicated or erasure coded across stores on weighted nodes and disks (1TB, 2TB); each store writes through a journal; a 256G SSD acts as a shared persistent cache.]

  10. Sheepdog Volume
     • Copy-on-write snapshot (see the sketch below)
       • Disk-only snapshot, disk & memory snapshot
       • Live snapshot, offline snapshot
       • Rollback (tree structure), clone
       • Incremental backup
       • Instant operation, only creates a 4M inode object
     • Push much of the logic into the client -> simple and fast code!
       • Only 4 opcodes for the store: read/write/create/remove; snapshot is done by the QEMU block driver or dog
       • Request serialization is handled by the client, not by Sheepdog
       • The inode object is treated the same as a data object
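     A short sketch of the snapshot and clone flow (names are placeholders; these are the documented dog and qemu-img forms of the time):

       # take a named snapshot, then clone a new volume from it
       $ dog vdi snapshot -s snap1 vol0
       $ dog vdi clone -s snap1 vol0 vol1

       # QEMU can snapshot the same volume through its block driver
       $ qemu-img snapshot -c snap2 sheepdog:vol0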

  11. Gateway - Request Engine
     [Diagram: the request manager queues client requests and handles them concurrently; the node ring routes each request over the network to the sheep stores (1TB-4TB nodes), reusing connections through a socket cache; a write returns only when all copies succeed; requests are retried on error (timeout, EIO); a sheep that hits EIO is degraded to gateway-only.]

  12. Store - Data Engine
     [Diagram: each store journals writes and places objects on its local disks (1TB, 2TB) via a private disk ring; on EIO the disk manager auto-unplugs the failed disk, fakes a network error so the gateway retries, updates the disk ring, and starts a local data rebalance.]

  13. Redundancy Scheme
     [Diagram: full replication stores complete copies of each object on several sheep; erasure coding stores data strips plus parity strips spread across sheep.]

  14. Erasure Coding Over Full Replication
     • Advantages
       • Far less storage overhead
       • Rumor breaking: better R/W performance, supports random R/W, can run VM images!
     • Disadvantages
       • Generates more traffic for recovery: (X + Y)/(Y + 1) times the data (suppose X data strips, Y parity strips), i.e. 2x the data for a 4:2 scheme versus 3 full copies
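     A hedged sketch of the per-VDI scheme choice: dog's -c option takes a copy count, and the x:y form selects erasure coding (flags may vary by release):

       # 3 full copies vs. 4 data strips + 2 parity strips, chosen per volume
       $ dog vdi create -c 3 vol-replicated 10G
       $ dog vdi create -c 4:2 vol-erasure 10G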

  15. Recovery - Redundancy Repair & Data Rebalance
     [Diagram: on a node event the recovery manager updates the ring version and schedules recovery; a lost object is read or rebuilt from another copy; objects migrate to other nodes for rebalance; requests touching objects under recovery wait on a sleep queue.]

  16. Recovery Cont.
     • Eager recovery by default
       • Allows users to stop it and do manual recovery temporarily
     • Node/disk event handling
       • Disk events and node events share the same algorithm
       • Handles mixed node and disk events nicely: a subsequent event supersedes the previous one
       • Handles group join/leave of disks and nodes gracefully
     • Recovery handled transparently to the client
       • Requests for objects being recovered are put on a sleep queue and woken up later
       • A request is served directly if the object is right there in the store
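     A sketch of the manual-recovery switch mentioned above, assuming the dog cluster recover subcommand of that era:

       # pause eager recovery during planned maintenance, then resume
       $ dog cluster recover disable
       $ dog cluster recover enable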

  17. Farm - Cluster Wide Snapshot
     [Diagram: sheep push snapshot data over the network to dedicated backup storage, where a slicer cuts it into 128K chunks, a hash-based dedupper drops duplicates, and a metadata generator records the layout.]
     • Incremental backup
     • Up to 50% dedup ratio
     • Compression doesn't help
     • Think of Sheepdog on Sheepdog? Yeah!
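     A hedged usage sketch, assuming the dog cluster snapshot subcommands of that era (the tag and path are illustrative):

       # save an incremental cluster-wide snapshot to backup storage, list, restore
       $ dog cluster snapshot save daily-1 /mnt/backup
       $ dog cluster snapshot list /mnt/backup
       $ dog cluster snapshot load daily-1 /mnt/backup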

  18. Sheepdog Goal Openstack

  19. Openstack Storage Components
     • Cinder - Block Storage: supported since day 1
     • Glance - Image Storage: support merged in the Havana version
     • Nova - Ephemeral Storage: not yet started
     • Swift - Object Storage: Swift API compatibility in progress
     • Final Goal - Unified Storage: copy-on-write anywhere? data dedup?
     [Diagram: Swift, Cinder, Nova, and Glance all backed by one unified storage pool of sheep.]
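     A minimal configuration sketch for the Cinder and Glance support above (option names as of the Havana-era drivers; values are placeholders, so verify against your release):

       # cinder.conf: use the Sheepdog volume driver
       volume_driver = cinder.volume.drivers.sheepdog.SheepdogDriver

       # glance-api.conf: store images in Sheepdog (merged at Havana)
       default_store = sheepdog
       sheepdog_store_address = 127.0.0.1
       sheepdog_store_port = 7000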

  20. Features From The Future Roadmap

  21. Look Into The Future
     • RESTful container
       • Plan is to be Openstack Swift API compatible first, coming soon
     • Hyper volume
       • 256PB volume, coming soon
     • Geo-replication
     • Sheepdog on Sheepdog
       • Storage for cluster-wide snapshots
     • Slow disk & broken disk detector
       • Deal with processes hanging in the dead D state because of broken disks in massive deployments

  22. How Industry Uses Sheepdog Industry

  23. Sheepdog In Taobao & NTT
     [Diagram: VMs running on top of sheepdog (SD) nodes; an HTTP-fronted pool; a LUN device pool exported over iSCSI.]
     • VMs running inside a Sheepdog cluster for test & dev at Taobao
     • Ongoing project with 10k+ ARM nodes for cold data at Taobao
     • Sheepdog cluster running as iSCSI TGT backend storage at NTT

  24. Other Users In Production
     Any more users I don't know of?

  25. Go Sheepdog! Q & A
     Homepage: http://sheepdog.github.io/sheepdog/
     Try me out: git clone git://github.com/sheepdog/sheepdog.git
