NCCloud : A Network-Coding-Based Storage System in a Cloud-of-Clouds

NCCloud: A Network-Coding-Based Storage System in a Cloud-of-Clouds • Henry C. H. Chen, Yuchong Hu, Patrick P. C. Lee, Yang Tang • IEEE Transactions on Computers, 15 August 2013

Outline • Introduction • Repair in Multiple Cloud Storage • FMSR Codes • NCCloud • Conclusion

Introduction • Cloud storage provides an on-demand remote backup solution. • Relying on a single cloud storage provider leads to problems such as a single point of failure.

Introduction • The general solution is to stripe data across different cloud providers. • The diversity of multiple clouds improves fault tolerance.

Introduction-Data Failure • This paper focuses on unexpected permanent cloud failure. • When a cloud fails permanently, repair is activated • to maintain data redundancy and fault tolerance. • A repair operation • retrieves data from the surviving clouds. • reconstructs the lost data in a new cloud.

Introduction-Data Failure • During repair, each surviving node • encodes its stored data chunks. • sends the encoded chunks to a new node, • which regenerates the lost data.

Introduction-Cost Problem • Today's cloud storage providers charge users for outbound data. • When repairing failures, moving large amounts of data (repair traffic) can incur significant monetary costs.

Introduction-Repair Traffic Problem • To minimize repair traffic, regenerating codes [16] have been proposed. • They store data redundantly in a distributed storage system. • They require less repair traffic while keeping the same fault-tolerance level. [16] Network Coding for Distributed Storage Systems

Introduction-Regenerating Codes • However, most existing regenerating codes require storage nodes to • be equipped with computation capabilities. • perform encoding operations during repair.

Introduction-Regenerating Codes • To make regenerating codes portable to any cloud storage service, • this paper considers only a thin-cloud interface, where storage nodes support only read/write operations.

Introduction-NCCloud • In this paper, we present the design and implementation of NCCloud, • a proxy-based storage system • that provides fault-tolerant storage • over multiple cloud storage providers.

Introduction-FMSR • On top of NCCloud, we propose functional minimum-storage regenerating (FMSR) codes. • The FMSR code implementation • maintains double-fault tolerance. • has the same storage cost as RAID-6. • incurs less repair traffic when recovering from a single-cloud failure.

Introduction-FMSR • FMSR codes are non-systematic: • the code chunks are formed by linear combinations of the original native chunks. • the original native chunks are not kept, unlike in systematic coding schemes.

Repair in Multiple Cloud Storage • Transient failure • is short-term, such that the failed cloud will return to normal after some time and no outsourced data is lost.

Repair in Multiple Cloud Storage • Permanent failure • is long-term, in the sense that the outsourced data on a failed cloud becomes permanently unavailable. • Examples: • data center outages in disasters. • data loss and corruption. • malicious attacks.

Outline • Introduction • Repair in Multiple Cloud Storage • FMSR Codes • Motivation • Implementation • NCCloud • Conclusion

Motivation • This paper considers • distributed • multiple-cloud storage • data is striped • proxy-based design

Motivation

Fault Tolerance • Maximum Distance Separable (MDS) property • An (n, k)-MDS code • divides a file into equal-size native chunks. • linearly combines them to form code chunks. • distributes the code chunks over n (> k) nodes. • can reconstruct the original file from any k of the n nodes. • tolerates the failures of any n − k nodes.
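To make the MDS property concrete, here is a minimal sketch of one classic (n, k)-MDS construction: a Reed-Solomon-style code built from a Vandermonde matrix. It is not the FMSR code. The prime field GF(257), the simplification of one symbol per chunk, and the names `encode`, `solve_mod_p`, and `decode` are illustrative assumptions. It encodes k = 2 native chunks onto n = 4 nodes and checks that every choice of k nodes recovers the data.

```python
import itertools

P = 257  # illustrative prime field GF(257); an assumption, not NCCloud's field


def encode(native, n):
    """Encode k native symbols into n code symbols with a Vandermonde matrix:
    node i stores the polynomial with coefficients `native` evaluated at x = i + 1."""
    return [sum(c * pow(i + 1, j, P) for j, c in enumerate(native)) % P
            for i in range(n)]


def solve_mod_p(a, b):
    """Solve a x = b over GF(P) by Gauss-Jordan elimination (a must be invertible)."""
    m = len(a)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], P - 2, P)  # Fermat inverse in a prime field
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * w) % P for v, w in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def decode(nodes, symbols, k):
    """Recover the k native symbols from the symbols held by any k nodes."""
    a = [[pow(i + 1, j, P) for j in range(k)] for i in nodes]
    return solve_mod_p(a, symbols)


native = [42, 7]          # k = 2 native chunks (one symbol each)
code = encode(native, 4)  # n = 4 nodes, one code chunk each
for subset in itertools.combinations(range(4), 2):
    assert decode(list(subset), [code[i] for i in subset], 2) == native
```

Any k distinct evaluation points give an invertible Vandermonde matrix, so every k-node subset decodes; that is exactly what lets the code tolerate the loss of any n − k nodes.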

Fault Tolerance • FMSR codes regenerate the data of a failed node from the surviving nodes • downloading less data. • without reconstructing the whole file.

Different Coding Schemes (file size M) • (a) storage size 2M, repair traffic M • (b) storage size 2M, repair traffic 0.75M • (c) storage size 2M, repair traffic 0.75M

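Double-fault Tolerant FMSR Codes • divide a file of size $M$ into $2(n-2)$ native chunks. • generate $2n$ code chunks. • each node stores two code chunks, each of size $\frac{M}{2(n-2)}$, so the total storage size is $\frac{nM}{n-2}$. • to repair a failed node, download one code chunk from each of the $n-1$ surviving nodes, so the repair traffic is $\frac{(n-1)M}{2(n-2)}$. • RAID-6 codes: total storage size is $\frac{nM}{n-2}$, repair traffic is $M$. • FMSR saves 25% of the repair traffic when $n = 4$, approaching 50% as $n$ grows large.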
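The formulas above can be checked numerically. This short sketch (the function names are hypothetical) evaluates storage and repair traffic for double-fault tolerant FMSR codes and RAID-6 with exact fractions, in units of the file size M.

```python
from fractions import Fraction


def fmsr_double_fault(n, M=Fraction(1)):
    """Total storage and single-node repair traffic of double-fault tolerant
    FMSR codes (k = n - 2), for a file of size M."""
    chunk = M / (2 * (n - 2))   # 2(n-2) equal-size native chunks
    storage = 2 * n * chunk     # 2n code chunks, two per node
    repair = (n - 1) * chunk    # one code chunk from each of n-1 survivors
    return storage, repair


def raid6(n, M=Fraction(1)):
    """RAID-6 stores n/(n-2) of M and downloads M to repair one node."""
    return M * n / (n - 2), M


for n in (4, 6, 10, 20):
    fs, fr = fmsr_double_fault(n)
    rs, rr = raid6(n)
    saving = 1 - fr / rr
    print(f"n={n:2d} storage FMSR={float(fs):.3f}M RAID-6={float(rs):.3f}M "
          f"repair FMSR={float(fr):.4f}M RAID-6={float(rr):.0f}M "
          f"saving={float(saving):.1%}")
```

For n = 4 it reproduces 2M of storage and 0.75M of FMSR repair traffic against M for RAID-6 (a 25% saving); the saving is 43.75% at n = 10 and keeps approaching 50% as n increases.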


FMSR Codes Implementation • FMSR codes do not require lost chunks to be exactly reconstructed: • the regenerated chunks need not be identical to those in the failed node, • as long as the MDS property holds.

FMSR Codes Implementation • This paper proposes a two-phase checking scheme to ensure that the code chunks on all nodes always satisfy the MDS property.

FMSR Codes Implementation • The implementation assumes a thin-cloud interface and supports three operations: • File upload • File download • Repair

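File Upload • Native chunks: $F_1, F_2, \ldots, F_{k(n-k)}$ • Code chunks: $P_1, P_2, \ldots, P_{n(n-k)}$ • Encoding matrix of coefficients: $EM = [\alpha_{ij}]$ • size $n(n-k) \times k(n-k)$ • coefficients in the Galois field $GF(p^n)$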

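File Upload • Each code chunk is a linear combination of the native chunks over the Galois field $GF(p^n)$: $P_i = \sum_{j=1}^{k(n-k)} \alpha_{ij} F_j$ • Encoding coefficient vector (ECV): row $i$ of $EM$, i.e., the coefficients that produce $P_i$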
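The upload step can be sketched as follows. This is an illustration under stated assumptions, not NCCloud's implementation: arithmetic is done in the prime field GF(257) rather than the slide's GF(p^n), each chunk is a list of field symbols, and `fmsr_upload` is a hypothetical name. A randomly drawn EM satisfies the MDS property only with high probability, so a real upload would still need to check it; this sketch skips that check.

```python
import random

P = 257  # illustrative prime field GF(257); an assumption, not the slide's GF(p^n)


def fmsr_upload(data, n, k, rng):
    """Split `data` (ints < P) into k(n-k) native chunks, draw a random
    n(n-k) x k(n-k) encoding matrix EM, and return (EM, per-node code chunks)."""
    num_native = k * (n - k)
    size = -(-len(data) // num_native)  # ceiling division: symbols per chunk
    padded = data + [0] * (size * num_native - len(data))
    native = [padded[i * size:(i + 1) * size] for i in range(num_native)]
    # Each row of EM is the ECV of one code chunk: P_i = sum_j EM[i][j] * F_j.
    em = [[rng.randrange(1, P) for _ in range(num_native)]
          for _ in range(n * (n - k))]
    code = [[sum(a * chunk[pos] for a, chunk in zip(ecv, native)) % P
             for pos in range(size)] for ecv in em]
    # Node i stores the n-k code chunks i(n-k) .. i(n-k) + (n-k) - 1.
    nodes = [code[i * (n - k):(i + 1) * (n - k)] for i in range(n)]
    return em, nodes


em, nodes = fmsr_upload(list(b"hello, cloud-of-clouds"), n=4, k=2,
                        rng=random.Random(1))
print(len(em), len(em[0]), [len(node) for node in nodes])  # 8 4 [2, 2, 2, 2]
```

With n = 4 and k = 2, EM is 8 × 4: eight code chunks built from four native chunks, two code chunks per node.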

File Download • Download the k(n−k) code chunks from any k of the n storage nodes. • The ECVs of these k(n−k) code chunks form a k(n−k)×k(n−k) square matrix, which is invertible when the MDS property holds. • Obtain the original k(n−k) native chunks • by multiplying the inverse of the square matrix with the code chunks.
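A matching sketch of the download step, under the same assumptions (GF(257) instead of GF(p^n), hypothetical helper names). Instead of explicitly inverting the square ECV matrix, it solves the linear system by Gauss-Jordan elimination, which yields the same native chunks. It returns `None` when the chosen nodes' ECVs are singular, which can happen because the demo EM is random and not MDS-checked.

```python
import itertools
import random

P = 257  # illustrative prime field GF(257); an assumption


def solve_mod_p(a, b):
    """Solve A X = B over GF(P) by Gauss-Jordan elimination.
    Each row of B is a whole code chunk, so each row of X is a native chunk.
    Returns None if A is singular."""
    m = len(a)
    aug = [list(row) + list(rhs) for row, rhs in zip(a, b)]
    for col in range(m):
        pivot = next((r for r in range(col, m) if aug[r][col]), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], P - 2, P)  # Fermat inverse in a prime field
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * w) % P for v, w in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


def fmsr_download(em, nodes, chosen, n, k):
    """Rebuild the k(n-k) native chunks from the code chunks on the chosen k nodes."""
    rows = [i * (n - k) + j for i in chosen for j in range(n - k)]
    square = [em[r] for r in rows]  # k(n-k) x k(n-k) matrix of ECVs
    chunks = [nodes[r // (n - k)][r % (n - k)] for r in rows]
    return solve_mod_p(square, chunks)  # same result as inverse(square) x chunks


# Demo: n = 4 nodes, k = 2, chunks of 5 symbols, random EM (not MDS-checked).
n, k = 4, 2
rng = random.Random(7)
native = [[rng.randrange(256) for _ in range(5)] for _ in range(k * (n - k))]
em = [[rng.randrange(1, P) for _ in range(k * (n - k))] for _ in range(n * (n - k))]
code = [[sum(a * c[pos] for a, c in zip(ecv, native)) % P for pos in range(5)]
        for ecv in em]
nodes = [code[i * (n - k):(i + 1) * (n - k)] for i in range(n)]
results = [fmsr_download(em, nodes, chosen, n, k)
           for chosen in itertools.combinations(range(n), k)]
```

Every non-`None` entry of `results` equals `native`; in a real system the MDS check guarantees that no entry is `None`.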

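Iterative Repair • The MDS property must hold even after iterative repairs. • This paper proposes a two-phase checking that verifies the • MDS property • repair-based MDS (rMDS) property, i.e., the MDS property still holds after any possible next round of repair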

Satisfy MDS, but not rMDS

Iterative Repair Step 1. Download the encoding matrix from a surviving node. Step 2. Select one ECV from each of the n−1 surviving nodes. Step 3. Generate a repair matrix $RM$ of size $(n-k) \times (n-1)$ with random coefficients. Step 4. Compute the ECVs for the $n-k$ new code chunks (the combinations of the selected ECVs given by $RM$) and produce a new encoding matrix $EM'$.

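Iterative Repair Step 5. Given $EM'$, verify whether both properties are satisfied. • verify MDS by enumerating all $\binom{n}{k}$ subsets of $k$ nodes. • verify rMDS by enumerating all $n(n-k)^{n-1}$ ways of choosing a failed node and one ECV from each of its $n-1$ surviving nodes. • in each case, the corresponding encoding matrices must have full rank. Step 6. Download the actual chunk data and regenerate the new chunk data from • the new ECVs computed in Step 4 • the code chunks downloaded from the surviving nodes.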
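Steps 2 to 6 can be sketched as follows, again over GF(257) with hypothetical names (`rank_mod_p`, `mds_holds`, `plan_repair`, `regenerate`). It retries random ECV selections and repair matrices until EM′ passes the MDS check. The rMDS check of Step 5 is deliberately omitted to keep the example short, so this covers only the first phase of the two-phase checking. Only ECVs (metadata) are manipulated until a plan is accepted; Step 6 then combines chunks downloaded from the survivors, so the surviving clouds only serve reads.

```python
import itertools
import random

P = 257  # illustrative prime field GF(257); an assumption


def rank_mod_p(rows):
    """Rank of a matrix over GF(P) via Gaussian elimination."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], P - 2, P)
        m[rank] = [v * inv % P for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(v - f * w) % P for v, w in zip(m[r], m[rank])]
        rank += 1
    return rank


def mds_holds(em, n, k):
    """MDS check: the ECVs of every k-node subset must have full rank k(n-k)."""
    for subset in itertools.combinations(range(n), k):
        rows = [em[i * (n - k) + j] for i in subset for j in range(n - k)]
        if rank_mod_p(rows) < k * (n - k):
            return False
    return True


def plan_repair(em, failed, n, k, rng, max_tries=100):
    """Steps 2-5 on metadata only; the rMDS check is omitted in this sketch."""
    survivors = [i for i in range(n) if i != failed]
    for _ in range(max_tries):
        picks = [i * (n - k) + rng.randrange(n - k) for i in survivors]   # step 2
        rm = [[rng.randrange(1, P) for _ in survivors] for _ in range(n - k)]  # step 3
        new_ecvs = [[sum(g * em[p][c] for g, p in zip(row, picks)) % P
                     for c in range(len(em[0]))] for row in rm]           # step 4
        em2 = [r[:] for r in em]
        em2[failed * (n - k):(failed + 1) * (n - k)] = new_ecvs
        if mds_holds(em2, n, k):                                          # step 5
            return em2, picks, rm
    raise RuntimeError("no repair plan satisfied MDS")


def regenerate(rm, picked_chunks):
    """Step 6: combine the downloaded chunks with RM to build the new code chunks."""
    size = len(picked_chunks[0])
    return [[sum(g * ch[pos] for g, ch in zip(row, picked_chunks)) % P
             for pos in range(size)] for row in rm]


# Demo: n = 4, k = 2; start from an EM that satisfies MDS, then repair node 0.
n, k = 4, 2
rng = random.Random(3)
em = [[rng.randrange(1, P) for _ in range(k * (n - k))] for _ in range(n * (n - k))]
while not mds_holds(em, n, k):
    em = [[rng.randrange(1, P) for _ in range(k * (n - k))] for _ in range(n * (n - k))]
native = [[rng.randrange(256) for _ in range(3)] for _ in range(k * (n - k))]
code = [[sum(a * c[pos] for a, c in zip(ecv, native)) % P for pos in range(3)]
        for ecv in em]
em2, picks, rm = plan_repair(em, failed=0, n=n, k=k, rng=rng)
new_chunks = regenerate(rm, [code[p] for p in picks])
```

Because encoding is linear, the regenerated chunks are exactly what the new ECVs in EM′ would produce from the native chunks, even though they differ from the lost ones; this is the "functional" part of FMSR.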

rMDS Sustaining

Time of Two-phase Checking

Double-fault Tolerant Codes • Markov Model

MTTDL (Mean Time To Data Loss) Compared to RAID-6

NCCloud • A proxy that bridges user applications and multiple clouds. • Its design is built on three layers. • File system layer • Coding layer • Storage layer

NCCloud • It is mainly implemented in Python, while the coding schemes are implemented in C for better efficiency.

Goal of NCCloud • Compare the costs and response time of RAID-6 and FMSR codes. • Show the cost advantage of FMSR over RAID-6 while maintaining acceptable response time.

Goal of NCCloud • Normal operations • RAID-6 and FMSR incur similar storage costs. • Repair operation • FMSR saves a significant amount of transfer costs over RAID-6.

Cost Saving-Price

Cost Saving • Normal operations (1.25PB of data stored) • FMSR: $86,851 monthly storage cost • RAID-6: $86,851 monthly storage cost • Repair operation • RAID-6: 1PB of repair traffic, $56,832 • FMSR: 0.5625PB of repair traffic, $33,894 • Saving of $22,938
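The data volumes above follow from the double-fault tolerant FMSR formulas. The sketch below assumes n = 10 clouds and 1PB of original data; n = 10 is an inference (it is the value for which the formulas yield 1.25PB stored and 0.5625PB of FMSR repair traffic), not a figure stated on the slide. Prices are not modeled; only the quoted dollar figures are subtracted.

```python
from fractions import Fraction

n = 10            # assumed number of clouds (inferred from the volumes, not stated)
M = Fraction(1)   # original data size in PB

total_stored = M * n / (n - 2)              # storage for double-fault tolerance
raid6_repair = M                            # RAID-6 downloads the whole file
fmsr_repair = (n - 1) * M / (2 * (n - 2))   # one chunk from each of n-1 survivors

print(float(total_stored), float(raid6_repair), float(fmsr_repair))  # 1.25 1.0 0.5625
print(56_832 - 33_894)  # quoted repair costs, RAID-6 minus FMSR: 22938
```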

Response Time-Local Cloud

Response Time-Commercial Cloud

Conclusion • This paper presents NCCloud, which improves the reliability of today's cloud backup storage. • It is a proxy-based • multiple-cloud storage system. • NCCloud not only provides fault tolerance in storage, but also allows cost-effective repair. • The FMSR code implementation eliminates the encoding requirement of storage nodes during repair.
