
Iris: A Scalable Cloud File System with Efficient Integrity Checks

Presentation Transcript


  1. Iris: A Scalable Cloud File System with Efficient Integrity Checks

  2. Cloud Storage • Consumer and enterprise services: Dropbox, Amazon S3, EBS, Windows Azure Storage, SkyDrive, EMC Atmos, Mozy, iCloud, Google Storage • Can you trust the cloud? • Infrastructure bugs • Malware • Disgruntled employees

  3. Iris File System • Integrity verification (on the fly) • value read == value written (integrity) • value read == last value written (freshness) • data & metadata • Proof of Retrievability (PoR/PDP) • Verifies: ALL of the data is on the cloud or recoverable • More on this later • High performance (low overhead) • Hundreds of MB/s data rates • Designed for enterprises

  4. Iris Deployment Scenario • Cloud: heavyweight, TBs to PBs of data • Enterprise: lightweight, 1 to 5 portals • Portal(s): distributed appliances between the clients and the cloud

  5. Overview: File System Tree • Most file systems have a file-system tree. • Contains: • Directory structure • File names • Timestamps • Permissions • Other attributes • Efficiently laid out on disk (e.g., using a B-tree)

  6. Overview: Merkle Trees • Parents contain the hash of their children. • To verify that an element (e.g., “y”) is in the tree, only the nodes on the path from the element to the root, plus their siblings, are accessed.
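
The path check described above can be made concrete. Below is a minimal sketch (ours, not the authors' code), assuming SHA-256 and a list of sibling hashes ordered from the leaf level up to the root:

    import hashlib

    def h(data: bytes) -> bytes:
        # Node hash; SHA-256 here, though Iris's actual hash function may differ.
        return hashlib.sha256(data).digest()

    def verify_path(leaf: bytes, siblings: list[tuple[bytes, str]], trusted_root: bytes) -> bool:
        """Recompute the root from a leaf and its sibling hashes.

        siblings: (hash, side) pairs from the leaf level up to the root,
        with side == 'L' when the sibling is the left child.
        """
        node = h(leaf)
        for sib, side in siblings:
            node = h(sib + node) if side == 'L' else h(node + sib)
        return node == trusted_root

Only the logarithmically many nodes along this path (and their siblings) are touched, which is what makes per-operation verification affordable.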

  7. Iris: Unified File System + Merkle Tree • The file system tree is also a Merkle tree • Binary, with balancing nodes • Directory tree • Root node: directory attributes • Leaves: subdirectories and files • File version tree • Root node: file attributes • Leaves: file block version numbers • Free list: stores deleted subtrees • [Figure: directory tree under /u/, with a file version tree per file and the file blocks beneath it]
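
One way to picture the unified structure is the node layout below. This is a hedged sketch with field names of our own choosing, not the on-disk format used by Iris:

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Node:
        """A node in the unified tree: every parent stores its children's hashes,
        so the structure doubles as a Merkle tree (field names are illustrative)."""
        child_hashes: List[bytes] = field(default_factory=list)

    @dataclass
    class DirNode(Node):
        """Directory-tree node: the root of a directory carries its attributes;
        leaves point at subdirectories or at a file's version tree."""
        name: str = ""
        attributes: dict = field(default_factory=dict)
        children: List["Node"] = field(default_factory=list)

    @dataclass
    class VersionNode(Node):
        """File version-tree node: the root carries file attributes;
        leaves hold version numbers of the file's blocks."""
        version: int = 0
        attributes: dict = field(default_factory=dict)
        left: Optional["VersionNode"] = None
        right: Optional["VersionNode"] = None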

  8. File Version Tree • Each file has a version tree • Version numbers increase when blocks are modified • Version numbers propagate upward to the version tree root • [Figure: version tree over blocks 0:7; a write raises the affected leaf versions from v0 to v1 and the change propagates up to the root]

  9. File Version Tree • The process repeats for every write • Version numbers are unique after each write • Helps ensure freshness • [Figure: the same tree after another write; the affected versions advance from v1 to v2]
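
The propagation rule on slides 8-9 can be sketched over an array-backed complete binary tree (an illustration, not the authors' implementation): a write bumps the leaf's version, and every ancestor keeps the maximum version of its children.

    def bump_version(versions: list[int], leaf: int, num_leaves: int) -> None:
        """Increment a leaf's version and propagate it toward the root.

        versions: array-backed complete binary tree (1-indexed); the leaves
        occupy indices num_leaves .. 2*num_leaves - 1.
        """
        i = num_leaves + leaf
        versions[i] += 1
        i //= 2
        while i >= 1:
            versions[i] = max(versions[2 * i], versions[2 * i + 1])
            i //= 2

    # Example: 8 blocks, all at version 0. Writing block 4 raises the versions
    # of the leaf for block 4 and of every ancestor covering it, up to the root.
    vers = [0] * 16
    bump_version(vers, 4, 8)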

  10. Integrity Verification: MACs • For each file, Iris generates a MAC file, later used to verify the integrity of data blocks • Data is handled in 4 KB blocks • Each MAC is computed over: file id, block index, version number, block data • mi = MAC(fid, i, vi, bi), a 20-byte tag per block
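
A hedged sketch of the per-block tag, using HMAC-SHA-1 to match the 20-byte MACs on the slide; the actual key handling and field encoding in Iris may differ:

    import hashlib
    import hmac
    import struct

    def block_mac(key: bytes, fid: int, index: int, version: int, block: bytes) -> bytes:
        """m_i = MAC(fid, i, v_i, b_i): a 20-byte tag per 4 KB block."""
        header = struct.pack(">QQQ", fid, index, version)  # illustrative encoding
        return hmac.new(key, header + block, hashlib.sha1).digest()

Binding the block index and version number into the tag is what ties each MAC to the version tree, so a stale or relocated block cannot pass verification.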

  11. Merkle Tree Efficiency • Many FS operations access paths in the tree • Inefficient to access one path at a time • Paths share ancestor nodes • Accessing same nodes over and over • Unnecessary I/O • Redundant Merkle tree crypto • Latency bound • Accessing paths in parallel? • Naïve techniques can lead to corruption • Same ancestor node accessed in separate threads • Need a Merkle tree cache • Very important part of our system

  12. Merkle Tree Cache Challenges • Nodes depend on each other • Parents contain hashes of children • Cannot evict a parent before its child • Asynchronous • Inefficient: one thread per node/path • Avoid unnecessary hashing • Nodes near the root of the tree are often reused • Efficient sequential file operations • Inefficient: accessing a path per block incurs logarithmic overhead • Adjacent nodes must stay “long enough” in the cache.

  13. Merkle Tree Cache • Nodes are read into the tree in parallel. • [Cache state diagram: Reading → To Verify → Verifying → Pinned → Unpinned → Compacting → Updating Hash → Ready to Write → Writing]

  14. Reading a Path • Example: path “/u/v/b” • [Figure: the lookup walks the directory tree, then the file version tree, then reaches the data file and its MAC file]

  15. Merkle Tree Cache • When both siblings arrive, they are verified. • Top-down verification: a parent is verified before its children. • [Cache state diagram]

  16. Verification • [Figure: verification proceeds down the tree; a parent's stored hashes are checked against its children before those children are used]

  17. Merkle Tree Cache • Verified nodes enter the “pinned” state. • Pinned nodes cannot be evicted. • Pinned nodes are used by asynchronous file system operations. • While used by at least one operation, a node remains pinned. • [Cache state diagram]

  18. Merkle Tree Cache • When a node is no longer used, it becomes “unpinned”. • Unpinned nodes are eligible for eviction. • When the cache is 75% full, eviction begins. • [Cache state diagram]

  19. Merkle Tree Cache • Eviction Step #1: Adjacent nodes with identical version numbers are compacted. • [Cache state diagram]

  20. Compacting • Keep a node if: its version ≠ its parent's version, or it is needed for balancing • Redundant information is stripped out • Often files are written sequentially and the version tree compacts to a single node • [Figure: before/after compaction of a version tree over blocks 0:15; only nodes whose versions differ from their parent's remain]
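
The compaction rule can be sketched as a post-order pass (an illustration of the stated rule, not the production code): a node is folded away when its version matches its parent's and it carries no other information.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class VNode:
        version: int
        left: Optional["VNode"] = None
        right: Optional["VNode"] = None
        keep_for_balancing: bool = False  # set by the balancing logic (not shown)

    def compact(node: Optional[VNode], parent_version: Optional[int] = None) -> Optional[VNode]:
        """Drop subtrees whose version matches the parent's, unless needed for balancing."""
        if node is None:
            return None
        node.left = compact(node.left, node.version)
        node.right = compact(node.right, node.version)
        redundant = (parent_version is not None
                     and node.version == parent_version
                     and node.left is None and node.right is None
                     and not node.keep_for_balancing)
        return None if redundant else node

For a file written sequentially, every leaf ends up at the same version as the root, so the whole version tree collapses to a single node, as noted on the slide.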

  21. Merkle Tree Cache • Eviction Step #2: Hashes are then updated in bottom-up order. • [Cache state diagram]

  22. Merkle Tree Cache • Eviction Step #3: Nodes are written to cloud storage. • [Cache state diagram]

  23. Merkle Tree Cache • Note: a node can be pinned at any time during eviction. • The path to the node becomes “pinned”. • [Cache state diagram]
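
The lifecycle on slides 13-23 can be summarized as a small state machine. The state names below mirror the slide labels, the 75% threshold comes from slide 18, and everything else is our own sketch:

    from enum import Enum, auto

    class NodeState(Enum):
        READING = auto()         # being fetched from cloud storage
        TO_VERIFY = auto()       # waiting for its sibling before verification
        VERIFYING = auto()       # checked top-down against its parent's hash
        PINNED = auto()          # verified and in use by at least one operation
        UNPINNED = auto()        # no longer in use; eligible for eviction
        COMPACTING = auto()      # eviction step 1: merge nodes with identical versions
        UPDATING_HASH = auto()   # eviction step 2: recompute hashes bottom-up
        READY_TO_WRITE = auto()  # eviction step 3: queued for write-back
        WRITING = auto()         # being written to cloud storage

    EVICTION_THRESHOLD = 0.75    # eviction begins when the cache is 75% full

    def should_evict(used_bytes: int, capacity_bytes: int) -> bool:
        return used_bytes >= EVICTION_THRESHOLD * capacity_bytes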

  24. Merkle Tree Cache: Crucial for Real-World Workloads • Iris benefits from locality • A very small cache is enough to achieve high throughput • Cache size: 5 MB to 10 MB

  25. Sequential Workloads • Results • 250 to 300 MB/s • 100+ clients • Cache • Minimal cache size ( < 1 MB ) to achieve high throughput • Reason: Nodes get compacted • Usually network bound

  26. Random Workloads • Results • Bound by disk seeks • Cache • Minimal cache size ( < 1 MB ) to achieve seek-bound throughput • Cache only used to achieve parallelism to combat latency. • Reason: Very little locality.

  27. Other Workloads • Performance is very workload dependent • Specifically, it depends on the number of seeks • Iris is designed to reduce Merkle tree seek overhead via: • Compacting • The Merkle tree cache

  28. Proofs of Retrievability • How can we be sure our data is still there? • Iris continuously verifies that the cloud possesses all data • First sublinear solution to the open problem of Dynamic Proofs of Retrievability

  29. Proofs of Retrievability • Iris verifies that the cloud possesses 99.9% of the data (with high probability). • The remaining 0.1% can be recovered using Iris's parity data structure. • Custom-designed error-correcting code (ECC) and parity data structure. • High throughput (300-550 MB/s).

  30. ECC Challenges • Update efficiency • Want high-throughput file system • On-the-fly • ECC should not be a bottleneck • Reed–Solomon codes are too slow. • Hiding code structure • Adversary should not know which blocks to corrupt to make ECC fail. • Adversarially-secure ECC • Variable-length encoding • Handles: blocks, file attributes, Merkle tree nodes, etc

  31. Iris Error Correcting Code • Blocks of the file system are mapped pseudorandomly to positions (stripe, offset) in the ECC parity stripes. • The cloud does not know the key, so it cannot determine which 0.1% subset of data to corrupt to make the ECC fail. • [Figure: pseudorandom error-correcting-code mapping from file system positions to the corresponding parities]
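
The keyed mapping can be sketched with a pseudorandom function (HMAC-SHA-256 is used here as the PRF; the construction actually used by Iris may differ). The point is that, without the key, the cloud cannot tell which parity stripe protects a given block:

    import hashlib
    import hmac

    def parity_location(key: bytes, position: int, num_stripes: int, stripe_len: int) -> tuple[int, int]:
        """Map a file-system position to a (stripe, offset) pair pseudorandomly.

        The mapping is keyed, so an adversary without the key cannot pick a
        0.1% subset of blocks whose corruption would defeat the ECC.
        """
        digest = hmac.new(key, position.to_bytes(8, "big"), hashlib.sha256).digest()
        value = int.from_bytes(digest, "big")
        stripe = value % num_stripes
        offset = (value // num_stripes) % stripe_len
        return stripe, offset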

  32. Iris Error Correcting Code • Memory: • Update time: • Verification time: • Amortized cost • [Figure: the same mapping from file system positions to parity stripes]

  33. ECC Update Efficiency • Very fast • 300-550 MB/s • Not a bottleneck in Iris

  34. Conclusion • Presented the Iris file system • Integrity • Proofs of retrievability / data possession • On the fly • Very practical • Overall system throughput • 250-300 MB/s per portal • Scales to enterprises
