EMFS: Email-based Personal Cloud Storage
NAS 2011
Jagan Srinivasan, Wei Wei, Xiaosong Ma, Ting Yu
Agenda
• Introduction
• Data Organization and Access
• Email-based File System Design
• Performance Evaluation
• Related Work
• Conclusion
Motivation
Existing personal cloud storage services
• Tie storage to internal data formats and processing applications
• General-purpose storage is not free and not widely used
Existing email services
• The capacity of a single email account has increased dramatically
• Provided by many reliable and reputable online service providers
Leveraging existing email services
• Benefits service providers by extending their access to valuable customer data
EMFS Overview
Target Workload and Assumptions
• Typical personal workload: reading, editing, and backing up documents such as Word and PDF files
• Targets file sizes ranging from several KBs to tens of MBs
• Users will not share storage with others or allow concurrent access to their data
Design Goals
• Usability (generic file system interface)
• Scalability (extensible personal storage space)
• Reliability (access despite single email failure)
EMFS System Architecture
[Figure: architecture diagram. Labels: Memory Cache; Email File System Interface through FUSE; Email Mapping Service; Local Cache; Email Cloud Storage Interface; striping and replication across email accounts]
Data Organization and Access
File Organization
• Metadata
• File data stored as attachments or in the body of emails
Data Organization and Access cont’d
Metadata and Data Access
• Client cache management
• Metadata update
• Data access operations
Consistency and Failure Recovery
• Adopts a mechanism to ensure the atomicity of updates
[Figure: failure cases. (a) Lost metadata update; (b) Lost part of data update]
Email Protocol Selection
Simple Mail Transfer Protocol (SMTP)
• Only used for transferring emails to the server
• Providers restrict the number of messages sent through SMTP
Internet Message Access Protocol (IMAP)
• Supports both storing and retrieving messages
• Allows users to “append” a message directly to their own mailbox
• Not limited by traffic restrictions
Post Office Protocol (POP)
• Primarily used for retrieving emails
• Supports a simple download-and-delete access pattern
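The IMAP “append” path can be sketched with Python's standard imaplib module. This is a minimal sketch, not the EMFS implementation: the function name `append_to_mailbox`, the mailbox name, and the host in the usage comment are assumptions made for illustration.

```python
import imaplib
import time
from email.message import EmailMessage


def append_to_mailbox(conn, mailbox, subject, body):
    """Store one message in `mailbox` via IMAP APPEND, without sending it.

    `conn` is any object with an imaplib-style append() method, for
    example an authenticated imaplib.IMAP4_SSL instance.
    """
    msg = EmailMessage()
    msg["Subject"] = subject
    msg.set_content(body)
    status, _ = conn.append(
        mailbox,
        None,                                     # no IMAP flags
        imaplib.Time2Internaldate(time.time()),   # internal date = now
        msg.as_bytes(),
    )
    return status == "OK"


# Hypothetical usage against a real server (host and credentials assumed):
#   conn = imaplib.IMAP4_SSL("imap.example.com")
#   conn.login(user, password)
#   append_to_mailbox(conn, "EMFS", "block-0001", "payload")
```

Because APPEND writes straight into the user's own mailbox, no message passes through the SMTP sending path.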
Email Protocol Selection cont’d
• Email sending and appending performance
• IMAP is faster than SMTP in almost all cases, by 5.5% on average and by up to 42.64%
Data Placement Within Emails
Multiple places can be used to store data in an email
• Headers
• Subject line
• Body
• Attachment
In EMFS
• Metadata is stored in the body section
• The unique identifiers are stored in the subject line
• Data can be stored either as attachments or in the body
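The placement rules above can be illustrated with Python's standard email package. This is a sketch under stated assumptions: the JSON encoding of metadata, the identifier format, and all function names are hypothetical and are not taken from EMFS.

```python
import json
from email import message_from_bytes, policy
from email.message import EmailMessage


def make_metadata_email(file_id, metadata):
    """Metadata email: identifier in the subject, metadata in the body."""
    msg = EmailMessage()
    msg["Subject"] = file_id
    msg.set_content(json.dumps(metadata, sort_keys=True))  # JSON is an assumption
    return msg


def make_data_email(block_id, block):
    """Data email: identifier in the subject, block bytes as an attachment."""
    msg = EmailMessage()
    msg["Subject"] = block_id
    msg.set_content("EMFS data block")  # placeholder body text (assumption)
    msg.add_attachment(block, maintype="application",
                       subtype="octet-stream", filename=block_id)
    return msg


def read_metadata(raw):
    """Parse a raw metadata email back into (identifier, metadata dict)."""
    msg = message_from_bytes(raw, policy=policy.default)
    return str(msg["Subject"]), json.loads(msg.get_content())


def read_block(raw):
    """Parse a raw data email back into (identifier, block bytes)."""
    msg = message_from_bytes(raw, policy=policy.default)
    attachment = next(msg.iter_attachments())
    return str(msg["Subject"]), attachment.get_content()
```

Keeping the identifier in the subject line means a block can be located with a subject search, without downloading the payload.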
Data Placement Within Emails cont’d
Single email sending/retrieving performance
• Similar performance regardless of whether the payload is placed in the body or the attachment
• Attachment payload slightly outperforms body payload with Gmail
Block Size and File Striping
Organize email accounts as a RAID
• Each account is identified by a “RAID Index” from 0 to n-1
• Data blocks are striped across email accounts
• Blocks are stored on randomly chosen disks instead of having a fixed array of email disks and striping data in a round-robin manner
• Metadata emails are usually small, so they are not striped
EMFS uses 512KB as its default block size and 8 as the default stripe width
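The blocking and random placement described above can be sketched as follows. The defaults (512KB blocks, stripe width 8) come from the slide; how random placement interacts with the stripe width is an assumption here: each stripe of up to `stripe_width` blocks is spread over distinct randomly chosen accounts when enough accounts exist. The function names are hypothetical.

```python
import random

BLOCK_SIZE = 512 * 1024  # default block size stated on the slide
STRIPE_WIDTH = 8         # default stripe width stated on the slide


def split_blocks(data, block_size=BLOCK_SIZE):
    """Split file contents into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, num_accounts, stripe_width=STRIPE_WIDTH, rng=None):
    """Return one RAID index (0..num_accounts-1) per block.

    Assumption: within a stripe, blocks go to distinct random accounts
    so they can be transferred in parallel; with too few accounts the
    choice falls back to independent random picks.
    """
    if rng is None:
        rng = random.Random()
    placement = []
    for start in range(0, num_blocks, stripe_width):
        count = min(stripe_width, num_blocks - start)
        if count <= num_accounts:
            placement.extend(rng.sample(range(num_accounts), count))
        else:
            placement.extend(rng.randrange(num_accounts) for _ in range(count))
    return placement
```

For example, a file of 4MB plus a few bytes yields eight full blocks and one short block, so it spans two stripes under the defaults.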
Block Size and File Striping cont’d
Figure 5 measures a 4MB file’s read/write latency
• File access latency steadily decreases as the file block (attachment) size increases, for both Gmail and GawabMail
Block Size and File Striping cont’d
• Figures 6 and 7 show the effect of striping with different block sizes
• Striping provides a significant performance improvement
• Increasing the stripe width beyond 8 or the block size beyond 1MB does not improve performance
• Block sizes smaller than 256KB degrade performance in almost all cases
Data Replication
Replication group
• Consists of two or more disks mirroring the same data
• Updates are written to one of the email disks within the group
• Email disks (accounts) can be added to or removed from a group
Replication Strategies
• Read-one and Write-one: all reads and writes from EMFS go to the same email account
• Read-fast and Write-fast: reads and writes go to different accounts based on their uploading and downloading performance
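The two strategies can be contrasted in a small sketch. The slide does not say how EMFS measures account performance, so the moving-average bookkeeping, the class name `ReplicationGroup`, and the account labels in the usage note are all assumptions for illustration.

```python
class ReplicationGroup:
    """Tracks observed throughput for the accounts in one replication group."""

    def __init__(self, accounts):
        self.accounts = list(accounts)
        # Observed throughput in bytes/second; None until first measured.
        self.rates = {
            "upload": {a: None for a in self.accounts},
            "download": {a: None for a in self.accounts},
        }

    def record(self, account, direction, num_bytes, seconds, alpha=0.5):
        """Fold one transfer into an exponential moving average (assumption)."""
        rate = num_bytes / seconds
        old = self.rates[direction][account]
        new = rate if old is None else alpha * rate + (1 - alpha) * old
        self.rates[direction][account] = new

    def pick_one(self):
        """Read-one/Write-one: always use the same account."""
        return self.accounts[0]

    def pick_fast(self, direction):
        """Read-fast/Write-fast: use the account with the best observed rate.

        Unmeasured accounts rank first so every replica gets sampled.
        """
        table = self.rates[direction]
        return max(
            self.accounts,
            key=lambda a: float("inf") if table[a] is None else table[a],
        )
```

With a group such as `ReplicationGroup(["gmail-0", "gawab-0"])`, `pick_fast("upload")` and `pick_fast("download")` may return different accounts, which is the behavior the Read-fast and Write-fast bullet describes.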
EMFS Evaluation
System Implementation
• Prototype is based on FUSE
• Implemented in around 3000 lines of Python code
• Two replication strategies implemented for comparison
What we do
• Compare EMFS with three existing distributed file systems
• Use Postmark, IOZone, and a synthetic file access benchmark
Experiment Setup
• Dual-core desktop (2.66 GHz) with 3 GB of RAM running Ubuntu 8.10
• Both NFS and AFS servers were configured on dedicated machines inside the campus network
• Jungle Disk was configured with background (asynchronous) transfers disabled
• EMFS was configured using accounts from Gmail and GawabMail
Performance Results – Postmark
• Postmark measures performance for network-based systems by simulating access to short-lived small files
• Different workloads (equal bias, read heavy, append heavy, and create heavy) are generated by varying the operation bias
Settings
• 200 files
• File sizes range from 4KB to 16MB
• 200 transactions
Results
• AFS and NFS perform better than EMFS and Jungle Disk
• EMFS offers performance comparable to Jungle Disk
• EMFS-Fast performs better than EMFS-One
Performance Results – IOZone
• Unlike Postmark, IOZone mainly focuses on file data access
Settings
• 16 MB file
• Request sizes range from 128 KB to 4 MB
Results
• AFS and Jungle Disk achieve a transfer rate between 25 and 50 MB/s for sequential reads
• EMFS reports very high transfer rates
• Jungle Disk reports very low throughput (about 550-600 KB/s) for random reads
Performance Results – IOZone cont’d
Settings
• 16 MB file
• Request sizes range from 128 KB to 4 MB
Results
• EMFS is slightly better than Jungle Disk in terms of write throughput
• NFS and AFS are faster due to their high file transfer performance and low overhead
Performance Results – Editing Workload
• A synthetic benchmark that simulates a document editing task
Settings
• 100 files, 14 directories (with a maximum depth of 3)
• File sizes range from 8KB to 4MB
Results
• Lookup operations on AFS are very fast
• EMFS-Prefetch helps reduce the total lookup time by 17.4%
• All systems perform nearly the same for editing operations
• EMFS-Fast improves file save operations by 31%, which brings it quite close to Jungle Disk
Related Work
Email-based file systems
• GmailFS [http://sr71.net/projects/gmailfs/]
• YaFS [Lu, et al., IPDPS 2009]
• Free email accounts for data backup [Traeger, et al., StorageSS 2006]
• EMFS systematically examines email-based file system design issues
Other existing client-server systems
• LftpFS [http://lftpfs.sourceforge.net/]
• ExpanDrive [http://en.wikipedia.org/wiki/ExpanDrive]
• EMFS enables users to take advantage of widely available and increasingly powerful web-based email services
Distributed file systems
• NFS [Pawlowski, et al., USENIX 1994], AFS [Howard, et al., ACM Trans 1988], LBFS [Muthitacharoen, et al., SOSP 2001], GFS [Ghemawat, et al., SOSP 2003], and Ceph [Weil, et al., OSDI 2006]
• EMFS complements existing studies on distributed file/storage systems
Conclusion
• To the best of our knowledge, our work is the first that systematically examines email-based file system design issues, and thoroughly [...]
Contributions
• Provides a personal cloud storage solution on top of multiple free web-based email accounts
• Implements a prototype based on FUSE
• Evaluates the effectiveness of features such as multi-account space aggregation, file striping, and data replication
Thank you
Questions?