Porcupine: A Highly Available Cluster-based Mail Service
Y. Saito, B. Bershad, H. Levy (U. Washington), SOSP 1999
Presented by: Fabián E. Bustamante

Porcupine – goals & requirements
Use commodity hardware to build a large, scalable mail service.
Main goal – scalability in terms of:
• Manageability – large but easy to manage
  • Self-configure with respect to load and data distribution
  • Self-heal with respect to failure and recovery
• Availability – survive failures gracefully
  • A failure may prevent some users from accessing their email
• Performance – scale linearly with cluster size
  • Target – 100s of machines, ~billions of mail messages/day

Key Techniques and Relationships
[Figure: Framework: functional homogeneity ("any node can perform any task"). Techniques built on it: automatic reconfiguration, dynamic scheduling, replication. Goals they serve: manageability, performance, availability.]

Why Email?
• Mail is important
  • Real demand (Saito now works for Google)
• Mail is hard
  • Write intensive
  • Low locality
• Mail is easy
  • Well-defined API
  • Large parallelism
  • Weak consistency

Conventional Mail Solution
Static partitioning
• Performance problems:
  • No dynamic load balancing
• Manageability problems:
  • Manual data partition
• Availability problems:
  • Limited fault tolerance
[Figure: SMTP/IMAP/POP front ends over NFS servers; each user's mailbox (Luca's, Jeanine's, Joe's, Suzy's) lives on one fixed server.]

Porcupine Architecture
[Figure: nodes A, B, ..., Z, each running the same components: SMTP server, POP server, IMAP server, load balancer, membership manager, replication manager, RPC, user map, mail map, mailbox storage, and user profile storage.]

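Porcupine Operations
[Figure: delivering one message through four stages: protocol handling, user lookup, load balancing, message store.]
1. "Send mail to luca": DNS round-robin (DNS-RR) selection sends the incoming connection from the Internet to node A.
2. "Who manages luca?": A consults the user map.
3. "Verify luca": A asks luca's manager.
4. "OK, luca has msgs on C and D."
5. A picks the best nodes to store the new message (C).
6. "Store msg" on C.
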
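The delivery steps above can be traced in a minimal Python sketch. Everything in it is hypothetical and for illustration only: the 16-bucket `USER_MAP`, the MD5 hash, the in-memory `MAIL_MAPS`, `STORAGE` and `PENDING_IO` tables, and the simplified step-5 rule of preferring nodes that already hold the user's mail. Step 1 (DNS-RR selection of the receiving node) and the actual user verification in step 3 are not modeled.

```python
import hashlib

NODES = ["A", "B", "C", "D"]

# Hypothetical user map: a small table of hash buckets, each naming the node
# that manages the users hashing into it (every node holds a copy).
USER_MAP = ["A", "B", "C", "D"] * 4

# Hypothetical per-node state, in memory for illustration.
MAIL_MAPS = {n: {} for n in NODES}   # on each manager: user -> nodes holding mail
STORAGE = {n: {} for n in NODES}     # on each node: user -> list of messages
PENDING_IO = {"A": 4, "B": 6, "C": 1, "D": 3}  # load measure per node


def manager_of(user: str) -> str:
    """Step 2: hash the user name into the user map to find its manager."""
    digest = hashlib.md5(user.encode("utf-8")).digest()
    bucket = int.from_bytes(digest, "big") % len(USER_MAP)
    return USER_MAP[bucket]


def deliver(user: str, msg: str) -> str:
    """Steps 2-6 of the delivery flow; returns the node that stored msg."""
    manager = manager_of(user)
    # Steps 3-4: ask the manager which nodes already hold this user's mail.
    holders = MAIL_MAPS[manager].setdefault(user, set())
    # Step 5 (simplified): prefer existing holders, pick the least-loaded one.
    candidates = sorted(holders) if holders else NODES
    target = min(candidates, key=lambda n: PENDING_IO[n])
    # Step 6: store the message and record the holder in the mail map.
    STORAGE[target].setdefault(user, []).append(msg)
    holders.add(target)
    return target


# Mirror the slide: luca already has messages on C and D.
MAIL_MAPS[manager_of("luca")]["luca"] = {"C", "D"}
print(deliver("luca", "hello"))  # C: fewer pending I/O requests than D
```

Because every node holds the whole user map, any node can run this path for any user, which is the functional-homogeneity property the deck builds on.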
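Basic Data Structures
[Figure: the name "luca" is run through a hash function to index the user map, a table of buckets each naming a manager node (A, B, or C). Each manager keeps the mail map / user info for its users, e.g. luca: {A,C}, suzy: {A,C}, joe: {B}, ann: {B}, listing the nodes that store fragments of each mailbox. Mailbox storage on nodes A, B, and C holds the message fragments (Luca's, Suzy's, Joe's, Ann's, Bob's).]
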
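A mailbox in this design is not one file on one server but a set of fragments, and the mail map says where they are. The sketch below shows the read side under that assumption: the `MAIL_MAP` entries mirror the slide's example, while the `STORAGE` contents and the `read_mailbox` helper are hypothetical. Real message ordering would need per-message timestamps; here fragments are simply concatenated in node order.

```python
# Mail map entries taken from the slide: user -> nodes holding fragments.
MAIL_MAP = {"luca": {"A", "C"}, "suzy": {"A", "C"}, "joe": {"B"}, "ann": {"B"}}

# Hypothetical per-node mailbox storage: node -> user -> messages on that node.
STORAGE = {
    "A": {"luca": ["msg1"], "suzy": ["s1"]},
    "B": {"joe": ["j1"], "ann": ["a1", "a2"]},
    "C": {"luca": ["msg2", "msg3"], "suzy": ["s2"]},
}


def read_mailbox(user: str) -> list:
    """Collect a user's mail by reading the fragment on every listed node."""
    messages = []
    for node in sorted(MAIL_MAP.get(user, set())):
        messages.extend(STORAGE[node].get(user, []))
    return messages


print(read_mailbox("luca"))  # ['msg1', 'msg2', 'msg3']
```

Keeping the mail map small (a set of node names per user) is what makes it cheap to treat as soft state and rebuild after failures.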
Porcupine Advantages
• Advantages:
  • Optimal resource utilization
  • Automatic reconfiguration and task redistribution upon node failure/recovery
  • Fine-grain load balancing
• Results:
  • Better availability
  • Better manageability
  • Better performance

Performance
• Goals
  • Scale performance linearly with cluster size
• Strategy: avoid creating hot spots
  • Partition data uniformly among nodes
  • Fine-grain data partition

Measurement Environment
• 30-node cluster of not-quite-all-identical PCs
• 100 Mb/s Ethernet + 1 Gb/s hubs
• Linux 2.2.7
• 42,000 lines of C++ code
• Synthetic load
• Compared to sendmail+popd

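How does Performance Scale?
[Figure: throughput vs. cluster size, Porcupine compared with sendmail+popd; labeled values 68m/day and 25m/day (messages per day).]
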
Availability
• Goals:
  • Maintain function after failures
  • React quickly to changes regardless of cluster size
  • Graceful performance degradation / improvement
• Strategy: two complementary mechanisms
  • Hard state (email messages, user profile): optimistic fine-grain replication
  • Soft state (user map, mail map): reconstruction after membership change

Soft-state Reconstruction
[Figure: timeline across nodes A, B, C. 1. Membership protocol and user map recomputation: the user map's buckets are reassigned among the nodes. 2. Distributed disk scan: managers rebuild their mail maps, e.g. luca: {A,C}, suzy: {A,B}, joe: {C}, ann: {B}.]

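The two reconstruction phases can be sketched in Python as follows, assuming a hypothetical bucket-reassignment policy (a bucket stays with its current owner if that node is alive and under its fair share; the rest go to the least-loaded live node) and a simulated disk scan in which each live node reports every user it stores mail for. Porcupine itself only needs to rescan for users whose manager changed; scanning everything here keeps the sketch short. `bucket_of`, `recompute_user_map` and `rebuild_mail_maps` are illustrative names, not the system's API.

```python
import hashlib
from collections import Counter


def bucket_of(user: str, num_buckets: int) -> int:
    """Hash a user name to a user-map bucket (MD5 chosen for illustration)."""
    digest = hashlib.md5(user.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_buckets


def recompute_user_map(user_map: list, live_nodes: set) -> list:
    """Phase 1: reassign buckets after a membership change.

    A bucket keeps its owner if that node is alive and under its fair share;
    every other bucket goes to the currently least-loaded live node.
    """
    live = sorted(live_nodes)
    quota = -(-len(user_map) // len(live))  # ceiling division
    counts = Counter()
    new_map = [None] * len(user_map)
    for i, owner in enumerate(user_map):
        if owner in live_nodes and counts[owner] < quota:
            new_map[i] = owner
            counts[owner] += 1
    for i, owner in enumerate(new_map):
        if owner is None:
            target = min(live, key=lambda n: counts[n])
            new_map[i] = target
            counts[target] += 1
    return new_map


def rebuild_mail_maps(user_map: list, disk_contents: dict) -> dict:
    """Phase 2: simulated distributed disk scan.

    disk_contents maps each live node to the users whose messages it stores;
    each (node, user) pair is reported to that user's (new) manager.
    """
    mail_maps = {node: {} for node in set(user_map)}
    for node, users in disk_contents.items():
        for user in users:
            manager = user_map[bucket_of(user, len(user_map))]
            mail_maps[manager].setdefault(user, set()).add(node)
    return mail_maps


# Node C fails: its buckets move to A and B; A's and B's buckets stay put.
old_map = ["A", "B", "C"] * 4
new_map = recompute_user_map(old_map, {"A", "B"})
mail_maps = rebuild_mail_maps(new_map, {"A": ["luca", "suzy"], "B": ["suzy", "ann"]})
print(new_map)
```

Moving only the buckets that must move limits how many mail-map entries have to be rebuilt, which matters for reacting quickly regardless of cluster size.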
Hard-state Replication
• Goals:
  • Keep serving hard state after failures
  • Handle unusual failure modes
• Strategy: exploit Internet semantics
  • Optimistic, eventually consistent replication
  • Per-message, per-user-profile replication
  • Efficient during normal operation
  • Small window of inconsistency

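One common way to get optimistic, eventually consistent replication of individual objects (one message, one user profile) is a last-writer-wins rule on a (timestamp, origin node) version. The Python sketch below uses that rule as an assumption; it illustrates the convergence property and is not a description of Porcupine's exact protocol. The `Replica` class and the object ids are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node's copy of replicated hard state (hypothetical structure)."""
    name: str
    # object id -> (version, value); version = (timestamp, origin node name)
    store: dict = field(default_factory=dict)

    def apply(self, obj_id: str, version: tuple, value: str) -> bool:
        """Apply an update only if it is newer than what this replica holds.

        Comparing (timestamp, origin) tuples totally orders updates, so a
        replayed update is ignored and replicas that receive the same updates
        in any order end up identical.
        """
        current = self.store.get(obj_id)
        if current is None or version > current[0]:
            self.store[obj_id] = (version, value)
            return True
        return False


a, b = Replica("A"), Replica("B")
u1 = ("profile:luca", (10, "A"), "password=old")
u2 = ("profile:luca", (12, "B"), "password=new")
# The two replicas receive the same updates in opposite orders.
a.apply(*u1)
a.apply(*u2)
b.apply(*u2)
b.apply(*u1)
print(a.store == b.store)  # True
```

Because each update is idempotent and order-insensitive, a sender can simply retry until acknowledged, which fits the "small window of inconsistency" goal.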
Replication Efficiency
[Figure: throughput with and without replication; labeled values 68m/day, 33m/day, and 24m/day.]
"Pretending": disk flushing removed from the disk-logging routines.

Load balancing: Storing messages
• Goals:
  • Handle skewed workload well
  • Support hardware heterogeneity
  • No voodoo parameter tuning
• Strategy: spread-based load balancing
  • Spread: soft limit on the number of nodes per mailbox
    • Large spread → better load balance
    • Small spread → better affinity
  • Load balanced within the spread
  • Use the number of pending I/O requests as the load measure

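A minimal Python sketch of spread-based placement, assuming the candidate set is the nodes already holding the user's mail, topped up with randomly chosen other nodes until it reaches the spread, and that the candidate with the fewest pending I/O requests wins. The function name, its parameters and the load numbers are hypothetical.

```python
import random


def choose_store_node(holders, all_nodes, pending_io, spread, rng):
    """Pick a node for a new message under a spread limit (illustrative).

    Candidates are the nodes already holding the user's mail plus random
    other nodes until `spread` candidates exist. The spread is a soft limit:
    if the holders already exceed it, all of them remain candidates.
    """
    candidates = set(holders)
    others = [n for n in all_nodes if n not in candidates]
    rng.shuffle(others)
    while len(candidates) < spread and others:
        candidates.add(others.pop())
    # Load measure: number of pending I/O requests on each node.
    return min(sorted(candidates), key=lambda n: pending_io[n])


nodes = ["A", "B", "C", "D"]
pending_io = {"A": 4, "B": 6, "C": 1, "D": 3}
rng = random.Random(0)
print(choose_store_node({"D"}, nodes, pending_io, spread=1, rng=rng))  # D
print(choose_store_node(set(), nodes, pending_io, spread=4, rng=rng))  # C
```

The two calls show the trade-off on the slide: with spread 1 the message stays with the user's existing node (affinity) even though C is less loaded, while a full spread lets the least-loaded node take it (load balance).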
Support of Heterogeneous Clusters
[Figure: relative performance improvement vs. node heterogeneity; labeled gains +16.8m/day (+25%) and +0.5m/day (+0.8%).]
Node heterogeneity: 0% means all nodes run at about the same speed; 3%, 7%, and 10% give the percentage of nodes with very fast disks.

Conclusions
• Fast, available, and manageable clusters can be built for write-intensive services
• Key ideas can be extended beyond mail:
  • Functional homogeneity
  • Automatic reconfiguration
  • Replication
  • Load balancing
• Ongoing work:
  • More efficient membership protocol
  • Extending Porcupine beyond mail: Usenet, calendar, etc.
  • More generic replication mechanism