Porcupine: A Highly Scalable, Cluster-based Mail Service
Yasushi Saito, Brian Bershad, Hank Levy
Discussion led by Jeremy Shaffer
Presentation modified from the Porcupine paper and slides for SOSP '99
Do we agree these are good areas to look at?
Goals
Use commodity hardware to build a large, scalable mail service (one system administrator, 100 million users, 1 billion messages/day)
Three facets of scalability:
• Performance: linear increase with cluster size
• Manageability: react to changes automatically (self-healing, self-configuring)
• Availability: survive failures gracefully
Is email really a good service to test, or just the best fit for this technique?
Why Email?
• Mail is important: there is real demand
• Cluster research has focused on web services
• Mail is an example of a write-intensive application
  • disk-bound workload
  • reliability requirements
  • failure recovery
• Mail servers have relied on a "brute force" approach to scaling
The Porcupine Difference
• Cluster-based solution to email
• Functional homogeneity: any node can perform any service
  • Key to manageability
  • No centralized control point
• For improved performance, it must find harmony between two main concepts: load balancing and affinity
Traditional mail server:
• Performance problems: no dynamic load balancing
• Manageability problems: manual data-partitioning decisions are personnel-intensive
• Availability problems: limited fault tolerance
Key Techniques and Relationships
Framework: functional homogeneity ("any node can perform any task")
Techniques: automatic reconfiguration, load balancing, replication
Goals: availability, manageability, performance
Basic Data Structures
• Mailbox fragment: the collection of messages stored for a user at one node; a user may have multiple fragments
• Mailbox fragment list: the nodes that contain fragments for a given user (the "mail map")
• User profile database: usernames, passwords, etc.
• User profile soft state: one node keeps this information for a given user
• User map: maps the hash value of each user to the node holding that user's profile soft state
• Cluster membership list: each node's view of the set of other nodes
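A minimal C++ sketch may make the relationships between these structures concrete; the type and field names below are illustrative, not taken from the Porcupine code:

```cpp
#include <functional>
#include <set>
#include <string>
#include <vector>

using NodeId = std::string;

// Mailbox fragment: the messages stored for one user on one node.
struct MailboxFragment {
    std::string user;
    std::vector<std::string> messages;  // raw message text in the real system
};

// User profile: hard state (usernames, passwords, etc.).
struct UserProfile {
    std::string username;
    std::string passwordHash;
};

// Soft state kept by exactly one node per user.
struct UserSoftState {
    UserProfile profile;        // cached copy of the profile entry
    std::set<NodeId> mailMap;   // fragment list: nodes holding fragments
};

// User map: hash of the username -> node managing that user's soft state.
struct UserMap {
    std::vector<NodeId> buckets;
    NodeId managerFor(const std::string& user) const {
        return buckets[std::hash<std::string>{}(user) % buckets.size()];
    }
};

int main() {
    UserMap userMap{{"A", "B", "C"}};
    NodeId manager = userMap.managerFor("bob");  // who keeps bob's soft state?
    (void)manager;
}
```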
Porcupine Architecture
[Diagram: identical nodes A, B, ..., Z; each node runs SMTP, POP, and IMAP servers, a load balancer, the user map, a membership manager, a replication manager, the mail map, mailbox storage, and user profiles, with nodes communicating over RPC.]
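As a rough illustration of functional homogeneity, every node instantiates the same full set of components; the class names below mirror the diagram's boxes, but the empty interfaces are purely hypothetical:

```cpp
// Placeholder components named after the boxes in the architecture diagram.
struct SmtpServer {};
struct PopServer {};
struct ImapServer {};
struct LoadBalancer {};
struct UserMapTable {};
struct MembershipManager {};
struct ReplicationManager {};
struct MailboxStorage {};

// One Porcupine node: the identical software runs on node A through node Z,
// so any node can take over any role (nodes talk over RPC, omitted here).
struct PorcupineNode {
    SmtpServer smtp;
    PopServer pop;
    ImapServer imap;
    LoadBalancer balancer;
    UserMapTable userMap;
    MembershipManager membership;
    ReplicationManager replication;
    MailboxStorage storage;
};

int main() {
    PorcupineNode node;
    (void)node;
}
```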
Porcupine Operations
1. "Send mail to bob": DNS round-robin selects a front-end node (A)
2. "Who manages bob?": A hashes into its user map to find the manager (B)
3. "Verify bob": A asks B to authenticate the user
4. "OK, bob has msgs on C and D": B returns bob's mail map
5. A picks the best node to store the new message (C)
6. "Store msg": the message is written on the chosen node
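The delivery path can be sketched as straight-line code; the helper functions below are hypothetical stand-ins for the RPCs in the diagram:

```cpp
#include <functional>
#include <iostream>
#include <set>
#include <string>
#include <vector>

using NodeId = char;

std::vector<NodeId> cluster = {'A', 'B', 'C', 'D'};

// Step 2: hash "bob" into the user map to find his soft-state manager.
NodeId managerFor(const std::string& user) {
    return cluster[std::hash<std::string>{}(user) % cluster.size()];
}

// Steps 3 and 4: the manager verifies the user against the profile
// database and returns the mail map (stubbed here to match the slide).
std::set<NodeId> verifyAndGetMailMap(NodeId manager, const std::string& user) {
    (void)manager; (void)user;
    return {'C', 'D'};  // "OK, bob has msgs on C and D"
}

// Step 5: pick a storage node from (or near) the mail map; the real
// system weighs both load and affinity at this point.
NodeId pickStorageNode(const std::set<NodeId>& mailMap) {
    return *mailMap.begin();
}

int main() {
    // Step 1: DNS round-robin has already handed the SMTP session to us.
    NodeId manager = managerFor("bob");                   // step 2
    auto mailMap = verifyAndGetMailMap(manager, "bob");   // steps 3-4
    NodeId target = pickStorageNode(mailMap);             // step 5
    std::cout << "store msg on node " << target << "\n";  // step 6
}
```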
Measurement Environment
• 30-node cluster of not-quite-all-identical PCs
• 100 Mb/s Ethernet + 1 Gb/s hubs
• Linux 2.2.7
• 42,000 lines of C++ code
• Synthetic load
• Compared against sendmail+popd
How does Performance Scale?
[Figure: throughput vs. cluster size; Porcupine reaches 68 million messages/day on 30 nodes, versus 25 million messages/day for the sendmail+popd baseline.]
Replication is Expensive
Porcupine replication is very resource-intensive. Is it worth it? Would hardware failover be better? Mirrored disk drives? …
Load Balancing: Deciding Where to Store Messages
Goals:
• Handle skewed workloads well
• Support hardware heterogeneity
• No magic parameter tuning
Strategy: spread-based load balancing
• Spread: a soft limit on the number of nodes per mailbox
  • Large spread: better load balance
  • Small spread: better affinity
• Load is balanced within the spread
• The number of pending I/O requests is the load measure
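A minimal sketch of spread-based selection, assuming pending-I/O counts are exchanged between nodes; the candidate set here is the user's existing fragment nodes, padded with random nodes up to the spread limit, which simplifies the real policy:

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <set>
#include <string>
#include <vector>

using NodeId = std::string;

struct NodeLoad {
    NodeId node;
    int pendingIO;  // the load measure named on the slide
};

NodeId pickNode(const std::set<NodeId>& fragmentNodes,  // user's mail map
                const std::vector<NodeLoad>& cluster,
                std::size_t spread,
                std::mt19937& rng) {
    // Prefer nodes that already hold the user's fragments (affinity).
    std::vector<NodeLoad> candidates;
    for (const auto& n : cluster)
        if (fragmentNodes.count(n.node)) candidates.push_back(n);
    // Pad with random nodes while the mailbox is under its spread limit.
    std::uniform_int_distribution<std::size_t> pick(0, cluster.size() - 1);
    while (candidates.size() < spread)
        candidates.push_back(cluster[pick(rng)]);
    // The least-loaded candidate wins (balance within the spread).
    return std::min_element(candidates.begin(), candidates.end(),
                            [](const NodeLoad& a, const NodeLoad& b) {
                                return a.pendingIO < b.pendingIO;
                            })->node;
}

int main() {
    std::vector<NodeLoad> cluster = {{"A", 3}, {"B", 0}, {"C", 7}, {"D", 1}};
    std::mt19937 rng(0);
    // bob already has a fragment on A; spread of 2 allows one more node.
    NodeId chosen = pickNode({"A"}, cluster, 2, rng);
    (void)chosen;
}
```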
Load Balancing
[Figure: non-replicated throughput on a 30-node system.]
How Well does Porcupine Support Heterogeneous Clusters?
[Figure: performance improvement on a 30-node Porcupine cluster without replication when disks are added to a small number of nodes.]
Load Balancing Issues
Issues with Porcupine load balancing? Security? Client dependence? …
• If you have a homogeneous cluster, why not just use random placement?
• If the cluster is heterogeneous, isn't spread a "magic parameter" that needs tuning?
Availability
Goals:
• Maintain function after failures
• React quickly to changes, regardless of cluster size
• Graceful performance degradation / improvement
Strategy: two complementary mechanisms
• Hard state (email messages, user profiles): optimistic, fine-grained replication
• Soft state (user map, mail map): reconstruction after membership changes
Availability Issues
Issues with Porcupine availability…
• A node failure can reduce the performance of the entire cluster.
• Do you really want users to see inconsistent information?
• Would a more hardware-based solution be easier?
Conclusions
• Fast, available, and manageable clusters can be built for write-intensive services
• Key ideas may be extended beyond mail:
  • Functional homogeneity
  • Automatic reconfiguration
  • Replication
  • Load balancing
Discussion
What do you do…
• Note to self: get in on VC funding for Porcupine
• Offer them all jobs with nice stock options
• Give them a small grant to see how it turns out
• Next paper, please…
• Send the paper to your competitors to mislead them
Reference Slides
The following slides are provided for reference and to help answer any questions that arise.
Load Balancing
[Figure: replicated throughput on a 30-node system.]
How Well does Porcupine Support Heterogeneous Clusters?
[Figure: performance improvement on a 30-node Porcupine cluster with replication when disks are added to a small number of nodes.]
A Look at Replication Effects
[Figure: throughput of the system configured with infinitely fast disks.]
A Look at Replication Effects
[Figure: summary of single-node throughput in a variety of configurations.]
Soft-state Reconstruction
[Diagram: reconstruction timeline after node B fails.]
1. Membership protocol: the surviving nodes agree on the new cluster membership, then recompute the user map so every hash bucket points at a live node.
2. Distributed disk scan: each node scans its local mailbox fragments and reports them to the new managers, which rebuild the mail maps (e.g., bob: {A,C}, suzy: {A,B}, joe: {C}, ann: {B}).
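The two reconstruction steps can be sketched in a few lines of C++, assuming a hypothetical membership set and precomputed local disk scans; the round-robin bucket assignment is a simplification:

```cpp
#include <cstddef>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

using NodeId = std::string;

// Step 1: after the membership protocol settles, reassign every user-map
// bucket to a live node (round-robin here; the real policy differs).
std::vector<NodeId> recomputeUserMap(const std::set<NodeId>& live,
                                     std::size_t buckets) {
    std::vector<NodeId> userMap;
    std::size_t i = 0;
    for (std::size_t b = 0; b < buckets; ++b) {
        auto it = live.begin();
        std::advance(it, i++ % live.size());
        userMap.push_back(*it);
    }
    return userMap;
}

// Step 2: each node scans its own disk and reports which users have local
// fragments; the new managers rebuild the mail maps from those reports.
std::map<std::string, std::set<NodeId>> rebuildMailMaps(
    const std::map<NodeId, std::set<std::string>>& localScans) {
    std::map<std::string, std::set<NodeId>> mailMaps;
    for (const auto& [node, users] : localScans)
        for (const auto& user : users)
            mailMaps[user].insert(node);
    return mailMaps;
}

int main() {
    // Node B has failed; A and C survive.
    auto userMap = recomputeUserMap({"A", "C"}, 8);
    auto mailMaps = rebuildMailMaps({{"A", {"bob", "suzy"}},
                                     {"C", {"bob", "joe"}}});
    (void)userMap; (void)mailMaps;
}
```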
Hard-state Replication
Goals:
• Keep serving hard state after failures
• Handle unusual failure modes
Strategy: exploit Internet semantics
• Optimistic, eventually consistent replication
• Per-message, per-user-profile replication
• Efficient during normal operation
• Small window of inconsistency
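A sketch of the optimistic scheme: each update carries a timestamp, replicas keep the newest version they have seen, and the coordinator pushes updates to peers after answering the client. The types are invented for illustration, and the retry and log-retirement machinery is simplified away:

```cpp
#include <map>
#include <set>
#include <string>

// One replicated object: a single message or a user-profile entry.
struct Update {
    std::string objectId;
    std::string newValue;
    long timestamp;  // latest-timestamp-wins resolves conflicting updates
};

struct Replica {
    std::map<std::string, Update> store;
    // Apply an update only if it is newer than what we already hold;
    // stale updates are simply dropped (eventual consistency).
    void apply(const Update& u) {
        auto it = store.find(u.objectId);
        if (it == store.end() || it->second.timestamp < u.timestamp)
            store[u.objectId] = u;
    }
};

// The coordinator answers the client once the update is durable locally,
// then propagates in the background (the window of inconsistency); the
// real system retries over RPC until every replica acknowledges.
void replicate(const Update& u, std::set<Replica*>& peers) {
    for (Replica* p : peers) p->apply(u);
}

int main() {
    Replica a, b;
    std::set<Replica*> peers = {&a, &b};
    replicate({"msg-1", "hello bob", 42}, peers);
}
```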