CSE 291 Presentation on Porcupine: A Highly Scalable Email Service
Authors: Y. Saito, B. N. Bershad, and H. M. Levy
This presentation by: Pratik Mukhopadhyay
Full citation: Yasushi Saito, Brian N. Bershad, and Henry M. Levy. "Manageability, Availability and Performance in Porcupine: A Highly Scalable Cluster-based Mail Service." Proceedings of the 17th ACM Symposium on Operating Systems Principles, December 1999.
Goals of Porcupine
• High performance: must handle billions of messages
• Good scalability: must scale to hundreds of nodes, yet have competitive single-node performance
• High availability: must mask node failures from users
• Easy system administration
System Architecture
• Functionally homogeneous nodes
• Key processes:
  • membership manager
  • mailbox manager, user profile manager
  • replication manager
  • mail delivery proxy (SMTP)
  • mail retrieval proxy (POP and IMAP)
Terminology
• Mailbox fragment
• Mailbox fragment list
• User profile database
• User profile soft state
• User map
• Cluster membership list
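As a rough illustration of how some of these terms could relate, the sketch below models the user map as a small table from a hash bucket of the user name to the node that manages that user, and a mailbox fragment list as the set of nodes holding parts of one mailbox. The bucket count, node names, and function names are assumptions made for this example; the slides list the terms without defining them.

```python
import hashlib
from typing import Dict, List, Set

NUM_BUCKETS = 8  # assumed table size, for illustration only


def bucket_of(user: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Hash a user name into a user-map bucket (stable across processes)."""
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def manager_of(user: str, user_map: List[str]) -> str:
    """Return the node that the user map assigns to this user."""
    return user_map[bucket_of(user, len(user_map))]


# Hypothetical three-node cluster: every node holds the same copy of the map.
user_map: List[str] = ["node-a", "node-b", "node-c", "node-a",
                       "node-b", "node-c", "node-a", "node-b"]

# Hypothetical mailbox fragment list: alice's mail is split over two nodes.
fragment_lists: Dict[str, Set[str]] = {"alice": {"node-b", "node-c"}}
```

Because the lookup depends only on the user name and the shared table, any node can find a user's manager without contacting a central directory.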
System Management
Desired features:
• Transparent handling of node addition, deletion, and temporary node failures
• Automatic load balancing across nodes in the face of changing workloads
Membership services
• Uses a variant of the Three Round Membership Protocol
• Failure detection methods:
  • remote operation timeout
  • periodically ping the neighbor in IP address order
  • periodically broadcast probe packets
Is broadcasting a good idea?
Should any node be allowed to act as coordinator?
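The "ping neighbor in IP address order" method can be pictured as a ring: sort the members by address and have each node probe the next one, wrapping around at the end. The following is a minimal sketch under that reading; the function name and the wrap-around behaviour are assumptions, not details given in the slides.

```python
import ipaddress
from typing import Dict, List


def ping_targets(members: List[str]) -> Dict[str, str]:
    """Map each member to the next member in numeric IP order, wrapping around."""
    # Sort numerically: a plain string sort would put 10.0.0.10 before 10.0.0.2.
    ordered = sorted(members, key=lambda ip: int(ipaddress.ip_address(ip)))
    return {ip: ordered[(i + 1) % len(ordered)] for i, ip in enumerate(ordered)}


targets = ping_targets(["10.0.0.10", "10.0.0.2", "10.0.0.7"])
# 10.0.0.2 pings 10.0.0.7, 10.0.0.7 pings 10.0.0.10, 10.0.0.10 pings 10.0.0.2
```

With this arrangement every node is watched by exactly one peer, so the probing cost per node stays constant as the cluster grows.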
Recovery process
• User map reconstruction
• Soft state reconstruction
  -- A two-step process:
  + Find changes
  + Notify changes
Do we reconfigure after every failure?
Should soft state information be cached?
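One way to picture the recovery steps: rebuild the user map so that buckets owned by failed nodes move to surviving nodes, then compare the old and new maps to find which buckets changed hands and therefore need their soft state rebuilt. The reassignment policy below (give each orphaned bucket to the live node that currently owns the fewest buckets) is an assumption chosen for illustration, as are the function names.

```python
from typing import Dict, List, Set


def rebuild_user_map(old_map: List[str], live: Set[str]) -> List[str]:
    """Keep buckets owned by live nodes; reassign the rest to the least-loaded live node."""
    if not live:
        raise ValueError("no live nodes to assign buckets to")
    counts: Dict[str, int] = {node: 0 for node in live}
    for owner in old_map:
        if owner in live:
            counts[owner] += 1
    new_map = list(old_map)
    for bucket, owner in enumerate(old_map):
        if owner not in live:
            # sorted() makes tie-breaking deterministic across nodes
            target = min(sorted(live), key=lambda node: counts[node])
            new_map[bucket] = target
            counts[target] += 1
    return new_map


def changed_buckets(old_map: List[str], new_map: List[str]) -> List[int]:
    """'Find changes': buckets whose manager differs between the two maps."""
    return [b for b, (old, new) in enumerate(zip(old_map, new_map)) if old != new]


old = ["a", "b", "c", "a", "b", "c"]
new = rebuild_user_map(old, live={"a", "c"})   # node "b" has failed
moved = changed_buckets(old, new)              # buckets to notify peers about
```

Leaving unaffected buckets in place keeps the amount of soft state that has to be rebuilt proportional to what the failed node managed, not to the size of the cluster.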
Scaling
• Easy addition of new nodes: just install the software and connect to the network (make the IP address known to users)
• Performance studies show that the system uses the newly available resources
Replication
• Basic properties:
  • update anywhere
  • eventual consistency
  • total updates
  • no locking
  • ordered by loosely synchronized clocks
Relaxed consistency for the user database?
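These properties fit a last-writer-wins scheme: each update carries the complete new value plus a timestamp, any replica may accept it, and a replica keeps whichever version has the newest timestamp. The sketch below shows that idea only; the class names and the use of the node ID as a tie-breaker are assumptions for this example, not the paper's exact protocol.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Update:
    key: str
    value: str        # complete new contents (a "total update", not a delta)
    timestamp: float  # sender's wall-clock time; clocks only loosely synchronized
    node_id: str      # assumed tie-breaker when two timestamps are equal


class Replica:
    def __init__(self) -> None:
        self.store: Dict[str, Update] = {}

    def apply(self, update: Update) -> bool:
        """Apply the update if it is newer than what we hold; no locks needed."""
        current = self.store.get(update.key)
        if current is None or (update.timestamp, update.node_id) > (
            current.timestamp, current.node_id
        ):
            self.store[update.key] = update
            return True
        return False  # stale update: ignore it


u1 = Update("alice/profile", "quota=10MB", 100.0, "node-a")
u2 = Update("alice/profile", "quota=20MB", 105.0, "node-b")

r1, r2 = Replica(), Replica()
for u in (u1, u2):
    r1.apply(u)
for u in (u2, u1):  # same updates, opposite arrival order
    r2.apply(u)
```

Both replicas end up holding `u2` regardless of arrival order, which is the eventual-consistency behaviour the slide describes. The cost is that a write from a node with a lagging clock can lose to an earlier write.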
Load balancing
• Collecting load information:
  + as a side effect of RPC operations
  + from load information packets
• Limit the spread of a user's mail for better performance
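A sketch of how a spread limit might interact with load: for each user, derive a small candidate set of nodes from a hash of the user name, add any nodes that already hold fragments of the mailbox, and deliver to the least-loaded candidate. The candidate rule, the default `spread` value, and all names here are assumptions for illustration; the slide states only that spread is limited.

```python
import hashlib
from typing import Dict, List, Set


def candidate_nodes(user: str, nodes: List[str], fragments: Set[str],
                    spread: int) -> Set[str]:
    """Up to `spread` nodes chosen by hashing the user, plus existing fragment holders."""
    if not nodes:
        raise ValueError("cluster has no nodes")
    ordered = sorted(nodes)
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    start = int.from_bytes(digest[:4], "big") % len(ordered)
    count = min(spread, len(ordered))
    hashed = {ordered[(start + i) % len(ordered)] for i in range(count)}
    return hashed | (fragments & set(nodes))


def pick_node(user: str, nodes: List[str], fragments: Set[str],
              load: Dict[str, int], spread: int = 2) -> str:
    """Choose the least-loaded candidate; unknown load counts as zero."""
    candidates = candidate_nodes(user, nodes, fragments, spread)
    return min(sorted(candidates), key=lambda node: load.get(node, 0))


nodes = ["n1", "n2", "n3", "n4"]
load = {"n1": 5, "n2": 3, "n3": 1, "n4": 4}  # e.g. pending disk operations
target = pick_node("alice", nodes, fragments={"n4"}, load=load, spread=2)
```

A small `spread` keeps a user's mail on few nodes, which makes retrieval cheaper, while a larger value gives the balancer more freedom to avoid busy nodes.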
Conclusions
• Performance studies show that Porcupine is scalable, highly available, and makes good use of resources under all workloads.
Miscellaneous
Functional homogeneity: good or bad?
Will it work for applications other than email?
For a very large email system, do we want a single geographical presence?
Special support for mailing-list mail?