Porcupine: a highly scalable email service

CSE 291 Presentation on Porcupine: a highly scalable email service
Authors: Y. Saito, B. N. Bershad, and H. M. Levy
This presentation by: Pratik Mukhopadhyay
Full citation: Yasushi Saito, Brian N. Bershad, and Henry M. Levy. "Manageability, Availability and Performance in Porcupine: A Highly Scalable Cluster-based Mail Service." Proceedings of the 17th ACM Symposium on Operating Systems Principles, December 1999.

Goals of Porcupine
• High performance: must handle billions of messages
• Good scalability: must scale to hundreds of nodes, yet have competitive single-node performance
• High availability: must mask node failures from users
• Easy system administration

System Architecture
• Functionally homogeneous nodes
• Key processes:
  • membership manager
  • mailbox manager, user profile manager
  • replication manager
  • mail delivery proxy (SMTP)
  • mail retrieval proxy (POP & IMAP)

Terminology
• Mailbox fragment
• Mailbox fragment list
• User profile database
• User profile soft state
• User map
• Cluster membership list
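To make the user map and the mailbox fragment list concrete, here is a minimal Python sketch. It assumes the user map is a small fixed-size table of hash buckets, each naming the node that manages the soft state for users who hash into that bucket, and that a fragment list is a per-user set of nodes holding pieces of that user's mailbox. `NUM_BUCKETS`, the CRC32 hash, the round-robin assignment, and every function name are hypothetical illustrations, not Porcupine's actual code.

```python
import zlib
from typing import Dict, List, Set

# Hypothetical table size; the slides do not give the real number of buckets.
NUM_BUCKETS = 256


def bucket_of(user: str) -> int:
    """Hash a user name to a user-map bucket (CRC32 is an illustrative choice)."""
    return zlib.crc32(user.encode("utf-8")) % NUM_BUCKETS


def build_user_map(nodes: List[str]) -> List[str]:
    """Assign buckets to nodes round-robin; every node would keep an identical copy."""
    return [nodes[i % len(nodes)] for i in range(NUM_BUCKETS)]


def manager_for(user: str, user_map: List[str]) -> str:
    """Return the node that manages this user's soft state."""
    return user_map[bucket_of(user)]


# Soft state on the managing node: user -> nodes holding that user's mailbox fragments.
fragment_lists: Dict[str, Set[str]] = {}


def record_fragment(user: str, node: str) -> None:
    """Note that `node` now stores at least one fragment of `user`'s mailbox."""
    fragment_lists.setdefault(user, set()).add(node)


nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
user_map = build_user_map(nodes)
record_fragment("alice", "10.0.0.2")
record_fragment("alice", "10.0.0.3")
print(manager_for("alice", user_map), sorted(fragment_lists["alice"]))
```

Because the user map is small and identical on every node, any node can find a user's manager with a local lookup; the fragment list is soft state, so it can be rebuilt rather than persisted.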

System Management
Desired features:
• Transparent handling of node addition, node deletion, and temporary node failures
• Automatic load balancing across nodes in the face of changing workloads

Membership services
• Uses a variant of the Three Round Membership Protocol
• Failure detection methods:
  + remote operation timeouts
  + periodically pinging the neighboring node in IP-address order
  + periodically broadcasting probe packets
Is broadcasting a good idea?
Is it wise to allow any node to be a coordinator?
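The neighbor-ping scheme can be sketched as follows: sort the current members by IP address, treat the sorted list as a ring, and have each node ping its successor; a node whose last reply is older than a timeout becomes a suspect. This is a minimal sketch assuming IPv4 string addresses and a caller-supplied clock value. It does not implement the Three Round Membership Protocol itself, and `ping_target`, `suspected_failures`, and the timeout are hypothetical.

```python
import ipaddress
from typing import Dict, List, Optional, Set


def ping_target(me: str, members: List[str]) -> Optional[str]:
    """Return the node after `me` in IP-address order, wrapping around the ring.

    Assumes `me` is one of `members`.
    """
    # Sort numerically: as plain strings, "10.0.0.10" would sort before "10.0.0.2".
    ring = sorted(members, key=ipaddress.ip_address)
    if len(ring) < 2:
        return None  # nobody else to ping
    return ring[(ring.index(me) + 1) % len(ring)]


def suspected_failures(last_reply: Dict[str, float], now: float, timeout: float) -> Set[str]:
    """Nodes whose most recent ping reply is older than `timeout` seconds."""
    return {node for node, t in last_reply.items() if now - t > timeout}


members = ["10.0.0.10", "10.0.0.2", "10.0.0.3"]
print(ping_target("10.0.0.3", members))   # -> 10.0.0.10
print(ping_target("10.0.0.10", members))  # -> 10.0.0.2 (wraps around)
```

In this ring each node is watched by exactly one other node, so steady-state ping traffic grows only linearly with cluster size.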

Recovery process
• User map reconstruction
• Soft state reconstruction, a two-step process:
  + find changes
  + notify changes
Do we reconfigure after every failure?
Should soft state information be cached?
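The two recovery steps can be illustrated with a small sketch. After a membership change, buckets owned by departed nodes are reassigned to surviving nodes and the changed buckets are recorded (find changes); each node then tells the new manager of every changed bucket which of its users it holds mailbox fragments for (notify changes). The least-loaded reassignment rule, the function names, and the data shapes are assumptions for illustration, not the paper's algorithm.

```python
from typing import Callable, Dict, List, Tuple


def reconstruct_user_map(old_map: List[str], live_nodes: List[str]) -> Tuple[List[str], List[int]]:
    """Step 1 (find changes): give orphaned buckets to the least-loaded live node."""
    load = {n: 0 for n in live_nodes}
    for owner in old_map:
        if owner in load:
            load[owner] += 1
    new_map = list(old_map)
    changed = []
    for bucket, owner in enumerate(old_map):
        if owner not in load:  # owner has left the membership
            target = min(load, key=lambda n: (load[n], n))  # ties broken by name
            new_map[bucket] = target
            load[target] += 1
            changed.append(bucket)
    return new_map, changed


def notifications(changed: List[int], new_map: List[str], local_users: List[str],
                  bucket_of: Callable[[str], int]) -> Dict[str, List[str]]:
    """Step 2 (notify changes): new manager -> local users whose bucket moved."""
    changed_set = set(changed)
    out: Dict[str, List[str]] = {}
    for user in local_users:
        b = bucket_of(user)
        if b in changed_set:
            out.setdefault(new_map[b], []).append(user)
    return out


old = ["A", "B", "C", "A", "B", "C"]
new, changed = reconstruct_user_map(old, ["A", "C"])  # node B has failed
buckets = {"alice": 1, "bob": 2, "carol": 4}          # hypothetical bucket numbers
print(new, changed)  # ['A', 'A', 'C', 'A', 'C', 'C'] [1, 4]
print(notifications(changed, new, ["alice", "bob", "carol"], lambda u: buckets[u]))
# {'A': ['alice'], 'C': ['carol']}
```

Leaving surviving assignments untouched means only users in the failed node's buckets need their soft state rebuilt, which limits the cost of reconfiguring after each failure.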

Scaling
• Easy addition of new nodes: just install the software and connect the node to the network (make its IP address known to users)
• Performance studies show that the system makes use of the newly available resources

Replication
• Basic properties:
  + update anywhere
  + eventual consistency
  + total updates (an update overwrites the whole object)
  + no locking
  + updates ordered by loosely synchronized clocks
Is relaxed consistency acceptable for the user database?
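These properties combine into a last-writer-wins rule: because each update carries the whole object plus a timestamp from a loosely synchronized clock, any replica can accept an update without locking and simply keep the newest version, so replicas that eventually see the same set of updates converge. A minimal sketch, assuming a node ID breaks timestamp ties; `Update`, `Replica`, and `apply` are hypothetical names, and propagation of updates between replicas is omitted.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Update:
    timestamp: float  # from the originating node's loosely synchronized clock
    node_id: str      # tie-breaker for equal timestamps (an assumption)
    value: bytes      # the entire new object: a total update, not a delta


class Replica:
    def __init__(self) -> None:
        self.objects: Dict[str, Update] = {}

    def apply(self, key: str, update: Update) -> bool:
        """Keep `update` only if it is newer than what we hold; no locks needed."""
        current = self.objects.get(key)
        if current is None or (update.timestamp, update.node_id) > (current.timestamp, current.node_id):
            self.objects[key] = update
            return True
        return False  # stale update: ignore it


u1 = Update(100.0, "n1", b"profile v1")
u2 = Update(100.5, "n2", b"profile v2")
r1, r2 = Replica(), Replica()
for u in (u1, u2):
    r1.apply("alice", u)
for u in (u2, u1):  # same updates, opposite arrival order
    r2.apply("alice", u)
print(r1.objects["alice"] == r2.objects["alice"])  # True: both kept u2
```

The cost of this design, and the substance of the question above, is that clock skew can let a logically later update lose to an earlier one stamped by a faster clock.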

Load balancing
• Collecting load information:
  + as a side effect of RPC operations
  + via load information packets
• Limit the spread of a user's mail for better performance
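Both ideas can be sketched on the delivery side: a per-node load table refreshed whenever an RPC reply or load packet arrives, and a chooser that stores a new message on the least-loaded node, restricted to nodes already holding the user's mail once that set reaches a spread limit. This is a simplified illustration; the load metric, the `spread_limit` parameter, and the function names are assumptions, not Porcupine's actual policy.

```python
from typing import Dict, Iterable, List


def record_load(load: Dict[str, int], node: str, reported: int) -> None:
    """Refresh a node's load from a piggybacked RPC reply or a load packet."""
    load[node] = reported


def choose_delivery_node(fragment_nodes: Iterable[str], live_nodes: Iterable[str],
                         load: Dict[str, int], spread_limit: int) -> str:
    """Pick the least-loaded candidate node to store a new message for one user."""
    live = list(live_nodes)
    existing = [n for n in fragment_nodes if n in live]
    # Once the user's mail spans `spread_limit` live nodes, do not widen the spread.
    candidates: List[str] = existing if len(existing) >= spread_limit else live
    # Unknown load counts as 0; ties are broken by node name for determinism.
    return min(candidates, key=lambda n: (load.get(n, 0), n))


load: Dict[str, int] = {}
for node, pending in [("A", 5), ("B", 1), ("C", 3)]:
    record_load(load, node, pending)
print(choose_delivery_node(["A", "C"], ["A", "B", "C"], load, spread_limit=2))  # C
print(choose_delivery_node(["A", "C"], ["A", "B", "C"], load, spread_limit=3))  # B
```

Limiting the spread gives up some load-balancing freedom in exchange for fewer nodes to contact when the user later retrieves mail.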

Conclusions
• Performance studies show that Porcupine is scalable, highly available, and makes good use of resources under all workloads.

Miscellaneous
Functional homogeneity: good or bad?
Will it work for services other than email?
For a very large email system, do we want a single geographical presence?
Special support for mailing-list mail?
