
Andrew System E-mail Architecture at Carnegie Mellon University




  1. Andrew System E-mail Architecture at Carnegie Mellon University Rob Siemborski Walter Wong rjs3@andrew.cmu.edu wcw@cmu.edu Computing Services Carnegie Mellon University 5000 Forbes Ave Pittsburgh, PA 15213 Last Revision: 01/27/2004 (wcw)

  2. Presentation Overview • History & Goals • The Big Picture • Mail Transfer Agents • Mail Processing (Spam & Virus Detection) • The Directory • The Cyrus IMAP Aggregator • Clients and Andrew Webmail • Current Andrew Hardware Configuration • Future Directions

  3. The Early Years • Early 80s – The Andrew Project • Campus-wide computing • Joint IBM/CMU Venture • One of the first large scale distributed systems, challenging the ‘mainframe’ mentality • The Andrew File System (AFS) • The Andrew Message System (AMS)

  4. Goals of the Andrew Message System • Reliability • Machine and Location Independence • Integrated Message Database • Personal Mail and Bulletin Boards • Separation of Interface from Functionality • Support for Multi-Media • Scalability • Easy to Extend, Easy to Use

  5. End of AMS • AMS was a nonstandard system • Avoid becoming a “technology island” • Desire to not maintain our own clients. • AMS was showing scalability problems • Desire to decouple the file system from the mail system

  6. Project Cyrus Goals • Scalable to tens of thousands of users • Support wide use of bulletin boards • Use widely accepted standards-based technologies • Comprehensive client support on all major platforms • Supports a disconnected mode of operation for the mobile user

  7. Project Cyrus Goals (2) • Supports Kerberos authentication • Allows for easy sharing of private folders with select individuals • Separation of the mail store from a distributed file system • Can be independently installed, managed and set up for use in small departmental computing facilities

  8. More CMU Mail System Goals • Allow users to have a single @cmu.edu address no matter where their actual mail store is located • “CMUName” Service • Ability to detect and act on incoming Spam and Virus Messages • Provide access to mail over the Web • Integration of messaging into the overall Computing Experience

  9. The Big Picture [Architecture diagram: The Internet, Mail Transfer Agents (Three Pools), LDAP Directory Servers, Cyrus IMAP Aggregator, Users / Mail Clients]

  10. Mail Transfer Agents [Big Picture diagram repeated]

  11. Mail Transfer Agents • Andrew has 3 Pools of Mail Transfer Agent (MTA) Machines • Mail exchangers (MX Servers) receive and handle mail from the outside world for the ANDREW.CMU.EDU domain. • The “SMTP Servers” process user submitted messages (SMTP.ANDREW.CMU.EDU) • Mail exchangers for the CMU.EDU domain (the CMU.EDU MXs) • All Andrew MTAs run Sendmail

  12. Mail Transfer Agents (2) • Why 3 Pools? • MX Servers • Subject to the ebb and flow of the outside world • Significant CPU-intensive processing • Typically handle much larger queues (7,000+ messages each) • SMTP Servers • Speak directly to our clients • Need to be very responsive • Very small queues (200 messages each)

  13. Mail Transfer Agents (3) • CMU.EDU MXs • Service separation from Andrew MX servers • Mostly just forwarding • No real need to duplicate processing done on Andrew MX servers • All Three Pools are Redundant • Minimize impact of a machine failure

  14. Mail Transfer Agents (4) • Separate MTA pools give significant control over incoming email. • A message may touch multiple pools • Example: a message to foo@CMU.EDU, bound for foo@ANDREW.CMU.EDU: the user submits the message via the SMTP servers → the message is processed by the CMU.EDU MX → processed by the ANDREW MX → final delivery to the Cyrus Aggregator
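The multi-pool traversal in the example above can be sketched as follows. The pool names come from the slides; the routing logic itself is an illustrative simplification, not the actual Sendmail configuration.

```python
def delivery_path(submitted_to: str, resolves_to: str) -> list[str]:
    """Trace the MTA pools a user-submitted message passes through.

    submitted_to: address the user sent to (e.g. foo@CMU.EDU)
    resolves_to:  address after directory resolution (e.g. foo@ANDREW.CMU.EDU)
    """
    hops = ["SMTP.ANDREW.CMU.EDU"]        # user submission pool
    if submitted_to.endswith("@CMU.EDU"):
        hops.append("CMU.EDU MX")         # mostly just forwarding
    if resolves_to.endswith("@ANDREW.CMU.EDU"):
        hops.append("ANDREW MX")          # spam/virus processing happens here
        hops.append("Cyrus Aggregator")   # final delivery
    return hops

print(delivery_path("foo@CMU.EDU", "foo@ANDREW.CMU.EDU"))
# ['SMTP.ANDREW.CMU.EDU', 'CMU.EDU MX', 'ANDREW MX', 'Cyrus Aggregator']
```

A message submitted directly to an @ANDREW.CMU.EDU address skips the CMU.EDU MX hop, which is why the CMU.EDU pool can stay lightweight.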

  15. Mail Processing • All mail through the system is “processed” to some degree. • Audit Logging • Cleaning badly-formed messages • Blocking restricted senders/recipients/relays • More substantial processing done by Andrew MX Servers

  16. Mail Processing (2) • Spam Detection • Uses Heuristic Algorithms to identify Spam Messages (SpamAssassin) • Tags message with a header and score • User initiated filters (SIEVE) can detect the header and act upon it (bounce the message or file it into an alternate folder) • Very computationally expensive on MX
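The tag-then-filter split described above can be sketched as two stages; the header names and threshold below are assumptions for illustration, not the exact SpamAssassin/SIEVE configuration used at CMU.

```python
from email.message import EmailMessage

SPAM_THRESHOLD = 5.0  # assumed cutoff; SpamAssassin's is site-configurable

def tag_spam(msg: EmailMessage, score: float) -> EmailMessage:
    """MX-side stage: annotate the message with its score; never discard it here."""
    msg["X-Spam-Score"] = f"{score:.1f}"
    if score >= SPAM_THRESHOLD:
        msg["X-Spam-Flag"] = "YES"
    return msg

def user_filter(msg: EmailMessage) -> str:
    """User-side stage (what a user's SIEVE rule would do): pick a folder."""
    return "INBOX.spam" if msg.get("X-Spam-Flag") == "YES" else "INBOX"

m = tag_spam(EmailMessage(), 7.2)
print(user_filter(m))   # INBOX.spam
```

Keeping the expensive scoring on the MX pool and the cheap header test at delivery time is what makes per-user opt-in filtering affordable.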

  17. Mail Processing (3) • Virus Detection • Uses signatures to match virus messages (ClamAV) • “Bounce” message immediately at the incoming RCPT • Debate between bounce vs. tag
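The bounce-vs-tag distinction comes down to the SMTP reply the MX issues. This sketch abstracts the scanner hook (MIMEDefang/ClamAV in CMU's setup) into a boolean, and the reply text is hypothetical.

```python
def rcpt_response(is_virus: bool) -> str:
    """SMTP reply the MX issues for a scanned message."""
    if is_virus:
        # Rejecting during the SMTP dialogue makes the *sending* MTA
        # generate the bounce, so our servers never queue the message.
        return "550 5.7.1 Message rejected: virus detected"
    return "250 OK"   # clean (or merely spam-tagged) mail is accepted

print(rcpt_response(True))
```

This is the crux of the debate on the slide: a 550 rejection shifts bounce responsibility to the sender, while tagging (as with spam) accepts and stores the message.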

  18. The Directory [Big Picture diagram repeated]

  19. The Directory • Mail delivery and routing is assisted by an LDAP-accessible database. • Every valid destination address has an LDAP entry • LDAP lookups can do “fuzzy matching” • LDAP queries are done against a replicated pool

  20. The Directory (2) • Every account has a mailRoutingAddress: the “next hop” of the delivery process • mRA is not generally user configurable • Some accounts have a user-configurable mailForwardingAddress (mFA) • mFA will override the mRA
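The override rule above can be sketched as a one-line resolution, assuming the directory entry has already been fetched (the dict stands in for the real LDAP lookup, and the address values are made up):

```python
def next_hop(entry: dict) -> str:
    """mailForwardingAddress (user-configurable) overrides mailRoutingAddress."""
    return entry.get("mailForwardingAddress") or entry["mailRoutingAddress"]

entry = {
    "mail": "foo@cmu.edu",
    "mailRoutingAddress": "foo@cyrus.andrew.cmu.edu",   # hypothetical mRA form
    "mailForwardingAddress": "foo@example.org",         # user-configured mFA
}
print(next_hop(entry))   # foo@example.org
```

When no mFA is present, the MTA falls through to the administratively maintained mRA, which is what keeps the single @cmu.edu address stable regardless of where the mail store lives.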

  21. The Cyrus IMAP Aggregator [Big Picture diagram repeated]

  22. The IMAP Protocol • Standard Protocol developed by the IETF • Messages Remain on Server • MIME (Multipurpose Internet Mail Extensions) Aware • Support for Disconnected Operation • AMS-Like Features (ACLs, Quota, etc)

  23. The Cyrus IMAP Server • CMU-developed IMAP/POP server • Released to the public and maintained as an active Open Source project under a BSD-like license • No available server implemented all of the features needed to replace AMS • Designed to be a “Black Box” server • Performance and scalability were key to the design

  24. Initial Cyrus IMAP Deployment • Single monolithic server (1994-2002) • Originally deployed alongside AMS • Features were implemented incrementally • Users were transitioned incrementally • Local users provided a good testing pool • Scaled surprisingly well

  25. Cyrus IMAP Aggregator Design • IMAP not well suited to clustering • No real concept of mailbox “location” • Clients expect consistent views of the server and its mailboxes • Significantly varying client implementation quality • Aggregator was designed to make many machines look like one, so any user can share a folder with any other user

  26. Cyrus IMAP Aggregator Design (2) • Three Participating Types of Servers • IMAP Frontends (“dataless” proxies) • IMAP Backends (“normal” IMAP servers; your data here) • MUPDATE (Mailbox Database) [Diagram: frontends proxy requests for clients; backends hold traditional mailbox data; MUPDATE server maintains the mailbox list]
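The core of the aggregator is the mailbox-location lookup: a "dataless" frontend consults its local replica of the mailbox list (kept current by MUPDATE) to decide which backend to proxy a client request to. A minimal sketch, with hypothetical mailbox and host names:

```python
# Local mailbox-list replica, as pushed to this frontend by MUPDATE.
mailbox_list = {
    "user.rjs3": "backend1.andrew.cmu.edu",
    "user.wcw": "backend2.andrew.cmu.edu",
    "bb.org.acs.asg": "backend1.andrew.cmu.edu",
}

def backend_for(mailbox: str) -> str:
    """Frontend lookup: purely local, no round trip to the MUPDATE server."""
    return mailbox_list[mailbox]

print(backend_for("user.wcw"))   # backend2.andrew.cmu.edu
```

Because the lookup is local, ordinary SELECT/FETCH traffic never touches MUPDATE; only mailbox-list mutations (create, delete, rename, ACL changes) need its approval.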

  27. IMAP Frontends • Fully redundant • All are identical • Maintain local replica of mailbox list • Proxy most requests, querying backends as needed • May also send IMAP referrals to capable clients [Diagram: frontends proxy requests for clients; backends hold traditional mailbox data; MUPDATE server propagates mailbox list changes to frontends]

  28. IMAP Backends • Basically normal IMAP servers • Mailbox operations are approved & recorded by the MUPDATE server • Create / Delete • Rename • ACL Changes [Diagram: requests are proxied by frontends; backends hold traditional mailbox data; MUPDATE server approves mailbox operations]

  29. MUPDATE Server • Specialized location server (similar to the VLDB in AFS) • Provides guarantees about replica consistency • Simpler than maintaining database consistency between all the frontends [Diagram: frontends update local mailbox list replicas; backends send mailbox list updates; MUPDATE server approves and replicates updates]

  30. Cyrus Aggregator: Data Usage • User INBOXes and sub-folders • Users can share their folders • Internet mailing lists as public folders • Netnews newsgroups as public folders • Public folders for “workflow”, general discussion, etc. • Continued “bboard” paradigm: 30,000+ folders visible

  31. Cyrus IMAP Aggregator: Advantages • Horizontal Scalability • Adding new capacity to frontend and/or backend is easy to do and can be done with no user-visible downtime • Management possible through single IMAP client session • Wide client interoperability • Simple client configuration • Ability to (mostly) transparently move users from one backend to another • Failures are partitioned

  32. Cyrus IMAP Aggregator: Limitations • Backends are NOT redundant • MUPDATE is a single point of failure • Its failure only results in an error when trying to CREATE/DELETE/RENAME or change ACLs on mailboxes

  33. Cyrus IMAP Aggregator: Backups • Disk partition backup via Kerberized Amanda (http://www.amanda.org) • Restores are manual • 21-day rotation – no archival • Backup to disk (no tapes)

  34. Cyrus IMAP Aggregator: Other Protocol Support • POP3 support for completeness • Possibly creates more problems than it solves (“where did my INBOX go?”) • NNTP to populate bboards • NNTP access to the mail store • LMTP w/AUTH for mail transport from MTA to backends

  35. Clients [Big Picture diagram repeated]

  36. Clients • IMAP has many publicly available clients • Varying quality • Varying feature sets • Central computing recommends Mulberry • Roaming Profiles via IMSP • Many IMAP extensions supported (e.g. ACL) • UI not as popular

  37. Clients - Webmail • Use SquirrelMail as a Webmail Client • Local Modifications • Interaction with WebISO (pubcookie) Authentication • Kerberos Authentication to Cyrus • Local proxy (using imtest) to reduce connection load on server • Preferences and session information shared via AFS (simple, non-ideal)

  38. Clients – Mailing Lists • +dist+ for “personal” mailing lists (e.g., +dist+~user/foo.dl@andrew.cmu.edu) • Majordomo for “Internet-style” mailing lists • Prototype web interface for accessing bboards • Authenticated (for protected bboards): http://bboard.andrew.cmu.edu/bb/org.acs.asg.coverage • Unauthenticated (for mailing list archives): http://asg.web.cmu.edu/bb/archive.info-cyrus

  39. Andrew Mail Statistics • Approximately 30,000 Users • 12,000+ Peak Concurrent IMAP Sessions • 8+ IMAP Connections / Second • 650 Peak Concurrent Webmail Sessions • Approximately 1.5 Million Emails/week • See Also: http://graphs.andrew.cmu.edu
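A quick sanity check relating the weekly total above to a per-second rate (averages only; real traffic is bursty and peaks much higher):

```python
# Back-of-envelope average throughput from the slide's weekly total.
emails_per_week = 1_500_000
seconds_per_week = 7 * 24 * 3600        # 604,800 seconds in a week
avg_per_second = emails_per_week / seconds_per_week
print(f"~{avg_per_second:.1f} messages/second on average")   # ~2.5
```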

  40. Andrew Hardware • 5 frontends • 3 Sun Ultra 80s (2x450 MHz UltraSPARC II; 2 GB memory; internal 10,000 RPM disk) • 2 SunFire 280Rs (2x1 GHz UltraSPARC III; 4 GB memory; internal 10,000 RPM disk) • 5 backends • 4 Sun 220R (450 MHz UltraSPARC II; 2 GB memory; JetStor II-LVD RAID5 8x36 GB 15,000 RPM disks) • 1 SunFire 280R (2x1 GHz UltraSPARC III; 4 GB memory; JetStor III U160 RAID5 8x73 GB 15,000 RPM disks) • 1 mupdate • Dell 2450 (Pentium III 733 MHz; 1 GB memory; PERC3 RAID5 4x36 GB 10,000 RPM disks) • 3 ANDREW.CMU.EDU MX • Dell 2650 (Pentium 4 3 GHz; 2 GB memory; PERC3 RAID1 2x73 GB 15,000 RPM disks) • 3 SMTP.ANDREW.CMU.EDU • Dell 2650 (Pentium 4 3 GHz; 2 GB memory; PERC3 RAID1 2x73 GB 15,000 RPM disks) • 2 CMU.EDU MX • Dell 2650 (Pentium 4 3 GHz; 2 GB memory; PERC3 RAID1 2x73 GB 15,000 RPM disks) • 1 mailing list • Dell 2650 (Pentium 4 2.8 GHz; 1 GB memory; PERC3 RAID1 2x73 GB 15,000 RPM disks) • 3 webmail • Dell Optiplex GX260 small form factor (Pentium 4 2.4 GHz; 1 GB memory; 80 GB ATA disk)

  41. Current Issues • Lack of client support for ‘check new’ for IMAP folders (even when client supports NNTP) • Large number of visible folders can be problematic for clients (e.g. PocketInbox)

  42. Potential Future Work • Online/Self-Service Restores (e.g. AFS “OldFiles”, delayed EXPUNGE) • Virtual “Search” Folders • Fault tolerance • Replicate backends • Support multiple MUPDATE servers • Multi-Access Messaging Hub • One Mail Store, many APIs • IMAP, POP, NNTP, HTTP/DAV/RSS, XML/SOAP • Web Bulletin Boards / blog interface • Remove Shared Folder / Mailing List Distinction

  43. Current Software • MTA: Sendmail 8.12.10 • LDAP: OpenLDAP 2.0 • Cyrus: 2.2.3 • MIMEDefang: 2.28 • SpamAssassin: 2.61 • ClamAV: 0.63 • SquirrelMail: 1.4.2 (w/ local modifications)
