Providing Campus Mail Services with a Linux Cluster Giles Malet University of Waterloo gdmalet@uwaterloo.ca
Overview • What our department does • Our mail problems • Our proposed solutions • What we have done so far • Problems we’re aware of • What to do next • 25 slides… but feel free to interrupt! OUCC 2004
Introductions • Information Systems & Technology (IST) • Provide services & expertise to campus • Project members • Dawn Keenan - sendmail, MRTG • Giles Malet - project lead, software • Rob Schmidt - ClamAV, J-chkmail • Jeff Voskamp - LDAP, systems stuff • Plus assistance from others
Why do this? • More and more spam and viruses • More demand on IST for a solution • Want to centralize the problem (xhier) • Want something everyone can use • Old system overwhelmed • See project charter
Services Desired • Robust user@uwaterloo.ca mail server • Virus scanning with immediate rejection • Refuse executables etc. by file extension • Spam identification so people can filter • Also DNS blacklists • People can opt out of (only) spam processing • LDAP needed internally – perhaps allow user lookups?
What we considered • Already using Solaris and Red Hat Linux • “Cluster” problem – needed scalability • OpenMosix • OpenKnoppix • Linux Cluster Project • Linux HA • Red Hat Enterprise • …and so on • But what does “cluster” mean? Vague.
Decisions • Start simple; try a more involved setup if load is too high • Keep detailed statistics so we know what’s changing (more later) • Ask for (some) input from campus • Do our own load balancing (rather than Cisco) • 4 cheap systems or 3 “good” systems? • Spread load, reduce impact of failure
Hardware Purchased • 4 Dell servers • 1 with mirrored SCSI disks, 3.2 GHz CPU • 3 with single IDE disk, 3.0 GHz CPU • 1 GB memory • 1x100 + 1x1000 Mbps Ethernet • Rack-mounted, serial consoles (Annex, Cyclades)
Hardware Configuration
Hardware – ‘head’ server • Most powerful, most robust • Runs LDAP, MySQL, web servers, incoming mail • Mirrored disk, NFS-shared to slaves • All cluster data in one place (mail queues) • Only machine that is backed up • Firewall / load balancing
Hardware – slaves • 3 identical machines, run all services • Only software difference is IP configuration (fix with DHCP) • More slaves mean more CPU and less exposure to software failure • Local disk stores only the O/S • Logs copied up to cerberus overnight • Firewall: only incoming connection is ssh from maintenance server • No user accounts: ssh as root from head server
Software Details • Try open source first, spend money second • Looked at AFS etc., chose NFS (simple) • Use things we know, plus some experimentation • Emphasis was on getting this going quickly; fine-tune it later
Sendmail • We know sendmail, and it works • Wanted a “stock” system: no more phlookup, hence LDAP • Something flexible: the “milter” interface allows add-ons; can direct TCP connections from campus sendmails back to the cluster
Clam Anti-Virus • Open source • Auto-updating from remote server • Allows submission of ‘fingerprints’ • 3 components (freshclam, milter, clamd) • Some stability problems with the latter two • Too many threads: 375 × 8 MB ≈ 3 GB • Deeply nested messages are problematic
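The thread arithmetic above can be checked directly. A minimal sketch, assuming every clamd/milter thread reserves the full default 8 MB stack; the helper name is hypothetical. If these servers ran 32-bit x86 Linux with the common 3 GB/1 GB address-space split (an assumption), stacks alone would nearly fill a process's user address space, which is why shrinking the per-thread stack helps.

```python
MB = 1024 * 1024


def thread_stack_reservation(threads: int, stack_mb: int = 8) -> int:
    """Bytes of virtual address space reserved by thread stacks alone.

    Hypothetical helper: assumes every thread reserves `stack_mb` of stack.
    """
    return threads * stack_mb * MB


if __name__ == "__main__":
    total = thread_stack_reservation(375)
    print(f"{total / MB:.0f} MB")        # prints 3000 MB
    print(f"{total / 2**30:.2f} GiB")    # prints 2.93 GiB
    # Same thread count with a 1 MB stack per thread.
    print(f"{thread_stack_reservation(375, stack_mb=1) / MB:.0f} MB")  # prints 375 MB
```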
J-chkmail • Disallow incoming mail based on content (regex) and attachment extension • Also uses the milter interface • Lots of ongoing development (integrating virus scanning etc.)
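Rejecting by extension and by regular expression is easy to picture in code. A minimal sketch of the idea only: the blocked extensions, the sample pattern, and `should_reject` are hypothetical, and j-chkmail itself is configured differently.

```python
import os
import re

# Hypothetical policy values, for illustration only.
BLOCKED_EXTENSIONS = {".exe", ".pif", ".scr", ".bat", ".com", ".vbs"}
BLOCKED_PATTERNS = [re.compile(r"click here to claim your prize", re.IGNORECASE)]


def should_reject(attachment_names, body):
    """Return a rejection reason, or None if the message may pass."""
    for name in attachment_names:
        # splitext takes only the last suffix, so "invoice.pdf.exe" -> ".exe".
        ext = os.path.splitext(name.strip().lower())[1]
        if ext in BLOCKED_EXTENSIONS:
            return f"blocked extension {ext} ({name})"
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(body):
            return f"content matches {pattern.pattern!r}"
    return None
```

Rejecting at SMTP time, rather than accepting and discarding, matches the “immediate rejection” goal: the sending server, not the recipient, gets the error.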
SpamAssassin • Only marks spam – up to you to filter • Configurable preferences • Preferences must be on the host initiating the scan, so MX’d machines are a problem • Uses MySQL internally • Not foolproof: filtering on its marks can lose mail on a false positive
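Because SpamAssassin only marks messages, the filtering step is up to the recipient. A minimal sketch, assuming SpamAssassin's default `X-Spam-Flag` and `X-Spam-Status` headers; `choose_folder` and the folder names are hypothetical. Filtering on a personal score threshold higher than the site default is one way to trade fewer catches for fewer lost false positives.

```python
import re
from email import message_from_string

# Matches e.g. "Yes, score=7.3 required=5.0 ..." in X-Spam-Status.
SCORE_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)")


def spam_score(msg):
    """Return the SpamAssassin score from X-Spam-Status, or None."""
    match = SCORE_RE.search(msg.get("X-Spam-Status", ""))
    return float(match.group(1)) if match else None


def choose_folder(raw_message, threshold=None):
    """Pick a folder: use our own threshold if given, else the X-Spam-Flag."""
    msg = message_from_string(raw_message)
    if threshold is not None:
        score = spam_score(msg)
        return "Spam" if score is not None and score >= threshold else "INBOX"
    flag = (msg.get("X-Spam-Flag") or "").strip().upper()
    return "Spam" if flag == "YES" else "INBOX"
```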
OpenLDAP • Sendmail understands LDAP • It is fast! (2 hours versus 5 minutes) • Used only for mail address lookups, so it is rebuilt every few hours • “Hidden” users have minimal details • Starting to need LDAP for other systems, and ADS is tricky (Oracle Calendar)
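An address lookup against the directory comes down to building a search filter. A minimal sketch: the attribute name `mailLocalAddress` is an assumption (use whichever attribute the local schema stores addresses in), and the escaping follows RFC 4515, so a `*` in an address cannot turn a single lookup into a wildcard scan.

```python
# Characters RFC 4515 requires to be escaped in a filter assertion value.
_SPECIAL = "\\*()\x00"


def escape_filter_value(value: str) -> str:
    """Escape a string for use as an LDAP filter assertion value."""
    return "".join("\\%02x" % ord(ch) if ch in _SPECIAL else ch for ch in value)


def address_filter(address: str, attribute: str = "mailLocalAddress") -> str:
    """Build an equality filter for one mail address (attribute is assumed)."""
    return f"({attribute}={escape_filter_value(address)})"
```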
iptables firewall & routing • Route incoming connections to available hosts – DNAT (load balancing) • SNAT outgoing mail connections • Firewall the rest – reduces patching • nodewatch updates the routing automatically • Runs on head server, talks multicast • Simple polling of available servers • Written in-house (C program + shell scripts)
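The polling side of nodewatch can be modelled as a table of heartbeats. A minimal sketch, not the in-house C program: `NodeTable`, the 15-second timeout, and the round-robin choice are all hypothetical. In the real setup the chosen host presumably ends up as an iptables DNAT target.

```python
import itertools
import time


class NodeTable:
    """Track slave heartbeats and hand out live hosts in rotation."""

    def __init__(self, hosts, timeout=15.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = {host: None for host in hosts}
        self._rotation = itertools.count()

    def heartbeat(self, host):
        """Record that `host` answered a poll just now."""
        self.last_seen[host] = self.clock()

    def alive(self):
        """Hosts heard from within the timeout, in a stable order."""
        now = self.clock()
        return sorted(h for h, t in self.last_seen.items()
                      if t is not None and now - t <= self.timeout)

    def next_target(self):
        """Next live host in round-robin order, or None if all are down."""
        live = self.alive()
        if not live:
            return None
        return live[next(self._rotation) % len(live)]
```

Injecting the clock keeps the liveness logic testable without waiting on real time.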
Statistics • Important to know what “normal” is • Heavy use of MRTG and friends • See graphs: http://mailservices.uwaterloo.ca • Who’s using it? connections.txt
Gotchas • 3 GB is not enough virtual memory • 8 MB stack per thread • Set ulimits: memory, number of processes • Logging is the main disk load – keep it separate from mail spools • System will get a lot of unwanted attention • Dictionary attacks on sendmail – rate limit • LDAP scans – limit to campus, limit number of results and CPU per request • Firewall heavily, and hide the slaves • How to test without losing mail?
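Rate-limiting a dictionary attack means counting recipient attempts per client over a time window. A minimal sliding-window sketch, not sendmail's own mechanism; the class name and the limits are hypothetical.

```python
import time
from collections import defaultdict, deque


class RecipientRateLimiter:
    """Allow at most `limit` RCPT attempts per client IP per `window` seconds."""

    def __init__(self, limit=20, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.events = defaultdict(deque)  # client IP -> attempt timestamps

    def allow(self, client_ip):
        now = self.clock()
        attempts = self.events[client_ip]
        # Drop attempts that have slid out of the window.
        while attempts and now - attempts[0] >= self.window:
            attempts.popleft()
        if len(attempts) >= self.limit:
            return False  # refused attempts are not counted
        attempts.append(now)
        return True
```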
More gotchas… • What if a slave machine dies? • Others can handle the load • What if the head machine dies? • Lose NFS, MySQL, LDAP • Could rebuild in a few hours from backups • Backup MX does the work meanwhile • Need a similar system elsewhere, to share • Need a better way to distribute configs • It helps if a single netmask covers all hosts • Duplicate scanning (next slide)
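Whether one netmask covers every host can be checked by finding the smallest network that contains them all. If that network is small, one firewall or SNAT rule can match the whole cluster. A minimal sketch; the example addresses come from the 192.0.2.0/24 documentation range, not the real cluster.

```python
import ipaddress


def smallest_common_network(addresses):
    """Return the smallest IPv4 network containing every given address."""
    ips = [ipaddress.IPv4Address(a) for a in addresses]
    low, high = min(ips), max(ips)
    # Widen the prefix around the lowest address until it also holds the highest;
    # networks are contiguous, so everything in between is covered too.
    for prefix in range(32, -1, -1):
        net = ipaddress.IPv4Network(f"{low}/{prefix}", strict=False)
        if high in net:
            return net


if __name__ == "__main__":
    hosts = ["192.0.2.10", "192.0.2.11", "192.0.2.12", "192.0.2.13"]
    print(smallest_common_network(hosts))  # prints 192.0.2.8/29
```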
Duplicate scanning • Machines tux and ist are MX’d to the cluster • Mail to user@ist, with a .forward on ist to tux, goes cluster -> ist -> cluster -> tux, so the cluster scans it twice • Also, mail to user@cluster is forwarded to its destination, which also scans it • Currently it’s not worth the effort to prevent this
Undeliverable postmaster mail • Sendmail aborts when postmaster mail is undeliverable, so queues grow and grow • Mail containing a virus from off campus goes to user@machine-1 • Tries to forward to user@machine-2 but is blocked, so • Tries to bounce, but the From: header is bogus, so • Tries to send to postmaster@machine-1, which is forwarded to machine-2…
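The failure chain above is a cycle in “where does this message go next”, and a cycle can be found by following the chain and watching for a repeated stop. A minimal sketch under a simplified model: each entry maps a destination to where the message (or its bounce) ends up after it, and the entries are hypothetical labels for the slide's machines, not real routing rules.

```python
def follow_forwards(start, next_hop):
    """Follow `next_hop` from `start`; return (path, looped).

    Stops at the first destination with no onward rule, or as soon as a
    destination repeats (a loop). Terminates for any finite mapping.
    """
    path = [start]
    seen = {start}
    current = start
    while current in next_hop:
        current = next_hop[current]
        path.append(current)
        if current in seen:
            return path, True
        seen.add(current)
    return path, False


# Hypothetical model of the scenario on the slide.
NEXT_HOP = {
    "user@machine-1": "user@machine-2",              # .forward on machine-1
    "user@machine-2": "postmaster@machine-1",        # blocked; bounce to bogus From: fails
    "postmaster@machine-1": "postmaster@machine-2",  # postmaster forwarded to machine-2
    "postmaster@machine-2": "postmaster@machine-1",  # blocked again: still carries the virus
}

if __name__ == "__main__":
    path, looped = follow_forwards("user@machine-1", NEXT_HOP)
    print(" -> ".join(path), "(loop)" if looped else "")
```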
Where we’re going • 3-system cluster for development • Try new ideas: RH cluster, others • New sendmail, scanners etc. • Disaster recovery – head or slave dies • Centralized LDAP server, but need to deal with MS Active Directory • Document all this, and how to use it • Hand it over to Production Support
Winding down • Spam / virus problems are getting worse, so we’ll be busy for a while • Contact us for more info, to exchange ideas, or to give advice • Slides will be made available