
This document presents an overview of a very effective anti-spam and anti-viral Email Gateway. The gateway is intended to run in Unix-like environments; however, it is geared to be especially simple to set up and maintain on Debian GNU/Linux systems. This Email Gateway is based upon an in-class activity taught by Sam Hart in his “Unix/Linux Administration” courses at the Extended University through the University of Arizona.

This work is licensed under a Creative Commons License: http://creativecommons.org/licenses/by-sa/1.0/

You are free:
* to copy, distribute, display, and perform the work
* to make derivative works
* to make commercial use of the work

Under the following conditions:
* Attribution. You must give the original author credit.
* Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one.
* For any reuse or distribution, you must make clear to others the license terms of this work.
* Any of these conditions can be waived if you get permission from the author.

Part I The Problem

In modern Email communications, there are really two very big problems:

• SPAM: Also known as “Unsolicited Commercial Email” or “UCE”. This is any item of mail which was sent without the recipient's permission, knowledge or involvement. The traditional Postal Mail equivalent is junk mail.
  • It has been estimated that SPAM costs the U.S. economy over $10 billion and worldwide over $20 billion per year in wasted bandwidth, computer space, and employee time.[1]
  • In recent surveys, it has been revealed that SPAM costs the average business more than $2.5 million per year.[2]
• Viruses: Computer viruses are programs that perform some malicious function and are generally installed without the user's knowledge. Most modern computer viruses are spread via Email.
  • Recent computer viruses such as MyDoom have been so destructive that damage estimates are as high as $250 million per incident.[3]
  • Due to vulnerabilities in Email reading client applications (such as Microsoft's Outlook), viruses exist which can infect computers without the user having to do anything.[4]

But why do I get SPAM?

Traditionally, people were sent SPAM because they were promiscuous with their email addresses. They would give out their email address haphazardly and spread it in unsafe places such as mailing lists and newsgroups. Many System Administrators who have been in the game for a while still think that this is where SPAM primarily comes from, but that is no longer the case.

Enter SPAM-bots....

Today, the vast majority of SPAM is sent to email addresses harvested by SPAM-bots.[5] A SPAM-bot is a program that spiders its way from web-page to web-page looking for email addresses. It collects every email address it finds and sends SPAM to each of them. So by having your email address on a web-page, you are inviting a SPAM-bot to harvest it.

Techniques exist to obfuscate email addresses on websites so that SPAM-bots cannot read them. However, any gain from such a technique is fleeting, as the SPAM-bot makers can readily adjust to it. This is not to say the techniques are useless, just that they cannot be solely relied upon.
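To make the harvesting and obfuscation argument concrete, here is a minimal sketch of a harvester. It is an illustration only: the two regular expressions, the function names, and the sample page are assumptions made for this example, and real SPAM-bots are considerably more thorough.

```python
import re

# A naive harvester pattern: anything that looks like user@host.tld.
PLAIN_ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# An "adjusted" pattern that also understands one common obfuscation style,
# "user at example dot com" (hypothetical; chosen only for illustration).
OBFUSCATED_ADDRESS = re.compile(
    r"([A-Za-z0-9._%+-]+)\s+(?:at|\[at\]|\(at\))\s+"
    r"([A-Za-z0-9-]+)\s+(?:dot|\[dot\]|\(dot\))\s+([A-Za-z]{2,})",
    re.IGNORECASE,
)


def harvest_plain(page):
    """Collect only addresses written in the usual user@host form."""
    return set(PLAIN_ADDRESS.findall(page))


def harvest_adjusted(page):
    """Collect plain addresses plus those hidden as 'user at host dot tld'."""
    found = harvest_plain(page)
    for user, host, tld in OBFUSCATED_ADDRESS.findall(page):
        found.add(f"{user}@{host}.{tld}")
    return found


page = """
<p>Contact: alice@example.com</p>
<p>Or write to bob at example dot org</p>
"""

print(harvest_plain(page))     # the obfuscated address is missed
print(harvest_adjusted(page))  # one extra pattern and it is found again
```

The obfuscated address survives the first harvester but not the second, which is why obfuscation helps only until the SPAM-bot authors add another pattern.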

But everyone hates Spam, why is it still so prevalent?

Spam is very easy to send. Spammers can use what are known as “Open Relays” to send their Spam anonymously. They can also use ISP dial-up and broadband accounts to send Spam practically anonymously. They can even use certain viruses to turn any Microsoft Windows machine into a spam relay point. So the entry point for a spammer is very easy and inexpensive.

Furthermore, spam is one of the more profitable technology sub-industries. Like it or not, people do buy items that are sent to them as spam. Some people even enjoy getting and reading spam.[6]

Spamming is illegal, can't the spammers just be sued?

In some states, sending Unsolicited Commercial Email is, in fact, illegal. The problem arises in trying to first determine what is spam, and then determining who is sending the spam. Opinions differ on what is and what isn't spam. Also, since spammers can send anonymously, tracking them down can be very hard.

Part II Traditional Solutions

Relay Blocking: Get 'Em Where They Work!

One of the traditional ways of fighting spam was to block those mail servers which are “open” for spammers to relay through anonymously. Mail server administrators would gather lists of known “Open Relays” and simply drop or bounce all mail coming from these relays.

The problems with relay blocking

Relay blocking has a number of problems. First, since most Open Relays are located in other countries (especially Asian and South American countries), by blocking them you are effectively walling yourself off from, and discriminating against, a significant portion of the world. If you are running a business, this can be very bad, as any potential customers from these regions will not be able to contact you.[7]

Additionally, there have been situations where Relay Blocking Lists were abused and contained wrong data. There have even been times where such lists went inactive and wound up blocking the entire world.[8] Combine these problems with the difficulty of appealing or correcting a listing, and with lists being used for political purposes, and you can see why they are a poor solution.[9]

Virus Protection on the Desktop

Traditionally, protection against viruses was the sole responsibility of desktop users. Users were expected to keep their anti-virus software up-to-date and apply all critical and security patches to their Microsoft Windows operating system.

Why this is an imperfect solution

The biggest problems with relying upon desktop users to protect themselves (and the rest of us) from viruses stem from the fact that most desktop users are not very knowledgeable with respect to computers and technology. Modern anti-virus software generally requires the user to do something to keep that software up-to-date. Norton Anti-virus ships with many computers, but comes with a limited trial offer that expires after a few months. Sophos Anti-virus generally requires the user to update their installation every month as well as install necessary critical virus identities almost daily. Additionally, Microsoft produces several new updates and patches for their operating systems every week.

Part III A Better Solution

So how can we fight Spam and Viruses more effectively?

It's obvious that a very effective solution must involve some mail filtration at the mail server itself. The mail server needs to be able to take a look at each message and evaluate whether that message is an item of spam or an email virus.

Building a better email gateway

The first thing we must realize is that we are fighting two different entities, each with its own criteria and identity. So, we will need two components in our email gateway: one for fighting spam and one for fighting viruses.

Component: Mail Server

The first item that is needed is a working Mail Server. Truthfully, just about any mail server will work. However, because the spam and viral scanners are so resource intensive, it helps if the mail server is fast and efficient.

Postfix (http://www.postfix.org/)

The mail server which best fits this setup is the Postfix mail server by Wietse Venema. Postfix is an alternative to the widely used Sendmail. Postfix is chosen because of its qualities:

• Postfix is fast and efficient. It scales very well to very large mail sites.
• Postfix is secure “out of the box”. Unlike Sendmail, Postfix has been designed with the modern internet in mind. Postfix can run easily in a chroot environment, has built-in SMTP authentication abilities, and does not default to being an Open Relay.
• Postfix is easy to configure, with intuitive and easy to read configuration files.
• Postfix allows for content filtration plugins.
• Postfix is intended as a “drop in” replacement for Sendmail. This means that the users and applications need not know it's even in use.

Component: Email Filter

We next need a modular email filter which allows us to plug in anti-spam and anti-viral tools. The filter that best fits this need is Amavisd-new. Amavisd-new began its life as AMaViS, which was only an anti-viral scanner. However, it has evolved into a more general purpose email filtration system.

Amavisd-new (http://www.ijs.si/software/amavisd/)

Amavisd-new acts as an external scanner for Postfix. When Postfix obtains an item of mail, it hands this mail off to Amavisd-new for scanning. Amavisd-new performs whatever checks it is configured for, and determines whether the item of mail should be quarantined, bounced, dropped, or allowed to continue on to its destination. If the mail is to be passed through to the destination, Amavisd-new returns it to Postfix, possibly with some additional email headers detailing what it did.

Amavisd-new in and of itself does not do much by way of content filtration. Instead, it relies on external modules for performing the actual scanning of mail.
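The hand-off described above can be sketched as a small pipeline of pluggable scanners. This is a toy model of the idea, not Amavisd-new's actual interface: the scanner functions, the `POLICY` table, and the `X-Filter-Status` header are all hypothetical names invented for this example.

```python
from enum import Enum


class Verdict(Enum):
    CLEAN = "clean"
    SPAM = "spam"
    VIRUS = "virus"


class Action(Enum):
    PASS = "pass"
    QUARANTINE = "quarantine"


def toy_virus_scanner(message):
    # Stand-in for an external anti-viral module (hypothetical rule).
    return Verdict.VIRUS if "EICAR" in message else Verdict.CLEAN


def toy_spam_scanner(message):
    # Stand-in for an external anti-spam module (hypothetical rule).
    return Verdict.SPAM if "buy now" in message.lower() else Verdict.CLEAN


# What to do with each verdict; the choices here are assumptions.
POLICY = {Verdict.VIRUS: Action.QUARANTINE, Verdict.SPAM: Action.QUARANTINE}


def filter_message(message, scanners):
    """Run each plugged-in scanner; stop at the first non-clean verdict."""
    for scan in scanners:
        verdict = scan(message)
        if verdict is not Verdict.CLEAN:
            return POLICY[verdict], message
    # Clean mail goes back to the mail server with a header noting the scan.
    return Action.PASS, "X-Filter-Status: scanned\n" + message


scanners = [toy_virus_scanner, toy_spam_scanner]
print(filter_message("Subject: hi\n\nLunch on Monday?", scanners)[0])
print(filter_message("Subject: deal\n\nBUY NOW!!!", scanners)[0])
```

The filter itself contains no detection logic; swapping a scanner in or out only means changing the `scanners` list, which mirrors the modular design described above.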

Component: Spam Filter

The Spam Filter is not one component, but several. The overarching scanning tool is known as SpamAssassin, which uses several other components to scan each piece of mail:

Sub-Component: Vipul's Razor (http://razor.sourceforge.net/)

Vipul's Razor is a distributed peer-to-peer (P2P) spam identity network. The intent of Vipul's Razor is to spread the news and description of a new item of spam as quickly as possible to all members of the P2P network. The process runs as follows: A new item of spam arrives at computer “A”, which is located far away from your computer, “Z”. “A” identifies the spam, and sends the spam description to the computers near it. Those computers spread that description to the computers near them. This process repeats, and eventually the description is delivered to “Z”.
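The spreading process described above can be modelled as a breadth-first walk over a network of peers. This is only a sketch of the propagation idea as the text presents it, not Razor's actual protocol; the peer graph and the function name are made up for illustration.

```python
from collections import deque


def propagate(peers, origin):
    """Return, for each computer, the hop at which it first learns of the spam."""
    learned = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for neighbour in peers.get(node, []):
            if neighbour not in learned:
                # The neighbour hears the description one hop after this node.
                learned[neighbour] = learned[node] + 1
                queue.append(neighbour)
    return learned


# Hypothetical network: "A" sees the spam first, "Z" is far away.
peers = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "Z"],
    "Z": ["D"],
}

hops = propagate(peers, "A")
print(hops["Z"])  # 3
```

In this small network, “Z” learns of the spam three hops after “A” first identified it.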

Sub-Component: DCC (http://www.dcc-servers.net/dcc/)

DCC (or Distributed Checksum Clearinghouse) is a system of thousands of computers and servers which collect “checksums” (a computed value which can uniquely identify something, like a digital fingerprint) of mail running through their networks. They then compare checksums with each other. If a certain checksum comes up a large number of times, this means that its source message is being sent through a large number of these computers to a large number of users. Thus, it is reasonable to assume it is spam (normal mail should not turn up identically thousands of times across thousands of servers).

Sub-Component: Bayesian (http://www.paulgraham.com/spam.html)

Bayesian Learning Spam Filters are filters which use an adaptive algorithm to “learn” what is and what isn't spam. In theory these can be highly effective spam filters, as they improve over time. In practice, however, they tend to require more effort on the part of the user than is desired. That being said, Bayesian filters can still be very effective and only add to the other components in this system, so they are included.
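Both ideas in this section, counting identical checksums and learning from labelled mail, can be sketched in a few lines. The sketch below rests on simplifying assumptions: `ChecksumCounter` is a single in-memory counter using SHA-256 rather than the real DCC network and its checksums, `TinyBayes` is a plain naive Bayes classifier rather than the exact formula from the linked essay, and the threshold and training messages are invented.

```python
import hashlib
import math
import re
from collections import Counter


class ChecksumCounter:
    """Toy stand-in for a checksum clearinghouse (not the DCC protocol)."""

    def __init__(self, bulk_threshold=3):
        self.bulk_threshold = bulk_threshold  # hypothetical cut-off
        self.reports = Counter()

    def report(self, body):
        """Record one sighting; return True once the body looks like bulk mail."""
        # Collapse case and whitespace so trivially reformatted copies match.
        normalised = " ".join(body.lower().split())
        digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
        self.reports[digest] += 1
        return self.reports[digest] >= self.bulk_threshold


def tokens(text):
    """Split a message into a set of lower-case word tokens."""
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))


class TinyBayes:
    """Naive Bayes over word presence, with add-one smoothing."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.messages[label] += 1
        self.counts[label].update(tokens(text))

    def spam_probability(self, text):
        # Work in log-odds so many small probabilities do not underflow.
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in tokens(text):
            p_spam = (self.counts["spam"][tok] + 1) / (self.messages["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.messages["ham"] + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


dcc = ChecksumCounter(bulk_threshold=3)
bulk_flags = [dcc.report(body) for body in
              ("Buy   CHEAP pills now", "buy cheap pills now", "Buy cheap pills NOW")]
print(bulk_flags)  # [False, False, True]

bayes = TinyBayes()
bayes.train("cheap pills buy now", "spam")
bayes.train("buy cheap watches now", "spam")
bayes.train("meeting agenda for monday", "ham")
bayes.train("lunch on monday?", "ham")
print(bayes.spam_probability("buy cheap pills") > 0.5)        # True
print(bayes.spam_probability("monday meeting agenda") > 0.5)  # False
```

The Bayesian part also shows where the user effort comes from: the filter is only as good as the labelled mail someone has fed to `train`.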

Sub-Component: SpamAssassin (http://www.spamassassin.org/)

SpamAssassin is the glue that ties the rest of the spam filter sub-components together. Taken on their own, each of the sub-components would not be a very effective filter. However, combined together and tied in with SpamAssassin, they become lethal to spam.

SpamAssassin performs a set of tests on each message it scans. Each test is assigned a numerical score[10]. When a message matches a test, the test score is added to a total for that message. After all the tests have been run, the message will have a total score that determines how likely it is to be spam. The more negative the score, the less likely it is to be spam. The more positive the score, the more likely it is to be spam. Vipul's Razor, DCC, and Bayesian Learning Filters are each assigned their own score, which counts toward this total.

Threshold scores are defined which are used to determine if an item is spam or not. For example, one could assign the threshold score of “5.4”, which would mean all messages with a score over “5.4” will be tagged as spam and dropped, quarantined, or otherwise filtered.
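The scoring scheme described above amounts to summing the scores of every matching test and comparing the total with a threshold. Here is a minimal sketch of that arithmetic; the test names, their scores, and the message fields are hypothetical and are not SpamAssassin's real rule set, while the threshold of 5.4 is the example value from the text.

```python
THRESHOLD = 5.4  # example threshold from the text

# (name, score, predicate) triples; all names and scores are made up.
TESTS = [
    ("SUBJECT_ALL_CAPS", 1.5, lambda m: m["subject"].isupper()),
    ("LISTED_IN_RAZOR", 3.0, lambda m: m["razor_hit"]),
    ("LISTED_IN_DCC", 2.5, lambda m: m["dcc_hit"]),
    ("BAYES_SPAMMY", 3.5, lambda m: m["bayes_probability"] > 0.99),
    ("FROM_KNOWN_CONTACT", -4.0, lambda m: m["known_sender"]),
]


def score_message(message):
    """Return the total score and the names of the tests that matched."""
    total = 0.0
    hits = []
    for name, score, matches in TESTS:
        if matches(message):
            total += score
            hits.append(name)
    return total, hits


def is_spam(message):
    total, _ = score_message(message)
    return total > THRESHOLD


suspicious = {
    "subject": "FREE PILLS",
    "razor_hit": True,
    "dcc_hit": True,
    "bayes_probability": 0.5,
    "known_sender": False,
}
print(score_message(suspicious))  # (7.0, [...three matching tests...])
print(is_spam(suspicious))        # True

# The same message from a known contact drops to 3.0, under the threshold.
from_friend = dict(suspicious, known_sender=True)
print(is_spam(from_friend))       # False
```

Note how the negative score works: a single reassuring test can pull an otherwise suspicious message back under the threshold, which is why no individual component has to be right on its own.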

Component: Virus Filter

Like the Spam Filter before it, the Virus Filter is actually composed of a number of sub-systems. Here, you could realistically use any anti-viral software that works with Amavisd-new (and there are a lot of them). However, because of the features we get, we recommend the ClamAV system.

Sub-Component: ClamAV (http://www.clamav.net/)

Clam Anti-Virus (or ClamAV) is an anti-viral suite consisting of a number of useful mail server tools. At present, it is not very well suited for the desktop user, who should probably be using one of the popular commercial anti-virus programs.

ClamAV's biggest strength is its rapidly updated and rigorously maintained viral database. There have been cases where this community driven viral database has beaten its commercial contemporaries by hours or days with respect to new viral identities. ClamAV also has a daemon called “Freshclam” which keeps the local viral identities up to date.

Part IV Conclusions and Considerations

Conclusions

Spam and viral filtration is and always will be an imperfect science. In the war between mail administrators and spammers, and between mail administrators and virus writers (sometimes one and the same), there are always advances on both sides. Recently, viruses have arisen that do not distribute themselves as email attachments, yet exploit inherent vulnerabilities in Microsoft Outlook to download the virus from remote places on the web. Additionally, spammers have begun looking at adaptive filters like Bayesian and developing countermeasures to the algorithm.

Thus, even though the email gateway detailed in this document is very effective, it should not be thought of as the complete solution to the spam and viral problems. A better solution would be to use a gateway such as this in conjunction with other deterrents. Not using insecure mail programs such as Microsoft Outlook is a step in the right direction. Obfuscating email addresses on web-readable documents is another.

Considerations

The most difficult question concerning an email gateway such as this one is “What should we do with spam/viruses once we have found them?” Should the spam or virus be bounced? Dropped? Quarantined? Tagged and sent through?

If it is a virus, chances are the email headers it includes have been forged in such a way as to disguise where it came from. Thus bouncing it will only send it back to some unsuspecting user who likely is not infected with the virus. Similarly for spam: since spam is often sent with bogus return addresses, bouncing it is not very effective and only wastes bandwidth.

It is my recommendation that all viruses be quarantined on the server for perusal and deletion at the system administrator's leisure. I also recommend either quarantining the spam or tagging it and passing it through for the mail clients to sort. This way, messages falsely identified as spam can be quickly caught and rectified, while normal spam does not burden the user.
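The recommendation above (quarantine viruses, tag spam and pass it through, never bounce) can be expressed as a small policy function. This is a sketch under assumptions: the `dispose` function, the verdict strings, and the “[SPAM]” subject prefix are invented for this example, and the `X-Spam-Flag` header is used here only as a plausible tag for mail clients to sort on.

```python
from email.message import EmailMessage


def dispose(message, verdict):
    """Return (action, message) following the recommendations above."""
    if verdict == "virus":
        # Hold on the server for the administrator; never bounce.
        return "quarantine", message
    if verdict == "spam":
        # Tag the mail so clients can sort it, then let it through.
        subject = message.get("Subject", "")
        del message["Subject"]
        message["Subject"] = "[SPAM] " + subject
        message["X-Spam-Flag"] = "YES"
        return "deliver", message
    return "deliver", message


msg = EmailMessage()
msg["From"] = "someone@example.com"
msg["Subject"] = "Cheap pills"
msg.set_content("Buy now!")

action, tagged = dispose(msg, "spam")
print(action)             # deliver
print(tagged["Subject"])  # [SPAM] Cheap pills
```

Because tagged spam is still delivered, a message falsely identified as spam remains in the user's hands and can be recovered, which is the point of preferring tagging over dropping.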

Notes:

[1] http://www.lexisone.com/balancing/articles/n080003d.html
[2] http://www.bizjournals.com/charlotte/stories/2003/11/17/daily16.html
[3] http://www.moneymag.com/2004/01/28/technology/mydoom_costs/
[4] http://www.fool.com/News/mft/2004/mft04031904.htm
[5] http://www.cdt.org/speech/spam/030319spamreport.shtml
[6] http://online.wsj.com/article_email/0,,SB107930537384354969-IhjgINplaR3n5ypaX2HcKqDm4,00.html
[7] http://www.wired.com/news/politics/0,1283,50455,00.html
[8] http://slashdot.org/articles/03/08/27/0214238.shtml?tid=111&tid=126
[9] http://geekcomix.com/cgi-bin/classnotes/wiki.pl?UNIX03/Realtime_Blackhole_Lists_Are_Bad
[10] http://www.spamassassin.org/tests.html
