Anti-Spam Solutions and Security
Directed by Dr. Ravi Mukkamala
Presented by Ming-Chin Chen, 10/19/2005

Introduction
• 93% of users: spam is ANNOYING!
• 20 billion US dollars are lost each year in productivity.
• Today more than 50% of all e-mail worldwide is spam.
• The definition of spam: the proliferation of unsolicited commercial e-mails (UCE), including:
  1. Commercial advertisements
  2. Viruses
  3. E-mails containing hostile programs or links

Introduction (continued)
• Security issues:
  1. Identity theft: phishing and scams are distributed as spam, directly leading to identity theft and fraud
  2. Combining exploits and spam
  3. Combining viruses and spam

Anti-Spam Solutions
• Filters: rely on blacklists, whitelists, and handcrafted rules that search for particular keywords, phrases, or suspicious patterns in the headers.
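
A minimal sketch of such a list-and-rule filter follows. The addresses, the keyword list, and the function name `classify` are hypothetical and chosen only for illustration; a real filter would use far larger, maintained lists.

```python
# Hypothetical lists and keywords, chosen only for illustration.
WHITELIST = {"alice@example.org"}
BLACKLIST = {"bulk@spam.example"}
SPAM_KEYWORDS = ("free money", "click here", "act now")


def classify(sender: str, subject: str, body: str) -> str:
    """Return 'ham' or 'spam' using list lookups first, then keyword rules."""
    sender = sender.strip().lower()
    if sender in WHITELIST:
        # Trusted senders always pass, whatever the content says.
        return "ham"
    if sender in BLACKLIST:
        # Known bad senders are always rejected.
        return "spam"
    text = f"{subject} {body}".lower()
    if any(keyword in text for keyword in SPAM_KEYWORDS):
        return "spam"
    return "ham"
```

Checking the lists before the keywords means a whitelisted sender is never blocked by a keyword match, which is one simple way to limit false positives.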

Anti-Spam Solutions
• Reverse lookup: Nearly all spam uses forged sender ("From") addresses; very few spam emails use the sender's true email address. Furthermore, most forged email addresses appear to come from trusted domains. In an effort to limit the ability to forge sender addresses, a number of proposed systems have surfaced for validating a sender's email. These systems include:
  1. Reverse Mail Exchange (RMX)
  2. Sender Permitted From (SPF)
  3. Designated Mailers Protocol (DMP)

Anti-Spam Solutions
• Challenges: Spam senders use automated bulk-mailing programs to generate millions of emails per day. Challenges attempt to impede bulk senders by slowing the bulk-mailing process. There are two main types of challenges:
  1. Challenge-response
  2. Computational challenge (proposed)
• Cryptography

Filters
Filters are used by a recipient system to identify and organize spam. There are many different types of filter systems, including:
• Word lists
• Black/white lists
• Hash tables
• Artificial intelligence and probabilistic systems
• Bayesian filtering technology. How does it work?
Disadvantages:
  1. Bypassing filters
  2. False positives
  3. Filter reviewing

How does Bayesian filtering work?
• Simple filters are becoming less useful. Statistical analysis of spam revealed surprising 'signatures' in spam, e.g. 'ff0000' (red in HTML hexadecimal color coding).
• Bayesian decision theory: make a decision based on previous information, or 'training' ('a priori' in the world of maths).
• Say we see the word 'click': we classify the email as spam if probability(spam | 'click') > probability(non-spam | 'click').
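
The single-word decision rule can be sketched with Bayes' theorem. The function names and the example probabilities below are assumptions for illustration, not values from the presentation.

```python
def posterior_spam(p_word_given_spam: float,
                   p_word_given_ham: float,
                   p_spam: float = 0.5) -> float:
    """Return probability(spam | word) via Bayes' theorem."""
    p_ham = 1.0 - p_spam
    # Total probability of seeing the word in any email.
    evidence = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    if evidence == 0:
        raise ValueError("the word was never observed in training data")
    return p_word_given_spam * p_spam / evidence


def is_spam(p_word_given_spam: float,
            p_word_given_ham: float,
            p_spam: float = 0.5) -> bool:
    """Spam wins when its posterior exceeds the non-spam posterior."""
    p = posterior_spam(p_word_given_spam, p_word_given_ham, p_spam)
    # probability(non-spam | word) is 1 - p, so the comparison reduces to p > 0.5.
    return p > 0.5
```

For example, if 'click' appeared in 40% of training spam and 10% of training non-spam, with equal priors, the posterior is 0.2 / 0.25 = 0.8, so the email would be classified as spam.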

How does Bayesian filtering work? (continued)
• Manually classify some spam and non-spam emails to build up a database of words likely to indicate spam, or likely to indicate non-spam.
• Test each newly arrived email against the spam word database, using the maths of Bayesian decision theory.
• If the automatic classification is correct, we add this latest email to the database (stronger database).
• If the automatic classification is incorrect, a human needs to intervene (or the database gets weaker).
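
The train-then-classify loop described above can be sketched as follows. The class name `WordDatabase`, the whitespace tokenizer, and the add-one smoothing are assumptions made for this example; the scoring is a simplified naive Bayes that looks only at words present in the message.

```python
import math
from collections import Counter


class WordDatabase:
    """Hypothetical word database for a simplified naive Bayes filter."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.mail_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Add one classified email; count each distinct word once."""
        self.word_counts[label].update(set(text.lower().split()))
        self.mail_counts[label] += 1

    def _log_score(self, words: set, label: str) -> float:
        total = sum(self.mail_counts.values())
        # Add-one smoothing keeps every probability above zero.
        score = math.log((self.mail_counts[label] + 1) / (total + 2))
        for word in words:
            seen = self.word_counts[label][word]
            score += math.log((seen + 1) / (self.mail_counts[label] + 2))
        return score

    def classify(self, text: str) -> str:
        words = set(text.lower().split())
        spam_score = self._log_score(words, "spam")
        ham_score = self._log_score(words, "ham")
        return "spam" if spam_score > ham_score else "ham"


db = WordDatabase()
db.train("click here to win money", "spam")
db.train("win a free prize click now", "spam")
db.train("meeting agenda for monday", "ham")
db.train("lunch on monday", "ham")

new_mail = "click to win"
label = db.classify(new_mail)
# After a human confirms (or corrects) the label, feed the email back in.
db.train(new_mail, label)
```

Feeding each confirmed email back through `train` is what makes the database stronger over time; feeding back an uncorrected mistake would weaken it.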

Reverse lookup
More complicated reverse-lookup schemes:
1. DKIM (DomainKeys Identified Mail): derived from Yahoo DomainKeys and Cisco Identified Internet Mail. DKIM = message header authentication = DNS identifiers + public keys in DNS.
2. SenderID: Domain administrators publish Sender Policy Framework (SPF) records in the Domain Name System which identify authorized outbound email servers. Receiving email systems verify whether messages originate from properly authorized outbound email servers.
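
The receiving-side check for an SPF-style record can be sketched as below. To stay self-contained, the example replaces the DNS query with a hypothetical in-memory table, `PUBLISHED_SENDERS`, and uses documentation-only IP ranges; it does not parse real SPF record syntax.

```python
import ipaddress
from typing import Optional

# Hypothetical stand-in for records that a domain would publish in DNS.
PUBLISHED_SENDERS = {
    "example.org": ["192.0.2.0/24", "198.51.100.25/32"],
}


def is_authorized(sender_domain: str, client_ip: str) -> Optional[bool]:
    """Check whether client_ip may send mail on behalf of sender_domain.

    Returns None when the domain publishes no policy at all.
    """
    networks = PUBLISHED_SENDERS.get(sender_domain.lower())
    if networks is None:
        return None
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(net) for net in networks)
```

Returning `None` for domains without a published policy keeps "unknown" separate from "forged", since a receiver may want to treat the two cases differently.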

Reverse lookup (continued)
3. FairUCE: Stands for Fair use of Unsolicited Commercial Email. It tries to find a relationship between the envelope sender's domain and the IP address of the client delivering the mail, using a series of cached DNS look-ups. If no relationship is found, it sends a user-customizable challenge/response.

Reverse lookup (continued)
While these solutions are viable in certain situations, they share some significant limitations:
1. Host-less and vanity domains
2. Mobile computing

Challenges
1. Challenge-Response (CR): The belief is that spam senders using fake sender email addresses will never receive the challenge, and spam senders using real email addresses will not be able to reply to all of the challenges.
Limitations:
  a. CR deadlock
  b. Automated systems
  c. Interpretation challenges
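
A minimal sketch of a challenge-response gate is shown below, assuming a hypothetical `ChallengeResponseGate` class that holds mail from unknown senders until the matching token comes back. Actual delivery of the challenge email is left out.

```python
import secrets


class ChallengeResponseGate:
    """Hypothetical gate that holds mail until the sender answers a challenge."""

    def __init__(self) -> None:
        self.approved = set()   # senders who already answered a challenge
        self.pending = {}       # token -> (sender, held message)

    def receive(self, sender: str, message: str):
        """Deliver mail from approved senders; challenge everyone else."""
        if sender in self.approved:
            return ("delivered", None)
        token = secrets.token_hex(8)
        self.pending[token] = (sender, message)
        # A real system would now email the token to the claimed sender.
        return ("challenged", token)

    def answer(self, token: str):
        """Release the held message if the token matches a pending challenge."""
        entry = self.pending.pop(token, None)
        if entry is None:
            return None
        sender, message = entry
        self.approved.add(sender)
        return message
```

A forged sender address never sees the token, so its mail stays in `pending`. The sketch also hints at the deadlock limitation: if both parties run such a gate, each side's challenge is itself held by the other.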

Challenges (continued)
2. Computational Challenge (CC): Most CC systems use complex algorithms that are intended to take time. For a single user, the time is unlikely to be noticed. But for a bulk mailer such as a spam sender, the small delays add up, making it take too long to send millions of emails.
Limitations:
  a. Unequal taxation
  b. Mailing lists
  c. Robot armies
  d. Legal robot armies
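
One way to picture a computational challenge is a hashcash-style proof of work. The presentation names no specific algorithm, so the SHA-256 puzzle, the function names, and the default difficulty below are assumptions for illustration only.

```python
import hashlib
from itertools import count


def _digest(message: str, nonce: int) -> bytes:
    return hashlib.sha256(f"{message}:{nonce}".encode("utf-8")).digest()


def leading_zero_bits(digest: bytes) -> int:
    """Count how many bits at the start of the digest are zero."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def solve(message: str, difficulty_bits: int = 12) -> int:
    """Sender side: search for a nonce whose hash meets the difficulty."""
    for nonce in count():
        if leading_zero_bits(_digest(message, nonce)) >= difficulty_bits:
            return nonce


def verify(message: str, nonce: int, difficulty_bits: int = 12) -> bool:
    """Receiver side: a single hash is enough to check the work."""
    return leading_zero_bits(_digest(message, nonce)) >= difficulty_bits
```

Solving takes about 2^difficulty_bits hash attempts on average while verifying takes one, which is the asymmetry the slide relies on: negligible for one email, expensive for millions. It also illustrates the "unequal taxation" limitation, because slower machines pay more for the same puzzle.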

Cryptography
A few solutions have been proposed that use cryptography to validate the email sender. Essentially, these systems use certificates to perform the authentication. Without a proper certificate, a forged email can be readily identified. Some proposed cryptographic solutions include:
1. AMTP
2. MTP
3. S/MIME
The existing mail protocol (SMTP) has no explicit support for cryptographic authentication. Some of these proposed solutions extend SMTP (e.g., S/MIME, PGP/MIME, and AMTP), while others aim to replace the existing mail infrastructure (e.g., MTP).

Cryptography (continued)
Cryptographic solutions do not validate that the email address is real; they only validate that the sender had the correct keys for the email. This creates a few issues:
1. Automated abuse
2. Usability issues

Conclusion
1. Use hybrid strategies.
2. Legislate anti-spam regulations.
Doubts:
• Viable in limited circumstances, with significant limitations.
• Impede regular users or spammers?
• A good solution today might not be a good solution tomorrow.

References
• Dr. Neal Krawetz, Anti-Spam Solutions and Security
• Better Bayesian Filtering, http://www.paulgraham.com/better.html
• Anti-Phishing Working Group, http://www.antiphishing.org/
• http://antispam.yahoo.com/domainkeys
• http://www.microsoft.com/senderid
• http://www.alphaworks.ibm.com/tech/fairuce
• http://sendmail.net/dk-milter/