SureMail: Notification Overlay for Email Reliability • Sharad Agarwal, Venkat Padmanabhan, Dilip A. Joseph • 8 March 2006
Outline • Email loss problem • Design philosophy • SureMail design • SureMail robustness to security attacks • SureMail implementation
What is Email Loss? • Email loss : sent email not received • Silent email loss • Loss w/o notification (no bounceback / DSN) • Why? • Aggressive spam filters • 90% corp. emails thrown away (blacklist) • AOL’s strict whitelist rules (must send 100/day) • Bouncebacks contribute to spam • Complex mail architecture upgrades / failures • SMTP reliability is per hop, not end-to-end
How Much Email Loss? • Even loss of 1 email / user / year is bad • If it’s an important email • To really measure loss • Monitor many users’ send & receive habits • Count how many sent emails not received • Count how many bouncebacks received • Difficult to find enough willing participants that email each other across multiple domains
Prior Work • “The State of the Email Address” • Afergan & Beverly, ACM CCR 01.2005 • Rely on bouncebacks; similar to “dictionary” attack • 25% of tested domains send bouncebacks • 1 sender • 0.1% to 5% loss, across 1468 servers, 571 domains • “Email dependability” • Lang, UNSW B.E. thesis 11.2004 • 40 accounts, 16 domains receive emails from 1 sender • Empty body, sequence number as subject • 0.69% silent loss
Our Email Loss Study • Methodology • Controller composes email, sends • Our code for SMTP sending • Outlook for receiving (both inbox & junk mail) • Parse sent and received emails into SQL DB • Match on {sender,receiver,subject,attachment} • Heuristics for parsing bouncebacks • Want • Many sending, receiving accounts • Real email content
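The matching step in the methodology above can be sketched as follows. This is a minimal illustration, not the study's actual code: it assumes each parsed email is reduced to a {sender, receiver, subject, attachment} key, and the names `EmailKey` and `loss_rate` are hypothetical. Duplicate sends are handled by counting, so two identical sends need two identical receipts.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class EmailKey(NamedTuple):
    """Match key for one email (hypothetical representation)."""
    sender: str
    receiver: str
    subject: str
    attachment: Optional[str]  # attachment file name, or None


def loss_rate(sent: Iterable[EmailKey], received: Iterable[EmailKey]) -> float:
    """Return the fraction of sent emails with no matching received email."""
    remaining = Counter(received)  # received emails not yet matched
    sent_total = 0
    lost = 0
    for key in sent:
        sent_total += 1
        if remaining[key] > 0:
            remaining[key] -= 1  # consume one matching receipt
        else:
            lost += 1
    return lost / sent_total if sent_total else 0.0


# Made-up example data, for illustration only.
sent_log = [
    EmailKey("a@x.example", "b@y.example", "Q3 plan", None),
    EmailKey("a@x.example", "b@y.example", "Q3 plan", "deck.ppt"),
    EmailKey("a@x.example", "c@z.example", "Lunch", None),
    EmailKey("a@x.example", "c@z.example", "Minutes", "notes.doc"),
]
received_log = [sent_log[0], sent_log[2]]
print(f"loss rate: {loss_rate(sent_log, received_log):.2%}")
```

Separating silent loss from total loss would additionally require matching parsed bouncebacks against the unmatched sent emails, which this sketch leaves out.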
Experiment Details • Email accounts • 36 send, 42 receive • Junk filters off if possible • Email subject & body • Enron corpus subset • 1266 emails w/o spam • Email attachment • 70% no attachment • jpg,gif,ppt,doc,pdf,zip,htm • marketing,technical,funny
Loss Rates by Account • Loss rate 1.82% to 0.82%
Loss Rates by Attachment • Nothing stands out
Loss Rates by Subject/Body • ~50-250 emails sent per subject • Excluding the one body with 35% loss: overall loss rate drops from 1.82% to 1.79%
Summary of Findings • Email loss rates are high • 1.82% loss • 0.71% conservative silent loss ( 1 / 140 ) • Difficult to disambiguate cause of loss • Difference between domains (filters or servers?) • No difference between mailboxes • No difference between attachments • Only 1 body had abnormally high loss
Outline • Email loss problem • Design philosophy • SureMail design • SureMail robustness to security attacks • SureMail implementation
We Found Email Loss; Now What? • Can try to fix email architecture, but • Hard to know exactly what is problem • Spam filters continually evolve; not perfect • Some architectures are very complicated • How many email systems are out there? • The current system mostly works
Fixing the Architecture • Improve email delivery infrastructure • more reliable servers • e.g., cluster-based (Porcupine [Saito ’00]) • server-less systems • e.g., DHT-based (POST [Mislove ’03]) • total switchover might be risky • “Smarter” spam filtering • moving target; mistakes inevitable • non-content-based filtering still needed to cope with spam load
Email Notifications • DSN / bouncebacks • Most spam filters don’t generate DSN on drop • Bogus DSNs due to spam w/ bogus sender • Some MTAs block DSN for privacy • MTA crash may not generate DSN • No DSN for loss between MTA and MUA • MDN / read receipts • Expose private info (when read, when online) • Can help spammers
Notification Design Requirements • Cause minimal MTA/MUA disruption • Cause minimal user disruption • Preserve asynchronous operation • Preserve user privacy • Preserve repudiability • Maintain spam and virus defenses • Minimize traffic overhead
Outline • Email loss problem • Design philosophy • SureMail design • SureMail robustness to security attacks • SureMail implementation
SureMail Design Requirements • Cause minimal MTA/MUA disruption • No MTA modification; no Outlook modification • Cause minimal user disruption • User notified only on loss • Preserve asynchronous operation • Preserve user privacy • Only receiver is notified of loss • Preserve repudiability • No PKI / authentication • Maintain spam and virus defenses • Emails not modified • Minimize traffic overhead • 85 byte notification per email
Basic Operation • Sender S sends email to receiver R • S also posts notification to overlay • R periodically downloads new email • R also downloads notifications from overlay • Notification without matching email ⇒ loss • delay: median 26s, mean 276s, max 36.6 hrs
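The receiver-side check in the basic operation can be sketched as below. This is an illustration under stated assumptions: notifications are taken to carry a hash of the email, `h1` is stood in for by prefixed SHA-256, and the `grace_seconds` parameter is a hypothetical addition motivated by the long delay tail quoted on the slide, so that a slow email is not flagged before it has had time to arrive.

```python
import hashlib
from typing import Iterable, List, Tuple


def h1(message: bytes) -> str:
    """Stand-in for the slides' H1 (assumption: domain-separated SHA-256)."""
    return hashlib.sha256(b"H1" + message).hexdigest()


def find_lost(
    notifications: Iterable[Tuple[str, float]],  # (H1 of email, time posted)
    received_messages: Iterable[bytes],
    now: float,
    grace_seconds: float = 0.0,
) -> List[str]:
    """Return hashes of notified emails that have no matching received email."""
    received = {h1(m) for m in received_messages}
    return [
        digest
        for digest, posted_at in notifications
        if digest not in received and now - posted_at >= grace_seconds
    ]


# Made-up example: three notifications, one email actually delivered.
notes = [(h1(b"hello"), 0.0), (h1(b"budget"), 0.0), (h1(b"late"), 90.0)]
lost = find_lost(notes, [b"hello"], now=100.0, grace_seconds=60.0)
print(len(lost))
```

In this example only the "budget" notification is reported: "hello" was delivered, and "late" is still inside the grace period.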
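SureMail Overview (diagram) • Parties: Sender S, Recipient R, notification node Dnot = H1(R), registration node Dreg = H2(R) • Labeled steps: Register, Verify, GetNotifications, Request lost message, “You’ve Lost Mail!” • Notification contents: H1(Mnew), H1(Mold), T, MAC([T,H1(Mnew)],H2(Mold))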
SureMail Overview • Emails, MTAs, MUAs unmodified • Parallel notification overlay system • Decentralized; limited collusion • Agnostic to actual implementation • end-host-based (e.g., always-on user desktops) • infrastructure-based (e.g., “NX servers”) • Prevent notification snooping & spam • Email based registration • Reply based shared secret
Email-Based Registration • Goal: prevent hijacking of R’s notifications • Only R can receive emails sent to R • Limited collusion among notification nodes • One-time operation for initial registration • R sends registration request to H2(R), H3(R) • H2(R), H3(R) email registration secrets to R • To retrieve notifications at H1(R) • R uses registration secrets with H1(R); H1(R) verifies with H2(R), H3(R), then sends back notifications • None of H1(R), H2(R), H3(R) can associate notifications with R unless they collude
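The registration and retrieval flow above can be simulated in a few lines. This is a sketch under several assumptions that the slides do not specify: mailboxes are keyed by a hash of R's address (`mailbox_id`), secrets are random tokens, "email" is modeled by a callback, and R presents the raw secrets to the notification node. All class and function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Dict, List, Optional


def mailbox_id(address: str) -> str:
    """Opaque mailbox key (assumption: hash of the recipient address)."""
    return hashlib.sha256(b"mailbox" + address.encode()).hexdigest()


class RegistrationNode:
    """Plays the role of H2(R) or H3(R): issues a secret by email."""

    def __init__(self, deliver_email: Callable[[str, str], None]) -> None:
        self._deliver_email = deliver_email
        self._secrets: Dict[str, str] = {}

    def register(self, address: str) -> None:
        secret = secrets.token_hex(16)
        self._secrets[mailbox_id(address)] = secret
        self._deliver_email(address, secret)  # only R can read R's email

    def verify(self, box: str, secret: str) -> bool:
        expected = self._secrets.get(box)
        return expected is not None and hmac.compare_digest(expected, secret)


class NotificationNode:
    """Plays the role of H1(R): stores notifications, never sees R's address."""

    def __init__(self, registrars: List[RegistrationNode]) -> None:
        self._registrars = registrars
        self._boxes: Dict[str, List[str]] = {}

    def post(self, box: str, notification: str) -> None:
        self._boxes.setdefault(box, []).append(notification)

    def retrieve(self, box: str, presented: List[str]) -> Optional[List[str]]:
        if len(presented) != len(self._registrars):
            return None
        if all(r.verify(box, s) for r, s in zip(self._registrars, presented)):
            return list(self._boxes.get(box, []))
        return None


# Simulated email delivery: secrets land in the recipient's inbox.
inbox: Dict[str, List[str]] = {}
deliver = lambda addr, body: inbox.setdefault(addr, []).append(body)

node_h2, node_h3 = RegistrationNode(deliver), RegistrationNode(deliver)
node_h1 = NotificationNode([node_h2, node_h3])

node_h2.register("r@example.org")
node_h3.register("r@example.org")
box = mailbox_id("r@example.org")
node_h1.post(box, "n1")

retrieved = node_h1.retrieve(box, inbox["r@example.org"])
attacker_view = node_h1.retrieve(box, ["guess", "guess"])
```

Here `retrieved` holds the posted notification while `attacker_view` is `None`, since an attacker who cannot read R's email never learns the secrets.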
Reply-Based Shared Secret • Goal: prevent notification spoofing & spam • Only R & S know their email conversations • S rarely converses with spammers • Reply detection • S sends Mold to R, R replies with M’old • S uses H(Mold) to “prove” identity to R in future • Notification for Mnew from S to R • H1(Mnew),H1(Mold),T,MAC([T,H1(Mnew)],H2(Mold)) • Only R can identify S • Shared secret can be continually refreshed
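The notification format above can be made concrete with a short sketch. The slides name H1, H2 and MAC but not the underlying primitives, so this example assumes prefixed SHA-256 for the two hashes and HMAC-SHA256 for the MAC, with T encoded as an 8-byte integer; the function names are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable, NamedTuple


def h1(message: bytes) -> bytes:
    """Public identifier of a message (assumption: prefixed SHA-256)."""
    return hashlib.sha256(b"H1" + message).digest()


def h2(message: bytes) -> bytes:
    """MAC key derived from a message that only S and R have seen."""
    return hashlib.sha256(b"H2" + message).digest()


class Notification(NamedTuple):
    new_id: bytes   # H1(Mnew)
    old_id: bytes   # H1(Mold)
    timestamp: int  # T
    mac: bytes      # MAC([T, H1(Mnew)], H2(Mold))


def _mac(timestamp: int, new_id: bytes, key: bytes) -> bytes:
    payload = timestamp.to_bytes(8, "big") + new_id
    return hmac.new(key, payload, hashlib.sha256).digest()


def make_notification(m_new: bytes, m_old: bytes, timestamp: int) -> Notification:
    """Sender side: build the notification for Mnew, keyed by Mold."""
    new_id = h1(m_new)
    return Notification(new_id, h1(m_old), timestamp,
                        _mac(timestamp, new_id, h2(m_old)))


def verify_notification(n: Notification, replied_to: Iterable[bytes]) -> bool:
    """Receiver side: accept only if keyed by an email R has replied to."""
    for m_old in replied_to:
        if hmac.compare_digest(h1(m_old), n.old_id):
            expected = _mac(n.timestamp, n.new_id, h2(m_old))
            return hmac.compare_digest(expected, n.mac)
    return False


old_mail = b"Can we meet Tuesday?"
genuine = make_notification(b"Agenda attached", old_mail, 1141776000)
forged = make_notification(b"Buy now", b"never sent to R", 1141776000)
print(verify_notification(genuine, [old_mail]), verify_notification(forged, [old_mail]))
```

Because the MAC key is derived from Mold with a different hash than the public identifier H1(Mold), a notification node that sees H1(Mold) cannot derive the key from it.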
Attacks Defeated by Design • X cannot retrieve H1(R) notifications • H1(R) cannot identify R • H2(R), H3(R) cannot see R’s notifications • If they don’t collude; can increase to 3 nodes • X, H(R) cannot identify S • X, H(R) cannot learn Mnew, Mold • X cannot annoy R with bogus notifications • X cannot masquerade as S when posting to H1(R)
First Time Sender • What if FTS email is lost? • FTS & spammer generally indistinguishable • But perhaps FTS knows I who knows R • Email networks have small world properties • I makes shared secret SI with all known parties • FTS sends email to R • Posts multiple notifications • One for every SI it has learned
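The first-time-sender idea above can be sketched as follows. The slides do not say how a secret SI is referenced inside a notification, so this example assumes a public identifier derived by hashing SI (mirroring the role of H1(Mold) in the regular format) and HMAC-SHA256 as the MAC; all names are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable, List, NamedTuple, Optional


def digest(tag: bytes, data: bytes) -> bytes:
    return hashlib.sha256(tag + data).digest()


class FtsNotification(NamedTuple):
    message_id: bytes  # hash of the new email
    secret_id: bytes   # public identifier of the SI used (assumption)
    timestamp: int
    mac: bytes


def _mac(secret: bytes, timestamp: int, message_id: bytes) -> bytes:
    payload = timestamp.to_bytes(8, "big") + message_id
    return hmac.new(secret, payload, hashlib.sha256).digest()


def post_for_all_secrets(
    m_new: bytes, timestamp: int, learned_secrets: Iterable[bytes]
) -> List[FtsNotification]:
    """FTS side: one notification for every SI it has learned."""
    message_id = digest(b"H1", m_new)
    return [
        FtsNotification(message_id, digest(b"id", s), timestamp,
                        _mac(s, timestamp, message_id))
        for s in learned_secrets
    ]


def first_accepted(
    notifications: Iterable[FtsNotification], secrets_r_knows: Iterable[bytes]
) -> Optional[FtsNotification]:
    """R side: accept the first notification keyed by an SI that R also holds."""
    by_id = {digest(b"id", s): s for s in secrets_r_knows}
    for n in notifications:
        secret = by_id.get(n.secret_id)
        if secret is not None and hmac.compare_digest(
            _mac(secret, n.timestamp, n.message_id), n.mac
        ):
            return n
    return None


# Made-up secrets from three intermediaries.
si_alice, si_bob, si_carol = b"alice-secret", b"bob-secret", b"carol-secret"
posted = post_for_all_secrets(b"Hello, we met at the workshop", 1141776000,
                              [si_alice, si_bob])
accepted = first_accepted(posted, [si_bob, si_carol])
spam = post_for_all_secrets(b"Buy now", 1141776000, [b"spammer-guess"])
```

In this example `accepted` is the notification keyed by the shared intermediary's secret, and nothing in `spam` is accepted by R.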
Other Issues • Reply-detection: • “in-reply-to” header may not always help • indirect checks based on text similarity • Reducing overhead: • post notifications only for “important” emails • delay posting in hope of receiving implicit ACK (reply) or NACK (bounce-back) • Mobility: • reply-based shared secret can be regenerated • web-mail • Can support mailing lists
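The "delay posting" idea for reducing overhead can be sketched as a small scheduler. This is a hypothetical design, not the authors' implementation: the class `DeferredPoster`, its method names, and the fixed delay are all assumptions. A reply (implicit ACK) or a bounce-back (NACK) both cancel the pending post, since in either case the sender already knows the outcome.

```python
from typing import Callable, Dict, List


class DeferredPoster:
    """Hold notifications back for a while; post only if no ACK/NACK arrives."""

    def __init__(self, delay_seconds: float, post: Callable[[str], None]) -> None:
        self._delay = delay_seconds
        self._post = post
        self._deadlines: Dict[str, float] = {}

    def email_sent(self, message_id: str, now: float) -> None:
        """Schedule a notification for an email that was just sent."""
        self._deadlines[message_id] = now + self._delay

    def outcome_known(self, message_id: str) -> None:
        """Cancel on a reply (ACK) or a bounce-back (NACK)."""
        self._deadlines.pop(message_id, None)

    def tick(self, now: float) -> List[str]:
        """Post every notification whose deadline has passed."""
        due = [m for m, deadline in self._deadlines.items() if deadline <= now]
        for message_id in due:
            del self._deadlines[message_id]
            self._post(message_id)
        return due


posted_ids: List[str] = []
poster = DeferredPoster(delay_seconds=3600, post=posted_ids.append)
poster.email_sent("m1", now=0)
poster.email_sent("m2", now=0)
poster.email_sent("m3", now=1800)
poster.outcome_known("m1")           # R replied, so no notification is needed
first_round = poster.tick(now=3600)  # only m2 is due
second_round = poster.tick(now=5400)
```

The trade-off in such a scheme is that a longer delay saves more notifications but also postpones the moment at which R can learn of a loss.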
Outline • Email loss problem • Design philosophy • SureMail design • SureMail robustness to security attacks • SureMail implementation
SureMail Implementation • Reply detection heuristic for shared secret • Notification service • Centralized server running • Chord based DHT running • Notification posting, retrieving • Grab in/out bound email via Outlook MAPI call • No modification to Outlook binaries • XML notification put/get commands • Simple Win32 GUI
SureMail GUI • Client UI sees emails, posts & retrieves notifications • E.g., running on two machines as netprofa@microsoft.com and netprofa@gmail.com • (Screenshot labels: “Lost!”, “Not lost”)
Summary • Email does get lost! • ~40 accounts, 158000 emails, 0.71%-0.91% silent loss • SureMail • Client based – unmodified email, servers, clients; no PKI • User intervention only on lost email • Keeps repudiability, privacy, asynchronous, spam & virus defense • Separate notification overlay robust • Simple, small message format • No virus, malware, spam filters needed • Provides failure independence • Status • ACM Hotnets 05; ACM Sigcomm 06 submission • Prototype implementation