addressing email loss with SureMail : measurement, design & evaluation Sharad Agarwal Venkat Padmanabhan Dilip Joseph
Why study email loss? • Email does get lost sometimes • SMTP is not guaranteed end-to-end reliable • e.g. USENIX acceptance email • Why? • spam & malware attacks, email sizes & features • large providers have complex architectures • high server load & downtime • false positives in aggressive spam & malware filters • our IT : over 90% emails thrown away • Even if infrequent, imposes high cost on users • misunderstanding • productivity loss
Outline • Motivation • Possible causes • Email loss experiment • definitions • setup • results • SureMail • design requirements • notifications • implementation • Related work
Definitions • Email loss • users A & B are “legitimate” users • A hits send on email to B • B never gets it • not in inbox; not in junk-mail folder • Silent email loss • email loss with no DSN (bounceback) • DSNs becoming less effective • spam reflection, corporate privacy, help defeat spam filter • so loss is too often silent
Measuring email loss • How often are legitimate user emails lost ? • very hard to conduct wide scale user study • Resort to artificial experiment • get many email accounts • send emails to & from accounts • send ham, not spam • check which emails didn’t arrive • junk folder ok
Email loss experiment • 46 email accounts • ~2 mailboxes in each of 22 domains • academic, commercial, corporate • Australia, Canada, New Zealand, UK, USA • Email content • Enron corpus • 1266 emails manually identified as ham • only subject & body • 16 different attachments • jpeg, gif, html, doc, ppt, pdf, zip
Email loss results • Loss by attachment • similar • except HTML ~2X loss • Loss by subject / body • 1 outlier, didn’t affect results • Loss by account, domain • similar loss seen from, to most accounts • 1 outlier, didn’t affect results
Cause of email loss • Generally, very hard to ascertain • don’t have logs from each server on each domain • Special request of 1 domain : 4 accounts • 2 normal; 2 content spam filters tag but don’t drop • overall loss rates : 3.25%, 3.02%, 3.00%, 3.15% • without content filters : 0.37%, 0.41% • Majority of loss was from content filters • but not all; rest due to blacklist filter or server issues • likely not blacklist (very easy to spot in data)
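The share of loss attributable to content filters can be estimated from the rates quoted above. This is a rough sketch only: it assumes the two groups of figures can be compared by simple averaging, which the slide does not state, and the variable names are illustrative. Under that assumption, the content filters account for a bit under nine tenths of the loss.

```python
# Loss rates (percent) quoted on the slide for the four accounts of one domain.
overall = [3.25, 3.02, 3.00, 3.15]
# Loss rates (percent) quoted for the case without content filters.
without_content_filters = [0.37, 0.41]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


mean_overall = mean(overall)
mean_unfiltered = mean(without_content_filters)

# Assumption: the gap between the two averages is due to content filters.
share_from_filters = (mean_overall - mean_unfiltered) / mean_overall

print(f"mean overall loss: {mean_overall:.3f}%")
print(f"mean loss without content filters: {mean_unfiltered:.3f}%")
print(f"estimated share due to content filters: {share_from_filters:.1%}")
```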
Outline • Motivation • Possible causes • Email loss experiment • definitions • setup • results • SureMail • design requirements • notifications • implementation • Related work
Desired requirements • Minimal disruption to email system • email is >95% reliable. don’t rip its guts out • Minimal user impact • email is >95% reliable. only bother user rest of the time • Preserve asynchronous communication, privacy • don’t expose when receiver is online, when email is read • how many total emails sent, received, etc. • Allow third-party repudiability • signatures not required today. don’t require them for reliability • Preserve spam, malware defenses • Minimal network, compute overhead • large email domains spend $$ to handle BW & load
SureMail • Imagine a notification channel • very reliable, but only for notifications, not emails • Each time email is sent • also send notification : 20 byte hash(email) • if receiver gets notification but not email → loss • [Diagram: Sender S and Receiver R connected by a notification channel; alert shown: “You’ve lost mail!”]
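The notification idea can be sketched in a few lines. The slide only says "20 byte hash(email)"; SHA-1 is used here as an assumed choice because its digest happens to be 20 bytes, and the function names are hypothetical. The sketch also assumes the hashed bytes reach the receiver unmodified, which real mail relays do not guarantee for headers.

```python
import hashlib


def notification_for(email_bytes: bytes) -> bytes:
    """Return a 20-byte digest of the email (SHA-1 is an assumed choice)."""
    return hashlib.sha1(email_bytes).digest()


def detect_losses(notifications, received_emails):
    """Return the notifications that match no email that actually arrived."""
    arrived = {notification_for(e) for e in received_emails}
    return [n for n in notifications if n not in arrived]


# Sender side: two emails, one notification per email.
sent = [b"Subject: hi\r\n\r\nlunch?", b"Subject: paper\r\n\r\naccepted!"]
notifications = [notification_for(e) for e in sent]

# Receiver side: only the first email arrives, but both notifications do.
received = sent[:1]
lost = detect_losses(notifications, received)
print(f"{len(lost)} email(s) appear to be lost")
```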
Notification channel : out of band • Separate service to exchange messages • central server, DHT, Amazon S3, … • “NX” servers • more reliable : no spam filter, no malware filter, small • fast : simple put/get service • requires deployment • Requirements • min disruption • min user impact • async. communication • privacy fix • repudiability • spam/malware defense • min overhead • DoS resilient fix
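A "simple put/get service" might look like the in-memory stand-in below. It is a sketch, not the NX server design: the class and method names are hypothetical, and keying by the recipient's plain address is a simplification that ignores the privacy and DoS items the slide marks as needing a fix.

```python
from collections import defaultdict


class NotificationStore:
    """Hypothetical in-memory stand-in for an "NX" put/get service."""

    def __init__(self):
        self._by_recipient = defaultdict(list)

    def put(self, recipient: str, digest: bytes) -> None:
        """Sender posts the digest of an email addressed to `recipient`."""
        self._by_recipient[recipient].append(digest)

    def get(self, recipient: str) -> list:
        """Recipient fetches, and thereby clears, all pending digests."""
        return self._by_recipient.pop(recipient, [])


store = NotificationStore()
store.put("bob@example.org", b"\x01" * 20)
pending = store.get("bob@example.org")
print(f"{len(pending)} pending notification(s)")
```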
Notification channel : in band • Use email header • embed hashes of last 3 emails • think of TCP sequence number • unreliable : only works with intermittent loss • slow : depends on email frequency • easy : no additional infrastructure needed • Requirements • min disruption • min user impact • async. communication • privacy • repudiability • spam/malware defense • min overhead
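The in-band scheme can be illustrated as follows: each outgoing email carries the hashes of the sender's previous three emails, so a later email that does arrive reveals an earlier one that did not. The header name `X-Prev-Hashes`, the dictionary message format, and the use of SHA-1 are all assumptions made for this sketch.

```python
import hashlib
from collections import deque

HEADER = "X-Prev-Hashes"  # hypothetical header name
HISTORY = 3               # the slide says: hashes of the last 3 emails


def digest(body: str) -> str:
    return hashlib.sha1(body.encode()).hexdigest()


class Sender:
    def __init__(self):
        self._recent = deque(maxlen=HISTORY)

    def compose(self, body: str) -> dict:
        """Attach hashes of earlier emails, then remember this one."""
        msg = {HEADER: ",".join(self._recent), "body": body}
        self._recent.append(digest(body))
        return msg


class Receiver:
    def __init__(self):
        self._seen = set()

    def process(self, msg: dict) -> list:
        """Record this email; return announced hashes that were never seen."""
        announced = [h for h in msg[HEADER].split(",") if h]
        missing = [h for h in announced if h not in self._seen]
        self._seen.add(digest(msg["body"]))
        return missing


sender, receiver = Sender(), Receiver()
m1, m2, m3 = (sender.compose(b) for b in ("one", "two", "three"))
receiver.process(m1)
# m2 is lost in transit and never processed.
missing = receiver.process(m3)
print(f"{len(missing)} earlier email(s) missing")
```

As the slide notes, this only helps when loss is intermittent: detection waits for the next email that gets through.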
Reply based shared secret • But … annoyance attack • Mallory sends email to Bob • but spoofed as though from Alice; with 3 bogus notifications • Bob thinks he has lost 3 emails from Alice • Reply based shared secret • observation : users don’t converse with spammers • else not a spammer ? • solution : set up shared key • A sends E1 to B ; B replies with E2 to A ; A replies with E3 to B • use message IDs of E1, E2 to do Diffie-Hellman exchange • sign notifications • assumption : no eavesdroppers • else, bigger privacy problems
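Once Alice and Bob share a key, spoofed notifications can be rejected. The sketch below assumes the key has already been established (the Diffie-Hellman step over message IDs is not shown) and uses HMAC-SHA256 as a stand-in for "sign notifications"; the slides do not say which primitive is used.

```python
import hashlib
import hmac


def sign(shared_key: bytes, digest: bytes) -> bytes:
    """Tag a notification digest with the shared key (HMAC is an assumption)."""
    return hmac.new(shared_key, digest, hashlib.sha256).digest()


def verify(shared_key: bytes, digest: bytes, tag: bytes) -> bool:
    """Constant-time check that the tag was made with the shared key."""
    return hmac.compare_digest(sign(shared_key, digest), tag)


# Assumed to come out of the E1/E2/E3 reply exchange between Alice and Bob.
alice_bob_key = b"key-established-via-reply-exchange"

notification = hashlib.sha1(b"email from Alice").digest()
genuine_tag = sign(alice_bob_key, notification)

# Mallory spoofs Alice but does not know the key, so her tag cannot verify.
bogus = hashlib.sha1(b"email that never existed").digest()
mallory_tag = sign(b"mallory-guess", bogus)

print("genuine accepted:", verify(alice_bob_key, notification, genuine_tag))
print("spoofed accepted:", verify(alice_bob_key, bogus, mallory_tag))
```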
Implementation • Out-of-band notification channel • front end shim : C++ • storage backend : Amazon SQS • would cost ~0.08 cents / user / month (at 1000 notifications / day) • email loss experiment : 99.9976% notification reliability • In band notification channel • Outlook 2007 plug-in • composes & processes notifications; maintains state; alerts loss • in progress • reply based shared secret • handling multiple clients (e.g. work, home, laptop) • handling OMA / OWA changes
Details in paper • Is email delayed or lost ? then what? • warn user after 2 hours • allow user to decide • Corner cases • First time sender • who is a spammer vs. a legitimate first time sender? • One way communication (e.g. bank statements) • Mailing lists
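The "delayed or lost" rule can be sketched as a simple timeout: a notification whose email has not arrived is only surfaced to the user once 2 hours have passed. The function name and the three status strings are hypothetical; only the 2-hour threshold comes from the slide.

```python
from datetime import datetime, timedelta

WARN_AFTER = timedelta(hours=2)  # threshold quoted on the slide


def status(notified_at: datetime, now: datetime, email_arrived: bool) -> str:
    """Classify a notification as delivered, still waiting, or worth a warning."""
    if email_arrived:
        return "delivered"
    if now - notified_at >= WARN_AFTER:
        return "warn user"
    return "waiting"


t0 = datetime(2007, 6, 1, 9, 0)
print(status(t0, t0 + timedelta(minutes=30), email_arrived=False))
print(status(t0, t0 + timedelta(hours=3), email_arrived=False))
print(status(t0, t0 + timedelta(hours=3), email_arrived=True))
```

After the warning, the decision stays with the user, as the slide says.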
Related work • Email loss • [AB2005] : 0.1% - 5% loss • but measured bounceback loss • [Lang2004] : 0.69% - 4% loss • but only 1 sender, unusual emails • MDN (read receipts) • user privacy problem • DSN (bouncebacks) • spam reflection, corp. privacy, helps spammers • Better spam filters & whitelists • definitely useful against loss due to spam filter • SureMail works regardless of cause of loss • some loss not due to spam filters
Summary • Extensive email loss study • 0.7% - 1% silent loss • SureMail notifications • small, fixed format, no human readable content • no spam filter, malware filter needed • can be handled far more reliably than email • reply-based shared secret to block spoofing • In-band & OOB channels • Implementation release expected in Fall 2007
Questions? • Thanks for university accounts! • H. Balakrishnan (MIT), P. Barford (Wisconsin), C. Dovrolis (GaTech), R. Govindan (USC), S. Keshav (Waterloo), L. Qiu (UT Austin), J. Rexford (Princeton), H. Schulzrinne (Columbia), D. Veitch (Melbourne), L. Zhang (UCLA)