
Sharad Agarwal, Venkat Padmanabhan, Dilip Joseph

Addressing email loss with SureMail: measurement, design & evaluation. Sharad Agarwal, Venkat Padmanabhan, Dilip Joseph. Why study email loss? Email does get lost sometimes; SMTP is not guaranteed end-to-end reliable (e.g. a USENIX acceptance email).


Presentation Transcript


  1. Addressing email loss with SureMail: measurement, design & evaluation • Sharad Agarwal, Venkat Padmanabhan, Dilip Joseph

  2. Why study email loss? • Email does get lost sometimes • SMTP is not guaranteed end-to-end reliable • e.g. USENIX acceptance email • Why? • spam & malware attacks, email sizes & features • large providers have complex architectures • high server load & downtime • false positives in aggressive spam & malware filters • our IT : over 90% emails thrown away • Even if infrequent, imposes high cost on users • misunderstanding • productivity loss

  3. Outline • Motivation • Possible causes • Email loss experiment • definitions • setup • results • SureMail • design requirements • notifications • implementation • Related work

  4. Definitions • Email loss • users A & B are “legitimate” users • A hits send on email to B • B never gets it • not in inbox; not in junk-mail folder • Silent email loss • email loss with no DSN (bounceback) • DSNs becoming less effective • spam reflection, corporate privacy, can help spammers defeat spam filters • so loss is too often silent

  5. Measuring email loss • How often are legitimate user emails lost ? • very hard to conduct wide scale user study • Resort to artificial experiment • get many email accounts • send emails to & from accounts • send ham, not spam • check which emails didn’t arrive • junk folder ok
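
The bookkeeping behind this experiment can be sketched as follows. This is a minimal illustration of the loss definition from the slides (a message counts as lost only if it is in neither the inbox nor the junk folder), not the authors' actual tooling; the `SentRecord` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentRecord:
    """One email sent during the experiment (hypothetical record layout)."""
    msg_id: str
    sender: str
    receiver: str


def lost_messages(sent, inbox_ids, junk_ids):
    """Return the sent records that reached neither the inbox nor the junk folder."""
    # Delivery to the junk folder still counts as delivered ("junk folder ok").
    delivered = set(inbox_ids) | set(junk_ids)
    return [record for record in sent if record.msg_id not in delivered]


def loss_rate(sent, inbox_ids, junk_ids):
    """Fraction of sent emails that were lost; 0.0 if nothing was sent."""
    if not sent:
        return 0.0
    return len(lost_messages(sent, inbox_ids, junk_ids)) / len(sent)


if __name__ == "__main__":
    sent = [SentRecord(f"m{i}", "a@example.org", "b@example.com") for i in range(1, 5)]
    print(loss_rate(sent, inbox_ids={"m1", "m2"}, junk_ids={"m3"}))
```

Whether a loss is silent would additionally require checking the sender's mailbox for a DSN, which this sketch does not model.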

  6. Email loss experiment • 46 email accounts • ~2 mailboxes in each of 22 domains • academic, commercial, corporate • Australia, Canada, New Zealand, UK, USA • Email content • Enron corpus • 1266 emails manually identified as ham • only subject & body • 16 different attachments • jpeg, gif, html, doc, ppt, pdf, zip

  7. Email loss results • Loss by attachment • similar • except HTML ~2X loss • Loss by subject / body • 1 outlier, didn’t affect results • Loss by account, domain • similar loss seen from, to most accounts • 1 outlier, didn’t affect results

  8. Cause of email loss • Generally, very hard to ascertain • don’t have logs from each server on each domain • Special request of 1 domain : 4 accounts • 2 normal; 2 content spam filters tag but don’t drop • overall loss rates : 3.25%, 3.02%, 3.00%, 3.15% • without content filters : 0.37%, 0.41% • Majority of loss was from content filters • but not all; rest due to blacklist filter or server issues • likely not blacklist (very easy to spot in data)

  9. Outline • Motivation • Possible causes • Email loss experiment • definitions • setup • results • SureMail • design requirements • notifications • implementation • Related work

  10. Desired requirements • Minimal disruption to email system • email is >95% reliable. don’t rip its guts out • Minimal user impact • email is >95% reliable. only bother user rest of the time • Preserve asynchronous communication and privacy • don’t expose when receiver is online, when email is read • how many total emails sent, received, etc. • Allow third-party repudiability • signatures not required today. don’t require them for reliability • Preserve spam, malware defenses • Minimal network, compute overhead • large email domains spend $$ to handle BW & load

  11. You’ve lost mail! SureMail • Imagine a notification channel • very reliable, but only for notifications, not emails • Each time email is sent • also send notification : 20 byte hash(email) • if receiver gets notification but not email → loss • [Figure: Sender S and Receiver R connected by a notification channel]
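
The core idea on this slide can be sketched in a few lines. The slide only specifies a 20-byte hash of the email; SHA-1 is used here as an assumption because its digest happens to be 20 bytes, and the function names are hypothetical.

```python
import hashlib


def notification_for(email_bytes: bytes) -> bytes:
    """Return a 20-byte digest of the email (SHA-1 is an assumed choice of hash)."""
    return hashlib.sha1(email_bytes).digest()


def detect_loss(notifications, received_emails):
    """Return notifications that have no matching received email."""
    received = {notification_for(email) for email in received_emails}
    return [n for n in notifications if n not in received]


if __name__ == "__main__":
    e1, e2 = b"Subject: hi\n\nfirst", b"Subject: hi\n\nsecond"
    notes = [notification_for(e1), notification_for(e2)]
    # Only e1 arrived by email; the notification for e2 reveals the loss.
    print(len(detect_loss(notes, [e1])))
```

Because the notification carries only a fixed-size digest, it contains no human-readable content for a spam or malware filter to inspect.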

  12. Notification channel : out of band • Separate service to exchange messages • central server, DHT, Amazon S3, … • “NX” servers • more reliable : no spam filter, no malware filter, small • fast : simple put/get service • requires deployment • Requirements • min disruption • min user impact • async. communication • privacy (needs fix) • repudiability • spam/malware defense • min overhead • DoS resilient (needs fix)

  13. Notification channel : in band • Use email header • embed hashes of last 3 emails • think of TCP sequence number • unreliable : only works with intermittent loss • slow : depends on email frequency • easy : no additional infrastructure needed • Requirements • min disruption • min user impact • async. communication • privacy • repudiability • spam/malware defense • min overhead
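
A rough sketch of the in-band scheme, under stated assumptions: the header name `X-Example-Prev-Hashes`, the SHA-1 hex encoding, and both classes are hypothetical, and emails are modeled as plain dictionaries rather than real MIME messages.

```python
import hashlib
from collections import deque

HEADER = "X-Example-Prev-Hashes"  # hypothetical header name, not from the slides


def _digest(body: str) -> str:
    return hashlib.sha1(body.encode("utf-8")).hexdigest()


class InBandSender:
    """Attaches hashes of the most recent emails to each new email."""

    def __init__(self, window: int = 3):
        self._recent = deque(maxlen=window)  # keeps only the last `window` hashes

    def wrap(self, body: str) -> dict:
        msg = {HEADER: ",".join(self._recent), "body": body}
        self._recent.append(_digest(body))
        return msg


class InBandReceiver:
    """Flags earlier emails that a newly arrived email refers to but that never arrived."""

    def __init__(self):
        self._seen = set()
        self._reported = set()

    def receive(self, msg: dict) -> list:
        self._seen.add(_digest(msg["body"]))
        referenced = [h for h in msg.get(HEADER, "").split(",") if h]
        missing = [h for h in referenced
                   if h not in self._seen and h not in self._reported]
        self._reported.update(missing)
        return missing


if __name__ == "__main__":
    sender, receiver = InBandSender(), InBandReceiver()
    a, b, c = (sender.wrap(text) for text in ("one", "two", "three"))
    receiver.receive(a)
    print(receiver.receive(c))  # email b was dropped in transit
```

As the slide notes, this only catches intermittent loss: if more than three consecutive emails are lost, or no later email ever arrives, nothing is detected. The sketch also treats reordered delivery as loss, which a real client would need to tolerate.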

  14. Reply based shared secret • But … annoyance attack • Mallory sends email to Bob • but spoofed as though from Alice; with 3 bogus notifications • Bob thinks he has lost 3 emails from Alice • Reply based shared secret • observation : users don’t converse with spammers • else not a spammer ? • solution : set up shared key • A sends E1 to B ; B replies with E2 to A ; A replies with E3 to B • use message IDs of E1, E2 to do Diffie-Hellman exchange • sign notifications • assumption : no eavesdroppers • else, bigger privacy problems
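
How a shared key blocks the annoyance attack can be illustrated as below. This sketch assumes the key has already been established by the reply-based exchange (the Diffie-Hellman step is not modeled), and it uses HMAC-SHA256 as a stand-in for whatever notification-signing scheme SureMail actually uses; all names are hypothetical.

```python
import hashlib
import hmac


def sign_notification(shared_key: bytes, email_hash: bytes) -> bytes:
    """Tag a notification with a MAC under the key shared by Alice and Bob."""
    return hmac.new(shared_key, email_hash, hashlib.sha256).digest()


def verify_notification(shared_key: bytes, email_hash: bytes, tag: bytes) -> bool:
    """Accept the notification only if the tag matches (constant-time comparison)."""
    expected = sign_notification(shared_key, email_hash)
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    alice_bob_key = b"\x01" * 32          # assumed output of the reply-based key setup
    email_hash = hashlib.sha1(b"real email from Alice").digest()

    tag = sign_notification(alice_bob_key, email_hash)
    print(verify_notification(alice_bob_key, email_hash, tag))   # genuine notification

    mallory_key = b"\x02" * 32            # Mallory never conversed with Bob as Alice
    bogus = sign_notification(mallory_key, email_hash)
    print(verify_notification(alice_bob_key, email_hash, bogus))  # spoof is rejected
```

A symmetric MAC is a reasonable fit for the repudiability requirement: either key holder could have produced the tag, so it proves nothing to a third party.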

  15. Implementation • Out-of-band notification channel • front end shim : C++ • storage backend : Amazon SQS • would cost ~0.08 cents / user / month (at 1000 notifications / day) • email loss experiment : 99.9976% notification reliability • In band notification channel • Outlook 2007 plug-in • composes & processes notifications; maintains state; alerts loss • in progress • reply based shared secret • handling multiple clients (e.g. work, home, laptop) • handling OMA / OWA changes

  16. Details in paper • Is email delayed or lost ? then what? • warn user after 2 hours • allow user to decide • Corner cases • First time sender • who is a spammer vs. a legitimate first time sender? • One way communication (e.g. bank statements) • Mailing lists
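
The "delayed or lost?" rule from this slide can be sketched as a small state check. Only the 2-hour threshold comes from the slide; the status names and the function signature are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

WARN_AFTER = timedelta(hours=2)  # threshold stated on the slide


def notification_status(notified_at: datetime,
                        email_arrived_at: Optional[datetime],
                        now: datetime) -> str:
    """Classify an email whose notification has been received."""
    if email_arrived_at is not None:
        return "delivered"
    if now - notified_at >= WARN_AFTER:
        return "warn-user"   # possibly lost; the user decides what to do next
    return "waiting"         # may simply be delayed


if __name__ == "__main__":
    t0 = datetime(2007, 6, 1, 9, 0)
    print(notification_status(t0, None, t0 + timedelta(minutes=30)))
    print(notification_status(t0, None, t0 + timedelta(hours=3)))
```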

  17. Related work • Email loss • [AB2005] : 0.1% - 5% loss • but measured bounceback loss • [Lang2004] : 0.69% - 4% loss • but only 1 sender, unusual emails • MDN (read receipts) • user privacy problem • DSN (bouncebacks) • spam reflection, corp. privacy, helps spammers • Better spam filters & whitelists • definitely useful against loss due to spam filter • SureMail works regardless of cause of loss • some loss not due to spam filters

  18. Summary • Extensive email loss study • 0.7% - 1% silent loss • SureMail notifications • small, fixed format, no human readable content • no spam filter, malware filter needed • can be handled far more reliably than email • reply-based shared secret to block spoofing • In-band & OOB channels • Implementation release expected in Fall 2007

  19. Questions? • Thanks for university accounts! • H. Balakrishnan (MIT), P. Barford (Wisconsin), C. Dovrolis (GaTech), R. Govindan (USC), S. Keshav (Waterloo), L. Qiu (UT Austin), J. Rexford (Princeton), H. Schulzrinne (Columbia), D. Veitch (Melbourne), L. Zhang (UCLA)
