Challenges to Privacy in New Internet Applications: VoIP, IM, location-based services

Prof. Henning Schulzrinne, Computer Science, Columbia University, New York
ATIS Network Security Symposium and Workshop, Washington, DC, September 2004

Overview • Email spam: a history of failed miracle cures • New challenges emerging: • VoIP unsolicited calls • instant messaging • Location and presence privacy • Reputation management

Not just email
• Email is just the first large-scale, open communication medium
  • initially also “closed user groups” (DECmail, PROFS, UUnet, Fido, …)
• When does UBC (unsolicited bulk communication) occur?
  • Single domain → large number of independently operated domains
    • removes easy remote authentication
  • Published or guessable addresses
    • as old as unlisted numbers → conflict between usability and using addresses as communication keys
• Others emerging from closed user groups
  • instant messaging (IM)
  • VoIP and multimedia calls
  • presence queries

The universe of message senders
[Figure: types of sender, arranged by degree of human involvement: human user (known and unknown); opt-in bulk communications; robots (event notification, machine → human); mailing lists (forwarder); machine → machine (EDI)]

The problem is easy… • if you’re willing to make some minor assumptions: • single administrative domain • only previously-known senders (but how?) • global public key infrastructure (PKI) • only real human users, no lists

Communication challenges • Joe-job • “The act of faking a spam so that it appears to be from an innocent third party, in order to damage their reputation and possibly to trick their provider into revoking their Internet access. Named after Joes.com, which was victimized in this way by a spammer some years ago.” • Phishing • “The act of sending an e-mail to a user falsely claiming to be an established legitimate enterprise in an attempt to scam the user into surrendering private information that will be used for identity theft.” • Spam, spim (unsolicited bulk communications) • Nuisance communications

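Tools available → countermeasures
• From address blacklisting
• IP sender blacklisting
• Content filtering (Bayesian filters)
[Figure: mail path from the mail sender via SMTP to the MTA, which consults DNS (SPF, SBL, …); messages are marked with a header and retrieved by the MUA via POP or IMAP, where spam goes to the spam folder]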

Miracle cures

The UBC arms race
[Figure: measures and countermeasures escalating in turn. Labels: IP blacklisting, open relays, From blacklisting, RBL, SPEWS, sender faking, bot armies, SPF, DMP, …, Bayesian filters, pictures, dictionary attacks]

“We need a new mail/IM protocol” • True: SMTP not designed for today’s hostile Internet • no sender authentication • no easy policy inclusion • False: A new mail protocol is going to fix UBE/UBC • Hard problems are ecosystem, not protocol: • authentication – domains and individuals • PKI (S/MIME, PGP) has never scaled • current email certificates just certify ownership of email address • help with whitelist, but not with unknown users • too costly for true verification • reputation • accreditation

IETF MARID • IETF working group for verifying sender • “It would be useful for those maintaining domains and networks to be able to specify that individual hosts or nodes are authorized to act as MTAs for messages sent from those domains or networks. This working group will develop a DNS-based mechanism for storing and distributing information associated with that authorization.” • related to IRTF ASRG (Anti-spam Research Group) • DNS extensions, “purported responsible address”

MARID processing
• “Given an email message, and given an IP address from which it has been (or will be) received, is the SMTP client at that IP address authorized to send that email message?”
[Figure: flowchart with Y/N branches. Boxes, in order: client SMTP validation (CSV): client authenticated, authorized and accredited?; extract purported responsible address (PRA); extract purported responsible domain (PRD); SPF: IP legal for PRD?]

MARID: Client SMTP Validation (CSV)
• Example: EHLO aol.com from 64.12.187.24 (draft-ietf-marid-csv-intro)
• authentication: EHLO domain real?
  • A(aol.com) → IN A 64.12.187.24
• authorization: host authorized to be MTA? (draft-ietf-marid-csv-csa)
  • SRV(_client._smtp.aol.com) → SRV weight=2
• accreditation: domain reputation? (draft-ietf-marid-csv-dna)
  • PTR(aol.com) → _vouch.smtp.isgood.com
  • TXT(aol.com.isgood.com) → IN TXT MARID,1,A

PRA (Purported responsible address)
• “Allows one to determine who appears to have most recently caused an e-mail message to be delivered. It does this by inspecting the headers in the message.” (draft-ietf-marid-pra)
• uses Resent-Sender, Resent-From, Sender, From RFC 2822 headers
• draft-ietf-marid-submitter defines new MAIL parameter for SMTP
[Figure: message path alice@example.com → almamater.edu → bob@company.com]
  S: 220 company.com.example ESMTP server ready
  C: EHLO almamater.edu.example
  S: 250-company.com.example
  S: 250-DSN
  S: 250-AUTH
  S: 250-SUBMITTER
  S: 250 SIZE
  C: MAIL FROM:<alice@example.com> SUBMITTER=bob@almamater.edu.example
  S: 250 <alice@example.com> sender ok
  C: RCPT TO:<bob@company.com.example>
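The header precedence above can be sketched in a few lines. This is a simplified illustration, not the algorithm from draft-ietf-marid-pra: it only walks the four headers in the order listed on the slide and ignores the draft's additional rules (for example, how Resent-* headers interact with Received headers). The function names and the sample message are assumptions made for this example.

```python
from email import message_from_string
from email.utils import parseaddr
from typing import Optional

# Precedence as listed on the slide; extra rules from the draft are NOT modeled.
PRA_HEADERS = ("Resent-Sender", "Resent-From", "Sender", "From")


def purported_responsible_address(raw_message: str) -> Optional[str]:
    """Return the first usable address among the candidate headers, if any."""
    msg = message_from_string(raw_message)
    for name in PRA_HEADERS:
        value = msg.get(name)
        if value:
            _, addr = parseaddr(value)
            if "@" in addr:
                return addr
    return None


def responsible_domain(address: str) -> str:
    """Purported responsible domain (PRD): the part after the last '@'."""
    return address.rsplit("@", 1)[1].lower()


# Hypothetical forwarded message: Alice wrote it, Bob's alma mater re-sent it.
example = (
    "From: Alice <alice@example.com>\n"
    "Sender: bob@almamater.edu.example\n"
    "Subject: forwarded\n"
    "\n"
    "body\n"
)
pra = purported_responsible_address(example)
```

Here `Sender` outranks `From`, so the forwarder's address is selected and its domain is the one an SPF-style check would be run against.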

SPF, Sender-ID
• SPF (sender policy framework)
  • Verifies that most recent sender (e.g., mailing list forwarder) is authorized for its domain
  • Does not prevent spam, but enables white- and black-listing
  • Adds DNS TXT or SPF resource record (RR) for domain
    • spf2.0/mfrom,pra +mx +ip4:192.1.2.0/28 -all
    • meaning: mail from the MX servers for example.com and from IP addresses in 192.1.2.0/28 is OK; all others are bad
[Figure: stages of mail delivery and the identity each exposes: SMTP connection, HELO or EHLO, MAIL FROM, body (From:)]
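To make the record semantics concrete, here is a minimal sketch of how a receiver might evaluate such a record against the connecting IP address. It is deliberately incomplete: only `ip4:` and `all` mechanisms are evaluated, mechanisms that need DNS (`mx`, `a`, `include`) are skipped, and the record is passed in as a string rather than fetched from DNS. The function name `check_record` and the sample record are assumptions for this example.

```python
import ipaddress

RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_record(record: str, client_ip: str) -> str:
    """Evaluate ip4: and all mechanisms left to right; first match wins."""
    ip = ipaddress.ip_address(client_ip)
    terms = record.split()[1:]  # drop the version tag, e.g. "spf2.0/mfrom,pra"
    for term in terms:
        qualifier = "+"  # default qualifier when none is given
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        if term.startswith("ip4:"):
            network = ipaddress.ip_network(term[4:], strict=False)
            matched = ip in network
        elif term == "all":
            matched = True
        else:
            continue  # mx, a, include, ... would need DNS lookups; not modeled
        if matched:
            return RESULTS[qualifier]
    return "neutral"


# Hypothetical record for illustration (the mx mechanism is ignored by this sketch).
RECORD = "spf2.0/mfrom,pra +mx +ip4:192.1.2.0/28 -all"
inside = check_record(RECORD, "192.1.2.5")
outside = check_record(RECORD, "192.1.2.16")
```

A /28 covers addresses .0 through .15, so the first lookup matches the `ip4` term and the second falls through to `-all`.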

Putting the tools together
• transitive trust model: intra-domain user authentication, inter-domain domain/host-only authentication
[Figure: alice@bpm.com submits mail to the bpm.com SMTP server using SMTP AUTH submission (password); the server relays it over SMTP toward bob@nyu.edu, where SPF and CSV checks apply]
• accreditation:
  • aol.com does not host spammers
  • bpm.com verifies user identities (not yet)

What’s different about IM and VoIP?
• Higher nuisance factor
  • combine the worst of email and phone telemarketing
• Close to zero cost
  • call origination has no capacity limitation (unlike PSTN line limitation)
  • can be originated in volume from residential broadband; no T1 required
    • T1: 2.4 call attempts/second @ $1000/month + LD
    • 500 kb/s DSL: 9 call attempts/second @ $50/month
  • easy to get addresses: SIP address = email address or E.164 number
  • non-US origin: cheap labor, no DNC laws
• Privacy invasion
  • know user is actually there
• Nuisance calls
  • possibly no good way to trace
  • already a problem with Skype
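The cost gap can be worked out from the two figures on this slide. The sketch below is a back-of-the-envelope calculation only: it assumes a 30-day month and continuous sending at the quoted rate, and it ignores the long-distance (LD) charges that the T1 case would add.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month, sending nonstop


def cost_per_million(attempts_per_second: float, dollars_per_month: float) -> float:
    """Dollars spent per one million call attempts at a sustained rate."""
    attempts_per_month = attempts_per_second * SECONDS_PER_MONTH
    return dollars_per_month / attempts_per_month * 1_000_000


t1 = cost_per_million(2.4, 1000.0)  # T1 figures from the slide, LD excluded
dsl = cost_per_million(9.0, 50.0)   # 500 kb/s DSL figures from the slide
ratio = t1 / dsl
print(f"T1: ${t1:.2f}, DSL: ${dsl:.2f} per million attempts, ratio {ratio:.0f}x")
```

Under these assumptions the T1 comes to roughly $161 per million attempts and the DSL line to roughly $2, a factor of about 75 before any long-distance charges.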

SIP spam • Call spam • telemarketing • content filtering likely ineffective • IM spam • SIP MESSAGE or message sessions • spam intent may not be obvious in first message • get attention first with “Hello” • short messages harder to analyze with content filters • but typically requires white-listing based on presence subscription • Presence spam (request addition to watcher list) • mostly nuisance – user may need to manually deny request J. Rosenberg, C. Jennings, draft-rosenberg-sipping-spam, July 2004

SIP spam prevention • All earlier mechanisms apply, with largely the same caveats • Black lists • domain-level • within domain, only if domain practices sound user management • White list • may use buddy list as white list • stronger user authentication • Consent-based communication • needs to subscribe first • but may not be able to recognize address (“is bob@isp.com a spammer or some long-lost friend?”)

SIP spam prevention • Use of MARID-like DNS domain verification possible • may not be needed, due to usage of TLS for interdomain communications • but doesn’t preclude rogue sub-domains • e.g., “is hgs10.columbia.edu allowed to route SIP calls for columbia.edu?” • transitive trust principle: • trust that previous hop applied identity management principles • longer term, use S/MIME certificates for user-level authentication, but doesn’t improve spam prevention much • not widely available now • if S/MIME certificates are cheap, spammers can mint new identities

SIP authentication
[Figure: SIP trapezoid. Labels: registrar (binding a@foo.com: 128.59.16.1); outbound proxy; destination proxy (identified by SIP URI domain); Digest auth over TLS; TLS mutual host verification; insert crypto-signed identity assertion (AIB, sip-identity); voice traffic over (S)RTP]

From domain to user policies
• Not all domains can be classified as “good” or “bad” as a whole
• Many different domain types:
  • Employer
  • ISP
  • Associations (IEEE, ACM, ATIS, …)
  • Personal domains
  • Mailbox providers
• Divide domains by their user policy:
  • Admission-controlled domains (most employers)
  • Bonded domains
  • Membership domains (e.g., credit card)
  • Open, rate-limited domains
  • Open domains
Kumar Srivastava, Henning Schulzrinne, “Preventing Spam for SIP-based Instant Messaging and Sessions”, Columbia University Technical Report, September 2004.

Reputation and domain descriptions • Need to define mechanism to obtain domain user verification policy • Individual user reputation: • deposit positive or negative feedback information based on calls • depends on cooperation of domain • limit user feedback rate to avoid ballot-stuffing • Fortunately, there seem to be few part-time spammers
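One way to picture the feedback rate limit is a per-rater sliding window: each user may deposit only a few ratings per time window, so a single account cannot stuff the ballot. The sketch below is an assumption about how a cooperating domain might implement this; the class name, the limits, and the simple net-score model are illustrative and do not come from the slides. Timestamps are passed in explicitly to keep the example deterministic.

```python
from collections import defaultdict, deque


class FeedbackLog:
    """Net reputation score per target, with a per-rater sliding-window limit."""

    def __init__(self, max_per_window: int = 3, window_seconds: float = 86400.0):
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)  # rater -> timestamps of accepted feedback
        self._score = defaultdict(int)     # target -> positives minus negatives

    def submit(self, rater: str, target: str, positive: bool, now: float) -> bool:
        """Record feedback; return False if the rater exceeded the rate limit."""
        recent = self._recent[rater]
        # Forget accepted feedback that has aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_per_window:
            return False
        recent.append(now)
        self._score[target] += 1 if positive else -1
        return True

    def score(self, target: str) -> int:
        return self._score.get(target, 0)


log = FeedbackLog(max_per_window=2, window_seconds=60.0)
accepted = [
    log.submit("sip:alice@example.com", "sip:x@example.net", False, now=t)
    for t in (0.0, 1.0, 2.0)
]
```

The third rating inside the same minute is rejected, so the target's score reflects only two of the three attempts.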

Using social networks for spam control
[Figure: social graph whose “is a friend of” edges are labeled with strength of knowledge (e.g., 0.3) and trust in good behavior (e.g., 0.5)]
total trust = ∑ (strength * trust)
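The formula can be read as a sum over the recipient's acquaintances: how well the recipient knows each friend, multiplied by how much that friend trusts the sender to behave. That reading is an assumption, since the slide gives only the formula and two sample edge labels; the data layout and names below are likewise illustrative.

```python
from typing import Dict, Tuple


def total_trust(
    recipient: str,
    sender: str,
    strength: Dict[str, Dict[str, float]],
    trust: Dict[Tuple[str, str], float],
) -> float:
    """Sum of strength(recipient, friend) * trust(friend, sender) over friends."""
    friends = strength.get(recipient, {})
    return sum(s * trust.get((friend, sender), 0.0) for friend, s in friends.items())


# Hypothetical graph: alice knows bob weakly and carol well.
strength = {"alice": {"bob": 0.3, "carol": 0.8}}
# bob and carol each have an opinion about dave's behavior; nobody knows eve.
trust = {("bob", "dave"): 0.5, ("carol", "dave"): 0.25}

score_dave = total_trust("alice", "dave", strength, trust)
score_eve = total_trust("alice", "eve", strength, trust)
```

A sender unknown to all of the recipient's friends scores zero, which is the case where a recipient would have to fall back on domain-level policy.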

Privacy: Context • context = “the interrelated conditions in which something exists or occurs” • anything known about the participants in the (potential) communication relationship • both at caller and callee

Architectures for (geo) information access
• Claim: all protocols using location information fall into one of these categories
  • Presence or event notification
    • “circuit-switched” model
    • subscription: binary decision
  • Messaging
    • email, SMS
    • basically, event notification without (explicit) subscription
    • but often out-of-band subscription (mailing list)
  • Request-response
    • RPC, HTTP; also DNS, LDAP
    • typically, already has session-level access control (if any at all)
• Presence is a superset of the other two
• GEOPRIV: IETF working group looking generically at location services (privacy)
• SIMPLE and SIP: event notification, presence

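GEOPRIV and SIMPLE architectures
[Figure: three parallel architectures.
 GEOPRIV: target, location server and location recipient, connected by the publication interface and the notification interface; the rule maker reaches the location server through the rule interface.
 SIP presence: presentity, presence agent and watcher, connected by PUBLISH, SUBSCRIBE and NOTIFY.
 SIP call: caller and callee, connected by two INVITE hops.]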

GEOPRIV and SIMPLE Policy rules • There is no sharp geospatial boundary • Discussed in both GEOPRIV (geospatial) and SIMPLE (SIP IM) • Presence contains other sensitive data (activity, icons, …) and others may be added • Example: future extensions to personal medical data • “only my cardiologist may see heart rate, but notify everybody in building if heart rate = 0” • Thus, generic policies are necessary

Presence/Event notification
• Three places for policy enforcement
  • subscription → binary
    • only policy, no geo information
    • subscriber may provide filter → could reject based on filter (“sorry, you only get county-level information”) → greatly improves scaling since no event-level checks needed
  • notification → content filtering, suppression
    • only policy, no geo information
  • third-party notification
    • e.g., event aggregator
    • can convert models: gateway subscribes to event source, distributes by email
    • both policy and geo data

Presence policy
[Figure: processing pipeline. XML rules managed via XCAP feed the subscription policy, which decides on each SUBSCRIBE from a subscriber (watcher). Then, for each watcher: event generator → policy → subscriber filter → rate limiter → change to previous notification? → NOTIFY]
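The stages named on this slide can be sketched as a small in-memory presence agent. This is an illustration of the ordering only, under several assumptions: the policy and the subscriber filter are collapsed into one per-watcher `view` function, rate-limited notifications are dropped rather than deferred, and no SIP or XCAP handling is modeled. All class and parameter names are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Watcher:
    uri: str
    view: Callable[[dict], dict]  # policy transformation plus subscriber filter
    min_interval: float           # rate limit: minimum seconds between NOTIFYs
    last_sent: Optional[dict] = None
    last_time: float = float("-inf")


class PresenceAgent:
    def __init__(self, allowed: Set[str]):
        self.allowed = set(allowed)  # subscription policy: a binary decision
        self.watchers: Dict[str, Watcher] = {}

    def subscribe(self, uri: str, view: Callable[[dict], dict],
                  min_interval: float = 0.0) -> bool:
        if uri not in self.allowed:
            return False
        self.watchers[uri] = Watcher(uri, view, min_interval)
        return True

    def publish(self, state: dict, now: float) -> List[Tuple[str, dict]]:
        """Return the (watcher, document) pairs that would be sent as NOTIFY."""
        out = []
        for w in self.watchers.values():
            doc = w.view(state)
            if doc == w.last_sent:
                continue  # no change to previous notification
            if now - w.last_time < w.min_interval:
                continue  # rate limiter (this sketch drops instead of deferring)
            w.last_sent, w.last_time = doc, now
            out.append((w.uri, doc))
        return out


def county_only(state: dict) -> dict:
    """Hypothetical transformation: reveal only county-level location."""
    return {"county": state["county"]}


agent = PresenceAgent({"sip:bob@example.com"})
ok = agent.subscribe("sip:bob@example.com", county_only, min_interval=10.0)
denied = agent.subscribe("sip:eve@example.com", county_only)
first = agent.publish({"county": "New York", "street": "Broadway"}, now=0.0)
same = agent.publish({"county": "New York", "street": "Amsterdam Ave"}, now=20.0)
```

Because the watcher only sees county-level data, the street change in the second publish produces no notification at all, which is the scaling benefit of applying the filter before comparing against the previous notification.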

Policy relationships
[Figure: relationship between common policy, geopriv-specific policy, presence-specific policy (RPID, CIPID) and future extensions]

PIDF-LO (location object)
• Basic location object
  • civic and geospatial
  • typically, in conjunction with presence
  • contains source and authority
  • basic privacy rules:
    • retention period
    • redistribution allowed

<?xml version="1.0" encoding="UTF-8"?>
<presence xmlns="urn:ietf:params:xml:ns:pidf"
          xmlns:gp="urn:ietf:params:xml:ns:pidf:geopriv10"
          xmlns:gml="urn:opengis:specification:gml:schema-xsd:feature:v3.0"
          entity="pres:geotarget@example.com">
  <tuple id="sg89ae">
    <status>
      <gp:geopriv>
        <gp:location-info>
          <gml:location>
            <gml:Point gml:id="point1" srsName="epsg:4326">
              <gml:coordinates>37:46:30N 122:25:10W</gml:coordinates>
            </gml:Point>
          </gml:location>
        </gp:location-info>
        <gp:usage-rules>
          <gp:retransmission-allowed>no</gp:retransmission-allowed>
          <gp:retention-expiry>2003-06-23T04:57:29Z</gp:retention-expiry>
        </gp:usage-rules>
      </gp:geopriv>
    </status>
    <timestamp>2003-06-22T20:57:29Z</timestamp>
  </tuple>
</presence>
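A recipient has to read the usage rules before deciding whether it may keep or forward the location. The sketch below shows one way to extract them with Python's standard XML parser. It uses a trimmed copy of the document above (location-info omitted), and the helper names `usage_rules` and `must_discard` are assumptions for this example, not part of any GEOPRIV specification.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {"gp": "urn:ietf:params:xml:ns:pidf:geopriv10"}

# Trimmed version of the slide's example; only the usage rules are kept.
DOC = """\
<presence xmlns="urn:ietf:params:xml:ns:pidf"
          xmlns:gp="urn:ietf:params:xml:ns:pidf:geopriv10"
          entity="pres:geotarget@example.com">
  <tuple id="sg89ae">
    <status>
      <gp:geopriv>
        <gp:usage-rules>
          <gp:retransmission-allowed>no</gp:retransmission-allowed>
          <gp:retention-expiry>2003-06-23T04:57:29Z</gp:retention-expiry>
        </gp:usage-rules>
      </gp:geopriv>
    </status>
  </tuple>
</presence>
"""


def usage_rules(pidf_xml: str) -> dict:
    """Extract the two basic privacy rules from a PIDF-LO document."""
    root = ET.fromstring(pidf_xml)
    rules = root.find(".//gp:usage-rules", NS)
    allowed = rules.findtext("gp:retransmission-allowed", default="no", namespaces=NS)
    expiry = rules.findtext("gp:retention-expiry", namespaces=NS)
    return {
        "retransmission_allowed": allowed.strip().lower() == "yes",
        # fromisoformat() in older Python versions does not accept a trailing "Z".
        "retention_expiry": datetime.fromisoformat(expiry.strip().replace("Z", "+00:00")),
    }


def must_discard(rules: dict, now: datetime) -> bool:
    """True once the retention period has passed."""
    return now >= rules["retention_expiry"]


rules = usage_rules(DOC)
```

With these rules a recipient may not pass the location on, and has to discard it after the expiry time.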

Privacy rule sets
• Conditions such as…
  • identity of requestor
  • time-of-day
  • sphere
• Actions
  • e.g., allow subscription
• Transformation
  • e.g., reduce accuracy of geo data

<rule id="f3g44r1">
  <conditions>
    <identity>
      <uri>bob@example.com</uri>
    </identity>
    <validity>
      <from>2003-12-24T17:00:00+01:00</from>
      <to>2003-12-24T19:00:00+01:00</to>
    </validity>
  </conditions>
  <actions></actions>
</rule>
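Evaluating the conditions of such a rule is straightforward. The sketch below checks only the two conditions shown in the example rule (identity and validity) and treats a missing condition as "matches"; sphere, actions and transformations are not modeled. The function name `rule_matches` is an assumption, and the rule is parsed without namespaces because the slide's example has none.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

RULE = """\
<rule id="f3g44r1">
  <conditions>
    <identity>
      <uri>bob@example.com</uri>
    </identity>
    <validity>
      <from>2003-12-24T17:00:00+01:00</from>
      <to>2003-12-24T19:00:00+01:00</to>
    </validity>
  </conditions>
  <actions></actions>
</rule>
"""


def rule_matches(rule_xml: str, requestor: str, when: datetime) -> bool:
    """True if the requestor and the (timezone-aware) time satisfy the rule."""
    conditions = ET.fromstring(rule_xml).find("conditions")
    if conditions is None:
        return True
    uris = [u.text.strip() for u in conditions.findall("identity/uri") if u.text]
    if uris and requestor not in uris:
        return False
    validity = conditions.find("validity")
    if validity is not None:
        start = datetime.fromisoformat(validity.findtext("from").strip())
        end = datetime.fromisoformat(validity.findtext("to").strip())
        if not start <= when <= end:
            return False
    return True


during = datetime.fromisoformat("2003-12-24T18:00:00+01:00")
after = datetime.fromisoformat("2003-12-24T20:00:00+01:00")
bob_during = rule_matches(RULE, "bob@example.com", during)
bob_after = rule_matches(RULE, "bob@example.com", after)
eve_during = rule_matches(RULE, "eve@example.com", during)
```

Only Bob, and only within the two-hour window, satisfies the conditions; what he is then allowed to do would come from the rule's actions, which are empty in the slide's example.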

Conclusion • Protocol and technical means as a complement to legal actions • Identity-based techniques more promising than content-based approaches • New applications (VoIP, IM, presence) vulnerable to unsolicited communications • with possibly larger impact due to lower cost and lower legal barriers • content-based techniques fail altogether • New applications do not lend themselves to current content-based spam prevention techniques • Domain-based rather than person-based mechanisms appear promising • Need policy languages for sharing private data
