Chipping Away at Censorship with User-Generated Content. Sam Burnett, Nick Feamster and Santosh Vempala. Internet Censorship is a Problem. 12 censors 11 monitors 25% of population More on the way. See http://rsf.org for more. It’s Not Only China…at Home, Too.
Chipping Away at Censorship with User-Generated Content. Sam Burnett, Nick Feamster and Santosh Vempala
Internet Censorship is a Problem • 12 censors • 11 monitors • 25% of population • More on the way See http://rsf.org for more
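Intro to Internet Censorship [Figure: Alice and Bob on opposite sides of the censor's firewall, which separates the censored net from the uncensored net; the censor can block traffic and punish the user]
Solution: Use a Helper • The helper sends messages to and from blocked hosts on your behalf [Figure: a helper relays traffic between Alice and Bob across the firewall separating the censored net from the uncensored net]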
Design Goals for the Helper • Be robust against blocking • Be deniable against user identification • Require no dedicated infrastructure
What About Proxies and Mixnets? (e.g., Tor) • Censors can block proxies if the proxy list is public • Not deniable if encryption is incriminating • Requires dedicated infrastructure (network of proxies) [Figure: Alice and Bob communicate through proxies across the firewall between the censored net and the uncensored net]
What About Covert Channels? (e.g., Infranet) • Not entirely robust against blocking • More deniable because messages are hidden • Requires dedicated infrastructure (Web servers) [Figure: Alice and Bob communicate via an unblocked host (usenix.org) across the firewall between the censored net and the uncensored net]
Collage: Let User-Generated Content Help Defeat Censorship • Robust by using redundancy • Users generate innocuous-looking traffic • No dedicated infrastructure required [Figure: Alice and Bob, a Flickr user, exchange messages through user-generated content hosts]
Why Might Collage Work? • Lots of User-Generated Content (UGC) • More than 4 billion Flickr images • A day of video uploaded to YouTube every minute • Many sites host UGC • We have tools to store censored data in UGC • Steganography, watermarking
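To make "store censored data in UGC" concrete, here is a minimal sketch of least-significant-bit steganography over a raw byte buffer. It is an illustration only, not Collage's actual embedding tool: the function names `embed` and `extract` are hypothetical, and the cover is assumed to be uncompressed sample data (for example, raw pixel values) rather than a real image file.

```python
def embed(cover: bytes, payload: bytes) -> bytes:
    """Hide payload in the lowest bit of successive cover bytes (assumed raw samples)."""
    # Flatten the payload into bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for payload")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def extract(stego: bytes, length: int) -> bytes:
    """Read back `length` payload bytes from the lowest bits of the stego buffer."""
    out = bytearray()
    for i in range(length):
        byte = 0
        for sample in stego[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (sample & 1)
        out.append(byte)
    return bytes(out)


cover = bytes(range(256))           # stand-in for raw image samples
stego = embed(cover, b"news")
recovered = extract(stego, 4)
```

Each cover byte changes by at most one, which is why the result still looks like the original media. Simple schemes like this are detectable by statistical analysis, so treat it as a toy.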
Outline • Background and Design Goals • Collage Design • Performance and Demo
Collage, Step-by-Step • Step 1: Obtain message • Step 2: Pick message identifier (application specific; only the intended recipient should know it) • Step 3: Obtain cover media (your personal photos; generous users) • Step 4: Embed message in cover (next slide) • Step 5: Upload UGC to content host • Step 6: Find and download UGC • Step 7: Decode message from UGC [Figure: a message is embedded in a vector and passes between Alice and Bob through a content host]
Embedding Messages in Vectors • Encrypt the message using the identifier • Generate chunks using erasure coding • Generate many chunks, recover from any k-subset • Allows splitting among many vectors, robustness • Embed chunks into vectors • Steganography: hard to detect • Watermarking: hard to remove • Do the reverse to decode
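The "recover from any k-subset" property can be sketched with a small polynomial erasure code. This is an assumption-laden illustration, not the code Collage uses: the names `encode` and `decode` are hypothetical, the field GF(257) is chosen only because it is the smallest prime field that holds every byte value, and the encryption step is left out.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial through `points`, modulo P."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(message: bytes, k: int, n: int) -> dict:
    """Return n chunks keyed by x = 1..n; any k of them recover the message."""
    if not 1 <= k <= n < P:
        raise ValueError("need 1 <= k <= n < 257")
    padded = message + bytes(-len(message) % k)   # zero-pad to a multiple of k
    chunks = {x: [] for x in range(1, n + 1)}
    for start in range(0, len(padded), k):
        block = padded[start:start + k]
        points = list(enumerate(block, start=1))  # data symbol i sits at x = i
        for x in chunks:
            # Symbols may equal 256, so chunks are lists of ints, not bytes.
            chunks[x].append(_interpolate(points, x))
    return chunks


def decode(chunks: dict, k: int, length: int) -> bytes:
    """Rebuild the first `length` message bytes from any k chunks."""
    if len(chunks) < k:
        raise ValueError("not enough chunks")
    xs = sorted(chunks)[:k]
    out = bytearray()
    for col in range(len(chunks[xs[0]])):
        points = [(x, chunks[x][col]) for x in xs]
        out.extend(_interpolate(points, x) for x in range(1, k + 1))
    return bytes(out[:length])


message = b"censored news article"
chunks = encode(message, k=3, n=7)
survivors = {x: chunks[x] for x in (2, 5, 7)}     # any three chunks suffice
recovered = decode(survivors, k=3, length=len(message))
```

In this sketch each chunk would be embedded in a different vector, so a receiver who finds only some of the vectors can still decode the message.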
Where Are Bob’s Vectors? • Crawling all of Flickr is not an option • Must agree on a subset of a content host without any immediate communication • Solution: A predictable way of mapping message identifiers to subsets of content hosts
Solution: Task Mapping • Hash the identifier • Hash the tasks • Map identifier to closest tasks • Receivers perform these tasks to get vectors • Senders publish vectors so that when receivers perform tasks, they get the sender’s vectors • Example tasks: visit http://nytimes.com; search for blue flowers on Flickr; look at JohnDoe’s videos on YouTube [Figure: the message identifier and the tasks hashed onto the same space, with the identifier mapped to the nearest tasks]
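The mapping above can be sketched as follows. Everything beyond "hash both and pick the closest" is an assumption made for illustration: SHA-256 as the hash, clockwise distance on a ring as the meaning of "closest", the function name `tasks_for`, and the example identifier.

```python
import hashlib

RING = 2 ** 256  # size of the SHA-256 output space


def _position(text: str) -> int:
    """Place a string on the ring by hashing it."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def tasks_for(identifier: str, task_list: list, count: int) -> list:
    """Return the `count` tasks whose hashes follow the identifier's hash most closely."""
    target = _position(identifier)
    # Clockwise distance from the identifier to each task (an assumed metric).
    return sorted(task_list, key=lambda task: (_position(task) - target) % RING)[:count]


task_list = [
    "Visit http://nytimes.com",
    "Search for blue flowers on Flickr",
    "Look at JohnDoe's videos on YouTube",
]
identifier = "example-message-identifier"   # hypothetical identifier
chosen = tasks_for(identifier, task_list, count=2)
```

Sender and receiver run the same function on the same identifier, so they arrive at the same tasks without exchanging any messages.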
How Does Collage Meet the Design Goals? • Robust against blocking • Erasure coding • Many content hosts • Deniable against user identification • Traffic only to/from content hosts • Depends upon task construction • Require no dedicated infrastructure • Messages stored on content hosts
How Do You Start Using Collage? Send & Receive Messages: • Distribute software • CDROM • Spam everyone • A secure network • Refresh task list • Receive using Collage • Online resource • Message identifier • Application specific. Help Censored Users: • Donate your UGC vectors (photos on Flickr, tweets on Twitter, etc.) • Write Collage applications
Outline • Background and Design Goals • Collage Design • Performance and Demo
Performance Metrics • Sender and receiver traffic overhead • Sender and receiver transfer time • Storage required on content hosts But these metrics can vary a lot: • Different content hosts • Different tasks • Different applications
Case Study Experiments performed on a 768/128 Kbps DSL connection
What Should You Do Now? • Try out the demo application • Donate your photos • For now, only Flickr Pro users • Embed news articles when you upload photos Visit http://gtnoise.net/collage
Conclusion • Collage evades Internet censorship by tunneling messages inside user-generated content • Robust against blocking • Deniable against user identification • Requires no dedicated infrastructure • More work needed • Statistical deniability against traffic analysis • Learn timing behavior from users • Tor bridge discovery http://gtnoise.net/collage
Yes, Steganography is Broken • Doesn’t mean it isn’t still practical • Collage can use new information hiding techniques as they come along • Not dependent upon a particular technology • Use many hiding techniques in parallel
“A Web based Covert File System”, HotOS ’07 • Stores files inside of Flickr photos • Intended for personal file storage • Collage is more robust • Erasure coding • Automatically spread data among content hosts • Could be implemented using Collage • Implementation not publicly available
Where Could We Store Content? • Photo sharing sites (Wikipedia lists 71) • Video sharing sites (Wikipedia lists 68) • Web forums, blogs • Plugins for popular forum and blogging software
What If Senders and Receivers Disagree on Task Lists? • Consistent hashing ensures some agreement • Depends on number of tasks per identifier: • If identifier mapped to 2 tasks, can still communicate with 25% disagreement • If identifier mapped to 4 tasks, can still communicate with 50% disagreement
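One way to explore this is to compute which tasks the two sides still share when their lists differ. The sketch below assumes SHA-256 and clockwise ring distance for "closest"; `closest_tasks`, `shared_tasks`, and the task names are hypothetical, and it does not reproduce the 25% and 50% figures above.

```python
import hashlib

RING = 2 ** 256  # size of the SHA-256 output space


def _position(text: str) -> int:
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def closest_tasks(identifier: str, task_list: list, count: int) -> list:
    """Tasks whose hashes follow the identifier's hash most closely on the ring."""
    target = _position(identifier)
    return sorted(task_list, key=lambda task: (_position(task) - target) % RING)[:count]


def shared_tasks(identifier: str, sender_list: list, receiver_list: list, count: int) -> set:
    """Tasks that both sides select, even though their task lists may differ."""
    sender = closest_tasks(identifier, sender_list, count)
    receiver = closest_tasks(identifier, receiver_list, count)
    return set(sender) & set(receiver)


sender_list = [f"task-{i}" for i in range(20)]      # hypothetical task names
selected = closest_tasks("example-identifier", sender_list, count=4)

# The receiver's list is stale: it lacks one task the sender did not select.
missing = next(t for t in sender_list if t not in selected)
receiver_list = [t for t in sender_list if t != missing]
overlap = shared_tasks("example-identifier", sender_list, receiver_list, count=4)
```

Because each task's position depends only on its own hash, dropping or adding unrelated tasks leaves the selection unchanged. Communication succeeds as long as the overlap still carries enough chunks to decode.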
Agreeing on Message Identifiers • Goal: Agree without communicating • For news reader application: • Public knowledge, hopefully censor doesn’t block • Another option: • Exchange keys with communicators beforehand • Identifier is, e.g., ciphertext of current date
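The pre-shared-key option can be sketched like this. Note the substitution: the slide suggests the ciphertext of the current date, while this sketch uses an HMAC of the date, which gives the same "both sides can compute it, nobody else can" property with only the standard library. The function name `daily_identifier` and the key are hypothetical.

```python
import hashlib
import hmac
from datetime import date


def daily_identifier(shared_key: bytes, day: date) -> str:
    """Derive a per-day message identifier from a key exchanged beforehand."""
    # HMAC-SHA256 of the ISO date stands in for "ciphertext of current date".
    return hmac.new(shared_key, day.isoformat().encode("ascii"), hashlib.sha256).hexdigest()


key = b"key exchanged beforehand"            # hypothetical pre-shared key
sender_id = daily_identifier(key, date(2010, 8, 11))
receiver_id = daily_identifier(key, date(2010, 8, 11))
next_day_id = daily_identifier(key, date(2010, 8, 12))
```

Both parties derive the same identifier for a given day without communicating, and the identifier changes daily, so learning one identifier does not reveal later ones.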
What Affects Deniability? • Task construction: • The content host • Popularity • The sequence of Web requests made • Injection of think times • Injection of garbage requests • Alternative routes to the same vectors • Frequency of task execution • Are you uploading suspicious vectors? • E.g., tagging pictures with bogus terms
Censorship Is an Arms Race • Popular systems will eventually be blocked • We must continually create systems that are harder to compromise