
Search Engines & Privacy



Presentation Transcript


  1. Search Engines & Privacy A discussion of Roessler’s paper ECE5560 Randy Marchany Copyright 2003, Marchany

  2. Introduction • Users submit condensed and accurate information on topics of interest to search engines. • This makes search engines a target for profiling and advertising companies. • Giving out personal data such as an email address even once makes all past and future actions linkable to the user’s real identity.

  3. Introduction • A major search engine was found to use redirects that let its server log which search hits were actually visited. • This prompted Roessler’s study. • The engines examined were Altavista, Excite, Google, Lycos, Hotbot, Webcrawler, Yahoo, Metacrawler, Looksmart, Directhit, Goto, Go, and Mamma.

  4. Introduction • Only privacy issues that arise when submitting search strings, when visiting the pages found, or without any user interaction were examined. • The automatic loading of an image is an example of an event that occurs without user interaction. • Several techniques that affect the user’s privacy were discovered.

  5. Introduction • Images are used instead of buttons for submitting HTML forms. If the image is clicked, the coordinates of the click are transmitted along with the form data. • It is possible to infer which browser was used from the submitted values. • A probabilistic test for whether a GUI-based browser was used is possible.
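The coordinate leak on this slide can be illustrated with a short Python sketch. It assumes the standard HTML behaviour in which an image button named `go` submits `go.x` and `go.y` (or plain `x` and `y` when the button has no name). The function names, the button name, and the heuristic that a text-mode browser sends `0,0` are assumptions for illustration, not details taken from Roessler’s paper.

```python
from urllib.parse import parse_qs


def click_coordinates(query, button_name=""):
    """Return the (x, y) click position found in a query string, or None."""
    fields = parse_qs(query)
    # A named image button submits "<name>.x" / "<name>.y"; an unnamed one "x" / "y".
    prefix = f"{button_name}." if button_name else ""
    try:
        return int(fields[prefix + "x"][0]), int(fields[prefix + "y"][0])
    except (KeyError, ValueError):
        return None


def probably_text_browser(query, button_name=""):
    """Heuristic (assumption): a text-mode browser cannot click, so it sends 0,0."""
    return click_coordinates(query, button_name) == (0, 0)
```

A server that logs these values sees, for example, `(37, 12)` for `q=privacy&go.x=37&go.y=12`. The test is only probabilistic, since a GUI user can also click the exact corner of the image.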

  6. Introduction • The URL of the search results page usually contains the query string. • The URL and its parameters are leaked, through the Referer header, to every site visited from that page. • This means that all advertisers displaying banners on the search results page, and all visited sites, learn the search string unless a proxy filters the Referer header. • Cookies are set with long lifetimes.
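To make the Referer leak concrete, the sketch below shows how little work a receiving site needs to do to recover the search string from the Referer value it is sent. The host `search.example`, the function name, and the list of parameter names are hypothetical; real engines use their own parameter names.

```python
from urllib.parse import parse_qs, urlsplit


def search_terms_from_referer(referer, param_names=("q", "query", "p")):
    """Return the search string embedded in a Referer URL, or None."""
    fields = parse_qs(urlsplit(referer).query)
    for name in param_names:  # hypothetical parameter names
        if name in fields:
            return fields[name][0]
    return None
```

Given `http://search.example/results?q=hiv+clinic&page=2`, the function returns `hiv clinic`. Any banner server or visited site that receives this header can do the same.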

  7. Search Engine Results • Altavista.com • Cookies expire sometime in 2013. • Search strings are leaked to ad.doubleclick.net, possibly together with the IP address if no proxy is used. • This happens without any user intervention. • Redirectors are used to record which sites were visited.

  8. Search Engine Results • Altavista.com • Cookies combined with query strings are enough to build a per-user database of searches. • Redirects are hidden from users if Javascript is enabled.

  9. Search Engine Results • Excite.com • Cookies expire in 2004. • Search hits are linked through redirects, and the real URL is hidden from users. • An id (different for each search hit) and pos (the link’s position on the page) are collected.

  10. Search Engine Results • Google.com • Cookies expire in 2038. • Search hits are usually linked directly, without redirects, although redirects are sometimes used. • No information leaks without user interaction. • No Javascript code is used, although this may have changed since the study.

  11. Search Engine Results • Hotbot.com • Cookies expire in 2038. • Images are used as buttons. • The query string is leaked to doubleclick.net. • Search hit URLs are redirected. • The user ID contained in the p-uniqid field of one of the cookies is passed to the redirector.

  12. Search Engine Results • Webcrawler.com • Cookies expire in 2010. • Loads Javascript from ad.preferences.com, which displays some banners and transmits the time zone offset to that site. This happens without user interaction. • The page contains a web bug.

  13. Search Engine Results • Yahoo.com • Cookies expire in 2010. • Search site matches are redirected. • ‘Search web page’ matches are redirected.

  14. Search Engine Results • Metacrawler.com • Cookies are set by both metacrawler.com and swizzle.go2net.com, a way of sharing user information that is not easily detected. • The query string is leaked to blink.com. • Javascript allows the query results page to be logged.

  15. Search Engine Results • Sponsored links appear among the regular search hits and are not easily distinguishable from them. They are linked via redirects. • Query strings are leaked to realnames.com.

  16. Search Engine Results • Looksmart.com • Cookies expire in 2011. • If cookies are disabled, the user can still be tracked with HTML-embedded cookies that pin the user. • Pins are passed across separate queries and servers. • No direct links are used, only redirects.

  17. Search Engine Results • Directhit.com • The query string is leaked to doubleclick.net without any user interaction. • The query string is passed as a parameter of an image request. • No direct links are used, only redirects.

  18. Search Engine Results • Goto.com • Cookies expire in 2011. • The query string is leaked to doubleclick.net via Javascript. • The user is “pinned” with a session ID that is passed across different queries. • No direct links are used; the pin is transmitted with the redirect by a mechanism that is not obvious.

  19. Search Engine Results • Mamma.com • Does not use cookies, but leaks the query string to doubleclick.net and admonitor.net without user interaction. • Pins all users with a session ID. • No direct links are used, only redirects through multiple layers.

  20. Summary • Problem: IP address leakage • Impact: a static IP address can identify and trace a user across several sessions. Dynamically assigned (DHCP) addresses take a little more work but can also be traced. • Solution: use proxies or anonymizers.

  21. Summary • Problem: Cookies • Impact: tracing the user across multiple pages and over several sessions is trivial. • Solution: disable cookies.

  22. Summary • Problem: HTTP headers contain plenty of information, such as locale, OS, and browser version. • Impact: gives out demographic information. • Solution: use a filtering proxy.
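The core step of such a filtering proxy can be sketched in a few lines of Python. The set of headers removed here is an assumption chosen to match the locale, OS, browser, and referer leaks described in these slides; a real proxy would make the list configurable and would also handle forwarding the request.

```python
# Headers treated as revealing (an illustrative choice, not a complete list).
REVEALING_HEADERS = {"user-agent", "accept-language", "referer"}


def filter_headers(headers):
    """Return a copy of the request headers without the revealing ones."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in REVEALING_HEADERS  # header names are case-insensitive
    }
```

Headers the server needs to route the request, such as `Host`, pass through unchanged.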

  23. Summary • Problem: HTML-embedded cookies • Impact: allow tracing over several pages but not across sessions. • Solution: none. Content filtering that disables hidden fields in HTML forms may help.
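A content filter of the kind mentioned here first has to find the hidden form fields that carry the identifier. The following Python sketch does that with the standard library’s `html.parser`; the class name, the function name, and the field name `sid` in the usage note are hypothetical.

```python
from html.parser import HTMLParser


class HiddenFieldFinder(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        if (attributes.get("type") or "").lower() == "hidden":
            self.hidden[attributes.get("name")] = attributes.get("value")


def find_hidden_fields(html):
    """Return a dict of the hidden form fields found in an HTML page."""
    finder = HiddenFieldFinder()
    finder.feed(html)
    return finder.hidden
```

For a page containing `<input type="hidden" name="sid" value="abc123">`, the function returns `{"sid": "abc123"}`. A filter could then strip or blank such fields before the form is submitted, at the risk of breaking forms that use hidden fields legitimately.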

  24. Summary • Problem: leaking query strings by passing them as parameters to external servers. • Impact: external sites can use query strings to build a composite of your identity. • Solution: none.

  25. Summary • Problem: Javascript loaded from other sites • Impact: Javascript is powerful enough to let the server obtain the IP address, local configuration information, screen resolution, installed plugins, OS, and browser details, which makes tracing users easy. • Solution: disable Javascript.

  26. Summary • Problem: redirected links • Impact: the server knows which links the user chose to follow. • Solution: don’t use servers that redirect links.
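How a redirected link reveals the user’s choice can be shown with a small Python sketch. The redirector address `http://search.example/redirect` and the parameter name `url` are hypothetical; `pos` follows the link-position field reported for Excite earlier in the slides.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wrap(target, position, base="http://search.example/redirect"):
    """Build the redirect link a results page shows instead of the real URL."""
    return f"{base}?{urlencode({'url': target, 'pos': position})}"


def unwrap(link):
    """What the redirector learns on a click: the real target and its rank."""
    fields = parse_qs(urlsplit(link).query)
    return fields["url"][0], int(fields["pos"][0])
```

Every click passes through the engine’s server first, so the engine can log the pair returned by `unwrap` before it sends the browser on to the real page.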

  27. Summary • Problem: sharing identifiers using 302 redirects • Impact: cookies and other user identifiers are shared in a way that is difficult for the user to trace. • Solution: difficult. Disabling cookies is one method but is not foolproof.

  28. Summary • Problem: x/y field tracing allows browser identification. • Impact: Lynx users can be detected easily. • Solution: patch Lynx.

  29. Summary • Paranoid people tend to use Google or Lycos. • Web designers should use POST rather than GET for their search forms. This helps prevent query string leakage because data submitted with POST is not part of the URL.
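The difference between the two methods can be seen by building both requests with Python’s standard library, without sending them. The endpoint `http://search.example/search` and the parameter name `q` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_requests(terms, endpoint="http://search.example/search"):
    """Return (GET request, POST request) for the same search terms."""
    encoded = urlencode({"q": terms})
    get_request = Request(f"{endpoint}?{encoded}")  # terms end up in the URL
    post_request = Request(endpoint, data=encoded.encode("ascii"))  # terms in the body
    return get_request, post_request
```

The URL of the GET request contains the search terms and is what a browser would later send as the Referer value. The URL of the POST request is just the endpoint, so there is nothing in it to leak.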
