Search Engines & Privacy
A discussion of Roessler’s paper
ECE5560
Randy Marchany
Copyright 2003, Marchany
Introduction
• Users submit condensed and accurate information on topics of interest to search engines.
• This makes search engines a target for profiling and advertising companies.
• Giving personal data such as an email address ONCE makes all past and future actions linkable to the user’s real identity.
Introduction
• A major search engine was found to use redirects that let the engine’s server log which search hits were actually visited.
• This prompted Roessler’s study.
• The engines examined were Altavista, Excite, Google, Lycos, Hotbot, Webcrawler, Yahoo, Metacrawler, Looksmart, Directhit, Goto, Go, and Mamma.
Introduction
• Only privacy issues that arise when submitting search strings, when visiting the pages found, or without any user interaction were examined.
• Automatic loading of an image is an example of something that happens without user interaction.
• Several techniques that affect users’ privacy were discovered.
Introduction
• Images are used instead of buttons for submitting HTML forms. If the image is clicked, the coordinates of the click are transmitted along with the form data.
• The transmitted coordinates make it possible to determine which browser was used.
• A probabilistic test for whether a GUI-based browser was used is possible.
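The image-button technique can be sketched in a few lines. An HTML form containing `<input type="image" name="go">` submits `go.x` and `go.y` next to the other fields. The field name `go`, the parameter `q`, and the rule that a text-mode browser reports `0,0` are assumptions made for illustration, not details taken from Roessler’s paper.

```python
from urllib.parse import parse_qs

def classify_submission(query_string, button="go"):
    """Guess the browser type from image-button click coordinates.

    Assumption: a text-mode browser cannot click a pixel and sends 0,0.
    A GUI user could hit 0,0 by chance, so the answer is only probable.
    """
    fields = parse_qs(query_string)
    x = fields.get(f"{button}.x")
    y = fields.get(f"{button}.y")
    if x is None or y is None:
        return "unknown"  # form was not submitted through the image button
    if (int(x[0]), int(y[0])) == (0, 0):
        return "text"     # probably a text-mode browser
    return "gui"          # probably a GUI browser

print(classify_submission("q=privacy&go.x=17&go.y=9"))
print(classify_submission("q=privacy&go.x=0&go.y=0"))
```

The result is a guess rather than a proof, which is why the test is described as probabilistic.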
Introduction
• The URL of the search results page usually contains the query string.
• The URL and its parameters are leaked, via the Referer header, to every site visited from that page.
• This means that all advertisers displaying banners on the results page, and all sites visited from it, learn the search string unless a proxy filters the Referer header.
• Cookies are set with long lifetimes.
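A minimal sketch of what a banner server or visited site can do with the Referer header it receives. The host `search.example` and the parameter name `q` are hypothetical; real engines use different names.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms_from_referer(referer, param="q"):
    """Return the search terms a third party can read from a Referer value."""
    query = urlsplit(referer).query
    return parse_qs(query).get(param, [])

# Hypothetical Referer sent along with a banner request from a results page.
referer = "http://search.example/results?q=back+pain+treatment&page=1"
print(search_terms_from_referer(referer))  # ['back pain treatment']
```

No cooperation from the search engine is needed: the receiving server only has to parse a header that the browser sends by default.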
Search Engine Results
• Altavista.com
• Cookies expire sometime in 2013.
• Search strings are leaked to ad.doubleclick.net, possibly together with the user’s IP address if no proxy is used.
• This happens without any user intervention.
• Redirectors are used to tell which sites were visited.
Search Engine Results
• Altavista.com
• Cookies combined with query strings give the engine a database that links searches to individual users.
• Redirects are hidden from users if Javascript is enabled.
Search Engine Results
• Excite.com
• Cookies expire in 2004.
• Search hits are linked using redirects, and the real URL is hidden from the user.
• An id (different for each search hit) and a pos (the link’s position on the page) are collected.
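The redirect technique can be illustrated with a small sketch. The endpoint `search.example/click` and the parameter `u` are hypothetical; `id` and `pos` follow the field names mentioned above, but the exact URL layout any engine used is an assumption.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

REDIRECTOR = "http://search.example/click"  # hypothetical redirector endpoint

def wrap_hit(target_url, hit_id, pos):
    """Build the link shown to the user; it points at the engine, not the hit."""
    return REDIRECTOR + "?" + urlencode({"id": hit_id, "pos": pos, "u": target_url})

def log_and_resolve(click_url, log):
    """Redirector side: record which hit was followed, then return the real URL."""
    params = parse_qs(urlsplit(click_url).query)
    log.append((params["id"][0], int(params["pos"][0])))
    return params["u"][0]

clicks = []
link = wrap_hit("http://www.example.org/page?a=1", "h42", 3)
print(link)
print(log_and_resolve(link, clicks), clicks)
```

Because every click passes through the engine first, the engine learns which hits were followed and where they stood on the page.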
Search Engine Results
• Google.com
• Cookies expire in 2038.
• Search hits are usually linked directly, without redirects, although redirects are sometimes used.
• No information leaks without user interaction.
• No Javascript code is used. This may have changed.
Search Engine Results
• Hotbot.com
• Cookies expire in 2038.
• Images are used as buttons.
• The query string is leaked to doubleclick.net.
• Search hit URLs are redirected.
• The user ID contained in the p-uniqid field of one of the cookies is passed to the redirector.
Search Engine Results
• Webcrawler.com
• Cookies expire in 2010.
• Loads Javascript from ad.preferences.com, which displays some banners and transmits the time zone offset to that site. This happens without user interaction.
• Contains a web bug.
Search Engine Results
• Yahoo.com
• Cookies expire in 2010.
• Search site matches are redirected.
• ‘Search web page’ matches are redirected.
Search Engine Results
• Metacrawler.com
• Cookies are set by metacrawler.com AND swizzle.go2net.com, a way of sharing user information that is not easily detected.
• Leaks the query string to blink.com.
• Javascript allows the query results page to be logged.
Search Engine Results
• Sponsored links appear among the regular search hits and are not easily distinguishable from them. They are linked via redirects.
• Query strings are leaked to realnames.com.
Search Engine Results
• Looksmart.com
• Cookies expire in 2011.
• If cookies are disabled, the user is still tracked: HTML-embedded cookies are used to pin the user.
• Pins are passed across separate queries and servers.
• No direct links are used, only redirects.
Search Engine Results
• Directhit.com
• The query string is leaked to doubleclick.net without any user interaction.
• The query string is passed as a parameter of an image request.
• No direct links are used, only redirects.
Search Engine Results
• Goto.com
• Cookies expire in 2011.
• The query string is leaked to doubleclick.net via Javascript.
• The user is “pinned” with a session ID that is passed across different queries.
• No direct links are used; the pin is transmitted with the redirect in a way that is not obvious.
Search Engine Results
• Mamma.com
• Doesn’t use cookies, but leaks the query string to doubleclick.net and admonitor.net without user interaction.
• Pins all users with a session ID.
• No direct links are used, only redirects through multiple layers.
Summary
• Problem: IP address leakage.
• Impact: static IP addresses can identify and trace a user across several sessions. DHCP-assigned addresses take a little more work but can also be traced.
• Solution: use proxies or anonymizers.
Summary
• Problem: cookies.
• Impact: tracing the user across multiple pages and over several sessions is trivial.
• Solution: disable cookies.
Summary
• Problem: HTTP headers contain plenty of information, such as locale, OS, and browser version.
• Impact: gives out demographic information.
• Solution: use a filtering proxy.
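The core step of a filtering proxy can be sketched as a function that drops identifying request headers before forwarding. The particular set of headers below is an assumption chosen for illustration; which headers to strip is a policy decision, and removing some of them may break sites.

```python
# Assumed set of headers that reveal locale, OS, browser, or identity.
IDENTIFYING_HEADERS = {"user-agent", "accept-language", "referer", "cookie", "from"}

def filter_headers(headers):
    """Return a copy of the request headers without the identifying ones."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in IDENTIFYING_HEADERS
    }

request_headers = {
    "Host": "search.example",
    "User-Agent": "ExampleBrowser/4.7 (X11; Linux)",
    "Accept-Language": "de-DE",
    "Accept": "text/html",
}
print(filter_headers(request_headers))
```

Header names are compared case-insensitively, since HTTP header names are not case-sensitive.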
Summary
• Problem: HTML-embedded cookies.
• Impact: allows tracing over several pages but not across sessions.
• Solution: none. Content filtering that disables hidden fields in HTML forms may help.
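As a rough sketch of the first step such a content filter would need, the following finds hidden form fields in a page. The field name `sid` and the sample markup are hypothetical. The sketch only detects the fields; actually removing them is left out, and doing so may break forms that rely on them.

```python
from html.parser import HTMLParser

class HiddenFieldFinder(HTMLParser):
    """Collect (name, value) pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            self.hidden.append((attrs.get("name"), attrs.get("value")))

page = (
    '<form action="/search">'
    '<input type="hidden" name="sid" value="a1b2c3">'
    '<input type="text" name="q">'
    "</form>"
)
finder = HiddenFieldFinder()
finder.feed(page)
print(finder.hidden)  # [('sid', 'a1b2c3')]
```

Identifiers embedded in link URLs rather than in form fields would not be caught this way.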
Summary
• Problem: leaking query strings by passing them as parameters to external servers.
• Impact: external sites can use query strings to build a composite of your identity.
• Solution: none.
Summary
• Problem: Javascript loaded from other sites.
• Impact: Javascript is too powerful. It enables the server to obtain the IP address, local configuration information, etc., and allows easy tracing of users along with their screen resolution, installed plugins, OS, and browser information.
• Solution: disable Javascript.
Summary
• Problem: redirected links.
• Impact: server knows which links the user chose to follow.
• Solution: don’t use servers that redirect links.
Summary
• Problem: sharing identifiers using 302 redirects.
• Impact: cookies and other user identifiers are shared in a way that is difficult for the user to trace.
• Solution: difficult. Disabling cookies is one method, but it is not foolproof.
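One way identifiers could be shared through a 302 redirect is sketched below. The parameter name `uid`, the partner host, and the two-function split are assumptions for illustration; the source does not describe the exact mechanism any engine used.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def redirect_with_id(user_id, partner_url):
    """First site: answer with a 302 whose Location carries its user ID."""
    location = partner_url + "?" + urlencode({"uid": user_id})
    return 302, {"Location": location}

def partner_set_cookie(request_url):
    """Partner site: read the ID from the URL and store it in its own cookie."""
    uid = parse_qs(urlsplit(request_url).query)["uid"][0]
    return {"Set-Cookie": f"uid={uid}"}

status, headers = redirect_with_id("u-7731", "http://partner.example/sync")
print(status, headers["Location"])
print(partner_set_cookie(headers["Location"]))
```

The browser follows the redirect automatically, so the user normally never sees the intermediate URL that carries the identifier.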
Summary
• Problem: x/y field tracing allows browser identification.
• Impact: Lynx users can be detected easily.
• Solution: patch Lynx.
Summary
• Paranoid people tend to use Google or Lycos.
• Web designers shouldn’t use GET for their search forms; use POST instead. This helps prevent query string leakage because POST submissions aren’t in the URL.
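The difference between GET and POST can be seen by building both requests without sending them. The host `search.example` and the parameter `q` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = urlencode({"q": "back pain treatment"})

# GET: the query travels in the URL, so it ends up in Referer headers and logs.
get_req = Request("http://search.example/results?" + params)

# POST: the query travels in the request body; the URL stays clean.
post_req = Request("http://search.example/results", data=params.encode("ascii"))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

With POST the results page URL carries no search terms, so a Referer header sent from that page has nothing to leak.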