
Automatic Ad Blocking

Justin Crites and Mathias Ricken. November 24, 2004. Comp 527 – Computer Systems Security, Rice University.


Presentation Transcript


  1. Automatic Ad Blocking • Justin Crites and Mathias Ricken • November 24, 2004 • Comp 527 – Computer Systems Security • Rice University

  2. Web Advertisement – The Facts • Annoying • Animation • Sound • Potentially dangerous • May contain malware • Ad download reveals user’s IP • Deteriorate user’s web experience

  3. Web Advertisement – The Future • More ads • Q3 2003: $1.79 billion online ad spending • Q3 2004: $2.43 billion (+36%) • Bigger, richer ads • More half-page ads, fewer banners • 33% rich media (Flash, popups) • Expected to surpass image ads in 2005 • Source: Interactive Advertising Bureau • Source: DoubleClick

  4. Project Goals • Make ad blocking easy to use • Particularly for novice users • “Two click” blocking • Automatic online updates • Blocking of entire HTML page sections

  5. Project Platform • Mozilla Firefox • Open-source browser • Extensible through JavaScript • Source code for many plugins available as examples • AdBlock • Open-source plugin • Filter certain HTML elements based on regular expressions

  6. AdBlock in Detail • Filter out HTML elements • <IMG> • <EMBED> • <IFRAME> • <OBJECT> • Selection controlled by blacklist with wildcarded URLs • Example: http://*.somedomain.com/pagead/ads?* • * denotes zero or more arbitrary characters
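The blacklist matching described on this slide can be sketched as follows. This is a minimal illustration, not AdBlock's actual code: the names `wildcardToRegExp` and `isBlocked` are hypothetical, and the sketch assumes a filter must match the whole URL, which the slides do not state.

```javascript
// Hypothetical helper: turn a wildcarded filter into a RegExp.
// '*' stands for zero or more arbitrary characters; everything else is literal.
function wildcardToRegExp(pattern) {
  // Escape regex metacharacters except '*'.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  // Assumption: the filter is anchored at both ends of the URL.
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Hypothetical helper: true if any blacklist entry matches the URL.
function isBlocked(url, blacklist) {
  return blacklist.some((pattern) => wildcardToRegExp(pattern).test(url));
}

const exampleBlacklist = ["http://*.somedomain.com/pagead/ads?*"];
console.log(isBlocked("http://www.somedomain.com/pagead/ads?id=42", exampleBlacklist)); // true
console.log(isBlocked("http://www.somedomain.com/index.html", exampleBlacklist)); // false
```

Escaping before substituting `*` keeps characters such as `.` and `?` literal, so the `?` in `ads?*` matches only a question mark.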

  7. AdBlock Problems • Wildcards entered manually • Concepts too difficult for non-technical users • No simple way to share filters • AdBlock only blocks very few HTML elements • Non-recursive

  8. Developing for Mozilla Firefox • Register plugin as “chrome provider” for • Skin • Content • Platform • Localization • AdBlock is a content provider • Plugins consist of • XUL – XML user interface definition • JavaScript – event scripting

  9. Problems for Developers • JavaScript issues • No strict typing • No variable declarations necessary • Lack of debugging support • Uninformative errors • No single stepping/breakpoints • Code of plugins split between several files • Confusing control flow, if badly written

  10. “Two Click” Blocking

  11. “Two Click” Blocking • User blocks items with minimum interaction • Right-click → Context menu • Left-click → Autoblock • AdBlock now has three lists • User filters (like before) • Autoblock URLs • Generated filters

  12. Wildcarded URL Generation • Intelligently build wildcarded URL • Go from http://ad.domainname.com/ads/someimage.gif and http://xxx.domainname.com/ads/anotherpic.jpg … to http://*.domainname.com/ads/* • Keep matching parts, replace different parts with * → Longest Common Subsequence (LCS)

  13. Longest Common Subsequence • Dynamic programming, O(n²) in time and space • Nested for-loops • If a[i] = b[j], M[i, j] ← M[i-1, j-1] + 1 • else pick maximum from left or above • [Figure: partially filled LCS table]

  14. Longest Common Subsequence • Dynamic programming, O(n²) in time and space • Nested for-loops • If a[i] = b[j], M[i, j] ← M[i-1, j-1] + 1 • else pick maximum from left or above • Example: BDAB • Insert * after diagonal stretches • BD*AB* • [Figure: partially filled LCS table]
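The recurrence above can be sketched in JavaScript as follows. This is an illustrative sketch under assumptions, not the plugin's code: `lcsTable` and `wildcardFromLcs` are hypothetical names, and a `*` is emitted wherever either input skips characters between two matched stretches. When several subsequences are equally long, which one is chosen depends on how ties are broken, so the pattern for longer inputs may vary.

```javascript
// Fill the dynamic-programming table M, where M[i][j] is the LCS length of
// the first i characters of a and the first j characters of b.
function lcsTable(a, b) {
  const M = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      if (a[i - 1] === b[j - 1]) {
        M[i][j] = M[i - 1][j - 1] + 1; // diagonal step: characters match
      } else {
        M[i][j] = Math.max(M[i - 1][j], M[i][j - 1]); // best of above or left
      }
    }
  }
  return M;
}

// Walk the table backwards, keeping matched characters and inserting '*'
// wherever one or both inputs had to skip characters.
function wildcardFromLcs(a, b) {
  const M = lcsTable(a, b);
  const parts = []; // collected in reverse order
  let i = a.length;
  let j = b.length;
  let gap = false;
  while (i > 0 && j > 0) {
    if (a[i - 1] === b[j - 1]) {
      if (gap) {
        parts.push("*");
        gap = false;
      }
      parts.push(a[i - 1]);
      i--;
      j--;
    } else {
      gap = true;
      if (M[i - 1][j] >= M[i][j - 1]) i--; else j--;
    }
  }
  // Unmatched characters at the very start also become a wildcard.
  if (gap || i > 0 || j > 0) parts.push("*");
  return parts.reverse().join("");
}

console.log(wildcardFromLcs("abcXdef", "abcYdef")); // abc*def
console.log(wildcardFromLcs(
  "http://ad.domainname.com/ads/someimage.gif",
  "http://xxx.domainname.com/ads/anotherpic.jpg"
));
```

The table costs O(n²) time and space, as the slide notes; for URL-length inputs this is small.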

  15. LCS to Wildcarded URL • LCS often includes very short fragments • http://ad.domainname.com/ads/someimage.gif and http://xxx.domainname.com/ads/anotherpic.jpg generates http://*.domainname.com/ads/*o*e*i*.* → Only accept matching fragments with length > 2 • Cannot merge all URLs together • Result would be http://* or similar
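The "length > 2" rule can be sketched as a post-processing pass over the wildcarded URL. The function name `dropShortFragments` and its `minLength` parameter are hypothetical; the sketch assumes that a dropped fragment is simply absorbed into the neighbouring wildcard.

```javascript
// Keep only literal fragments of at least minLength characters; shorter
// fragments are absorbed into the surrounding '*' wildcards.
function dropShortFragments(pattern, minLength = 3) {
  const out = [];
  let pendingStar = false;
  pattern.split("*").forEach((fragment, index) => {
    if (index > 0) pendingStar = true; // a '*' preceded this fragment
    if (fragment.length >= minLength) {
      if (pendingStar) out.push("*");
      pendingStar = false;
      out.push(fragment);
    } else if (fragment.length > 0) {
      pendingStar = true; // dropped fragment turns into a wildcard
    }
  });
  if (pendingStar) out.push("*");
  return out.join("");
}

console.log(dropShortFragments("http://*.domainname.com/ads/*o*e*i*.*"));
// http://*.domainname.com/ads/*
```

Adjacent wildcards collapse into one because a `*` is only emitted immediately before a kept fragment or once at the end.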

  16. URL Merging • Only merge two URLs if similar enough • Look at wildcarded URL from LCS • Remove all non-alpha-numeric characters • Remove common fragments • http • com and other top-level domains • gif and other file suffixes • Merge only if string is still non-empty • else try merging with other URL or keep separate
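The similarity check above can be sketched as follows. It is a slight variation on the slide, stated as an assumption: rather than deleting non-alphanumeric characters first and then searching for substrings, the sketch splits the wildcarded URL into alphanumeric tokens and drops the common ones, which avoids removing "com" from the middle of an unrelated word. The names `COMMON_FRAGMENTS`, `similarityResidue` and `shouldMerge` are hypothetical, and the fragment list is illustrative only.

```javascript
// Illustrative list of fragments that say nothing about URL similarity.
const COMMON_FRAGMENTS = new Set(["http", "com", "net", "org", "gif", "jpg", "png"]);

// What is left of a wildcarded URL after removing punctuation, wildcards
// and common fragments.
function similarityResidue(wildcardedUrl) {
  return wildcardedUrl
    .toLowerCase()
    .split(/[^a-z0-9]+/) // '*' and all other punctuation act as separators
    .filter((token) => token.length > 0 && !COMMON_FRAGMENTS.has(token))
    .join("");
}

// Merge two URLs only if their wildcarded form still carries information.
function shouldMerge(wildcardedUrl) {
  return similarityResidue(wildcardedUrl).length > 0;
}

console.log(shouldMerge("http://*.domainname.com/ads/*")); // true
console.log(shouldMerge("http://*.com/*.gif")); // false
```

If `shouldMerge` returns false, the caller would try the next candidate URL or keep the two filters separate, as the slide describes.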

  17. Improving URL Merging • Wildcarded URLs are sometimes too general • Possible improvements • Do not merge across domains • Take directory structure into account • Treat numbers as one entity, not separate characters
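Two of these proposed improvements can be sketched briefly. Both helpers are hypothetical: `naiveDomain` assumes the last two host labels identify the domain, which is wrong for suffixes such as co.uk, and `tokenizeUrl` produces tokens that a token-based LCS (rather than a character-based one) could compare.

```javascript
// Treat each run of digits as a single token; every other character is
// its own token.
function tokenizeUrl(url) {
  return url.match(/\d+|\D/g) || [];
}

// Assumption: the last two labels of the host name identify the domain.
function naiveDomain(url) {
  const labels = new URL(url).hostname.split(".");
  return labels.slice(-2).join(".");
}

// Only allow merging when both URLs belong to the same (naive) domain.
function sameDomain(urlA, urlB) {
  return naiveDomain(urlA) === naiveDomain(urlB);
}

console.log(tokenizeUrl("ad123.gif")); // [ 'a', 'd', '123', '.', 'g', 'i', 'f' ]
console.log(sameDomain(
  "http://ad.domainname.com/ads/someimage.gif",
  "http://xxx.domainname.com/ads/anotherpic.jpg"
)); // true
```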

  18. Automatic Online Updates

  19. Automatic Online Updates • New menu item in “Preferences” dialog • Import filters from URL • Can automatically update after specified interval, e.g. one week • Circumvents file system • Users can import filters from trusted agency • Magazine, university, network admin, etc.
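The interval check and the import step can be sketched as below. The slides do not specify the download format, so this sketch assumes one filter per line; `isUpdateDue`, `mergeImportedFilters` and `ONE_WEEK_MS` are hypothetical names, and the download itself is left out.

```javascript
const ONE_WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// True once at least intervalMs has passed since the last update.
function isUpdateDue(lastUpdateMs, intervalMs, nowMs) {
  return nowMs - lastUpdateMs >= intervalMs;
}

// Assumption: the downloaded list contains one filter per line.
// Existing filters are kept and duplicates are removed.
function mergeImportedFilters(existingFilters, downloadedText) {
  const imported = downloadedText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  return Array.from(new Set([...existingFilters, ...imported]));
}

const merged = mergeImportedFilters(
  ["http://*.somedomain.com/pagead/ads?*"],
  "http://*.somedomain.com/pagead/ads?*\nhttp://*.domainname.com/ads/*\n"
);
console.log(merged.length); // 2
```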

  20. Improvements to Online Updates • Create an “ad blocking community” • Users add filters to online database • If filters are good, user gains karma • Filters from users with more karma get preferred • Advertisers face thousands of users entering and sharing filters

  21. Blocking HTML Page Sections

  22. Blocking HTML Page Sections • Allows blocking HTML elements containing other elements (as opposed to just <img> or <object> tags) • Path-style strings specify HTML elements • html:1/body:1/table:2 meaning “the second <table> tag in the first <body> in the first <html>” • Document Object Model (DOM) path • Paired with wildcard URLs to determine on which pages to block that HTML path • {“domainname.com/sessionid=*”, “html:1/body:1/table:2”}

  23. Implementation • HTML document viewed as tree • If a webpage URL matches the wildcarded URL • Recur into DOM tree branch as specified by DOM path • Remove matching element
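The steps above can be sketched on a simplified tree of plain objects instead of a real browser DOM, so that the example runs on its own. All names here (`el`, `parseDomPath`, `blockSection`, the rule fields `urlPattern` and `domPath`) are hypothetical. A small wildcard matcher is included, and because it anchors the pattern to the whole URL, the example rule adds a leading `*` to the pattern from the slide.

```javascript
// Simplified element: a tag name plus child elements.
function el(tag, children = []) {
  return { tag, children };
}

// Convert a wildcarded URL into an anchored RegExp ('*' = any characters).
function wildcardToRegExp(pattern) {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// "html:1/body:1/table:2" -> [{tag: "html", index: 1}, ...]
function parseDomPath(path) {
  return path.split("/").map((step) => {
    const [tag, index] = step.split(":");
    return { tag, index: Number(index) };
  });
}

// If the page URL matches, follow the DOM path and remove the element.
// Returns true when an element was removed.
function blockSection(documentNode, pageUrl, rule) {
  if (!wildcardToRegExp(rule.urlPattern).test(pageUrl)) return false;
  let parent = null;
  let node = documentNode;
  for (const { tag, index } of parseDomPath(rule.domPath)) {
    const sameTag = node.children.filter((child) => child.tag === tag);
    const next = sameTag[index - 1]; // path indexes are 1-based
    if (!next) return false; // path does not exist on this page
    parent = node;
    node = next;
  }
  parent.children.splice(parent.children.indexOf(node), 1);
  return true;
}

const body = el("body", [el("table"), el("div"), el("table")]);
const page = el("#document", [el("html", [el("head"), body])]);
const rule = { urlPattern: "*domainname.com/sessionid=*", domPath: "html:1/body:1/table:2" };
console.log(blockSection(page, "http://www.domainname.com/sessionid=123", rule)); // true
console.log(body.children.map((child) => child.tag)); // [ 'table', 'div' ]
```

In a real extension the same walk would run over DOM child elements and end with a removal from the parent node; the object tree only stands in for that here.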

  24. Possible Improvements • Command characters in DOM paths • # – Block element for all indexes • html:1/body:1/table:#/tr:1 means “block the first row of all tables in the body” • * – Insert arbitrary path • html:1/*/table:#/tr:1 means “block the first row of all tables in the document”
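One way these proposed command characters might be interpreted is sketched below, again on a simplified object tree rather than a browser DOM. The slides only propose the syntax, so the semantics are assumptions: `#` selects every child with the given tag, and `*` stands for any chain of elements, including none. `el` and `matchDomPath` are hypothetical names.

```javascript
// Simplified element: a tag name, child elements and a label for printing.
function el(tag, children = [], label = "") {
  return { tag, children, label };
}

// Return {parent, node} pairs for every element matched by the path steps,
// starting below the given node.
function matchDomPath(node, steps) {
  if (steps.length === 0) return [];
  const [step, ...rest] = steps;
  if (step === "*") {
    // Arbitrary path: try the remaining steps here and below every child.
    let results = matchDomPath(node, rest);
    for (const child of node.children) {
      results = results.concat(matchDomPath(child, steps));
    }
    return results;
  }
  const [tag, index] = step.split(":");
  const sameTag = node.children.filter((child) => child.tag === tag);
  // '#' selects all indexes; otherwise pick the 1-based index.
  const chosen = index === "#" ? sameTag : sameTag.slice(Number(index) - 1, Number(index));
  let results = [];
  for (const child of chosen) {
    if (rest.length === 0) results.push({ parent: node, node: child });
    else results = results.concat(matchDomPath(child, rest));
  }
  return results;
}

const inner = el("table", [el("tr", [], "c"), el("tr", [], "d")]);
const outer = el("table", [el("tr", [], "a"), el("tr", [], "b")]);
const doc = el("#document", [el("html", [el("body", [outer, el("div", [inner])])])]);

const inBody = matchDomPath(doc, "html:1/body:1/table:#/tr:1".split("/"));
const anywhere = matchDomPath(doc, "html:1/*/table:#/tr:1".split("/"));
console.log(inBody.map((m) => m.node.label)); // [ 'a' ]
console.log(anywhere.map((m) => m.node.label)); // [ 'a', 'c' ]
```

Collecting all matches before removing anything keeps the indexes stable while the tree is being searched.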

  25. Conclusion • “Two click” blocking simplifies ad blocking for non-technical users • Online updates make sharing filters easier • Blocking entire HTML page sections is expected to be powerful for fairly static pages

  26. Thank You! • We thank the following groups for the support we have received: • The Mozilla Organization (www.mozilla.org) • The AdBlock Project (adblock.mozdev.org) • Dr. Dan Wallach, Scott Crosby and COMP 527
