Automatic Ad Blocking Justin Crites and Mathias Ricken November 24, 2004 Comp 527 – Computer Systems Security Rice University
Web Advertisement – The Facts • Annoying • Animation • Sound • Potentially dangerous • May contain malware • Ad download reveals user’s IP address • Degrade the user’s web experience
Web Advertisement – The Future • More ads • Q3 2003: $1.79 billion online ad spending • Q3 2004: $2.43 billion (+36%) • Bigger, richer ads • More half-page ads, fewer banners • 33% rich media (Flash, popups) • Expected to surpass image ads in 2005 • Sources: Interactive Advertising Bureau, DoubleClick
Project Goals • Make ad blocking easy to use • Particularly for novice users • “Two click” blocking • Automatic online updates • Blocking of entire HTML page sections
Project Platform • Mozilla Firefox • Open-source browser • Extensible through JavaScript • Source code for many plugins available as examples • AdBlock • Open-source plugin • Filter certain HTML elements based on regular expressions
AdBlock in Detail • Filter out HTML elements • <IMG> • <EMBED> • <IFRAME> • <OBJECT> • Selection controlled by blacklist with wildcarded URLs • Example: http://*.somedomain.com/pagead/ads?* • * denotes zero or more arbitrary characters
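To make the wildcard semantics concrete, here is a minimal sketch of blacklist matching. It is not AdBlock's actual code: the function names are hypothetical, and it assumes a filter must match the whole URL, with * standing for zero or more arbitrary characters as stated on the slide.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Compile a wildcarded URL; every other character is matched literally."""
    literal_parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(".*".join(literal_parts), re.DOTALL)


def is_blocked(url: str, blacklist: list[str]) -> bool:
    """Return True if any blacklist entry matches the entire URL (an assumption)."""
    return any(wildcard_to_regex(p).fullmatch(url) for p in blacklist)


blacklist = ["http://*.somedomain.com/pagead/ads?*"]
print(is_blocked("http://www.somedomain.com/pagead/ads?client=1", blacklist))  # True
print(is_blocked("http://www.somedomain.com/index.html", blacklist))           # False
```

Escaping the literal parts matters here because characters such as `.` and `?` in a URL would otherwise be read as regular-expression operators.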
AdBlock Problems • Wildcards entered manually • Concepts too difficult for non-technical users • No simple way to share filters • AdBlock only blocks very few HTML elements • Non-recursive
Developing for Mozilla Firefox • Register plugin as “chrome provider” for • Skin • Content • Platform • Localization • AdBlock is a content provider • Plugins consist of • XUL – XML user interface definition • JavaScript – event scripting
Problems for Developers • JavaScript issues • No strict typing • No variable declarations necessary • Lack of debugging support • Uninformative errors • No single stepping/breakpoints • Code of plugins split between several files • Confusing control flow, if badly written
“Two Click” Blocking • User blocks items with minimum interaction • Right-click: open context menu • Left-click: choose Autoblock • AdBlock now has three lists • User filters (like before) • Autoblock URLs • Generated filters
Wildcarded URL Generation • Intelligently build wildcarded URL • Go from http://ad.domainname.com/ads/someimage.gif and http://xxx.domainname.com/ads/anotherpic.jpg … to http://*.domainname.com/ads/* • Keep matching parts, replace different parts with * • Longest Common Subsequence (LCS)
Longest Common Subsequence • Dynamic programming, O(n²) in time and space • Nested for-loops • If a[i] = b[j], M[i, j] ← M[i-1, j-1] + 1 • else pick maximum from left or above • Example: BDAB • Insert * after diagonal stretches • BD*AB* • [Figure: partially filled LCS table M]
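The table fill and the "insert * after diagonal stretches" step can be sketched as follows. This is an illustrative reconstruction, not the project's code: the function names and the tie-breaking rule during backtracking are assumptions, and the input strings in the example are made up.

```python
def lcs_pairs(a: str, b: str) -> list[tuple[int, int]]:
    """Return index pairs (i, j) with a[i] == b[j] that form one LCS."""
    rows, cols = len(a) + 1, len(b) + 1
    m = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                m[i][j] = m[i - 1][j - 1] + 1             # extend the diagonal
            else:
                m[i][j] = max(m[i - 1][j], m[i][j - 1])   # best of above / left
    # Walk back from the bottom-right corner to recover the matched positions.
    pairs = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif m[i - 1][j] >= m[i][j - 1]:   # tie-break choice is an assumption
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


def wildcard_from_lcs(a: str, b: str) -> str:
    """Emit matched characters, inserting * wherever a diagonal stretch ends."""
    out = []
    prev = (-1, -1)
    for i, j in lcs_pairs(a, b):
        if (i, j) != (prev[0] + 1, prev[1] + 1):
            out.append("*")                # a gap in at least one of the strings
        out.append(a[i])
        prev = (i, j)
    if prev != (len(a) - 1, len(b) - 1):
        out.append("*")                    # unmatched tail
    return "".join(out)


print(wildcard_from_lcs("abcXdef", "abcYYdef"))  # abc*def
print(wildcard_from_lcs(
    "http://ad.domainname.com/ads/someimage.gif",
    "http://xxx.domainname.com/ads/anotherpic.jpg",
))
```

For the two URLs, the raw result typically still contains isolated one-character matches from the file names; which ones appear depends on how ties are broken during backtracking.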
LCS to Wildcarded URL • LCS often includes very short fragments • http://ad.domainname.com/ads/someimage.gif and http://xxx.domainname.com/ads/anotherpic.jpg generate http://*.domainname.com/ads/*o*e*i*.* • Only accept matching fragments with length > 2 • Cannot merge all URLs together • Result would be http://* or similar
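The length filter can be illustrated directly on the raw pattern from this slide. The sketch below assumes the filter is applied as a post-processing step on the wildcarded string; the function name and the `min_len` parameter are hypothetical.

```python
def drop_short_fragments(pattern: str, min_len: int = 3) -> str:
    """Replace literal fragments shorter than min_len with a wildcard."""
    tokens = []
    for index, fragment in enumerate(pattern.split("*")):
        if index > 0:
            tokens.append("*")        # the wildcard that separated two fragments
        if len(fragment) >= min_len:
            tokens.append(fragment)   # length > 2: keep as literal text
        elif fragment:
            tokens.append("*")        # too short: absorb into a wildcard
    # Collapse runs of adjacent wildcards into a single *.
    collapsed = []
    for token in tokens:
        if token == "*" and collapsed and collapsed[-1] == "*":
            continue
        collapsed.append(token)
    return "".join(collapsed)


raw = "http://*.domainname.com/ads/*o*e*i*.*"
print(drop_short_fragments(raw))  # http://*.domainname.com/ads/*
```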
URL Merging • Only merge two URLs if similar enough • Look at wildcarded URL from LCS • Remove all non-alpha-numeric characters • Remove common fragments • http • com and other top-level domains • gif and other file suffixes • Merge only if string is still non-empty • else try merging with other URL or keep separate
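A minimal sketch of this similarity test follows. The list of common fragments is an assumed sample (the slide names only http, com and gif explicitly), the function names are hypothetical, and plain substring removal is a simplification.

```python
import re

# Assumed sample list; the slides name only "http", "com" and "gif" explicitly.
COMMON_FRAGMENTS = ("http", "com", "net", "org", "gif", "jpg", "png")


def residue(wildcarded: str) -> str:
    """Strip non-alphanumeric characters, then remove the common fragments."""
    text = re.sub(r"[^A-Za-z0-9]", "", wildcarded.lower())
    for fragment in COMMON_FRAGMENTS:
        text = text.replace(fragment, "")
    return text


def should_merge(wildcarded: str) -> bool:
    """Merge two URLs only if something distinctive survives the stripping."""
    return residue(wildcarded) != ""


print(should_merge("http://*.domainname.com/ads/*"))  # True
print(should_merge("http://*.com/*.gif"))             # False
```

In the first call the residue is `domainnameads`, so the two source URLs share something specific. In the second call nothing is left, so the URLs would be kept separate.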
Improving URL Merging • Wildcarded URLs are sometimes too general • Possible improvements • Do not merge across domains • Take directory structure into account • Treat numbers as one entity, not separate characters
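The last improvement could be approached by tokenizing URLs before running LCS, so that each run of digits is compared as a single unit. The sketch below shows only the tokenization step; the function name and the example path are illustrative assumptions.

```python
import re


def tokenize(url: str) -> list[str]:
    """Split a URL into tokens, keeping each run of digits as one token."""
    return re.findall(r"\d+|\D", url)


print(tokenize("ads/468x60.gif"))
# ['a', 'd', 's', '/', '468', 'x', '60', '.', 'g', 'i', 'f']
```

With this tokenization, `468` and `460` are simply different tokens, so an LCS over tokens would not keep a shared prefix such as `46` as a literal fragment.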
Automatic Online Updates • New menu item in “Preferences” dialog • Import filters from URL • Can automatically update after specified interval, e.g. one week • Circumvents file system • Users can import filters from trusted agency • Magazine, university, network admin, etc.
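The update logic might look roughly like the sketch below. The slides do not specify the file format, so a plain-text list with one filter per line is assumed; the function names are hypothetical, and the download is passed in as a callable so the example runs without network access.

```python
from typing import Callable

WEEK_SECONDS = 7 * 24 * 60 * 60  # "e.g. one week"


def update_due(last_update: float, now: float,
               interval: float = WEEK_SECONDS) -> bool:
    """Return True once the configured interval has elapsed."""
    return now - last_update >= interval


def import_filters(existing: list[str], fetch: Callable[[], str]) -> list[str]:
    """Merge downloaded filters into the existing list, skipping duplicates.

    Assumes one filter per line; the real file format is not given in the slides.
    """
    merged = list(existing)
    for line in fetch().splitlines():
        candidate = line.strip()
        if candidate and candidate not in merged:
            merged.append(candidate)
    return merged


def fake_fetch() -> str:
    """Stand-in for downloading the filter list from a trusted URL."""
    return "http://*.somedomain.com/pagead/ads?*\nhttp://ads.example.net/*\n\n"


filters = ["http://*.somedomain.com/pagead/ads?*"]
if update_due(last_update=0.0, now=float(WEEK_SECONDS)):
    filters = import_filters(filters, fake_fetch)
print(filters)
# ['http://*.somedomain.com/pagead/ads?*', 'http://ads.example.net/*']
```

In a real client, `fetch` could wrap an HTTP download such as `urllib.request.urlopen(url).read().decode()`.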
Improvements to Online Updates • Create an “ad blocking community” • Users add filters to online database • If filters are good, user gains karma • Filters from users with more karma get preferred • Advertisers face thousands of users entering and sharing filters
Blocking HTML Page Sections • Allows blocking HTML elements containing other elements (as opposed to just <img> or <object> tags) • Path-style strings specify HTML elements • html:1/body:1/table:2 meaning “the second <table> tag in the first <body> in the first <html>” • Document Object Model (DOM) path • Paired with wildcard URLs to determine on which pages to block that HTML path • {“domainname.com/sessionid=*”, “html:1/body:1/table:2”}
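A DOM path of this form is easy to parse into steps. The sketch below is illustrative only; `PathStep` and `parse_dom_path` are hypothetical names, and indexes are taken to be 1-based as in the slide's example.

```python
from typing import NamedTuple


class PathStep(NamedTuple):
    tag: str    # element name, e.g. "table"
    index: int  # 1-based position among siblings with the same tag


def parse_dom_path(path: str) -> list[PathStep]:
    """Parse 'html:1/body:1/table:2' into a list of (tag, index) steps."""
    steps = []
    for part in path.split("/"):
        tag, _, index = part.partition(":")
        steps.append(PathStep(tag.lower(), int(index)))
    return steps


# A rule pairs a wildcarded page URL with the DOM path to block on that page.
rule = ("domainname.com/sessionid=*", parse_dom_path("html:1/body:1/table:2"))
print(rule[1][-1])  # PathStep(tag='table', index=2)
```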
Implementation • HTML document viewed as tree • If a webpage URL matches the wildcarded URL • Recur into DOM tree branch as specified by DOM path • Remove matching element
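The tree walk can be sketched with Python's `xml.etree.ElementTree` standing in for the browser DOM. This is an assumption made so the example is runnable; the actual plugin operates on Firefox's DOM from JavaScript. The function name and the wrapper element are hypothetical, and the steps are given as already-parsed (tag, index) pairs.

```python
import xml.etree.ElementTree as ET


def remove_by_path(node: ET.Element, steps: list[tuple[str, int]]) -> bool:
    """Recurse along the DOM path below `node` and remove the final element."""
    if not steps:
        return False
    (tag, index), rest = steps[0], steps[1:]
    same_tag = [child for child in node if child.tag == tag]
    if index < 1 or index > len(same_tag):
        return False                       # path does not exist on this page
    target = same_tag[index - 1]           # indexes are 1-based
    if not rest:
        node.remove(target)
        return True
    return remove_by_path(target, rest)


page = ET.fromstring(
    "<html><body><table id='content'/><table id='ads'/><p>text</p></body></html>"
)
document = ET.Element("document")          # wrapper so that html:1 has a parent
document.append(page)

removed = remove_by_path(document, [("html", 1), ("body", 1), ("table", 2)])
print(removed, [t.get("id") for t in page.iter("table")])  # True ['content']
```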
Possible Improvements • Command characters in DOM paths • # – Block element for all indexes • html:1/body:1/table:#/tr:1 means “block the first row of all tables in the body” • * – Insert arbitrary path • html:1/*/table:#/tr:1 means “block the first row of all tables in the document”
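Since these command characters are only proposed, the following is one possible interpretation rather than an existing implementation. It assumes `#` selects every same-tag sibling and `*` stands for zero or more intermediate levels; `select` is a hypothetical name, and `xml.etree.ElementTree` again stands in for the browser DOM.

```python
import xml.etree.ElementTree as ET


def select(node: ET.Element, steps: list[str]) -> list[ET.Element]:
    """Return all elements below `node` reached by a path with # and * steps."""
    if not steps:
        return [node]
    step, rest = steps[0], steps[1:]
    if step == "*":
        found = select(node, rest)         # * matches zero levels here ...
        for child in node:
            found += select(child, steps)  # ... or descends one more level
        return found
    tag, _, index = step.partition(":")
    same_tag = [child for child in node if child.tag == tag]
    if index == "#":
        chosen = same_tag                  # all indexes
    else:
        chosen = same_tag[int(index) - 1:int(index)]  # 1-based single index
    found = []
    for child in chosen:
        found += select(child, rest)
    return found


page = ET.fromstring(
    "<html><body>"
    "<table><tr id='a1'/><tr id='a2'/></table>"
    "<div><table><tr id='b1'/><tr id='b2'/></table></div>"
    "</body></html>"
)
document = ET.Element("document")          # wrapper so that html:1 has a parent
document.append(page)

everywhere = select(document, "html:1/*/table:#/tr:1".split("/"))
print([row.get("id") for row in everywhere])  # ['a1', 'b1']
in_body = select(document, "html:1/body:1/table:#/tr:1".split("/"))
print([row.get("id") for row in in_body])     # ['a1']
```

Under this reading, the second path only reaches tables that are direct children of the body, so the table nested in the `div` is found only by the `*` variant.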
Conclusion • “Two click” blocking simplifies ad blocking for non-technical users • Online updates make sharing filters easier • Blocking entire HTML page sections is expected to be powerful for fairly static pages
Thank You! • We thank the following groups for the support we have received: • The Mozilla Organization (www.mozilla.org) • The AdBlock Project (adblock.mozdev.org) • Dr. Dan Wallach, Scott Crosby and COMP 527