Cloak and Dagger
In a nutshell… • Cloaking • Cloaking in search engines • Search engines’ response to cloaking • Lifetime of cloaked search results • Cloaked pages in search results
Advertising is ubiquitous on the Internet. • Search, by and large, enjoys primacy. • Search Engine Optimization (SEO) – doctoring of search results. • For benign ends such as simplifying page content, optimizing load times, etc. • For malicious purposes such as manipulating page-ranking algorithms.
Cloaking • Conceals the true nature of a Web site • Keyword stuffing – associating benign content with keywords • Attracting traffic to scam pages • Protecting the Web servers from being exposed • Avoiding scamming those who arrive at the site via other keywords.
Types of Cloaking • Repeat Cloaking • User Agent Cloaking • Referrer Cloaking (sometimes also called “Click-through Cloaking”) • IP Cloaking
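The cloaking types above can be combined on the server side. The sketch below is a hypothetical illustration of such logic; the page strings, crawler list, and function names are all assumptions for illustration, not taken from any real cloaking kit.

```python
# Hypothetical server-side cloaking logic combining the types above.
# All names and values are illustrative assumptions.

SCAM_PAGE = "<html>scam content</html>"
BENIGN_PAGE = "<html>keyword-stuffed benign content</html>"

CRAWLER_AGENTS = ("googlebot", "bingbot")
SEARCH_REFERRERS = ("google.com/search", "bing.com/search")

def choose_page(user_agent, referrer, seen_ips, client_ip):
    ua = user_agent.lower()
    # User-agent cloaking: show the benign page to search-engine crawlers.
    if any(bot in ua for bot in CRAWLER_AGENTS):
        return BENIGN_PAGE
    # Referrer ("click-through") cloaking: only visitors arriving from a
    # search results page receive the scam content.
    if not any(r in (referrer or "") for r in SEARCH_REFERRERS):
        return BENIGN_PAGE
    # Repeat cloaking: serve the scam page only on a client's first visit.
    if client_ip in seen_ips:
        return BENIGN_PAGE
    seen_ips.add(client_ip)
    return SCAM_PAGE
```

IP cloaking would replace the user-agent string check with a lookup of the client address against known crawler IP ranges (e.g. via reverse DNS), which is harder for a measurement system to evade.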
DAGGER Dagger encompasses five functions – • Collection of search terms • Querying search engines with those terms • Crawling the search results • Detecting cloaking • Repeating the above four steps to study variance in the measurements
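The five stages above form a loop. A minimal sketch, assuming placeholder implementations for each stage (the function bodies below are illustrative stubs, not Dagger's actual code):

```python
# Sketch of Dagger's five-stage measurement loop with stub stages.

def collect_terms(sources):
    """1. Collect search terms from the configured sources."""
    return [term for src in sources for term in src]

def query_engines(engines, terms):
    """2. Query each engine with each term (stub: synthesize result URLs)."""
    return [f"{engine}://{term}" for engine in engines for term in terms]

def crawl(urls):
    """3. Crawl the result URLs (stub: fake page content per URL)."""
    return {url: f"page({url})" for url in urls}

def detect_cloaking(pages):
    """4. Flag cloaked pages (stub: keyword match stands in for comparison)."""
    return [url for url in pages if "cloak" in url]

def run_dagger(term_sources, engines, rounds):
    results = []
    for _ in range(rounds):                  # 5. repeat to study variance
        terms = collect_terms(term_sources)
        urls = query_engines(engines, terms)
        pages = crawl(urls)
        results.append(detect_cloaking(pages))
    return results
```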
Collection of Search Terms Two different kinds of cloaked search terms are targeted: • TYPE 1: Search terms containing popular words • Aimed at gathering high volumes of undifferentiated traffic • TYPE 2: Search terms reflecting highly targeted traffic • Here the cloaked content matches the cloaked search terms.
TYPE 1: Use popular trending search terms • Google Hot Searches – sheds light on search-engine-based data collection methods • Alexa – client-based data collection methods • Twitter terms clue us in on social-networking trends • The cloaked page is entirely unrelated to the trending search terms • TYPE 2: A set of terms catering to a specific domain • The content of the cloaked pages actually matches the search terms.
Querying Search Results • Terms collected in the previous step are fed to the search engines • Study the prevalence of cloaking across engines • Examine their response to cloaking • The top 100 search results and accompanying metadata are compiled into a list • "Known good" domain entries are eliminated to reduce false positives during data processing • Similar entries are grouped together with an appropriate 'count'.
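The filtering and grouping step above can be sketched as follows. The whitelist contents and the choice of grouping key (the result's host, with a leading `www.` stripped) are assumptions for illustration; the slides do not specify Dagger's exact rules.

```python
# Sketch of the "known good" filtering and count-based grouping step.
from collections import Counter
from urllib.parse import urlparse

# Illustrative whitelist of "known good" domains (assumption).
KNOWN_GOOD = {"wikipedia.org", "amazon.com"}

def filter_and_group(result_urls):
    """Drop whitelisted domains, then count how often each remaining
    domain appears among the top search results."""
    domains = (urlparse(u).netloc.removeprefix("www.") for u in result_urls)
    return Counter(d for d in domains if d not in KNOWN_GOOD)
```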
Crawling Search Results • Crawl the URLs • Process the fetched pages • Detect cloaking in parallel • Crawling in parallel helps minimize any possible time-of-day effects • Multiple crawls are performed.
Three views of each URL: a normal search user, the Googlebot Web crawler, and a user who does not click through the search result • The no-click-through view detects pure user-agent cloaking without any checks on the referrer • 35% of cloaked search results in a single measurement perform pure user-agent cloaking • Pages that employ both user-agent and referrer cloaking are nearly always malicious • IP cloaking – half of current cloaked search results do in fact employ IP cloaking via reverse DNS lookups.
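The three request "views" above differ only in their headers. A minimal sketch using `urllib`; the exact header strings are illustrative assumptions, not the values Dagger actually sent.

```python
# Build the three request profiles used to probe a URL for cloaking.
import urllib.request

def make_views(url, query):
    # View 1: a normal user who clicked through from a search results page.
    user = urllib.request.Request(url, headers={
        "User-Agent": "Mozilla/5.0",
        "Referer": f"https://www.google.com/search?q={query}",
    })
    # View 2: the Googlebot Web crawler.
    crawler = urllib.request.Request(url, headers={
        "User-Agent": "Googlebot/2.1 (+http://www.google.com/bot.html)",
    })
    # View 3: a user who did not click through (no Referer header),
    # isolating pure user-agent cloaking from referrer cloaking.
    no_click = urllib.request.Request(url, headers={
        "User-Agent": "Mozilla/5.0",
    })
    return user, crawler, no_click
```

Comparing responses pairwise then separates pure user-agent cloaking (views 1 and 3 agree, view 2 differs) from combined user-agent plus referrer cloaking (all three differ).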
Detecting Cloaking • Process the crawled data using multiple iterative passes • Various transformations and analyses are applied • This compiles the information needed to detect cloaking • Each pass uses a comparison-based approach: • Apply the same transformations to the views of the same URL as seen by the user and by the crawler • Directly compare the results of the transformation using a scoring function • Thresholding – detect pages that are actively cloaking and annotate them for later analysis.
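One comparison pass of this kind can be sketched as below: transform both views of a URL into word sets, score them, and threshold the score. The word-set transformation, Jaccard similarity, and threshold value are stand-in assumptions, not Dagger's actual scoring function.

```python
# Sketch of a comparison pass: transform, score, threshold.
import re

def words(html):
    """Transformation: reduce a page to its set of lowercase words."""
    return set(re.findall(r"[a-z]+", html.lower()))

def similarity(user_view, crawler_view):
    """Scoring function: Jaccard similarity of the two word sets."""
    a, b = words(user_view), words(crawler_view)
    return len(a & b) / len(a | b) if a | b else 1.0

def is_cloaked(user_view, crawler_view, threshold=0.3):
    """Thresholding: flag a URL when its two views are too dissimilar."""
    return similarity(user_view, crawler_view) < threshold
```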
Temporal Re-measurement • To study the lifetime of cloaked pages, Dagger includes a temporal component • Fetch search results from the search engines • Crawl and process the URLs again at later points in time • Measure the rate at which search engines respond to cloaking • Measure the duration for which pages remain cloaked.
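The lifetime measurement above amounts to re-checking each flagged URL at later epochs and recording when it stops serving cloaked content. A simplified sketch, with epoch indices standing in for real timestamps (an assumption for brevity):

```python
# Sketch of deriving a cloaking lifetime from repeated re-measurements.

def cloaking_lifetime(snapshots):
    """snapshots: one boolean per re-measurement epoch, True while the
    URL is still serving cloaked content. Returns the number of epochs
    the page remained cloaked before the first clean observation."""
    lifetime = 0
    for still_cloaked in snapshots:
        if not still_cloaked:
            break
        lifetime += 1
    return lifetime
```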
Cloaking Over Time • In trending searches the terms constantly change • Cloakers target many more search terms and a broad demographic of potential victims • Pharmaceutical search terms are static • They represent product searches in a very specific domain • Cloakers have much more time to perform SEO to raise the rank of their cloaked pages • This results in more cloaked pages in the top results.
Sources of Search Terms • Blackhat SEO – artificially boost the rankings of cloaked pages • Search engines detect cloaking either directly (analyzing pages) or indirectly (updating the ranking algorithm) • Augmenting popular search terms with suggestions • Enables targeting the same semantic topic as popular search terms • Cloaking in search results is highly influenced by the search terms.
Search Engine Response • Search engines try to identify and thwart cloaking • Cloaked pages do regularly appear in search results • Many are removed or suppressed by the search engines within hours to a day • Cloaked search results rapidly begin to fall out of the top 100 within the first day, with a more gradual drop thereafter.
Cloaking Duration • Cloakers manage their pages similarly, independent of the search engine • Pages are cloaked for long durations: over 80% remain cloaked past seven days • Cloakers will want to maximize the time during which they reap the benefits of cloaking, by attracting customers to scam sites or victims to malware sites • It is difficult to recycle a cloaked page for reuse at a later time.
Cloaked Content • Redirection of users through a chain of advertising networks • About half of the time, a cloaked search result leads to some form of abuse • Long-term SEO campaigns constantly change the search terms they target and the hosts they use.
Domain Infrastructure • Key resources for effectively deploying cloaking in a scam: • Access to Web sites • Access to domains • For TYPE 1 terms, the majority of cloaked search results are in .com • For TYPE 2 terms, cloakers use the "reputation" of pages to boost their ranking in search results.
Search Engine Optimization • Since a major motivation for cloaking is to attract user traffic, we can extrapolate SEO performance from the search result positions the cloaked pages occupy • Cloaking on TYPE 1 terms targets popular terms that are very dynamic, with limited time and heavy competition for performing SEO on those search terms • Cloaking on TYPE 2 terms is a highly focused task on a static set of terms • This provides much longer time frames for performing SEO on cloaked pages for those terms.
Conclusion • Cloaking has become a standard tool in the scammer's toolbox • Cloaking adds significant complexity to differentiating legitimate Web content from fraudulent pages • The majority of cloaked search results remain high in the rankings for 12 hours • The pages themselves can persist far longer • Search engine providers will need to further reduce the lifetime of cloaked results to demonetize the underlying scam activity.