Spam and spambots make Google Analytics data more difficult to analyze. Different methods can be used to operate spambots on various mediums. On different forums & communities, these bots are programmed to interact with users as if they were humans. Learn how to get rid of spambots and other junk traffic.
Spambots and Other Junk Traffic - What is it and how to Get Rid of it
What is a Spambot
You cannot deny the importance of Google Analytics for understanding and measuring your users’ behavior. Millions of people around the globe use it for good reason. Clean data is an important factor in decision-making for many businesses, yet in my experience many sites of all sizes still forgo data filtering after installing the tracking code. Since around 2013, referral spammers have been injecting data into Google Analytics (GA) without ever actually visiting the websites they target. Admins usually see referral spam as a fake traffic referral, a search term, or even a direct visit. Spammers hijack the referrer displayed in your GA referral traffic so that it points to a website they want to promote. Referral spam is unlikely to harm your site, since it doesn’t actually trigger a real visit (provided you don’t click on the spam links).
To make sense of Google Analytics data, marketers must filter out this type of traffic manually. Our major ongoing marketing decisions are based on GA, so clean data is of the utmost importance. Marketers who do not know about referral spam and how to filter it may draw inaccurate conclusions from bogus bot traffic. The purpose of this column is to teach marketers how to filter referral spam from Google Analytics data. A Google Analytics property without filtered data is like one of those display cakes made of styrofoam with only a few edible parts. It looks real at first glance, and it might even feel right when you cut a slice, but the deeper you go the more you find that is artificial. Most people either don’t pay attention to the real user data in Google Analytics or haven’t configured their property correctly. If you only look at the summary reports, you might not notice all the bogus data mixed in with your real site visitors. As a result, you won’t realize that you are spending your time analyzing data that isn’t representative of your site’s performance.
How do spambots operate
Spambots can be operated in different ways on different mediums. By creating additional accounts on various sites, they can post irrelevant information in social groups, forums, and communities. On forums and communities, these bots are programmed to interact with users as if they were human.
How do Spambots do Multiple Signups?
Signup forms have only a few fields, and any hacker can write a script that makes a bot fill them in. In this way, bots complete a large number of bogus signups, resulting in a flood of spam accounts. These signups also get in the way of genuine users and raise the likelihood of a higher bounce rate on the signup form.
Types of spambot
Based on the kind of activity, spambots and other junk traffic can be of different types. Some scrape data, some spam the comment sections of websites, and some send unwanted messages by email.
● Email spam: They crawl web pages and collect email addresses based on patterns, such as surname.name@domain.name. A database of email addresses is created once the data is harvested by scraping.
● Comment spam: Spam is a form of automated posting usually found in open forums. Fake comments are typically created with the intent of selling a product or generating links to increase traffic.
● Social media bots: Facebook, Twitter, and Instagram are the sites where most bots are active. Offers, deals, and products are generally posted by these bots. The post will be liked, shared, and commented on even though it has no relevance to the posts it is connected to. Alternatively, a real user’s account can be compromised by a fake account, which will then appear to be a legitimate account. Twitter bots usually follow a set of rules that tell them to tweet, like, and retweet posts.
How to detect spambots
What matters is identifying these bad bots and avoiding being influenced by them. Bots often mimic human behavior in order to disguise their traffic as coming from real people, but there are many ways to detect them. Some methods of bot detection are relatively simple and require little technical knowledge; with these you can easily check if and when bots visit your website. Other methods are more difficult to implement, as they require more technical expertise to analyze the data and apply the fixes accordingly. Having said that, here are some of the best ways to detect bot traffic on your website.
● Direct Traffic Sources
● Reducing Server Performance
● Speed of your Website
● Faster Browsing Speed
● Spike in Traffic from Unexpected Location
● Passive Fingerprinting
● Active Fingerprinting
● Junk User Information
● Content Scraping
● Inconsistent Page Views
● Increasing Bounce Rate
Best Practices, tools and techniques to get rid of spambots
A. Blocking Comment Spam
B. Time-analysis of forms
C. Geolocation-based form blocking
D. Blacklisting IPs
E. Web Application Firewalls
F. ReCAPTCHA
G. Confirmed or Double Opt-In
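As one illustration of item B, time-analysis of forms is usually understood as comparing the moment a form was rendered with the moment it was submitted, since scripts tend to fill in fields far faster than people do. The sketch below assumes that reading; the function name and the 3-second threshold are hypothetical and would need tuning against real submissions.

```python
# Assumed threshold: humans rarely complete a signup form in under 3 seconds.
MIN_FILL_SECONDS = 3.0


def is_suspiciously_fast(rendered_at: float, submitted_at: float,
                         min_seconds: float = MIN_FILL_SECONDS) -> bool:
    """Return True when the form came back faster than a person could type.

    Both arguments are Unix timestamps in seconds, e.g. values taken from
    time.time() when the form is served and when the submission arrives.
    """
    return (submitted_at - rendered_at) < min_seconds


# Hypothetical timestamps: one submission after 0.4 s, one after 12 s.
print(is_suspiciously_fast(100.0, 100.4))
print(is_suspiciously_fast(100.0, 112.0))
```

A check like this is a signal, not proof: it is best combined with the other techniques in the list rather than used alone to reject signups.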
Filter spam & bots in your Google Analytics Traffic
● Spam: in which reports can you look for spam?
● Bot traffic: in which reports can you look for bot traffic? This is discussed in more detail below.
● Development/staging environments: which reports can you look at for development and staging environments?
● Sites and services that archive and cache web content: in which reports can you look for traffic from web archive sites and cache services?
● Internal traffic
● Direct internal traffic: reports to look at for direct internal traffic.
● Sites/tools provided by third parties: what are the recommended reports to view traffic from internal third-party tools?
Google Analytics Filters that you can use to Filter Traffic
a. When Using Filters, Consider These Factors
● The changes made by filters are permanent! Be patient.
● Generate a view without filters.
● Verify that the permissions are set correctly.
● Retroactive application of filters is impossible.
b. Types of Filters
Default (predefined) filters and custom filters are the two main types. Default filters are rarely used because they are limited. Custom filters support regular expressions, which makes them much more flexible. In the custom filters, you can select between five categories: exclude, include, lowercase/uppercase, search and replace, and advanced.
c. Test your Filters
Because filter changes are permanent, make sure your filters and REGEX are correct before applying them. They can be tested in three different ways:
● Immediately after you have configured the filter, click on “Verify this filter.” It is quick and easy. However, it works on a small sample of data, so it is not the most accurate option.
● Online REGEX testers are a good option since they are very accurate and colourful. They also help you learn, since they display every matching part and explain why it matches.
● You can test your filter by using an in-table temporary filter in GA; the filter will be applied to your entire historical data set. You surely will not miss anything by following this method.
The built-in filter verification is fine for a simple filter or if you have experience with filters. To be certain your REGEX is correct, my recommendation is to build it in an online tester and then retest it with the in-table filter.
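A REGEX can also be checked offline before it is pasted into a filter. The sketch below uses Python’s `re` module as a stand-in for an online tester; it assumes GA-style partial, case-insensitive matching, and the pattern and sample values are hypothetical. Python’s regex dialect is not identical to the one GA uses, so treat this as a first check and not a replacement for the in-table test.

```python
import re

# Hypothetical filter pattern: two made-up spam domains, dots escaped.
FILTER_PATTERN = r"spam-site\.example|free-traffic\.example"


def matching_values(pattern: str, values: list[str]) -> list[str]:
    """Return the values the pattern matches anywhere in the string."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return [value for value in values if compiled.search(value)]


# Made-up sample sources, including one near miss ("X" where the dot should be).
samples = [
    "google",
    "spam-site.example",
    "news.free-traffic.example",
    "spam-siteXexample",
]

for value in matching_values(FILTER_PATTERN, samples):
    print("would be matched:", value)
```

The near miss shows why escaping the dot matters: an unescaped `.` matches any character, so the pattern would also have caught `spam-siteXexample`.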
d. How to Create Filters
To avoid being repetitive, I will describe the standard steps for creating a filter once here:
1. Visit the administration section of your Google Analytics account (the gear icon), and then go to the configuration section.
2. Under the View column (master view), click “Filters”. (“All Filters” in the Account column is a separate entry that lists the filters for the whole account.)
3. To add a filter, click the red button “+Add Filter”. If you don’t see it, or can only apply/remove existing filters, then you don’t have edit permissions at the account level. Ask your admin to set this up for you.
4. Afterwards, configure each filter according to its specific settings.
It’s highly recommended that you get familiar with the filter window so that you can improve Analytics data quality.
Hostname filter: a preventative measure
This filter covers:
● Spam that appears as ghosts
● Hostnames for development
● Sites that scrape data
● Sites that serve as caches and archives
Spam can be effectively blocked by this filter. Unlike other commonly shared solutions, the hostname filter is preventative and rarely needs to be updated. The term “ghost spam” refers to spam that never actually visits your site. It is sent directly to Google Analytics servers through the Measurement Protocol, a feature that in normal circumstances permits tracking from often-forgotten devices, like coffee makers or refrigerators. Data from real users is collected on your site before GA receives it, so the information is valid. Ghost spam is sent straight to GA servers without knowing your site URL, so all the data it leaves behind is fake.
Hostnames and how to find them
We’re getting to the “tricky” part now. You must compile a list of your valid hostnames in order to create this filter. A hostname is basically any place where your tracking code is present. You can find yours in the hostname report: go to Audience > Technology > Network and change the primary dimension at the top of the table to Hostname. Your domain name should appear at least once if your Analytics is active. You may find more than one hostname, in which case take a look at each one and select those that belong to you.
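Once the list of valid hostnames is compiled, it has to be turned into a single expression for the include filter. The following is a minimal sketch of that step, assuming partial, case-insensitive matching; the hostnames are placeholders and the helper names are hypothetical.

```python
import re

# Placeholder hostnames; replace with the ones found in your own report.
VALID_HOSTNAMES = ["example.com", "shop.example.com", "checkout.payments.test"]


def build_hostname_pattern(hostnames: list[str]) -> str:
    """Join hostnames into one alternation, escaping dots and other specials."""
    unique = sorted({name.strip().lower() for name in hostnames if name.strip()})
    return "|".join(re.escape(name) for name in unique)


def is_valid_hostname(hostname: str, pattern: str) -> bool:
    """Mimic an include filter: keep the hit only if the pattern matches."""
    return re.search(pattern, hostname, re.IGNORECASE) is not None


pattern = build_hostname_pattern(VALID_HOSTNAMES)
print(pattern)

# "(not set)" and unknown hosts are typical of ghost spam in this sketch.
for host in ["www.example.com", "(not set)", "spammy-host.invalid"]:
    print(host, "->", "keep" if is_valid_hostname(host, pattern) else "drop")
```

Because matching is partial, `example.com` also covers subdomains such as `www.example.com`, which keeps the expression short.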
Campaign source filter (crawler spam, internal tools)
The following types of traffic are blocked:
● Crawler spam
● Toolkits (Trello, Pingdom, Asana) used internally
Even though these hits appear as referrals, you should use the field “Campaign source” in the filter, not the field “Referral”.
Spam filter for crawlers
Crawler spam is the second most common type of spam. Like ghost spam, crawlers leave a fake URL, but unlike ghost spam they actually visit your site. As a result, they leave a valid hostname. The expression is built in a similar way to the hostname filter, but this time you enter the source/URL of the spammy traffic. Unlike include filters, you can create as many exclude filters as you need.
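The effect of several exclude filters on the campaign source can be simulated on a handful of hits. This sketch keeps one pattern for crawler spam and one for internal tools; the spam domains and the sample hits are made up, while Trello, Pingdom, and Asana are the tools named above. Matching is assumed to be partial and case-insensitive.

```python
import re

# One exclude filter per purpose, so each can be updated separately.
EXCLUDE_FILTERS = {
    "crawler spam": r"spam-crawler\.example|seo-offers\.example",  # hypothetical
    "internal tools": r"trello|pingdom|asana",
}


def apply_exclude_filters(hits: list[dict], filters: dict[str, str]) -> list[dict]:
    """Drop every hit whose campaign source matches any exclude pattern."""
    kept = []
    for hit in hits:
        source = hit["source"]
        if any(re.search(p, source, re.IGNORECASE) for p in filters.values()):
            continue  # excluded by at least one filter
        kept.append(hit)
    return kept


# Made-up hits with a campaign source each.
hits = [
    {"source": "google", "sessions": 120},
    {"source": "spam-crawler.example", "sessions": 45},
    {"source": "trello.com", "sessions": 8},
    {"source": "newsletter", "sessions": 30},
]

for hit in apply_exclude_filters(hits, EXCLUDE_FILTERS):
    print(hit["source"], hit["sessions"])
```

Adding a newly spotted spam source then only means extending the first pattern, leaving the internal tools pattern untouched.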
Filter for internal third-party tools
Keeping the crawler spam filter and the internal third-party tools filter separate is just my preference, since it makes them easier to organize and to find when they need updating.
Language spam filters as well as other types of spam filters
Most spam will be stopped by the first two filters, but spammers may resort to other methods to get around them. For example, they may use a reputable source such as Apple or Google together with one of your valid hostnames. Spammers have targeted my own site this way (not that everyone knows my site; it just appears they do not agree with it). It is not uncommon for spammers to inject their messages into page titles, keywords, and even the language field of the report, even when the hostname and source look fine.
Filters for direct bot traffic
Unlike spam, bot traffic leaves no obvious trace, so it’s a bit more difficult to filter, but you can still do it if you’re patient. Activate bot filtering as soon as possible; in my opinion, it should be activated by default. Go to the view settings in the Analytics administration section. Next to the currency selector, you’ll see the option “Exclude all hits from known bots and spiders.”
Bot activity signs
Even though bots are difficult to detect, there are some signs that give them away:
● Increased direct traffic without a natural cause
● Older versions of software (browsers, OS, Flash)
● They only visit the homepage (represented by the slash “/” in GA)
● Excessive metrics:
● Nearly 100% bounce rate
● Almost no time on site per session
● Only one page per session
● 100% new users
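The signs listed above can be turned into a rough checklist for a traffic segment exported from GA. The sketch below is only a heuristic under assumed thresholds; the `SegmentStats` fields, the cut-off values, and the sample segments are all hypothetical and not part of any GA API.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    """Aggregated metrics for one traffic segment (hypothetical structure)."""
    name: str
    bounce_rate: float          # fraction between 0.0 and 1.0
    avg_session_seconds: float
    pages_per_session: float
    new_users_share: float      # fraction between 0.0 and 1.0
    homepage_only: bool         # True if every landing page is "/"


def bot_signals(stats: SegmentStats) -> list[str]:
    """Collect the bot activity signs this segment shows (assumed cut-offs)."""
    signals = []
    if stats.bounce_rate >= 0.95:
        signals.append("bounce rate near 100%")
    if stats.avg_session_seconds <= 1.0:
        signals.append("almost no session time")
    if stats.pages_per_session <= 1.05:
        signals.append("one page per session")
    if stats.new_users_share >= 0.99:
        signals.append("100% new users")
    if stats.homepage_only:
        signals.append("homepage only")
    return signals


def looks_like_bot(stats: SegmentStats, min_signals: int = 4) -> bool:
    """Flag a segment only when several signs appear together."""
    return len(bot_signals(stats)) >= min_signals


# Made-up segments for illustration.
organic = SegmentStats("organic / Chrome", 0.45, 95.0, 2.8, 0.60, False)
suspect = SegmentStats("direct / old browser", 0.99, 0.0, 1.0, 1.0, True)

for segment in (organic, suspect):
    print(segment.name, looks_like_bot(segment), bot_signals(segment))
```

Requiring several signs at once reduces the chance of flagging a legitimate segment that merely has a high bounce rate.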
Internal IP filtering
We have already covered two types of internal traffic: test sites (with the hostname filter) and third-party tools (with the campaign source filter). The last and most damaging type is the traffic generated directly by you or someone on your team while working on the site. By adding a filter, you can exclude the public IP address (not the private one) of every location used to work on the site.
Filtering internal URL queries
IP filtering has a limit: there is no way to exclude team members when they travel, access the site from a personal location, or use a mobile network while at the company. That is the gap a URL query filter is meant to close.
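Before setting up the IP filter, it can help to check which addresses the list of office locations actually covers. This sketch uses Python’s standard `ipaddress` module; the networks are reserved documentation ranges standing in for real public office IPs, and the names are hypothetical.

```python
import ipaddress

# Placeholder public ranges (documentation addresses), one entry per location.
OFFICE_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),    # hypothetical head office range
    ipaddress.ip_network("198.51.100.17/32"),  # hypothetical single remote office IP
]


def is_internal(ip: str) -> bool:
    """Return True if the address belongs to one of the listed office networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in OFFICE_NETWORKS)


# Made-up visitor addresses.
for visitor_ip in ["203.0.113.45", "198.51.100.17", "198.51.100.18"]:
    print(visitor_ip, "internal" if is_internal(visitor_ip) else "external")
```

The third address differs from the remote office by a single digit and is treated as external, which is the same precision an IP-based exclude filter would need.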
Query filtering for advanced internal URLs
Internal traffic filtering has never been easier than with this solution! It is more comprehensive and uses Google Tag Manager, cookies, and a GA custom dimension to filter internal traffic dynamically.
Bonus filter: Include traffic only coming from within the company
There are times when it’s useful to know what traffic employees generate internally, either as part of a marketing effort or out of curiosity. To handle that situation, you could create a new view called “Internal Traffic Only” and apply one of the internal filters to it as an include filter. Use just one! A hit must match every include filter in a view to be counted.
Conclusion
Google Analytics can only report properly when it has real and accurate data. The Internet is full of junk, and your reports fill up with artificial information if the data is not filtered properly. Even worse, if you don’t realize the data in your reports is bogus, you’ll likely make poor or incorrect decisions about the direction of your site or business.