There has been quite a bit of chatter on the Food Blogger Pro Forum lately regarding Google Analytics and traffic spam. In fact, if I search “spam” in FBP I see 9 posts just in the month of May regarding spam referrals in Google Analytics. That’s a ton!
Because of the general concern regarding spam traffic in Google Analytics, I thought it would be helpful to do some digging and find out more about this spam traffic, what it means for your blog, and what you can do about it.
What is spam in Google Analytics?
Referrer spam, or just spam, in Google Analytics shows up in your “Referrals” and “Pageviews,” depending on the type of spam. These aren’t actually visitors to your site and can be spotted by their suspicious URLs. You’ll probably recognize some of the names below:
buttons-for-your-website.com, непереводимая.рф, sanjosestartups.com, savetubevideo.com, websites-reviews.com, darodar.com, buy-cheap-online.info, pornhub-forum.ga
Analytics spam can rear its ugly head in two different ways: Ghost Referrals or Crawler Referrals. Let’s get to know these guys.
Ghost Referral Spam
Ghost Referral Spam is the product of a loophole in the way Google Analytics functions. This loophole allows web developers to send data to your Google Analytics account without ever coming into contact with your site. While this may seem like something Google would patch up, it’s actually there for a reason.
This Measurement Protocol, as it is called, is there to allow developers to capture data about user activity in new web environments and when they are offline. These things aren’t too important for most food blogs, but they are important for other large online businesses.
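For the curious, here is a rough sketch of what a Measurement Protocol hit looks like under the hood. It only builds the text of a hit and never sends anything. The helper name, the placeholder tracking ID, and the example domains are all made up for illustration; the field names (v, tid, cid, t, dh, dp, dr) are the standard Universal Analytics parameters.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical helper: encodes one pageview hit as a Measurement Protocol
# payload string. Nothing is sent anywhere; we only build the text.
def build_pageview_payload(tracking_id, hostname, page, referrer):
    fields = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # tracking ID the hit gets credited to
        "cid": "555",        # anonymous client ID (made up for this sketch)
        "t": "pageview",     # hit type
        "dh": hostname,      # hostname, as claimed by whoever builds the hit
        "dp": page,          # page path
        "dr": referrer,      # referrer that would show up in reports
    }
    return urlencode(fields)

payload = build_pageview_payload(
    "UA-XXXXX-Y", "some-random-host.example", "/", "http://spammy-site.example/"
)
decoded = parse_qs(payload)
print(decoded["dh"][0])  # the hostname is just another field in the payload
```

The thing to notice is that the hostname is whatever the sender types in. Your real tracking snippet fills it in with your actual domain, while a sender who has never seen your site has no way of knowing what it should be.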
Below are some examples of Ghost Referral Spam that you might find in your Analytics:
непереводимая.рф, sanjosestartups.com, websites-reviews.com, darodar.com, buy-cheap-online.info, pornhub-forum.ga
Because these ghost referrers never actually access your site, they cannot be blocked using the .htaccess file on your website, which bans specific domains from sending traffic to your site. Instead, they must be kept out of your Analytics with a filter that only lets in hits from valid hostnames. I know, this sounds confusing. Don’t worry, I’ll explain in a minute.
Crawler Referral Spam
Crawler referral spam, on the other hand, does physically visit your site and pings your Analytics just like any other visitor would. Unfortunately, these bots, as they are called, ignore any rules you lay down in the robots.txt file, which is supposed to stop any unwanted bot activity.
Sometimes these bots can visit your website time after time after time for a short while, which can give you unrealistic peaks and valleys in your traffic.
Below are some examples of Crawler Referral Spam that you might find in your Analytics:
semalt.com, best-seo-solution.com, maridan.com.ua, blog.ranksonic.com
Because these crawlers do physically contact your site, you can eliminate them by blocking them in your .htaccess file. This file can block traffic from specific domains before they ever reach your site. Alternatively, you can filter them out of your Google Analytics by excluding specific referral sources. This is the approach we are going to take today.
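If you are curious what the .htaccess route might look like, here is a small sketch that generates the blocking rules from a list of domains. It assumes your site runs on Apache with mod_rewrite enabled, and the function name is my own invention. Treat the output as a starting point to review with your host, not something to paste in blindly.

```python
def htaccess_block_rules(domains):
    """Return mod_rewrite lines that answer 403 Forbidden to listed referrers."""
    if not domains:
        # Without any conditions the rule would block every visitor,
        # so emit nothing at all for an empty list.
        return ""
    lines = ["RewriteEngine On"]
    for i, domain in enumerate(domains):
        escaped = domain.replace(".", r"\.")  # match a literal dot
        # Every condition except the last is chained with OR.
        flags = "[NC]" if i == len(domains) - 1 else "[NC,OR]"
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {escaped} {flags}")
    lines.append("RewriteRule .* - [F]")  # F = respond with 403 Forbidden
    return "\n".join(lines)

print(htaccess_block_rules(["semalt.com", "best-seo-solution.com"]))
```

Each RewriteCond checks the referrer the visitor claims to come from, and the final RewriteRule refuses the request when any of them match.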
Why are they doing this? What’s the point?
Stellar question! We know that most of them aren’t ever actually visiting your site, so they’re not gathering data or anything. And what does a bot have to gain by going to your site repeatedly, over and over again, just to find the same information that was there the first time they visited?
The answer is pretty simple – they just want to show up in your Analytics. That’s it. They just want you to see their name. Handy Dandy Google makes referrals in your Analytics clickable links, so when they show up in your stats you can click right on over to see what all this nonsense is about. And the minute you click over, you’re recorded as a visitor on their site. We all like visitors. We love them! So do spammers, but instead of working hard to find visitors, they go through the back door.
A very important thing to note is that you should never click these links unless you are positively sure the websites are legit. Otherwise, you may end up on a page you definitely don’t want to see or, worse, with a virus on your computer.
Does the spam traffic affect my site negatively?
The good news is that this spam traffic doesn’t have much of a negative effect on your site. The Ghost Referrals never actually go to your site, and the bots are just there momentarily.
What the spam does have a negative impact on is your statistics. On any given day it can seem like you’ve had more traffic than you’ve actually had. If you are digging in deep to your Analytics to do some trending on visitors to see which posts worked best, how they interact with your content, and how you best engage them, spam traffic can really cloud your view.
One thing that crawler spam usually does affect is bounce rate. Crawler bots always have a 100% bounce rate, which can make your bounce rate overall look higher than it actually is. This is fixed when you set up your filters (see resources below), but this raises the question – does your bounce rate affect your rankings in Google? Well, Google released a short video on this, and in a word: no. Google does not take any part of your site Analytics into account when they do their rankings.
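To put some numbers on the bounce rate effect, here is a quick back-of-the-envelope calculation. The session counts and the 60% bounce rate are invented for the example; the only thing taken from above is that crawler sessions bounce 100% of the time.

```python
def overall_bounce_rate(real_sessions, real_bounce_rate, spam_sessions):
    """Blend real traffic with crawler sessions that bounce 100% of the time."""
    total = real_sessions + spam_sessions
    if total == 0:
        return 0.0  # no sessions at all, nothing to measure
    bounces = real_sessions * real_bounce_rate + spam_sessions * 1.0
    return bounces / total

clean = overall_bounce_rate(1000, 0.60, 0)       # real visitors only
inflated = overall_bounce_rate(1000, 0.60, 200)  # plus 200 crawler sessions
print(f"without spam: {clean:.1%}, with spam: {inflated:.1%}")
```

With these made-up numbers, 200 crawler sessions on top of 1,000 real ones push a 60% bounce rate up to roughly 67%, even though your real readers haven't changed their behavior at all.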
Does the spam traffic affect my site positively?
Unless you like seeing the higher number of pageviews your site gets due to Crawler Referral Spam, then no, the spam traffic does not affect your site or your Analytics positively at all.
How can I get rid of the spam traffic to my site?
I’m glad you asked! This is something I’ve been meaning to do with my personal blog for a while, so I jumped on this opportunity to learn by doing! In a nutshell, here’s what happens:
- Create a new “View” in your Analytics profile.
- Find out which hostnames are valid and which aren’t.
- Create a filter to include only valid hostnames.
- Create a filter to exclude referrals from known spam bots.
- Tell Google to exclude any known well-behaved bots.
It’s pretty darn easy! Are you ready? Let’s go.
The first step is to create a new view in your Google Analytics profile. If we were to simply create the filters on your regular view then you could potentially filter out some actual traffic. So, you want to keep your original view intact and unfiltered, but create a new view with filtered results. Here’s how it goes:
Log into your Google Analytics profile, then select Admin. The third column is labeled “View” and has a dropdown menu. Select this dropdown, then click “Create new view” at the bottom.
Select website, then fill out a name, select your time zone, and hit “Create View.” Good work!
Next, click “Home” on the top left. You should see your new view down below; go ahead and select that one.
What the heck? There’s no data! Yep, what you’re seeing is correct. That’s because a new view only collects (and filters) data from the moment it’s created, so your old data still lives, unfiltered, in your original view. You can filter that existing data by using an Advanced Segment, but we’re not doing that today. We’re thinking about the future! Onward…
Now we need to find out what our valid hostnames are. We use a filter for valid hostnames to block all Ghost Referrals that ping our Analytics. A hostname filter works because the ghosts aren’t targeting you specifically: they fire hits at randomly generated tracking codes, and one of those happens to be yours. This means that they don’t know who you are in any way, shape, or form. This is good! It also means that they don’t know what your website hostname is, so the hostname on their hits will be something random (like a link back to their site) or will show up as “(not set).”
To find your hostnames, open up your original Analytics view by clicking Home, then selecting your original view.
When you are inside your original view, select the longest date range you can – back to the beginning of the life of your blog would be best! Then, click “Audience” from the left-hand menu, then “Technology” and “Network.” Finally, select “Hostname” to display the hostnames.
Take note of which hostnames are valid for your site. These can be tricky, so be careful. Google.com is not a valid hostname, but googleusercontent.com is valid. Any place that you have entered your tracking code into (think PayPal, YouTube, LeadPages, etc.) should show up there and is a valid hostname.
Alright! Time to create some filters. Navigate back to the Admin area then select your new view (the one you want to filter) and select “Filters.” Click to add a new filter.
Fill out a name for your filter, then select “Custom.” Select “Include” (since we are filtering to include only our valid hostname) then click “Hostname” under “Filter Field.” Next, you get to fill out your filter pattern.
The filter pattern is pretty specific, so take note here. First, identify which hostnames are valid from the hostname search you did in your original view. These get typed into the “Filter Pattern” field in your new filter, each with a prefix of .* in front of it. The prefix ensures that any subdomains (e.g. translate.googleusercontent.com) are captured. Each URL is followed by a bar symbol | unless it is the last URL. The last URL does not have a bar | after it. Finally, there are no spaces in the filter pattern at all. None, zip, zilch.
Here’s what my filter pattern looked like:
Now, save your filter! It is important to note that this filter must be updated each and every time you enter your tracking ID into a new app, software, or tool. If you don’t add the new web address, the traffic you get at those new locations won’t be captured.
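If you’d like to sanity-check a pattern before pasting it in, here is a minimal sketch that builds one from a list of hostnames using the rules above and tries it against a few sample hostnames. The hostname myfoodblog.example is a stand-in, not my real pattern, and the sketch assumes the filter does a partial, case-insensitive regex match.

```python
import re

def build_hostname_pattern(valid_hostnames):
    """Join hostnames as .*host1|.*host2 with no spaces, per the rules above."""
    return "|".join(".*" + host.strip() for host in valid_hostnames)

def passes_include_filter(pattern, hostname):
    # Assumption: the filter does a partial, case-insensitive regex match.
    return re.search(pattern, hostname, re.IGNORECASE) is not None

# "myfoodblog.example" is a placeholder; swap in your own valid hostnames.
pattern = build_hostname_pattern(["myfoodblog.example", "googleusercontent.com"])
print(pattern)

for hostname in ["www.myfoodblog.example", "translate.googleusercontent.com",
                 "darodar.com", "(not set)"]:
    verdict = "kept" if passes_include_filter(pattern, hostname) else "dropped"
    print(hostname, "->", verdict)
```

One design note: the dots in this pattern are plain regex dots, so they technically match any character. That is fine for a quick include filter, but you could escape them as \. if you want the match to be stricter.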
Okay, we’re halfway there. Now we need to filter out any Crawler Referrals that did actually visit your website. Because they actually went to your website and visited a page, this means that they will use your valid hostname – and so won’t be filtered out using the hostname filter we just implemented.
Create a new filter like we just did for the hostnames, but this time fill it out for the Crawler Referrals. Select a custom filter, then exclude by campaign source. Add any spam referrals you have seen. Some of the most common ones are:
I just went ahead and copied that directly into my filter. If you have other spammy referrals you’d like to add, feel free. These crawlers change frequently, so whenever you notice a new one pop up in your Analytics, be sure to add it to the filter.
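Since this list keeps growing, it can be handy to keep the domains in one place and generate the pattern from them. Here is a rough sketch using the crawler examples from earlier in this post. The function names are mine, and the matching behavior (partial, case-insensitive) is an assumption about how the exclude filter treats the pattern.

```python
import re

# Crawler referrers mentioned earlier in this post; add new ones as they appear.
CRAWLER_SOURCES = [
    "semalt.com",
    "best-seo-solution.com",
    "maridan.com.ua",
    "blog.ranksonic.com",
]

def build_exclude_pattern(sources):
    """Escape the dots and join the domains with bars, no spaces."""
    return "|".join(source.replace(".", r"\.") for source in sources)

def keep_referral(pattern, campaign_source):
    # Assumption: a hit is excluded when the pattern matches anywhere
    # in its campaign source, ignoring case.
    return re.search(pattern, campaign_source, re.IGNORECASE) is None

pattern = build_exclude_pattern(CRAWLER_SOURCES)
print(pattern)
print(keep_referral(pattern, "pinterest.com"))      # a real referrer
print(keep_referral(pattern, "semalt.semalt.com"))  # a crawler subdomain
```

Because the match is partial, subdomains of a listed crawler get caught too, so you don’t need a separate entry for each variation.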
The last thing we need to do is to remove any traffic from well-behaved bots, like the Google bot. These bots are good because they index your site and make sure you show up in search results. However, since they aren’t very meaningful traffic for our Analytics, we’d like to keep them from showing up in our stats. Fortunately, this is super easy!
Head over to your Admin section of Google Analytics, then select your new filtered view. Below, click “View Settings.”
If you scroll down a bit, you’ll see a check box to “Exclude all hits from known bots and spiders.” Check that guy, then save your settings.
And we’re done! Congratulations, you now have a fancy new view for your Google Analytics that excludes any traffic from:
- ghost referral pings that go directly to your Analytics profile and don’t go through your hostname
- crawler referral spam that actually does visit your website
- well-behaved bots that don’t provide any meaningful information
Remember that you still have your intact original view, which will show all of the traffic above as well as your legitimate traffic. It is a good idea to keep an eye on both of these to make sure you aren’t excluding any real traffic from your filtered view.
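Here is one way you might think about that comparison, sketched as a tiny simulation. The hits, the patterns, and the known_bot flag are all invented for illustration (Google decides what counts as a known bot, not you). The idea is simply to list which hostnames the filtered view is dropping, so a legitimate one you forgot stands out.

```python
import re
from collections import Counter

# Placeholder patterns; yours will contain your own hostnames and spam sources.
HOSTNAME_INCLUDE = r".*myfoodblog.example|.*googleusercontent.com"
SOURCE_EXCLUDE = r"semalt\.com|best-seo-solution\.com"

def passes_filters(hit):
    """Apply the three rules from this tutorial to one simulated hit."""
    if not re.search(HOSTNAME_INCLUDE, hit["hostname"], re.IGNORECASE):
        return False  # not a valid hostname (ghost referral)
    if re.search(SOURCE_EXCLUDE, hit["source"], re.IGNORECASE):
        return False  # known crawler referral
    return not hit["known_bot"]  # well-behaved bots are excluded too

def dropped_hostnames(hits):
    """Count, per hostname, the hits that only the original view would show."""
    return Counter(h["hostname"] for h in hits if not passes_filters(h))

SAMPLE_HITS = [
    {"hostname": "www.myfoodblog.example", "source": "google", "known_bot": False},
    {"hostname": "(not set)", "source": "darodar.com", "known_bot": False},
    {"hostname": "www.myfoodblog.example", "source": "semalt.com", "known_bot": False},
    {"hostname": "www.myfoodblog.example", "source": "(direct)", "known_bot": True},
    {"hostname": "shop.myshop.example", "source": "(direct)", "known_bot": False},
]

for hostname, count in dropped_hostnames(SAMPLE_HITS).items():
    print(hostname, count)
```

In this made-up data, shop.myshop.example shows up among the dropped hostnames. If that were a place where you had installed your tracking code, it would be your cue to add it to the hostname filter.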
Or, as an alternative, you can install the “GM Block Bots” plugin on your WordPress blog.