A Guide to Filtering Spammy Data from Google Analytics


There are a number of websites out there providing ‘services’ that, in doing so, send fake traffic and low-quality referrals to your site. These sites are a thorn in the side of Analytics data, creating a horrid skew we could all do without. The biggest culprit is semalt:

[Screenshot: semalt referral spam]

But there are plenty of others.

Removing Spam Data with Exclusion Filters

One way to combat this is to add an exclusion filter for each offending site, preventing it from sending data to your Google Analytics account:

[Screenshot: exclusion filters for spam sites]

The problem with doing this is that you’d need to add a new filter every time a new site causing the issue appears. The alternative is to use a regular expression to exclude several of these sites with a single filter.

But again, you’d have to add to it every time a new one popped up, and there’s no guarantee of perfect data. There’s an excellent, regularly updated post here listing the various filters you can set up.
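As a rough illustration of the regular-expression approach, the sketch below joins a list of spam domains into a single alternation pattern of the kind you might paste into one exclusion filter (for example, one on the Campaign Source field). The domain list and the helper name `build_exclusion_pattern` are hypothetical; check which field you filter on in your own account.

```python
import re

# Hypothetical list of spam referrer domains; extend it as new ones appear.
SPAM_DOMAINS = ["semalt.com", "buttons-for-website.com"]


def build_exclusion_pattern(domains):
    """Join domains into one regex alternation, escaping the dots so that
    'semalt.com' does not also match something like 'semaltxcom'."""
    escaped = (re.escape(d) for d in domains)
    return "(" + "|".join(escaped) + ")"


pattern = build_exclusion_pattern(SPAM_DOMAINS)
print(pattern)

# Quick check against a few sample source values.
compiled = re.compile(pattern, re.IGNORECASE)
for source in ["semalt.semalt.com", "google", "buttons-for-website.com"]:
    print(source, bool(compiled.search(source)))
```

Keeping the list in one place like this makes it easier to regenerate the pattern each time a new spammer turns up, though it doesn’t remove the need to keep adding to it.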

Spam Injected Event Tracking

Just recently we’ve seen accounts plagued with fake event data. Event tracking is a good way to monitor the success of your website, whether you’re tracking clicks on downloads, form submissions, clicks on email links or other elements.

If that data is then skewed by fake events, it ruins the bigger picture.

[Screenshot: a spam-injected event]

To the layman, this event may well look like one that hasn’t been set up properly, but in fact this website is simply injecting fake data into the Analytics account. A quick Google search shows this is a common problem. We could add this site to the list of exclusions, but there must be an easier way.

Removing Spam Data With Inclusion Filters

A newer technique is to create a new view in your Google Analytics property (a copy of your current main view):

[Screenshot: creating a copy of the main view]

Then set up a new filter that includes only the data relevant to your hostname:

[Screenshot: hostname include filter settings]

Doing so excludes all other ‘spammy’ data from the view, which deals with all these problem websites in one go.
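To make the include-versus-exclude logic concrete, here is a small simulation of what a hostname include filter does: any hit whose hostname doesn’t match your pattern is dropped, whichever spam site sent it. The hits and the `example.com` pattern are made-up placeholders, and this only models the filter’s effect; Google Analytics applies the real filter itself when it processes incoming data.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    hostname: str  # hostname reported with the hit
    source: str    # referral source shown in reports


# Placeholder pattern: keep only hits for example.com and its subdomains.
INCLUDE_HOSTNAME = re.compile(r"(^|\.)example\.com$", re.IGNORECASE)


def apply_include_filter(hits, pattern):
    """Keep only hits whose hostname matches; everything else is excluded,
    so new spam sources are handled without naming each one."""
    return [hit for hit in hits if pattern.search(hit.hostname)]


# Made-up sample hits: two genuine, two spam.
hits = [
    Hit("www.example.com", "google"),
    Hit("example.com", "newsletter"),
    Hit("(not set)", "spam-site-a"),
    Hit("spammy-host.net", "spam-site-b"),
]

for hit in apply_include_filter(hits, INCLUDE_HOSTNAME):
    print(hit.hostname, hit.source)
```

Note that neither spam source is mentioned anywhere in the filter; they fall away simply because their hostnames don’t match.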

Note: we’d recommend creating a copy of the view so you still have a view containing the ‘raw’ data from your site, in case you need to investigate any technical issues. The filter only applies to data collected after it’s added and does not work retrospectively, but it should provide the cleanest, most accurate view of your data moving forward.

There are extra considerations if you have multiple domains, branded URLs outside your main site and so on, as this filter might exclude them too, but for most sites it should be sufficient. Just pop your domain name in as the hostname filter pattern, be sure to tick ‘include’ rather than ‘exclude’, and you’re onto a winner!
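If you do have several domains, one way to cover them all in a single include filter is to combine them into one anchored pattern. The sketch below is a hypothetical helper (`hostname_include_pattern`) with placeholder domains; it escapes the dots, allows subdomains such as `www.`, and anchors the end so lookalike hostnames such as `example.com.spam.net` don’t slip through. The 255-character limit is our assumption about the filter pattern field, so confirm it in your own account.

```python
import re

# Assumed maximum length of a Google Analytics filter pattern field.
MAX_PATTERN_LENGTH = 255


def hostname_include_pattern(domains):
    """Build one regex matching each domain and any of its subdomains."""
    alternation = "|".join(re.escape(d) for d in domains)
    pattern = r"(^|\.)(" + alternation + r")$"
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError("pattern is too long for a single filter field")
    return pattern


# Placeholder domains: the main site plus a separate branded domain.
pattern = hostname_include_pattern(["example.com", "example-brand.co.uk"])
print(pattern)

check = re.compile(pattern, re.IGNORECASE)
for host in ["www.example.com", "shop.example-brand.co.uk", "example.com.spam.net"]:
    print(host, bool(check.search(host)))
```

Keeping every domain in one pattern matters here: as far as we understand it, a view applies its filters one after another, so a second hostname include filter would only see what the first one let through.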

Block Them from Your Site

The other option is a bit more extreme. Outside of Analytics, you can block these sites from sending sessions to your site by denying them access in your .htaccess file.

For example:

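# Turn on mod_rewrite for this directory
RewriteEngine On
# Match spam referrers over http or https, on any subdomain; NC = case-insensitive
RewriteCond %{HTTP_REFERER} ^https?://([^./]+\.)*semalt\.com/ [NC,OR]
# No OR on the last condition, or the rule would fire for every request
RewriteCond %{HTTP_REFERER} ^https?://([^./]+\.)*buttons-for-website\.com/ [NC]
# "-" means no substitution; F returns 403 Forbidden and L stops further rules
RewriteRule ^ - [F,L]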

A single mistake in .htaccess can take your whole site offline, so this should be tackled by a professional developer, but it’s a good way of nipping the problem in the bud. Again, you’d need to add each new site to the list, but it does work!
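Because a bad pattern can block real visitors (or miss the spammers entirely), it’s worth sanity-checking referrer conditions before deploying them. This sketch uses its own example patterns, written to match both http and https referrers on any subdomain, and mimics `[NC]` with a case-insensitive flag and `[OR]` with `any()`. The sample referrers are made up, and Python’s `re` only approximates Apache’s regex engine, though simple patterns like these behave the same in both.

```python
import re

# Example referrer conditions mirroring RewriteCond ... [NC]:
# https? matches http and https; ([^./]+\.)* allows any subdomains.
SPAM_REFERRER_PATTERNS = [
    re.compile(r"^https?://([^./]+\.)*semalt\.com/", re.IGNORECASE),
    re.compile(r"^https?://([^./]+\.)*buttons-for-website\.com/", re.IGNORECASE),
]


def is_blocked(referrer):
    """True if any condition matches, like RewriteCond lines chained with [OR]."""
    return any(p.search(referrer) for p in SPAM_REFERRER_PATTERNS)


# Made-up referrers paired with whether they should be blocked.
samples = {
    "http://semalt.com/": True,
    "https://www.SEMALT.com/crawler": True,
    "https://buttons-for-website.com/": True,
    "https://www.google.com/search?q=semalt.com/": False,
    "https://notsemalt.com/": False,
}

for referrer, expected in samples.items():
    assert is_blocked(referrer) == expected, referrer
print("all referrer checks passed")
```

Adding a sample for each new spammer before editing .htaccess is a cheap way to catch patterns that are too broad.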

Let us know in the comments if you’ve had this problem or found any other solutions!
