Web analytics data loss

The most essential thing to remember with web analytics is that confidence in the data used to generate metrics must be very high. The single most critical factor is that the measurements being collected are as close to 100% accurate as possible. In reality, that accuracy is quite volatile.


Web analytics data hemorrhage

Data loss in web analytics is a topic that few venture into, as there is far more concern about either pumped-up robot numbers, embedded boosting, or auto-refresh inflation. These three topics all point the finger at the manipulation of actual page views.

Then we have the 1440 kind of visitors that seem to be invisible to many web analysts, despite noticeably influencing the time spent metric and quite evidently pumping up page views. This should upset any advertiser...

On the other end of things is the loss of web analytics data. With log files this was evident, and tag-based data collection was seen as the magic medicine that sorted it out. The problem is that as technology evolves, the same old data loss issues evolve with it, and they are visible if simple web analytics is applied!


Taking the pulse of a web site

Assessing data loss is quite simple: place an extra data capture event just below the opening body tag. JIC recommendations and guidelines call for the page tag code to be placed at the bottom of the page code, just above the closing body tag.
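
A minimal sketch of what such a dual measurement could look like, assuming a hypothetical /collect beacon endpoint (a real setup would use the collection calls of whatever analytics solution is in place):

    // Placed in a script immediately after the opening body tag.
    // Fires a lightweight "pulse" hit before any heavy content loads.
    // The /collect endpoint and its parameters are placeholders; the
    // pulse must be recorded as a technical event, never a page view.
    navigator.sendBeacon(
      "/collect?type=pulse&page=" + encodeURIComponent(location.pathname)
    );

    // Placed just above the closing body tag, where the guidelines
    // recommend the regular page tag to sit.
    navigator.sendBeacon(
      "/collect?type=pageview&page=" + encodeURIComponent(location.pathname)
    );

Every pulse hit without a matching page view hit is a lost measurement.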

Then let a few days pass to get a decent amount of data collected. The pattern is obvious after a few hours, but as always more is better: capturing data from a diverse set of devices gives a more detailed picture of the data loss, since it is also a device issue.

A word of warning: do make sure that the extra data capture does not turn up in your favorite web analytics solution(s) as page views. Quite obvious, but better safe than sorry given all the explanations that inflated page views would require.

Data loss findings

Once enough data has been collected, the measurement volumes of a site can be compared across various metrics. For a media site, the home page is the optimal starting point, for several reasons.

The core reasons are that the home page typically has the most page views, most visits start there, and it is normally the page with the most external content being pulled in.

As can be seen in the chart, page view data loss is easy to catch and visualize. Comparing the dual measurement of home page views on two platforms reveals that, in this example, the data loss is far larger on tablets than on computers.
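
As a hedged sketch of that comparison, assuming the two counters are exported as daily totals per device category (all names and numbers below are illustrative placeholders):

    // Hypothetical daily totals for the home page, one row per device.
    interface DualCount {
      device: string;
      pulseHits: number;   // captured just after the opening body tag
      pageViews: number;   // captured just above the closing body tag
    }

    // Data loss = share of pulse hits that never became a page view.
    function lossRate(row: DualCount): number {
      return (row.pulseHits - row.pageViews) / row.pulseHits;
    }

    const homePage: DualCount[] = [
      { device: "computer", pulseHits: 100_000, pageViews: 97_200 },
      { device: "tablet",   pulseHits: 20_000,  pageViews: 17_600 },
    ];

    for (const row of homePage) {
      console.log(`${row.device}: ${(lossRate(row) * 100).toFixed(1)}% lost`);
    }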

Analysis will also reveal that there are many different causes for the loss of data, some of which can be addressed by simple changes. Other causes, however, will require a different approach to the business model if the data loss is to be addressed.

Despite the evidence, some web site owners will ignore what the data is telling them and carry on; a sort of modern-day digital version of "The Emperor's New Clothes".

Data loss instigators

There are many instigators of data loss to be found on web sites; the key culprits are (by no means in any ranking order whatsoever):

  • the sheer volume of bloated ads loaded onto a single page from sluggish ad servers
  • processor and browser performance on specific surfing devices
  • lack of proper image caching / content distribution
  • bloated web pages riddled with time-consuming "like button" loading and the like
  • the placement of the data collection tag
  • users leaving or navigating away from pages before the data collection calls have been fired off

Tag management systems are touted as a solution, but the inherent issue remains that too much is happening too slowly. The heaviness of the web analytics tags themselves isn't the problem, no matter what is said; just compare them to the average number of bytes in an ad and ignore such marketing nonsense.

Yes, tag management systems take away a substantial part of the tagging-related pains. But no, they will not solve or remove the issues generated by the above-named instigators, and it is worth keeping in mind that they add a new single point of failure when it comes to web analytics data collection.

Quick fix solutions

If data collection quality and user experience are matters of concern, then the evident quick fixes are:

  • setting an average maximum load time on ad servers and only using those that meet it (one way to enforce such a budget is sketched after this list)
  • avoiding loading too much on specific surfing devices
  • getting proper image caching / content distribution implemented
  • questioning why time-consuming external items should be on the page in the first place
  • moving the placement of the data collection tag above all slow loading content calls
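
One way to enforce the ad server budget from the first bullet, sketched here with a placeholder budget value and ad URL, is to race each ad request against a timeout and simply drop ads that blow it:

    // Load-time budget per ad request; the value is a placeholder.
    const AD_LOAD_BUDGET_MS = 800;

    function loadAd(src: string): Promise<void> {
      return new Promise((resolve, reject) => {
        const script = document.createElement("script");
        script.src = src;
        script.async = true;
        script.onload = () => resolve();
        script.onerror = () => reject(new Error(`ad failed: ${src}`));
        document.body.appendChild(script);
      });
    }

    function timeout(ms: number): Promise<never> {
      return new Promise((_, reject) =>
        setTimeout(() => reject(new Error("over budget")), ms)
      );
    }

    // Ads that miss the budget are skipped for this page view.
    Promise.race([loadAd("https://ads.example.com/tag.js"), timeout(AD_LOAD_BUDGET_MS)])
      .catch((err) => console.warn("ad skipped:", err.message));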

One additional smart solution is to defer loading of all external content that is not visible in the viewport until after the data collection tag has been allowed to run; this alone will ensure that far less tracking data is lost. It is also the quickest solution available!
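
A sketch of that deferral, assuming below-viewport widgets are marked up with a hypothetical data-deferred-src attribute: the tracking beacon goes out first, and external content only loads once it scrolls into view.

    // Fire the data collection call before anything heavy loads
    // (the endpoint is a placeholder).
    navigator.sendBeacon("/collect?type=pageview");

    // Load deferred iframes only when they enter the viewport.
    const observer = new IntersectionObserver((entries) => {
      for (const entry of entries) {
        if (!entry.isIntersecting) continue;
        const frame = entry.target as HTMLIFrameElement;
        frame.src = frame.dataset.deferredSrc ?? "";
        observer.unobserve(frame);
      }
    });

    document
      .querySelectorAll<HTMLIFrameElement>("iframe[data-deferred-src]")
      .forEach((frame) => observer.observe(frame));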

The three key questions to ask your organization are: 1) how long, on average, does it take for our data collection tag to be fired off, 2) how much data are we bleeding, i.e. how much data loss are we causing, and 3) are the above quick fixes part of our current configuration?
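
Question 1 can be answered from the pages themselves. One sketch, assuming the same placeholder endpoint as above, is to stamp each hit with how long after navigation start the tag actually fired:

    // performance.now() counts milliseconds from the navigation's
    // time origin, so this is the tag's firing delay for this view.
    const tagDelayMs = Math.round(performance.now());

    // Ship the delay with the hit so its average can be reported on
    // alongside the data loss itself (the endpoint is a placeholder).
    navigator.sendBeacon(`/collect?type=pageview&tagDelayMs=${tagDelayMs}`);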

If answers are not to be found, then there is evidently some work to be done before data loss can be reduced and user experience improved. The choice is either to apply a bit of brains to address the issue(s), or to foolishly watch data deterioration render the analytics data increasingly useless.