Using the Measurement Protocol API to find the impact of a Consent Management Platform

With changes to privacy legislation, the use of 3rd party analytics tools like Google Analytics presents an issue for site owners. The storage of cookies on user devices requires explicit user consent (ePrivacy directive), and the data sharing Google carries out with its other products (ads etc., to generate demographics) is considered a data risk, again requiring explicit user consent.

For this reason, it's currently considered best practice to only engage 3rd party analytics tags after a user has expressly granted consent. This presents a problem for many site owners: how many users are refusing this consent, and thus being rendered invisible in the statistics? Depending on the publisher and market type, this can vary from 5% to 20%, an enormous range with serious implications for the site owner's ability to understand their business.

This article covers a technique we've used to help site owners identify exactly how big a gap there is between their consenting and non-consenting audiences.

What's the difference between a 3rd party and 1st party provider?

1st party data is data that a site owner has collected themselves, based on their own users. It's not combined with any other data set. Privacy-first analytics companies like AT Internet fall into this category because, while an external company provides the service, the data stored in their platform is isolated and access is restricted to just the site owner.

Google Analytics is a 3rd party provider, as they combine data gathered from a site with data gathered from ad products. Users are cookied and tracked across different Google products, enabling things like demographic reports to be generated in Google Analytics.

The ePrivacy directive says that consent should be requested for any cookies stored on a user's device, for analytics or otherwise. In practical terms, Data Protection Commissioners (DPCs) around Europe differ in how strictly they apply this to analytics. In Ireland, the DPC has issued a statement that they will pursue site owners using 3rd party analytics tags without consent, but consider 1st party analytics cookies without consent to be a very low priority for enforcement at the moment. This is why the emergent best practice has been to block Google Analytics until consent is given, while allowing 1st party analytics providers to fire tags without explicit consent.

Why not just move from Google Analytics to a 1st party provider, and not lose any data?

Google Analytics is effectively free, as its usage is subsidised by Google's other ad products, with which data is shared. 1st party analytics tools can get very expensive, very quickly. A site doing a little over 1m impressions a month could very quickly be looking at a €6-10k annual spend on a basic 1st party analytics tool.

For site owners with a lot of historical data in analytics, with reports and workflows set up based on it, the cost of changing can be significant. For this reason, a number of those site owners are willing to work with their current analytics reports, if they can reliably extrapolate their numbers to allow for non-consenting users. This extrapolation requires us to track visits from users who have opted out of tracking, so how do we do that in a privacy-friendly way?

Google Measurement Protocol

The Google Analytics Measurement Protocol allows developers to make HTTP requests to send raw user interaction data directly to Google Analytics servers. This allows developers to measure how users interact with their business from almost any environment.

When traditional analytics tags load on a page, they will create a payload with information on screen size, referrer, page being viewed, load times, etc, and ultimately submit that data to Google Analytics. The Measurement Protocol (MP) API allows developers to build those payloads and submit directly to the Google Analytics backend. The chief goal Google have in making this API available is to allow developers to enhance and unify their tracking across mobile devices, offline conversions, and server-side actions.

As with all things Google, there are a number of incompatible versions of this API in the wild. The MP v1 API integrates with Google Analytics v3. As of late 2020, Google has been pushing Google Analytics users to create v4 profiles, which better combine mobile and web data. The MP v1 API is incompatible with v4 of analytics, and at the time of writing (Jan 2021), the Measurement Protocol API for analytics v4 is in "alpha mode, subject to breaking changes, and not production ready". For the purpose of this article, we're going to consider MP v1 API and v3 of Analytics as our working environment.

Using Measurement Protocol to measure consent to tracking

We're going to send anonymous, cookie-free page data to analytics. The idea is that we will send data for every visit to one Analytics account via the Measurement Protocol, while having the main Analytics account on the site record via the CMP in a privacy-compliant fashion. (Implementation of a conditional analytics call based on CMP consent being given is outside the scope of this article, and varies significantly depending on CMP vendor.)

The calls to the Measurement Protocol do not drop any cookies on the user device. We are going to limit the amount of data we send to ensure that no identifiable info is stored without a user's consent. The data sent will be limited to page viewed, page title, referrer, and in this example, a custom dimension.

If the Measurement Protocol can be used to record data in a privacy-sensitive way, why not swap over all Google Analytics calls to use it?

Sending through data without local cookies means that visitor numbers will not be reliable - each pageview is effectively one visit. This means that Analytics data on users, sessions, bounce rates, and time on site are all going to be wildly inaccurate. The Analytics account used by the Measurement Protocol will be used solely to check pageviews in one account against another, to gauge the level of traffic being served when consent is not an issue. Metrics like average pages per user can be used to extrapolate back roughly how many users are accounted for in the difference.

A small note here is that there is a potential for significantly different behaviour between "no consent" users and a site's standard user base, e.g. regular users may average 2 pages per session, while users who refuse consent may be closer to 1 (extreme example). This means that extrapolating back to user numbers is not an exact science, but the general trends and overall percentage differences are useful metrics to have.
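As a rough worked example of that extrapolation, the sketch below takes the pageview totals from both accounts plus an average pages-per-user figure from the consent-gated account, and estimates how many users are missing. The function name and the sample figures are illustrative assumptions, not real data, and the estimate inherits the caveat above about non-consenting users behaving differently.

```javascript
// Hypothetical helper: estimate the consent gap from two pageview totals.
//   allPageviews       - pageviews in the Measurement Protocol account (every visit)
//   consentedPageviews - pageviews in the main, consent-gated account
//   pagesPerUser       - average pages per user, taken from the main account
function estimateConsentGap(allPageviews, consentedPageviews, pagesPerUser) {
    if (allPageviews <= 0 || pagesPerUser <= 0) {
        throw new RangeError('allPageviews and pagesPerUser must be positive');
    }
    // Pageviews seen by the cookie-free account but not by the main account
    const missingPageviews = Math.max(0, allPageviews - consentedPageviews);
    return {
        missingPageviews: missingPageviews,
        missingShare: missingPageviews / allPageviews,
        estimatedMissingUsers: Math.round(missingPageviews / pagesPerUser)
    };
}

// Illustrative figures only
const gap = estimateConsentGap(120000, 102000, 2);
console.log(gap);
```

If non-consenting users are thought to view fewer pages than average, passing a lower pagesPerUser value gives a higher (and possibly more realistic) user estimate.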

Implementing the Measurement Protocol call - pure Javascript approach

The following code can be placed in the head of all pages which are to be tracked.

<script>
    navigator.sendBeacon(
        'https://www.google-analytics.com/collect?v=1'+
        '&t=pageview'+                                     // Hit type (required by the protocol)
        '&aip=1'+                                          // Anonymise IP
        '&tid=UA-XXXXXXXX-X'+                              // Analytics account
        '&dl='+encodeURIComponent(document.location.href)+ // Location
        '&dt='+encodeURIComponent(document.title)+         // Title
        '&cid='+Math.round(2147483647 * Math.random())+    // User id, randomised
        '&dr='+encodeURIComponent(document.referrer)       // Referrer info
    );
</script>

The navigator.sendBeacon() method makes this call asynchronously, so it does not block rendering.

The cid being a random number is key here - this is the field typically used to store the user info. If we were trying to track a user, we would store this value in a local cookie, and refer to it on subsequent visits. As we're avoiding any kind of tracking, we ensure this is randomised, and no local cookies are created.

The referrer information is sent to help Analytics report on traffic sources. This can give an idea of whether consent settings are significantly different for Facebook users vs Organic search, for example. The drawback here is that, while it will be a relatively useful comparison for sites with <1.5 pages per user on average, for sites with longer browse histories, this being set on every page means that the "Direct" source will become increasingly large, skewing the other sources downwards.

The pure javascript approach allows the above snippet to be dropped into the page content without modification. With small modifications, it could also be run in something like a Cloudflare Worker, where it is executed at Cloudflare level, without the call ever needing to be triggered by the user (avoiding the risk of ad blockers targeting analytics).
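To illustrate the kind of small modification involved, here is a minimal sketch of the same payload built by a function that does not depend on document or navigator, so it could run outside the browser. The function name and the shape of the page argument are assumptions made for this example; how the hit is actually dispatched (and how a worker obtains the title and referrer) depends on the environment and is deliberately left out.

```javascript
// Sketch: build a Measurement Protocol v1 pageview URL from plain values.
// The page object shape ({ url, title, referrer }) is an assumption.
function buildPageviewUrl(trackingId, page) {
    const params = new URLSearchParams({
        v: '1',
        t: 'pageview',                 // Hit type
        aip: '1',                      // Anonymise IP
        tid: trackingId,               // Analytics account, e.g. UA-XXXXXXXX-X
        dl: page.url,                  // Location
        dt: page.title || '',          // Title
        // User id, randomised on every hit so nothing links two pageviews
        cid: String(Math.round(2147483647 * Math.random())),
        dr: page.referrer || ''        // Referrer info
    });
    return 'https://www.google-analytics.com/collect?' + params.toString();
}

// Example with placeholder values
const exampleUrl = buildPageviewUrl('UA-XXXXXXXX-X', {
    url: 'https://example.com/category/original-slug-1',
    title: 'Example page',
    referrer: ''
});
console.log(exampleUrl);
```

URLSearchParams takes care of encoding each value, which replaces the manual encodeURIComponent() calls in the inline snippet.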

Where the pure javascript approach struggles is when AMP is involved, as AMP doesn't support the insertion of arbitrary javascript. It can also become unwieldy if there is business logic to be implemented, e.g. the attaching of custom dimensions to particular URLs, or re-mapping of URLs (recording /amp/ URLs in GA using the non-AMP URL combined with a custom dimension).

Supporting AMP & Custom Business Rules

Limited data will be sent to the Analytics API via the Measurement Protocol. To ensure it's sent in a consistent manner site-wide, first create a function similar to the below (PHP in this example):

class AnalyticsService
{
    public function generateCollectionUrl(string $url) : string
    {
        $params = [
            'v' => 1,
            't' => 'pageview',               // Hit type (required by the protocol)
            'aip' => 1,                      // Anonymise IP
            'tid' => config('analytics.id'), // UA-12345678-1
            'dl' => $url,
        ];
        return 'https://www.google-analytics.com/collect?'.http_build_query($params);
    }
}

This example assumes that the above code is bound to an Analytics facade in Laravel, but in practice it can live anywhere in the code that is readily accessible to the frontend code.

This function will generate a Measurement Protocol API URL for submission of the current URL to a given analytics account. It doesn't include any visit-specific information (randomised user id, or referrer), so its output is cache-friendly.

The previous javascript example can then be updated to look more like the below:

<script>
    navigator.sendBeacon(
        '{!! Analytics::generateCollectionUrl(request()->fullUrl()) !!}'+
        '&dt='+encodeURIComponent(document.title)+
        '&cid='+Math.round(2147483647 * Math.random())+
        '&dr='+encodeURIComponent(document.referrer)
    );
</script>

Turning to AMP, the following analytics code can be added to any AMP template. The assumption is that the amp-analytics.js file is already being loaded on the page.

<amp-analytics>
    <script type="application/json">
    {
        "requests": {
            "pageview": "{!! Analytics::generateCollectionUrl(request()->fullUrl()) !!}&dt=TITLE&cid=RANDOM&dr=DOCUMENT_REFERRER"
        },
        "triggers": {
            "trackPageview": {
                "on": "visible",
                "request": "pageview"
            }
        }
    }
    </script>
</amp-analytics>

Here we make use of some of the in-built variables AMP provides - TITLE, RANDOM, and DOCUMENT_REFERRER.

At this point, we have Measurement Protocol hits being performed successfully on both AMP and non-AMP pages. One business rule we may want to add is to change how AMP URLs are recorded. On publisher sites, for example, AMP URLs are often of the form /amp/category/original-slug-1, but reported to analytics as /category/original-slug-1, with a value of 'amp' sent to the 'platform' custom dimension.

Returning to the URL generation function, and having checked in Analytics admin that the 'platform' custom dimension corresponds to "cd1", the function can be updated as follows:

public function generateCollectionUrl(string $url) : string
{
    $params = [
        'v' => 1,
        't' => 'pageview',               // Hit type (required by the protocol)
        'aip' => 1,                      // Anonymise IP
        'tid' => config('analytics.id'), // UA-12345678-1
        'dl' => $url,
        'cd1' => 'web',                  // Default custom dimension, for web
    ];

    if (preg_match('#/amp/#', $url)) {
        // Change the URL being reported to remove the /amp/ stub
        $params['dl'] = str_replace('/amp/', '/', $url);
        // Custom dimension set to amp
        $params['cd1'] = 'amp';
    }

    return 'https://www.google-analytics.com/collect?'.http_build_query($params);
}

Any subsequent calls to the Measurement Protocol will now have a custom dimension attached, as well as the business logic around recording of URLs. The custom dimension parameter will help site owners to identify if there's a noticeably different behaviour pattern between users on different channels (AMP/regular web), all in a privacy-friendly way.

Can the Measurement Protocol record more than just pageviews?

The most common alternative type of data recorded will be events, which can be recorded with the same API call by setting the hit type parameter (t) to event and adding the event parameters, such as category (ec) and action (ea).

The parameter reference documentation lists the various types of calls which are supported, with the Hit Builder tool available to validate URL structures.
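As a minimal sketch of an event hit, the function below builds a URL using the event parameters from the v1 parameter reference: t=event, ec (category), ea (action), and the optional el (label) and ev (value). The function name, the shape of the event argument and the example values are assumptions made for illustration.

```javascript
// Sketch: build a Measurement Protocol v1 event URL.
// The event object shape ({ category, action, label, value }) is an assumption.
function buildEventUrl(trackingId, event) {
    const params = new URLSearchParams({
        v: '1',
        t: 'event',                    // Hit type
        aip: '1',                      // Anonymise IP
        tid: trackingId,               // Analytics account
        cid: String(Math.round(2147483647 * Math.random())), // randomised, as for pageviews
        ec: event.category,            // Event category
        ea: event.action               // Event action
    });
    // Label and value are optional; value is expected to be a non-negative integer
    if (event.label !== undefined) {
        params.set('el', event.label);
    }
    if (event.value !== undefined) {
        params.set('ev', String(event.value));
    }
    return 'https://www.google-analytics.com/collect?' + params.toString();
}

// Example with placeholder values
const eventUrl = buildEventUrl('UA-XXXXXXXX-X', {
    category: 'newsletter',
    action: 'signup',
    label: 'footer form'
});
console.log(eventUrl);

// In the browser, the result could be sent the same way as the pageview:
// navigator.sendBeacon(eventUrl);
```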

How does the Site Owner find out how much traffic is not being recorded on their main Analytics account?

Unfortunately, comparing between Analytics accounts is a tedious process. The simplest way is to open a browser window in each account, and start to compare the top line figures on the Site Content (Pageview) reports inside Analytics. This will give the type of helicopter view most Site Owners are looking for, with further interrogation available by either selecting the same filters on each browser window, or exporting to CSV for deeper local analysis.

A critical caveat here is that the Measurement Protocol account should only be checked for pageviews and events. Because user-based information is not being sent, any user-based stats (pages per session, session length, etc) are highly misleading. The goal is to enable comparison between pageviews or events, to get an indication of the number of users visiting a site but declining analytics cookies.
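For the local analysis route, a small script can do the page-by-page comparison once the pageview figures from both accounts have been loaded. The sketch below assumes each export has already been parsed into a plain object mapping page path to pageviews; the function name and the sample figures are illustrative, and CSV parsing is left out because export formats vary.

```javascript
// Sketch: compare per-page pageviews between the two accounts.
//   allViews       - { path: pageviews } from the Measurement Protocol account
//   consentedViews - { path: pageviews } from the main, consent-gated account
function comparePageviews(allViews, consentedViews) {
    return Object.keys(allViews)
        .map(function (path) {
            const all = allViews[path];
            const consented = consentedViews[path] || 0;
            const missing = Math.max(0, all - consented);
            return {
                path: path,
                all: all,
                consented: consented,
                missingShare: all > 0 ? missing / all : 0
            };
        })
        // Pages with the largest share of unrecorded traffic first
        .sort(function (a, b) { return b.missingShare - a.missingShare; });
}

// Illustrative figures only
const report = comparePageviews(
    { '/': 1000, '/news/story-1': 400 },
    { '/': 900, '/news/story-1': 300 }
);
console.log(report);
```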
