Using the Measurement Protocol API to find the impact of a Consent Management Platform

Updated May 2023 to cover Google Analytics 4

With changes to privacy legislation, the use of 3rd party analytics tools like Google Analytics presents an issue for site owners. The storage of cookies on user devices requires explicit user consent (ePrivacy directive), and Google's sharing of data with its other products (ads etc., to generate demographics) is considered a data risk, again requiring explicit user consent.

For this reason, it's currently considered best practice to only engage 3rd party analytics tags after a user has expressly granted consent. This presents a problem for many site owners - how many users are refusing this consent, and thus being rendered invisible in the statistics? Depending on the publisher and market type, this can vary from 5% to 20% - an enormous range, with serious implications for the site owner's ability to understand their business.

This article covers a technique we've used to help site owners identify exactly how big a gap there is between their consenting and non-consenting audiences.

What's the difference between a 3rd party and 1st party provider?

1st party data is data that a site owner has collected themselves, based on their own users. It's not combined with any other data set. Privacy-first analytics companies like AT Internet fall under this distinction, as, while it is an external company providing the service, the data stored in their platform is isolated and access restricted to just the site owner.

Google Analytics is a 3rd party provider, as they combine data gathered from a site with data gathered from ad products. Users are cookied and tracked across different Google products, enabling things like demographic reports to be generated in Google Analytics.

The ePrivacy directive says that consent should be requested for any cookies stored on a user's device, for analytics or otherwise. In practical terms, different Data Protection Commissioners (DPC) around Europe disagree with this when it comes to analytics. In Ireland, the DPC has issued a statement that they will pursue site owners using 3rd party analytics tags without consent, but consider 1st party analytics cookies without consent to be a very low priority for enforcement at the moment. This is why the emergent best practice has been to block Google Analytics until consent is given, but 1st party analytics providers can fire tags without explicit consent.

Why not just move from Google Analytics to a 1st party provider, and not lose any data?

Google Analytics is effectively free, as its usage is subsidised by Google's other ad products, with which data is shared. 1st party analytics tools can get very expensive, very quickly. A site doing a little over 1m impressions a month could very quickly be looking at a €6-10k annual spend on a basic 1st party analytics tool.

For site owners with a lot of historical data in analytics, with reports and workflows set up based on it, the cost of changing can be significant. For this reason, a number of those site owners are willing to work with their current analytics reports, if they can reliably extrapolate their numbers to allow for non-consenting users. This extrapolation requires us to track visits from users who have opted out of tracking, so how do we do that in a privacy-friendly way?

Google Measurement Protocol

The Google Analytics Measurement Protocol allows developers to make HTTP requests to send raw user interaction data directly to Google Analytics servers. This allows developers to measure how users interact with their business from almost any environment.

When traditional analytics tags load on a page, they will create a payload with information on screen size, referrer, page being viewed, load times, etc, and ultimately submit that data to Google Analytics. The Measurement Protocol (MP) API allows developers to build those payloads and submit directly to the Google Analytics backend. The chief goal Google have in making this API available is to allow developers to enhance and unify their tracking across mobile devices, offline conversions, and server-side actions.

As with all things Google, there are a number of incompatible versions of this API in the wild. The MP v1 API integrates with Google Analytics v3. At the original time of writing in late 2020, Google was pushing Google Analytics users to create v4 profiles, which better combine mobile and web data. The MP v1 API is incompatible with v4 of Analytics, so this article originally covered the MP v1 API and v3 of Analytics as the working environment. In mid-2023, Google released a number of updates to the Measurement Protocol ahead of the shut-down of v3 of Google Analytics, taking the MP API out of "alpha mode".

A key note with the GA4 version of the Measurement Protocol is:

The intent of the Measurement Protocol is to augment automatic collection via gtag, Tag Manager, and Google Analytics for Firebase not to replace them.

This goal of "augmented collection" does mean that a number of features previously available to the Measurement Protocol and GA3 are not currently (May 2023) available in the latest version.

Using Measurement Protocol to measure consent to tracking

We're going to send anonymous, cookie-free page data to analytics. The idea is that we will send data for every visit to one Analytics account via the Measurement Protocol, while having the main Analytics account on the site record via the CMP in a privacy-compliant fashion. (Implementation of a conditional analytics call based on CMP consent being given is outside the scope of this article, and varies significantly depending on CMP vendor.)

The calls to the Measurement Protocol do not drop any cookies on the user device. We are going to limit the amount of data we send to ensure that no identifiable info is stored without a user's consent. The data sent will be limited to page title and path. In previous GA3 implementations it was also possible to send a referrer and custom dimension, but at the time of writing, this is not yet supported in the latest Measurement Protocol.
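One way to keep the payload to just the page path is to strip the query string and fragment before sending, since these can carry user-specific values. The below is a minimal sketch of that idea; the helper name toPrivacySafeLocation and the example URL are made up for illustration and are not part of any Google library.

```javascript
// Hypothetical helper: reduce a full URL to origin + path only.
// Query strings and fragments are dropped because they can contain
// user-specific values (tokens, email addresses, click IDs).
function toPrivacySafeLocation(href) {
  const url = new URL(href);
  return url.origin + url.pathname;
}

// Example with a made-up URL
const safeLocation = toPrivacySafeLocation(
  'https://example.com/news/story-1?email=a%40b.ie#comments'
);
console.log(safeLocation); // https://example.com/news/story-1
```

In the browser, window.location.href would be passed in, and the result used as the page_location value.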

If the Measurement Protocol can be used to record data in a privacy-sensitive way, why not swap over all Google Analytics calls to use it?

Sending through data without local cookies means that visitor numbers will not be reliable - each pageview is effectively one visit. This means that Analytics data on users, sessions, bounce rates, and time on site are all going to be wildly inaccurate. The Analytics account used by the Measurement Protocol will be used solely to check pageviews in one account against another, to gauge the level of traffic being served when consent is not an issue. Metrics like average pages per user can be used to extrapolate back roughly how many users are accounted for in the difference.

A small note here is that there is a potential for significantly different behaviour between "no consent" users and a site's standard user base, e.g. regular users may average 2 pages per session, while users who refuse consent may be closer to 1 (extreme example). This means it's not an exact science to extrapolate back user numbers, but the general trends and overall percentages in difference are useful metrics to have.
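To make the extrapolation concrete, here is a rough sketch of the arithmetic. All figures are invented for illustration, and the function name estimateConsentGap is hypothetical. It assumes non-consenting users browse at the same pages-per-user rate as consenting ones, which, as noted above, may not hold.

```javascript
// allPageviews:       pageviews in the Measurement Protocol account (every visit)
// consentedPageviews: pageviews in the main, consent-gated account
// pagesPerUser:       average pages per user, taken from the main account
function estimateConsentGap(allPageviews, consentedPageviews, pagesPerUser) {
  const missingPageviews = allPageviews - consentedPageviews;
  return {
    missingPageviews,
    // Share of all pageviews that the consent-gated account never sees
    missingShare: missingPageviews / allPageviews,
    // Rough estimate only: relies on the pages-per-user assumption above
    estimatedMissingUsers: Math.round(missingPageviews / pagesPerUser),
  };
}

// Invented monthly figures
const gap = estimateConsentGap(120000, 102000, 2);
console.log(gap.missingPageviews);      // 18000
console.log(gap.missingShare);          // 0.15
console.log(gap.estimatedMissingUsers); // 9000
```

If non-consenting users were closer to 1 page per user, the same 18,000 pageviews would represent around 18,000 users rather than 9,000, which is why the percentage gap is the more dependable figure.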

Implementing the Measurement Protocol call

The first thing we need to do is generate an API secret for the data stream we want to record to. After setting up a GA4 property to act as the target for Measurement Protocol data, go to the admin screen, select the data stream for this property, and then "Measurement Protocol API secrets" from the menu.

Key menu

On the next screen which comes up, click "Create", give the secret a nickname, click "Create" to save this, then take note of the secret key which gets generated.

Secret management
Generating the secret key

We want to record page views, so let's check the list of supported events. Unfortunately there's nothing here that looks remotely like a page view event! Luckily, by checking the structure of calls made by regular analytics code, we learn that the event page_view is supported, if not necessarily documented.

events: [{
  name: 'page_view',
  params: {
    page_location: window.location.href,
    page_title: document.title
  }
}]

One thing to note - the page title is also required. Without it, the "page_view" event will show up in the "Custom events" section of analytics, rather than the dedicated page view section.

One other thing to note is that engagement_time_msec needs to be set to a non-zero value. This allows GA4 to register the hit as having come from a user. Without this value, we end up with reports that show things like 10 pageviews but no users at that time. If we're reporting solely on pageviews, this is likely not such an issue. However it can cause confusion for people new to the reporting side of things, so is probably best to add in.

Putting it all together, we get something like the below:

const measurement_id = 'G-XXXXXXXXXXXX';   // Measurement ID of the GA4 data stream
const api_secret = 'AbCdEfGhIjKl1234';     // Secret generated in the previous step

fetch('https://www.google-analytics.com/mp/collect?measurement_id='+measurement_id+'&api_secret='+api_secret, {
  method: 'POST',
  body: JSON.stringify({
    client_id: crypto.randomUUID(),   // Randomised on every hit, never stored
    events: [{
      name: 'page_view',
      params: {
        page_location: window.location.href,
        page_title: document.title,   // Needed for the hit to count as a page view
        engagement_time_msec: "1"     // Needed for non-zero user count
      }
    }]
  })
});

Testing

When testing this implementation, the "Realtime" report in analytics will quickly show the main metrics being recorded, so should be relied upon for quick validation.

A bigger challenge is debugging when data does not appear in the realtime reports. In these cases, we can have what look like silent failures - the network call is made ok from our site, there is no error code, but data just doesn't appear. Fortunately, Google provide a debug endpoint. Changing the url from /mp/collect to /debug/mp/collect will return validation feedback in the response body (visible in the browser's network tab) if there are issues with the data being sent. In the screenshot below, I had used a numeric value for client_id when it needs to be a string, and got feedback immediately.

Debugging errors

One trade-off to be aware of is that data sent through the /debug/ endpoint won't appear in the Realtime view, or the debug panel inside Analytics. So if you run into issues on the "live" endpoint with data not showing in Realtime, swap to the debug endpoint to diagnose the problem, then revert to the live endpoint to see the data show in Realtime.
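Swapping between the two endpoints is easier if the URL is built in one place. The below is a sketch only: buildCollectUrl and describeValidation are hypothetical helper names, and the sample response is invented, shaped on the assumption that the debug endpoint returns a validationMessages array whose items carry fieldPath and description.

```javascript
// Build the collection URL; pass debug = true to target the validation endpoint.
function buildCollectUrl(measurementId, apiSecret, debug = false) {
  const path = debug ? '/debug/mp/collect' : '/mp/collect';
  const query = new URLSearchParams({
    measurement_id: measurementId,
    api_secret: apiSecret,
  });
  return `https://www.google-analytics.com${path}?${query}`;
}

// Turn the debug endpoint's JSON response into readable lines.
function describeValidation(responseJson) {
  const messages = responseJson.validationMessages || [];
  if (messages.length === 0) return ['No validation issues reported'];
  return messages.map((m) => `${m.fieldPath || '(payload)'}: ${m.description}`);
}

// Invented response, modelled on the numeric client_id mistake described above
const sampleResponse = {
  validationMessages: [
    { fieldPath: 'client_id', description: 'Value must be a string.' },
  ],
};

console.log(buildCollectUrl('G-XXXXXXXXXXXX', 'AbCdEfGhIjKl1234', true));
console.log(describeValidation(sampleResponse));
```

With this in place, the fetch call can take its URL from buildCollectUrl, and the debug flag can be flipped while diagnosing, then switched back off to see hits in Realtime.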

Can we send referral data?

With the previous version of the Measurement Protocol, it was possible to pass along referrer data. This meant that acquisition channel reports could still be generated in the "user-free" version of the reporting.

With the latest version of the Measurement Protocol, this is no longer possible. There are a number of support threads on Google forums, such as this one. The responses in that thread (suggesting to possibly use custom dimensions, and then no response from September 2022 onwards) would suggest that this is unlikely to change any time soon, and may be another feature lost in the migration to the new version of the platform.

Can the Measurement Protocol record more than just pageviews?

The most common alternate type of data recorded will be events, which can be recorded by adding additional parameters to the same API call.

The event reference documentation lists the various types of calls which are supported, with Google providing a number of ways to validate the resulting call structures.
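As a sketch of what "adding to the same API call" can look like, the helper below builds one request body carrying a page view plus a second event. The helper name buildPayload, the newsletter_signup event, and its placement parameter are all made up for illustration; check the event reference for the names and parameters Google actually recognises.

```javascript
// Hypothetical helper: build a Measurement Protocol body with several events.
function buildPayload(clientId, events) {
  return {
    client_id: String(clientId), // must be a string, not a number
    events: events.map(({ name, params = {} }) => ({
      name,
      // Non-zero engagement time on every event, so the hit counts towards users
      params: { engagement_time_msec: '1', ...params },
    })),
  };
}

const payload = buildPayload('anon-123', [
  {
    name: 'page_view',
    params: { page_location: 'https://example.com/pricing', page_title: 'Pricing' },
  },
  // Invented custom event and parameter
  { name: 'newsletter_signup', params: { placement: 'footer' } },
]);

console.log(JSON.stringify(payload, null, 2));
```

The resulting object is what would be passed to JSON.stringify as the body of the fetch call, with crypto.randomUUID() supplying the client id in the browser.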

How does the Site Owner find out how much traffic is not being recorded on their main Analytics account?

Unfortunately comparing between Analytics accounts is a tedious process. The simplest way is to open a browser window in each account, and start to compare the top line figures on the Site Content (Pageview) reports inside Analytics. This will give the type of helicopter view most Site Owners are looking for, with further interrogation available by either selecting the same filters on each browser window, or exporting to CSV for deeper local analysis.

A critical caveat here is that the Measurement Protocol account should only be checked for pageviews and events. Because user-based information is not being sent, any user-based stats (pages per session, session length, etc) are highly misleading. The goal is to enable comparison between pageviews or events, to get an indication of the number of users who are visiting a site but declining analytics cookies.
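For the "deeper local analysis" route, a small script can line up the two exports page by page. This is a sketch under assumptions: the exports are taken to be already parsed into plain objects of path to pageviews, the function name comparePageviews is hypothetical, and all numbers are invented.

```javascript
// allTraffic:       { path: pageviews } from the Measurement Protocol account
// consentedTraffic: { path: pageviews } from the main, consent-gated account
function comparePageviews(allTraffic, consentedTraffic) {
  return Object.entries(allTraffic)
    .filter(([, total]) => total > 0) // avoid dividing by zero
    .map(([path, total]) => {
      const consented = consentedTraffic[path] || 0;
      return { path, total, consented, missingShare: (total - consented) / total };
    })
    // Largest gaps first
    .sort((a, b) => b.missingShare - a.missingShare);
}

const rows = comparePageviews(
  { '/': 1000, '/news': 400, '/contact': 50 },
  { '/': 900, '/news': 300, '/contact': 50 }
);

for (const row of rows) {
  console.log(`${row.path}: ${(row.missingShare * 100).toFixed(1)}% not recorded`);
}
// /news: 25.0% not recorded
// /: 10.0% not recorded
// /contact: 0.0% not recorded
```

Sorting by the size of the gap makes it easier to spot sections of a site where consent rates differ noticeably from the overall figure.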

The implementation details below cover Google Analytics 3, to be deprecated in mid-2023


Implementing the Measurement Protocol call - pure JavaScript approach (GA3, DEPRECATED)

The following code can be placed in the head of all pages which are to be tracked.

<script>
    navigator.sendBeacon(
        'https://www.google-analytics.com/collect?v=1'+
        '&aip=1'+                                          // Anonymise IP
        '&tid=UA-XXXXXXXX-X'+                              // Analytics account
        '&dl='+encodeURIComponent(document.location.href)+ // Location
        '&dt='+encodeURIComponent(document.title)+         // Title
        '&cid='+Math.round(2147483647 * Math.random())+    // User id, randomised
        '&dr='+encodeURIComponent(document.referrer)       // Referrer info
    );
</script>

The navigator.sendBeacon() method makes this call asynchronously, so it does not block rendering.

The cid being a random number is key here - this is the field typically used to store the user info. If we were trying to track a user, we would store this value in a local cookie, and refer to it on subsequent visits. As we're avoiding any kind of tracking, we ensure this is randomised, and no local cookies are created.

The referrer information is sent to help Analytics report on traffic sources. This can give an idea of whether consent settings are significantly different for Facebook users vs Organic search, for example. The drawback here is that, while it will be a relatively useful comparison for sites with <1.5 pages per user on average, for sites with longer browse histories, this being set on every page means that the "Direct" source will become increasingly large, skewing the other sources downwards.

The pure JavaScript approach allows the above snippet to be dropped into the page content without modification. With small modifications, it could also be run in something like a Cloudflare Worker, where it is executed at Cloudflare level, without the call ever needing to be triggered by the user (avoiding the risk of ad blockers targeting analytics).

Where the pure JavaScript approach struggles is when AMP is involved, as AMP doesn't support the insertion of arbitrary JavaScript. It can also become unwieldy if there is business logic to be implemented, e.g. the attaching of custom dimensions to particular URLs, or re-mapping of URLs (recording /amp/ URLs in GA using the non-AMP URL combined with a custom dimension).

Supporting AMP & Custom Business Rules

Limited data will be sent to the Analytics API via the Measurement Protocol. To ensure it's sent in a consistent manner site-wide, first create a function similar to the below (PHP in this example):

class AnalyticsService
{
    public function generateCollectionUrl(string $url) : string
    {
        $params = [
            'v' => 1,
            'aip' => 1,                      // Anonymise IP
            'tid' => config('analytics.id'), // UA-12345678-1
            'dl' => $url,
        ];
        return 'https://www.google-analytics.com/collect?'.http_build_query($params);
    }
}

This example assumes that the above code is bound to an Analytics facade in Laravel, but in practice it can live anywhere in the codebase that is readily accessible to the frontend code.

This function will generate a Measurement Protocol API URL for submission of the current URL to a given analytics account. It doesn't include any visit-specific information (randomised user id, or referrer), so is cache-friendly.

The previous javascript example can then be updated to look more like the below:

<script>
    navigator.sendBeacon(
        '{!! Analytics::generateCollectionUrl(request()->fullUrl()) !!}'+
        '&dt='+encodeURIComponent(document.title)+
        '&cid='+Math.round(2147483647 * Math.random())+
        '&dr='+encodeURIComponent(document.referrer)
    );
</script>

Turning to AMP, the following analytics code can be added to any AMP template. The assumption is that the amp-analytics.js file is already being loaded on the page.

<amp-analytics>
    <script type="application/json">
    {
        "requests": {
            "pageview": "{!! Analytics::generateCollectionUrl(request()->fullUrl()) !!}&dt=TITLE&cid=RANDOM&dr=DOCUMENT_REFERRER"
        },
        "triggers": {
            "trackPageview": {
                "on": "visible",
                "request": "pageview"
            }
        }
    }
    </script>
</amp-analytics>

Here we make use of some of the in-built variables AMP provides - TITLE, RANDOM, and DOCUMENT_REFERRER.

At this point, we have Measurement Protocol hits being performed successfully on both AMP and non-AMP pages. One business rule we may want to add is to change how AMP URLs are recorded. On publisher sites, for example, AMP URLs are often of the form /amp/category/original-slug-1, but reported to analytics as /category/original-slug-1, with a value of 'amp' sent to the 'platform' custom dimension.

Returning to the URL generation function, and having checked in Analytics admin that the 'platform' custom dimension corresponds to "cd1", the function can be updated as follows:

public function generateCollectionUrl(string $url) : string
{
    $params = [
        'v' => 1,
        'aip' => 1,                      // Anonymise IP
        'tid' => config('analytics.id'), // UA-12345678-1
        'dl' => $url,
        'cd1' => 'web',                  // Default custom dimension, for web
    ];

    if (preg_match('#/amp/#', $url)) {
        // Change the URL being reported to remove the /amp/ stub
        $params['dl'] = str_replace('/amp/', '/', $url);
        // Custom dimension set to amp
        $params['cd1'] = 'amp';
    }

    return 'https://www.google-analytics.com/collect?'.http_build_query($params);
}

Any subsequent calls to the Measurement Protocol will now have a custom dimension attached, as well as the business logic around recording of URLs. The custom dimension parameter will help site owners to identify if there's a noticeably different behaviour pattern between users on different channels (AMP/regular web), all in a privacy-friendly way.

