Font Subsetting - shrink down font files to speed up page loads

Fonts are one of the largest resources on any page after images, and can have a big impact on CLS when they vary in size from the underlying system font. Font subsetting allows us to radically shrink font file sizes, speed up initial page loads, and improve our page speed scores.

Background

With modern web design, it's not uncommon to have 4-6 fonts being loaded on each page - core font files, bold versions and italic versions, for each of a couple of different font styles. While compressed font formats like woff2 are now commonly supported, this can still add 350-400kb of weight to an uncached page. This impacts page speed scores across the board - from Largest Contentful Paint (LCP) and First Contentful Paint (FCP) through to the overall page score. While techniques like self-hosting and cache headers can speed this delivery up a bit, ideally we want to serve smaller files, without compromising the design vision of the site. Enter font subsetting!

What is Font Subsetting?

Glyph Map

Internally, each binary font file is essentially a giant table. It contains an entry for each Unicode character code, and alongside it, the font’s representation (glyph) for that code. Where the font has no representation for a given character, the cell is empty.

Font files will typically support a wide variety of languages within the same file. If we're only going to be using some of these languages, then we have an opportunity to shrink the size of the files, delivering a faster experience for our users. Within the fonts we are using, there are numerous cells taken up by glyphs which never appear on site - glyphs for Cyrillic and other non-Latin scripts. What we can do is effectively “purge” these binary files to leave us with only a subset of characters, focused on the Latin characters (letters and numbers from English, plus the accented characters used in Spanish, French, German, etc). Depending on the font files we're using, and the level of language support they have, stripping out these other characters can reduce the size of the font files by roughly 60% (~60kb → ~23kb on some typical Google web fonts).

What happens if someone tries to use a character which was removed?

If the browser encounters a character which does not exist in the current font family, it will attempt to render it using the next available font family declared in our CSS. This can lead to an unpleasant mismatch of fonts within a single word (some letters with a different height, density etc). For this reason, even if a site is only serving English-speaking audiences, we will still grab the whole Latin set, rather than just a stricter subset based on UK & Ireland. This avoids words like café rendering é in a different font. Incidentally, this is how emojis are typically rendered on sites - the main font files do not generally have a rendering for emojis, so the browser falls all the way through to the first font which will render these characters (typically a system font, which is also why some emojis look different on iOS vs Android vs Windows machines).
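The fallback chain described above is declared in the stylesheet. As a minimal sketch, the snippet below writes an example stylesheet to a temporary file and prints it. The font names, the `fonts/` path and the `U+0000-00FF` range (Basic Latin plus Latin-1 Supplement) are assumptions for illustration, not necessarily the exact range a given subsetting tool produces.

```shell
# Write a small example stylesheet to a temp file and print it.
css_file="$(mktemp)"
cat > "$css_file" <<'EOF'
/* Hypothetical subset file; name and path are illustrative. */
@font-face {
  font-family: "Roboto";
  src: url("fonts/Roboto-Regular-subset.woff2") format("woff2");
  /* Assumed range: Basic Latin + Latin-1 Supplement. */
  unicode-range: U+0000-00FF;
}

/* Characters missing from Roboto fall through to the next family listed. */
body {
  font-family: "Roboto", Arial, sans-serif;
}
EOF
cat "$css_file"
```

With this range, é (U+00E9) is covered by the subset font, while an emoji such as U+1F600 sits outside it and would be drawn by whichever later font in the list supports it.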

How is the subsetting done?

Manually! Subsetting of each font file can be done using a tool called Glyphhanger. Glyphhanger has a number of options for generating font subsets - extracting only glyphs for a particular character set, or even extracting only the characters which exist on a remote URL! The most basic example is to generate a subset which just contains Latin characters.

$ npm install -g glyphhanger

$ glyphhanger --LATIN --subset=fonts/*.woff --formats=woff2
Subsetting Roboto-Regular.woff to Roboto-Regular-subset.woff2 (was 65.7 KB, now 15.8 KB)

The above command will subset all woff files inside the fonts/ directory, creating subset files for each font discovered with just the latin character subset. The formats option allows us to specify the output format(s) we want - in this case, we're asking glyphhanger to not only subset our woff files, but to also save the output in the more compressed file format woff2.
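Glyphhanger can also take a URL and work out the character set from the page itself. The wrapper below is a rough sketch of that mode: `subset_for_page` is a hypothetical helper name, and the URL in the usage note is a placeholder. As far as I'm aware, glyphhanger hands the actual subsetting to `pyftsubset` from Python's fonttools package (with the brotli module needed for woff2 output), so the sketch checks for both tools before running.

```shell
# Subset fonts to only the characters found on one page.
# Usage: subset_for_page <url> <font-glob>
subset_for_page() {
  url="$1"
  fonts="$2"

  # Assumption: glyphhanger relies on pyftsubset (fonttools) being on PATH.
  for tool in glyphhanger pyftsubset; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing dependency: $tool" >&2
      return 127
    fi
  done

  # The glob stays quoted so glyphhanger, not the shell, expands it.
  glyphhanger "$url" --subset="$fonts" --formats=woff2
}
```

It could be called as `subset_for_page https://example.com/ "fonts/*.woff"`. Keep in mind that any character which shows up on the site later, but was not on the page at subsetting time, will fall back to the next font in the stack.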

Additional optimisations

If you have a font file where you know you'll only ever use a small number of characters (maybe a specific font for a sports scoreboard style), then making use of the whitelist option can make for a huge reduction.

$ glyphhanger --whitelist="0123456789-:" --subset=Sports-Font.ttf --formats=woff2
Subsetting Sports-Font.ttf to Sports-Font-subset.woff2 (was 304.25 KB, now 3.85 KB)

In the above example, you'll notice that we haven't just limited our subsetting to woff font files. Many older sites may still be carrying older, less-efficient file formats like ttf. With support for woff2 being widespread, this is a great opportunity to really optimise the font stack on site, moving from ttf to woff2 as the primary font supported.
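To find candidates for this kind of migration, a small helper can list the older-format fonts that do not yet have a woff2 counterpart. This is only a sketch: `fonts_missing_woff2` is a hypothetical name, and it assumes the output sits next to the source file as either `<name>-subset.woff2` (the naming seen in the glyphhanger output above) or `<name>.woff2`.

```shell
# List .ttf/.otf fonts in a directory that have no .woff2 counterpart yet.
# Usage: fonts_missing_woff2 [dir]   (dir defaults to ./fonts)
fonts_missing_woff2() {
  dir="${1:-fonts}"
  for f in "$dir"/*.ttf "$dir"/*.otf; do
    [ -e "$f" ] || continue        # the glob matched nothing
    base="${f%.*}"                 # path without the extension
    if [ ! -e "${base}-subset.woff2" ] && [ ! -e "${base}.woff2" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

Running `fonts_missing_woff2 fonts` prints one path per line for each font that still needs converting.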

Result

Results Image

The top half of this image is the network tab for font loading on the article page on a popular news site. There are a number of font variants being served for different parts of the design. The bottom half shows the result for the same article after subsetting the fonts to just the Latin characters. In this instance, the size of the transferred font files on a cold cache has dropped from ~400kb to ~140kb, which is a drop of ~65%.
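The percentage is simple arithmetic: (before - after) / before × 100. A small sketch for checking it, either from known sizes or straight from two files on disk (`pct_saved` and `file_pct_saved` are hypothetical helper names):

```shell
# Percentage saved going from <before> to <after>; both in the same unit.
pct_saved() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# Same calculation, using the byte sizes of two files.
file_pct_saved() {
  before=$(( $(wc -c < "$1") ))   # arithmetic expansion strips any padding
  after=$(( $(wc -c < "$2") ))
  pct_saved "$before" "$after"
}

pct_saved 400 140   # the article page above, in kb: prints 65.0
```

For a single font, `file_pct_saved Sports-Font.ttf Sports-Font-subset.woff2` gives the saving for that file.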

This file size drop led, in this particular case, to an LCP score increase of close to 20%, an FCP increase of 10%, and an overall page speed score improvement in the same range. If there are limited languages in use on a particular site, then font subsetting can be a really effective way of quickly improving the site's page speed, and, ultimately, its Google ranking!

One caveat here is that some font licences do not permit modification of the source files in any way, even for subsetting. So check that the licence for your font permits this type of modification before proceeding!
